Google conducted a large-scale evaluation in 2006[2] to compare the performance of the Minhash and Simhash[3] algorithms. In 2007, Google reported using Simhash for duplicate detection in web crawling[4] and Minhash and LSH for Google News personalization.[5]
Charikar, Moses S. (2002), "Similarity estimation techniques from rounding algorithms", Proceedings of the 34th Annual ACM Symposium on Theory of Computing, pp. 380–388, doi:10.1145/509907.509965, ISBN 978-1581134957, S2CID 4229473.
Das, Abhinandan S.; Datar, Mayur; Garg, Ashutosh; Rajaram, Shyam; et al. (2007), "Google news personalization: scalable online collaborative filtering", Proceedings of the 16th International Conference on World Wide Web, p. 271, doi:10.1145/1242572.1242610, ISBN 9781595936547, S2CID 207163129.