在数据挖掘领域,Cohen et al. (2001)使用最小哈希技术作为关联规则学习的工具。给定一个数据库,其中每一项都有多个属性(可看作是每行为一个数据库项, 每列为一个属性的0-1矩阵),他们将最小哈希的近似度方法应用于Jaccard系数,用来辨别频繁共同出现的属性候选对,然后仅计算这些候选对的确切系数值,以确定哪些项目共同出现的频度低于一个给定的严格阈值。[8]
^Jaccard, Paul, étude comparative de la distribution florale dans une portion des Alpes et des Jura, Bulletin de la Société Vaudoise des Sciences Naturelles, 1901, 37: 547–579.
^Matou?ek, Ji?í; Stojakovi?, Milo?, On restricted min-wise independence of permutations, Random Structures and Algorithms, 2003, 23 (4): 397–408, doi:10.1002/rsa.10101.
^Chum, Ond?ej; Philbin, James; Isard, Michael; Zisserman, Andrew, Scalable near identical image and shot detection, Proceedings of the 6th ACM International Conference on Image and Cideo Retrieval (CIVR'07), 2007, doi:10.1145/1282280.1282359; Chum, Ond?ej; Philbin, James; Zisserman, Andrew, Near duplicate image detection: min-hash and tf-idf weighting, Proceedings of the British Machine Vision Conference(PDF)3: 4, 2008 [2012-05-18], (原始内容存档(PDF)于2021-02-24).
^Schleimer, Saul; Wilkerson, Daniel S.; Aiken, Alex, Winnowing: local algorithms for document fingerprinting, Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data (SIGMOD '03): 76–85, 2003, doi:10.1145/872757.872770.
^Cohen, E.; Datar, M.; Fujiwara, S.; Gionis, A.; Indyk, P.; Motwani, R.; Ullman, J. D.; Yang, C., Finding interesting associations without support pruning, IEEE Transactions on Knowledge and Data Engineering, 2001, 13 (1): 64–78, doi:10.1109/69.908981.