Similarity Functions Benchmark in Python
Let's benchmark four of the most common similarity formulas used in recommender systems.
- Euclidean Distance
- Pearson Correlation Coefficient
- Tanimoto Coefficient
- Cosine Similarity
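For reference, the four scores can be written as below. This is a sketch under assumptions, since the post does not spell out its formulas: the Euclidean distance is assumed to be turned into a similarity with 1/(1 + d), and the Tanimoto coefficient is assumed to be computed on the sets of items each user has rated. The symbols x, y, n, A and B are introduced here for illustration.

```latex
% x_i, y_i: ratings given by the two users to the n items both have rated
% A, B: the sets of items rated by each user
\begin{align*}
s_{\text{euclid}}(x, y) &= \frac{1}{1 + \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}} \\
r_{\text{pearson}}(x, y) &= \frac{\sum_i x_i y_i - \frac{1}{n}\sum_i x_i \sum_i y_i}
  {\sqrt{\left(\sum_i x_i^2 - \frac{1}{n}\left(\sum_i x_i\right)^2\right)
         \left(\sum_i y_i^2 - \frac{1}{n}\left(\sum_i y_i\right)^2\right)}} \\
T(A, B) &= \frac{|A \cap B|}{|A| + |B| - |A \cap B|} \\
\cos(x, y) &= \frac{\sum_i x_i y_i}{\sqrt{\sum_i x_i^2}\,\sqrt{\sum_i y_i^2}}
\end{align*}
```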
The data we are going to use is a simple dictionary of users' movie ratings (the sample data is taken from the book "Programming Collective Intelligence" by Toby Segaran).
# Load a dictionary of movie critics and their ratings of a small set of movies
critics = {
    'Lisa Rose': {
        'Lady in the Water': 2.5,
        'Snakes on a Plane': 3.5,
        'Just My Luck': 3.0,
        'Superman Returns': 3.5,
        'You, Me and Dupree': 2.5,
        'The Night Listener': 3.0
    },
    'Gene Seymour': {
        'Snakes on a Plane': 3.5,
        'Lady in the Water': 3.0,
        'Just My Luck': 1.5,
        'Superman Returns': 5.0,
        'You, Me and Dupree': 3.5,
        'The Night Listener': 3.0
    },
    'Michael Philips': {
        'Snakes on a Plane': 3.0,
        'Lady in the Water': 2.5,
        'Superman Returns': 3.5,
        'The Night Listener': 4.5
    },
    'Claudia Puig': {
        'Snakes on a Plane': 3.5,
        'Just My Luck': 3.0,
        'Superman Returns': 4.0,
        'You, Me and Dupree': 2.5,
        'The Night Listener': 4.5
    },
    'Mick LaSalle': {
        'Snakes on a Plane': 4.0,
        'Lady in the Water': 3.0,
        'Just My Luck': 1.5,
        'Superman Returns': 3.0,
        'You, Me and Dupree': 2.0,
        'The Night Listener': 3.0
    },
    'Jack Matthews': {
        'Snakes on a Plane': 4.0,
        'Lady in the Water': 3.0,
        'Superman Returns': 5.0,
        'You, Me and Dupree': 3.5,
        'The Night Listener': 3.0
    },
    'Toby': {
        'Snakes on a Plane': 4.5,
        'Superman Returns': 4.0,
        'You, Me and Dupree': 1.0
    }
}
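Before any score can be computed, the two critics' ratings have to be lined up over the movies both of them rated. Here is a minimal sketch of that step. The helper name shared_ratings is hypothetical (it is not from the original post), and only the two entries needed are repeated from the dictionary above so the snippet runs on its own.

```python
# Two entries copied from the critics dictionary so the snippet is self-contained.
critics = {
    'Lisa Rose': {
        'Lady in the Water': 2.5,
        'Snakes on a Plane': 3.5,
        'Just My Luck': 3.0,
        'Superman Returns': 3.5,
        'You, Me and Dupree': 2.5,
        'The Night Listener': 3.0
    },
    'Michael Philips': {
        'Snakes on a Plane': 3.0,
        'Lady in the Water': 2.5,
        'Superman Returns': 3.5,
        'The Night Listener': 4.5
    }
}


def shared_ratings(prefs, person1, person2):
    """Return two rating lists aligned on the movies both people rated.

    Hypothetical helper, not from the original post. Titles are sorted
    so the order of the ratings is deterministic.
    """
    common = sorted(set(prefs[person1]) & set(prefs[person2]))
    ratings1 = [prefs[person1][movie] for movie in common]
    ratings2 = [prefs[person2][movie] for movie in common]
    return ratings1, ratings2


lisa, michael = shared_ratings(critics, 'Lisa Rose', 'Michael Philips')
print('Lisa Rose:', lisa)
print('Michael Philips:', michael)
```

Sorting the shared titles is a design choice made here for reproducibility; any fixed order works as long as both lists use the same one.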
OUTPUT BENCHMARK
Lisa Rose: | [2.5, 3.5, 3.5, 3.0] |
Michael Philips: | [2.5, 3.0, 3.5, 4.5] |
Euclidean Distance Score between Lisa Rose and Michael Philips is 0.387425886723
Pearson Correlation Coefficient Score between Lisa Rose and Michael Philips is 0.254823595719
Tanimoto Correlation Coefficient Score between Lisa Rose and Michael Philips is 0.666666666667
Cosine Similarity Score between Lisa Rose and Michael Philips is 0.975
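The post lists the scores but not the functions behind them, so the following is only a sketch of how they could be computed. It assumes that the Euclidean distance is converted to a score with 1 / (1 + distance), that Pearson and cosine use only the four shared movies, and that Tanimoto is computed on the sets of movie titles each critic rated. All function and variable names are hypothetical.

```python
from math import sqrt

# Ratings aligned on the four movies both critics rated (see the output above).
lisa = [2.5, 3.5, 3.5, 3.0]
michael = [2.5, 3.0, 3.5, 4.5]

# Titles each critic rated, used only for the set-based Tanimoto score.
lisa_movies = ['Lady in the Water', 'Snakes on a Plane', 'Just My Luck',
               'Superman Returns', 'You, Me and Dupree', 'The Night Listener']
michael_movies = ['Snakes on a Plane', 'Lady in the Water',
                  'Superman Returns', 'The Night Listener']


def euclidean_score(x, y):
    """Map the Euclidean distance to (0, 1]; identical vectors score 1."""
    distance = sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    return 1 / (1 + distance)


def pearson_score(x, y):
    """Pearson correlation of two equal-length lists; 0.0 if undefined."""
    n = len(x)
    if n == 0:
        return 0.0
    sum_x, sum_y = sum(x), sum(y)
    numerator = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n
    denominator = sqrt((sum(a * a for a in x) - sum_x ** 2 / n) *
                       (sum(b * b for b in y) - sum_y ** 2 / n))
    return numerator / denominator if denominator else 0.0


def tanimoto_score(items_a, items_b):
    """Shared items divided by the size of the union of both item sets."""
    a, b = set(items_a), set(items_b)
    shared = len(a & b)
    union = len(a) + len(b) - shared
    return shared / union if union else 0.0


def cosine_score(x, y):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


print('Euclidean:', euclidean_score(lisa, michael))
print('Pearson:  ', pearson_score(lisa, michael))
print('Tanimoto: ', tanimoto_score(lisa_movies, michael_movies))
print('Cosine:   ', cosine_score(lisa, michael))
```

Under these assumptions the Euclidean, Pearson and Tanimoto values agree with the scores listed above, and the cosine value comes out at roughly 0.9755, in line with the reported 0.975. Note that the Euclidean variant in "Programming Collective Intelligence" omits the square root, so it would give a different number.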