Similarity Functions Benchmark in Python

Let's create a benchmark comparing four of the most common similarity formulas used in recommendation systems:

  1. Euclidean Distance
  2. Pearson Correlation Coefficient
  3. Tanimoto Coefficient
  4. Cosine Similarity
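
As a rough illustration of the four formulas listed above, here is a minimal Python 3 sketch. The function names are hypothetical, and two choices are assumptions rather than something stated in this post: the Euclidean distance d is turned into a similarity score with 1 / (1 + d), and Tanimoto is computed in its set form (shared items divided by all distinct items).

```python
from math import sqrt


def euclidean_similarity(a, b):
    """Map the Euclidean distance d to a score in (0, 1] via 1 / (1 + d)."""
    distance = sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + distance)


def pearson_correlation(a, b):
    """Pearson correlation of two equal-length rating lists (0.0 if undefined)."""
    n = len(a)
    if n == 0:
        return 0.0
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    covariance = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    spread = sqrt(sum((x - mean_a) ** 2 for x in a)
                  * sum((y - mean_b) ** 2 for y in b))
    return covariance / spread if spread else 0.0


def tanimoto_coefficient(items_a, items_b):
    """Set form: number of shared items divided by number of distinct items."""
    set_a, set_b = set(items_a), set(items_b)
    shared = len(set_a & set_b)
    total = len(set_a) + len(set_b) - shared
    return shared / total if total else 0.0


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length rating lists."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical ratings from two users for the same three items
user_a = [1.0, 2.0, 3.0]
user_b = [2.0, 4.0, 6.0]
print(euclidean_similarity(user_a, user_b))
print(pearson_correlation(user_a, user_b))
print(cosine_similarity(user_a, user_b))
# Tanimoto compares which items were rated, not the rating values
print(tanimoto_coefficient(['x', 'y', 'z'], ['y', 'z']))
```

The first, second and fourth functions expect two lists of the same length, where position i in both lists refers to the same item. The Tanimoto function only needs the two collections of item names.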

The data we will use is this simple dictionary of users' movie ratings (sample data taken from the book "Programming Collective Intelligence" by Toby Segaran):



 # Load a dictionary of movie critics and their ratings of a small set of movies  
 critics = {  
   'Lisa Rose': {  
     'Lady in the Water': 2.5,  
     'Snakes on a Plane': 3.5,  
     'Just My Luck': 3.0,  
     'Superman Returns': 3.5,  
     'You, Me and Dupree': 2.5,  
     'The Night Listener': 3.0  
   },  
   'Gene Seymour': {  
     'Snakes on a Plane': 3.5,  
     'Lady in the Water': 3.0,  
     'Just My Luck': 1.5,  
     'Superman Returns': 5.0,  
     'You, Me and Dupree': 3.5,  
     'The Night Listener': 3.0  
   },  
   'Michael Philips': {  
     'Snakes on a Plane': 3.0,  
     'Lady in the Water': 2.5,  
     'Superman Returns': 3.5,  
     'The Night Listener': 4.5  
   },  
   'Claudia Puig': {  
     'Snakes on a Plane': 3.5,  
     'Just My Luck': 3.0,  
     'Superman Returns': 4.0,  
     'You, Me and Dupree': 2.5,  
     'The Night Listener': 4.5  
   },  
   'Mick LaSalle': {  
     'Snakes on a Plane': 4.0,  
     'Lady in the Water': 3.0,  
     'Just My Luck': 1.5,  
     'Superman Returns': 3.0,  
     'You, Me and Dupree': 2.0,  
     'The Night Listener': 3.0  
   },  
   'Jack Matthews': {  
     'Snakes on a Plane': 4.0,  
     'Lady in the Water': 3.0,  
     'Superman Returns': 5.0,  
     'You, Me and Dupree': 3.5,  
     'The Night Listener': 3.0  
   },  
   'Toby': {  
     'Snakes on a Plane': 4.5,  
     'Superman Returns': 4.0,  
     'You, Me and Dupree': 1.0  
   }  
 }  
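
Not every critic rated every movie, so before two critics can be compared their ratings have to be restricted to the movies both of them rated. The sketch below shows one way to do that. The helper name `shared_ratings` and the choice to sort the shared titles alphabetically are assumptions made for this example; the two-critic `sample` dictionary just repeats part of the data above so the snippet runs on its own.

```python
def shared_ratings(prefs, person_a, person_b):
    """Return two aligned rating lists covering only movies both people rated."""
    # Sort the shared titles so both lists follow the same, repeatable order
    common = sorted(set(prefs[person_a]) & set(prefs[person_b]))
    ratings_a = [prefs[person_a][movie] for movie in common]
    ratings_b = [prefs[person_b][movie] for movie in common]
    return ratings_a, ratings_b


# Two-critic excerpt of the critics dictionary
sample = {
    'Lisa Rose': {
        'Lady in the Water': 2.5,
        'Snakes on a Plane': 3.5,
        'Just My Luck': 3.0,
        'Superman Returns': 3.5,
        'You, Me and Dupree': 2.5,
        'The Night Listener': 3.0,
    },
    'Michael Philips': {
        'Snakes on a Plane': 3.0,
        'Lady in the Water': 2.5,
        'Superman Returns': 3.5,
        'The Night Listener': 4.5,
    },
}

lisa, michael = shared_ratings(sample, 'Lisa Rose', 'Michael Philips')
print(lisa)     # [2.5, 3.5, 3.5, 3.0]
print(michael)  # [2.5, 3.0, 3.5, 4.5]
```

The four shared movies here are Lady in the Water, Snakes on a Plane, Superman Returns and The Night Listener, in that order.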

Benchmark Output
Lisa Rose:[2.5, 3.5, 3.5, 3.0]
Michael Philips:[2.5, 3.0, 3.5, 4.5]
Euclidean Distance Score between Lisa Rose and Michael Philips is 0.387425886723
Pearson Correlation Coefficient Score between Lisa Rose and Michael Philips is 0.254823595719
Tanimoto Correlation Coefficient Score between Lisa Rose and Michael Philips is 0.666666666667
Cosine Similarity Score between Lisa Rose and Michael Philips is 0.975
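
The Tanimoto score above looks like it was computed on the sets of movies each critic rated, not on the two rating lists. That reading is an inference from the number itself, not something the post states. A small check under that assumption:

```python
# Titles rated by each critic, copied from the critics dictionary
lisa_movies = {
    'Lady in the Water', 'Snakes on a Plane', 'Just My Luck',
    'Superman Returns', 'You, Me and Dupree', 'The Night Listener',
}
michael_movies = {
    'Snakes on a Plane', 'Lady in the Water',
    'Superman Returns', 'The Night Listener',
}

shared = lisa_movies & michael_movies
# 4 shared titles out of 6 + 4 - 4 = 6 distinct titles
score = len(shared) / (len(lisa_movies) + len(michael_movies) - len(shared))
print(round(score, 12))  # 0.666666666667
```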



