Hierarchical Clustering Algorithms Work!!

Today I was playing with hierarchical clustering algorithms. Hierarchical clustering belongs to the family of unsupervised learning algorithms. It groups similar objects together and arranges those groups into a hierarchy (a tree) of clusters.
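To make the idea concrete, here is a minimal sketch of the bottom-up (agglomerative) variant in plain Python. It is an illustration under my own assumptions, not necessarily the exact code in the repository: the Pearson-based distance, the averaging of merged vectors, the function names and the tiny sample data are all made up for this example.

```python
from math import sqrt


def pearson_distance(v1, v2):
    """Return 1 - Pearson correlation, so similar vectors get a small distance."""
    n = len(v1)
    mean1, mean2 = sum(v1) / n, sum(v2) / n
    cov = sum((a - mean1) * (b - mean2) for a, b in zip(v1, v2))
    var1 = sum((a - mean1) ** 2 for a in v1)
    var2 = sum((b - mean2) ** 2 for b in v2)
    if var1 == 0 or var2 == 0:
        # A constant vector has no defined correlation; treat it as uncorrelated.
        return 1.0
    return 1.0 - cov / sqrt(var1 * var2)


def hierarchical_cluster(labels, vectors, distance=pearson_distance):
    """Merge the two closest clusters until one remains; return a nested tuple tree."""
    # Each cluster is (tree, vector); the tree of a leaf is just its label.
    clusters = [(label, list(vec)) for label, vec in zip(labels, vectors)]
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters with the smallest distance.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = distance(clusters[i][1], clusters[j][1])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        (tree_i, vec_i), (tree_j, vec_j) = clusters[i], clusters[j]
        # The merged cluster is represented by the average of its two children.
        merged_vec = [(a + b) / 2 for a, b in zip(vec_i, vec_j)]
        # Delete j first (j > i) so that index i stays valid.
        del clusters[j]
        del clusters[i]
        clusters.append(((tree_i, tree_j), merged_vec))
    return clusters[0][0]


# Hypothetical word counts for four made-up blogs (one column per word).
labels = ["tech_a", "tech_b", "politics_a", "politics_b"]
vectors = [
    [10, 8, 0, 1],
    [9, 7, 1, 0],
    [0, 1, 12, 9],
    [1, 0, 10, 11],
]
tree = hierarchical_cluster(labels, vectors)
print(tree)
```

The nested tuples are the hierarchy: the two "tech" blogs are merged first, then the two "politics" blogs, and finally both groups are joined at the root. This brute-force search compares every pair on every pass, so it is only meant for small inputs.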

So I tried to create clusters of the following blog sites:
http://feeds.feedburner.com/37signals/beMH
http://feeds.feedburner.com/blogspot/bRuz
http://battellemedia.com/index.xml
http://blog.guykawasaki.com/index.rdf
http://blog.outer-court.com/rss.xml
http://feeds.searchenginewatch.com/sewblog
http://blog.topix.net/index.rdf
http://blogs.abcnews.com/theblotter/index.rdf
http://feeds.feedburner.com/ConsumingExperienceFull
http://flagrantdisregard.com/index.php/feed/
http://featured.gigaom.com/feed/
http://gizmodo.com/index.xml
http://gofugyourself.typepad.com/go_fug_yourself/index.rdf
http://googleblog.blogspot.com/rss.xml
http://feeds.feedburner.com/GoogleOperatingSystem
http://headrush.typepad.com/creating_passionate_users/index.rdf
http://feeds.feedburner.com/hotair/main
http://feeds.feedburner.com/instapundit/main
http://jeremy.zawodny.com/blog/rss2.xml
http://joi.ito.com/index.rdf
http://journals.aol.com/thecoolerblog/AOLNewsCooler/rss.xml
http://feeds.feedburner.com/Mashable
http://michellemalkin.com/index.rdf
http://moblogsmoproblems.blogspot.com/rss.xml
http://newsbusters.org/node/feed
http://beta.blogger.com/feeds/27154654/posts/full?alt=rss
http://feeds.feedburner.com/paulstamatiou
http://powerlineblog.com/index.rdf
http://feeds.feedburner.com/Publishing20
http://radar.oreilly.com/index.rdf
http://scienceblogs.com/pharyngula/index.xml
http://scobleizer.wordpress.com/feed/
http://sethgodin.typepad.com/seths_blog/index.rdf
http://rss.slashdot.org/Slashdot/slashdot
http://thinkprogress.org/feed/
http://feeds.feedburner.com/andrewsullivan/rApM
http://wilwheaton.typepad.com/wwdnbackup/index.rdf
http://www.43folders.com/feed/
http://www.456bereastreet.com/feed.xml
http://www.autoblog.com/rss.xml
http://www.bloggersblog.com/rss.xml
http://www.bloglines.com/rss/about/news
http://www.blogmaverick.com/rss.xml
http://www.boingboing.net/index.rdf
http://www.buzzmachine.com/index.xml
http://www.captainsquartersblog.com/mt/index.rdf
http://www.coolhunting.com/index.rdf
http://feeds.copyblogger.com/Copyblogger
http://feeds.feedburner.com/crooksandliars/YaCP
http://feeds.dailykos.com/dailykos/index.xml
http://www.deadspin.com/index.xml
http://www.downloadsquad.com/rss.xml
http://www.engadget.com/rss.xml
http://www.gapingvoid.com/index.rdf
http://www.gawker.com/index.xml
http://www.gothamist.com/index.rdf
http://www.huffingtonpost.com/raw_feed_index.rdf
http://www.hyperorg.com/blogger/index.rdf
http://www.joelonsoftware.com/rss.xml
http://www.joystiq.com/rss.xml
http://www.kotaku.com/index.xml
http://feeds.kottke.org/main
http://www.lifehack.org/feed/
http://www.lifehacker.com/index.xml
http://littlegreenfootballs.com/weblog/lgf-rss.php
http://www.makezine.com/blog/index.xml
http://www.mattcutts.com/blog/feed/
http://xml.metafilter.com/rss.xml
http://www.mezzoblue.com/rss/index.xml
http://www.micropersuasion.com/index.rdf
http://www.neilgaiman.com/journal/feed/rss.xml
http://www.oilman.ca/feed/
http://www.perezhilton.com/index.xml
http://www.plasticbag.org/index.rdf
http://www.powazek.com/rss.xml
http://www.problogger.net/feed/
http://feeds.feedburner.com/QuickOnlineTips
http://www.readwriteweb.com/rss.xml
http://www.schneier.com/blog/index.rdf
http://scienceblogs.com/sample/combined.xml
http://www.seroundtable.com/index.rdf
http://www.shoemoney.com/feed/
http://www.sifry.com/alerts/index.rdf
http://www.simplebits.com/xml/rss.xml
http://feeds.feedburner.com/Spikedhumor
http://www.stevepavlina.com/blog/feed
http://www.talkingpointsmemo.com/index.xml
http://www.tbray.org/ongoing/ongoing.rss
http://feeds.feedburner.com/TechCrunch
http://www.techdirt.com/techdirt_rss.xml
http://www.techeblog.com/index.php/feed/
http://www.thesuperficial.com/index.xml
http://www.tmz.com/rss.xml
http://www.treehugger.com/index.rdf
http://www.tuaw.com/rss.xml
http://www.valleywag.com/index.xml
http://www.we-make-money-not-art.com/index.rdf
http://www.wired.com/rss/index.xml
http://www.wonkette.com/index.xml
http://feeds.feedburner.com/blogspot/eXYjt
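
Before blogs can be clustered, each one has to be turned into a vector of numbers. A common choice, and the one I assume here, is to count how often each word appears in a blog's posts. The sketch below shows that step on a couple of made-up posts. Downloading and parsing the feeds is left out, and the function names and sample text are hypothetical.

```python
import re
from collections import Counter


def get_words(html):
    """Strip HTML tags, split on non-letters and lowercase the words."""
    text = re.sub(r"<[^>]+>", " ", html)
    return [w.lower() for w in re.split(r"[^A-Za-z]+", text) if w]


def build_matrix(posts_by_blog):
    """Return (vocabulary, {blog: word-count vector aligned with vocabulary})."""
    counts = {
        blog: Counter(w for post in posts for w in get_words(post))
        for blog, posts in posts_by_blog.items()
    }
    # The vocabulary is every word seen in any blog, in a stable order.
    vocabulary = sorted(set().union(*counts.values()))
    matrix = {blog: [c[w] for w in vocabulary] for blog, c in counts.items()}
    return vocabulary, matrix


# Made-up post bodies standing in for real feed entries.
posts_by_blog = {
    "blog_a": ["<p>Python clustering is fun</p>", "More <b>Python</b> code"],
    "blog_b": ["Election news and more election polls"],
}
vocabulary, matrix = build_matrix(posts_by_blog)
print(vocabulary)
print(matrix["blog_a"])
print(matrix["blog_b"])
```

In practice you would probably also drop very common words (such as "and" or "is") and very rare ones, since neither helps to tell blogs apart.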

Using Python's PIL library, I created a dendrogram of the resulting clusters:


Another interesting result is the word clustering dendrogram:
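
One way to cluster words instead of blogs, assuming the data is a blogs-by-words count matrix, is to swap rows and columns and run the same clustering on the result. The snippet below is only a sketch of that swap; the matrix values and names are made up.

```python
def transpose(matrix):
    """Swap rows and columns: rows become words, columns become blogs."""
    return [list(column) for column in zip(*matrix)]


# Hypothetical counts: one row per blog, one column per word.
blogs = ["blog_a", "blog_b", "blog_c"]
words = ["python", "election", "code"]
counts = [
    [5, 0, 3],
    [0, 7, 0],
    [4, 1, 2],
]

by_word = transpose(counts)
# by_word[0] is now the "python" row: its count in each of the three blogs.
print(by_word)
```

Words that tend to appear in the same blogs end up with similar rows, so they land close together in the tree.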


Let's zoom in on a small sub-cluster to get a better idea:
Trump and Elections
If you want to see for yourself or play with the code, please visit my GitHub repository: https://github.com/alexopoulos7/collective-intelligence/tree/master/CollectiveIntelligence/Unsupervised%20Learning/Word%20Vectors
