Saturday, February 21, 2009

Data Mining Section 4


After loading the code from the book, I played with the recommended data sets. What I had difficulty with was getting anything to display once I had everything in place. I am running Linux, and I was trying to decipher the README in an effort to get the imaging library working. OK, here is the deal: the imaging library doesn't work on my machine.
After fiddling with some code, I found an image library that works on my box,
and somewhere in the mix I managed to get this output.

>>> import clusters
>>> blognames,words,data=clusters.readfile('blogdata.txt')
>>> clust=clusters.hcluster(data)
>>> reload(clusters)

>>> clusters.printclust(clust,labels=blognames)
-
-
we make money not art
-
Giga Omni Media, Inc.
-
-
-
The Unofficial Apple Weblog (TUAW)
-
Download Squad
-
Autoblog
-
Joystiq
Engadget
-
TMZ.com
Celebrity gossip juicy celebrity rumors Hollywood gossip blog from Perez Hilton
-
Creating Passionate Users
-
gapingvoid: "cartoons drawn on the back of business cards"
-
SimpleBits
-
-
-
kottke.org
plasticbag.org
-
SpikedHumor - Today's Videos and Pictures
Oilman
-
-
blog maverick
-
Copyblogger
-
-
Captain's Quarters
-
Wonkette
-
Eschaton
-
Talking Points Memo
-
The Daily Dish | By Andrew Sullivan
-
Instapundit
Think Progress
-
Seth's Blog
-
-
-
ongoing
-
TreeHugger
-
Wired Top Stories
-
-
Topix.net Weblog
-
Micro Persuasion
-
Publishing 2.0
-
Scobleizer -- Tech geek blogger
-
-
BuzzMachine
Matt Cutts: Gadgets, Google, and SEO
-
Gawker: Valleywag
-
-
The Viral Garden
-
Mashable!
-
ReadWriteWeb
-
The Official Google Blog
-
John Battelle's Searchblog
-
Search Engine Watch Blog
-
Google Blogoscoped
-
Google Operating System
Search Engine Roundtable
-
Bloggers Blog: Blogging the Blogsphere
TechCrunch
-
-
Bloglines | News
-
-
A Consuming Experience (full feed)
-
-
Schneier on Security
-
Gizmodo
Lifehacker
-
Techdirt
-
-
Michelle Malkin
-
-
MAKE Magazine
-
ScienceBlogs : Combined Feed
Pharyngula
-
The Superficial - Because You're Ugly
Slashdot
-
-
MetaFilter
-
-
Crooks and Liars
-
-
Power Line
-
Little Green Footballs
-
Daily Kos
NewsBusters.org - Exposing Liberal Media Bias
-
Gothamist
-
Deadspin
Gawker
-
Derek Powazek
-
Neil Gaiman's Journal
-
Signal vs. Noise
-
43 Folders
Stepcase Lifehack
-
Kotaku
Boing Boing
-
Sifry's Alerts
Jeremy Zawodny's blog
-
456 Berea Street
-
mezzoblue
Joel on Software
-
PaulStamatiou.com
-
Shoemoney - Skills To Pay The Bills
ProBlogger Blog Tips
-
flagrantdisregard
WIL WHEATON dot NET: in exile
-
Joho the Blog
Joi Ito's Web
-
-
The Full Feed from HuffingtonPost.com
-
GoFugYourself
Cool Hunting
-
Steve Pavlina's Personal Development Blog
-
Quick Online Tips
-
Online Marketing Report
How to Change the World
>>>
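
For anyone wondering what the dashes in that listing are: each "-" is a merged cluster, and the names are the individual blogs hanging off it. Normally each level is indented so the tree shape is visible. Here is a minimal sketch of the idea, not the book's clusters module. The names Node, pearson_distance, hcluster_sketch and format_tree are made up for illustration, and the toy rows stand in for the word counts from blogdata.txt.

```python
from math import sqrt


class Node:
    """One node of the tree: a leaf (id >= 0) or a merged cluster (id < 0)."""

    def __init__(self, vec, left=None, right=None, id=None):
        self.vec = vec
        self.left = left
        self.right = right
        self.id = id


def pearson_distance(v1, v2):
    """Return 1 - Pearson correlation, so similar rows get a small distance."""
    n = len(v1)
    sum1, sum2 = sum(v1), sum(v2)
    sum1_sq = sum(x * x for x in v1)
    sum2_sq = sum(x * x for x in v2)
    p_sum = sum(a * b for a, b in zip(v1, v2))
    num = p_sum - sum1 * sum2 / n
    var = (sum1_sq - sum1 ** 2 / n) * (sum2_sq - sum2 ** 2 / n)
    den = sqrt(max(var, 0.0))
    if den == 0:
        # Assumption for this sketch: a constant row is treated as uncorrelated.
        return 1.0
    return 1.0 - num / den


def hcluster_sketch(rows, distance=pearson_distance):
    """Repeatedly merge the two closest clusters until one tree remains."""
    clusters = [Node(list(r), id=i) for i, r in enumerate(rows)]
    next_id = -1  # merged clusters get negative ids
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = distance(clusters[i].vec, clusters[j].vec)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # The merged cluster sits at the average of its two children.
        merged = [(a + b) / 2.0 for a, b in zip(clusters[i].vec, clusters[j].vec)]
        node = Node(merged, left=clusters[i], right=clusters[j], id=next_id)
        next_id -= 1
        del clusters[j]  # j > i, so delete j first to keep i valid
        del clusters[i]
        clusters.append(node)
    return clusters[0]


def format_tree(node, labels, depth=0):
    """Return the tree as lines: '-' for a merge, the label for a leaf."""
    indent = ' ' * depth
    lines = [indent + ('-' if node.id < 0 else labels[node.id])]
    if node.left is not None:
        lines.extend(format_tree(node.left, labels, depth + 1))
    if node.right is not None:
        lines.extend(format_tree(node.right, labels, depth + 1))
    return lines


# Toy data: 'a' and 'b' rise together, 'c' and 'd' fall together.
toy_labels = ['a', 'b', 'c', 'd']
toy_rows = [[1, 2, 3], [2, 4, 6.1], [3, 1, 0], [6, 2, 0.1]]
toy_tree = hcluster_sketch(toy_rows)
print('\n'.join(format_tree(toy_tree, toy_labels)))
```

The search for the closest pair is redone from scratch on every pass, which is fine for a toy example but slow on a hundred blogs, so caching the distances would be the first thing to add.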

I have since found the handles, and the rest of my output looks like this:

Iteration 3
>>> [blognames[r] for r in kclust[0]]
['Joystiq', 'Download Squad', 'Engadget', 'The Unofficial Apple Weblog (TUAW)', 'Autoblog']

and

>>> [blognames[r] for r in kclust[1]]
['we make money not art', "Neil Gaiman's Journal", 'Online Marketing Report', "Sifry's Alerts", 'Cool Hunting', 'The Full Feed from HuffingtonPost.com', "Joi Ito's Web", 'Bloglines | News']
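
The kclust lists above hold row numbers, which is why each one gets looked up in blognames. A rough sketch of how k-means produces lists like that is below. It is a simplification under stated assumptions and not the book's code: kcluster_sketch and euclidean_sq are hypothetical names, it uses squared Euclidean distance instead of the Pearson score, and it starts from k randomly chosen rows rather than random points.

```python
import random


def euclidean_sq(v1, v2):
    """Squared Euclidean distance (a stand-in for this sketch)."""
    return sum((a - b) ** 2 for a, b in zip(v1, v2))


def kcluster_sketch(rows, k, max_iter=100, seed=0):
    """Return k lists of row indices, one list per cluster."""
    rng = random.Random(seed)
    # Assumption: start the centroids on k distinct rows of the data.
    centroids = [list(rows[i]) for i in rng.sample(range(len(rows)), k)]
    last = None
    for _ in range(max_iter):
        # Assign every row to its nearest centroid.
        groups = [[] for _ in range(k)]
        for idx, row in enumerate(rows):
            nearest = min(range(k), key=lambda c: euclidean_sq(row, centroids[c]))
            groups[nearest].append(idx)
        if groups == last:
            break  # assignments stopped changing, so we are done
        last = groups
        # Move each centroid to the mean of the rows assigned to it.
        for c, members in enumerate(groups):
            if members:
                centroids[c] = [
                    sum(rows[m][d] for m in members) / len(members)
                    for d in range(len(rows[0]))
                ]
    return last


# Toy data: two obvious blobs, with made-up names in place of blognames.
toy_names = ['p', 'q', 'r', 's']
toy_points = [[0, 0], [0, 1], [10, 10], [10, 11]]
toy_groups = kcluster_sketch(toy_points, k=2)
print([[toy_names[r] for r in group] for group in toy_groups])
```

Which blob ends up as cluster 0 depends on the random start, so the order of the lists can differ from run to run when the seed changes.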
