Saturday, April 25, 2009

Number Eight Project

In working through this exercise I found many, many things along the way. To redo it, I had to rebuild the Python libraries on my decrepit computer.
The first run of the program showed me this:

>>> import newsfeatures
>>> allw,artw,artt= newsfeatures.getarticlewords()
>>> wordmatrix,wordvec= newsfeatures.makematrix(allw,artw)
>>> wordvec[0:10]
['crisis', 'from', 'army', 'after', 'president', 'baghdad', 'they', 'saturday', 'health', 'friday']
>>> artt[1]
u'Rumsfeld: Architect of torture'
>>> wordmatrix[1][0:10]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

Then I found my way through to get this:

>>> def wordmatrixfeatures(x):
... return [wordvec[w] for w in range(len(x)) if x[w]>0]
  File "<stdin>", line 2
    return [wordvec[w] for w in range(len(x)) if x[w]>0]
         ^
IndentationError: expected an indented block
>>> def wordmatrixfeatures(x):
...     return [wordvec[w] for w in range(len(x)) if x[w]>0]
...
>>> wordmatrixfeatures(wordmatrix[0])
['after', 'they', 'said']
>>> import docclass
>>> classifier=docclass.naivebayes(wordmatrixfeatures)
>>> classifier.setdb('newstest.db')
>>> artt[0]
u'Torture planning began in 2001, Senate report reveals'
>>> # Train this as an 'iraq' story
...
>>> classifier.train(wordmatrix[0],'iraq')
>>> artt[1]
u'Rumsfeld: Architect of torture'
>>> #Train this as an 'india' story
...
>>> classifier.train(wordmatrix[0],'india')
>>> artt[2]
u'Obama feels the love at the CIA'
>>> #How is this story classified
...
>>> classifier.classify(wordmatrix[1])
u'iraq'
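
To make the feature-extraction step concrete, here is a minimal sketch on made-up data. The toy vocabulary and row below are assumptions for illustration only, not real output from newsfeatures, and this version takes the vocabulary as an explicit second argument instead of reading the global wordvec.

```python
def wordmatrixfeatures(row, wordvec):
    """Return the words whose count in this article row is non-zero."""
    return [wordvec[i] for i, count in enumerate(row) if count > 0]

# Hypothetical vocabulary and one article's word counts (not real data).
toy_wordvec = ['crisis', 'from', 'army', 'after']
toy_row = [0, 2, 0, 1]

features = wordmatrixfeatures(toy_row, toy_wordvec)
print(features)
```

A classifier that expects a one-argument feature function could be given a wrapper such as lambda row: wordmatrixfeatures(row, wordvec).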


Then I got this:

>>> import clusters
>>> clust=clusters.hcluster(wordmatrix)
>>> clusters.drawdendogram(clust,artt,jpeg='news.jpeg')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'drawdendogram'
>>> clusters.drawdendrogram(clust,artt,jpeg='news.jpeg')
>>> from numpy import *
>>> l1=[[1,2,3],[4,5,6]]
>>> l1
[[1, 2, 3], [4, 5, 6]]
>>> m1=matrix[l1]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'type' object is unsubscriptable
>>> m1=matrix(l1)
>>> m1
matrix([[1, 2, 3],
        [4, 5, 6]])
>>> m2=matrix([[1,2],[3,4],[5,6]])
>>> m2
matrix([[1, 2],
        [3, 4],
        [5, 6]])
>>> m1*m2
matrix([[22, 28],
        [49, 64]])
>>> shape(m1)
(2, 3)
>>> shape(m2)
(3, 2)
>>> a1=m1.A
>>> a1
array([[1, 2, 3],
       [4, 5, 6]])
>>> a2=array([[1,2,3],[1,2,3]])
>>> a1*a2
array([[ 1,  4,  9],
       [ 4, 10, 18]])
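
The session above contrasts two kinds of multiplication: `*` on matrix objects is a true matrix product, while `*` on arrays multiplies element by element. Here is a small sketch of the same contrast using plain arrays and np.dot. The variable names are mine, and using arrays instead of the matrix class is my choice, not something the exercise requires.

```python
import numpy as np

a1 = np.array([[1, 2, 3], [4, 5, 6]])        # shape (2, 3)
a2 = np.array([[1, 2], [3, 4], [5, 6]])      # shape (3, 2)

# Matrix product: rows of a1 against columns of a2, giving shape (2, 2).
product = np.dot(a1, a2)

# Element-wise product: both operands must have the same shape here.
a3 = np.array([[1, 2, 3], [1, 2, 3]])
elementwise = a1 * a3

print(product)
print(elementwise)
```

The two results match the m1*m2 and a1*a2 values in the transcript.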





The articles output looks like this:
Torture planning began in 2001, Senate report reveals
2.89629172203 ['president', 'after', 'will', 'saturday', 'from', 'said']
2.54603874309 ['they', 'that', 'have', 'could', 'swine', 'said']
1.8468488388 ['they', 'obama', 'have', 'that', 'will', 'diagnose']

Rumsfeld: Architect of torture
0.259813315121 ['saturday', 'could', 'army', 'from', 'obama', 'their']
0.252440004108 ['have', 'from', 'after', 'saturday', 'president', 'dead']
0.19914349176 ['that', 'saturday', 'clinton', 'obama', 'health', 'from']

Obama feels the love at the CIA
3.00069691675 ['they', 'obama', 'have', 'that', 'will', 'diagnose']
2.58098325735 ['have', 'president', 'army', 'obama', 'that', 'they']
2.18238750145 ['saturday', 'they', 'were', 'dead', 'said', 'army']

Will gay marriage still work as a Republican wedge issue?
2.02498979116 ['have', 'that', 'after', 'clinton', 'killed', 'will']
1.13986676577 ['that', 'obama', 'they', 'dead', 'from', 'saturday']
0.979486170438 ['that', 'saturday', 'clinton', 'obama', 'health', 'from']

The prophets of doom
1.34297531817 ['have', 'president', 'army', 'obama', 'that', 'they']
1.00370386661 ['army', 'saturday', 'president', 'disorder', 'have', 'about']
0.958825672077 ['have', 'that', 'could', 'with', 'were', 'into']

"On 9/11, I think they hit the wrong building"
0.857557318577 ['president', 'after', 'will', 'saturday', 'from', 'said']
0.843839354479 ['president', 'from', 'have', 'saturday', 'after', 'friday']
0.79646265001 ['that', 'saturday', 'clinton', 'obama', 'health', 'from']

Gay marriage in the Heartland
0.408968950916 ['from', 'obama', 'said', 'could', 'ptsd', 'after']
0.380999489908 ['president', 'from', 'have', 'saturday', 'after', 'friday']
0.371820943158 ['that', 'saturday', 'clinton', 'obama', 'health', 'from']

Obama gives a speech and ... Look! A puppy!
0.996906528813 ['obama', 'president', 'have', 'they', 'after', 'saturday']
0.830442513473 ['president', 'dead', 'obama', 'army', 'disorder', 'crisis']
0.673295439051 ['army', 'saturday', 'president', 'disorder', 'have', 'about']

"But think of the things that were done to Iranians!"
1.06643693648 ['army', 'saturday', 'president', 'disorder', 'have', 'about']
1.02979443527 ['president', 'army', 'they', 'could', 'that', 'said']
0.920321480994 ['that', 'saturday', 'clinton', 'obama', 'health', 'from']

Will Israel attack Iran?
2.74036508398 ['from', 'obama', 'said', 'could', 'ptsd', 'after']
2.22516916854 ['saturday', 'could', 'army', 'from', 'obama', 'their']
1.38606900563 ['obama', 'into', 'could', 'saturday', 'after', 'baghdad']

Spare change for news
0.857690974678 ['have', 'that', 'after', 'clinton', 'killed', 'will']
0.83044688537 ['have', 'president', 'army', 'obama', 'that', 'they']
0.539108734458 ['have', 'from', 'after', 'saturday', 'president', 'dead']

"I believe that I did have PTSD"
2.27816442165 ['have', 'that', 'could', 'with', 'were', 'into']
2.13044386154 ['after', 'army', 'from', 'that', 'dead', 'have']
1.92555906988 ['army', 'saturday', 'president', 'disorder', 'have', 'about']

While you were out
1.47406844301 ['president', 'after', 'will', 'saturday', 'from', 'said']
1.43700455911 ['from', 'obama', 'said', 'could', 'ptsd', 'after']
1.15221753414 ['have', 'that', 'after', 'clinton', 'killed', 'will']

What motive does the Army have to misdiagnose PTSD?
2.66830191095 ['army', 'saturday', 'president', 'disorder', 'have', 'about']
2.20466762414 ['have', 'that', 'could', 'with', 'were', 'into']
1.97764037083 ['have', 'president', 'army', 'obama', 'that', 'they']

Tale of the secret Army tape
1.76706962388 ['saturday', 'could', 'army', 'from', 'obama', 'their']
1.4338870048 ['have', 'president', 'army', 'obama', 'that', 'they']
1.4031667344 ['president', 'army', 'they', 'could', 'that', 'said']

The leader of the Pakistani Taliban vows to strike America
0.653117286917 ['president', 'from', 'have', 'saturday', 'after', 'friday']
0.621747350405 ['from', 'obama', 'said', 'could', 'ptsd', 'after']
0.558910114288 ['after', 'army', 'from', 'that', 'dead', 'have']

"I am under a lot of pressure to not diagnose PTSD"
2.02145838231 ['army', 'saturday', 'president', 'disorder', 'have', 'about']
1.41391360291 ['after', 'army', 'from', 'that', 'dead', 'have']
1.40967544236 ['they', 'obama', 'have', 'that', 'will', 'diagnose']

Obama's G-20 confession: "I take responsibility"
2.1595940292 ['they', 'obama', 'have', 'that', 'will', 'diagnose']
2.1027497848 ['obama', 'president', 'have', 'they', 'after', 'saturday']
1.82173856536 ['have', 'president', 'obama', 'will', 'they', 'swine']

A global epidemic of violent crime
0.658709030278 ['obama', 'into', 'could', 'saturday', 'after', 'baghdad']
0.479698706847 ['army', 'saturday', 'president', 'disorder', 'have', 'about']
0.475228480034 ['president', 'after', 'will', 'saturday', 'from', 'said']

A message war the Republicans won
1.94935330037 ['have', 'that', 'after', 'clinton', 'killed', 'will']
1.9255064047 ['that', 'obama', 'they', 'dead', 'from', 'saturday']
1.7881085133 ['have', 'president', 'army', 'obama', 'that', 'they']

Joseph Stiglitz: "It's going to be bad, very bad"
1.21255146115 ['obama', 'into', 'could', 'saturday', 'after', 'baghdad']
1.03261496002 ['president', 'dead', 'obama', 'army', 'disorder', 'crisis']
0.955560942691 ['they', 'obama', 'have', 'that', 'will', 'diagnose']

Gated communities of learning
0.373462540004 ['after', 'army', 'from', 'that', 'dead', 'have']
0.328667469302 ['obama', 'into', 'could', 'saturday', 'after', 'baghdad']
0.292511673097 ['have', 'that', 'could', 'with', 'were', 'into']

Would you buy a used car industry from this man?
1.88763619545 ['president', 'after', 'will', 'saturday', 'from', 'said']
1.41336268932 ['have', 'president', 'obama', 'will', 'they', 'swine']
1.29075480756 ['president', 'army', 'they', 'could', 'that', 'said']

The secret war against American workers
1.39834880291 ['they', 'obama', 'have', 'that', 'will', 'diagnose']
1.21281501467 ['have', 'president', 'army', 'obama', 'that', 'they']
1.12564497429 ['have', 'that', 'could', 'with', 'were', 'into']

Obama goes back to the grass roots
0.752800275116 ['president', 'dead', 'obama', 'army', 'disorder', 'crisis']
0.715609580515 ['that', 'obama', 'they', 'dead', 'from', 'saturday']
0.680732504079 ['that', 'saturday', 'clinton', 'obama', 'health', 'from']

Emergency powers in Mexico to fight flu virus
1.7264204709 ['they', 'that', 'have', 'could', 'swine', 'said']
1.54867427266 ['that', 'saturday', 'clinton', 'obama', 'health', 'from']
1.54616704255 ['president', 'from', 'have', 'saturday', 'after', 'friday']

N. Korea 'reprocessing nuke fuel rods'
0.790633579 ['saturday', 'could', 'army', 'from', 'obama', 'their']
0.742727961256 ['president', 'after', 'will', 'saturday', 'from', 'said']
0.601885662715 ['that', 'saturday', 'clinton', 'obama', 'health', 'from']

ANC scores landslide win in South Africa
0.57452631672 ['president', 'army', 'they', 'could', 'that', 'said']
0.535829759831 ['president', 'dead', 'obama', 'army', 'disorder', 'crisis']
0.386901916456 ['have', 'president', 'obama', 'will', 'they', 'swine']

Clinton in Baghdad: U.S. committed to Iraq
2.19893788491 ['have', 'that', 'after', 'clinton', 'killed', 'will']
1.98993668877 ['obama', 'into', 'could', 'saturday', 'after', 'baghdad']
1.97743562888 ['have', 'from', 'after', 'saturday', 'president', 'dead']

Professor sought after 3 shot dead in U.S.
1.46428495573 ['president', 'from', 'have', 'saturday', 'after', 'friday']
1.31805642631 ['president', 'dead', 'obama', 'army', 'disorder', 'crisis']
1.09716934863 ['that', 'obama', 'they', 'dead', 'from', 'saturday']

Jailed journalist on hunger strike in Iran
0.68342677984 ['saturday', 'could', 'army', 'from', 'obama', 'their']
0.380521779594 ['have', 'from', 'after', 'saturday', 'president', 'dead']
0.370494192961 ['army', 'saturday', 'president', 'disorder', 'have', 'about']

Iceland votes in crisis elections
0.853691959209 ['president', 'dead', 'obama', 'army', 'disorder', 'crisis']
0.814792491531 ['saturday', 'they', 'were', 'dead', 'said', 'army']
0.665176867 ['army', 'saturday', 'president', 'disorder', 'have', 'about']

6 dead in Somali parliament mortar attack
1.19688869102 ['that', 'saturday', 'clinton', 'obama', 'health', 'from']
1.03330410682 ['obama', 'into', 'could', 'saturday', 'after', 'baghdad']
1.01348844344 ['army', 'saturday', 'president', 'disorder', 'have', 'about']

Actress Bea Arthur dead at 86
0.962799410095 ['saturday', 'they', 'were', 'dead', 'said', 'army']
0.71553576519 ['army', 'saturday', 'president', 'disorder', 'have', 'about']
0.705282096724 ['president', 'after', 'will', 'saturday', 'from', 'said']

Man Utd in stunning comeback
0.602516241654 ['from', 'obama', 'said', 'could', 'ptsd', 'after']
0.361154292344 ['saturday', 'could', 'army', 'from', 'obama', 'their']
0.337729719126 ['that', 'obama', 'they', 'dead', 'from', 'saturday']

Killer swine flu 'could be global pandemic'
2.89298520351 ['they', 'that', 'have', 'could', 'swine', 'said']
2.50987936958 ['have', 'that', 'after', 'clinton', 'killed', 'will']
2.30298306319 ['saturday', 'could', 'army', 'from', 'obama', 'their']

Clinton in Baghdad amid new reports of attacks
1.56153177082 ['that', 'saturday', 'clinton', 'obama', 'health', 'from']
1.4095684496 ['have', 'that', 'after', 'clinton', 'killed', 'will']
1.39805816337 ['have', 'president', 'obama', 'will', 'they', 'swine']

5 dead in attack targeting Afghan governor
2.0927632611 ['saturday', 'they', 'were', 'dead', 'said', 'army']
1.39123443056 ['president', 'dead', 'obama', 'army', 'disorder', 'crisis']
1.31263976594 ['they', 'that', 'have', 'could', 'swine', 'said']

Probe calls for Bangladesh troop deaths
0.44135557955 ['obama', 'president', 'have', 'they', 'after', 'saturday']
0.419138140003 ['from', 'obama', 'said', 'could', 'ptsd', 'after']
0.365893240532 ['after', 'army', 'from', 'that', 'dead', 'have']

Chavez: U.S. still an imperialist empire
2.51232489594 ['president', 'army', 'they', 'could', 'that', 'said']
2.37639356465 ['president', 'dead', 'obama', 'army', 'disorder', 'crisis']
2.17108091494 ['president', 'from', 'have', 'saturday', 'after', 'friday']

U.N. panel freezes assets of N. Korean firms
1.28555507123 ['have', 'that', 'after', 'clinton', 'killed', 'will']
1.20457450264 ['have', 'from', 'after', 'saturday', 'president', 'dead']
1.00695501678 ['president', 'from', 'have', 'saturday', 'after', 'friday']

Gridiron rivals to settle 1993 score
0.974111511407 ['have', 'that', 'after', 'clinton', 'killed', 'will']
0.912362945037 ['that', 'obama', 'they', 'dead', 'from', 'saturday']
0.651874822584 ['have', 'that', 'could', 'with', 'were', 'into']

Texas teen gets swine flu; family quarantined
1.28239958832 ['have', 'that', 'could', 'with', 'were', 'into']
1.23990281402 ['that', 'saturday', 'clinton', 'obama', 'health', 'from']
1.19205981734 ['have', 'president', 'obama', 'will', 'they', 'swine']

Data show airplane bird strikes not rare
0.84007251373 ['after', 'army', 'from', 'that', 'dead', 'have']
0.770565974589 ['have', 'that', 'could', 'with', 'were', 'into']
0.517306588055 ['have', 'president', 'army', 'obama', 'that', 'they']

Mother of Craigslist victim 'devastated' by loss
1.39495259601 ['have', 'that', 'after', 'clinton', 'killed', 'will']
1.31991156275 ['president', 'after', 'will', 'saturday', 'from', 'said']
1.26199955582 ['have', 'that', 'could', 'with', 'were', 'into']

Veterans get apology over 'extremist' flap
1.46053467524 ['saturday', 'could', 'army', 'from', 'obama', 'their']
0.975168526335 ['president', 'army', 'they', 'could', 'that', 'said']
0.938887893405 ['have', 'that', 'could', 'with', 'were', 'into']

Anti-doping agency clears Armstrong for Tour
0.966439231655 ['president', 'from', 'have', 'saturday', 'after', 'friday']
0.942747869615 ['president', 'after', 'will', 'saturday', 'from', 'said']
0.598717581258 ['after', 'army', 'from', 'that', 'dead', 'have']

Trooper reels in line of Ferraris, Lamborghini
0.0 ['obama', 'into', 'could', 'saturday', 'after', 'baghdad']
0.0 ['army', 'saturday', 'president', 'disorder', 'have', 'about']
0.0 ['have', 'from', 'after', 'saturday', 'president', 'dead']

Three factory workers 'shot dead,' mayor says
1.30439198922 ['after', 'army', 'from', 'that', 'dead', 'have']
1.23860082298 ['saturday', 'they', 'were', 'dead', 'said', 'army']
1.0989853721 ['have', 'that', 'could', 'with', 'were', 'into']
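
Each article in the listing above is followed by its three strongest features and their weights. Here is a rough sketch of how such a listing could be produced from a weights matrix. The function name, the toy titles, and the numbers are all hypothetical; this is not the code in newsfeatures.

```python
def top_features_per_article(weights, feature_names, titles, n=3):
    """For each article, return its n highest-weighted (weight, feature) pairs."""
    result = {}
    for title, row in zip(titles, weights):
        ranked = sorted(zip(row, feature_names), reverse=True)
        result[title] = ranked[:n]
    return result

# Hypothetical weights: 2 articles x 4 features (not real output).
toy_titles = ['Article A', 'Article B']
toy_features = ['f0', 'f1', 'f2', 'f3']
toy_weights = [[0.1, 2.5, 0.0, 1.2],
               [3.0, 0.2, 0.7, 0.0]]

top = top_features_per_article(toy_weights, toy_features, toy_titles, n=2)
for title in toy_titles:
    print(title)
    for weight, name in top[title]:
        print('%s %s' % (weight, name))
```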

The last of the output from the second portion looks like this:

>>> import nmf
>>> w,h= nmf.factorize(m1*m2,pc=3,iter=100)
7622.76240347
7622.76240347
7622.76240347
7622.76240347
7622.76240347
7622.76240347
7622.76240347
7622.76240347
7622.76240347
7622.76240347
>>> w*h
matrix([[ 24.37089834,  25.59837689],
        [ 41.82979429,  68.33633783]])
>>> m1*m2
matrix([[22, 28],
        [49, 64]])
>>> v=matrix(wordmatrix)
>>> weights,feat=nmf.factorize(v,pc=20,iter=50)
41566.4618877
41566.4618877
41566.4618877
41566.4618877
41566.4618877
>>> reload(newsfeatures)

>>> topp,pn= newsfeatures.showfeatures(weights,feat,artt,wordvec)
>>> reload(newsfeatures)

>>> newsfeatures.showarticles(artt,topp,pn)
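
The cost printed by nmf.factorize above never changes, which may mean the update step was not taking effect in my copy; I can't tell from the output alone. For comparison, here is a minimal sketch of non-negative matrix factorization using the standard multiplicative update rules. It is my own stand-in, not the nmf module from the exercise; the iterations, seed, and eps parameters are assumptions I added.

```python
import numpy as np

def factorize(v, pc=2, iterations=200, seed=0):
    """Approximate non-negative v as w.dot(h) with multiplicative updates."""
    rng = np.random.RandomState(seed)
    v = np.asarray(v, dtype=float)
    rows, cols = v.shape
    w = rng.rand(rows, pc)   # weights: rows x pc
    h = rng.rand(pc, cols)   # features: pc x cols
    eps = 1e-9               # guards against division by zero
    costs = []
    for _ in range(iterations):
        # Squared-error cost between v and the current reconstruction.
        costs.append(float(np.sum((v - w.dot(h)) ** 2)))
        # Update the feature matrix h, then the weight matrix w.
        h *= w.T.dot(v) / (w.T.dot(w).dot(h) + eps)
        w *= v.dot(h.T) / (w.dot(h).dot(h.T) + eps)
    return w, h, costs

v = np.array([[22, 28], [49, 64]])
w, h, costs = factorize(v, pc=2)
print('%s %s' % (costs[0], costs[-1]))
print(w.dot(h))
```

With these updates the cost should fall from one iteration to the next, so a run that prints the same number every time is worth a second look.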

Here is the output of my stock market file after I toyed with the ticker symbols:

2.18864116109e+18
2.18864116109e+18
2.18864116109e+18
2.18864116109e+18
2.18864116109e+18
[[ 2.37666693e+07 1.34520061e+06 2.62779967e+06 3.11656611e+06
5.67418244e+05 7.01015109e+06 2.92443132e+05 2.52662385e+06
1.11638309e+07 1.01723901e+05 5.19575560e+01 2.24454074e+06
4.68321537e+05 1.39908384e+06 6.70165095e+06 3.67869950e+06
1.36283348e+06]
[ 7.32380312e+06 1.77742373e+06 1.37089810e+05 4.19907883e+06
1.84139062e+06 3.21503878e+06 1.58867082e+06 5.45477728e+05
2.13756184e+07 7.65879459e+04 9.79990523e+04 2.96635135e+06
2.61511881e+06 3.21994135e+06 8.07859461e+06 1.71176781e+07
4.80978463e+06]
[ 7.31448471e+06 1.68610150e+06 2.68834864e+05 4.44721706e+06
8.18098519e+05 7.16104424e+06 2.20732425e+06 1.02730214e+05
2.33750061e+07 1.46506828e+05 9.91507392e+04 2.93492311e+05
2.42360363e+06 2.13632897e+06 1.06336554e+06 1.01049313e+07
4.62805315e+06]
[ 1.16375868e+07 1.51554923e+06 3.09746162e+05 2.40701117e+04
1.39633096e+06 3.24063455e+06 9.18670942e+05 1.69931626e+06
1.09720095e+07 1.38900611e+05 6.57939575e+03 2.50346837e+06
4.94576783e+05 2.62410929e+06 6.73035286e+06 1.67573107e+07
5.55052027e+06]
[ 3.89129521e+04 1.67544745e+06 3.61149806e+06 5.78181352e+05
1.84666719e+06 8.78442435e+06 2.88067850e+05 1.02684638e+06
3.24116653e+07 3.71425867e+04 2.64776382e+04 1.57373825e+06
1.40295786e+06 1.35709632e+06 7.65444330e+06 1.88210697e+07
3.23896403e+06]]
[[ 0.21575133 0.41244662 0.7611537 0.64448275 0.26808093]
[ 0.12021876 0.82897867 0.50603432 0.66638657 0.27620903]
[ 0.81443072 0.61973661 0.0344404 0.59095715 0.68482145]
...,
[ 0.27682508 0.02819674 0.34829965 0.87178094 0.53360722]
[ 0.59688167 0.38735095 0.3671287 0.48936043 0.34759439]
[ 0.21552508 0.52542043 0.44772381 0.17131841 0.57059154]]
Feature 1
(23766669.29600687, 'YHOO')
(11163830.908536306, 'PFE')
(7010151.0916381348, 'CVX')
(6701650.9536335012, 'PG')
(3678699.4954650267, 'XOM')
(3116566.1078192662, 'BP')
(2627799.6681819353, 'BIIB')
(2526623.8459756342, 'D')
(2244540.7368489737, 'DNA')
(1399083.8425632557, 'GOOG')
(1362833.47919346, 'AMGN')
(1345200.6101376782, 'AVP')

[(2.4858386130635974, '2008-05-05'), (2.01164169279535, '2008-06-13'), (1.9833321155276236, '2008-06-12')]

Feature 1
(21375618.441576935, 'PFE')
(17117678.134262577, 'XOM')
(8078594.6123901652, 'PG')
(7323803.1170607973, 'YHOO')
(4809784.6293510608, 'AMGN')
(4199078.8320319587, 'BP')
(3219941.3530922248, 'GOOG')
(3215038.7793839704, 'CVX')
(2966351.3519640886, 'DNA')
(2615118.8114477443, 'EXPE')
(1841390.6153257531, 'CL')
(1777423.7312149594, 'AVP')

[(3.6571803519576398, '2009-01-23'), (2.065513906734485, '2008-10-09'), (1.7966713944385531, '2009-03-06')]

Feature 1
(23375006.123443626, 'PFE')
(10104931.329456437, 'XOM')
(7314484.7133820131, 'YHOO')
(7161044.2435260415, 'CVX')
(4628053.1515141223, 'AMGN')
(4447217.0644083731, 'BP')
(2423603.6282038381, 'EXPE')
(2207324.2490521944, 'LMT')
(2136328.9675750155, 'GOOG')
(1686101.4959016384, 'AVP')
(1063365.5357479909, 'PG')
(818098.51854017749, 'CL')

[(1.7003112043653801, '2009-02-26'), (1.6315850517851636, '2008-10-09'), (1.3317713851210469, '2009-03-19')]

Feature 1
(16757310.711177969, 'XOM')
(11637586.813257042, 'YHOO')
(10972009.485707108, 'PFE')
(6730352.8644809993, 'PG')
(5550520.2650987795, 'AMGN')
(3240634.5484751821, 'CVX')
(2624109.2945769709, 'GOOG')
(2503468.3707587714, 'DNA')
(1699316.2628020733, 'D')
(1515549.228245277, 'AVP')
(1396330.9604749633, 'CL')
(918670.94171835261, 'LMT')

[(2.4853251616509144, '2008-10-06'), (2.0616984621300793, '2008-05-05'), (2.0257449468565722, '2008-11-11')]

Feature 1
(32411665.250238225, 'PFE')
(18821069.746516082, 'XOM')
(8784424.3454179335, 'CVX')
(7654443.2956001945, 'PG')
(3611498.059203208, 'BIIB')
(3238964.0300923302, 'AMGN')
(1846667.188258782, 'CL')
(1675447.4463336298, 'AVP')
(1573738.2528519328, 'DNA')
(1402957.8566456696, 'EXPE')
(1357096.3244690241, 'GOOG')
(1026846.377866357, 'D')

[(1.9466486368672224, '2009-03-05'), (1.7736154551482939, '2009-03-09'), (1.668865393095329, '2008-11-20')]
