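Collections: DataFrame and Series (pandas)
DataFrame: a pandas table structure, used here to turn records parsed from JSON strings into tabular form.
If frame is a DataFrame, you can think of it as a table.
frame['column name'] returns a single column of data, which is called a Series.
series.value_counts() returns how many times each value occurs in that Series.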
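As a minimal sketch of the flow described above, the following builds a small DataFrame from JSON strings, selects one column as a Series, and counts its values. The sample records in `lines` are hypothetical and only reuse the `tz` and `c` field names from this note; they are not the original data.

```python
import json

import pandas as pd

# Hypothetical sample records, one JSON string per record.
lines = [
    '{"tz": "America/New_York", "c": "US"}',
    '{"tz": "America/Denver", "c": "US"}',
    '{"tz": "America/New_York", "c": "US"}',
]

# Parse each JSON string into a dict, then build the table from the dicts.
records = [json.loads(line) for line in lines]
frame = pd.DataFrame(records)

# Selecting a single column returns a Series.
tz = frame["tz"]

# Count how many times each value occurs.
counts = tz.value_counts()
print(counts)
```

Each key in the parsed dicts becomes a column, and each record becomes a row.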
In [64]: frame
Out[64]:
                                                   a              al   c  \
0  Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi...  en-US,en;q=0.8  US
1                             GoogleMaps/RochesterNY             NaN  US
2  Mozilla/4.0 (compatible; MSIE 8.0; Windows NT ...           en-US  US
3  Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8)...           pt-br  BR
4  Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi...  en-US,en;q=0.8  US
           cy       g  gr       h          hc         hh         l  \
0     Danvers  A6qOVH  MA  wfLQtf  1331822918  1.usa.gov   orofrog
1       Provo  mwszkS  UT  mwszkS  1308262393       j.mp     bitly
2  Washington  xxr3Qb  DC  xxr3Qb  1331919941  1.usa.gov     bitly
3        Braz  zCaLwp  27  zUtuOu  1331923068  1.usa.gov  alelex88
4  Shrewsbury  9b6kNl  MA  9b6kNl  1273672411     bit.ly     bitly
                         ll  nk  \
0   [42.576698, -70.954903]   1
1  [40.218102, -111.613297]   0
2     [38.9007, -77.043098]   1
3  [-23.549999, -46.616699]   0
4   [42.286499, -71.714699]   0
                                                   r           t  \
0  http://www.facebook.com/l/7AQEFzjSi/1.usa.gov/...  1331923247
1                           http://www.AwareMap.com/  1331923249
2                               http://t.co/03elZC4Q  1331923250
3                                             direct  1331923249
4                http://www.shrewsbury-ma.gov/selco/  1331923251
                  tz                                                  u
0   America/New_York        http://www.ncbi.nlm.nih.gov/pubmed/22415991
1     America/Denver        http://www.monroecounty.gov/etc/911/rss.php
2   America/New_York  http://boxer.senate.gov/en/press/releases/0316...
3  America/Sao_Paulo            http://apod.nasa.gov/apod/ap120312.html
4   America/New_York  http://www.shrewsbury-ma.gov/egov/gallery/1341...
In [65]: frame['tz']
Out[65]:
0     America/New_York
1       America/Denver
2     America/New_York
3    America/Sao_Paulo
4     America/New_York
Name: tz, dtype: object
In [66]: frame['tz'].value_counts()
Out[66]:
America/New_York     3
America/Sao_Paulo    1
America/Denver       1
Name: tz, dtype: int64
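One detail worth knowing before cleaning the data: by default value_counts() leaves missing (NaN) entries out of the result, while empty strings are counted like any other value. The sketch below illustrates this on a small hypothetical Series (the values are made up, not taken from the frame above).

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing value and one empty string.
tz = pd.Series(["America/New_York", np.nan, "", "America/New_York"], name="tz")

# Default behaviour: NaN is excluded, the empty string is counted.
default_counts = tz.value_counts()

# dropna=False also reports how many entries are missing.
counts_with_nan = tz.value_counts(dropna=False)

print(default_counts)
print(counts_with_nan)
```

This is why missing values are usually replaced with a visible label before counting.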
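Two ways to fill in unknown values: fillna() for missing (NaN) entries, and boolean-mask assignment for empty strings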
clean_tz = frame['tz'].fillna("Missing")
clean_tz[clean_tz == ''] = "unknown"
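The two steps can be tried end to end on a small table. This is a sketch under assumptions: the `frame` built here is a hypothetical stand-in with one NaN and one empty string, and the `replace` variant at the end is an alternative spelling, not something taken from the original note.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the frame used in this note.
frame = pd.DataFrame({"tz": ["America/New_York", np.nan, "", "America/Denver"]})

# Step 1: fillna returns a new Series with NaN replaced by a label.
clean_tz = frame["tz"].fillna("Missing")

# Step 2: a boolean mask selects the empty strings and overwrites them.
clean_tz[clean_tz == ""] = "unknown"

# Alternative (assumption, not from the note): chain replace instead of masking.
alt_tz = frame["tz"].fillna("Missing").replace("", "unknown")

print(clean_tz.value_counts())
```

Because fillna returns a new Series, the assignment in step 2 changes clean_tz only and leaves frame['tz'] as it was.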