Wikipedia

This project analyzes Wikipedia dump data with Hadoop to find which titles users are most interested in.

The Hadoop result is output.zip (4GB), whose format is <key : title + "\t" + year + month, value : count>.
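As a rough illustration of that record format, the sketch below parses result lines and ranks titles for one month. It is not taken from this repository: the function names `parse_record` and `top_titles` are hypothetical, and it assumes that key and value are separated by a tab (the default for Hadoop text output) and that year + month is written as YYYYMM, as in the 201103 example argument.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def parse_record(line: str) -> Optional[Tuple[str, str, int]]:
    """Split one result line into (title, yearmonth, count).

    Assumed layout: title <TAB> yearmonth <TAB> count.
    Returns None for lines that do not match that layout.
    """
    # Split from the right so only the last two tabs act as separators.
    parts = line.rstrip("\n").rsplit("\t", 2)
    if len(parts) != 3 or not parts[2].isdigit():
        return None
    title, yearmonth, count = parts
    return title, yearmonth, int(count)


def top_titles(lines: Iterable[str], yearmonth: str, n: int = 10) -> List[Tuple[str, int]]:
    """Return the n titles with the highest counts for the given month."""
    counts: Counter = Counter()
    for line in lines:
        record = parse_record(line)
        if record is not None and record[1] == yearmonth:
            counts[record[0]] += record[2]
    return counts.most_common(n)
```

Because `top_titles` accepts any iterable of lines, it could be fed an open file handle over the unzipped result so the 4GB file is read line by line rather than loaded into memory at once.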

Clone this repository

Run the Hadoop job in WikiProject.java on the 10GB dump data

Move the Hadoop result file (the 4GB one) to /wordcloud

Run wordcloud (e.g. 'python ./wordcloud/simple.py 201103')
