
sungsu7437/Wikipedia


Wikipedia

This project analyzes Wikipedia dump data with Hadoop to find which article titles users are most interested in.
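WikiProject.java is not reproduced in this README. The Python sketch below is only a hypothetical illustration of the kind of counting described here. It simulates the map, shuffle and reduce phases in memory. It assumes each input record has already been parsed into (title, year_month, views), because the real layout of input.txt is not documented. The function names are made up for the example. The key follows the output layout this README gives: title + "\t" + year + month.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_record(title: str, year_month: str, views: int) -> Iterator[Tuple[str, int]]:
    # Hypothetical mapper: assumes the record is already parsed into
    # (title, year_month, views). Emits key = title + "\t" + year_month.
    yield title + "\t" + year_month, views


def reduce_counts(key: str, values: Iterable[int]) -> Tuple[str, int]:
    # Reducer: sum every partial count that arrived for the same key.
    return key, sum(values)


def run_local(records: Iterable[Tuple[str, str, int]]) -> Dict[str, int]:
    # Shuffle phase, simulated in memory: group all mapper output by key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for title, year_month, views in records:
        for key, value in map_record(title, year_month, views):
            groups[key].append(value)
    # Reduce each group independently, as Hadoop would per key.
    return dict(reduce_counts(key, values) for key, values in groups.items())
```

For example, `run_local([("Seoul", "201103", 3), ("Seoul", "201103", 4), ("Seoul", "201104", 1)])` yields one total per title and month. Summation is associative and commutative, so the same reduce function could also act as a combiner in a real Hadoop job. This README does not say whether WikiProject.java does that.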

The input data, input.txt (10GB), is available here: https://drive.google.com/open?id=0B0k1asiqBohoTEd4RDlqLWRLNzg

The job's output, output.zip (4GB), has the format <key: title + "\t" + year + month, value: count>. It is available here:

https://drive.google.com/open?id=0B0k1asiqBohoaE1wb3pVZTFCRFE
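The minimal Python sketch below shows how one line of that output could be read back. It assumes the job writes with Hadoop's default TextOutputFormat, which separates key and value with a tab. Because the key itself is title + "\t" + year + month, each line would then have three tab-separated fields. The sketch also assumes year + month is a six-digit string such as "201103". The record type and the sample line are hypothetical.

```python
from typing import NamedTuple


class TitleMonthCount(NamedTuple):
    """One output record: (title, year + month, count)."""
    title: str
    year_month: str  # assumed six digits, e.g. "201103"
    count: int


def parse_output_line(line: str) -> TitleMonthCount:
    # Assumed layout: title<TAB>yyyymm<TAB>count (key<TAB>value, with the
    # key itself containing one tab). rsplit from the right keeps the
    # title intact even if it contained a tab.
    title, year_month, count = line.rstrip("\n").rsplit("\t", 2)
    return TitleMonthCount(title, year_month, int(count))


if __name__ == "__main__":
    # Hypothetical sample line, not taken from the real output.
    print(parse_output_line("Main_Page\t201103\t12345\n"))
```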

1. Clone this repository.

2. Run the Hadoop job WikiProject.java on the 10GB dump data.

3. Move the Hadoop result file (the 4GB one) to /wordcloud.

4. Run the word cloud script with a year + month argument, e.g. 'python ./wordcloud/simple.py 201103'.
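simple.py is not shown here. The Python sketch below is a hypothetical illustration of how a year + month argument such as 201103 could select one month from the Hadoop result and total the counts per title. It assumes each line has the form title<TAB>yyyymm<TAB>count. The script name and the result file name in the usage comment are made up.

```python
import sys
from collections import Counter
from typing import Iterable


def month_frequencies(lines: Iterable[str], year_month: str) -> Counter:
    """Total count per title for one year + month, e.g. "201103"."""
    freqs: Counter = Counter()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines, e.g. a trailing newline
        # Assumed layout: title<TAB>yyyymm<TAB>count.
        title, ym, count = line.rstrip("\n").rsplit("\t", 2)
        if ym == year_month:
            freqs[title] += int(count)
    return freqs


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python month_top.py <result-file> 201103
    with open(sys.argv[1], encoding="utf-8") as f:
        for title, n in month_frequencies(f, sys.argv[2]).most_common(20):
            print(n, title)
```

The resulting Counter is a plain title-to-count mapping. That is the shape accepted by `WordCloud.generate_from_frequencies` in the third-party wordcloud package. This README does not say whether simple.py uses that package.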
