AG's News Topic Classification Dataset

Version 3, Updated 09/09/2015


ORIGIN

AG is a collection of more than 1 million news articles. The articles were gathered from more than 2000 news sources by ComeToMyHead over more than 1 year of activity. ComeToMyHead is an academic news search engine which has been running since July 2004. The dataset is provided by the academic community for research purposes in data mining (clustering, classification, etc.), information retrieval (ranking, search, etc.), XML, data compression, data streaming, and any other non-commercial activity. For more information, please refer to the link http://www.di.unipi.it/~gulli/AG_corpus_of_news_articles.html .

The AG's news topic classification dataset was constructed by Xiang Zhang from the dataset above. It is used as a text classification benchmark in the following paper: Xiang Zhang, Junbo Zhao, Yann LeCun. Character-level Convolutional Networks for Text Classification. Advances in Neural Information Processing Systems 28 (NIPS 2015).


DESCRIPTION

The AG's news topic classification dataset is constructed by choosing the 4 largest classes from the original corpus. Each class contains 30,000 training samples and 1,900 testing samples. The total number of training samples is 120,000 and the total number of testing samples is 7,600.

The file classes.txt contains a list of classes corresponding to each label.

The files train.csv and test.csv contain the training and testing samples, respectively, as comma-separated values. Each file has 3 columns: class index (1 to 4), title, and description. The title and description are enclosed in double quotes ("), and any internal double quote is escaped by 2 double quotes (""). New lines are escaped by a backslash followed by an "n" character, that is "\n".
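
The quoting and newline conventions described above can be handled with a standard CSV reader. The following is a minimal sketch in Python, not part of the original distribution: the function name parse_ag_news and the sample row are hypothetical, and it assumes every record has exactly 3 fields in the order class index, title, description.

```python
import csv
import io


def parse_ag_news(stream):
    """Yield (class_index, title, description) tuples from an AG News CSV stream.

    Hypothetical helper for illustration. The csv module already turns
    doubled quotes ("") inside a quoted field back into a single quote,
    so only the backslash-n escape needs to be undone by hand.
    """
    for row in csv.reader(stream):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        label, title, description = row
        yield (
            int(label),
            title.replace("\\n", "\n"),
            description.replace("\\n", "\n"),
        )


# Made-up sample row that exercises both escaping rules.
sample = io.StringIO(
    '"3","Example ""quoted"" title","First line\\nSecond line"\n'
)
rows = list(parse_ag_news(sample))
```

To read the real files, the same function could be given an open file handle, for example open("train.csv", newline="", encoding="utf-8"). The UTF-8 encoding is an assumption here, since the README does not state the file encoding.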





















