Datasets


Big data

https://delicious.com/pskomoroch/dataset 

http://stackoverflow.com/questions/10843892/download-large-data-for-hadoop

http://konect.uni-koblenz.de/

Sogou Labs

http://www.sogou.com/labs/resources.html?v=1

Meteorological datasets

https://www.ncdc.noaa.gov/data-access/quick-links

Climate monitoring datasets

http://cdiac.ornl.gov/ftp/ndp026b

Machine learning

Amazon Web Services datasets: http://aws.amazon.com/datasets
Airline data (2009 ASA challenge): http://stat-computing.org/dataexpo/2009/the-data.html
Australian weather: http://www.bom.gov.au/climate/dwo/
Causality Workbench: http://www.causality.inf.ethz.ch/repository.php
Kaggle competition data: https://www.kaggle.com/datasets
KDnuggets competition site: www.kdnuggets.com/datasets/
Machine learning dataset repository: http://mldata.org/
Medicare data files: http://go.cms.gov/19xxPN4
Microsoft Research: http://research.microsoft.com/apps/dp/dl/downloads.aspx
Million Song Dataset: http://blog.echonest.com/post/3639160982/million-song-dataset
Additional song datasets: http://labrosa.ee.columbia.edu/millionsong/pages/additional-datasets
RDataMining.com, data for the R and Data Mining e-book: http://www.rdatamining.com/data
Revolution Analytics collection: http://www.revolutionanalytics.com/subscriptions/datasets/
Social networks: http://www.cs.cmu.edu/~jelsas/data/ancestry.com/
UCI Machine Learning Repository: http://archive.ics.uci.edu/ml/
53.5 billion clicks: http://cnets.indiana.edu/groups/nan/webtraffic/click-dataset

http://archive.ics.uci.edu/ml/
http://www.ics.uci.edu/~mlearn//MLRepository.htm
Machine learning sample databases
http://kdd.ics.uci.edu/
http://www.ics.uci.edu/~mlearn/MLRepository.html
Data mining site for investment funds
http://www.gotofund.com/index.asp
Links to data generators
http://www.cse.cuhk.edu.hk/~kdd/data_collection.html
Cancer genes:
http://www.broad.mit.edu/cgi-bin/cancer/datasets.cgi
Financial data:
http://lisp.vse.cz/pkdd99/Challenge/chall.htm

Web and network data

Stanford Large Network Dataset Collection: http://snap.stanford.edu/data/

Microsoft anonymous web data

MSNBC anonymous web data

Syskill and Webert web data

Images

1. ImageNet
http://www.image-net.org/
Contains 14 million images.
2. Tiny Images Dataset
http://horatio.cs.nyu.edu/mit/tiny/data/index.html
Contains 80 million 32x32 images.
3. MirFlickr1M
http://press.liacs.nl/mirflickr/
A collection of 1 million images from Flickr.
4. CoPhIR
http://cophir.isti.cnr.it/whatis.html
106 million images from Flickr.
5. SBU captioned photo dataset
http://dsl1.cewit.stonybrook.edu/~vicente/sbucaptions/
A collection of 1 million images from Flickr.
6. Large-Scale Image Annotation using Visual Synset (ICCV 2011)
http://cpl.cc.gatech.edu/projects/VisualSynset/
Contains 200 million images.
7. NUS-WIDE
http://lms.comp.nus.edu.sg/research/NUS-WIDE.htm
A collection of 270,000 images from Flickr.
8. SUN dataset
http://people.csail.mit.edu/jxiao/SUN/
Contains 130,000 images.
9. MSRA-MM
http://research.microsoft.com/en-us/projects/msrammdata/
Contains 1 million images and 23,000 videos.
10. TRECVID
http://trecvid.nist.gov/

Carnegie Mellon face images

Volcanoes on Venus
7.3G stackoverflow.com-Posts.7z 
573.1K stackoverflow.com-Tags.7z 
153.0M stackoverflow.com-Users.7z 
2.2G stackoverflow.com-Comments.7z 
2014/07/07: Yahoo releases a very large Flickr dataset of 100 million images and videos
http://yahoolabs.tumblr.com/post/89783581601/one-hundred-million-creative-commons-flickr-images-for 
More than 100 interesting datasets
http://www.csdn.net/article/2014-06-06/2820111-100-Interesting-Data-Sets-for-Statistics

Image processing: personal homepages, research groups, and public dataset URLs

http://blog.sciencenet.cn/blog-673472-759786.html

Public Domain Collections

Data360: http://www.data360.org/index.aspx
Datamob.org: http://datamob.org/datasets
Factual: http://www.factual.com/topics/browse
Freebase: http://www.freebase.com/
Google: http://www.google.com/publicdata/directory
infochimps: http://www.infochimps.com/
numbrary: http://numbrary.com/
Quora: http://www.quora.com/Data/Where-can-I-find-large-datasets-open-to-the-pu...
RS Collection 100+ : http://rs.io/2014/05/29/list-of-data-sets.html
Sample R data sets: http://stat.ethz.ch/R-manual/R-patched/library/datasets/html/00Index.html (R)
SourceForge research data: http://www.nd.edu/ oss /数据/研究司
StatSci.org: http://www.statsci.org/datasets.html
UFO reports: http://www.nuforc.org/webreports.html
WikiLeaks 9/11 pager intercepts: http://911.wikileaks.org/files/index.html
Stats4Stem.org, R datasets: http://www.stats4stem.org/data-sets.html (R)
Washington Post list: http://www.washingtonpost.com/wp-srv/metro/data/datapost.html

Science

Agricultural experiments: http://www.inside-r.org/packages/cran/agridat/docs/agridat (R)
Climate data: http://www.cru.uea.ac.uk/cru/data/temperature/#datter
and ftp://ftp.cmdl.noaa.gov/
Gene Expression Omnibus: http://www.ncbi.nlm.nih.gov/geo/
Geo Spatial Data: http://geodacenter.asu.edu/datalist/
Human Microbiome Project: http://www.hmpdacc.org/reference_genomes/reference_genomes.php
MIT Cancer Genomics Data: http://www.broadinstitute.org/cgi-bin/cancer/datasets.cgi
NASA: http://nssdc.gsfc.nasa.gov/nssdc/obtaining_data.html
NIH Microarray data: ftp://ftp.ncbi.nih.gov/pub/geo/DATA/supplementary/series/GSE6532/ (R)
Protein structure: http://www.infobiotic.net/PSPbenchmarks/
Public Gene Data: http://www.pubgene.org/
Stanford microarray data: http://smd.stanford.edu//

Social science

General Social Survey: http://www3.norc.org/GSS+Website/
ICPSR: http://www.icpsr.umich.edu/icpsrweb/ICPSR/access/index.jsp
Pew Research: http://www.pewinternet.org/datasets/pages/2/
SNAP: http://snap.stanford.edu/data/index.html
UCLA Social Sciences Data Archive: http://dataarchives.ss.ucla.edu/Home.DataPortals.htm
Upjohn Institute: http://www.upjohn.org/erdc/erdc.html

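Time series

Time Series Data Library: http://robjhyndman.com/TSDL/

Australian Sign Language data

High-quality Australian Sign Language data

EEG data

Japanese vowels

Pioneer-1 mobile robot data

Pseudo-periodic synthetic time series

Robot execution failures

Synthetic control chart time series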

http://www.stat.wisc.edu/~reinsel/bjr-data/

Universities

Carnegie Mellon University Enron email: http://www.cs.cmu.edu/~enron/
Carnegie Mellon University StatLib: http://lib.stat.cmu.edu/datasets/
KEEL repository: http://sci2s.ugr.es/keel/datasets.php
Carnegie Mellon University JASA data archive: http://lib.stat.cmu.edu/jasadata/
Ohio State University financial data: http://fisher.osu.edu/fin/osudata.htm
UC Berkeley: http://ucdata.berkeley.edu/
UCLA: http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data
UC Riverside time series: http://www.cs.ucr.edu/ / time_series_data /
University of Toronto: http://www.cs.toronto.edu/~delve/data/datasets.html

UCI Knowledge Discovery in Databases (KDD) Archive
Information and Computer Science
University of California, Irvine

Internet-related datasets

1. Dataset for "Statistics and Social Network of YouTube Videos"
http://netsg.cs.sfu.ca/youtubedata/
2. 1998 World Cup Web Site Access Logs
http://ita.ee.lbl.gov/html/contrib/WorldCup.html
This dataset covers the 1998 World Cup. Over the 92 days from 1998/04/26 to 1998/07/26, the site received 1,352,804,107 requests.
3. Page view statistics for Wikimedia projects
http://dammit.lt/wikistats/
4. AOL Search Query Logs - RP
http://www.researchpipeline.com/mediawiki/index.php?title=AOL_Search_Query_Logs
5. livedoor gourmet
http://blog.livedoor.jp/techblog/archives/65836960.html
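
To get a feel for the scale of the 1998 World Cup access logs listed above, the quoted totals (1,352,804,107 requests between 1998/04/26 and 1998/07/26) can be turned into an average load. The following is only a rough sketch: the function name `average_request_rate` is made up for illustration, and the calculation assumes that both end dates are counted and that traffic is averaged evenly, which real logs would not show.

```python
from datetime import date

def average_request_rate(total_requests, first_day, last_day):
    """Return (days, requests per day, requests per second) for an inclusive date range."""
    days = (last_day - first_day).days + 1  # +1 so both end dates are counted
    per_day = total_requests / days
    per_second = per_day / 86_400  # seconds in one day
    return days, per_day, per_second

# Figures quoted in the World Cup list entry; the averaging itself is an assumption.
days, per_day, per_second = average_request_rate(
    1_352_804_107, date(1998, 4, 26), date(1998, 7, 26)
)
print(f"{days} days, {per_day:,.0f} requests/day, {per_second:.1f} requests/s")
```

The inclusive day count matches the 92 days stated in the entry, and the average works out to roughly 14.7 million requests per day, or about 170 per second. Peak load during matches was presumably far higher than this average.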

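Discrete sequence data

Multivariate data

Relational data

Spatio-temporal data

Text

20 Newsgroups data

Reuters-21578 text categorization collection

Reuters transcribed subset

NSF research award abstracts, 1990-2003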

http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html
http://www.w3.org/TR/WD-logfile-960221.html
http://www.w3.org/Daemon/User/Config/Logging.html#AccessLog
http://www.w3.org/1998/11/05/WC-workshop/Papers/bala2.html
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/wwkb/
http://www.web-caching.com/traces-logs.html
http://www-2.cs.cmu.edu/webkb
http://www.cs.auc.dk/research/DP/tdb/TimeCenter/TimeCenterPublications/TR-75.pdf
http://www.cs.cornell.edu/projects/kddcup/index.html

Recommended dataset sources (websites, blogs)
http://kdd.ics.uci.edu/summary.data.type.html
http://www.mlnet.org/cgi-bin/mlnetois.pl/?File=datasets.html
http://lib.stat.cmu.edu/datasets/
http://fimi.cs.helsinki.fi/data/

1. Public Data Sets on Amazon Web Services (AWS)
http://aws.amazon.com/datasets
Since 2008, Amazon has offered developers tens of terabytes of data for development.

2. Yahoo! Webscope
http://webscope.sandbox.yahoo.com/index.php
3. Konect is a collection of network datasets
http://konect.uni-koblenz.de/
4. Stanford Large Network Dataset Collection
http://snap.stanford.edu/data/index.html
