Published August 25, 2021 | Version v1
Dataset Open

OHSUMED

Description

The OHSUMED collection contains medical documents collected in 1991 and related to 23 cardiovascular disease categories. The version we used has 18,302 documents, distributed very unevenly across the categories: category sizes range from 56 to 2,876 documents.

The files:
texts.txt: The document set as plain text, one document per line.
score.txt: The class of each document, aligned by index with texts.txt.
split_<k>.pkl: A pandas DataFrame holding the k-fold cross-validation partition.
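The file layout above can be read with a few lines of Python. This is a minimal sketch, not the authors' loading code. It assumes UTF-8 encoding and that score.txt holds one label per line in the same order as texts.txt; the function names are hypothetical. The column layout of the split DataFrame is not described here, so the sketch only loads it for inspection.

```python
from collections import Counter


def read_lines(path):
    """Return the lines of a text file without their line terminators."""
    # newline="\n" splits on "\n" only, so unusual separator characters
    # inside a document cannot shift the line alignment.
    with open(path, encoding="utf-8", newline="\n") as handle:  # UTF-8 is an assumption
        return [line.rstrip("\r\n") for line in handle]


def load_corpus(texts_path, labels_path):
    """Load documents and labels, assuming line i of each file belongs together."""
    texts = read_lines(texts_path)
    labels = read_lines(labels_path)
    if len(texts) != len(labels):
        raise ValueError(
            f"{len(texts)} documents but {len(labels)} labels; files are not aligned"
        )
    return texts, labels


def class_sizes(labels):
    """Count documents per class, e.g. to inspect the class imbalance."""
    return Counter(labels)


def load_split(split_path):
    """Load one split_<k>.pkl file as a pandas DataFrame (requires pandas)."""
    import pandas as pd  # imported lazily so the text loaders work without pandas

    # Unpickling can execute arbitrary code; only load files from a trusted source.
    return pd.read_pickle(split_path)
```

Calling `class_sizes(labels)` on the full label list is a quick way to check the reported spread of category sizes before training.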

Files

ohsumed.zip

Files (177.3 MB)

md5:2d839d39886c4f8064ea4ad9bdf0d743 153.0 MB
md5:b2a9c4958e140aa0a226415ad42cfbe7 47.7 kB
md5:461fe78886904dde1f50f8fedaa820d8 547.9 kB
md5:f886f6ea270ad2544f5dc0fa336c9ca5 274.4 kB
md5:8090204908da6930f96a13959ea12346 23.5 MB
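
The MD5 checksums listed above can be used to confirm that a download is intact. The following is a minimal sketch using Python's standard `hashlib` module; the function names are hypothetical, and since the listing does not say which file each checksum belongs to, the expected value has to be matched to the file by the reader.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are never held fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 digest with a value such as 'md5:<hex digest>'."""
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of(path) == expected.strip().lower()
```

For example, `matches_checksum(path, "md5:2d839d39886c4f8064ea4ad9bdf0d743")` returns `True` only if `path` points to the file that checksum was published for and the file is unmodified.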