Loading methods

Methods for listing and loading datasets:

Datasets

[[autodoc]] datasets.load_dataset

[[autodoc]] datasets.load_from_disk

[[autodoc]] datasets.load_dataset_builder

[[autodoc]] datasets.get_dataset_config_names

[[autodoc]] datasets.get_dataset_infos

[[autodoc]] datasets.get_dataset_split_names

From files

The configurations below control how data files are loaded. They apply whether you load local files or a dataset repository:

  • local files: load_dataset("parquet", data_dir="path/to/data/dir")
  • dataset repository: load_dataset("allenai/c4")

You can pass arguments to load_dataset to configure data loading. For example, you can specify the sep parameter to set the separator in the [~datasets.packaged_modules.csv.CsvConfig] that is used to load the data:

load_dataset("csv", data_dir="path/to/data/dir", sep="\t")

Text

[[autodoc]] datasets.packaged_modules.text.TextConfig

[[autodoc]] datasets.packaged_modules.text.Text

CSV

[[autodoc]] datasets.packaged_modules.csv.CsvConfig

[[autodoc]] datasets.packaged_modules.csv.Csv

JSON

[[autodoc]] datasets.packaged_modules.json.JsonConfig

[[autodoc]] datasets.packaged_modules.json.Json

XML

[[autodoc]] datasets.packaged_modules.xml.XmlConfig

[[autodoc]] datasets.packaged_modules.xml.Xml

Parquet

[[autodoc]] datasets.packaged_modules.parquet.ParquetConfig

[[autodoc]] datasets.packaged_modules.parquet.Parquet

Arrow

[[autodoc]] datasets.packaged_modules.arrow.ArrowConfig

[[autodoc]] datasets.packaged_modules.arrow.Arrow

SQL

[[autodoc]] datasets.packaged_modules.sql.SqlConfig

[[autodoc]] datasets.packaged_modules.sql.Sql

Images

[[autodoc]] datasets.packaged_modules.imagefolder.ImageFolderConfig

[[autodoc]] datasets.packaged_modules.imagefolder.ImageFolder

Audio

[[autodoc]] datasets.packaged_modules.audiofolder.AudioFolderConfig

[[autodoc]] datasets.packaged_modules.audiofolder.AudioFolder

Videos

[[autodoc]] datasets.packaged_modules.videofolder.VideoFolderConfig

[[autodoc]] datasets.packaged_modules.videofolder.VideoFolder

PDF

[[autodoc]] datasets.packaged_modules.pdffolder.PdfFolderConfig

[[autodoc]] datasets.packaged_modules.pdffolder.PdfFolder

WebDataset

[[autodoc]] datasets.packaged_modules.webdataset.WebDataset