torch_frame.datasets

Real-World Datasets

Titanic

The Titanic dataset from the Titanic Kaggle competition.
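As an illustration of how the real-world datasets in this module are typically loaded, the following is a minimal sketch using Titanic. The helper name `load_titanic` and the path in the usage comment are hypothetical. The sketch assumes the dataset class accepts a `root` directory for the downloaded file, and that `materialize()` must be called before the `tensor_frame` attribute is available.

```python
def load_titanic(root: str):
    """Download (if needed) and materialize the Titanic dataset under `root`."""
    # Imported lazily so that defining this helper does not require torch_frame.
    from torch_frame.datasets import Titanic

    dataset = Titanic(root=root)  # assumed to fetch the raw file on first use
    dataset.materialize()         # builds the TensorFrame and column statistics
    return dataset


# Usage (needs network access on the first run; the path is hypothetical):
#   dataset = load_titanic("/tmp/titanic")
#   print(dataset.tensor_frame)
```

Other single-table datasets listed here are expected to follow the same pattern, although some (such as the benchmark collections) take additional arguments to select a specific dataset.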

AdultCensusIncome

The Adult Census Income dataset from Kaggle.

ForestCoverType

The Forest Cover Type dataset from Kaggle.

Dota2

The Dota2 Game Results dataset.

Mushroom

The Mushroom classification dataset from the Kaggle competition.

PokerHand

The Poker Hand dataset.

BankMarketing

The Bank Marketing dataset.

TabularBenchmark

A collection of tabular benchmark datasets introduced in "Why do tree-based models still outperform deep learning on tabular data?".

Yandex

The Yandex dataset collection used in "Revisiting Deep Learning Models for Tabular Data".

KDDCensusIncome

The KDD Census Income dataset.

MultimodalTextBenchmark

Benchmark datasets of tabular data with text columns, used in "Benchmarking Multimodal AutoML for Tabular Data with Text Fields".

DataFrameBenchmark

A collection of standardized datasets for tabular learning, covering categorical and numerical features.

DataFrameTextBenchmark

A collection of datasets for tabular learning with text columns, covering categorical, numerical, multi-categorical, and timestamp features.

Mercari

The Mercari Price Suggestion Challenge dataset from Kaggle.

AmazonFineFoodReviews

The Amazon Fine Food Reviews dataset.

DiamondImages

The Diamond Images dataset from Kaggle.

Synthetic Datasets

FakeDataset

A fake dataset for testing purposes.
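Because FakeDataset generates its rows locally, it is convenient for unit tests that should not touch the network. The sketch below is a minimal example under stated assumptions: the helper name `build_fake_dataset` is hypothetical, and the `num_rows` and `stypes` arguments reflect our understanding of the constructor rather than a verified signature.

```python
def build_fake_dataset(num_rows: int = 16):
    """Create and materialize a small synthetic dataset for tests."""
    # Imported lazily so that defining this helper does not require torch_frame.
    from torch_frame import stype
    from torch_frame.datasets import FakeDataset

    dataset = FakeDataset(
        num_rows=num_rows,
        # Assumed argument: which semantic types to generate columns for.
        stypes=[stype.categorical, stype.numerical],
    )
    dataset.materialize()  # builds the TensorFrame from the generated rows
    return dataset


# Usage:
#   dataset = build_fake_dataset(8)
#   print(dataset.tensor_frame)
```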

Other Datasets

HuggingFaceDatasetDict

Loads a Hugging Face datasets.DatasetDict into a torch_frame.data.Dataset with pre-defined split information.