torch_frame.datasets.TabularBenchmark

class TabularBenchmark(root: str, name: str)

Bases: Dataset

A collection of tabular benchmark datasets introduced in the paper “Why do tree-based models still outperform deep learning on tabular data?”.
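A minimal loading sketch follows, assuming PyTorch Frame is installed and that `name` is one of the dataset names listed in the statistics for this class. The helper names `load_tabular_benchmark` and `summarize`, the default `root` of `"data"`, and the per-dataset sub-directory layout are illustrative choices, not part of the library.

```python
import os.path as osp


def load_tabular_benchmark(name: str, root: str = "data"):
    """Load one benchmark dataset by name and materialize it.

    Hypothetical helper: the torch_frame import is deferred so that this
    module can be imported even when PyTorch Frame is not installed.
    """
    from torch_frame.datasets import TabularBenchmark

    # Storing each dataset in its own sub-directory of `root` is our
    # assumption here, not a requirement of the class.
    dataset = TabularBenchmark(root=osp.join(root, name), name=name)
    # materialize() builds the TensorFrame representation of the data.
    dataset.materialize()
    return dataset


def summarize(dataset) -> dict:
    """Return the task type and row count of a loaded dataset."""
    return {"task_type": dataset.task_type, "num_rows": len(dataset)}
```

Calling `summarize(load_tabular_benchmark("albert"))` would download the data on first use and report the task type and number of rows; the materialized features are then available as `dataset.tensor_frame`.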

STATS:

===================================  =========  =================  ===================  ========  =====================  ===================
Name                                 #rows      #cols (numerical)  #cols (categorical)  #classes  Task                   Missing value ratio
===================================  =========  =================  ===================  ========  =====================  ===================
albert                               58,252     23                 8                    2         binary_classification  0.0%
compas-two-years                     4,966      2                  9                    2         binary_classification  0.0%
covertype                            423,680    10                 44                   2         binary_classification  0.0%
default-of-credit-card-clients       13,272     20                 1                    2         binary_classification  0.0%
electricity                          38,474     7                  1                    2         binary_classification  0.0%
eye_movements                        7,608      18                 5                    2         binary_classification  0.0%
road-safety                          111,762    24                 8                    2         binary_classification  0.0%
Bioresponse                          3,434      419                0                    2         binary_classification  0.0%
Diabetes130US                        71,090     7                  0                    2         binary_classification  0.0%
Higgs                                940,160    24                 0                    2         binary_classification  0.0%
MagicTelescope                       13,376     10                 0                    2         binary_classification  0.0%
MiniBooNE                            72,998     50                 0                    2         binary_classification  0.0%
bank-marketing                       10,578     7                  0                    2         binary_classification  0.0%
california                           20,634     8                  0                    2         binary_classification  0.0%
credit                               16,714     10                 0                    2         binary_classification  0.0%
heloc                                10,000     22                 0                    2         binary_classification  0.0%
house_16H                            13,488     16                 0                    2         binary_classification  0.0%
jannis                               57,580     54                 0                    2         binary_classification  0.0%
pol                                  10,082     26                 0                    2         binary_classification  0.0%
analcatdata_supreme                  4,052      1                  6                    1         regression             0.0%
Airlines_DepDelay_1M                 1,000,000  5                  0                    1         regression             0.0%
Allstate_Claims_Severity             188,318    25                 99                   1         regression             0.0%
Bike_Sharing_Demand                  17,379     6                  5                    1         regression             0.0%
Brazilian_houses                     10,692     7                  4                    1         regression             0.0%
Mercedes_Benz_Greener_Manufacturing  4,209      1                  358                  1         regression             0.0%
SGEMM_GPU_kernel_performance         241,600    3                  6                    1         regression             0.0%
diamonds                             53,940     6                  3                    1         regression             0.0%
house_sales                          21,613     15                 2                    1         regression             0.0%
medical_charges                      163,065    3                  0                    1         regression             0.0%
particulate-matter-ukair-2017        394,299    4                  2                    1         regression             0.0%
seattlecrime6                        52,031     3                  1                    1         regression             0.0%
topo_2_1                             8,885      252                3                    1         regression             0.0%
visualizing_soil                     8,641      3                  1                    1         regression             0.0%
cpu_act                              8,192      21                 0                    1         regression             0.0%
elevators                            16,599     16                 0                    1         regression             0.0%
houses                               20,640     8                  0                    1         regression             0.0%
delays_zurich_transport              5,465,575  8                  0                    1         regression             0.0%
nyc-taxi-green-dec-2016              581,835    9                  0                    1         regression             0.0%
sulfur                               10,081     6                  0                    1         regression             0.0%
superconduct                         21,263     79                 0                    1         regression             0.0%
wine_quality                         6,497      11                 0                    1         regression             0.0%
yprop_4_1                            8,885      42                 0                    1         regression             0.0%
===================================  =========  =================  ===================  ========  =====================  ===================
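
The statistics above can help when picking datasets for an experiment, for example by task or by size. The sketch below copies a handful of rows from those statistics into a small lookup and filters them. The `Stats` tuple, the `STATS` dictionary, and the `select` helper are hypothetical names introduced only for illustration; they are not part of PyTorch Frame, and the lookup is deliberately incomplete.

```python
from typing import NamedTuple, Optional


class Stats(NamedTuple):
    num_rows: int
    num_numerical: int
    num_categorical: int
    task: str


# A few rows copied from the statistics above (not the full list).
STATS = {
    "albert": Stats(58_252, 23, 8, "binary_classification"),
    "electricity": Stats(38_474, 7, 1, "binary_classification"),
    "Higgs": Stats(940_160, 24, 0, "binary_classification"),
    "diamonds": Stats(53_940, 6, 3, "regression"),
    "wine_quality": Stats(6_497, 11, 0, "regression"),
    "delays_zurich_transport": Stats(5_465_575, 8, 0, "regression"),
}


def select(task: str,
           max_rows: Optional[int] = None,
           require_categorical: bool = False) -> list:
    """Return dataset names matching the task and optional constraints."""
    names = []
    for name, stats in STATS.items():
        if stats.task != task:
            continue
        if max_rows is not None and stats.num_rows > max_rows:
            continue
        if require_categorical and stats.num_categorical == 0:
            continue
        names.append(name)
    return names


# Regression datasets small enough for a quick experiment.
small_regression = select("regression", max_rows=100_000)
```

Each returned name can then be passed as the `name` argument of `TabularBenchmark`.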