torch_frame.datasets.TabularBenchmark
class TabularBenchmark(root: str, name: str)
Bases: Dataset
A collection of tabular benchmark datasets introduced in “Why do tree-based models still outperform deep learning on tabular data?”.
STATS:
Name                                 #rows      #cols (numerical)  #cols (categorical)  #classes  Task                   Missing value ratio
-----------------------------------  ---------  -----------------  -------------------  --------  ---------------------  -------------------
albert                               58,252     23                 8                    2         binary_classification  0.0%
compas-two-years                     4,966      2                  9                    2         binary_classification  0.0%
covertype                            423,680    10                 44                   2         binary_classification  0.0%
default-of-credit-card-clients       13,272     20                 1                    2         binary_classification  0.0%
electricity                          38,474     7                  1                    2         binary_classification  0.0%
eye_movements                        7,608      18                 5                    2         binary_classification  0.0%
road-safety                          111,762    24                 8                    2         binary_classification  0.0%
Bioresponse                          3,434      419                0                    2         binary_classification  0.0%
Diabetes130US                        71,090     7                  0                    2         binary_classification  0.0%
Higgs                                940,160    24                 0                    2         binary_classification  0.0%
MagicTelescope                       13,376     10                 0                    2         binary_classification  0.0%
MiniBooNE                            72,998     50                 0                    2         binary_classification  0.0%
bank-marketing                       10,578     7                  0                    2         binary_classification  0.0%
california                           20,634     8                  0                    2         binary_classification  0.0%
credit                               16,714     10                 0                    2         binary_classification  0.0%
heloc                                10,000     22                 0                    2         binary_classification  0.0%
house_16H                            13,488     16                 0                    2         binary_classification  0.0%
jannis                               57,580     54                 0                    2         binary_classification  0.0%
pol                                  10,082     26                 0                    2         binary_classification  0.0%
analcatdata_supreme                  4,052      1                  6                    1         regression             0.0%
Airlines_DepDelay_1M                 1,000,000  5                  0                    1         regression             0.0%
Allstate_Claims_Severity             188,318    25                 99                   1         regression             0.0%
Bike_Sharing_Demand                  17,379     6                  5                    1         regression             0.0%
Brazilian_houses                     10,692     7                  4                    1         regression             0.0%
Mercedes_Benz_Greener_Manufacturing  4,209      1                  358                  1         regression             0.0%
SGEMM_GPU_kernel_performance         241,600    3                  6                    1         regression             0.0%
diamonds                             53,940     6                  3                    1         regression             0.0%
house_sales                          21,613     15                 2                    1         regression             0.0%
medical_charges                      163,065    3                  0                    1         regression             0.0%
particulate-matter-ukair-2017        394,299    4                  2                    1         regression             0.0%
seattlecrime6                        52,031     3                  1                    1         regression             0.0%
topo_2_1                             8,885      252                3                    1         regression             0.0%
visualizing_soil                     8,641      3                  1                    1         regression             0.0%
cpu_act                              8,192      21                 0                    1         regression             0.0%
elevators                            16,599     16                 0                    1         regression             0.0%
houses                               20,640     8                  0                    1         regression             0.0%
delays_zurich_transport              5,465,575  8                  0                    1         regression             0.0%
nyc-taxi-green-dec-2016              581,835    9                  0                    1         regression             0.0%
sulfur                               10,081     6                  0                    1         regression             0.0%
superconduct                         21,263     79                 0                    1         regression             0.0%
wine_quality                         6,497      11                 0                    1         regression             0.0%
yprop_4_1                            8,885      42                 0                    1         regression             0.0%
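As a rough illustration of how the `root` and `name` arguments might be used together with the STATS listing, the sketch below keeps a small hand-copied subset of the listing and checks the requested name before constructing the dataset. `STATS_SUBSET`, `names_for_task`, and `load_benchmark` are hypothetical helpers written for this example and are not part of `torch_frame`. The call to `materialize()` assumes the usual PyTorch Frame `Dataset` workflow.

```python
# Hypothetical helper data: a hand-copied subset of the STATS listing,
# mapping dataset name -> (task, number of rows).
STATS_SUBSET = {
    "albert": ("binary_classification", 58_252),
    "electricity": ("binary_classification", 38_474),
    "diamonds": ("regression", 53_940),
    "wine_quality": ("regression", 6_497),
}


def names_for_task(task: str) -> list[str]:
    """Return the names in STATS_SUBSET whose task matches, sorted."""
    return sorted(n for n, (t, _) in STATS_SUBSET.items() if t == task)


def load_benchmark(root: str, name: str):
    """Check `name` against the subset, then build the dataset.

    `root` is the directory where the data is stored.
    """
    if name not in STATS_SUBSET:
        raise ValueError(
            f"unknown name {name!r}; expected one of {sorted(STATS_SUBSET)}"
        )
    # Imported lazily so the name check works without torch_frame installed.
    from torch_frame.datasets import TabularBenchmark

    dataset = TabularBenchmark(root=root, name=name)
    # Assumption: materialize() is called before the dataset is used for
    # training, as with other PyTorch Frame datasets.
    dataset.materialize()
    return dataset


if __name__ == "__main__":
    print(names_for_task("regression"))
```

Calling `load_benchmark("/tmp/tabular_benchmark", "diamonds")` (the path is only a placeholder) would fetch the data on first use, so it is best tried on one of the smaller entries first.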