torch_frame.nn.models.ExcelFormer

class ExcelFormer(in_channels: int, out_channels: int, num_cols: int, num_layers: int, num_heads: int, col_stats: dict[str, dict[StatType, Any]], col_names_dict: dict[torch_frame.stype, list[str]], stype_encoder_dict: dict[torch_frame.stype, StypeEncoder] | None = None, diam_dropout: float = 0.0, aium_dropout: float = 0.0, residual_dropout: float = 0.0, mixup: str | None = None, beta: float = 0.5)[source]

Bases: Module

The ExcelFormer model introduced in the “ExcelFormer: A Neural Network Surpassing GBDTs on Tabular Data” paper.

ExcelFormer first converts categorical features into numerical features with a target statistics encoder (i.e., the CatBoostEncoder in the paper), and then sorts the numerical features by mutual information. As a result, the model itself operates only on numerical features.

Note

For an example of using ExcelFormer, see examples/excelformer.py.
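
The preprocessing described above happens outside the model. Below is a minimal sketch of that pipeline, assuming a materialized torch_frame.data.Dataset named train_dataset; the transform names and call signatures follow examples/excelformer.py and should be verified there:

    from torch_frame.transforms import CatToNumTransform, MutualInformationSort

    # Convert categorical features into numerical ones via target statistics
    # (the CatBoostEncoder-style encoding mentioned above).
    categorical_transform = CatToNumTransform()
    categorical_transform.fit(train_dataset.tensor_frame, train_dataset.col_stats)
    train_tensor_frame = categorical_transform(train_dataset.tensor_frame)
    col_stats = categorical_transform.transformed_stats

    # Sort the (now fully numerical) features by mutual information with the target.
    mutual_info_sort = MutualInformationSort(task_type=train_dataset.task_type)
    mutual_info_sort.fit(train_tensor_frame, col_stats)
    train_tensor_frame = mutual_info_sort(train_tensor_frame)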

Parameters:
  • in_channels (int) – Input channel dimensionality

  • out_channels (int) – Output channel dimensionality

  • num_cols (int) – Number of columns

  • num_layers (int) – Number of torch_frame.nn.conv.ExcelFormerConv layers.

  • num_heads (int) – Number of attention heads used in DiaM

  • col_stats (dict[str, dict[torch_frame.data.stats.StatType, Any]]) – A dictionary that maps each column name to its statistics. Available as dataset.col_stats.

  • col_names_dict (dict[torch_frame.stype, list[str]]) – A dictionary that maps each stype to a list of column names. The column names are sorted based on the ordering in which they appear in tensor_frame.feat_dict. Available as tensor_frame.col_names_dict.

  • stype_encoder_dict (dict[torch_frame.stype, torch_frame.nn.encoder.StypeEncoder], optional) – A dictionary mapping stypes to their stype encoders. (default: None, in which case ExcelFormerEncoder() is used for numerical features)

  • diam_dropout (float, optional) – Dropout probability used in the DiaM module. (default: 0.0)

  • aium_dropout (float, optional) – Dropout probability used in the AiuM module. (default: 0.0)

  • residual_dropout (float, optional) – Dropout probability applied to the residual connections. (default: 0.0)

  • mixup (str, optional) – Mixup type: None, "feature" (FEAT-MIX), or "hidden" (HIDDEN-MIX). (default: None)

  • beta (float, optional) – Shape parameter of the Beta distribution used to sample the shuffle rate in mixup. Only used when mixup is not None. (default: 0.5)
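
A minimal instantiation sketch, reusing col_stats and train_tensor_frame from the preprocessing sketch above; the hyperparameter values (channels, layers, heads) are illustrative, not recommendations from the paper:

    from torch_frame.nn.models import ExcelFormer

    model = ExcelFormer(
        in_channels=32,
        out_channels=train_dataset.num_classes,
        num_cols=train_tensor_frame.num_cols,
        num_layers=5,
        num_heads=4,
        col_stats=col_stats,
        col_names_dict=train_tensor_frame.col_names_dict,
        mixup='feature',  # enable FEAT-MIX during training
    )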

forward(tf: TensorFrame, mixup_encoded: bool = False) Tensor | tuple[Tensor, Tensor][source]

Transform the input TensorFrame object into output embeddings. If mixup_encoded is True, the mixed-up targets are returned along with the output embeddings, using the mixup strategy specified by self.mixup.

Parameters:
  • tf (torch_frame.TensorFrame) – Input TensorFrame object.

  • mixup_encoded (bool) – Whether to apply mixup to the encoded numerical features, i.e., FEAT-MIX or HIDDEN-MIX. (default: False)

Returns:

The output embeddings of size [batch_size, out_channels]. If mixup_encoded is True, the mixed-up targets of size [batch_size, num_classes] are returned as well.

Return type:

torch.Tensor | tuple[Tensor, Tensor]
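
A short usage sketch of forward(), assuming the model and a TensorFrame mini-batch tf from the sketches above; the soft-label cross-entropy is one reasonable training loss for classification, not necessarily the one used in examples/excelformer.py:

    import torch.nn.functional as F

    # Inference / plain training step: embeddings only.
    out = model(tf)  # shape [batch_size, out_channels]

    # Mixup training step: also returns the mixed-up (soft) targets.
    out_mixed, y_mixed = model(tf, mixup_encoded=True)
    # y_mixed has shape [batch_size, num_classes], so a cross-entropy
    # against soft labels can be used for classification:
    loss = F.cross_entropy(out_mixed, y_mixed)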