torch_frame.transforms
Transforms
PyTorch Frame allows for data transformation across different stypes or within the same stype. Transforms take in both a TensorFrame and column stats.
Let’s look at an example, where we apply CatToNumTransform to transform categorical features into numerical features.
from torch_frame.datasets import Yandex
from torch_frame.transforms import CatToNumTransform
from torch_frame import stype
dataset = Yandex(root='/tmp/adult', name='adult')
dataset.materialize()
transform = CatToNumTransform()
train_dataset = dataset.get_split('train')
train_dataset.tensor_frame.col_names_dict[stype.categorical]
>>> ['C_feature_0', 'C_feature_1', 'C_feature_2', 'C_feature_3', 'C_feature_4', 'C_feature_5', 'C_feature_6', 'C_feature_7']
test_dataset = dataset.get_split('test')
transform.fit(train_dataset.tensor_frame, dataset.col_stats)
transformed_col_stats = transform.transformed_stats
transformed_col_stats.keys()
>>> dict_keys(['C_feature_0_0', 'C_feature_1_0', 'C_feature_2_0', 'C_feature_3_0', 'C_feature_4_0', 'C_feature_5_0', 'C_feature_6_0', 'C_feature_7_0'])
transformed_col_stats['C_feature_0_0']
>>> {<StatType.MEAN: 'MEAN'>: 0.6984029484029484, <StatType.STD: 'STD'>: 0.45895127199411595, <StatType.QUANTILES: 'QUANTILES'>: [0.0, 0.0, 1.0, 1.0, 1.0]}
transform(test_dataset.tensor_frame)
>>> TensorFrame(
        num_cols=14,
        num_rows=16281,
        numerical (14): ['N_feature_0', 'N_feature_1', 'N_feature_2', 'N_feature_3', 'N_feature_4', 'N_feature_5', 'C_feature_0_0', 'C_feature_1_0', 'C_feature_2_0', 'C_feature_3_0', 'C_feature_4_0', 'C_feature_5_0', 'C_feature_6_0', 'C_feature_7_0'],
        has_target=True,
        device=cpu,
    )
You can see that after the transform, each categorical column is renamed (for example, C_feature_0 becomes C_feature_0_0) and converted into a numerical feature, so the resulting TensorFrame contains only numerical columns.
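The fit-on-train, apply-to-test pattern used above can be illustrated without the library. The following is a minimal pure-Python sketch, assuming one plausible encoding (the mean of the target per category); it is not necessarily what CatToNumTransform computes. The class name MeanTargetEncoder, its attributes, and the toy data are hypothetical and are not part of torch_frame.

```python
from collections import defaultdict


class MeanTargetEncoder:
    """Hypothetical stand-in for a fittable categorical-to-numerical transform."""

    def __init__(self):
        self.mapping = None    # category -> encoded value, learned in fit()
        self.fallback = None   # value used for categories unseen during fit()

    def fit(self, categories, targets):
        # Accumulate the target sum and count for every category.
        sums = defaultdict(float)
        counts = defaultdict(int)
        for cat, y in zip(categories, targets):
            sums[cat] += y
            counts[cat] += 1
        self.mapping = {cat: sums[cat] / counts[cat] for cat in sums}
        # Assumption: unseen categories fall back to the global target mean.
        self.fallback = sum(targets) / len(targets)
        return self

    def __call__(self, categories):
        if self.mapping is None:
            raise RuntimeError("Call fit() on the training split first.")
        return [self.mapping.get(cat, self.fallback) for cat in categories]


# Toy training and test splits (made-up values for illustration only).
train_cats = ["a", "b", "a", "c", "b", "a"]
train_y = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
test_cats = ["a", "c", "d"]

# Statistics are learned from the training split only, then reused on test.
encoder = MeanTargetEncoder().fit(train_cats, train_y)
encoded_test = encoder(test_cats)
```

Fitting only on the training split keeps information from the test split out of the learned statistics, which is why the example above calls fit on the training TensorFrame before transforming the test TensorFrame.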
An abstract base class for writing transforms.
An abstract base class for writing fittable transforms.
Transforms categorical features in TensorFrame.
A transform that sorts the numerical features of input TensorFrame.
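The idea of an abstract base class for fittable transforms can be sketched in plain Python. This is a rough illustration only: the names FittableTransformSketch, SortByCovariance, _fit and _forward are hypothetical and do not claim to mirror the torch_frame interfaces, and ranking columns by absolute covariance with the target is an assumed criterion chosen for the example.

```python
from abc import ABC, abstractmethod


class FittableTransformSketch(ABC):
    """Hypothetical base class: subclasses implement _fit and _forward."""

    def __init__(self):
        self._is_fitted = False

    def fit(self, columns, target):
        self._fit(columns, target)
        self._is_fitted = True
        return self

    def __call__(self, columns):
        # Refuse to transform data before any statistics have been learned.
        if not self._is_fitted:
            raise RuntimeError("Transform must be fitted before it is applied.")
        return self._forward(columns)

    @abstractmethod
    def _fit(self, columns, target):
        ...

    @abstractmethod
    def _forward(self, columns):
        ...


def _abs_covariance(xs, ys):
    # Absolute population covariance between two equally long sequences.
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return abs(total / len(xs))


class SortByCovariance(FittableTransformSketch):
    """Reorders columns by descending |covariance| with the target (assumed criterion)."""

    def _fit(self, columns, target):
        scores = {name: _abs_covariance(vals, target) for name, vals in columns.items()}
        self.order = sorted(scores, key=scores.get, reverse=True)

    def _forward(self, columns):
        # Apply the column order learned during fit to new data.
        return {name: columns[name] for name in self.order}


# Toy data: the second column tracks the target, the first is constant.
train = {
    "N_feature_0": [1.0, 1.0, 1.0, 1.0],
    "N_feature_1": [0.0, 1.0, 2.0, 3.0],
}
target = [0.0, 1.0, 2.0, 3.0]
test = {"N_feature_0": [1.0, 1.0], "N_feature_1": [5.0, 6.0]}

sorter = SortByCovariance().fit(train, target)
sorted_test = sorter(test)
```

Keeping the fitted-state check in the base class means each concrete transform only has to describe what it learns and how it applies it.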