torch_frame.transforms

Transforms 

PyTorch Frame supports data transformation across different stypes or within the same stype. Transforms take in both a TensorFrame and column stats.

Let’s look at an example in which we apply CatToNumTransform to convert categorical features into numerical features.

from torch_frame.datasets import Yandex
from torch_frame.transforms import CatToNumTransform
from torch_frame import stype

dataset = Yandex(root='/tmp/adult', name='adult')
dataset.materialize()
transform = CatToNumTransform()
train_dataset = dataset.get_split('train')

train_dataset.tensor_frame.col_names_dict[stype.categorical]
>>> ['C_feature_0', 'C_feature_1', 'C_feature_2', 'C_feature_3', 'C_feature_4', 'C_feature_5', 'C_feature_6', 'C_feature_7']

test_dataset = dataset.get_split('test')
transform.fit(train_dataset.tensor_frame, dataset.col_stats)

transformed_col_stats = transform.transformed_stats

transformed_col_stats.keys()
>>> dict_keys(['C_feature_0_0', 'C_feature_1_0', 'C_feature_2_0', 'C_feature_3_0', 'C_feature_4_0', 'C_feature_5_0', 'C_feature_6_0', 'C_feature_7_0'])

transformed_col_stats['C_feature_0_0']
>>> {<StatType.MEAN: 'MEAN'>: 0.6984029484029484, <StatType.STD: 'STD'>: 0.45895127199411595, <StatType.QUANTILES: 'QUANTILES'>: [0.0, 0.0, 1.0, 1.0, 1.0]}

transform(test_dataset.tensor_frame)
>>> TensorFrame(
      num_cols=14,
      num_rows=16281,
      numerical (14): ['N_feature_0', 'N_feature_1', 'N_feature_2', 'N_feature_3', 'N_feature_4', 'N_feature_5', 'C_feature_0_0', 'C_feature_1_0', 'C_feature_2_0', 'C_feature_3_0', 'C_feature_4_0', 'C_feature_5_0', 'C_feature_6_0', 'C_feature_7_0'],
      has_target=True,
      device=cpu,
    )

After the transform, the categorical columns are renamed (for example, `C_feature_0` becomes `C_feature_0_0`) and are listed under the numerical stype, alongside the original numerical features.
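To make the fit-then-apply pattern above more concrete, here is a minimal pure-Python sketch of target-statistic encoding. It is only an illustration: `ToyTargetEncoder`, its `smoothing` parameter, and the smoothed-mean formula are assumptions made for this example and are not taken from `CatToNumTransform`, whose exact formula is not shown on this page.

```python
from collections import defaultdict


class ToyTargetEncoder:
    """Hypothetical stand-in for a fittable transform (not part of torch_frame)."""

    def __init__(self, smoothing=1.0):
        # Assumed smoothing weight that pulls rare categories toward the prior.
        self.smoothing = smoothing
        self.prior = None
        self.encoding = {}

    def fit(self, categories, targets):
        # The prior is the global mean of the target on the training split.
        self.prior = sum(targets) / len(targets)
        sums = defaultdict(float)
        counts = defaultdict(int)
        for category, target in zip(categories, targets):
            sums[category] += target
            counts[category] += 1
        # Smoothed per-category mean of the target.
        self.encoding = {
            c: (sums[c] + self.smoothing * self.prior) / (counts[c] + self.smoothing)
            for c in counts
        }
        return self

    def __call__(self, categories):
        if self.prior is None:
            raise RuntimeError("call fit() before applying the transform")
        # Categories unseen during fit fall back to the prior.
        return [self.encoding.get(c, self.prior) for c in categories]


# Fit on the training split only, then apply to held-out data.
train_cats = ["a", "a", "b", "b", "b", "c"]
train_y = [1, 0, 1, 1, 1, 0]
encoder = ToyTargetEncoder(smoothing=1.0).fit(train_cats, train_y)
test_encoded = encoder(["a", "b", "d"])
```

As in the example above, the statistics are computed from the training split and then reused unchanged on the test split, so no target information from the test data enters the encoding.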

`BaseTransform`	An abstract base class for writing transforms.
`FittableBaseTransform`	An abstract base class for writing fittable transforms.
`CatToNumTransform`	A transform that encodes the categorical features of the `TensorFrame` object using target statistics.
`MutualInformationSort`	A transform that sorts the numerical features of the input `TensorFrame` object based on mutual information.
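The idea behind sorting features by mutual information can be sketched in plain Python. This is a rough illustration under stated assumptions, not the `MutualInformationSort` implementation: the helper names (`mutual_information`, `bin_values`, `sort_columns_by_mi`), the equal-width binning, and the plug-in estimator are all choices made for this example.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in mutual information estimate (in nats) for two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts.
        mi += (count / n) * math.log(count * n / (px[x] * py[y]))
    return mi


def bin_values(values, num_bins=2):
    """Discretize numerical values into equal-width bins (an assumed choice)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / num_bins
    return [min(int((v - lo) / width), num_bins - 1) for v in values]


def sort_columns_by_mi(columns, target, num_bins=2):
    """Return column names ordered from most to least informative, plus scores."""
    scores = {
        name: mutual_information(bin_values(values, num_bins), target)
        for name, values in columns.items()
    }
    order = sorted(scores, key=scores.get, reverse=True)
    return order, scores


columns = {
    "noise": [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0],
    "signal": [0.0, 0.1, 0.2, 0.3, 1.0, 1.1, 1.2, 1.3],
}
target = [0, 0, 0, 0, 1, 1, 1, 1]
order, scores = sort_columns_by_mi(columns, target)
```

In this toy data, `signal` separates the two target classes perfectly while `noise` is independent of the target, so `signal` is ranked first.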
