torch_frame.transforms.CatToNumTransform

class CatToNumTransform

Bases: FittableBaseTransform

Transforms categorical features in a TensorFrame using target statistics. The original transform is described in the paper "A preprocessing scheme for high-cardinality categorical attributes in classification and prediction problems".

Specifically, each categorical feature is transformed into a numerical feature using the m-probability estimate, defined by

\[\frac{n_c + p \cdot m}{n + m}\]

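where \(n_c\) is the count of the category, \(n\) is the total count, \(p\) is the prior probability, and \(m\) is a smoothing factor. With \(m = 0\) the estimate reduces to the raw ratio \(n_c / n\); larger values of \(m\) pull it toward the prior \(p\).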
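
To make the effect of the smoothing factor concrete, here is a minimal sketch that evaluates the formula above directly. It is not the library's implementation: the helper name `m_probability_estimate` and the numbers used are hypothetical, chosen only for illustration, and the arguments follow the symbol definitions given above.

```python
def m_probability_estimate(n_c: float, n: float, p: float, m: float) -> float:
    """Evaluate (n_c + p * m) / (n + m) for scalar inputs.

    Hypothetical helper for illustration; not part of torch_frame.
    """
    if n + m <= 0:
        raise ValueError("n + m must be positive")
    return (n_c + p * m) / (n + m)


# Illustrative (made-up) values for n_c, n and the prior p.
n_c, n, p = 3, 10, 0.5

# m = 0: no smoothing, the estimate is the raw ratio n_c / n.
no_smoothing = m_probability_estimate(n_c, n, p, m=0)

# m = 1: the prior contributes the weight of one extra observation.
light_smoothing = m_probability_estimate(n_c, n, p, m=1)

# Large m: the estimate is dominated by the prior p.
heavy_smoothing = m_probability_estimate(n_c, n, p, m=1000)

print(no_smoothing, light_smoothing, heavy_smoothing)
```

The smoothing matters most for rare categories: when \(n\) is small, even a modest \(m\) keeps the encoded value close to the prior instead of letting a handful of rows determine it.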