torch_frame.data.DataFrameToTensorFrameConverter
- class DataFrameToTensorFrameConverter(col_to_stype: dict[str, torch_frame.stype], col_stats: dict[str, dict[StatType, Any]], target_col: str | None = None, col_to_sep: dict[str, str | None] | None = None, col_to_text_embedder_cfg: dict[str, TextEmbedderConfig] | None = None, col_to_text_tokenizer_cfg: dict[str, TextTokenizerConfig] | None = None, col_to_image_embedder_cfg: dict[str, ImageEmbedderConfig] | None = None, col_to_time_format: dict[str, str | None] = None)
Bases: object
A data frame to TensorFrame converter. Note that this object is supposed to be constructed inside a Dataset object via dataset.convert_to_tensor_frame.
- Parameters:
  - col_to_stype (Dict[str, torch_frame.stype]) – A dictionary that maps each column in the data frame to a semantic type.
  - col_stats (Dict[str, Dict[StatType, Any]]) – A dictionary that maps each column name to its statistics. Available as dataset.col_stats.
  - target_col (str, optional) – The column used as target. (default: None)
  - col_to_sep (Dict[str, Optional[str]], optional) – A dictionary specifying the separator/delimiter for the multi-categorical columns. (default: None)
  - col_to_text_embedder_cfg (Dict[str, TextEmbedderConfig], optional) – A dictionary of configurations specifying text_embedder that embeds texts into vectors and batch_size that specifies the mini-batch size for text_embedder. (default: None)
  - col_to_text_tokenizer_cfg (Dict[str, TextTokenizerConfig], optional) – A dictionary of text tokenizer configurations, specifying text_tokenizer that maps sentences into a list of dictionaries of tensors. Each element in the list corresponds to one sentence, keys are input arguments to the model such as input_ids, and values are tensors such as tokens. batch_size specifies the mini-batch size for text_tokenizer. (default: None)
  - col_to_time_format (Dict[str, Optional[str]], optional) – A dictionary of the time format for the timestamp columns. See strftime for more information on formats. If a string is specified, then the same format will be used throughout all the timestamp columns. If a dictionary is given, we use a different format specified for each column. If not specified, Pandas’ internal to_datetime function will be used to auto parse time columns. (default: None)
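To make the shape of the per-column dictionaries more concrete, here is a minimal sketch in plain Python. It does not call torch_frame and is not the converter's implementation: the column names, the sample row, and the helper preprocess_row are hypothetical, and the standard-library datetime.strptime stands in for the Pandas parsing described above. It only illustrates how a separator from col_to_sep could split a multi-categorical cell and how a strftime-style string from col_to_time_format could parse a timestamp cell. Stripping whitespace around each category is a choice made in this sketch, not documented library behavior.

```python
from datetime import datetime

# Hypothetical column names, chosen for illustration only.
col_to_sep = {"tags": ","}                    # separator for a multi-categorical column
col_to_time_format = {"signup": "%Y-%m-%d"}   # strftime-style format for a timestamp column


def preprocess_row(row: dict) -> dict:
    """Split multi-categorical cells and parse timestamp cells of one row.

    Hypothetical helper: a stand-in for the idea behind the two dictionaries,
    not the logic used by DataFrameToTensorFrameConverter.
    """
    out = dict(row)
    for col, sep in col_to_sep.items():
        # Split on the configured separator; strip whitespace (sketch's choice).
        out[col] = [item.strip() for item in row[col].split(sep)]
    for col, fmt in col_to_time_format.items():
        # Parse the raw string using the column's format string.
        out[col] = datetime.strptime(row[col], fmt)
    return out


row = {"tags": "red, blue,green", "signup": "2023-07-15", "age": 31}
parsed = preprocess_row(row)
print(parsed["tags"])
print(parsed["signup"].isoformat())
```

Columns that appear in neither dictionary, such as age here, pass through unchanged in this sketch.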