torch_frame.data.DataFrameToTensorFrameConverter
- class DataFrameToTensorFrameConverter(col_to_stype: dict[str, torch_frame._stype.stype], col_stats: dict[str, dict[torch_frame.data.stats.StatType, Any]], target_col: Optional[str] = None, col_to_sep: Optional[dict[str, str | None]] = None, col_to_text_embedder_cfg: Optional[dict[str, torch_frame.config.text_embedder.TextEmbedderConfig]] = None, col_to_text_tokenizer_cfg: Optional[dict[str, torch_frame.config.text_tokenizer.TextTokenizerConfig]] = None, col_to_image_embedder_cfg: Optional[dict[str, torch_frame.config.image_embedder.ImageEmbedderConfig]] = None, col_to_time_format: Optional[dict[str, str | None]] = None)
Bases: object

A data frame to TensorFrame converter.

Note that this object is supposed to be constructed inside a Dataset object via dataset.convert_to_tensor_frame.

- Parameters:
  - col_to_stype (Dict[str, torch_frame.stype]) – A dictionary that maps each column in the data frame to a semantic type.
  - col_stats (Dict[str, Dict[StatType, Any]]) – A dictionary that maps each column name to its statistics. Available as dataset.col_stats.
  - target_col (str, optional) – The column used as the target. (default: None)
  - col_to_sep (Dict[str, Optional[str]], optional) – A dictionary specifying the separator/delimiter for the multi-categorical columns. (default: None)
  - col_to_text_embedder_cfg (Dict[str, TextEmbedderConfig], optional) – A dictionary of text embedder configurations. Each configuration specifies text_embedder, which embeds texts into vectors, and batch_size, the mini-batch size for text_embedder. (default: None)
  - col_to_text_tokenizer_cfg (Dict[str, TextTokenizerConfig], optional) – A dictionary of text tokenizer configurations. Each configuration specifies text_tokenizer, which maps sentences into a list of dictionaries of tensors, and batch_size, the mini-batch size for text_tokenizer. Each element in the list corresponds to one sentence; keys are input arguments to the model, such as input_ids, and values are tensors, such as tokens. (default: None)
  - col_to_time_format (Dict[str, Optional[str]], optional) – A dictionary of the time format for the timestamp columns. See strftime for more information on formats. If a string is specified, then the same format will be used throughout all the timestamp columns. If a dictionary is given, a different format is used for each column, as specified. If not specified, Pandas' internal to_datetime function will be used to auto-parse time columns. (default: None)
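
The per-column dictionaries described above can be illustrated without the library itself. The following is a minimal sketch in plain Python, not the converter's implementation: the column names (tags, labels, created_at, updated_at) and the helpers split_multicategorical and parse_timestamp are hypothetical and exist only for this example. It assumes that a None separator means the cell already holds a list, and it uses datetime.fromisoformat as a stand-in for the auto-parsing that the converter delegates to Pandas.

```python
from datetime import datetime
from typing import Optional

# Hypothetical column names; these mirror the shape of col_to_sep and
# col_to_time_format but are not taken from any real dataset.
col_to_sep: dict[str, Optional[str]] = {"tags": ",", "labels": None}
col_to_time_format: dict[str, Optional[str]] = {
    "created_at": "%Y-%m-%d %H:%M:%S",
    "updated_at": None,
}


def split_multicategorical(value, sep: Optional[str]) -> list[str]:
    """Illustrative helper (not a torch_frame function): split one cell."""
    if sep is None:
        # Assumption: no separator means the cell is already a list.
        return list(value)
    return [part.strip() for part in value.split(sep) if part.strip()]


def parse_timestamp(value: str, fmt: Optional[str]) -> datetime:
    """Illustrative helper (not a torch_frame function): parse one cell."""
    if fmt is None:
        # Stand-in for auto-parsing; the converter is documented to use
        # Pandas' to_datetime in this case.
        return datetime.fromisoformat(value)
    return datetime.strptime(value, fmt)


row = {
    "tags": "red, green,blue",
    "labels": ["a", "b"],
    "created_at": "2024-01-31 08:15:00",
    "updated_at": "2024-02-01T09:30:00",
}

tags = split_multicategorical(row["tags"], col_to_sep["tags"])
labels = split_multicategorical(row["labels"], col_to_sep["labels"])
created = parse_timestamp(row["created_at"], col_to_time_format["created_at"])
updated = parse_timestamp(row["updated_at"], col_to_time_format["updated_at"])
```

Keeping the separator and time format keyed by column name lets each column carry its own parsing rule, which is the same design the dictionary-valued parameters above express.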