torch_frame.data.DataFrameToTensorFrameConverter

class DataFrameToTensorFrameConverter(col_to_stype: dict[str, torch_frame.stype], col_stats: dict[str, dict[StatType, Any]], target_col: str | None = None, col_to_sep: dict[str, str | None] | None = None, col_to_text_embedder_cfg: dict[str, TextEmbedderConfig] | None = None, col_to_text_tokenizer_cfg: dict[str, TextTokenizerConfig] | None = None, col_to_image_embedder_cfg: dict[str, ImageEmbedderConfig] | None = None, col_to_time_format: dict[str, str | None] = None)

Bases: object

A data frame to TensorFrame converter.

Note that this object is supposed to be constructed inside a Dataset object via dataset.convert_to_tensor_frame.

Parameters:
  • col_to_stype (Dict[str, torch_frame.stype]) – A dictionary that maps each column in the data frame to a semantic type.

  • col_stats (Dict[str, Dict[StatType, Any]]) – A dictionary that maps each column name to its statistics. Available as dataset.col_stats.

  • target_col (str, optional) – The column used as target. (default: None)

  • col_to_sep (Dict[str, Optional[str]], optional) – A dictionary specifying the separator/delimiter for the multi-categorical columns. (default: None)

  • col_to_text_embedder_cfg (Dict[str, TextEmbedderConfig], optional) – A dictionary of text embedder configurations, each specifying a text_embedder that embeds texts into vectors and a batch_size that sets the mini-batch size for text_embedder. (default: None)

  • col_to_text_tokenizer_cfg (Dict[str, TextTokenizerConfig], optional) – A dictionary of text tokenizer configurations, each specifying a text_tokenizer that maps sentences into a list of dictionaries of tensors. Each element in the list corresponds to one sentence; keys are input arguments to the model, such as input_ids, and values are tensors, such as tokens. batch_size specifies the mini-batch size for text_tokenizer. (default: None)

  • col_to_time_format (Dict[str, Optional[str]], optional) – A dictionary of the time format for the timestamp columns. See strftime for more information on formats. If a string is specified, then the same format will be used throughout all the timestamp columns. If a dictionary is given, we use a different format specified for each column. If not specified, Pandas’ internal to_datetime function will be used to auto parse time columns. (default: None)
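
To make the per-column dictionaries more concrete, the following is a minimal sketch of what a separator and a strftime-style format mean for a single raw row. It uses only the Python standard library and is not the converter's actual implementation. The column names (tags, signup, last_seen) and the helper functions split_multicategorical and parse_timestamp are hypothetical, introduced for illustration only.

```python
from datetime import datetime

# Hypothetical column names; the dictionaries mirror the shape of the
# col_to_sep and col_to_time_format arguments described above.
col_to_sep = {"tags": ","}
col_to_time_format = {
    "signup": "%Y-%m-%d",
    "last_seen": "%d/%m/%Y %H:%M",
}

# One raw row, as it might appear in a data frame before conversion.
row = {
    "tags": "red,green,blue",
    "signup": "2023-01-15",
    "last_seen": "03/02/2023 14:30",
}


def split_multicategorical(value: str, sep: str) -> list[str]:
    """Split one raw cell into its categories using the column's separator."""
    return [item.strip() for item in value.split(sep)]


def parse_timestamp(value: str, fmt: str) -> datetime:
    """Parse one raw cell with an explicit strftime-style format string."""
    return datetime.strptime(value, fmt)


categories = split_multicategorical(row["tags"], col_to_sep["tags"])
parsed = {
    col: parse_timestamp(row[col], fmt)
    for col, fmt in col_to_time_format.items()
}

print(categories)
print(parsed["signup"].isoformat(), parsed["last_seen"].isoformat())
```

Giving each timestamp column its own format avoids ambiguity for strings such as 03/02/2023, which could otherwise be read as either March 2 or February 3.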