torch_frame.config.TextTokenizerConfig
class TextTokenizerConfig(text_tokenizer: Callable[[list[str]], list[collections.abc.Mapping[str, torch.Tensor]] | collections.abc.Mapping[str, torch.Tensor]], batch_size: Optional[int] = None)
Bases: object
Text tokenizer that maps a list of strings/sentences into a dictionary of MultiNestedTensor.
Parameters:
text_tokenizer (callable) – A callable text tokenizer that takes a list of strings as input and outputs either a list of dictionaries or a single dictionary. The keys of each dictionary are the argument names expected by the text encoder model, and the values are the corresponding tensors, such as token IDs and attention masks.
batch_size (int, optional) – Batch size to use when tokenizing the sentences. If set to None, all sentences are tokenized in a single full batch. (default: None)
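
As a rough illustration of the kind of callable text_tokenizer expects, here is a minimal sketch of a whitespace tokenizer that returns a single dictionary of tensors. The vocabulary `VOCAB`, the function name `toy_tokenizer`, and the key names `input_ids` and `attention_mask` are assumptions made for this example; in practice the keys should match whatever arguments your text encoder model accepts, and a pretrained tokenizer would normally take the place of this toy one.

```python
import torch

# Hypothetical toy vocabulary, for illustration only.
VOCAB = {"<pad>": 0, "<unk>": 1, "hello": 2, "world": 3, "tabular": 4, "data": 5}


def toy_tokenizer(sentences: list[str]) -> dict[str, torch.Tensor]:
    """Map a list of sentences to padded token IDs and an attention mask."""
    # Split on whitespace and look up each word, falling back to <unk>.
    token_ids = [
        [VOCAB.get(word, VOCAB["<unk>"]) for word in sentence.lower().split()]
        for sentence in sentences
    ]
    max_len = max((len(ids) for ids in token_ids), default=0)

    # Zero-initialized tensors: 0 is the <pad> ID and also the "masked" value.
    input_ids = torch.zeros((len(sentences), max_len), dtype=torch.long)
    attention_mask = torch.zeros((len(sentences), max_len), dtype=torch.long)
    for row, ids in enumerate(token_ids):
        input_ids[row, : len(ids)] = torch.tensor(ids, dtype=torch.long)
        attention_mask[row, : len(ids)] = 1

    return {"input_ids": input_ids, "attention_mask": attention_mask}


batch = toy_tokenizer(["hello world", "tabular data rocks today"])
print(batch["input_ids"])
print(batch["attention_mask"])
```

Assuming torch_frame is installed, a callable like this could then be passed as `TextTokenizerConfig(text_tokenizer=toy_tokenizer, batch_size=2)`, following the signature above.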