torch_frame.data.MultiNestedTensor

class MultiNestedTensor(num_rows: int, num_cols: int, values: Tensor, offset: Tensor)[source]

Bases: _MultiTensor

A read-only PyTorch tensor-based data structure that stores data of shape [num_rows, num_cols, *], where the size of the last dimension may differ for each (row, column) cell. Internally, the object is stored in an efficient flattened format, (values, offset), where the PyTorch Tensor at (i, j) is accessed as values[offset[i*num_cols+j]:offset[i*num_cols+j+1]]. It supports advanced indexing, including slicing and list indexing along both rows and columns.
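
The flattened (values, offset) layout can be illustrated with a minimal sketch. Plain Python lists stand in for tensors here, and build_flat and get_cell are hypothetical helper names introduced only for illustration; this is not the library's implementation.

```python
# Plain Python lists stand in for tensors. build_flat and get_cell are
# hypothetical helpers, not part of torch_frame.

def build_flat(mat):
    """Flatten a matrix of variable-length lists into (values, offset)."""
    values, offset = [], [0]
    for row in mat:
        for cell in row:
            values.extend(cell)
            # offset[k + 1] marks where cell k ends in the flat values list.
            offset.append(len(values))
    return values, offset


def get_cell(values, offset, num_cols, i, j):
    """Return the entries stored at row i, column j."""
    k = i * num_cols + j  # row-major position of cell (i, j)
    return values[offset[k]:offset[k + 1]]


mat = [[[1, 2], [3]], [[4], [5, 6, 7]], [[8, 9], [10]]]
values, offset = build_flat(mat)
print(values)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(offset)  # [0, 2, 3, 4, 7, 9, 10]
print(get_cell(values, offset, 2, 1, 1))  # [5, 6, 7]
```

The offset list always has num_rows * num_cols + 1 entries, so the length of any cell is the difference between two neighboring offsets.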

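Parameters:
  • num_rows (int) – Number of rows.

  • num_cols (int) – Number of columns.

  • values (Tensor) – The flattened values tensor holding the entries of all cells.

  • offset (Tensor) – The offset tensor of size num_rows*num_cols+1.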

Example

>>> import torch
>>> from torch_frame.data import MultiNestedTensor
>>> tensor_mat = [
...    [torch.tensor([1, 2]), torch.tensor([3])],
...    [torch.tensor([4]), torch.tensor([5, 6, 7])],
...    [torch.tensor([8, 9]), torch.tensor([10])],
... ]
>>> mnt = MultiNestedTensor.from_tensor_mat(tensor_mat)
>>> mnt
MultiNestedTensor(num_rows=3, num_cols=2, device='cpu')
>>> mnt.values
tensor([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])
>>> mnt.offset
tensor([ 0,  2,  3,  4,  7,  9, 10])
>>> mnt[0, 0]
tensor([1, 2])
>>> mnt[1, 1]
tensor([5, 6, 7])
>>> mnt[0] # Row integer indexing
MultiNestedTensor(num_rows=1, num_cols=2, device='cpu')
>>> mnt[:, 0] # Column integer indexing
MultiNestedTensor(num_rows=3, num_cols=1, device='cpu')
>>> mnt[:2] # Row integer slicing
MultiNestedTensor(num_rows=2, num_cols=2, device='cpu')
>>> mnt[[2, 1, 2, 0]] # Row list indexing
MultiNestedTensor(num_rows=4, num_cols=2, device='cpu')
>>> mnt.to_dense(fill_value=-1) # Map to a dense matrix with padding
tensor([[[ 1,  2, -1],
         [ 3, -1, -1]],
        [[ 4, -1, -1],
         [ 5,  6,  7]],
        [[ 8,  9, -1],
         [10, -1, -1]]])

classmethod from_tensor_mat(tensor_mat: list[list[torch.Tensor]]) → MultiNestedTensor[source]

Construct a MultiNestedTensor object from tensor_mat.

Parameters:

tensor_mat (List[List[Tensor]]) – A matrix of torch.Tensor objects. tensor_mat[i][j] is the 1-dimensional torch.Tensor for the i-th row and j-th column; its length may differ from cell to cell.

Returns:

A MultiNestedTensor instance.

Return type:

MultiNestedTensor

fillna_col(col_index: int, fill_value: int | float | Tensor) → None[source]

Replace the NaN values in the col_index-th column with fill_value, in place.

Parameters:
  • col_index (int) – A column index of the tensor to select.

  • fill_value (Union[int, float, Tensor]) – Scalar value used to replace NaNs.
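
As a rough illustration of filling a single column in the flattened layout, the sketch below walks the cells of one column and replaces NaNs in place. It assumes float data where missing entries are NaN, uses plain Python lists instead of tensors, and fillna_col_flat is a hypothetical name; the library's actual handling of missing values may differ.

```python
import math

# Hypothetical helper operating on plain lists, not the torch_frame method.
def fillna_col_flat(values, offset, num_rows, num_cols, col_index, fill_value):
    """Replace NaNs in every cell of column col_index, modifying values in place."""
    for i in range(num_rows):
        k = i * num_cols + col_index  # row-major position of cell (i, col_index)
        for p in range(offset[k], offset[k + 1]):
            if math.isnan(values[p]):
                values[p] = fill_value


# 2 rows x 2 cols: (0,0)=[1, nan], (0,1)=[3], (1,0)=[nan], (1,1)=[5]
values = [1.0, float("nan"), 3.0, float("nan"), 5.0]
offset = [0, 2, 3, 4, 5]
fillna_col_flat(values, offset, 2, 2, 0, 0.0)
print(values)  # [1.0, 0.0, 3.0, 0.0, 5.0]
```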

to_dense(fill_value: int | float) → Tensor[source]

Map MultiNestedTensor into dense Tensor representation with padding.

Parameters:

fill_value (Union[int, float]) – Value used to pad cells that are shorter than max_length.

Returns:

Padded PyTorch Tensor object with shape (num_rows, num_cols, max_length).

Return type:

Tensor
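
The padding step can be sketched as follows, again with plain Python lists in place of tensors. pad_to_dense is a hypothetical name, and max_length is assumed to be the length of the longest cell, which is consistent with the example output above; this is an illustration, not the library's code.

```python
# Hypothetical helper operating on plain lists, not the torch_frame method.
def pad_to_dense(values, offset, num_rows, num_cols, fill_value):
    """Return a nested list of shape (num_rows, num_cols, max_length)."""
    num_cells = num_rows * num_cols
    lengths = [offset[k + 1] - offset[k] for k in range(num_cells)]
    max_length = max(lengths, default=0)
    dense = []
    for i in range(num_rows):
        row = []
        for j in range(num_cols):
            k = i * num_cols + j
            cell = values[offset[k]:offset[k + 1]]
            # Right-pad each cell up to the longest cell length.
            row.append(cell + [fill_value] * (max_length - len(cell)))
        dense.append(row)
    return dense


values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
offset = [0, 2, 3, 4, 7, 9, 10]
dense = pad_to_dense(values, offset, 3, 2, -1)
print(dense[0])  # [[1, 2, -1], [3, -1, -1]]
print(dense[1])  # [[4, -1, -1], [5, 6, 7]]
```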

static cat(xs: Sequence[MultiNestedTensor], dim: int = 0) → MultiNestedTensor[source]

Concatenates a sequence of MultiNestedTensor along the specified dimension.

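Parameters:
  • xs (Sequence[MultiNestedTensor]) – The sequence of MultiNestedTensor objects to concatenate.

  • dim (int) – The dimension along which to concatenate. (default: 0)

Returns: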

Concatenated multi nested tensor.

Return type:

MultiNestedTensor
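
For intuition, row-wise concatenation (dim=0) in the flattened layout can be sketched as appending the values and shifting the second operand's offsets. This is a simplified illustration with plain Python lists and a hypothetical cat_rows helper; it covers only two inputs along rows and is not the library's implementation. Concatenating along columns would require interleaving cells and is not shown.

```python
# Hypothetical helper; each input is a (num_rows, num_cols, values, offset) tuple.
def cat_rows(a, b):
    """Concatenate two flattened nested matrices along the row dimension."""
    a_rows, a_cols, a_values, a_offset = a
    b_rows, b_cols, b_values, b_offset = b
    if a_cols != b_cols:
        raise ValueError("num_cols must match when concatenating along rows")
    shift = a_offset[-1]  # total number of values stored in a
    # Drop b's leading 0 and shift the rest so they index into the joined values.
    offset = a_offset + [o + shift for o in b_offset[1:]]
    return a_rows + b_rows, a_cols, a_values + b_values, offset


a = (1, 2, [1, 2, 3], [0, 2, 3])                        # row 0: [1, 2], [3]
b = (2, 2, [4, 5, 6, 7, 8, 9, 10], [0, 1, 4, 6, 7])     # rows 1 and 2
num_rows, num_cols, values, offset = cat_rows(a, b)
print(num_rows, num_cols)  # 3 2
print(offset)  # [0, 2, 3, 4, 7, 9, 10]
```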