Transform Data
VegaFusion can be used to evaluate datasets in a Vega spec and return them as Arrow tables or DataFrames. This is the foundation for Vega-Altair’s chart.transformed_data method.
Python
- VegaFusionRuntime.pre_transform_datasets(spec: dict[str, Any] | str, datasets: list[str | tuple[str, list[int]]], local_tz: str | None = None, default_input_tz: str | None = None, row_limit: int | None = None, inline_datasets: dict[str, Any] | None = None, trim_unused_columns: bool = False, dataset_format: Literal['auto', 'polars', 'pandas', 'pyarrow', 'arro3'] = 'auto') -> tuple[list[Any], list[PreTransformWarning]]
Extract the fully evaluated form of the requested datasets from a Vega specification.
- Parameters:
spec – A Vega specification dict or JSON string.
datasets –
A list with elements that are either:
The name of a top-level dataset as a string
A two-element tuple where the first element is the name of a dataset as a string and the second element is the nested scope of the dataset as a list of integers
local_tz – Name of timezone to be considered local, e.g. 'America/New_York'. Defaults to the value of vf.get_local_tz(), which defaults to the system timezone if one can be determined.
default_input_tz – Name of timezone (e.g. 'America/New_York') that naive datetime strings should be interpreted in. Defaults to local_tz.
row_limit – Maximum number of dataset rows to include in the returned datasets. If exceeded, datasets will be truncated to this number of rows and a RowLimitExceeded warning will be included in the resulting warnings list.
inline_datasets – A dict mapping dataset names to pandas DataFrames or pyarrow Tables. Inline datasets may be referenced by the input specification using the URL syntax 'vegafusion+dataset://{dataset_name}' or 'table://{dataset_name}'.
trim_unused_columns – If True, unused columns are removed from returned datasets.
dataset_format –
Format for returned datasets. One of:
"auto": (default) Infer the result type based on the types of inline datasets. If no inline datasets are provided, return type will depend on installed packages.
"polars": polars.DataFrame
"pandas": pandas.DataFrame
"pyarrow": pyarrow.Table
"arro3": arro3.Table
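The two accepted shapes for entries of the datasets parameter (a bare name, or a name paired with a nested scope) can be illustrated with a small standalone sketch. The helper normalize_dataset_refs is hypothetical and not part of VegaFusion; treating a bare name as equivalent to an empty scope is an assumption made here purely for illustration.

```python
from typing import Union

# One entry of the `datasets` argument: either a top-level dataset name,
# or a (name, scope) pair where scope is a list of integers.
DatasetRef = Union[str, tuple[str, list[int]]]


def normalize_dataset_refs(refs: list[DatasetRef]) -> list[tuple[str, list[int]]]:
    """Hypothetical helper: rewrite every entry as a (name, scope) pair.

    Assumption of this sketch: a bare string stands for a top-level
    dataset, represented here with an empty scope.
    """
    normalized: list[tuple[str, list[int]]] = []
    for ref in refs:
        if isinstance(ref, str):
            normalized.append((ref, []))
        else:
            name, scope = ref
            normalized.append((name, list(scope)))
    return normalized


# A mixed list: one top-level dataset and one dataset in a nested scope.
requested = ["source_0", ("facet_data", [0, 1])]
pairs = normalize_dataset_refs(requested)
```

A list such as requested above has the shape that the datasets parameter expects; the normalization step is only there to make the two forms easy to compare.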
- Returns:
Two-element tuple of
List of evaluated datasets corresponding to the input datasets list, in the format selected by dataset_format (for example, pandas DataFrames)
A list of warnings as dictionaries. Each warning dict has a ‘type’ key indicating the warning type, and a ‘message’ key containing a description of the warning.
- Return type:
tuple[list[DataFrameLike], list[PreTransformWarning]]
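Because each returned warning is described as a dict with 'type' and 'message' keys, the warnings list can be inspected with ordinary dict access. The following is a minimal sketch; the helper names are hypothetical, and it assumes that the 'type' value of a row-limit warning is the literal string "RowLimitExceeded".

```python
from typing import Any


def has_row_limit_warning(warnings: list[dict[str, Any]]) -> bool:
    """Return True if any warning reports that row_limit truncated a dataset.

    Assumption: the warning's 'type' value is exactly "RowLimitExceeded".
    """
    return any(w.get("type") == "RowLimitExceeded" for w in warnings)


def describe_warnings(warnings: list[dict[str, Any]]) -> list[str]:
    """Format each warning as 'type: message' for logging or display."""
    return [f"{w['type']}: {w['message']}" for w in warnings]


# Hand-written sample standing in for the second element of the returned tuple.
sample_warnings = [
    {"type": "RowLimitExceeded", "message": "Some datasets were truncated"},
]
truncated = has_row_limit_warning(sample_warnings)
lines = describe_warnings(sample_warnings)
```

Checking for truncation before using the returned datasets avoids silently computing statistics on a partial result when row_limit is set.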
Example: See pre_transform_data.py for a complete example.
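Independently of the linked example file, a call might look like the sketch below. It assumes that vf.runtime exposes a ready-to-use VegaFusionRuntime instance and that pandas is installed; the dataset names movies_raw and top_rated, the rating column, and both function names are hypothetical.

```python
from typing import Any


def build_spec() -> dict[str, Any]:
    """Build a small Vega spec whose only dataset filters an inline dataset."""
    return {
        "$schema": "https://vega.github.io/schema/vega/v5.json",
        "data": [
            {
                "name": "top_rated",
                # References the inline dataset registered as "movies_raw".
                "url": "vegafusion+dataset://movies_raw",
                "transform": [
                    {"type": "filter", "expr": "datum.rating > 7"},
                ],
            }
        ],
    }


def evaluate_top_rated(rows: list[dict[str, Any]]):
    """Evaluate the 'top_rated' dataset and return (DataFrame, warnings)."""
    # Imported inside the function so build_spec() stays usable
    # even where VegaFusion and pandas are not installed.
    import pandas as pd
    import vegafusion as vf

    datasets, warnings = vf.runtime.pre_transform_datasets(
        build_spec(),
        ["top_rated"],  # top-level dataset, requested by name
        inline_datasets={"movies_raw": pd.DataFrame(rows)},
        dataset_format="pandas",
    )
    return datasets[0], warnings


spec = build_spec()
```

Calling evaluate_top_rated with a list of row dicts, such as [{"title": "A", "rating": 8.1}, {"title": "B", "rating": 5.0}], would pass those rows to the runtime as the inline dataset and request the filtered result as a pandas DataFrame.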
Rust
The Rust API provides a slightly more general pre_transform_values method that can extract dataset or signal values.
See pre_transform_data.rs for a complete example of extracting dataset values as Arrow tables.