Transform Data

VegaFusion can evaluate the datasets in a Vega spec and return them as Arrow tables or DataFrames. This is the foundation for Vega-Altair’s chart.transformed_data method.

Python

VegaFusionRuntime.pre_transform_datasets(spec: dict[str, Any] | str, datasets: list[str | tuple[str, list[int]]], local_tz: str | None = None, default_input_tz: str | None = None, row_limit: int | None = None, inline_datasets: dict[str, Any] | None = None, trim_unused_columns: bool = False, dataset_format: Literal['auto', 'polars', 'pandas', 'pyarrow', 'arro3'] = 'auto') → tuple[list[Any], list[PreTransformWarning]]

Extract the fully evaluated form of the requested datasets from a Vega specification.

Parameters:
  • spec – A Vega specification dict or JSON string.

  • datasets

    A list with elements that are either:

    • The name of a top-level dataset as a string

    • A two-element tuple where the first element is the name of a dataset as a string and the second element is the nested scope of the dataset as a list of integers

  • local_tz – Name of timezone to be considered local. E.g. 'America/New_York'. Defaults to the value of vf.get_local_tz(), which defaults to the system timezone if one can be determined.

  • default_input_tz – Name of timezone (e.g. 'America/New_York') that naive datetime strings should be interpreted in. Defaults to local_tz.

  • row_limit – Maximum number of dataset rows to include in the returned datasets. If exceeded, datasets will be truncated to this number of rows and a RowLimitExceeded warning will be included in the resulting warnings list.

  • inline_datasets – A dict from dataset names to pandas DataFrames or pyarrow Tables. Inline datasets may be referenced by the input specification using either of the following URL forms: 'vegafusion+dataset://{dataset_name}' or 'table://{dataset_name}'.

  • trim_unused_columns – If True, unused columns are removed from returned datasets.

  • dataset_format

    Format for returned datasets. One of:

    • "auto": (default) Infer the result type based on the types of inline datasets. If no inline datasets are provided, return type will depend on installed packages.

    • "polars": polars.DataFrame

    • "pandas": pandas.DataFrame

    • "pyarrow": pyarrow.Table

    • "arro3": arro3.Table
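
The datasets and inline_datasets parameters can be sketched together as follows. This is a minimal illustration under stated assumptions rather than an excerpt from the VegaFusion examples: the dataset names (sales, source, totals, cell_data), the helper functions, and the scope [0] are hypothetical, and it assumes that the runtime is reachable as vf.runtime after import vegafusion as vf and that pandas is installed.

```python
from typing import Any


def inline_dataset_url(name: str) -> str:
    """Return the URL a spec uses to reference an inline dataset by name."""
    return f"vegafusion+dataset://{name}"


def build_spec(inline_name: str) -> dict[str, Any]:
    """Build a small Vega spec whose source data comes from an inline dataset."""
    return {
        "$schema": "https://vega.github.io/schema/vega/v5.json",
        "data": [
            # Resolved from the inline_datasets dict rather than from a file.
            {"name": "source", "url": inline_dataset_url(inline_name)},
            {
                "name": "totals",
                "source": "source",
                "transform": [
                    {
                        "type": "aggregate",
                        "groupby": ["category"],
                        "ops": ["sum"],
                        "fields": ["amount"],
                        "as": ["total"],
                    }
                ],
            },
        ],
    }


# The two element forms accepted by the `datasets` argument:
top_level = "totals"          # a top-level dataset, requested by name
nested = ("cell_data", [0])   # hypothetical nested dataset: (name, scope)


def evaluate_totals() -> tuple[list[Any], list[Any]]:
    """Evaluate `totals`; assumes vegafusion and pandas are installed."""
    import pandas as pd
    import vegafusion as vf  # assumption: vf.runtime is a VegaFusionRuntime

    sales = pd.DataFrame({"category": ["a", "a", "b"], "amount": [1, 2, 5]})
    return vf.runtime.pre_transform_datasets(
        build_spec("sales"),
        [top_level],
        inline_datasets={"sales": sales},  # key matches the name in the URL
    )
```

Only top_level is passed in the call, because this spec has no nested datasets. The nested tuple is shown purely to illustrate the shape of a scoped request.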

Returns:

Two-element tuple of

  • List of evaluated datasets corresponding to the input datasets list, each in the format selected by dataset_format

  • A list of warnings as dictionaries. Each warning dict has a 'type' key indicating the warning type, and a 'message' key containing a description of the warning.

Return type:

tuple[list[DataFrameLike], list[PreTransformWarning]]

Example: See pre_transform_data.py for a complete example.

Rust

The Rust API provides a slightly more general pre_transform_values method that can extract dataset or signal values.

See pre_transform_data.rs for a complete example of extracting dataset values as Arrow tables.