Transform Extract

The pre_transform_extract method generates a transformed spec like the pre_transform_spec method, but instead of inlining the transformed datasets in the spec, it returns them separately in arrow table format. This is useful when the transformed datasets are large and can be transmitted more efficiently in arrow format than as inline JSON.

Python

VegaFusionRuntime.pre_transform_extract(spec: dict[str, Any] | str, local_tz: str | None = None, default_input_tz: str | None = None, preserve_interactivity: bool = True, extract_threshold: int = 20, extracted_format: str = 'arro3', inline_datasets: dict[str, DataFrameLike] | None = None, keep_signals: list[str | tuple[str, list[int]]] | None = None, keep_datasets: list[str | tuple[str, list[int]]] | None = None) → tuple[dict[str, Any], list[tuple[str, list[int], pa.Table]], list[PreTransformWarning]]

Evaluate supported transforms in an input Vega specification.

Produces a new specification in which small pre-transformed datasets (fewer than extract_threshold rows) are included inline and larger pre-transformed datasets (extract_threshold rows or more) are extracted into arrow tables.
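
The threshold rule can be illustrated with a minimal sketch in plain Python. This is not VegaFusion's implementation: partition_by_threshold and the list-of-dicts row representation are hypothetical, chosen only to show which side of the boundary a dataset with exactly extract_threshold rows lands on.

```python
from typing import Any

Rows = list[dict[str, Any]]


def partition_by_threshold(
    datasets: dict[str, Rows], extract_threshold: int = 20
) -> tuple[dict[str, Rows], dict[str, Rows]]:
    """Split datasets into (inlined, extracted) using the documented rule.

    Hypothetical helper for illustration only: datasets with fewer than
    extract_threshold rows stay inline; all others are extracted.
    """
    inlined: dict[str, Rows] = {}
    extracted: dict[str, Rows] = {}
    for name, rows in datasets.items():
        if len(rows) < extract_threshold:
            inlined[name] = rows
        else:
            extracted[name] = rows
    return inlined, extracted


datasets = {
    "small": [{"x": i} for i in range(19)],
    "boundary": [{"x": i} for i in range(20)],
    "large": [{"x": i} for i in range(1000)],
}
inlined, extracted = partition_by_threshold(datasets, extract_threshold=20)
print(sorted(inlined))    # names of datasets kept inline
print(sorted(extracted))  # names of datasets returned separately
```

With the default threshold of 20, a 19-row dataset stays inline, while a dataset with exactly 20 rows is extracted.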

Parameters:
  • spec – A Vega specification dict or JSON string.

  • local_tz – Name of timezone to be considered local. E.g. ‘America/New_York’. Defaults to the value of vf.get_local_tz(), which defaults to the system timezone if one can be determined.

  • default_input_tz – Name of timezone (e.g. ‘America/New_York’) that naive datetime strings should be interpreted in. Defaults to local_tz.

  • preserve_interactivity – If True (default) then the interactive behavior of the chart will be preserved. This requires that all the data that participates in interactions be included in the resulting spec rather than being pre-transformed. If False, then all possible data transformations are applied even if they break the original interactive behavior of the chart.

  • extract_threshold – Datasets with fewer than extract_threshold rows will be inlined; datasets with extract_threshold rows or more will be extracted. Defaults to 20.

  • extracted_format

    The format for the extracted datasets. Options are:

    • "arro3": (default) arro3.Table

    • "pyarrow": pyarrow.Table

    • "arrow-ipc": bytes in arrow IPC format

    • "arrow-ipc-base64": base64 encoded arrow IPC format

  • inline_datasets – A dict from dataset names to pandas DataFrames or pyarrow Tables. Inline datasets may be referenced by the input specification using the following url syntax ‘vegafusion+dataset://{dataset_name}’ or ‘table://{dataset_name}’.

  • keep_signals

    Signals from the input spec that must be included in the pre-transformed spec, even if they are no longer referenced. A list with elements that are either:

    • The name of a top-level signal as a string

    • A two-element tuple where the first element is the name of a signal as a string and the second element is the nested scope of the signal as a list of integers

  • keep_datasets

    Datasets from the input spec that must be included in the pre-transformed spec even if they are no longer referenced. A list with elements that are either:

    • The name of a top-level dataset as a string

    • A two-element tuple where the first element is the name of a dataset as a string and the second element is the nested scope of the dataset as a list of integers
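
As a hedged sketch of how these parameters fit together, the following builds a small spec that references an inline dataset by URL and passes it to pre_transform_extract. The helpers dataset_url, build_spec, and extract, the dataset name "source", and the filter expression are hypothetical. The sketch also assumes the method is reached through the module-level vf.runtime object; vegafusion is imported lazily so the helpers can be defined without it installed.

```python
from typing import Any


def dataset_url(name: str) -> str:
    """Build the URL a spec uses to reference an entry of inline_datasets."""
    return f"vegafusion+dataset://{name}"


def build_spec(dataset_name: str) -> dict[str, Any]:
    """Hypothetical minimal Vega spec with one dataset and one transform."""
    return {
        "$schema": "https://vega.github.io/schema/vega/v5.json",
        "data": [
            {
                "name": "source",
                "url": dataset_url(dataset_name),
                "transform": [{"type": "filter", "expr": "datum.value > 2"}],
            }
        ],
    }


def extract(spec: dict[str, Any], inline_datasets: dict[str, Any]):
    """Call pre_transform_extract with the parameters described above."""
    # Imported lazily; assumes vegafusion is installed and exposes vf.runtime.
    import vegafusion as vf

    # A top-level dataset is kept by name. A nested dataset would instead be
    # written as a (name, scope) tuple, e.g. ("nested_name", [0]), where the
    # name is hypothetical.
    keep_datasets = ["source"]

    return vf.runtime.pre_transform_extract(
        spec,
        local_tz="America/New_York",
        preserve_interactivity=True,
        extract_threshold=20,
        extracted_format="pyarrow",
        inline_datasets=inline_datasets,
        keep_datasets=keep_datasets,
    )


spec = build_spec("movies")
print(spec["data"][0]["url"])
```

Given a pandas DataFrame df with a value column, the call would look like extract(build_spec("movies"), {"movies": df}); the key in inline_datasets must match the name used in the URL.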

Returns:

Three-element tuple of

  • The Vega specification as a dict, with small pre-transformed datasets inlined and extracted datasets included but left empty.

  • Extracted datasets as a list of three element tuples
    • dataset name

    • dataset scope list

    • arrow data

  • A list of warnings as dictionaries. Each warning dict has a 'type' key indicating the warning type, and a 'message' key containing a description of the warning. Potential warning types include:

    • 'RowLimitExceeded': Some datasets in the resulting Vega specification have been truncated to the provided row limit

    • 'BrokenInteractivity': Some interactive features may have been broken in the resulting Vega specification

    • 'Unsupported': No transforms in the provided Vega specification were eligible for pre-transforming
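
The following sketch shows one way to consume the three-element tuple described above. It uses stand-in values shaped like the documented return value rather than a real call, and the helpers index_extracted and count_warnings are hypothetical.

```python
from collections import Counter
from typing import Any


def index_extracted(
    extracted: list[tuple[str, list[int], Any]],
) -> dict[tuple[str, tuple[int, ...]], Any]:
    """Index extracted datasets by (name, scope) for direct lookup.

    The scope list is converted to a tuple so it can be part of a dict key.
    """
    return {(name, tuple(scope)): data for name, scope, data in extracted}


def count_warnings(warnings: list[dict[str, str]]) -> Counter:
    """Count warnings by their 'type' key."""
    return Counter(w["type"] for w in warnings)


# Stand-in values shaped like the documented return tuple; a real call
# would place arrow data in the third position of each dataset tuple.
new_spec = {"data": [{"name": "source"}]}
extracted = [
    ("source", [], "placeholder for arrow data"),
    ("binned", [0], "placeholder for nested arrow data"),
]
warnings = [
    {"type": "BrokenInteractivity", "message": "example message"},
]

by_key = index_extracted(extracted)
counts = count_warnings(warnings)

for (name, scope), data in by_key.items():
    print(name, list(scope), data)

if counts["BrokenInteractivity"]:
    print("some interactive features may no longer work")
```

Keying on both name and scope matters because, per the description of keep_datasets, a dataset is identified by its name together with its nested scope.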

Return type:

tuple[dict[str, Any], list[tuple[str, list[int], pa.Table]], list[PreTransformWarning]]
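
The return type above shows pa.Table in the third position of each dataset tuple, but per the extracted_format options that element can also be arrow IPC bytes or a base64 encoding of them. The sketch below normalizes both byte-oriented formats to raw bytes. The helper to_ipc_bytes is hypothetical, and it assumes the base64 payload arrives as a str or bytes value.

```python
import base64


def to_ipc_bytes(payload: "str | bytes", extracted_format: str) -> bytes:
    """Return raw arrow IPC bytes for the two byte-oriented formats.

    Hypothetical helper: assumes "arrow-ipc" payloads are bytes-like and
    "arrow-ipc-base64" payloads are base64 text or bytes.
    """
    if extracted_format == "arrow-ipc":
        return bytes(payload)
    if extracted_format == "arrow-ipc-base64":
        return base64.b64decode(payload)
    raise ValueError(f"not a byte-oriented format: {extracted_format!r}")


# Stand-in payload; a real payload would be arrow IPC data.
raw = b"stand-in for arrow IPC data"
encoded = base64.b64encode(raw).decode("ascii")

assert to_ipc_bytes(raw, "arrow-ipc") == raw
assert to_ipc_bytes(encoded, "arrow-ipc-base64") == raw
```

The resulting bytes can then be handed to an arrow IPC reader in whichever arrow library the receiving side uses.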

Example: See pre_transform_extract.py for a complete example.

Rust

See pre_transform_extract.rs for a complete example.