Column Usage

VegaFusion provides a function that inspects a Vega specification and determines which columns are referenced from each root dataset. A root dataset is one defined at the top level of the spec that includes a url or values property. This is useful when it is more efficient to minimize the number of columns provided to the Vega specification. For example, the Python library uses this function to determine which input DataFrame columns to keep before converting to Arrow.

When VegaFusion cannot precisely determine which columns are referenced from each root dataset, this function returns None or null for the corresponding dataset.
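To make the root-dataset definition above concrete, here is a minimal sketch that applies it to a small spec written as a Python dict. The helper `root_dataset_names` and the example spec are hypothetical and only illustrate the definition; this is not how VegaFusion performs the analysis internally.

```python
from typing import Any


def root_dataset_names(spec: dict[str, Any]) -> list[str]:
    """Return names of top-level datasets that define `url` or `values`.

    Hypothetical helper that illustrates the definition of a root dataset;
    it is not part of the VegaFusion API.
    """
    return [
        dataset["name"]
        for dataset in spec.get("data", [])
        if "url" in dataset or "values" in dataset
    ]


# Illustrative spec: two root datasets and one derived dataset.
spec = {
    "data": [
        {"name": "movies", "url": "data/movies.json"},
        {"name": "inline", "values": [{"a": 1, "b": 2}]},
        {
            # Derived from "movies" via `source`, so it is not a root dataset.
            "name": "filtered",
            "source": "movies",
            "transform": [{"type": "filter", "expr": "datum.Rating > 5"}],
        },
    ],
}

roots = root_dataset_names(spec)
```

Here `roots` contains "movies" and "inline" only, since "filtered" has neither a url nor a values property.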

Python

vegafusion.get_column_usage(spec: dict[str, Any]) -> dict[str, list[str] | None]

Compute the columns from each root dataset that are referenced in a Vega spec.

Parameters:

spec – Vega spec

Returns:

Dict from root-level dataset name to either

  • A list of columns that are referenced in this dataset if this can be determined precisely

  • None if it was not possible to determine the full set of columns that are referenced from this dataset

Return type:

dict[str, list[str] | None]

See column_usage.py for a complete example.
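As a rough illustration of how the returned mapping might be consumed, the sketch below prunes unused columns from in-memory datasets. The `usage` dict is hand-written to match the documented return type and is not actual output of `get_column_usage`; the `prune_rows` helper and the dataset contents are assumptions made for the example.

```python
from typing import Any, Optional

Row = dict[str, Any]


def prune_rows(rows: list[Row], columns: Optional[list[str]]) -> list[Row]:
    """Keep only `columns` in each row.

    Hypothetical helper. `None` means the referenced columns could not be
    determined precisely, so every column is kept.
    """
    if columns is None:
        return rows
    keep = set(columns)
    return [{k: v for k, v in row.items() if k in keep} for row in rows]


# Stand-in for the result of vegafusion.get_column_usage(spec);
# written by hand here, not produced by the library.
usage: dict[str, Optional[list[str]]] = {
    "movies": ["Title", "Rating"],
    "stocks": None,
}

# Made-up data for the two root datasets.
datasets: dict[str, list[Row]] = {
    "movies": [{"Title": "A", "Rating": 7.1, "Budget": 100}],
    "stocks": [{"symbol": "X", "price": 1.0}],
}

# Datasets missing from `usage` are treated like None and left untouched.
pruned = {
    name: prune_rows(rows, usage.get(name))
    for name, rows in datasets.items()
}
```

Treating `None` as "keep everything" is the conservative choice: when usage cannot be determined, dropping any column could break the spec.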

Rust

See column_usage.rs for a complete example.

JavaScript

See the Editor Demo for example usage of the getColumnUsage function in the vegafusion-wasm package.