VegaFusion 1.4#

Improved Vega coverage, external data source foundations, extended architecture support

By: Jon Mease


The VegaFusion team is happy to announce the release of version 1.4. Along with the usual bug fixes and updates to the core Arrow and DataFusion dependencies, this release improves coverage of Vega’s features by supporting q1/q3 aggregation functions and bitwise operators. It also lays important foundations for supporting external data sources and compute engines, and adds additional architectures for pip and conda packages.

Improved Vega coverage#

VegaFusion 1.4 adds support for the q1 and q3 aggregation functions. This makes it possible for VegaFusion to evaluate all the transforms associated with a Vega-Lite boxplot. Here’s a Vega-Altair example:

import altair as alt
from vega_datasets import data
import vegafusion as vf
vf.enable()

source = data.cars()

chart = alt.Chart(source).mark_boxplot(extent="min-max").encode(
    alt.X("Miles_per_Gallon:Q").scale(zero=False),
    alt.Y("Origin:N"),
)
chart

boxplot

An easy way to see that the transforms are supported is to extract the transformed data with vf.transformed_data.

vf.transformed_data(chart)

Origin

lower_box_Miles_per_Gallon

upper_box_Miles_per_Gallon

mid_box_Miles_per_Gallon

lower_whisker_Miles_per_Gallon

upper_whisker_Miles_per_Gallon

0

USA

15

24

18.5

9

39

1

Europe

24

30.65

26.5

16.2

44.3

2

Japan

25.7

34.05

31.6

18

46.6

In addition, the full complement of bitwise operators are now supported in the Vega expression language including |, &, ^, <<, and >>.

External Data Source Foundations#

VegaFusion 1.4 lays some important foundations toward the goal of supporting external data sources and compute engines. The vegafusion.dataset.sql.SqlDataset abstract class defines the interface for implementing SQL data sources in Python for any of VegaFusion’s 10 supported SQL dialects. Implementations for DuckDB and Snowpark are also provided. In addition, the vegafusion.dataset.DataFrameDataset abstract class defines the interface for implementing VegaFusion’s data transformations with external DataFrame libraries. Motivating examples include the future ability to dispatch VegaFusion data transformations to the Ibis and Polars Python libraries.

In the coming release of Vega-Altair 5.1, it will be possible to pass implementations of SqlDataset and DataFrameDataset to Altair Chart objects instead of pandas DataFrames. Stay tuned for more information and examples after the release of Altair 5.1!

Extended Architecture Support#

VegaFusion wheels are now built and published to PyPI for the aarch64 Linux architecture. In addition, conda-forge packages are now published for the Apple Silicon architecture.

Updates to Arrow and DataFusion dependencies#

VegaFusion 1.4 updates the dependency on arrow-rs to version 42.0.0 and DataFusion to version 27.0.0.

Looking ahead#

The coming release of Vega-Altair 5.1 will include first-class integration with VegaFusion to support extracting transformed data from a chart with chart.transformed_data(). It will also include a "vegafusion" data transformer that will cause Altair to use VegaFusion to pre-evaluate data transformations and remove unused columns when saving or displaying charts. The timeline has not been decided on yet, but the plan is to eventually deprecate VegaFusion’s transformed_data and save_* functions and the VegaFusion mime renderer in favor of the integrations built into Altair.

Another near-term focus is on lowering the barrier to contributing to VegaFusion by adopting the Pixi environment manager.

Learn more#

Check out these resources to learn more: