Announcing VegaFusion 1.2
Pre-transform and Save Altair Charts
By: Jon Mease
The VegaFusion team is happy to announce the release of version 1.2. Along with usual bug fixes and updates to the core Arrow and DataFusion dependencies, this release includes support for saving pre-transformed Altair charts to HTML, JSON, PNG, and SVG files.
Standard Altair save
Altair Charts provide a
Chart.save() method that may be used to save Altair charts to HTML, JSON, or static image (PNG or SVG) files. When the chart references a pandas DataFrame, the full DataFrame is serialized to JSON and included in the chart specification that is saved. As dataset sizes get larger, the time it takes to save these files increases rapidly, and for the case of HTML and JSON formats, the file size increases rapidly as well.
VegaFusion’s new save functions improve the situation by pre-applying data transformations and removing unused columns before inlining the resulting data in the chart specification for saving.
HTML performance comparison
Here are two examples of the benefits of the new
save_html function compared to the standard Altair
Let’s save a one million row histogram chart to an HTML file. First create and display the histogram with the VegaFusion Mime Renderer enabled.
import pandas as pd import altair as alt import vegafusion as vf vf.enable() flights = pd.read_parquet( "https://vegafusion-datasets.s3.amazonaws.com/vega/flights_1m.parquet" ) delay_hist = alt.Chart(flights).mark_bar().encode( alt.X("delay", bin=alt.Bin(maxbins=30)), alt.Y("count()") ) delay_hist
Next, save the chart to an HTML file with the standard Altair
CPU times: user 5.72 s, sys: 388 ms, total: 6.11 s Wall time: 6.18 s
This results in a ~116MB file, as all columns from the entire million row dataset are included in the HTML file. It also takes over 6 seconds to perform the save. Now, use VegaFusion’s
%%time vf.save_html(delay_hist, "delay_hist_vf.html")
CPU times: user 227 ms, sys: 119 ms, total: 347 ms Wall time: 427 ms
The resulting file is now only ~5KB, because only ~30 rows have been included (one per histogram bin). It also takes under half a second to perform the save.
The benefits of VegaFusion’s new save functions are most dramatic for charts that use aggregations. Even so, VegaFusion’s ability to remove unused columns still results in smaller file sizes for unaggregated charts, especially when the input datasets have many unused columns.
Here’s an example scatter chart using the movies dataset from the vega-datasets package, which has 3201 rows and 16 columns.
import vegafusion as vf import altair as alt from vega_datasets import data source = data.movies() scatter_chart = alt.Chart(source).mark_point().encode( alt.X("IMDB_Rating:Q"), alt.Y("Rotten_Tomatoes_Rating:Q"), ) scatter_chart
Save the chart to an HTML file with Altair’s standard
CPU times: user 69.7 ms, sys: 14.7 ms, total: 84.4 ms Wall time: 85 ms
This results in a 1.4MB file and takes ~80ms.
Now, save the file with VegaFusion’s
%%time vf.save_html(scatter_chart, "scatter_vf.html")
CPU times: user 59.8 ms, sys: 15.2 ms, total: 75 ms Wall time: 57.1 ms
This results in a 125K file and takes under 60ms. The file is more than 10x smaller because it only includes data for the two columns that are referenced by the scatter plot.
Updates to Arrow and DataFusion dependencies
VegaFusion 1.2 updates the dependency on arrow-rs to version 36.0 and DataFusion to version 22.0.
Check out these resources if you’d like to learn more: