VegaFusion 1.2#
Pre-transform and Save Altair Charts
By: Jon Mease
The VegaFusion team is happy to announce the release of version 1.2. Along with usual bug fixes and updates to the core Arrow and DataFusion dependencies, this release includes support for saving pre-transformed Altair charts to HTML, JSON, PNG, and SVG files.
Standard Altair save#
Altair Charts provide a Chart.save()
method that may be used to save Altair charts to HTML, JSON, or static image (PNG or SVG) files. When the chart references a pandas DataFrame, the full DataFrame is serialized to JSON and included in the chart specification that is saved. As dataset sizes get larger, the time it takes to save these files increases rapidly, and for the case of HTML and JSON formats, the file size increases rapidly as well.
VegaFusion’s new save functions improve the situation by pre-applying data transformations and removing unused columns before inlining the resulting data in the chart specification for saving.
HTML performance comparison#
Here are two examples of the benefits of the new save_html
function compared to the standard Altair Chart.save
method.
With aggregations#
Let’s save a one million row histogram chart to an HTML file. First create and display the histogram with the VegaFusion Mime Renderer enabled.
import pandas as pd
import altair as alt
import vegafusion as vf
vf.enable()
flights = pd.read_parquet(
"https://vegafusion-datasets.s3.amazonaws.com/vega/flights_1m.parquet"
)
delay_hist = alt.Chart(flights).mark_bar().encode(
alt.X("delay", bin=alt.Bin(maxbins=30)),
alt.Y("count()")
)
delay_hist
Next, save the chart to an HTML file with the standard Altair Chart.save
method.
%%time
delay_hist.save("delay_hist_standard.html")
CPU times: user 5.72 s, sys: 388 ms, total: 6.11 s
Wall time: 6.18 s
This results in a ~116MB file, as all columns from the entire million row dataset are included in the HTML file. It also takes over 6 seconds to perform the save. Now, use VegaFusion’s save_html
function.
%%time
vf.save_html(delay_hist, "delay_hist_vf.html")
CPU times: user 227 ms, sys: 119 ms, total: 347 ms
Wall time: 427 ms
The resulting file is now only ~5KB, because only ~30 rows have been included (one per histogram bin). It also takes under half a second to perform the save.
Without aggregations#
The benefits of VegaFusion’s new save functions are most dramatic for charts that use aggregations. Even so, VegaFusion’s ability to remove unused columns still results in smaller file sizes for unaggregated charts, especially when the input datasets have many unused columns.
Here’s an example scatter chart using the movies dataset from the vega-datasets package, which has 3201 rows and 16 columns.
import vegafusion as vf
import altair as alt
from vega_datasets import data
source = data.movies()
scatter_chart = alt.Chart(source).mark_point().encode(
alt.X("IMDB_Rating:Q"),
alt.Y("Rotten_Tomatoes_Rating:Q"),
)
scatter_chart
Save the chart to an HTML file with Altair’s standard Chart.save
method.
%%time
scatter_chart.save("scatter_standard.html")
CPU times: user 69.7 ms, sys: 14.7 ms, total: 84.4 ms
Wall time: 85 ms
This results in a 1.4MB file and takes ~80ms.
Now, save the file with VegaFusion’s save_html
function.
%%time
vf.save_html(scatter_chart, "scatter_vf.html")
CPU times: user 59.8 ms, sys: 15.2 ms, total: 75 ms
Wall time: 57.1 ms
This results in a 125K file and takes under 60ms. The file is more than 10x smaller because it only includes data for the two columns that are referenced by the scatter plot.
Updates to Arrow and DataFusion dependencies#
VegaFusion 1.2 updates the dependency on arrow-rs to version 36.0 and DataFusion to version 22.0.
Learn more#
Check out these resources if you’d like to learn more: