--- date: 2023-04-12 category: Release author: Jon Mease --- # VegaFusion 1.2 **Pre-transform and Save Altair Charts** By: Jon Mease --- The VegaFusion team is happy to announce the release of version 1.2. Along with usual bug fixes and updates to the core Arrow and DataFusion dependencies, this release includes support for saving pre-transformed Altair charts to HTML, JSON, PNG, and SVG files. ## Standard Altair save Altair Charts provide a [`Chart.save()`](https://altair-viz.github.io/user_guide/saving_charts.html) method that may be used to save Altair charts to HTML, JSON, or static image (PNG or SVG) files. When the chart references a pandas DataFrame, the full DataFrame is serialized to JSON and included in the chart specification that is saved. As dataset sizes get larger, the time it takes to save these files increases rapidly, and for the case of HTML and JSON formats, the file size increases rapidly as well. VegaFusion's new save functions improve the situation by pre-applying data transformations and removing unused columns before inlining the resulting data in the chart specification for saving. ## HTML performance comparison Here are two examples of the benefits of the new `save_html` function compared to the standard Altair `Chart.save` method. ### With aggregations Let's save a one million row histogram chart to an HTML file. First create and display the histogram with the VegaFusion Mime Renderer enabled. ``` import pandas as pd import altair as alt import vegafusion as vf vf.enable() flights = pd.read_parquet( "https://vegafusion-datasets.s3.amazonaws.com/vega/flights_1m.parquet" ) delay_hist = alt.Chart(flights).mark_bar().encode( alt.X("delay", bin=alt.Bin(maxbins=30)), alt.Y("count()") ) delay_hist ``` ![histogram](https://user-images.githubusercontent.com/15064365/230728055-64b05777-9925-4711-b7c4-3e4cc52e1c83.png) Next, save the chart to an HTML file with the standard Altair `Chart.save` method. ```python %%time delay_hist.save("delay_hist_standard.html") ``` ``` CPU times: user 5.72 s, sys: 388 ms, total: 6.11 s Wall time: 6.18 s ``` This results in a ~116MB file, as all columns from the entire million row dataset are included in the HTML file. It also takes over 6 seconds to perform the save. Now, use VegaFusion's `save_html` function. ```python %%time vf.save_html(delay_hist, "delay_hist_vf.html") ``` ``` CPU times: user 227 ms, sys: 119 ms, total: 347 ms Wall time: 427 ms ``` The resulting file is now only ~5KB, because only ~30 rows have been included (one per histogram bin). It also takes under half a second to perform the save. ### Without aggregations The benefits of VegaFusion's new save functions are most dramatic for charts that use aggregations. Even so, VegaFusion's ability to remove unused columns still results in smaller file sizes for unaggregated charts, especially when the input datasets have many unused columns. Here's an example scatter chart using the movies dataset from the vega-datasets package, which has 3201 rows and 16 columns. ```python import vegafusion as vf import altair as alt from vega_datasets import data source = data.movies() scatter_chart = alt.Chart(source).mark_point().encode( alt.X("IMDB_Rating:Q"), alt.Y("Rotten_Tomatoes_Rating:Q"), ) scatter_chart ``` ![movies scatter chart](https://user-images.githubusercontent.com/15064365/230728307-6d8a2c6c-e45e-483b-a207-3abf0c26449b.png) Save the chart to an HTML file with Altair's standard `Chart.save` method. ```python %%time scatter_chart.save("scatter_standard.html") ``` ``` CPU times: user 69.7 ms, sys: 14.7 ms, total: 84.4 ms Wall time: 85 ms ``` This results in a 1.4MB file and takes ~80ms. Now, save the file with VegaFusion's `save_html` function. ```python %%time vf.save_html(scatter_chart, "scatter_vf.html") ``` ``` CPU times: user 59.8 ms, sys: 15.2 ms, total: 75 ms Wall time: 57.1 ms ``` This results in a 125K file and takes under 60ms. The file is more than 10x smaller because it only includes data for the two columns that are referenced by the scatter plot. ## Updates to Arrow and DataFusion dependencies VegaFusion 1.2 updates the dependency on arrow-rs to [version 36.0](https://github.com/apache/arrow-rs/blob/master/CHANGELOG-old.md#3400-2023-02-24) and DataFusion to [version 22.0](https://github.com/apache/arrow-datafusion/blob/main/dev/changelog/22.0.0.md). ## Learn more Check out these resources if you'd like to learn more: - [VegaFusion Documentation](https://vegafusion.io/) - [VegaFusion GitHub](https://github.com/hex-inc/vegafusion) - [Report and Issue](https://github.com/hex-inc/vegafusion/issues) - [Start a Discussions](https://github.com/hex-inc/vegafusion/discussions)