VegaFusion provides serverside scaling for the Vega visualization library. While not limited to Python, an initial application of VegaFusion is the scaling of the Altair Python interface to Vega-Lite.
As of version 1.0, VegaFusion is released under the same license as Vega, Vega-Lite, and Altair: BSD-3.
For more info on the future direction of the project, see the Roadmap.
Quickstart 1: Overcome MaxRowsError
The VegaFusion mime renderer can be used to overcome the Altair MaxRowsError
by performing data-intensive aggregations on the server and pruning unused columns from the source dataset. First install the vegafusion
Python package with the embed
extras enabled
pip install "vegafusion[embed]"
Then open a Jupyter notebook (either the classic notebook or a notebook inside JupyterLab), and create an Altair histogram of a 1 million row flights dataset
import pandas as pd
import altair as alt
flights = pd.read_parquet(
"https://vegafusion-datasets.s3.amazonaws.com/vega/flights_1m.parquet"
)
delay_hist = alt.Chart(flights).mark_bar().encode(
alt.X("delay", bin=alt.Bin(maxbins=30)),
alt.Y("count()")
)
delay_hist
---------------------------------------------------------------------------
MaxRowsError Traceback (most recent call last)
...
MaxRowsError: The number of rows in your dataset is greater than the maximum allowed (5000). For information on how to plot larger datasets in Altair, see the documentation
This results in an Altair MaxRowsError
, as by default Altair is configured to allow no more than 5,000 rows of data to be sent to the browser. This is a safety measure to avoid crashing the user’s browser. The VegaFusion mime renderer can be used to overcome this limitation by performing data intensive transforms (e.g. filtering, binning, aggregation, etc.) in the Python kernel before the resulting data is sent to the web browser.
Run these two lines to import and enable the VegaFusion mime renderer
import vegafusion as vf
vf.enable()
Now the chart displays quickly without errors
delay_hist
Quickstart 2: Extract transformed data
By default, data transforms in an Altair chart (e.g. filtering, binning, aggregation, etc.) are performed by the Vega JavaScript library running in the browser. This has the advantage of making the charts produced by Altair fully standalone, not requiring access to a running Python kernel to render properly. But it has the disadvantage of making it difficult to access the transformed data (e.g. the histogram bin edges and count values) from Python. Since VegaFusion evaluates these transforms in the Python kernel, it’s possible to access then from Python using the vegafusion.transformed_data()
function.
For example, the following code demonstrates how to access the histogram bin edges and counts for the example above:
import pandas as pd
import altair as alt
import vegafusion as vf
flights = pd.read_parquet(
"https://vegafusion-datasets.s3.amazonaws.com/vega/flights_1m.parquet"
)
delay_hist = alt.Chart(flights).mark_bar().encode(
alt.X("delay", bin=alt.Bin(maxbins=30)),
alt.Y("count()")
)
vf.transformed_data(delay_hist)
bin_maxbins_30_delay |
bin_maxbins_30_delay_end |
__count |
|
---|---|---|---|
0 |
-20 |
0 |
419400 |
1 |
80 |
100 |
11000 |
2 |
0 |
20 |
392700 |
3 |
40 |
60 |
38400 |
4 |
60 |
80 |
21800 |
5 |
20 |
40 |
92700 |
6 |
100 |
120 |
5300 |
7 |
-40 |
-20 |
9900 |
8 |
120 |
140 |
3300 |
9 |
140 |
160 |
2000 |
10 |
160 |
180 |
1800 |
11 |
320 |
340 |
100 |
12 |
180 |
200 |
900 |
13 |
240 |
260 |
100 |
14 |
-60 |
-40 |
100 |
15 |
260 |
280 |
100 |
16 |
200 |
220 |
300 |
17 |
360 |
380 |
100 |
Quickstart 3: Accelerate interactive charts
While the VegaFusion mime renderer works great for non-interactive Altair charts, it’s not as well suited for interactive charts visualizing large datasets. This is because the mime renderer does not maintain a live connection between the browser and the python kernel, so all the data that participates in an interaction must be sent to the browser.
To address this situation, VegaFusion provides a Jupyter Widget based renderer that does maintain a live connection between the chart in the browser and the Python kernel. In this configuration, selection operations (e.g. filtering to the extents of a brush selection) can be evaluated interactively in the Python kernel, which eliminates the need to transfer the full dataset to the client in order to maintain interactivity.
The VegaFusion widget renderer is provided by the vegafusion-jupyter
package.
pip install "vegafusion-jupyter[embed]"
Instead of enabling the mime render with vf.enable()
, the widget renderer is enabled with vf.enable_widget()
. Here is a full example that uses the widget renderer to display an interactive Altair chart that implements linked histogram brushing for a 1 million row flights dataset.
import pandas as pd
import altair as alt
import vegafusion as vf
vf.enable_widget()
flights = pd.read_parquet(
"https://vegafusion-datasets.s3.amazonaws.com/vega/flights_1m.parquet"
)
brush = alt.selection_interval(encodings=['x'])
# Define the base chart, with the common parts of the
# background and highlights
base = alt.Chart().mark_bar().encode(
x=alt.X(alt.repeat('column')).bin(maxbins=20),
y='count()'
).properties(
width=160,
height=130
)
# gray background with selection
background = base.encode(
color=alt.value('#ddd')
).add_params(brush)
# blue highlights on the selected data
highlight = base.transform_filter(brush)
# layer the two charts & repeat
chart = alt.layer(
background,
highlight,
data=flights
).transform_calculate(
"time",
"hours(datum.date)"
).repeat(column=["distance", "delay", "time"])
chart
Histogram binning, aggregation, and selection filtering are now evaluated in the Python kernel process with efficient parallelization, and only the aggregated data (one row per histogram bar) is sent to the browser.
You can see that the VegaFusion widget renderer maintains a live connection to the Python kernel by noticing that the Python kernel is running as the selection region is created or moved. You can also notice the VegaFusion logo in the dropdown menu button.
Stewardship
The VegaFusion project was created by Jon Mease and is now stewarded by Hex Technologies, which uses VegaFusion in production to accelerate its Vega-Lite powered chart editor. Hex is committed to supporting VegaFusion’s ongoing development and is excited to collaborate with the community to make VegaFusion useful throughout the Vega ecosystem.
Recent Posts
2023-08-21 - Announcing VegaFusion 1.4
2023-06-10 - Announcing VegaFusion 1.3
2023-04-12 - Announcing VegaFusion 1.2
2023-03-25 - Announcing VegaFusion 1.1
2023-01-21 - Announcing VegaFusion 1.0
2022-01-27 - Announcing VegaFusion 0.1
2022-01-19 - Announcing VegaFusion 0.0.1