Transformed Data
VegaFusion supports extracting the transformed data for an Altair Chart using the vegafusion.transformed_data() function. This is particularly useful when building a chart that includes a pipeline of transforms, because it makes it possible to inspect the intermediate result of each transform.
Example: Top K
Here is an example, based on the Top-K plot with Others example from the Altair documentation, of how transformed_data() can be helpful when building a complex chart.
First, create an Altair Chart wrapping the data source URL.
import altair as alt
import vegafusion as vf
source = "https://cdn.jsdelivr.net/npm/vega-datasets@v1.29.0/data/movies.json"
chart = alt.Chart(source)
The transformed_data() function can be used on this chart, which has no transforms or encodings yet, to preview the data available at the URL. Here the row_limit argument limits the result to 3 rows, and the DataFrame is transposed to make it easier to read.
vf.transformed_data(chart, row_limit=3).T
|  | 0 | 1 | 2 |
|---|---|---|---|
| Title | The Land Girls | First Love, Last Rites | I Married a Strange Person |
| US_Gross | 146083 | 10876 | 203134 |
| Worldwide_Gross | 146083 | 10876 | 203134 |
| Production_Budget | 8000000 | 300000 | 250000 |
| Release_Date | Jun 12 1998 | Aug 07 1998 | Aug 28 1998 |
| MPAA_Rating | R | R |  |
| Distributor | Gramercy | Strand | Lionsgate |
| IMDB_Rating | 6.1 | 6.9 | 6.8 |
| IMDB_Votes | 1071.0 | 207.0 | 865.0 |
| Major_Genre |  | Drama | Comedy |
| Rotten_Tomatoes_Rating | nan | nan | nan |
| Source |  |  |  |
| Creative_Type |  |  |  |
| Director |  |  |  |
| US_DVD_Sales | nan | nan | nan |
| Running_Time_min | nan | nan | nan |
The first step of making this chart is to compute the average worldwide gross of all the movies for each director. This can be accomplished with the Altair Aggregate Transform.
chart = (
    alt.Chart(source)
    .transform_aggregate(
        aggregate_gross='mean(Worldwide_Gross)',
        groupby=["Director"],
    )
)
vf.transformed_data(chart, row_limit=5)
|  | Director | aggregate_gross |
|---|---|---|
| 0 |  | 3.59284e+07 |
| 1 | Christopher Nolan | 3.44251e+08 |
| 2 | Roman Polanski | 5.13407e+07 |
| 3 | Richard Fleischer | 2.27635e+07 |
| 4 | Blake Edwards | 5e+06 |
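To make the semantics of the Aggregate Transform concrete, here is a minimal pure-Python sketch of what `mean(Worldwide_Gross)` grouped by `Director` computes. The rows are made-up stand-ins for the movies dataset, and `aggregate_mean` is a hypothetical helper, not part of Altair or VegaFusion. Movies with a missing director form a group of their own, which is why the first row of the table has an empty `Director`.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows standing in for the movies dataset (values are made up).
movies = [
    {"Title": "A", "Director": "Christopher Nolan", "Worldwide_Gross": 300.0},
    {"Title": "B", "Director": "Christopher Nolan", "Worldwide_Gross": 400.0},
    {"Title": "C", "Director": None, "Worldwide_Gross": 50.0},
    {"Title": "D", "Director": "Blake Edwards", "Worldwide_Gross": 5.0},
]


def aggregate_mean(rows, field, groupby, as_name):
    """Hypothetical helper: mean of `field` for each distinct `groupby` value.

    A missing (None) director is treated as a group of its own.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[groupby]].append(row[field])
    return [{groupby: key, as_name: mean(values)} for key, values in groups.items()]


aggregated = aggregate_mean(movies, "Worldwide_Gross", "Director", "aggregate_gross")
for row in aggregated:
    print(row)
```

Each output row corresponds to one director, matching the shape of the table above.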
Next, the directors are ranked by average gross in descending order. This can be accomplished with the Altair Window Transform.
chart = (
    alt.Chart(source)
    .transform_aggregate(
        aggregate_gross='mean(Worldwide_Gross)',
        groupby=["Director"],
    ).transform_window(
        rank='row_number()',
        sort=[alt.SortField("aggregate_gross", order="descending")],
    )
)
vf.transformed_data(chart, row_limit=5)
|  | Director | aggregate_gross | rank |
|---|---|---|---|
| 0 | David Yates | 9.37984e+08 | 1 |
| 1 | James Cameron | 8.29781e+08 | 2 |
| 2 | Carlos Saldanha | 7.69293e+08 | 3 |
| 3 | Pete Docter | 7.31305e+08 | 4 |
| 4 | Andrew Stanton | 7.00319e+08 | 5 |
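The window step sorts all rows by `aggregate_gross` in descending order and numbers them 1, 2, 3, and so on. Below is a minimal pure-Python sketch of that `row_number()` behavior. The rows are made up, and `window_row_number` is a hypothetical helper, not a library API. Unlike Vega's `rank()` op, `row_number()` gives distinct consecutive numbers even to tied rows.

```python
# Hypothetical aggregated rows, i.e. the output of the aggregate step (values made up).
aggregated = [
    {"Director": "Blake Edwards", "aggregate_gross": 5.0},
    {"Director": "David Yates", "aggregate_gross": 938.0},
    {"Director": "James Cameron", "aggregate_gross": 830.0},
]


def window_row_number(rows, sort_field, as_name="rank"):
    """Hypothetical helper: sort rows by `sort_field` (descending) and attach
    consecutive 1-based row numbers, mirroring row_number() over the whole data."""
    ordered = sorted(rows, key=lambda row: row[sort_field], reverse=True)
    return [{**row, as_name: number} for number, row in enumerate(ordered, start=1)]


ranked = window_row_number(aggregated, "aggregate_gross")
for row in ranked:
    print(row["rank"], row["Director"])
```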
Then, a new column is added that contains the director’s name for the top 9 ranked directors and “All Others” for the remaining directors. This can be accomplished using the Altair Calculate Transform.
chart = (
    alt.Chart(source)
    .transform_aggregate(
        aggregate_gross='mean(Worldwide_Gross)',
        groupby=["Director"],
    ).transform_window(
        rank='row_number()',
        sort=[alt.SortField("aggregate_gross", order="descending")],
    ).transform_calculate(
        ranked_director="datum.rank < 10 ? datum.Director : 'All Others'"
    )
)
vf.transformed_data(chart, row_limit=12)
|  | Director | aggregate_gross | rank | ranked_director |
|---|---|---|---|---|
| 0 | David Yates | 9.37984e+08 | 1 | David Yates |
| 1 | James Cameron | 8.29781e+08 | 2 | James Cameron |
| 2 | Carlos Saldanha | 7.69293e+08 | 3 | Carlos Saldanha |
| 3 | Pete Docter | 7.31305e+08 | 4 | Pete Docter |
| 4 | Andrew Stanton | 7.00319e+08 | 5 | Andrew Stanton |
| 5 | David Slade | 6.88155e+08 | 6 | David Slade |
| 6 | George Lucas | 6.73577e+08 | 7 | George Lucas |
| 7 | Andrew Adamson | 6.43134e+08 | 8 | Andrew Adamson |
| 8 | Peter Jackson | 5.95566e+08 | 9 | Peter Jackson |
| 9 | Richard Marquand | 5.727e+08 | 10 | All Others |
| 10 | Eric Darnell | 5.66099e+08 | 11 | All Others |
| 11 | Roland Emmerich | 4.5506e+08 | 12 | All Others |
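The Vega expression `datum.rank < 10 ? datum.Director : 'All Others'` is a ternary. It keeps the director's name for ranks 1 through 9 and substitutes the literal label otherwise. A minimal Python equivalent is sketched below, applied to three rows taken from the table around the cutoff. `calculate_ranked_director` is a hypothetical name used only for illustration.

```python
# Rows as they look after the window step (names and ranks from the table above).
ranked = [
    {"Director": "Peter Jackson", "rank": 9},
    {"Director": "Richard Marquand", "rank": 10},
    {"Director": "Eric Darnell", "rank": 11},
]


def calculate_ranked_director(datum, cutoff=10):
    # Hypothetical Python mirror of: datum.rank < 10 ? datum.Director : 'All Others'
    return datum["Director"] if datum["rank"] < cutoff else "All Others"


with_label = [{**datum, "ranked_director": calculate_ranked_director(datum)} for datum in ranked]
for row in with_label:
    print(row["rank"], row["ranked_director"])
```

Changing `cutoff` corresponds to changing the `10` in the Vega expression, which controls how many directors keep their own bar.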
Finally, this dataset is ready to be encoded as a bar mark:
import altair as alt
import vegafusion as vf
vf.enable()
source = "https://cdn.jsdelivr.net/npm/vega-datasets@v1.29.0/data/movies.json"
chart = (
    alt.Chart(source)
    .transform_aggregate(
        aggregate_gross='mean(Worldwide_Gross)',
        groupby=["Director"],
    ).transform_window(
        rank='row_number()',
        sort=[alt.SortField("aggregate_gross", order="descending")],
    ).transform_calculate(
        ranked_director="datum.rank < 10 ? datum.Director : 'All Others'"
    ).mark_bar().encode(
        x=alt.X("aggregate_gross:Q", aggregate="mean", title=None),
        y=alt.Y(
            "ranked_director:N",
            sort=alt.Sort(op="mean", field="aggregate_gross", order="descending"),
            title=None,
        ),
    )
)
chart
The exact value of each bar can be accessed by applying transformed_data() to the final chart (which includes the implicit transforms in the bar mark encoding).
vf.transformed_data(chart)
|  | ranked_director | mean_aggregate_gross |
|---|---|---|
| 0 | David Yates | 9.37984e+08 |
| 1 | James Cameron | 8.29781e+08 |
| 2 | Carlos Saldanha | 7.69293e+08 |
| 3 | Pete Docter | 7.31305e+08 |
| 4 | Andrew Stanton | 7.00319e+08 |
| 5 | David Slade | 6.88155e+08 |
| 6 | George Lucas | 6.73577e+08 |
| 7 | Andrew Adamson | 6.43134e+08 |
| 8 | Peter Jackson | 5.95566e+08 |
| 9 | All Others | 8.87602e+07 |
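The encoding applies `mean` to the already aggregated `aggregate_gross` column, grouped by `ranked_director`. As a result, the "All Others" bar is the unweighted mean of the per-director averages, where each director counts once regardless of how many movies they made. It is not the average gross over all of those directors' movies. A minimal stdlib sketch of that implicit aggregation is shown below. The values are made up, and `encode_mean` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-director averages after the calculate step (values made up).
rows = [
    {"ranked_director": "Peter Jackson", "aggregate_gross": 600.0},
    {"ranked_director": "All Others", "aggregate_gross": 570.0},
    {"ranked_director": "All Others", "aggregate_gross": 30.0},
]


def encode_mean(rows, field, groupby):
    """Hypothetical helper: mean of `field` per `groupby` value, as the x encoding does."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[groupby]].append(row[field])
    return {key: mean(values) for key, values in groups.items()}


bars = encode_mean(rows, "aggregate_gross", "ranked_director")
print(bars)  # {'Peter Jackson': 600.0, 'All Others': 300.0}
```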
Datetime Timezone
Datetime columns will be returned in the local timezone returned by the vegafusion.get_local_tz() function. If not overridden using vegafusion.set_local_tz(), this will be the local timezone of the Python kernel.
For example:
import vegafusion as vf
import altair as alt
from vega_datasets import data
# Manually set the timezone to Seattle's, since this is a Seattle
# weather dataset
vf.set_local_tz("America/Los_Angeles")
source = data.seattle_weather()
chart = alt.Chart(source).mark_bar(
    cornerRadiusTopLeft=3,
    cornerRadiusTopRight=3
).encode(
    x='month(date):O',
    y='count():Q',
    color='weather:N'
)
chart
tx_df = vf.transformed_data(chart, row_limit=5)
tx_df
|  | weather | month_date | __count | __count_start | __count_end |
|---|---|---|---|---|---|
| 0 | drizzle | 2012-01-01 00:00:00-08:00 | 10 | 114 | 124 |
| 1 | rain | 2012-01-01 00:00:00-08:00 | 35 | 41 | 76 |
| 2 | sun | 2012-01-01 00:00:00-08:00 | 33 | 0 | 33 |
| 3 | snow | 2012-01-01 00:00:00-08:00 | 8 | 33 | 41 |
| 4 | rain | 2012-02-01 00:00:00-08:00 | 40 | 33 | 73 |
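The `__count_start` and `__count_end` columns come from the implicit stacking of the bar segments. Each segment starts where the previous one in the same month ends. The sketch below reproduces the January values for `sun`, `snow`, and `rain` from the table. `stack_offsets` is a hypothetical helper. The descending-by-name stacking order is inferred from these rows, not taken from the Vega-Lite documentation.

```python
def stack_offsets(counts, order):
    """Hypothetical helper: cumulative (start, end) offsets for one stacked bar."""
    start = 0
    result = {}
    for key in order:
        end = start + counts[key]
        result[key] = (start, end)
        start = end
    return result


# January counts for three weather types, taken from the table above.
january = {"sun": 33, "snow": 8, "rain": 35}

# The offsets in the table are consistent with stacking in descending
# order of the weather name: sun, then snow, then rain.
offsets = stack_offsets(january, sorted(january, reverse=True))
print(offsets)  # {'sun': (0, 33), 'snow': (33, 41), 'rain': (41, 76)}
```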
tx_df.dtypes
weather object
month_date datetime64[ns, America/Los_Angeles]
__count int64
__count_start int64
__count_end int64
dtype: object
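In this output, each month bin starts at midnight in America/Los_Angeles, which is UTC-8 in January. If the same instants are needed in UTC, the offset can be applied with the standard library. The sketch below uses a value copied from the table above. For a pandas column such as `tx_df["month_date"]`, `Series.dt.tz_convert("UTC")` performs the equivalent conversion.

```python
from datetime import datetime, timezone

# A month_date value as displayed in the table above
# (America/Los_Angeles is UTC-8 in January).
local_value = datetime.fromisoformat("2012-01-01 00:00:00-08:00")

# Convert the same instant to UTC.
utc_value = local_value.astimezone(timezone.utc)
print(utc_value)  # 2012-01-01 08:00:00+00:00
```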
Supported Transforms
Here is the current set of supported Vega-Lite/Vega transforms:
Unsupported Transforms
VegaFusion’s coverage of Vega transforms is not complete, but it is growing with each release. If a chart makes use of a transform that is not yet supported, an error will be raised by the transformed_data() function.
Note: Charts with unsupported transforms will still render properly using the mime and widget renderers, because these transforms are pushed to the client for evaluation by the Vega JavaScript library.