Heuristics for Translating Ggplot2 Code to Plotnine Code
Blog article by Jeroen Janssens.
Dec 13, 2019 • 6 min read.
Because ggplot2 is the de facto package for creating high-quality data
visualizations in R, and has been for a long time, there exist many
excellent resources for learning ggplot2.
Two days ago, I published the tutorial Plotnine: Grammar of Graphics
for Python, which is a translation of the visualization
chapters from “R for Data Science” to Python using plotnine and pandas.
plotnine code is bound to be different from ggplot2 code, due to Python
and R having different syntax and mechanics. Moreover, since plotnine is
still young (but actively being developed) some features are not yet
implemented.
Does that mean we cannot make use of the above-mentioned resources? Of
course not! First of all, the underlying grammar of graphics is still
the same. Secondly, when it comes to the syntax, you can easily
translate 95% of ggplot2 code to plotnine code if you take into account
the heuristics listed below. But first, an example.
Change boolean values, i.e., replace TRUE with True and FALSE
with False.
Replace NULL with None.
Quote all column names, e.g., replace Species with "Species".
Python unfortunately doesn’t have this thing called non-standard
evaluation.
Remove spaces around equal signs, e.g., replace mapping = aes(...)
with mapping=aes(...). Style is important.
Replace the assignment operator, i.e., <- with =.
Replace dots with underscores, e.g., replace show.legend with
show_legend. In Python, names cannot contain dots.
Replace hjust and vjust with ha and va, respectively. This is
inherited from matplotlib, which is used under the hood by plotnine.
If the code consists of multiple lines, add a continuation character,
i.e., replace + with +\. Alternatively, wrap the entire expression
in parentheses.
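Most of the heuristics above are purely textual, so they can be sketched as a handful of string substitutions. The snippet below is a toy illustration under stated assumptions, not a real translator: the function name translate and its columns parameter are hypothetical, and the caller has to say which names are columns because that cannot be inferred from the text alone.

```python
import re


def translate(r_code, columns=()):
    """Apply the purely textual heuristics to a ggplot2 snippet (toy sketch)."""
    py = r_code
    # TRUE, FALSE and NULL become True, False and None.
    for r_name, py_name in (("TRUE", "True"), ("FALSE", "False"), ("NULL", "None")):
        py = re.sub(rf"\b{r_name}\b", py_name, py)
    # Remove the spaces around the equal signs of keyword arguments.
    py = py.replace(" = ", "=")
    # Replace the assignment operator <- with =.
    py = py.replace(" <- ", " = ")
    # Replace dots with underscores in argument names (show.legend -> show_legend).
    py = re.sub(r"\b[\w.]+(?==)", lambda m: m.group(0).replace(".", "_"), py)
    # hjust and vjust are called ha and va in plotnine.
    py = re.sub(r"\bhjust\b", "ha", py)
    py = re.sub(r"\bvjust\b", "va", py)
    # Quote the names that the caller says are columns.
    for col in columns:
        py = re.sub(rf"\b{re.escape(col)}\b", f'"{col}"', py)
    # Add a continuation character to every line that ends with +.
    lines = [
        line.rstrip() + " \\" if line.rstrip().endswith("+") else line
        for line in py.splitlines()
    ]
    return "\n".join(lines)


r_code = """p <- ggplot(mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(show.legend = FALSE)"""

print(translate(r_code, columns=["displ", "hwy"]))
# p = ggplot(mpg, mapping=aes(x="displ", y="hwy")) + \
#   geom_point(show_legend=False)
```

Plain substitutions like these break easily on real code (for example, a column name that also appears inside a string), so treat the output as a starting point and check it by hand.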
Miscellaneous
Quote inline expressions in their entirety, such as "factor(col)" and
"col < 5".
Quote the facet specification in its entirety, such as
facet_wrap("~ class") and facet_grid("drv ~ cyl").
To suppress labels, you cannot use labels=None; instead, you need to pass a
list with as many empty strings as there are values. A helper function
is useful here:
def no_labels(values):
    return [""] * len(values)
To prevent text labels from overlapping in ggplot2, you would use the
geom_text_repel or geom_label_repel function from the ggrepel
package. In plotnine, you simply use geom_text or geom_label and
specify the adjust_text argument. For example:
geom_label(adjust_text={'expand_points': (1.5, 1.5), 'arrowprops': {'arrowstyle': '-'}})
Features not yet implemented
Unlike with ggplot2, in plotnine you cannot assign literal values to
your aesthetics; all values need to refer to column names. For example,
aes(color="blue") results in an error if blue is not a column in
the DataFrame.
plotnine is currently missing the following functions:
coord_quickmap() and coord_polar().
The function labs() does not support a subtitle or a caption.
Let me know if you think anything can be added to (or removed from!)
this list of heuristics. Now go plot!
— Jeroen