About
Jeroen Janssens is an independent data science consultant and an RStudio-certified instructor. He enjoys visualizing data, implementing machine learning models, and building solutions using Python, R, JavaScript, and Bash. He’s passionate about helping others do the same.
Jeroen runs Data Science Workshops, a training and coaching firm that organizes open enrollment workshops, in-company courses, inspiration sessions, hackathons, and meetups. Clients include Amazon, DPD, eHealth Africa, KPN, Schiphol Airport, and The New York Times.
Previously, he was an assistant professor at Jheronimus Academy of Data Science and a data scientist at Elsevier in Amsterdam and at various startups in New York City. He is the author of Data Science at the Command Line, published by O’Reilly Media. Jeroen holds a PhD in machine learning from Tilburg University and an MSc in artificial intelligence from Maastricht University.
He lives with his wife and two kids in Rotterdam, the Netherlands. For more information about Jeroen’s experience and education, download his CV.
Contact
Jeroen is available to provide consulting and training in the areas of data science, data engineering, and machine learning. He’s also available to speak at private and public events. To learn more about his services, fees, and availability, please email Jeroen. You can also find him on Twitter, GitHub, and LinkedIn.
Projects
Data Science at the Command Line. Currently writing the second edition.
Data Science Toolbox. Complete environments for busy polyglot data scientists.
tmuxr. An R package for managing tmux and interacting with the processes it runs.
scikit-sos. A Python implementation of the Stochastic Outlier Selection algorithm.
Talks
Visualizing High-Dimensional Data with Python. O’Reilly Live Training. August 17, 2020.
Scalable Anomaly Detection With Spark and SOS. Strata Data Conference. New York, NY. September 26, 2019.
Zo Blijft je Data Science Team Scherp en Geïnspireerd (How to Keep Your Data Science Team Sharp and Inspired). Big Data Expo. Utrecht, the Netherlands. September 19, 2019.
Interview met Transavia, DPD en Data Science Workshops (Interview with Transavia, DPD, and Data Science Workshops). Studio Data at Big Data Expo. Utrecht, the Netherlands. September 19, 2019.
50 Reasons to Learn the Shell for Doing Data Science. Strata Data Conference. New York, NY. September 13, 2018.
Data Science with Unix Power Tools. NLUUG Spring Conference. Utrecht, the Netherlands. May 15, 2018.
Everybody Can Knit With {knitractive}. amst-R-dam. Amsterdam, the Netherlands. March 1, 2018.
SE-Radio Episode 315: Jeroen Janssens on Tools for Data Science. Interview with Felienne Hermans for Software Engineering Radio. January 23, 2018.
Python for Data Science. TU Delft. Delft, the Netherlands. October 16, 2017.
Create Interactive Maps in Seconds with R and Leaflet. Strata Data Conference. London, UK. May 24, 2017.
Crunching Data at the Command Line. Crunch Data Conference. Budapest, Hungary. October 5, 2016.
The Polyglot Data Scientist. New York Open Statistical Programming Meetup. New York, NY. June 23, 2016.
The Polyglot Data Scientist. Strata + Hadoop World. London, UK. June 2, 2016.
Vowpal Wabbit: The Essence of Speed in Machine Learning. Strata + Hadoop World. San Jose, CA. March 31, 2016.
Poor Man’s Parallel Pipelines. Strata + Hadoop World. London, UK. May 7, 2015.
Data Science Toolbox and the Importance of Reproducible Research. Strata + Hadoop World. Barcelona, Spain. November 20, 2014.
Predicting at the Command Line. 1st International Conference on Predictive APIs and Apps. Barcelona, Spain. November 17, 2014.
Building a Data Science Toolbox. Data Science London Meetup. London, UK. April 10, 2014.
Obtaining, Scrubbing, and Exploring Data at the Command Line. New York Open Statistical Programming Meetup. New York, NY. January 29, 2014.
Algorithms for Outlier Selection and One-Class Classification. NYC Machine Learning. New York, NY. November 21, 2013.
Sudo Make Me a Visualization! Strata Ignite. New York, NY. October 28, 2013.
Publications
Data Science from the Shell. The command line is a great environment for inspecting a dataset, automating data science tasks, and more. Hit the ground running with this playlist. June 12, 2020.
Heuristics for Translating Ggplot2 Code to Plotnine Code. Leverage existing ggplot2 resources to produce high-quality data visualizations in Python. December 13, 2019.
Plotnine: Grammar of Graphics for Python. A translation of the visualization chapters from “R for Data Science” to Python using Plotnine and Pandas. December 11, 2019.
Dimensionality Reduction at the Command Line. Introducing Tapkee, an efficient command-line tool and C++ library for linear and nonlinear dimensionality reduction. June 8, 2015.
Anomalies, Concerts, and The Command Line. Data Science Weekly interviews Jeroen Janssens. May 18, 2015.
IBash Notebook‽ A Bash kernel for Jupyter Notebook. Now with inline images. February 19, 2015.
Data Science at the Command Line. Published by O’Reilly. October 1, 2014.
Lean, Mean Data Science Machine. A virtual environment that enables you to get up and running quickly. December 7, 2013.
Stochastic Outlier Selection. An algorithm for detecting anomalous patterns. Includes a demo and a Python implementation. November 24, 2013.
7 Command-Line Tools for Data Science. Obtain, scrub, and explore data with jq, json2csv, csvkit, scrape, xml2json, sample, and Rio. September 19, 2013.
Quickly Navigate your Filesystem from the Command Line. Bookmark and jump to important directories using symbolic links. August 16, 2013.
Outlier Selection and One-Class Classification. PhD thesis. Supervised by Eric Postma and Jaap van den Herik. Tilburg University, June 11, 2013.
Ranking Images on Semantic Attributes using Human Computation. Computational Social Science and the Wisdom of Crowds (NIPS 2010). Whistler, Canada. October 8, 2010.
Outlier Detection with One-Class Classifiers from ML and KDD. International Conference on Machine Learning and Applications. Miami, FL. December 13, 2009.