Python workshops
In this two-day workshop, we will help you get started with programming in Python, one of the most popular languages for quick scripts, production software, and data science.
Through realistic examples, you’ll be introduced to fundamental programming concepts such as variables, functions, and control flow. The workshop is hands-on, with challenging exercises to complete. What makes this workshop unique is that we’ll be using JupyterLab, a popular environment for running code interactively and doing data science.
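To give you an early taste, the snippet below shows the kind of code you’ll be writing; the little coffee-order example is purely illustrative and not tied to a specific exercise.

```python
# Variables: a dictionary mapping drinks to prices (illustrative data)
prices = {"espresso": 2.50, "cappuccino": 3.20, "flat white": 3.50}

# A function: sum the prices of the drinks in an order
def total(order):
    return sum(prices[drink] for drink in order)

# Control flow: act differently depending on the order total
order = ["espresso", "flat white"]
if total(order) > 5:
    print("That will be", total(order), "euros. Enjoy!")
else:
    print("A small order today:", total(order), "euros.")
```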
This workshop will not only prepare you for more advanced Python workshops but will also provide you with a solid and reliable foundation upon which to base your data science journey.
online or on location
Learn how to accelerate your data analyses using Pandas, a Python library specifically designed for working with medium-sized data sets. Together with JupyterLab, it provides a convenient environment for interactive data analysis.
Pandas is part of the so-called PyData ecosystem, and in this workshop we’ll start by providing an overview of PyData, explaining where Pandas fits in and how it interacts with other libraries such as NumPy and Seaborn. Pandas introduces a few new data structures, most importantly the DataFrame, which are essential for working with tabular data efficiently.
Pandas offers many features, and in one day, through a good balance of presentation and interactive exercises, we’re going to cover the most important ones, including: importing, filtering, grouping, joining, exploring, and visualising data. By the end of this workshop, you’ll understand the fundamentals of Pandas, be aware of common pitfalls, and be ready to perform your own analyses.
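To give you a flavour of what that looks like in practice, here’s a minimal sketch; the file sales.csv and its columns (date, country, amount) are hypothetical placeholders, not part of the workshop material.

```python
import pandas as pd

# Import a (hypothetical) CSV file into a DataFrame
sales = pd.read_csv("sales.csv", parse_dates=["date"])

# Filter on country, derive a month column, then group and summarise
nl_sales = sales[sales["country"] == "NL"]
monthly = (
    nl_sales
    .assign(month=nl_sales["date"].dt.to_period("M"))
    .groupby("month")
    .agg(revenue=("amount", "sum"))
)

print(monthly.head())
```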
online or on location
The internet is not just a collection of web pages; it’s a gigantic resource of interesting data. Being able to extract that data is a valuable skill. It’s certainly challenging, but with the right knowledge and tools, you’ll be able to leverage a wealth of information for your personal and professional projects.
Imagine building a web scraper that legally gathers information about potential houses to buy, a process that automatically fills in that tedious form to download a report, or a crawler that enriches an existing data set with weather information. In this hands-on workshop we’ll teach you how to accomplish just that using Python and a handful of packages.
You’ll learn about the concepts underlying HTML, CSS selectors, and HTTP requests, and how to inspect them using your browser’s developer tools. We’ll show you how to turn messy HTML into structured data sets, how to automate interactions with dynamic websites and forms, and how to set up crawlers that can traverse thousands or even millions of websites. Through plenty of exercises, you’ll be able to apply this new knowledge to your own projects in no time.
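As a small preview, the sketch below fetches a page and turns its links into a structured list using the requests and BeautifulSoup packages; the URL https://example.com is just a placeholder, not a site covered in the workshop.

```python
import requests
from bs4 import BeautifulSoup

# Fetch a page over HTTP (https://example.com is just a placeholder)
response = requests.get("https://example.com")
response.raise_for_status()

# Parse the HTML and pick out elements with a CSS selector
soup = BeautifulSoup(response.text, "html.parser")
links = [
    {"text": a.get_text(strip=True), "url": a.get("href")}
    for a in soup.select("a")
]

print(links)
```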
online or on location
Machine learning has become an essential component in many applications and projects that involve data. With the power of Python and the scikit-learn package, this exciting field is no longer exclusive to large companies with extensive research teams. If you use Python, even as a beginner, machine learning applications are limited only by your imagination.
During this workshop, we will take a hands-on approach to learning about machine learning algorithms. Topics include regression, classification, outlier detection, dimensionality reduction, and clustering. Over the course of two days, we’ll explore algorithms such as linear regression, logistic regression, decision trees, neural networks, and many more.
By the end of this workshop you’ll confidently select and employ machine learning algorithms using Python and scikit-learn. You’ll have gained a new understanding of the inner workings of machine learning algorithms and know how to leverage them to produce valuable results and insights.
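As a preview, here’s a minimal sketch that trains and evaluates a logistic regression classifier on one of scikit-learn’s built-in toy datasets; it merely illustrates the workflow and is not one of the workshop exercises.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load a built-in toy dataset and split it into train and test sets
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Fit a logistic regression classifier and evaluate it on held-out data
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```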
online or on location
Apache Spark is an open-source distributed engine for querying and processing data. In this three-day hands-on workshop, you will learn how to leverage Spark from Python to process large amounts of data.
After a presentation of the Spark architecture, we’ll begin by manipulating Resilient Distributed Datasets (RDDs) and work our way up to Spark DataFrames. We’ll discuss the concept of lazy execution in detail and demonstrate various transformations and actions specific to RDDs and DataFrames. You’ll also learn how DataFrames can be manipulated using SQL queries.
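To illustrate, the sketch below expresses the same aggregation once through the DataFrame API and once as a SQL query; the file transactions.parquet and its country column are hypothetical placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("workshop-sketch").getOrCreate()

# Read a (hypothetical) Parquet file into a Spark DataFrame
transactions = spark.read.parquet("transactions.parquet")

# DataFrame API: nothing is computed until show(), an action, is called
transactions.groupBy("country").count().show()

# The same aggregation as a SQL query against a temporary view
transactions.createOrReplaceTempView("transactions")
spark.sql("SELECT country, COUNT(*) AS n FROM transactions GROUP BY country").show()
```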
We’ll show you how to apply supervised machine learning models such as linear regression, logistic regression, decision trees, and random forests. You’ll also work with unsupervised techniques such as PCA and k-means clustering.
By the end of this workshop, you will have a solid understanding of how to process data using PySpark and you will understand how to use Spark’s machine learning library to build and train various machine learning models.
online or on location
R workshops
R is a statistical environment and programming language that is widely used among statisticians and data scientists to work with data. This one-day workshop will be your guide, taking you through the different aspects of programming with R.
You will learn to work with powerful R tools and techniques. You’ll be able to boost your productivity with the most popular R packages and tackle data structures such as data frames, lists, and matrices. You’ll see how to create vectors, handle variables, and perform other core operations. You’ll learn to deal with data input and output and to work with strings and dates.
Moving forward, we’ll look into more advanced concepts such as metaprogramming with R and functional programming. Finally, you’ll get a glimpse of R’s data visualization and data manipulation capabilities.
online or on location
In this one-day hands-on workshop, RStudio certified instructor Jeroen Janssens will walk you through the so-called tidyverse to transform data. The tidyverse is an ecosystem of R packages that share an underlying design philosophy, grammar, and data structures.
We’ll start at the beginning, with importing CSV data using readr and spreadsheets using readxl. We’ll cover the most important functions from dplyr and tidyr for generic data wrangling and cleaning. We’ll also look at dealing with dates, factors, and textual data specifically, using the packages lubridate, forcats, and stringr, respectively. Note that this workshop does not cover ggplot2; for that we recommend our one-day workshop Data Visualisation with R and ggplot2.
By the end of this workshop, you’ll have a good understanding of the tidyverse ecosystem and you’ll be able to apply many of its packages to your own data.
online or on location
In this one-day hands-on workshop, we’re going to have a close look at ggplot2, a widely used R package that implements the so-called grammar of graphics. Its concise and consistent syntax allows you to quickly and iteratively create high-quality data visualisations that are suitable for both exploration and communication.
By the end of this workshop you’ll have a solid understanding of the grammar of graphics and how to create data visualisations in R for your daily work. But beware: there’s a good chance you will want to learn more about R.
online or on location
Other popular workshops
The Unix command line, although invented decades ago, is an amazing environment for efficiently performing tedious but essential data science tasks. By combining small, powerful command-line tools (like parallel, jq, and csvkit), you can quickly scrub and explore your data and hack together prototypes.
This hands-on workshop is based on the O’Reilly book Data Science at the Command Line, written by instructor Jeroen Janssens. You’ll learn how to build fast data pipelines, how to leverage R and Python at the command line, and how to quickly visualise data. No prior knowledge of the Unix command line is required.
By the end of this workshop you will have a solid understanding of how to integrate the command line in your data science workflow. Even if you’re already comfortable processing data with, for example, R or Python, being able to also leverage the power of the command line can make you a more effective and efficient data scientist.
online or on location
Ask a dozen people what “data science” means, and you get back thirteen different answers. This vagueness is unfortunately accompanied by a lot of hype and misaligned expectations. In this inspiration session, we aim to mitigate this by taking a good look under the hood of data science.
In three hours, we not only explain in clear terms what data science entails, but we also let participants experience what a typical data scientist does by working through a practical use case with a real-world dataset and a programming language such as Python or R. This session is meant for everybody who wants to know what data science is (and isn’t) about. Even if you never intend to work with data yourself, it can be eye-opening to experience it first-hand. Caution: there’s a chance you’ll want to learn more afterwards!
online or on location
Description will be added soon.
online or on location