Python workshops
In this two-day workshop, we will help you get started with programming in Python, one of the most popular languages for quick scripts, production software, and data science.
Through realistic examples, you’ll be introduced to fundamental programming concepts such as variables, functions, and control flow. The workshop is hands-on, with challenging exercises to complete. What makes this workshop unique is that we’ll be using JupyterLab, a popular environment for running code interactively and doing data science.
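To give you an early taste, the snippet below shows the kind of code you’ll be writing; the little coffee-order example is purely illustrative and not tied to a specific exercise.

```python
# Variables: a dictionary mapping drinks to prices (illustrative data)
prices = {"espresso": 2.50, "cappuccino": 3.20, "flat white": 3.50}

# A function: sum the prices of the drinks in an order
def total(order):
    return sum(prices[drink] for drink in order)

# Control flow: act differently depending on the order total
order = ["espresso", "flat white"]
if total(order) > 5:
    print("That will be", total(order), "euros. Enjoy!")
else:
    print("A small order today:", total(order), "euros.")
```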
This workshop will not only prepare you for more advanced Python workshops but will also provide you with a solid and reliable foundation upon which to base your data science journey.
online or on location
Learn how to accelerate your data analyses using Pandas, a Python library specifically designed for working with medium-sized data sets. Together with JupyterLab, it provides a convenient environment for interactive data analysis.
Pandas is part of the so-called PyData ecosystem, and in this workshop we’ll start by providing an overview of PyData, explaining where Pandas fits in and how it interacts with other libraries such as NumPy and Seaborn. Pandas introduces a few new data structures, most importantly the DataFrame, which are essential for working with tabular data efficiently.
Pandas offers many features, and in one day, through a good balance of presentation and interactive exercises, we’re going to cover the most important ones, including: importing, filtering, grouping, joining, exploring, and visualising data. By the end of this workshop, you’ll understand the fundamentals of Pandas, be aware of common pitfalls, and be ready to perform your own analyses.
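To give you a flavour of what that looks like in practice, here’s a minimal sketch; the file sales.csv and its columns (date, country, amount) are hypothetical placeholders, not part of the workshop material.

```python
import pandas as pd

# Import a (hypothetical) CSV file into a DataFrame
sales = pd.read_csv("sales.csv", parse_dates=["date"])

# Filter on country, derive a month column, then group and summarise
nl_sales = sales[sales["country"] == "NL"]
monthly = (
    nl_sales
    .assign(month=nl_sales["date"].dt.to_period("M"))
    .groupby("month")
    .agg(revenue=("amount", "sum"))
)

print(monthly.head())
```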
online or on location
The internet is not just a collection of web pages; it’s a gigantic resource of interesting data. Being able to extract that data is a valuable skill. It’s certainly challenging, but with the right knowledge and tools, you’ll be able to leverage a wealth of information for your personal and professional projects.
Imagine building a web scraper that legally gathers information about potential houses to buy, a process that automatically fills in that tedious form to download a report, or a crawler that enriches an existing data set with weather information. In this hands-on workshop we’ll teach you how to accomplish just that using Python and a handful of packages.
You’ll learn about the concepts underlying HTML, CSS selectors, and HTTP requests, and how to inspect them using your browser’s developer tools. We’ll show you how to turn messy HTML into structured data sets, how to automate interactions with dynamic websites and forms, and how to set up crawlers that can traverse thousands or even millions of websites. Through plenty of exercises, you’ll be able to apply this new knowledge to your own projects in no time.
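As a small preview, the sketch below fetches a page and turns its links into a structured list using the requests and BeautifulSoup packages; the URL https://example.com is just a placeholder, not a site covered in the workshop.

```python
import requests
from bs4 import BeautifulSoup

# Fetch a page over HTTP (https://example.com is just a placeholder)
response = requests.get("https://example.com")
response.raise_for_status()

# Parse the HTML and pick out elements with a CSS selector
soup = BeautifulSoup(response.text, "html.parser")
links = [
    {"text": a.get_text(strip=True), "url": a.get("href")}
    for a in soup.select("a")
]

print(links)
```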
online or on location
Machine learning has become an essential component in many applications and projects that involve data. With the power of Python and the scikit-learn package, this exciting field is no longer exclusive to large companies with extensive research teams. If you use Python, even as a beginner, machine learning applications are limited only by your imagination.
During this workshop, we will take a hands-on approach to learning about machine learning algorithms. Topics include regression, classification, outlier detection, dimensionality reduction, and clustering. Over the course of two days, we’ll explore algorithms such as linear regression, logistic regression, decision trees, neural networks, and many more.
By the end of this workshop you’ll confidently select and employ machine learning algorithms using Python and scikit-learn. You’ll have gained a new understanding of the inner workings of machine learning algorithms and know how to leverage them to produce valuable results and insights.
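As a preview, here’s a minimal sketch that trains and evaluates a logistic regression classifier on one of scikit-learn’s built-in toy datasets; it merely illustrates the workflow and is not one of the workshop exercises.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load a built-in toy dataset and split it into train and test sets
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Fit a logistic regression classifier and evaluate it on held-out data
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```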
online or on location
Apache Spark is an open-source distributed engine for querying and processing data. In this three-day hands-on workshop, you will learn how to leverage Spark from Python to process large amounts of data.
After a presentation of the Spark architecture, we’ll begin by manipulating Resilient Distributed Datasets (RDDs) and work our way up to Spark DataFrames. We’ll discuss the concept of lazy execution in detail and demonstrate various transformations and actions specific to RDDs and DataFrames. You’ll also learn how DataFrames can be manipulated using SQL queries.
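To illustrate, the sketch below expresses the same aggregation once through the DataFrame API and once as a SQL query; the file transactions.parquet and its country column are hypothetical placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("workshop-sketch").getOrCreate()

# Read a (hypothetical) Parquet file into a Spark DataFrame
transactions = spark.read.parquet("transactions.parquet")

# DataFrame API: nothing is computed until show(), an action, is called
transactions.groupBy("country").count().show()

# The same aggregation as a SQL query against a temporary view
transactions.createOrReplaceTempView("transactions")
spark.sql("SELECT country, COUNT(*) AS n FROM transactions GROUP BY country").show()
```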
We’ll show you how to apply supervised machine learning models such as linear regression, logistic regression, decision trees, and random forests. You’ll also work with unsupervised techniques such as PCA and k-means clustering.
By the end of this workshop, you will have a solid understanding of how to process data using PySpark and you will understand how to use Spark’s machine learning library to build and train various machine learning models.
online or on location
R workshops
R is a statistical environment and programming language that is widely used among statisticians and data scientists to work with data. This one-day workshop will be your guide, taking you through the different aspects of programming with R.
You will learn to work with powerful R tools and techniques. You’ll be able to boost your productivity with the most popular R packages and tackle data structures such as data frames, lists, and matrices. You’ll see how to create vectors, handle variables, and perform other core operations. You’ll learn to deal with data input and output and to work with strings and dates.
Moving forward, we’ll look into more advanced concepts such as metaprogramming with R and functional programming. Finally, you’ll get a glimpse of R’s data visualization and data manipulation capabilities.
online or on location
In this one-day hands-on workshop, RStudio certified instructor Jeroen Janssens will walk you through the so-called tidyverse to transform data. The tidyverse is an ecosystem of R packages that share an underlying design philosophy, grammar, and data structures.
We’ll start at the beginning, with importing CSV data using readr and spreadsheets using readxl. We’ll cover the most important functions from dplyr and tidyr for generic data wrangling and cleaning. We’ll also look at dealing with dates, factors, and textual data specifically, using the packages lubridate, forcats, and stringr, respectively. Note that this workshop does not cover ggplot2; for that we recommend our one-day workshop Data Visualisation with R and ggplot2.
By the end of this workshop, you’ll have a good understanding of the tidyverse ecosystem and you’ll be able to apply many of its packages to your own data.
online or on location
In this one-day hands-on workshop, we’re going to have a close look at ggplot2, a widely used R package that implements the so-called grammar of graphics. Its concise and consistent syntax allows you to quickly and iteratively create high-quality data visualisations that are suitable for both exploration and communication.
By the end of this workshop you’ll have a solid understanding of the grammar of graphics and how to create data visualisations in R for your daily work. But beware: there’s a good chance you will want to learn more about R.
online or on location
Other popular workshops
The Unix command line, although invented decades ago, is an amazing environment for efficiently performing tedious but essential data science tasks. By combining small, powerful command-line tools (like parallel, jq, and csvkit), you can quickly scrub and explore your data and hack together prototypes.
This hands-on workshop is based on the O’Reilly book Data Science at the Command Line, written by instructor Jeroen Janssens. You’ll learn how to build fast data pipelines, how to leverage R and Python at the command line, and how to quickly visualise data. No prior knowledge of the Unix command line is required.
By the end of this workshop you will have a solid understanding of how to integrate the command line in your data science workflow. Even if you’re already comfortable processing data with, for example, R or Python, being able to also leverage the power of the command line can make you a more effective and efficient data scientist.
online or on location
Ask a dozen people what “data science” means, and you get back thirteen different answers. This vagueness is unfortunately accompanied by a lot of hype and misaligned expectations. In this inspiration session, we aim to mitigate this by taking a good look under the hood of data science.
In three hours, we not only explain in clear terms what data science entails, but we also let participants experience what a typical data scientist does by working through a practical use case with a real-world dataset and a programming language such as Python or R. This session is meant for everybody who wants to know what data science is (and isn’t) about. Even if you never intend to work with data yourself, it can be eye-opening to experience it first-hand. Caution: there’s a chance you’ll want to learn more afterwards!
online or on location
Description will be added soon.
online or on location