Books
Python Polars: The Definitive Guide
With Thijs Nieuwdorp. Expected to be published by O’Reilly Media in February 2025. Foreword by Ritchie Vink.
Unlock the power of Polars, a Python package for transforming, analyzing, and visualizing data. In this hands-on guide, Jeroen Janssens and Thijs Nieuwdorp walk you through every feature of Polars, showing you how to use it for real-world tasks like data wrangling, exploratory data analysis, building pipelines, and more.
Whether you’re a seasoned data professional or new to data science, you’ll quickly master Polars’ expressive API and its underlying concepts. You don’t need to have experience with pandas, but if you do, this book will help you make a seamless transition. The many practical examples and real-world datasets are available on GitHub, so you can easily follow along.
Did you know? Thijs and I used to be colleagues at Xomnia, the very birthplace of Polars!
Data Science at the Command Line
Second edition. Published by O’Reilly Media in October 2021. Foreword by Tim O’Reilly.
This thoroughly revised guide demonstrates how the flexibility of the command line can help you become a more efficient and productive data scientist. You’ll learn how to combine small yet powerful command-line tools to quickly obtain, scrub, explore, and model your data. To get you started, author Jeroen Janssens provides a Docker image packed with over 100 Unix power tools—useful whether you work with Windows, macOS, or Linux. You’ll quickly discover why the command line is an agile, scalable, and extensible technology. Even if you’re comfortable processing data with Python or R, you’ll learn how to greatly improve your data science workflow by leveraging the command line’s power. This book is ideal for data scientists, analysts, engineers, system administrators, and researchers.
- Read Data Science at the Command Line online
- Buy a physical copy of Data Science at the Command Line at Amazon
Did you know? The first edition, which came out in 2014, eventually led me to start Data Science Workshops. You can read more about this in the Acknowledgments.
Outlier Selection and One-Class Classification
My PhD thesis. Defended on June 11, 2013 at Tilburg University, the Netherlands.
What do a terrorist attack, a forged painting, and a rotten apple have in common? All three are anomalies: real-world observations that deviate from what is considered normal. Detecting anomalies is of utmost importance because an undetected anomaly can be dangerous or expensive. A human domain expert may suffer from three cognitive limitations that hamper the detection of anomalies: fatigue, information overload, and emotional bias. Outlier-selection and one-class classification algorithms can automatically classify data points as outliers in large amounts of data. In this thesis we study to what extent these algorithms can support domain experts with real-world anomaly detection.
Did you know? The Stochastic Outlier Selection algorithm, which is covered in Chapter 4, is available in the PyOD package for Python.