Open Source

R Packages

  • tmuxr. An R package for managing tmux and interacting with the processes it runs. It features a pipeable API with which you can create, control, and capture tmux sessions, windows, and panes.
  • rexpect. An R package that allows you to automate interactions with programs that expose a text terminal interface. The API is inspired by the original Expect tool by Don Libes. Programs can optionally be run inside a Docker container, and sessions can be recorded using asciinema.
  • knitractive. An R package that provides a knitr engine which allows you to simulate interactive sessions (e.g., Python console, Bash shell) across multiple code chunks. Interactive sessions are run inside a tmux session through the tmuxr and rexpect packages.
  • raylibr. An R package that wraps Raylib, a simple and easy-to-use library for enjoying video game programming. Features real-time 2D & 3D graphics, keyboard & mouse interactivity, music, sound effects, and shaders. Presented at NYR Conference 2022.
  • rush. Run R expressions, create ggplot2 visualizations, and install R packages directly from the shell.

Python Packages

  • scikit-sos. A Python package for Stochastic Outlier Selection (SOS), compatible with scikit-learn. SOS is an unsupervised outlier selection algorithm that uses the concept of affinity to compute an outlier probability for each data point. SOS is explained and compared to other algorithms in Chapter 4 of my PhD thesis. It’s also available as part of the PyOD package.
  • sample-stream. Filter lines from standard input according to a given probability, with a given delay, and for a given duration. Useful for debugging pipelines that consume streaming data at a high rate.
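
To make the SOS description above more concrete, here is a minimal pure-Python sketch of the algorithm as it is usually described, not the scikit-sos code itself. It assumes Euclidean distances, Gaussian affinities whose per-point variance is tuned by binary search until the row's entropy matches a target perplexity, binding probabilities obtained by normalizing each row of affinities, and an outlier probability equal to the product of (1 − b_ij) over all other points. The function names `sos_scores` and `_binding_row`, the default perplexity of 30, and the tolerance values are illustrative assumptions.

```python
import math


def _binding_row(sq_dists, i, perplexity, tol=1e-5, max_iter=100):
    """Binding probabilities b_ij of point i to every point j (b_ii = 0).

    beta = 1 / (2 * sigma_i**2) is tuned by binary search until the entropy
    of the row (in nats) equals log(perplexity).
    """
    n = len(sq_dists)
    target = math.log(perplexity)
    # Subtract the smallest squared distance before exponentiating; this
    # avoids underflow and cancels out when the row is normalized.
    d2_min = min(sq_dists[i][j] for j in range(n) if j != i)
    beta, lo, hi = 1.0, 0.0, math.inf
    row = []
    for _ in range(max_iter):
        # Affinity a_ij = exp(-d_ij^2 * beta); a point has no affinity with itself.
        aff = [0.0 if j == i else math.exp(-(sq_dists[i][j] - d2_min) * beta)
               for j in range(n)]
        total = sum(aff)  # >= 1, because the nearest neighbour contributes exp(0)
        row = [a / total for a in aff]
        entropy = -sum(p * math.log(p) for p in row if p > 0)
        if abs(entropy - target) < tol:
            break
        if entropy > target:
            # Row is too flat: increase beta (narrow the Gaussian).
            lo = beta
            beta = beta * 2 if hi == math.inf else (beta + hi) / 2
        else:
            # Row is too peaked: decrease beta (widen the Gaussian).
            hi = beta
            beta = (beta + lo) / 2
    return row


def sos_scores(points, perplexity=30.0):
    """Outlier probability of each point j: product over i != j of (1 - b_ij)."""
    n = len(points)
    if n < 2:
        raise ValueError("SOS needs at least two points")
    sq_dists = [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points]
                for p in points]
    binding = [_binding_row(sq_dists, i, perplexity) for i in range(n)]
    scores = []
    for j in range(n):
        prob = 1.0
        for i in range(n):
            if i != j:
                # Point j stays an outlier only if no other point binds to it.
                prob *= 1.0 - binding[i][j]
        scores.append(prob)
    return scores
```

For example, `sos_scores([(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)], perplexity=2.0)` should give the isolated point `(10, 10)` the highest outlier probability, because none of the clustered points binds to it. The perplexity acts as a smooth "effective number of neighbours" and should be smaller than the number of other points; otherwise every row becomes nearly uniform.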

Miscellaneous Projects and Scripts

  • Data Science Toolbox. A batteries-included Docker image for polyglot data scientists. Based on Packer, Ansible, and Docker. Includes Python, R, many packages, and command-line tools such as jq, xmlstarlet, parallel, and xsv.
  • dsutils. A collection of command-line tools for working with data.
  • Embrace the Command Line. Archive of my three-week online course Embrace the Command Line.
  • tidytree and tidynaivebayes. Understandable but slow implementations in R of a Decision Tree classifier and a Naive Bayes classifier, respectively.
  • cache.R. Cache the result of an expression in R. The discussion is at least as interesting as the code itself.
  • Generate iTerm Key Mappings with Python. Discussed in the blog post Scripting iTerm Key Mappings.
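
tidynaivebayes is written in R. To illustrate the kind of understandable but slow classifier described above, here is a minimal Python sketch of a categorical Naive Bayes classifier. It is not a port of tidynaivebayes: the Laplace smoothing, the function names `fit_naive_bayes` and `predict`, and the toy data are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def fit_naive_bayes(rows, labels, alpha=1.0):
    """Count class frequencies and per-class frequencies of each feature value."""
    class_counts = Counter(labels)
    # value_counts[(feature_index, label)][value] -> count
    value_counts = defaultdict(Counter)
    # Distinct values seen per feature, needed for Laplace smoothing.
    feature_values = defaultdict(set)
    for row, label in zip(rows, labels):
        for f, value in enumerate(row):
            value_counts[(f, label)][value] += 1
            feature_values[f].add(value)
    return {
        "class_counts": class_counts,
        "value_counts": value_counts,
        "feature_values": feature_values,
        "alpha": alpha,
        "n": len(labels),
    }


def predict(model, row):
    """Return the class with the highest log posterior (up to a constant)."""
    best_label, best_score = None, -math.inf
    for label, count in model["class_counts"].items():
        # log P(class)
        score = math.log(count / model["n"])
        for f, value in enumerate(row):
            k = len(model["feature_values"][f])
            seen = model["value_counts"][(f, label)][value]
            # Laplace-smoothed log P(value | class); features assumed independent.
            score += math.log((seen + model["alpha"]) / (count + model["alpha"] * k))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

With the toy rows `[("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool"), ("overcast", "hot")]` labelled `["no", "no", "yes", "yes", "yes"]`, `predict` returns `"no"` for `("sunny", "hot")` and `"yes"` for `("rainy", "cool")`. Working in log space keeps the product of many small probabilities from underflowing.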

Various Contributions

  • pola-rs/polars. Polars is a DataFrame interface on top of an OLAP Query Engine implemented in Rust. I contributed a fix to file globbing.
  • pola-rs/tpch. Runs the TPC-standardised benchmark suite to evaluate the performance of Polars, Pandas, Dask, DuckDB, and Spark. I implemented a Python script that creates a dot plot of the results using Plotnine.
  • has2k1/plotnine. Plotnine is an implementation of a grammar of graphics in Python based on ggplot2. I made a small fix that allowed the alignment of text to be based on data. This was needed for the blog post Plotnine: Grammar of Graphics for Python.
  • wireservice/csvkit. A suite of utilities for converting to and working with CSV, the king of tabular file formats. I extended csvsql so that it can execute SQL queries directly on CSV files. Mentioned in Data Science at the Command Line.
  • takluyver/bash_kernel. A Jupyter kernel for Bash. I added the ability to show inline images. More information in the blog post IBash Notebook.
  • ohmyzsh/ohmyzsh. Oh My Zsh is an open source, community-driven framework for managing your zsh configuration. I contributed the jump plugin, which allows you to easily jump around the file system.
  • r-lib/pkgdown. Easily generate a static website for an R package. I added a fix that ensures example code inside \dontshow{} is not skipped.
  • rstudio/concept-maps. A collection of mental models used in introductory data science lessons. I added a concept map for the pipe operator (%>%) that I created as part of the RStudio Instructor Training.
  • hadley/r4ds. Contains the source of the book R for Data Science by Hadley Wickham, Mine Çetinkaya-Rundel, and Garrett Grolemund. I fixed a typo because I was curious to see how the process of editing an open source book works. This minuscule contribution still got me mentioned in the acknowledgments.
  • jehiah/json2csv. A command-line tool, written in Go, that converts a stream of newline-separated JSON data to CSV format. I added support for nested fields.
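
Supporting nested fields in a JSON-to-CSV converter essentially means resolving a key path such as `user.name` inside each JSON object. The following Python sketch illustrates that idea with the standard library only; it is not json2csv's Go implementation, and the dotted-path convention, the names `get_path` and `json_lines_to_csv`, and the empty-string default for missing fields are assumptions.

```python
import csv
import io
import json


def get_path(obj, path):
    """Follow a dotted key path (e.g. "user.name") through nested dicts.

    Returns an empty string when any key along the path is missing.
    """
    for key in path.split("."):
        if not isinstance(obj, dict) or key not in obj:
            return ""
        obj = obj[key]
    return obj


def json_lines_to_csv(lines, keys):
    """Convert newline-separated JSON objects to CSV with the given columns."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(keys)  # header row uses the key paths as column names
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the stream
        record = json.loads(line)
        writer.writerow([get_path(record, key) for key in keys])
    return out.getvalue()
```

For instance, with keys `["id", "user.name", "user.lang"]`, a record whose `user` object lacks `lang` simply gets an empty `user.lang` cell instead of aborting the conversion.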
© 2013–2024  Jeroen Janssens