
R Packages

  • tmuxr. An R package for managing tmux and interacting with the processes it runs. It features a pipeable API with which you can create, control, and capture tmux sessions, windows, and panes.
  • rexpect. An R package that allows you to automate interactions with programs that expose a text terminal interface. The API is inspired by the original Expect tool by Don Libes. Programs are optionally run inside a Docker container. Sessions can be recorded using asciinema.
  • knitractive. An R package that provides a knitr engine which allows you to simulate interactive sessions (e.g., Python console, Bash shell) across multiple code chunks. Interactive sessions are run inside a tmux session through the tmuxr and rexpect packages.
  • raylibr. An R package that wraps Raylib, a simple and easy-to-use library to enjoy videogames programming. Features real-time 2D & 3D graphics, keyboard & mouse interactivity, music, sound effects, and shaders. Presented at NYR Conference 2022.
  • rush. Run R expressions, create ggplot2 visualizations, and install R packages directly from the shell.

Python Packages

  • scikit-sos. A Python implementation of the Stochastic Outlier Selection (SOS) algorithm. The algorithm is covered in Chapter 4 of my PhD thesis. SOS is also available in the PyOD package.
  • sample-stream. Filter lines from standard input according to some probability, with a given delay, and for a certain duration.
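
To illustrate what filtering lines by probability, delay, and duration can look like, here is a minimal sketch. It is not the code of sample-stream itself: the function name `sample_lines` and its parameters are hypothetical, chosen only for this example.

```python
import random
import time
from typing import Iterable, Iterator, Optional


def sample_lines(
    lines: Iterable[str],
    probability: float = 0.5,
    delay: float = 0.0,
    duration: Optional[float] = None,
    seed: Optional[int] = None,
) -> Iterator[str]:
    """Yield each input line with the given probability.

    Hypothetical helper: names and defaults are assumptions made for
    illustration and are not taken from sample-stream.
    """
    rng = random.Random(seed)
    # Stop reading once `duration` seconds have passed (None means no limit).
    deadline = None if duration is None else time.monotonic() + duration
    for line in lines:
        if deadline is not None and time.monotonic() >= deadline:
            break
        # rng.random() is in [0, 1), so probability 1.0 keeps every line
        # and probability 0.0 keeps none.
        if rng.random() < probability:
            yield line
            # Pause after each emitted line to throttle the output.
            if delay > 0:
                time.sleep(delay)
```

Passing `sys.stdin` as `lines` and writing each yielded line to `sys.stdout` would turn this sketch into a simple command-line filter.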

Miscellaneous Projects and Scripts

  • Data Science Toolbox. A batteries-included Docker image for polyglot data scientists. Based on Packer, Ansible, and Docker. Includes Python, R, many packages, and command-line tools such as jq, xmlstarlet, parallel, and xsv.
  • dsutils. A collection of command-line tools for working with data.
  • Embrace the Command Line. Archive of my three-week online course Embrace the Command Line.
  • tidytree and tidynaivebayes. Understandable but slow implementations in R of a Decision Tree classifier and a Naive Bayes classifier, respectively.
  • cache.R. Cache the result of an expression in R. The discussion is at least as interesting as the code itself.
  • Generate iTerm Key Mappings with Python. Discussed in Scripting iTerm Key Mappings.

Various Contributions

  • pola-rs/polars. Polars is a DataFrame interface on top of an OLAP Query Engine implemented in Rust. I contributed a fix to file globbing.
  • pola-rs/tpch. Runs the TPC-standardised benchmark suite to evaluate the performance of Polars, Pandas, Dask, DuckDB, and Spark. I implemented a Python script that creates a dot plot of the results using Plotnine.
  • has2k1/plotnine. Plotnine is an implementation of a grammar of graphics in Python based on ggplot2. I made a small fix that allowed the alignment of text to be based on data. This was needed for the blog post Plotnine: Grammar of Graphics for Python.
  • wireservice/csvkit. A suite of utilities for converting to and working with CSV, the king of tabular file formats. I extended csvsql such that it can execute SQL queries directly on CSV files. Mentioned in Data Science at the Command Line.
  • takluyver/bash_kernel. A Jupyter kernel for Bash. I added the ability to show inline images. More information in the blog post IBash Notebook.
  • ohmyzsh/ohmyzsh. Oh My Zsh is an open source, community-driven framework for managing your zsh configuration. I contributed the jump plugin, which allows you to easily jump around the file system.
  • r-lib/pkgdown. Easily generate a static website for an R package. I added a fix to ensure that example code inside \dontshow{} is not skipped.
  • rstudio/concept-maps. A collection of mental models used in introductory data science lessons. I added a concept map for the pipe operator (%>%) that I created as part of the RStudio Instructor Training.
  • hadley/r4ds. Contains the source of the book R for Data Science by Hadley Wickham, Mine Çetinkaya-Rundel, and Garrett Grolemund. I fixed a typo because I was curious to see how the process of editing an open source book works. This minuscule contribution still got me mentioned in the acknowledgments.
  • jehiah/json2csv. A command-line tool, written in Go, that converts a stream of newline-separated JSON data to CSV format. I added support for nested fields.
