Convert CSV to Vowpal Wabbit’s Input Format

Jeroen Janssens
Mar 29, 2016 • 2 min read

I’ve created a Python script called csv2vw which, as the name implies, converts CSV data to Vowpal Wabbit’s input format. csv2vw is available on GitHub in my dsutils repository.

A screenshot of csv2vw applied to the Iris dataset

Here are some examples to give you an idea of what it can do:

Leave label values as is:

$ csv2vw spam.csv --label target

Relabel values ‘ham’ to 0 and ‘spam’ to 1:

$ csv2vw spam.csv --label target --classes ham,spam

Relabel values ‘ham’ to -1 and ‘spam’ to +1 (needed for logistic loss):

$ csv2vw spam.csv --label target --classes ham,spam --minus-plus-one
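
To make the effect of these relabeling options more concrete, here is a minimal Python sketch of what such a conversion could look like. It is not csv2vw's actual code: the function name to_vw_line, the column names, and the sample rows are made up for illustration, and the sketch assumes that every non-label column is numeric.

```python
import csv
import io


def to_vw_line(row, label_column, label_map):
    """Format one CSV row (a dict) as a Vowpal Wabbit example without namespaces."""
    label = label_map[row[label_column]]
    features = " ".join(
        f"{name}:{value}" for name, value in row.items() if name != label_column
    )
    return f"{label} | {features}"


# Hypothetical sample data standing in for spam.csv
sample = io.StringIO("target,num_links,num_caps\nham,0,2\nspam,7,31\n")

# Map 'ham' to -1 and 'spam' to +1, the labels logistic loss expects
label_map = {"ham": -1, "spam": 1}

vw_lines = [to_vw_line(row, "target", label_map) for row in csv.DictReader(sample)]
print("\n".join(vw_lines))
```

Each output line has the form "label | name:value name:value", which is the simplest shape of a Vowpal Wabbit example.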

Relabel the first label value to 0, the second to 1, and ignore the rest:

$ csv2vw iris.csv -lspecies --auto-relabel --ignore-extra-classes

Relabel the first label value to 1, the second to 2, and so on:

$ < iris.csv csv2vw -lspecies --multiclass --auto-relabel

Relabel ‘versicolor’ to 1, ‘virginica’ to 2, and ‘setosa’ to 3:

$ < iris.csv csv2vw -lspecies --multiclass -cversicolor,virginica,setosa
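
The difference between listing the classes explicitly and relabeling automatically can be sketched in a few lines of Python. This is an illustration only, not how csv2vw is implemented: both helper functions are hypothetical, and the sketch assumes that "first label value" means the first value encountered in the data.

```python
def explicit_label_map(classes, start=1):
    """Map class names to consecutive integers in the order given."""
    return {name: i for i, name in enumerate(classes, start=start)}


def auto_label_map(values, start=1):
    """Map class names to consecutive integers in order of first appearance."""
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = start + len(mapping)
    return mapping


# Explicit order, as with a class list of versicolor,virginica,setosa
explicit = explicit_label_map(["versicolor", "virginica", "setosa"])

# Automatic order, driven by the (made-up) sequence of labels in the data
auto = auto_label_map(["setosa", "setosa", "versicolor", "virginica", "versicolor"])

print(explicit)
print(auto)
```

With an explicit class list the numbering is fixed regardless of row order, whereas the automatic variant depends on which label happens to appear first.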

Note that csv2vw does not support namespaces.
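
For context, a namespace in Vowpal Wabbit's input format is a name written directly after a | that groups the features following it. The Python sketch below contrasts a flat line, the kind a converter without namespace support would emit, with a namespaced one. The helper format_example and the grouping of Iris columns into sepal and petal are purely illustrative assumptions and not something csv2vw produces.

```python
def format_example(label, namespaces):
    """Build a VW example from {namespace_name: {feature: value}}.

    An empty namespace name yields a plain "| feature:value" section.
    """
    sections = []
    for name, features in namespaces.items():
        pairs = " ".join(f"{k}:{v}" for k, v in features.items())
        prefix = f"|{name}" if name else "|"
        sections.append(f"{prefix} {pairs}")
    return f"{label} " + " ".join(sections)


# Flat example: all features share the default (unnamed) namespace
flat = format_example(1, {"": {"sepal_length": 5.1, "petal_length": 1.4}})

# Namespaced example: features grouped under 'sepal' and 'petal'
grouped = format_example(1, {"sepal": {"length": 5.1}, "petal": {"length": 1.4}})

print(flat)
print(grouped)
```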

— Jeroen

