Convert CSV to Vowpal Wabbit’s Input Format
Blog article by Jeroen Janssens.
Mar 29, 2016 • 2 min read
I’ve created a Python script called csv2vw which, as the name implies, converts CSV data to Vowpal Wabbit’s input format.
csv2vw is available on GitHub in my dsutils repository.
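To give a feel for the target format, here is a minimal Python sketch of how one CSV row could be turned into a Vowpal Wabbit example line without namespaces. It is not the source of csv2vw: the function name `row_to_vw`, the `name=value` treatment of non-numeric columns, and the sample data are assumptions made for illustration.

```python
import csv
import io


def row_to_vw(row, label_column):
    """Format one CSV row (a dict) as a VW example line without namespaces.

    Hypothetical helper for illustration; not taken from csv2vw.
    """
    label = row[label_column]
    features = []
    for name, value in row.items():
        if name == label_column:
            continue
        try:
            # Numeric columns become name:value pairs.
            features.append(f"{name}:{float(value):g}")
        except ValueError:
            # Assumption: non-numeric columns become one indicator feature.
            features.append(f"{name}={value}")
    return f"{label} | {' '.join(features)}"


# Made-up sample data with an already numeric label column.
data = io.StringIO("sepal_length,sepal_width,species\n5.1,3.5,1\n")
lines = [row_to_vw(row, "species") for row in csv.DictReader(data)]
print("\n".join(lines))
```

The sketch does not escape spaces, colons, or bars in column names and values, which a real converter would have to handle because those characters have special meaning in the format.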
A screenshot of csv2vw applied to the Iris dataset
Here are some examples to give you an idea of what it can do:
Leave label values as is:
$ csv2vw spam.csv --label target
Relabel values ‘ham’ to 0 and ‘spam’ to 1:
$ csv2vw spam.csv --label target --classes ham,spam
Relabel values ‘ham’ to -1 and ‘spam’ to +1 (needed for logistic loss):
$ csv2vw spam.csv --label target --classes ham,spam --minus-plus-one
Relabel first label value to 0, second to 1, and ignore the rest:
$ csv2vw iris.csv -lspecies --auto-relabel --ignore-extra-classes
Relabel first label value to 1, second to 2, and so on:
$ < iris.csv csv2vw -lspecies --multiclass --auto-relabel
Relabel ‘versicolor’ to 1, ‘virginica’ to 2, and ‘setosa’ to 3:
$ < iris.csv csv2vw -lspecies --multiclass -cversicolor,virginica,setosa
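The relabeling options used in the examples above can be pictured as building a small mapping from raw label values to numbers. The Python sketch below follows the behaviour described in the examples, not the script's actual code; the function name `build_label_map` and its parameters are hypothetical, and it assumes that automatic relabeling numbers classes in order of first appearance.

```python
def build_label_map(values, classes=None, multiclass=False, minus_plus_one=False):
    """Map raw label values to numeric labels (illustrative sketch only)."""
    if classes is None:
        # Assumed auto-relabel behaviour: unique values in order of appearance.
        classes = list(dict.fromkeys(values))
    if multiclass:
        # Multiclass labels are numbered from 1.
        return {c: i for i, c in enumerate(classes, start=1)}
    # Binary case: only the first two classes get a label
    # (raises ValueError if fewer than two classes are present).
    first, second = classes[:2]
    if minus_plus_one:
        return {first: -1, second: 1}
    return {first: 0, second: 1}


species = ["setosa", "setosa", "versicolor", "virginica"]
print(build_label_map(species, multiclass=True))
print(build_label_map(species))
print(build_label_map([], classes=["ham", "spam"], minus_plus_one=True))
```

In the binary case, values outside the first two classes are simply absent from the mapping, so a caller could skip those rows, which is roughly what ignoring extra classes amounts to.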
Note that csv2vw does not support namespaces.
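For context, a namespace in Vowpal Wabbit’s input format is a name written directly after the bar that groups the features following it. If a single namespace is enough, one possible workaround is to post-process lines that have none. The helper `add_namespace` below is a hypothetical sketch, not part of csv2vw, and it assumes each line contains exactly one bar.

```python
def add_namespace(line, namespace):
    """Put all features of a namespace-free VW line into one namespace."""
    # Split once at the bar: label part on the left, features on the right.
    label, features = line.split("|", 1)
    # The namespace must follow the bar without a space in between.
    return f"{label}|{namespace} {features.strip()}"


print(add_namespace("1 | sepal_length:5.1 sepal_width:3.5", "f"))
```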
— Jeroen
© 2013–2024 Jeroen Janssens