Convert CSV to Vowpal Wabbit’s Input Format
Blog article by Jeroen Janssens.
Mar 29, 2016 • 2 min read.
I’ve created a Python script called csv2vw which, as the name implies, converts CSV data to Vowpal Wabbit’s input format. csv2vw is available on GitHub in my dsutils repository.
[Figure: A screenshot of csv2vw applied to the Iris dataset]
Here are some examples to give you an idea of what it can do:
Leave label values as is:
$ csv2vw spam.csv --label target
Relabel values ‘ham’ to 0 and ‘spam’ to 1:
$ csv2vw spam.csv --label target --classes ham,spam
Relabel values ‘ham’ to -1 and ‘spam’ to +1 (needed for logistic loss):
$ csv2vw spam.csv --label target --classes ham,spam --minus-plus-one
Relabel first label value to 0, second to 1, and ignore the rest:
$ csv2vw iris.csv -lspecies --auto-relabel --ignore-extra-classes
Relabel first label value to 1, second to 2, and so on:
$ < iris.csv csv2vw -lspecies --multiclass --auto-relabel
Relabel ‘versicolor’ to 1, ‘virginica’ to 2, and ‘setosa’ to 3:
$ < iris.csv csv2vw -lspecies --multiclass -cversicolor,virginica,setosa
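The relabeling options above all amount to building a mapping from class names to numbers. The following is a minimal Python sketch of that idea, not the actual csv2vw source. The function names `classes_in_order` and `build_label_map` are hypothetical, and it assumes that "first label value" means first in order of appearance in the data.

```python
def classes_in_order(values):
    """Return the distinct label values in order of first appearance."""
    # dict preserves insertion order, so duplicates are dropped in place.
    return list(dict.fromkeys(values))


def build_label_map(classes, minus_plus_one=False, multiclass=False,
                    ignore_extra_classes=False):
    """Map class names to numeric labels by their position in `classes`.

    Illustrative only: the keyword names mirror the command-line options
    shown above, but the behaviour is an assumption, not csv2vw's code.
    """
    if multiclass:
        # Multiclass labels are numbered 1, 2, 3, ... in the given order.
        return {name: i + 1 for i, name in enumerate(classes)}
    if len(classes) > 2 and ignore_extra_classes:
        # Keep only the first two classes; what happens to rows carrying
        # other labels is left to the caller.
        classes = classes[:2]
    if len(classes) != 2:
        raise ValueError("binary relabeling needs exactly two classes")
    first, second = classes
    low = -1 if minus_plus_one else 0
    return {first: low, second: 1}


species = ["setosa", "setosa", "versicolor", "virginica"]
print(build_label_map(classes_in_order(species), multiclass=True))
# {'setosa': 1, 'versicolor': 2, 'virginica': 3}
```

Passing the classes explicitly, as with --classes ham,spam, corresponds to calling `build_label_map(["ham", "spam"])` directly instead of deriving the order from the data.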
Note that csv2vw does not support namespaces.
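For readers unfamiliar with the target format: a Vowpal Wabbit example without a named namespace is a label, a pipe character, and then space-separated name:value features. The sketch below shows roughly what converting one CSV row to such a line could look like. It is not taken from csv2vw; the function `row_to_vw` is hypothetical, and it assumes that every non-label column is numeric and that column names contain no spaces, colons, or pipes.

```python
import csv
import io


def row_to_vw(row, label_column):
    """Format one CSV row (a dict) as a VW line with no named namespace."""
    features = []
    for name, value in row.items():
        if name == label_column or value == "":
            continue  # skip the label itself and any missing values
        float(value)  # raises ValueError if the value is not numeric
        features.append(f"{name}:{value}")
    return f"{row[label_column]} | " + " ".join(features)


# A tiny in-memory CSV, assumed to be relabeled to numbers already.
sample = "sepal_length,sepal_width,species\n5.1,3.5,1\n"
for row in csv.DictReader(io.StringIO(sample)):
    print(row_to_vw(row, "species"))
# 1 | sepal_length:5.1 sepal_width:3.5
```

With namespaces, each group of features would get its own pipe followed by a namespace name; here every feature sits behind a single bare pipe.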
— Jeroen
© 2013–2025 Jeroen Janssens