reading csv benchmarks

A quick set of benchmarks of Lisp libraries for reading CSV files. Note that CSV is an underspecified file format. Some specifications can be found at these locations: http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm, http://www.rfc-editor.org/rfc/rfc4180.txt and http://edoceo.com/utilitas/csv-file-format.

Libraries tested: csv-parser, cl-csv, read-csv, fare-csv and cl-simple-table.

Libraries

Quick Summary:

cl-simple-table:read-csv returns a vector of vectors with each element in the row as a string. If the item was a quoted string in the CSV file, the returned item will be enclosed in escaped quotes. The rest of the libraries return a list of lists, with each list member as a string.

Of the libraries tested, cl-simple-table is by far the fastest, with read-csv coming in second and cl-csv third. cl-simple-table also wins for memory footprint, with cl-csv coming in second and read-csv third. cl-csv wins for error checking. Neither cl-simple-table nor read-csv triggered error messages on the sample bad data. fare-csv is probably the most flexible when it comes to specifying data requirements such as types of line feeds, allowing or disallowing binary data, etc.

Using any of these libraries will require that you parse the data to convert it to the data type you need. You will likely also need utility functions to rotate the list, convert to array, etc. They can all use separators other than a comma, although csv-parser is a little more complicated in that regard.
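As a rough illustration of the kind of utility functions mentioned above, here is a minimal sketch in plain Common Lisp. The names transpose-rows, rows-to-array and parse-field are hypothetical helpers made up for this example and are not part of any of the libraries tested. The sketch assumes the list-of-lists-of-strings result that most of the libraries return, with every row the same length.

```lisp
;; Hypothetical helpers, not part of any of the benchmarked libraries.

(defun transpose-rows (rows)
  "Rotate a list of equal-length rows into a list of columns."
  (loop while (and rows (first rows))
        collect (mapcar #'first rows)
        do (setf rows (mapcar #'rest rows))))

(defun rows-to-array (rows)
  "Copy a list of equal-length rows into a two-dimensional array."
  (make-array (list (length rows) (length (first rows)))
              :initial-contents rows))

(defun parse-field (field)
  "Return an integer when FIELD is entirely an integer, otherwise FIELD itself."
  (let ((trimmed (string-trim " " field)))
    (multiple-value-bind (value end)
        (parse-integer trimmed :junk-allowed t)
      (if (and value (= end (length trimmed)))
          value
          field))))
```

parse-field only handles integers; floats, dates and the like would need their own conversion, which is left out here on purpose.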

Functions

* Need to change a special variable rather than just passing a non-default parameter.

Error Checking

Sample file with missing quotes, an uneven number of fields, and partial quotes:

"Column 1 Row 1",49,"52",24.3,"24.3"

"Column 1, Row 2",49,"52",24.3,"24.3",Geprge

"dfs3s ,34.2,"twenty2a",,nil,

* File was misread

Note that throwing a file with malformed UTF-8 characters at the libraries triggered stream-decoding errors in SBCL before the data even reached the CSV libraries.
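Since some of the libraries read the sample bad data without signaling an error, a caller may want to check the parsed result itself. The following is a minimal sketch of one such check, for uneven field counts only. The function name uneven-rows is hypothetical, and the first row is assumed to have the correct number of fields. It accepts either a list of lists or a vector of vectors.

```lisp
;; Hypothetical helper, not part of any of the benchmarked libraries.
(defun uneven-rows (rows)
  "Return a list of (row-index . field-count) pairs for every row whose
field count differs from that of the first row."
  (when (plusp (length rows))
    (let ((expected (length (elt rows 0)))
          (index -1))
      (remove nil
              (map 'list
                   (lambda (row)
                     (incf index)
                     (unless (= (length row) expected)
                       (cons index (length row))))
                   rows)))))
```

This does not detect missing or partial quotes; a row that was misread because of a stray quote may still happen to have the expected number of fields.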

Time Results

Using SBCL version 1.3.3 on a Linux box.

Sample function calls

;; Each function reads FILE the given number of TIMES inside TIME, so the
;; reported figures cover the whole run rather than a single read.

(defun csv-parser-read (file times)
  (format t "CSV-parser-read ~a ~a~%" file times)
  (time (dotimes (i times)
          (let ((lst nil))
            ;; add-row-to-csv-list is a helper that is not shown here.
            (csv-parser:do-csv-file ((fields num-fields) file)
              (add-row-to-csv-list fields lst))))))

(defun cl-csv-read (file times)
  (format t "CL-CSV-read ~a ~a~%" file times)
  (time (dotimes (i times)
          (with-open-file (s file)
            (cl-csv:read-csv s)))))

(defun read-csv-read (file times)
  (format t "read-csv-read ~a ~a~%" file times)
  (time (dotimes (i times)
          (with-open-file (s file)
            (read-csv:parse-csv s)))))

(defun fare-csv-read (file times)
  (format t "fare-csv-read ~a ~a~%" file times)
  (time (dotimes (i times)
          (with-open-file (s file)
            (fare-csv:read-csv-stream s)))))

;; cl-simple-table takes the file name directly rather than a stream.
(defun cl-simple-table-read (file times)
  (format t "cl-simple-table-read ~a ~a~%" file times)
  (time (dotimes (i times)
          (cl-simple-table:read-csv file))))
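
The five functions above share the same shape, so the repetition could be factored into one small harness. The following is only a sketch of that idea and is not how the results in this document were produced. The name run-benchmark and its argument order are assumptions made for this example. It keeps the call to TIME and also returns the elapsed wall-clock seconds so the result can be used programmatically.

```lisp
;; Hypothetical harness, not the code used for the results reported here.
(defun run-benchmark (name reader file times)
  "Call READER, a function of one argument, on FILE the given number of TIMES.
Prints a log line, runs the loop under TIME, and returns elapsed real seconds."
  (format t "~&~a ~a ~a~%" name file times)
  (let ((start (get-internal-real-time)))
    (time (dotimes (i times)
            (funcall reader file)))
    (/ (- (get-internal-real-time) start)
       internal-time-units-per-second)))

;; Example call, assuming cl-csv is loaded and sample.csv exists:
;; (run-benchmark "cl-csv-read"
;;                (lambda (file)
;;                  (with-open-file (s file)
;;                    (cl-csv:read-csv s)))
;;                "sample.csv" 10)
```

Passing the reader as a closure adds a funcall per repetition, which should be negligible next to the cost of reading a file.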

File Size 500 with 11 Fields 50000 reps

File Size 15,000 with 5 Fields 2000 reps

File Size 157,000 with 82 Fields 250 Reps

File Size 13,260,000 with 8 Fields 10 Reps

Benchmark results for 1 read of a file of size 181,132,541

Oops. No library managed to read the file (the complete county file from https://www.census.gov/econ/cbp/download/). No error messages were thrown.