package csv

A pure OCaml library to read and write CSV files.

Sources

csv-2.0.tbz
sha256=f7077c3928c3e36b207c06bd2e82625f5a92c480144f48ecc92e16c697d7ddbc
md5=b21dff064ec2151923cce97564688ea5

Module Csv

Read and write the CSV (comma separated values) format.

This library should be compatible with RFC4180 if one sets strip=false in the creation functions.

  • version 2.0
  • author Richard Jones <rjones@redhat.com>
  • author Christophe Troestler <Christophe.Troestler@umons.ac.be>

type t = string list list

Representation of CSV data in memory. This is a list of rows (also called records), each row being a list of columns.

Input/output objects

class type in_obj_channel = object ... end

The most basic input object for best interoperability.

class type out_obj_channel = object ... end

The most basic output object for best interoperability.

Input

exception Failure of int * int * string

Failure(nrecord, nfield, msg) is raised to indicate a parsing error for the field number nfield on the record number nrecord, the description msg says what is wrong. The first record and the first field of a record are numbered 1 (to correspond to the usual spreadsheet numbering but differing from List.nth of the OCaml representation).
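
As a minimal sketch of how the positions carried by Failure might be reported, the helper below (describe_parse is a hypothetical name, not part of the library) parses a string and formats either the record count or the error location. It assumes the csv library is installed and linked; the sample input with text after a closing quote is only expected, not guaranteed, to be rejected.

```ocaml
(* Sketch: turn a parsing failure into a readable message.
   [describe_parse] is an illustrative helper, not part of Csv. *)
let describe_parse (s : string) : string =
  let ic = Csv.of_string s in
  match Csv.input_all ic with
  | rows -> Printf.sprintf "parsed %d record(s)" (List.length rows)
  | exception Csv.Failure (nrecord, nfield, msg) ->
    (* Both numbers are 1-based, as in a spreadsheet. *)
    Printf.sprintf "record %d, field %d: %s" nrecord nfield msg

(* Text following a closing quote is a likely candidate for rejection. *)
let () = print_endline (describe_parse "a,b\n\"x\"y,z\n")
```

Note that Csv.Failure is distinct from the standard Failure exception, so it has to be matched with its module qualifier.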

type in_channel

Stateful handle to input CSV files.

val of_in_obj : ?separator:char -> ?strip:bool -> ?has_header:bool -> ?header:string list -> ?backslash_escape:bool -> ?excel_tricks:bool -> ?fix:bool -> in_obj_channel -> in_channel

of_in_obj ?separator ?excel_tricks in_chan creates a new "channel" to access the data in CSV form available from the channel in_chan.

  • parameter separator

    What character the separator is. The default is ','. You should be aware however that, in the countries where comma is used as a decimal separator, Excel will use ';' as the separator.

  • parameter strip

    Whether to remove the white space around unquoted fields. The default is true for backward compatibility reasons.

  • parameter has_header

    Indicates that the first row of the CSV channel is to be interpreted as a header (this row will not be returned by next). This is useful for the functions in the Rows module below. Default: false.

  • parameter header

    Supply the header to use for this CSV channel. If both header and has_header are given, the names of header take precedence; if a name in header is "", the one in the CSV header is used. If a name appears twice, only its first occurrence is used.

  • parameter backslash_escape

    Whether to allow \", \n,... in quoted fields. This is used by MySQL for example but is not standard CSV so it is set to false by default.

  • parameter excel_tricks

    enables Excel tricks, namely the fact that '"' followed by '0' in a quoted string means ASCII NULL and the fact that a field of the form ="..." only returns the string inside the quotes. Default: true.

  • parameter fix

    Parses the CSV data without raising the exception Csv.Failure. If the data does not conform to the CSV format (e.g. because of badly escaped quotes), try to repair it. Default: false.

val of_channel : ?separator:char -> ?strip:bool -> ?has_header:bool -> ?header:string list -> ?backslash_escape:bool -> ?excel_tricks:bool -> ?fix:bool -> Pervasives.in_channel -> in_channel

Same as Csv.of_in_obj except that the data is read from a standard channel.

val of_string : ?separator:char -> ?strip:bool -> ?has_header:bool -> ?header:string list -> ?backslash_escape:bool -> ?excel_tricks:bool -> ?fix:bool -> string -> in_channel

Same as Csv.of_in_obj except that the data is read from a string.
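
The effect of the separator and strip parameters can be illustrated with a small sketch that reads semicolon-separated data from a string. It assumes the csv library is available; the sample data and the names stripped and verbatim are made up for the example.

```ocaml
(* Sketch: same input read with and without white-space stripping. *)
let input = "name;qty\npear; 3\n"

(* Default strip=true: white space around unquoted fields is removed. *)
let stripped : Csv.t =
  Csv.input_all (Csv.of_string ~separator:';' input)

(* strip=false keeps unquoted fields verbatim (closer to RFC 4180). *)
let verbatim : Csv.t =
  Csv.input_all (Csv.of_string ~separator:';' ~strip:false input)
```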

val load : ?separator:char -> ?strip:bool -> ?backslash_escape:bool -> ?excel_tricks:bool -> ?fix:bool -> string -> t

load fname loads the CSV file fname. If fname is "-" then load from stdin.

  • parameter separator

    What character the separator is. The default is ','. You should be aware however that, in the countries where comma is used as a decimal separator, Excel will use ';' as the separator.

  • parameter backslash_escape

    Whether to allow \", \n,... in quoted fields. This is used by MySQL for example but is not standard CSV so it is set to false by default.

  • parameter excel_tricks

    enables Excel tricks, namely the fact that '"' followed by '0' in a quoted string means ASCII NULL and the fact that a field of the form ="..." only returns the string inside the quotes. Default: true.

  • parameter fix

    Fix invalid CSV in order to parse it without raising Csv.Failure. See of_in_obj.

val load_in : ?separator:char -> ?strip:bool -> ?backslash_escape:bool -> ?excel_tricks:bool -> ?fix:bool -> Pervasives.in_channel -> t

load_in ch loads a CSV file from the input channel ch. See Csv.load for the meaning of separator and excel_tricks.

val to_in_obj : in_channel -> in_obj_channel

For efficiency reasons, the in_channel buffers the data from the original channel. If you want to examine the data by means other than the functions below (say after a failure), you need to use this function in order not to lose the data in the buffer.

val close_in : in_channel -> unit

close_in ic closes the channel ic. The underlying channel is closed as well.

val next : in_channel -> string list

next ic returns the next record in the CSV file.

val fold_left : f:('a -> string list -> 'a) -> init:'a -> in_channel -> 'a

fold_left f a ic computes (f ... (f (f a r1) r2) ... rN) where r1,...,rN are the records in the CSV file. If f raises an exception, the record available at that moment is accessible through Csv.current_record.
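
A possible use of fold_left is to accumulate a value over the records without loading them all into memory. The sketch below sums the second column of some made-up two-column data; total_qty is an illustrative name and the csv library is assumed to be installed.

```ocaml
(* Sketch: sum the second field of every two-field record. *)
let total_qty : int =
  let ic = Csv.of_string "pear,3\napple,4\n" in
  Csv.fold_left ic ~init:0 ~f:(fun acc row ->
    match row with
    | [_; qty] -> acc + int_of_string qty
    | _ -> acc (* ignore records of any other shape *))
```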

val fold_right : f:(string list -> 'a -> 'a) -> in_channel -> 'a -> 'a

fold_right f ic a computes (f r1 ... (f rN-1 (f rN a)) ...) where r1,...,rN-1, rN are the records in the CSV file. All records are read before applying f so this method is not convenient if your file is large.

val iter : f:(string list -> unit) -> in_channel -> unit

iter f ic iterates f on all remaining records. If f raises an exception, the record available at that moment is accessible through Csv.current_record.

val input_all : in_channel -> t

input_all ic returns a list of the CSV records until the end of the file.

val current_record : in_channel -> string list

The current record under examination. This is useful in order to gather the parsed data in case of Failure.

val load_rows : ?separator:char -> ?strip:bool -> ?backslash_escape:bool -> ?excel_tricks:bool -> ?fix:bool -> (string list -> unit) -> Pervasives.in_channel -> unit

Output

type out_channel

val to_out_obj : ?separator:char -> ?backslash_escape:bool -> ?excel_tricks:bool -> ?quote_all:bool -> out_obj_channel -> out_channel

to_out_obj ?separator ?excel_tricks out_chan creates a new "channel" to output the data in CSV form.

  • parameter separator

    What character the separator is. The default is ','.

  • parameter backslash_escape

    Prefer to escape the quote in a quoted string with a backslash (e.g. "\"") instead of doubling it. Also backslash-escape '\n', '\r', '\t', '\b', '\026' (as '\Z') and '\000' (as '\0'). This is nice for interoperability but is nonstandard CSV, so it is set to false by default.

  • parameter excel_tricks

    enables Excel tricks, namely the fact that '\000' is represented as '"' followed by '0' and the fact that a field with leading or trailing spaces or a leading '0' will be encoded as ="..." (to avoid Excel "helping" you). Default: false.

  • parameter quote_all

    Force all fields to be quoted, even if this is not required by the CSV specification.

val to_channel : ?separator:char -> ?backslash_escape:bool -> ?excel_tricks:bool -> ?quote_all:bool -> Pervasives.out_channel -> out_channel

Same as Csv.to_out_obj but output to a standard channel.

val to_buffer : ?separator:char -> ?backslash_escape:bool -> ?excel_tricks:bool -> ?quote_all:bool -> Buffer.t -> out_channel

Same as Csv.to_out_obj but output to a buffer.

val close_out : out_channel -> unit

close_out oc closes the channel oc. The underlying channel is closed as well.

val output_record : out_channel -> string list -> unit

output_record oc r writes the record r in CSV form to the channel oc.

val output_all : out_channel -> t -> unit

output_all oc csv outputs all records in csv to the channel oc.
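
The output functions above can be combined to serialize data in memory. The following sketch, which assumes the csv library is installed, writes some made-up records to a Buffer.t and parses the text back; data, text and reread are illustrative names.

```ocaml
(* Sketch: write records to a buffer, then read them back. *)
let data : Csv.t = [["name"; "note"]; ["pear"; "green, ripe"]]

let text : string =
  let buf = Buffer.create 64 in
  let oc = Csv.to_buffer buf in
  Csv.output_all oc data;
  Csv.close_out oc;
  Buffer.contents buf

(* The field containing a comma has to be quoted on output, so parsing
   the text again is expected to give back the original records. *)
let reread : Csv.t = Csv.input_all (Csv.of_string text)
```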

val save_out : ?separator:char -> ?backslash_escape:bool -> ?excel_tricks:bool -> Pervasives.out_channel -> t -> unit
  • deprecated

    Save string list list to a channel.

val save : ?separator:char -> ?backslash_escape:bool -> ?excel_tricks:bool -> ?quote_all:bool -> string -> t -> unit

save fname csv saves the csv data to the file fname.
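
A file round trip with save and load could look like the sketch below. It assumes the csv library is installed; round_trip is a hypothetical helper and the temporary file name is chosen by Filename.temp_file from the standard library.

```ocaml
(* Sketch: save CSV data to a temporary file and load it back. *)
let round_trip (data : Csv.t) : Csv.t =
  let fname = Filename.temp_file "example" ".csv" in
  Csv.save fname data;
  let loaded = Csv.load fname in
  Sys.remove fname; (* clean up the temporary file *)
  loaded
```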

val print : ?separator:char -> ?backslash_escape:bool -> ?excel_tricks:bool -> ?quote_all:bool -> t -> unit

Print the CSV data.

val print_readable : t -> unit

Print the CSV data to stdout in a human-readable format. Not much is guaranteed about how the CSV is printed, except that it will be easier to follow than a "raw" output done with Csv.print. This is a one-way operation. There is no easy way to parse the output of this command back into CSV data.

val save_out_readable : Pervasives.out_channel -> t -> unit

As for Csv.print_readable, allowing the output to be sent to a channel.

Functions to access rows when a header is present

module Row : sig ... end

A row with a header.

module Rows : sig ... end

Accessing rows (when a header was provided).

Functions acting on CSV data loaded in memory

val lines : t -> int

Return the number of lines in the CSV data.

val columns : t -> int

Work out the (maximum) number of columns in a CSV file. Note that each line may be a different length, so this finds the one with the most columns.

val trim : ?top:bool -> ?left:bool -> ?right:bool -> ?bottom:bool -> t -> t

This takes a CSV file and trims empty cells.

All four of the optional arguments (~top, ~left, ~right, ~bottom) default to true.

The exact behaviour is:

~right: If true, remove any empty cells at the right hand end of any row. The number of columns in the resulting CSV structure will not necessarily be the same for each row.

~top: If true, remove any empty rows (no cells, or containing just empty cells) from the top of the CSV structure.

~bottom: If true, remove any empty rows from the bottom of the CSV structure.

~left: If true, remove any empty columns from the left of the CSV structure. Note that ~left and ~right are quite different: ~left considers the whole CSV structure, whereas ~right considers each row in isolation.
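
The four trimming rules can be seen together on a small, made-up grid. This is only a sketch that assumes the csv library is installed; padded and trimmed are illustrative names.

```ocaml
(* Sketch: a grid padded with empty cells on every side. *)
let padded : Csv.t =
  [ [""; ""];          (* empty row at the top *)
    [""; "a"; ""];     (* empty cell on the left and on the right *)
    [""; "b"; "c"];
    [] ]               (* empty row at the bottom *)

(* All four options default to true. *)
let trimmed : Csv.t = Csv.trim padded

(* Keeping the left column: only ~top, ~right and ~bottom apply. *)
let keep_left : Csv.t = Csv.trim ~left:false padded
```

Under the rules described above, trimmed should contain the rows a and b, c, while keep_left should retain the leading empty cell of each row.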

val square : t -> t

Make the CSV data "square" (actually rectangular). This pads out each row with empty cells so that all rows are the same length as the longest row. After this operation, every row will have length Csv.columns.

val is_square : t -> bool

Return true iff the CSV is "square" (actually rectangular). This means that each row has the same number of cells.
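
A short sketch of how lines, columns, square and is_square relate, assuming the csv library is installed (ragged and squared are illustrative names):

```ocaml
(* Sketch: pad a ragged CSV so that every row has the same length. *)
let ragged : Csv.t = [["a"]; ["b"; "c"]]

let squared : Csv.t = Csv.square ragged

let () =
  Printf.printf "lines=%d columns=%d square before=%b after=%b\n"
    (Csv.lines ragged) (Csv.columns ragged)
    (Csv.is_square ragged) (Csv.is_square squared)
```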

val set_columns : cols:int -> t -> t

set_columns cols csv makes the CSV data square by forcing the width to the given number of cols. Any short rows are padded with blank cells. Any long rows are truncated.

val set_rows : rows:int -> t -> t

set_rows rows csv makes the CSV data have exactly rows rows by adding empty rows or truncating rows as necessary.

Note that set_rows does not make the CSV square. If you want it to be square, call either Csv.square or Csv.set_columns after.

val set_size : rows:int -> cols:int -> t -> t

set_size rows cols csv makes the CSV data square by forcing the size to rows * cols, adding blank cells or truncating as necessary. It is the same as calling set_columns cols (set_rows rows csv)
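
The equivalence stated above can be checked on a small example. This sketch assumes the csv library is installed; wide, resized and composed are illustrative names.

```ocaml
(* Sketch: force a 1x3 CSV into a 2x2 shape. *)
let wide : Csv.t = [["a"; "b"; "c"]]

(* The long row is truncated and a blank row is appended. *)
let resized : Csv.t = Csv.set_size ~rows:2 ~cols:2 wide

(* The same result obtained in two steps. *)
let composed : Csv.t = Csv.set_columns ~cols:2 (Csv.set_rows ~rows:2 wide)
```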

val sub : r:int -> c:int -> rows:int -> cols:int -> t -> t

sub r c rows cols csv returns a subset of csv. The subset is defined as having top left corner at row r, column c (counting from 0) and being rows deep and cols wide.

The returned CSV will be "square".
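
For instance, extracting the lower right 2x2 block of a 3x3 grid might be written as follows (a sketch assuming the csv library is installed; grid and block are illustrative names):

```ocaml
(* Sketch: rows and columns are counted from 0. *)
let grid : Csv.t =
  [ ["a"; "b"; "c"];
    ["1"; "2"; "3"];
    ["4"; "5"; "6"] ]

(* Top left corner at row 1, column 1; 2 rows deep, 2 columns wide. *)
let block : Csv.t = Csv.sub ~r:1 ~c:1 ~rows:2 ~cols:2 grid
```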

val compare : t -> t -> int

Compare two CSV files for equality, ignoring blank cells at the end of a row, and empty rows appended to one or the other. This is "semantic" equality - roughly speaking, the two CSV files would look the same if opened in a spreadsheet program.
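
The difference from structural equality can be sketched as follows, assuming the csv library is installed; the two values differ only by trailing blank cells and an appended empty row.

```ocaml
(* Sketch: two CSVs that would look the same in a spreadsheet. *)
let x : Csv.t = [["a"; ""]; ["b"]]
let y : Csv.t = [["a"]; ["b"; ""]; []]

let same_semantically : bool = Csv.compare x y = 0
let same_structurally : bool = (x = y)
```

Here same_semantically is expected to be true while same_structurally is false, since the lists themselves differ.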

val concat : t list -> t

Concatenate CSV files so that they appear side by side, arranged left to right across the page. Each CSV file (except the final one) is first squared.

(To concatenate CSV files so that they appear from top to bottom, just use List.concat).
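
The two directions of concatenation can be compared in a short sketch (csv library assumed installed; left, right, side_by_side and stacked are illustrative names):

```ocaml
(* Sketch: horizontal versus vertical concatenation. *)
let left : Csv.t = [["a"]; ["b"]]
let right : Csv.t = [["1"]; ["2"]]

(* Columns of [right] are placed to the right of those of [left]. *)
let side_by_side : Csv.t = Csv.concat [left; right]

(* Rows of [right] are placed below those of [left]. *)
let stacked : Csv.t = List.concat [left; right]
```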

val transpose : t -> t

Permutes the lines and columns of the CSV data. Nonexistent cells become empty cells after transpose if they must be created.
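
A minimal sketch on rectangular data, assuming the csv library is installed (table and flipped are illustrative names):

```ocaml
(* Sketch: rows become columns and vice versa. *)
let table : Csv.t = [["a"; "b"]; ["c"; "d"]]

let flipped : Csv.t = Csv.transpose table
```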

val to_array : t -> string array array
val of_array : string array array -> t

Convenience functions to convert to and from a matrix representation. to_array will produce a ragged matrix (not all rows will have the same length) unless you call Csv.square first.
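
Squaring before converting, as suggested above, might look like this sketch (csv library assumed installed; matrix and back are illustrative names):

```ocaml
(* Sketch: obtain a rectangular matrix from ragged CSV data. *)
let matrix : string array array =
  Csv.to_array (Csv.square [["a"]; ["b"; "c"]])

(* Converting back gives the squared list representation. *)
let back : Csv.t = Csv.of_array matrix
```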

val associate : string list -> t -> (string * string) list list

associate header data takes a block of data and converts each row in turn into an assoc list which maps column header to data cell.

Typically a spreadsheet will have the format:

  header1   header2   header3
  data11    data12    data13
  data21    data22    data23
  ...

This function arranges the data into a more usable form which is robust against changes in column ordering. The output of the function is:

  [ ["header1", "data11"; "header2", "data12"; "header3", "data13"];
    ["header1", "data21"; "header2", "data22"; "header3", "data23"];
    etc. ]

Each row is turned into an assoc list (see List.assoc).

If a row is too short, it is padded with empty cells (""). If a row is too long, it is truncated.

You would typically call this function as:

let header, data = match csv with h :: d -> h, d | [] -> assert false;;
let data = Csv.associate header data;;

The header strings are shared, so the actual space in memory consumed by the spreadsheet is not much larger.
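
Once the rows are assoc lists, a column can be read by name regardless of its position. The sketch below assumes the csv library is installed; the data and the names csv, records and quantities are made up for illustration.

```ocaml
(* Sketch: look up the "qty" column by name in every row. *)
let csv : Csv.t = [["name"; "qty"]; ["pear"; "3"]; ["apple"]]

let records : (string * string) list list =
  match csv with
  | header :: data -> Csv.associate header data
  | [] -> []

(* The short "apple" row is padded, so its "qty" is the empty string. *)
let quantities : string list = List.map (List.assoc "qty") records
```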

val combine : header:string list -> string list -> (string * string) list

combine ~header row returns a row with elements (h, x) where h is the header name and x the corresponding row entry. If the row has fewer entries than header, the missing entries are interpreted as being empty. See associate which applies this function to all rows.
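
For a single row that is shorter than the header, a sketch (csv library assumed installed; pairs is an illustrative name):

```ocaml
(* Sketch: the missing second entry is treated as empty. *)
let pairs : (string * string) list =
  Csv.combine ~header:["x"; "y"] ["1"]
```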

val map : f:(string -> string) -> t -> t

map f csv applies f to all entries of csv and returns the resulting CSV.
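
As an illustration, every cell can be upper-cased with a function from the standard library (a sketch assuming the csv library is installed; shouted is an illustrative name):

```ocaml
(* Sketch: apply String.uppercase_ascii to each entry. *)
let shouted : Csv.t =
  Csv.map ~f:String.uppercase_ascii [["a"; "b"]; ["c"]]
```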
