# CSV (siː ɛs viː)

Comma Seperated Values (CSV) is a format used to store data within a file in the form of rows and columns for data exchange.

This post is to show some of the nuances when dealing with CSV files.

The principle is very simple, but let us define the components of a CSV file:

• Field - The container that holds the value.
• Delimiter - What seperates each field on the same row (,).
• Quote Character - The character used to wrap a field containing special characters.
• Line Terminator - The character used to determing the end of a row of data (\n)

A common CSV file may look like this:

sample.csv

Upon parsing, we can produce the following table:

First Name Last Name Favourite Colour
Riley Meyer Blue

Here’s a screenshot of the file after being imported into Excel:

In this sample, the delimiter is the comma (,) character, and each field is the data inbetween.

But what happens when we get a little creative? Lets add a quote column to our sample file.

sample incorrect.csv

What we want is the following:

First Name Last Name Quote Favourite Colour
Riley Meyer To be, or not to be, that is the question Blue
Bradley Gibson Aspire to inspire before we expire. Green

However this is how the data is parsed in Excel:

Note how it picks up the next field from using the comma within the quote.

A similar result happens in LibreOffice Calc:

One solution, is to change the delimiter character to something other than a comma. A character that is not used among the data set, such as a pipe (|). This may work if you are certain this delimiter character is not among the dataset, which may not always be the case, especially when dealing with data that has been entered by a human; chat logs, Twitter/Instagram biographies, etc.

The solution is to wrap the fields containing special characters that are used to structure the file with a quote charatcer such as double quotes (").

Here’s the revised CSV file:

sample corrected.csv

Here is the revised CSV file when imported into Excel:

And here it is imported into LibreOffice Calc:

But what happens if the field we are wrapping in the quote character, contains said quote character? Well, you just escape it by prepending it with the quote character.

sample corrected with quotes.csv

Here it is imported into Excel:

And in Calc:

See how the Double quotes are escaped when parsed. The integrity of the data is maintained.

Wrapping a field in a quotation character is not limited to being able to use the delimiter character, but all special characters: new lines, tabs, commas, quotes, even a null terminator.

sample corrected with quotes and newlines.csv

Imported into Excel:

And in Calc:

See how content of the quote field is maintained. The excel preview doesn’t display the new line character, but does parse it correctly. LibreOffice Calc’s preview does show the new line character. The end result is identical, the content matches that of the CSV fields.

And just to show an extreme case of it working, lets go meta:

Here’s a CSV file containing the content of all of the above CSV files along with the filename and a brief description.

meta.csv

And here it is all correctly parsed into Excel:

and LibreOffice Calc: