Diff: CSV - Waikato Linux Users Group

Differences between current version and previous revision of CSV.


Newer page:	version 3	Last edited on Sunday, July 4, 2004 5:24:31 am	by AristotlePagaltzis
Older page:	version 2	Last edited on Tuesday, February 17, 2004 5:43:47 pm	by AristotlePagaltzis

@@ -1,3 +1,11 @@

-~~InNeedOfRefactor~~

+An [Acronym] for __C__haracter __S__eparated __V__alues (originally, __C__omma __S__eparated __V__alues).

-[~~Acronym~~ ] ~~for CommaSeparatedVector or~~ __C __~~omma __S__eparated __V__alues~~ .

+A FileFormat to store a table of data. Each row is stored on one line of a FlatFile as a list of values separated by a certain character. The common separator used to be a comma, but nowadays, semicolons and tabs seem to be the most prevalent choices. The pipe character is also occasionally encountered. [CSV] files __should__ have the same number of values on every line.
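
As a rough illustration of the layout described above, the following Python sketch reads separator-delimited text and checks that every line carries the same number of values. The helper name `read_table`, the semicolon default, and the sample data are assumptions made for this example only.

```python
import csv
import io


def read_table(text, separator=";"):
    """Parse separator-delimited text and verify that all rows have equal width."""
    rows = list(csv.reader(io.StringIO(text), delimiter=separator))
    widths = {len(row) for row in rows}
    if len(widths) > 1:
        # The format only says rows *should* match; this sketch chooses to be strict.
        raise ValueError(f"inconsistent field counts: {sorted(widths)}")
    return rows


# Hypothetical sample: one header line followed by two records.
sample = "name;city\nAda;London\nLinus;Helsinki\n"
table = read_table(sample)
```

Raising an error on ragged rows is a design choice for this sketch; a more lenient importer might pad short rows instead.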

+

+Including the separator character in a data field is tricky business. The [CSV] format was never formally defined, so no standard quoting or escaping rules exist. As a result, while most programs read and write [CSV] files in similar ways, they often disagree about edge cases. Most data importers recognise content within double quotes as quoted literal strings within which separators are to be disregarded, but again, the same issues arise when including the quote character in a data field. Exchanging [CSV] files can therefore be a maddening experience.
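
The quoting problem can be sketched with Python's standard `csv` module. The example below assumes one particular convention, namely double quotes around fields and a doubled quote character inside them, which is that module's default; other tools may expect backslash escapes or something else entirely. The field values are made up for illustration.

```python
import csv
import io

# Hypothetical record: one field contains the separator, another the quote character.
fields = ["Smith, John", 'said "hi"', "plain"]

buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=",", quotechar='"', quoting=csv.QUOTE_MINIMAL)
writer.writerow(fields)
line = buffer.getvalue()

# A reader using the same convention recovers the original three fields.
parsed = next(csv.reader(io.StringIO(line), delimiter=",", quotechar='"'))

# A naive split on the separator ignores quoting and breaks the first field apart.
naive = line.rstrip("\r\n").split(",")
```

Comparing `parsed` with `naive` shows why an importer that ignores quoting, or assumes a different escaping rule, will misread such a file.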

+

+Files also differ in whether they contain headers, i.e. column names, which, if present, are stored in the same form as a record on the first line of the file. You should always generate [CSV] files with headers; they make a [CSV] file much easier to interpret and significantly reduce the likelihood of miscommunication.
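
A minimal sketch of writing and reading a header line, using `csv.DictWriter` and `csv.DictReader` from Python's standard library. The column names, the records, and the semicolon separator are assumptions chosen for the example.

```python
import csv
import io

# Hypothetical records keyed by column name.
records = [
    {"name": "Ada", "city": "London"},
    {"name": "Linus", "city": "Helsinki"},
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["name", "city"], delimiter=";")
writer.writeheader()        # first line holds the column names, in record form
writer.writerows(records)

# Reading back: DictReader takes the first line as the header by default.
buffer.seek(0)
reader = csv.DictReader(buffer, delimiter=";")
round_tripped = [dict(row) for row in reader]
```

Because the reader maps values to names taken from the file itself, a consumer does not have to guess which column is which.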

+

+[CSV] files were very common in the days of limited computing power. Nowadays, most data is instead stored in [RelationalDataBase]s and [PostRelationalDataBase]s. Unlike the fields in a relation, data in [CSV] files is not intrinsically typed, and each field is of varying length. There is also no standard means to cross-reference between [CSV] tables or to handle null values.
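
Since values arrive as untyped strings, any typing has to be imposed by the reader. The sketch below assumes a hypothetical schema (`SCHEMA`) and assumes that an empty field stands for a null value; both are conventions the parties would have to agree on, not something the format provides.

```python
import csv
import io

# Hypothetical schema: column name mapped to a conversion function.
SCHEMA = {"name": str, "age": int, "height_m": float}


def typed_rows(text, schema, separator=";", null_marker=""):
    """Yield one dict per record, converting each field and mapping the null marker to None."""
    for row in csv.DictReader(io.StringIO(text), delimiter=separator):
        yield {
            column: None if row[column] == null_marker else convert(row[column])
            for column, convert in schema.items()
        }


# Hypothetical sample: the second record has no age.
sample = "name;age;height_m\nAda;36;1.65\nLinus;;1.78\n"
people = list(typed_rows(sample, SCHEMA))
```

Note that under this convention an empty string and a missing value are indistinguishable, which is exactly the kind of ambiguity a typed relation avoids.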

+

+Since the format is very compact but human readable, [CSV] files are still an excellent way to exchange tabular data, provided all parties negotiate the exact [CSV] dialect beforehand.
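
One way to record a negotiated dialect is to write it down once and have both the exporting and importing side refer to it by name. The sketch below uses Python's `csv.register_dialect`; the dialect name `agreed` and every parameter value in it are assumptions standing in for whatever the parties actually settle on.

```python
import csv
import io


class AgreedDialect(csv.Dialect):
    """Hypothetical dialect: the values here stand in for the negotiated ones."""
    delimiter = ";"
    quotechar = '"'
    doublequote = True          # a quote inside a field is written twice
    skipinitialspace = False
    lineterminator = "\n"
    quoting = csv.QUOTE_MINIMAL


csv.register_dialect("agreed", AgreedDialect)

# Exporter and importer both refer to the same named dialect.
buffer = io.StringIO()
csv.writer(buffer, dialect="agreed").writerow(["a;b", "c"])
exported = buffer.getvalue()
imported = next(csv.reader(io.StringIO(exported), dialect="agreed"))
```

Keeping the parameters in one named definition means a later change to the agreement only has to be made in one place.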