Consistent? What do you mean, "consistent"? Sometimes it's comma-separated, sometimes it's semicolon-separated (depending on the user's locale), sometimes it's separated by tabs (because it's a _C_SV file, yeah, no biggie), no content encoding hint (Unicode? Latin-1? Windows-1251? Windows-1252? Nobody knows), not to mention you've written this comment under an article that shows just about the least consistent behavior ever. (Line breaks? Ahahahaha!)
The only consistent thing about CSV is its ubiquity; other than that, it's a hairy, inconsistent mess that appears simple. (Source: having parsed millions of blobs that all identified themselves as CSV, despite being almost completely different in structure.)
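To make the encoding point concrete, here is a minimal sketch of the kind of guesswork you end up writing. The function name `decode_guess` and the order of candidate encodings are my own assumptions, not any standard; the fallback works only because Latin-1 maps every byte to some character, so it never fails (and may well be wrong).

```python
def decode_guess(raw, encodings=("utf-8-sig", "cp1252")):
    """Try a few plausible encodings in order; return (text, encoding_used).

    Hypothetical helper: the candidate list is an assumption and should be
    tuned to wherever your files actually come from.
    """
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # Latin-1 accepts any byte sequence, so this is a last resort
    # that always "succeeds", correct or not.
    return raw.decode("latin-1"), "latin-1"
```

Note that this can only tell you which decoding did not blow up, not which one the sender meant.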
You would think so, but people are dumb. I've seen tab-delimited files that are .CSV instead of .tsv, and I've also seen the semicolon delimiter a few times, though I can't recall where. I think Excel actually pops up a prompt when importing to confirm the delimiter in some cases?
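If you can't trust the extension, one rough way to guess the delimiter is to try each candidate and keep the one that splits every sampled row into the same number of columns (more than one). This is only a sketch: `guess_delimiter` and the candidate list are hypothetical, and the heuristic will be fooled by sufficiently weird files.

```python
import csv
import io

CANDIDATES = [",", ";", "\t"]  # assumption: the usual suspects

def guess_delimiter(sample, candidates=CANDIDATES, default=","):
    """Return the candidate giving the widest consistent column count."""
    best, best_cols = default, 1
    for delim in candidates:
        reader = csv.reader(io.StringIO(sample, newline=""), delimiter=delim)
        rows = [row for row in reader if row]  # skip blank lines
        widths = {len(row) for row in rows}
        # Accept a candidate only if every row has the same width.
        if len(widths) == 1:
            cols = widths.pop()
            if cols > best_cols:
                best, best_cols = delim, cols
    return best
```

For example, `guess_delimiter("a;b;c\n1;2;3\n")` picks the semicolon, and a file with no usable delimiter falls back to the comma default.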
From your link, it's quite clear that you should not assume any particular CSV file to follow any particular rules.
> Interoperability considerations:
> Due to lack of a single specification, there are considerable differences among implementations. Implementors should "be conservative in what you do, be liberal in what you accept from others" (RFC 793 [8]) when processing CSV files. An attempt at a common definition can be found in Section 2....
> Published specification:
> While numerous private specifications exist for various programs and systems, there is no single "master" specification for this format. An attempt at a common definition can be found in Section 2.
Section 2 states:
> This section documents the format that seems to be followed by most implementations:
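For the "conservative in what you do" half, a minimal sketch of writing output in the shape Section 2 describes (comma delimiter, CRLF line endings, double quotes doubled inside quoted fields) could look like this in Python. The helper name `write_rfc_style` is mine, and I'm assuming the standard `csv` module's default "excel" dialect is close enough to that shape.

```python
import csv
import io

def write_rfc_style(rows):
    """Serialize rows with comma delimiter, CRLF, and minimal quoting."""
    buf = io.StringIO(newline="")  # keep the writer's own line endings
    writer = csv.writer(
        buf,
        delimiter=",",
        quotechar='"',
        quoting=csv.QUOTE_MINIMAL,  # quote only fields that need it
        lineterminator="\r\n",
    )
    writer.writerows(rows)
    return buf.getvalue()
```

Whether the receiving end actually expects this is, of course, a separate question.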
"All theory, dear friend, is gray, but the golden tree of life springs ever green." -Goethe
If CSV were indeed always comma-separated, my hair would be at least 5% less gray. Alas, most programs emit semicolon-separated "CSV" in some locales (MS Office, LibreOffice, you-name-it-they-got-it).
Of course, I understand that your academic position "if it chokes the RFC-compliant parser, it's not a True CSV and should be sent to /dev/null" tautologically exists - but for some reason, users tend to object to such treatment (especially when they have no useful tools that would emit your One True Format for them).
TL;DR: there is no single standard fitting all the things that call themselves "CSV".
In other words, as soon as you start exchanging data, you'll get something that is complex, broken, or (most common case) both. The existence of a simple, consistent general format has not been conclusively proven impossible, but I have yet to see one in practice.
(Of course, everybody and their dog have cooked up simple data schemes, yes, but those are a) domain-specific, and b) not in widespread use.)