Disclaimer: this page is the personal opinion of Robert Bor
I have taken a look at the following Java CSV libraries:
- Commons CSV; Apache Commons project
- Flatpack; extensive tool, much more than just CSV
- JavaCSV; two-class solution
- opencsv; very popular CSV-to-bean transformer
- Super CSV; extensively detailed CSV-to-bean transformer
- jcsv; CSV parser experimenting with annotations
- JSefa; XML/CSV/FLR reader/writer with annotations
I checked out all the documentation and sites I could find on the tooling. After that I downloaded the latest known versions of the libraries and ran a couple of tests against them. The combination of these two activities gave me a pretty good feeling for the quality of the products, enough to make a quality appraisal of each of them. Check out the Comparison Matrix to see the differences.
Apache Commons CSV set out with a noble goal:
There are three pre-existing BSD compatible CSV parsers which this component will hopefully make redundant
The project is still in the sandbox phase and has no releases in Maven Central yet. Given the time that has passed, it is doubtful it will ever grow to see its goal fulfilled. JARs have to be manually retrieved and added to the Maven dependencies.
Its primary focus is on the tokenizing process, at which it excels. There is no support for bean conversion. The documentation is extremely poor and mostly outdated, although the project is still being maintained.
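To make "tokenizing" concrete, here is a minimal sketch in plain Java of splitting a single line into cells while honoring quotes. It is not the Commons CSV API: the names LineTokenizer and tokenize are made up for illustration, and the sketch assumes a comma separator, double-quote escaping by doubling, and no line breaks inside cells.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of CSV tokenizing; not taken from any of the libraries reviewed.
final class LineTokenizer {

    // Splits one CSV line into cells, honoring double quotes and "" as an escaped quote.
    static List<String> tokenize(String line) {
        List<String> cells = new ArrayList<>();
        StringBuilder cell = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        cell.append('"'); // doubled quote inside a quoted cell
                        i++;
                    } else {
                        inQuotes = false; // closing quote
                    }
                } else {
                    cell.append(c); // separators are plain text while quoted
                }
            } else if (c == '"') {
                inQuotes = true; // opening quote
            } else if (c == ',') {
                cells.add(cell.toString()); // separator ends the current cell
                cell.setLength(0);
            } else {
                cell.append(c);
            }
        }
        cells.add(cell.toString()); // last cell has no trailing separator
        return cells;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("alpha,\"beta, with comma\",\"say \"\"hi\"\"\""));
    }
}
```

A real tokenizer also has to deal with line breaks inside quoted cells, configurable separators and malformed input, which is where the libraries differ.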
Flatpack is obviously a lot more than just CSV handling, as a glance at the website shows. Configuring it is convoluted and disassociated, relying on old-school XML configuration files.
Flatpack can be a bit mean by failing silently. You will have to check the error logs, because things can get very nasty given its apparent willingness to gobble up everything.
I dislike how you are forced to read the entire file into a DataSet and then have to iterate over that set.
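The difference between the two access styles can be sketched in plain Java. This is not Flatpack code: RowAccess, readAll and forEachRow are hypothetical names, and the semicolon split is deliberately naive (no quote handling). The point is only that a load-everything call is easy to build on top of a streaming one, while the reverse is not possible.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch contrasting load-everything and streaming row access.
final class RowAccess {

    // Load-everything style: all rows are in memory before the caller sees any of them.
    static List<String[]> readAll(Reader source) {
        List<String[]> rows = new ArrayList<>();
        forEachRow(source, rows::add);
        return rows;
    }

    // Streaming style: each row is handed to the caller as soon as it has been read.
    static void forEachRow(Reader source, Consumer<String[]> handler) {
        try {
            BufferedReader reader = new BufferedReader(source);
            String line;
            while ((line = reader.readLine()) != null) {
                handler.accept(line.split(";", -1)); // naive split, keeps trailing empty cells
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String data = "name;age\nAlice;34\nBob;27\n";
        forEachRow(new StringReader(data), row -> System.out.println(row[0]));
        System.out.println(readAll(new StringReader(data)).size() + " rows");
    }
}
```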
Though the project has Maven Central entries (albeit without source), it is outdated (last update: 2008) and offers no support for Bean conversion. The documentation of the project is quite good.
The first thing that is noteworthy about JavaCSV is that it has only two classes: CsvReader and CsvWriter. The ease of its interface is widely praised and has been a huge incentive for me to add the Facade layer to CSVeed.
The project has releases in Maven Central, although without source. The reader lets you read lines one by one; there seems to be no way to get them all at once. There is no support for bean conversion. Documentation is somewhat poor, since the website seems to have been used for another purpose as well.
An adorable statement on the opencsv author's blog:
For reasons I’ll never understand, my little Apache2 open source CSV parser, opencsv has recently trucked through the 20,000 download mark.
And understandably so. The library was far ahead of its time with rudimentary Bean conversion support. The coupling is slightly tedious, but it works. Documentation is fine, albeit a bit on the thin side. The library is registered in Maven Central. Active community with speedy bug fixing.
Its error feedback is very poor, a definite negative. It also does NOT support all basic object types out of the box, and there is no support for skipping comments and empty lines. Nevertheless, it was disruptive technology at the time and much needed, given its success.
I really have a soft spot for Super CSV. What the author says and does, the quality of his documentation and code, the references to RFC 4180: it all fits. Just have a look at the landing page and see the motivation and the author's background. Super CSV is nothing if not ambitious.
The project is extremely well documented and releases are available in Maven Central. If you are a corporate developer, eat your heart out.
Mapping must be done by column name; there does not seem to be a way to do this by column index. There are no annotations, and conversions to other classes have to be set manually with a disassociated mapping table (the "CellProcessor" approach). As advertised, Super CSV also supports deep conversions, although I haven't tried those.
Personally, I do not find the interface very good. It is irksome and counter-intuitive. The CellProcessor approach is the wrong way to go about it, in my humble opinion. Error handling is better than opencsv's, but still not the way it should be.
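What I mean by a disassociated mapping table can be sketched in plain Java. This is not Super CSV's actual API: ProcessorTableSketch and convert are hypothetical, and java.util.function.Function stands in for a cell processor. The column names and the conversions live in two separate structures that are matched purely by position, away from the bean they describe.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of a positional processor table; not Super CSV code.
final class ProcessorTableSketch {

    // Applies processors.get(i) to cells[i] and stores the result under names[i].
    static Map<String, Object> convert(String[] names,
                                       List<Function<String, Object>> processors,
                                       String[] cells) {
        if (names.length != processors.size() || names.length != cells.length) {
            throw new IllegalArgumentException("names, processors and cells must line up");
        }
        Map<String, Object> row = new LinkedHashMap<>();
        for (int i = 0; i < cells.length; i++) {
            row.put(names[i], processors.get(i).apply(cells[i]));
        }
        return row;
    }

    public static void main(String[] args) {
        // The mapping table: entry i must match column i, nothing enforces that at compile time.
        String[] names = {"name", "age", "active"};
        List<Function<String, Object>> processors = new ArrayList<>();
        processors.add(s -> s);            // name stays text
        processors.add(Integer::valueOf);  // age becomes an Integer
        processors.add(Boolean::valueOf);  // active becomes a Boolean
        System.out.println(convert(names, processors, new String[] {"Alice", "34", "true"}));
    }
}
```

Reordering a column means editing both structures in step, which is the part I find counter-intuitive.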
I mainly adopted jcsv into my comparison because of its usage of annotations. The project is obviously short-lived and has not been refined much. Interestingly enough, it has some nice documentation, including examples and a reference to RFC 4180.
The error feedback of jcsv is extremely poor and its bean conversion is error-ridden. Immature technology, not suitable for usage.
A colleague pointed me to JSefa as a reliable CSV reader. The first thing that caught my eye is that it aims to provide for XML, CSV and FLR. This approach has upsides and downsides: it is more versatile, but it is also harder to cater to the special requirements of a file structure, be they technical or process-wise.
Of the above libraries, this one gave me the most trouble when digging into its intricacies. Documentation on the API is scarce, though a number of useful sample projects have been provided. The library works with the concept of high- and low-level interfaces, high being similar to the bean reader and low to the row reader. Its killer feature is that it supports deep conversion.
What irks me most about this interface is that one must define all properties with annotations, even having to tell it which converter it must use, for example for converting text to Long. This is not very convenient. I also find the error feedback lacking: although the line and position are reported, you cannot see the cell that went wrong.
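The alternative I have in mind is letting the declared type of a property select a default converter, so that an explicit converter is only needed for exceptions. The following is a minimal sketch in plain Java using reflection. It is neither JSefa nor CSVeed code: TypeDrivenConversion, its DEFAULTS table and the Person bean are made up for illustration.

```java
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: pick a converter from the field type instead of declaring one per field.
final class TypeDrivenConversion {

    // Default converters keyed by field type; a real library would cover many more types.
    private static final Map<Class<?>, Function<String, Object>> DEFAULTS = new HashMap<>();
    static {
        DEFAULTS.put(String.class, s -> s);
        DEFAULTS.put(Long.class, Long::valueOf);
        DEFAULTS.put(Integer.class, Integer::valueOf);
    }

    // Hypothetical bean; note the absence of per-field converter declarations.
    static final class Person {
        String name;
        Long id;
    }

    // Sets the named field on the bean, converting the text according to the field's declared type.
    static void set(Object bean, String fieldName, String text) {
        try {
            Field field = bean.getClass().getDeclaredField(fieldName);
            Function<String, Object> converter = DEFAULTS.get(field.getType());
            if (converter == null) {
                throw new IllegalArgumentException("No default converter for " + field.getType());
            }
            field.setAccessible(true);
            field.set(bean, converter.apply(text));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot set field " + fieldName, e);
        }
    }

    public static void main(String[] args) {
        Person person = new Person();
        set(person, "name", "Alice");
        set(person, "id", "42");
        System.out.println(person.name + " has id " + person.id);
    }
}
```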
Check out the different attributes of the aforementioned Java CSV libraries and CSVeed in the Comparison Matrix.