Commit c7ae7ff

Migrate the User Guide to Javadoc
1 parent 9fc2f8f commit c7ae7ff

3 files changed

+320
-189
lines changed

src/main/javadoc/overview.html

Lines changed: 317 additions & 0 deletions
@@ -0,0 +1,317 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<html>
<head>
<title>Apache Commons CSV Overview</title>
</head>
<body>
<img src="../images/commons-logo.png" alt="Apache Commons CSV">
<p>
You can find the Javadoc package list at the <a href="#all-packages-table">bottom of this page</a>.
</p>
<section>
<h1>Introducing Commons CSV</h1>
<p>Apache Commons CSV reads and writes files in variations of the Comma Separated Value (CSV) format.</p>
<p>
Common CSV formats are predefined in the <a href="org/apache/commons/csv/CSVFormat.html">CSVFormat</a> class:
</p>
<table>
<caption>CSV Formats</caption>
<thead>
<tr>
<th scope="col">CSVFormat</th>
<th scope="col">Description</th>
<th scope="col">Since Version</th>
</tr>
</thead>
<tbody>
<tr>
<td><a href="org/apache/commons/csv/CSVFormat.html#DEFAULT">DEFAULT</a></td>
<td>IO for the Standard Comma Separated Value format, like <a href="https://datatracker.ietf.org/doc/html/rfc4180">RFC 4180</a> but allowing
empty lines.
</td>
<td>1.0</td>
</tr>
<tr>
<td><a href="org/apache/commons/csv/CSVFormat.html#EXCEL">EXCEL</a></td>
<td>IO for the <a href="https://support.microsoft.com/en-us/office/import-or-export-text-txt-or-csv-files-5250ac4c-663c-47ce-937b-339e391393ba">Microsoft
Excel CSV</a> format.
</td>
<td>1.0</td>
</tr>
<tr>
<td><a href="org/apache/commons/csv/CSVFormat.html#INFORMIX_UNLOAD">INFORMIX_UNLOAD</a></td>
<td>IO for the <a href="https://www.ibm.com/docs/en/informix-servers/14.10?topic=statements-unload-statement">Informix UNLOAD TO file_name</a>
command.
</td>
<td>1.3</td>
</tr>
<tr>
<td><a href="org/apache/commons/csv/CSVFormat.html#INFORMIX_UNLOAD_CSV">INFORMIX_UNLOAD_CSV</a></td>
<td>IO for the <a href="https://www.ibm.com/docs/en/informix-servers/14.10?topic=statements-unload-statement">Informix UNLOAD CSV TO
file_name</a> command with escaping disabled.
</td>
<td>1.3</td>
</tr>
<tr>
<td><a href="org/apache/commons/csv/CSVFormat.html#MONGODB_CSV">MONGODB_CSV</a></td>
<td>IO for the <a href="https://docs.mongodb.com/manual/reference/program/mongoexport/">MongoDB CSV <code>mongoexport</code></a> command.
</td>
<td>1.7</td>
</tr>
<tr>
<td><a href="org/apache/commons/csv/CSVFormat.html#MONGODB_TSV">MONGODB_TSV</a></td>
<td>IO for the <a href="https://docs.mongodb.com/manual/reference/program/mongoexport/">MongoDB Tab Separated Values (TSV) <code>mongoexport</code></a>
command.
</td>
<td>1.7</td>
</tr>
<tr>
<td><a href="org/apache/commons/csv/CSVFormat.html#MYSQL">MYSQL</a></td>
<td>IO for the <a href="https://dev.mysql.com/doc/refman/8.0/en/mysqldump-delimited-text.html">MySQL CSV</a> format.
</td>
<td>1.0</td>
</tr>
<tr>
<td><a href="org/apache/commons/csv/CSVFormat.html#ORACLE">ORACLE</a></td>
<td>IO for the <a href="https://docs.oracle.com/database/121/SUTIL/GUID-D1762699-8154-40F6-90DE-EFB8EB6A9AB0.htm#SUTIL4217">Oracle CSV</a> format
of the SQL*Loader utility.
</td>
<td>1.6</td>
</tr>
<tr>
<td><a href="org/apache/commons/csv/CSVFormat.html#POSTGRESQL_CSV">POSTGRESQL_CSV</a></td>
<td>IO for the <a href="https://www.postgresql.org/docs/current/static/sql-copy.html">PostgreSQL CSV</a> format used by the <code>COPY</code>
operation.
</td>
<td>1.5</td>
</tr>
<tr>
<td><a href="org/apache/commons/csv/CSVFormat.html#POSTGRESQL_TEXT">POSTGRESQL_TEXT</a></td>
<td>IO for the <a href="https://www.postgresql.org/docs/current/static/sql-copy.html">PostgreSQL Text</a> format used by the <code>COPY</code>
operation.
</td>
<td>1.5</td>
</tr>
<tr>
<td><a href="org/apache/commons/csv/CSVFormat.html#RFC4180">RFC4180</a></td>
<td>IO for the RFC-4180 format defined by <a href="https://datatracker.ietf.org/doc/html/rfc4180">RFC 4180</a>.
</td>
<td>1.0</td>
</tr>
<tr>
<td><a href="org/apache/commons/csv/CSVFormat.html#TDF">TDF</a></td>
<td>IO for the <a href="https://en.wikipedia.org/wiki/Tab-separated_values">Tab Delimited Format</a> (also known as Tab Separated Values).
</td>
<td>1.0</td>
</tr>
</tbody>
</table>
<p>Custom formats can be created using a fluent style API.</p>
</section>
<section>
<h1>Parsing Standard CSV Files</h1>
<p>
Parsing files with Apache Commons CSV is straightforward. Pick a
<code>CSVFormat</code>
and go from there.
</p>
<section>
<h2>Parsing an Excel CSV File</h2>
<p>To parse an Excel CSV file, write:</p>
<pre>
<code>
Reader in = new FileReader(&quot;path/to/file.csv&quot;);
// Read the column names from the first record so values can be looked up by name.
Iterable&lt;CSVRecord&gt; records = CSVFormat.EXCEL.builder()
    .setHeader()
    .get()
    .parse(in);
for (CSVRecord record : records) {
    String lastName = record.get("Last Name");
    String firstName = record.get("First Name");
}
</code>
</pre>
</section>
</section>
<section>
<h1>Parsing Custom CSV Files</h1>
<p>
You can define your own IO rules by building your own CSVFormat instance. Calling
<code>builder()</code>
on a predefined format, such as <code>CSVFormat.DEFAULT</code>, lets you start from that format and customize it. For example:
</p>
<pre>
<code>
CSVFormat myFormat = CSVFormat.DEFAULT.builder()
    .setCommentMarker('#')
    .setEscape('+')
    .setIgnoreSurroundingSpaces(true)
    .setQuote('"')
    .setQuoteMode(QuoteMode.ALL)
    .get();
</code>
</pre>
</section>
<section>
<h1>Handling Byte Order Marks</h1>
<p>
To handle files that start with a Byte Order Mark (BOM), like some Excel CSV files, you need an extra step to deal with the optional BOM bytes. Using the
<a href="https://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/input/BOMInputStream.html">BOMInputStream</a> class from
<a href="https://commons.apache.org/proper/commons-io/">Apache Commons IO</a> simplifies this task; for example:
</p>
<pre>
<code>
try (Reader reader = new InputStreamReader(BOMInputStream.builder()
        .setPath(path)
        .get(), StandardCharsets.UTF_8);
    CSVParser parser = CSVFormat.EXCEL.builder()
        .setHeader()
        .get()
        .parse(reader)) {
    for (CSVRecord record : parser) {
        String string = record.get("ColumnA");
        // ...
    }
}
</code>
</pre>
<p>You might find it handy to create something like this:</p>
<pre>
<code>
/**
 * Creates a reader capable of handling BOMs.
 *
 * @param path The path to read.
 * @return a new InputStreamReader for UTF-8 bytes.
 * @throws IOException if an I/O error occurs.
 */
public InputStreamReader newReader(final Path path) throws IOException {
    return new InputStreamReader(BOMInputStream.builder()
        .setPath(path)
        .get(), StandardCharsets.UTF_8);
}
</code>
</pre>
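<p>To make the role of the BOM concrete, here is a minimal sketch that skips a UTF-8 BOM using only the Java standard library. It is not how <code>BOMInputStream</code> is implemented; the names <code>BomSketch</code>, <code>skipUtf8Bom</code>, and <code>firstLine</code> are hypothetical, and only the UTF-8 mark (bytes EF BB BF) is handled.</p>

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Commons CSV or Commons IO.
final class BomSketch {

    /** Returns a stream positioned after a leading UTF-8 BOM, if one is present. */
    static InputStream skipUtf8Bom(final InputStream in) throws IOException {
        final PushbackInputStream pushback = new PushbackInputStream(in, 3);
        final byte[] head = new byte[3];
        int n = 0;
        // Read up to three bytes; a single read may return fewer than requested.
        while (n < 3) {
            final int r = pushback.read(head, n, 3 - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        final boolean hasBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!hasBom && n > 0) {
            // Not a BOM: push the bytes back so the caller sees the full content.
            pushback.unread(head, 0, n);
        }
        return pushback;
    }

    /** Decodes the bytes as UTF-8 and returns the first line, ignoring a leading BOM. */
    static String firstLine(final byte[] bytes) {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                skipUtf8Bom(new ByteArrayInputStream(bytes)), StandardCharsets.UTF_8))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

<p>Without this step, the decoded BOM would appear as the character U+FEFF at the start of the first header name, so a lookup such as <code>record.get("ColumnA")</code> would likely fail to match.</p>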
</section>
<section>
<h1>Using Headers</h1>
<p>
Apache Commons CSV provides several ways to access record values. The simplest way is to access values by their index in the record. However, columns in
CSV files often have a name, for example: ID, CustomerNo, Birthday, etc. The CSVFormat class provides an API for specifying these <i>header</i> names, and
CSVRecord has methods to access values by their corresponding header name.
</p>
<section>
<h2>Accessing column values by index</h2>
<p>To access a record value by index, no special configuration of the CSVFormat is necessary:</p>
<pre>
<code>
Reader in = new FileReader(&quot;path/to/file.csv&quot;);
Iterable&lt;CSVRecord&gt; records = CSVFormat.RFC4180.parse(in);
for (CSVRecord record : records) {
    String columnOne = record.get(0);
    String columnTwo = record.get(1);
}
</code>
</pre>
</section>
<section>
<h2>Defining a header manually</h2>
<p>Indices may not be the most intuitive way to access record values. For this reason, it is possible to assign names to each column in the file:</p>
<pre>
<code>
Reader in = new FileReader(&quot;path/to/file.csv&quot;);
Iterable&lt;CSVRecord&gt; records = CSVFormat.RFC4180.builder()
    .setHeader("ID", "CustomerNo", "Name")
    .get()
    .parse(in);
for (CSVRecord record : records) {
    String id = record.get("ID");
    String customerNo = record.get("CustomerNo");
    String name = record.get("Name");
}
</code>
</pre>
<p>Note that column values can still be accessed using their index.</p>
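<p>One way to picture why both access styles work side by side is a map from header name to column index. The following is a conceptual sketch only, not the library's implementation; <code>HeaderMapSketch</code> and its methods are hypothetical names.</p>

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Conceptual model for illustration; Commons CSV's internals may differ.
final class HeaderMapSketch {

    /** Maps each header name to its zero-based column index, rejecting duplicates. */
    static Map<String, Integer> headerMap(final String... names) {
        final Map<String, Integer> map = new LinkedHashMap<>();
        for (int i = 0; i < names.length; i++) {
            if (map.putIfAbsent(names[i], i) != null) {
                throw new IllegalArgumentException("Duplicate header name: " + names[i]);
            }
        }
        return map;
    }

    /** Looks up a value by header name by translating the name to an index first. */
    static String get(final Map<String, Integer> header, final String[] values, final String name) {
        final Integer index = header.get(name);
        if (index == null) {
            throw new IllegalArgumentException("No such column: " + name);
        }
        return values[index];
    }
}
```

<p>With a header of <code>"ID", "CustomerNo", "Name"</code>, looking up <code>"CustomerNo"</code> in this sketch resolves to index 1, so the name and the index refer to the same value.</p>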
</section>
<section>
<h2>Using an enum to define a header</h2>
<p>Using String values all over the code to reference columns can be error prone. For this reason, it is possible to define an enum to specify header
names. Note that the enum constant names are used to access column values. This may lead to enum constant names which do not follow the Java coding
standard of defining constants in upper case with underscores:</p>
<pre>
<code>
public enum Headers {
    ID, CustomerNo, Name
}
Reader in = new FileReader(&quot;path/to/file.csv&quot;);
Iterable&lt;CSVRecord&gt; records = CSVFormat.RFC4180.builder()
    .setHeader(Headers.class)
    .get()
    .parse(in);
for (CSVRecord record : records) {
    String id = record.get(Headers.ID);
    String customerNo = record.get(Headers.CustomerNo);
    String name = record.get(Headers.Name);
}
</code>
</pre>
<p>Again, it is possible to access values by their index and by using a String (for example "CustomerNo").</p>
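<p>The link between enum constants and header strings can be illustrated with plain Java: the constant's <code>name()</code> is the text that has to match the column name. This is a sketch under that assumption; <code>EnumHeaderSketch</code> and <code>headerNames</code> are hypothetical and not part of the library.</p>

```java
// Illustration only: derives header strings from enum constant names.
final class EnumHeaderSketch {

    enum Headers {
        ID, CustomerNo, Name
    }

    /** Returns the constant names of the given enum type, in declaration order. */
    static String[] headerNames(final Class<? extends Enum<?>> headerEnum) {
        final Enum<?>[] constants = headerEnum.getEnumConstants();
        final String[] names = new String[constants.length];
        for (int i = 0; i < constants.length; i++) {
            // name() returns the identifier exactly as declared, e.g. "CustomerNo".
            names[i] = constants[i].name();
        }
        return names;
    }
}
```

<p>Renaming the constant to <code>CUSTOMER_NO</code> would change the derived header string as well, which is why the constants in the example above keep the spelling of the column names.</p>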
</section>
<section>
<h2>Header auto detection</h2>
<p>Some CSV files define header names in their first record. If configured, Apache Commons CSV can parse the header names from the first record:</p>
<pre>
<code>
Reader in = new FileReader(&quot;path/to/file.csv&quot;);
Iterable&lt;CSVRecord&gt; records = CSVFormat.RFC4180.builder()
    .setHeader()
    .setSkipHeaderRecord(true)
    .get()
    .parse(in);
for (CSVRecord record : records) {
    String id = record.get("ID");
    String customerNo = record.get("CustomerNo");
    String name = record.get("Name");
}
</code>
</pre>
<p>This will use the values from the first record as header names and skip the first record when iterating.</p>
</section>
<section>
<h2>Printing with headers</h2>
<p>To print a CSV file with headers, you specify the headers in the format:</p>
<pre>
<code>
Appendable out = ...;
CSVPrinter printer = CSVFormat.DEFAULT.builder()
    .setHeader("H1", "H2")
    .get()
    .print(out);
</code>
</pre>
<p>To print a CSV file with JDBC column labels, you specify the ResultSet in the format:</p>
<pre>
<code>
try (ResultSet resultSet = ...) {
    CSVPrinter printer = CSVFormat.DEFAULT.builder()
        .setHeader(resultSet)
        .get()
        .print(out);
}
</code>
</pre>
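<p>To show the shape of the output, a header line followed by records, here is a standard-library-only sketch of RFC 4180 style line writing. It assumes a comma delimiter, double-quote quoting, and CRLF line endings; it is not how <code>CSVPrinter</code> is implemented, and the library's actual quoting rules may differ in detail. <code>HeaderPrintSketch</code>, <code>quote</code>, and <code>line</code> are hypothetical names.</p>

```java
// Illustration only: minimal RFC 4180 style quoting, not the CSVPrinter algorithm.
final class HeaderPrintSketch {

    /** Quotes a value if it contains a comma, a double quote, or a line break. */
    static String quote(final String value) {
        final boolean needsQuotes = value.indexOf(',') >= 0
                || value.indexOf('"') >= 0
                || value.indexOf('\n') >= 0
                || value.indexOf('\r') >= 0;
        // Embedded double quotes are escaped by doubling them.
        return needsQuotes ? "\"" + value.replace("\"", "\"\"") + "\"" : value;
    }

    /** Joins the values into one CSV line terminated by CRLF. */
    static String line(final String... values) {
        final StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(quote(values[i]));
        }
        return sb.append("\r\n").toString();
    }
}
```

<p>In this sketch, writing the header is just writing one more line before the data: <code>line("H1", "H2")</code> first, then one <code>line(...)</code> call per record.</p>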
</section>
</section>
</body>
</html>

src/site/xdoc/index.xml

Lines changed: 2 additions & 15 deletions
@@ -24,26 +24,13 @@ limitations under the License.
 <!-- ================================================== -->
 <section name="Using Apache Commons CSV">
 <p>Commons CSV reads and writes files in variations of the Comma Separated Value (CSV) format.</p>
-<p>The most common CSV formats are predefined in the <a href="apidocs/org/apache/commons/csv/CSVFormat.html">CSVFormat</a> class:
-<ul>
-<li><a href="https://support.microsoft.com/en-us/office/import-or-export-text-txt-or-csv-files-5250ac4c-663c-47ce-937b-339e391393ba">Microsoft Excel</a></li>
-<li><a href="https://www.ibm.com/docs/en/informix-servers/14.10?topic=statements-unload-statement">Informix UNLOAD</a></li>
-<li><a href="https://www.ibm.com/docs/en/informix-servers/14.10?topic=statements-unload-statement">Informix UNLOAD CSV</a></li>
-<li><a href="https://dev.mysql.com/doc/refman/8.0/en/mysqldump-delimited-text.html">MySQL</a></li>
-<li><a href="https://docs.oracle.com/database/121/SUTIL/GUID-D1762699-8154-40F6-90DE-EFB8EB6A9AB0.htm#SUTIL4217">Oracle</a></li>
-<li><a href="https://www.postgresql.org/docs/current/static/sql-copy.html">PostgreSQL CSV</a></li>
-<li><a href="https://www.postgresql.org/docs/current/static/sql-copy.html">PostgreSQL Text</a></li>
-<li><a href="https://datatracker.ietf.org/doc/html/rfc4180">RFC 4180</a></li>
-<li><a href="https://en.wikipedia.org/wiki/Tab-separated_values">TDF</a></li>
-</ul>
-</p>
-<p>Custom formats can be created using a fluent style API.</p>
+<p>Read the documentation starting with the <a href="apidocs/index.html">Javadoc Overview</a>.</p>
 </section>
 <!-- ================================================== -->
 <section name="Documentation">
 <p>
 An overview of the functionality is provided in the
-<a href="user-guide.html">user guide</a>.
+<a href="apidocs/index.html">user guide</a>.
 Various <a href="project-reports.html">project reports</a> are also available.
 </p>
 <p>
