Commit a6a5dff

rzo1 and mawiesne authored
OPENNLP-1714: Adjust Dev Manual to modularized structure (#976)
* OPENNLP-1714 - Adjust Dev Manual to modularized structure
* Fixes minor Javadoc issue in LineSearch class header

---------

Co-authored-by: Martin Wiesner <martin.wiesner@hs-heilbronn.de>
1 parent 9b826af commit a6a5dff

3 files changed

+315
-1
lines changed

opennlp-core/opennlp-ml/opennlp-ml-maxent/src/main/java/opennlp/tools/ml/maxent/quasinewton/LineSearch.java

Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@
23 23
* Performs line search to find a minimum.
24 24
*
25 25
* @see <a href="https://link.springer.com/book/10.1007/978-0-387-40065-5">
26-
* Nocedal & Wright 2006, Numerical Optimization</a>, p. 37)
26+
* Nocedal &amp; Wright 2006, Numerical Optimization</a>, p. 37)
27 27
*/
28 28
public class LineSearch {
29 29
private static final double C = 0.0001;

opennlp-docs/src/docbkx/opennlp.xml

Lines changed: 1 addition & 0 deletions
@@ -97,6 +97,7 @@ under the License.
97 97
<title>Apache OpenNLP Developer Documentation</title>
98 98
<toc/>
99 99
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="./introduction.xml"/>
100+
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="./project-structure.xml"/>
100 101
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="./langdetect.xml" />
101 102
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="./sentdetect.xml"/>
102 103
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="./tokenizer.xml" />

opennlp-docs/src/docbkx/project-structure.xml

Lines changed: 313 additions & 0 deletions
@@ -0,0 +1,313 @@
1+
<?xml version="1.0" encoding="UTF-8"?>
2+
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V5.0//EN"
3+
"https://cdn.docbook.org/schema/5.0/dtd/docbook.dtd"[
4+
]>
5+
<!--
6+
Licensed to the Apache Software Foundation (ASF) under one
7+
or more contributor license agreements. See the NOTICE file
8+
distributed with this work for additional information
9+
regarding copyright ownership. The ASF licenses this file
10+
to you under the Apache License, Version 2.0 (the
11+
"License"); you may not use this file except in compliance
12+
with the License. You may obtain a copy of the License at
13+
14+
http://www.apache.org/licenses/LICENSE-2.0
15+
16+
Unless required by applicable law or agreed to in writing,
17+
software distributed under the License is distributed on an
18+
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
19+
KIND, either express or implied. See the License for the
20+
specific language governing permissions and limitations
21+
under the License.
22+
-->
23+
24+
<chapter xml:id="tools.project.structure" xmlns:xlink="http://www.w3.org/1999/xlink">
25+
<title>Project Structure</title>
26+
27+
<section xml:id="tools.project.structure.overview">
28+
<title>Overview</title>
29+
<para>
30+
Starting with version 3.0, Apache OpenNLP has been reorganized from a single monolithic
31+
<code>opennlp-tools</code> artifact into a set of fine-grained modules. This modularization
32+
allows users to depend only on the components they actually need, resulting in a smaller
33+
dependency footprint. At the same time, the public API remains stable and fully compatible
34+
with previous 2.x releases.
35+
</para>
36+
<para>
37+
The following sections describe each module, its purpose, and when to include it as a dependency.
38+
</para>
39+
</section>
40+
41+
<section xml:id="tools.project.structure.api">
42+
<title>API Module</title>
43+
<para>
44+
The <code>opennlp-api</code> module defines the public interfaces and abstractions
45+
that form the contract between OpenNLP and its users. It contains the core interfaces
46+
such as <code>Tokenizer</code>, <code>SentenceDetector</code>, <code>POSTagger</code>,
47+
<code>TokenNameFinder</code>, <code>Chunker</code>, <code>Parser</code>,
48+
<code>LanguageDetector</code>, <code>Lemmatizer</code>, and <code>DocumentCategorizer</code>.
49+
</para>
50+
<para>
51+
This module also provides shared base classes such as <code>BaseModel</code>,
52+
the <code>ObjectStream</code> abstraction for data processing, the command-line
53+
argument parsing framework, and common utility types. It is a transitive dependency
54+
of <code>opennlp-runtime</code> and typically does not need to be declared explicitly.
55+
</para>
56+
57+
<programlisting language="xml">
58+
<![CDATA[<dependency>
59+
<groupId>org.apache.opennlp</groupId>
60+
<artifactId>opennlp-api</artifactId>
61+
<version>CURRENT_OPENNLP_VERSION</version>
62+
</dependency>]]>
63+
</programlisting>
64+
</section>
65+
66+
<section xml:id="tools.project.structure.runtime">
67+
<title>Runtime Module</title>
68+
<para>
69+
The <code>opennlp-runtime</code> module is the primary dependency for most users. It
70+
contains the core NLP tool implementations including sentence detection, tokenization,
71+
part-of-speech tagging, named entity recognition, chunking, parsing, language detection,
72+
lemmatization, and document categorization.
73+
</para>
74+
<para>
75+
By default, <code>opennlp-runtime</code> ships with the Maximum Entropy machine
76+
learning implementation. If you need other ML algorithms, add the corresponding
77+
ML module as described below.
78+
</para>
79+
80+
<programlisting language="xml">
81+
<![CDATA[<dependency>
82+
<groupId>org.apache.opennlp</groupId>
83+
<artifactId>opennlp-runtime</artifactId>
84+
<version>CURRENT_OPENNLP_VERSION</version>
85+
</dependency>]]>
86+
</programlisting>
87+
</section>
88+
89+
<section xml:id="tools.project.structure.ml">
90+
<title>Machine Learning Modules</title>
91+
<para>
92+
The machine learning implementations have been separated into individual modules so that
93+
applications can include only the algorithms they use. Each module provides a specific
94+
ML algorithm and is loaded at runtime via the <code>ExtensionLoader</code> service
95+
discovery mechanism.
96+
</para>
97+
98+
<itemizedlist>
99+
<listitem>
100+
<para>
101+
<code>opennlp-ml-commons</code> — Shared ML utilities and base classes used
102+
by all ML algorithm modules. This is a transitive dependency of each ML module
103+
and does not need to be declared explicitly.
104+
</para>
105+
</listitem>
106+
<listitem>
107+
<para>
108+
<code>opennlp-ml-maxent</code> — Maximum Entropy classifier. This is the default
109+
algorithm and is included transitively via <code>opennlp-runtime</code>.
110+
</para>
111+
</listitem>
112+
<listitem>
113+
<para>
114+
<code>opennlp-ml-perceptron</code> — Perceptron-based learning algorithm.
115+
Add this dependency if your models use the Perceptron or Perceptron Sequence trainer.
116+
</para>
117+
</listitem>
118+
<listitem>
119+
<para>
120+
<code>opennlp-ml-bayes</code> — Naive Bayes classifier.
121+
Add this dependency if your models use the Naive Bayes trainer.
122+
</para>
123+
</listitem>
124+
</itemizedlist>
125+
126+
<para>
127+
For example, to use the Perceptron trainer alongside the default Maximum Entropy implementation, add:
128+
</para>
129+
130+
<programlisting language="xml">
131+
<![CDATA[<dependency>
132+
<groupId>org.apache.opennlp</groupId>
133+
<artifactId>opennlp-ml-perceptron</artifactId>
134+
<version>CURRENT_OPENNLP_VERSION</version>
135+
</dependency>]]>
136+
</programlisting>
137+
</section>
138+
139+
<section xml:id="tools.project.structure.models">
140+
<title>Models Module</title>
141+
<para>
142+
The <code>opennlp-models</code> module provides classpath-based model discovery and
143+
loading. It enables applications to bundle pre-trained OpenNLP models as JAR files and
144+
load them at runtime without explicit file path references.
145+
See <xref linkend="tools.model"/> for details on classpath model loading.
146+
</para>
147+
148+
<programlisting language="xml">
149+
<![CDATA[<dependency>
150+
<groupId>org.apache.opennlp</groupId>
151+
<artifactId>opennlp-models</artifactId>
152+
<version>CURRENT_OPENNLP_VERSION</version>
153+
</dependency>]]>
154+
</programlisting>
155+
</section>
156+
157+
<section xml:id="tools.project.structure.formats">
158+
<title>Formats Module</title>
159+
<para>
160+
The <code>opennlp-formats</code> module supports reading and writing various NLP
161+
training and evaluation data formats, including CoNLL, BioNLP, BRAT, AD (Floresta),
162+
Leipzig, and others. Include this module if you need to train models from data in
163+
formats other than OpenNLP's native ones.
164+
</para>
165+
166+
<programlisting language="xml">
167+
<![CDATA[<dependency>
168+
<groupId>org.apache.opennlp</groupId>
169+
<artifactId>opennlp-formats</artifactId>
170+
<version>CURRENT_OPENNLP_VERSION</version>
171+
</dependency>]]>
172+
</programlisting>
173+
</section>
174+
175+
<section xml:id="tools.project.structure.dl">
176+
<title>Deep Learning Modules</title>
177+
<para>
178+
OpenNLP provides optional support for ONNX-based neural models via two modules:
179+
</para>
180+
181+
<itemizedlist>
182+
<listitem>
183+
<para>
184+
<code>opennlp-dl</code> — Integrates the ONNX Runtime for CPU-based inference.
185+
This module enables the use of models trained by external frameworks such as
186+
PyTorch or TensorFlow, exported in the ONNX format.
187+
</para>
188+
</listitem>
189+
<listitem>
190+
<para>
191+
<code>opennlp-dl-gpu</code> — Replaces the CPU ONNX Runtime with the
192+
GPU-accelerated variant for systems with supported GPU hardware.
193+
Use this module instead of <code>opennlp-dl</code> when GPU acceleration
194+
is available and desired.
195+
</para>
196+
</listitem>
197+
</itemizedlist>
198+
199+
<programlisting language="xml">
200+
<![CDATA[<!-- CPU variant -->
201+
<dependency>
202+
<groupId>org.apache.opennlp</groupId>
203+
<artifactId>opennlp-dl</artifactId>
204+
<version>CURRENT_OPENNLP_VERSION</version>
205+
</dependency>
206+
207+
<!-- OR GPU variant (do not include both) -->
208+
<dependency>
209+
<groupId>org.apache.opennlp</groupId>
210+
<artifactId>opennlp-dl-gpu</artifactId>
211+
<version>CURRENT_OPENNLP_VERSION</version>
212+
</dependency>]]>
213+
</programlisting>
214+
</section>
215+
216+
<section xml:id="tools.project.structure.cli">
217+
<title>CLI Module</title>
218+
<para>
219+
The <code>opennlp-cli</code> module provides the command-line tools for training,
220+
evaluating, and running OpenNLP models from a terminal. It is included in the binary
221+
distribution and is typically not needed as a library dependency.
222+
See <xref linkend="tools.cli"/> for details on available CLI commands.
223+
</para>
224+
</section>
225+
226+
<section xml:id="tools.project.structure.tools">
227+
<title>Tools Module (Aggregated Jar)</title>
228+
<para>
229+
The <code>opennlp-tools</code> module is an aggregated artifact that bundles
230+
all core modules (<code>opennlp-api</code>, <code>opennlp-runtime</code>,
231+
all ML modules, <code>opennlp-models</code>, <code>opennlp-formats</code>,
232+
and <code>opennlp-cli</code>) into a single JAR. It is provided for backwards
233+
compatibility with 2.x and for the binary distribution.
234+
</para>
235+
<para>
236+
For new projects, we recommend depending on <code>opennlp-runtime</code>
237+
plus only the specific additional modules you need, rather than pulling in
238+
the full <code>opennlp-tools</code> artifact.
239+
</para>
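<para>
For projects that stay on the aggregated artifact for now, the declaration would look roughly
like the sketch below. It assumes that the artifact keeps the <code>org.apache.opennlp</code>
group id and uses the same version placeholder as the other modules in this chapter.
</para>

<programlisting language="xml">
<![CDATA[<!-- Aggregated JAR; coordinates assumed to mirror the other modules -->
<dependency>
  <groupId>org.apache.opennlp</groupId>
  <artifactId>opennlp-tools</artifactId>
  <version>CURRENT_OPENNLP_VERSION</version>
</dependency>]]>
</programlisting>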
240+
</section>
241+
242+
<section xml:id="tools.project.structure.extensions">
243+
<title>Extension Modules</title>
244+
<para>
245+
OpenNLP provides optional extension modules for integration with external frameworks:
246+
</para>
247+
248+
<itemizedlist>
249+
<listitem>
250+
<para>
251+
<code>opennlp-morfologik</code> — Integrates the
252+
<link xlink:href="https://github.com/morfologik">Morfologik</link>
253+
library for dictionary-based stemming and lemmatization.
254+
See <xref linkend="tools.morfologik"/> for usage details.
255+
</para>
256+
</listitem>
257+
<listitem>
258+
<para>
259+
<code>opennlp-uima</code> — Provides a set of
260+
<link xlink:href="https://uima.apache.org">Apache UIMA</link>
261+
annotators that wrap OpenNLP components for use in UIMA pipelines.
262+
See <xref linkend="tools.uima"/> for integration details.
263+
</para>
264+
</listitem>
265+
</itemizedlist>
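<para>
As a sketch, the extension modules listed above could be declared as follows. The artifact ids
are taken from the list; the group id and the version placeholder are assumptions carried over
from the other modules and should be checked against the published coordinates.
</para>

<programlisting language="xml">
<![CDATA[<!-- Morfologik integration (group id and version assumed) -->
<dependency>
  <groupId>org.apache.opennlp</groupId>
  <artifactId>opennlp-morfologik</artifactId>
  <version>CURRENT_OPENNLP_VERSION</version>
</dependency>

<!-- UIMA annotators (group id and version assumed) -->
<dependency>
  <groupId>org.apache.opennlp</groupId>
  <artifactId>opennlp-uima</artifactId>
  <version>CURRENT_OPENNLP_VERSION</version>
</dependency>]]>
</programlisting>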
266+
</section>
267+
268+
<section xml:id="tools.project.structure.migration">
269+
<title>Migrating from 2.x to 3.x</title>
270+
<para>
271+
The 3.x release introduces no known breaking API changes. Existing code using the
272+
<code>opennlp-tools</code> artifact will continue to work without modification.
273+
However, we strongly recommend migrating to the modular dependency structure for a
274+
smaller footprint.
275+
</para>
276+
<para>
277+
A minimal migration replaces:
278+
</para>
279+
280+
<programlisting language="xml">
281+
<![CDATA[<!-- 2.x: single monolithic dependency -->
282+
<dependency>
283+
<groupId>org.apache.opennlp</groupId>
284+
<artifactId>opennlp-tools</artifactId>
285+
<version>2.x.y</version>
286+
</dependency>]]>
287+
</programlisting>
288+
289+
<para>
290+
with:
291+
</para>
292+
293+
<programlisting language="xml">
294+
<![CDATA[<!-- 3.x: modular dependencies — add only what you need -->
295+
<dependency>
296+
<groupId>org.apache.opennlp</groupId>
297+
<artifactId>opennlp-runtime</artifactId>
298+
<version>CURRENT_OPENNLP_VERSION</version>
299+
</dependency>
300+
<!-- Add opennlp-models, opennlp-ml-perceptron, opennlp-dl, etc. as needed -->]]>
301+
</programlisting>
302+
303+
<note>
304+
<para>
305+
The <code>opennlp-runtime</code> module includes the Maximum Entropy ML
306+
implementation by default. If your models were trained with the Perceptron
307+
or Naive Bayes algorithm, add the corresponding <code>opennlp-ml-perceptron</code>
308+
or <code>opennlp-ml-bayes</code> dependency.
309+
</para>
310+
</note>
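<para>
To illustrate the note above, a project whose models were trained with Naive Bayes might
declare the runtime together with the matching ML module. This is only a sketch: the
<code>opennlp.version</code> property name is hypothetical and merely keeps both versions in
sync, while the artifact ids are the ones listed in this chapter.
</para>

<programlisting language="xml">
<![CDATA[<!-- Hypothetical property so that both modules share one version -->
<properties>
  <opennlp.version>CURRENT_OPENNLP_VERSION</opennlp.version>
</properties>

<dependencies>
  <!-- Core tools plus the default Maximum Entropy implementation -->
  <dependency>
    <groupId>org.apache.opennlp</groupId>
    <artifactId>opennlp-runtime</artifactId>
    <version>${opennlp.version}</version>
  </dependency>
  <!-- Needed only for models trained with Naive Bayes -->
  <dependency>
    <groupId>org.apache.opennlp</groupId>
    <artifactId>opennlp-ml-bayes</artifactId>
    <version>${opennlp.version}</version>
  </dependency>
</dependencies>]]>
</programlisting>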
311+
</section>
312+
313+
</chapter>
