Commit 94113f5

OPENNLP-1791: Document the use of ClassPathModelProvider in dev manual (#919)
* OPENNLP-1791: Document the use of ClassPathModelProvider in dev manual
- adds note on JVM argument since J17 and beyond
- improves formatting of all "Note:" text fragments to better highlight helpful context to the reader

(cherry picked from commit 7b362c8)
1 parent 10a3a37 commit 94113f5

5 files changed, +39 -6 lines changed

opennlp-docs/src/docbkx/doccat.xml

Lines changed: 1 addition & 1 deletion
@@ -126,7 +126,7 @@ GMDecrease Major acquisitions that have a lower gross margin than the existing n
 GMIncrease The upward movement of gross margin resulted from amounts pursuant to adjustments \
 to obligations towards dealers .]]>
 </screen>
-Note: The line breaks marked with a backslash are just inserted for formatting purposes and must not be
+<emphasis role="strong">Note</emphasis>: The line breaks marked with a backslash are just inserted for formatting purposes and must not be
 included in the training data.
 </para>
 <section id="tools.doccat.training.tool">

opennlp-docs/src/docbkx/langdetect.xml

Lines changed: 2 additions & 2 deletions
@@ -141,7 +141,7 @@ lav Egija Tri-Active procedūru īpaši iesaka izmantot siltākajos gadalaik
 nedēļu laika Ekonomikas ministrijai, Finanšu ministrijai un Labklājības ministrijai, lai ar vienotu \
 pozīciju atgrieztos pie jautājuma izskatīšanas.]]>
 </screen>
-Note: The line breaks marked with a backslash are just inserted for formatting purposes and must not be
+<emphasis role="strong">Note</emphasis>: The line breaks marked with a backslash are just inserted for formatting purposes and must not be
 included in the training data.
 </para>
 <section id="tools.langdetect.training.tool">
@@ -153,7 +153,7 @@ lav Egija Tri-Active procedūru īpaši iesaka izmantot siltākajos gadalaik
 $ bin/opennlp LanguageDetectorTrainer[.leipzig] -model modelFile [-params paramsFile] \
 [-factory factoryName] -data sampleData [-encoding charsetName]]]>
 </screen>
-Note: To customize the language detector, extend the class opennlp.tools.langdetect.LanguageDetectorFactory
+<emphasis role="strong">Note</emphasis>: To customize the language detector, extend the class opennlp.tools.langdetect.LanguageDetectorFactory
 add it to the classpath and pass it in the -factory argument.
 </para>
 </section>

opennlp-docs/src/docbkx/model-loading.xml

Lines changed: 34 additions & 1 deletion
@@ -96,6 +96,39 @@ for(ClassPathModelEntry entry : models) {
 </programlisting>
 
 </para>
+<para>
+Moreover, certain OpenNLP models can be obtained via a
+<emphasis>ClassPathModelProvider</emphasis>, such as OpenNLP's
+built-in <emphasis>DefaultClassPathModelProvider</emphasis> class.
+It allows direct use of models available under a certain locale, given
+that those are present in the classpath and can be loaded.
+
+<programlisting language="java">
+<![CDATA[
+final ClassPathModelProvider provider = new DefaultClassPathModelProvider(finder, loader);
+// Here: SentenceModel, other model types accordingly
+final SentenceModel sm = provider.load("en", opennlp.tools.models.ModelType.SENTENCE_DETECTOR, SentenceModel.class);
+if(sm != null) {
+// do something with the (sentence) model
+}]]>
+</programlisting>
+
+In the above example, the finder and loader objects can be created or re-used as shown in the
+previous code example.
+</para>
+<para>
+<emphasis role="strong">Note</emphasis>:
+When running on Java 17+, the JVM argument
+
+<screen>--add-opens java.base/jdk.internal.loader=ALL-UNNAMED</screen>
+
+may be required. Without this parameter, OpenNLP uses the JVM bootstrap classpath to locate models
+rather than the UCP class loader.
+
+For more advanced or non-standard class loading scenarios, using ClassGraph and implementing a
+custom provider may cover additional cases beyond the default UCP class loader or
+JVM bootstrap class path behavior.
+</para>
 </section>
 
 
@@ -106,7 +139,7 @@ for(ClassPathModelEntry entry : models) {
 we recommend that you have a look at our setup in the <ulink url="https://github.com/apache/opennlp-models">OpenNLP Models
 repository</ulink>. We recommend to bundle one model per JAR file.
 
-Make sure you add a <emphasis>model.properties</emphasis> file with the following content
+Make sure you add a <emphasis>model.properties</emphasis> file with the following content:
 
 <programlisting language="java">
 <![CDATA[
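
Editor's sketch (not part of the commit): the Java 17+ note added in model-loading.xml says the `--add-opens java.base/jdk.internal.loader=ALL-UNNAMED` argument may be required. One way to see whether a running JVM was started with that argument is to ask the JDK module API directly. The class name `AddOpensCheck` and the helper `isOpenToUnnamed` are hypothetical names chosen for illustration; the snippet uses only `java.lang.Module` and `java.lang.ClassLoader` and makes no claim about how OpenNLP itself performs this check.

```java
// Hypothetical helper, not part of OpenNLP: reports whether the package named in
// the --add-opens argument is open to code running on the classpath.
final class AddOpensCheck {

    /** Package named in the JVM argument from the documentation note. */
    static final String LOADER_PACKAGE = "jdk.internal.loader";

    /**
     * Returns true if the given module opens the package to the unnamed module
     * of the given class loader (classpath code lives in an unnamed module).
     */
    static boolean isOpenToUnnamed(Module source, String packageName, ClassLoader loader) {
        return source.isOpen(packageName, loader.getUnnamedModule());
    }

    public static void main(String[] args) {
        Module javaBase = Object.class.getModule();              // the java.base module
        ClassLoader appLoader = ClassLoader.getSystemClassLoader();
        boolean open = isOpenToUnnamed(javaBase, LOADER_PACKAGE, appLoader);
        System.out.println("java.base/" + LOADER_PACKAGE + " open to ALL-UNNAMED: " + open);
    }
}
```

Running the class once with and once without the JVM argument shows the difference; the result depends on the JDK version and launch flags, so no fixed output is stated here.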

opennlp-docs/src/docbkx/namefinder.xml

Lines changed: 1 addition & 1 deletion
@@ -506,7 +506,7 @@ Precision: 0.8005071889818507
 Recall: 0.7450581122145297
 F-Measure: 0.7717879983140168]]>
 </screen>
-Note: The command line interface does not support cross evaluation in the current version.
+<emphasis role="strong">Note</emphasis>: The command line interface does not support cross evaluation in the current version.
 </para>
 </section>
 <section id="tools.namefind.eval.api">

opennlp-docs/src/docbkx/parser.xml

Lines changed: 1 addition & 1 deletion
@@ -200,7 +200,7 @@ $ opennlp ParserTrainer -model en-parser-chunking.bin -parserType CHUNKING \
 tool replaces the tagger model inside the parser model with a new one.
 </para>
 <para>
-Note: The original parser model will be overwritten with the new parser model which
+<emphasis role="strong">Note</emphasis>: The original parser model will be overwritten with the new parser model which
 contains the replaced tagger model.
 <screen>
 <![CDATA[
