Commit 6861817

Merge pull request #265 from jamesacampbell/master
added documentation for airgap setup
2 parents 8b88be2 + af1e49f commit 6861817

1 file changed: +17 -11 lines changed

README.md

Lines changed: 17 additions & 11 deletions
@@ -5,9 +5,9 @@ tika-python
 ===========
 A Python port of the [Apache Tika](http://tika.apache.org/)
 library that makes Tika available using the
-[Tika REST Server](http://wiki.apache.org/tika/TikaJAXRS).
+[Tika REST Server](http://wiki.apache.org/tika/TikaJAXRS).
 
-This makes Apache Tika available as a Python library,
+This makes Apache Tika available as a Python library,
 installable via Setuptools, Pip and Easy Install.
 
 To use this library, you need to have Java 7+ installed on your
@@ -22,8 +22,14 @@ Installation (with pip)
 
 Installation (without pip)
 --------------------------
-1. `python setup.py build`
-2. `python setup.py install`
+1. `python setup.py build`
+2. `python setup.py install`
+
+Airgap Environment Setup
+------------------------
+To get this working in a disconnected environment, download a tika server file and set the TIKA_SERVER_JAR environment variable to TIKA_SERVER_JAR="file:///<yourpath>/tika-server.jar" which successfully tells `python-tika` to "download" this file and move it to `/tmp/tika-server.jar` and run as background process.
+
+This is the only way to run `python-tika` without internet access. Without this set, the default is to check the tika version and pull latest every time from Apache.
 
 Environment Variables
 ---------------------
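
The airgap setup added in the hunk above can be sketched in Python roughly as follows. This is a minimal illustration, not part of the commit: the helper name `use_local_tika_jar` and the jar location `/opt/tika/tika-server.jar` are hypothetical, and the sketch assumes the library reads `TIKA_SERVER_JAR` when it is imported, so the variable is set before any `tika` import.

```python
import os
from pathlib import Path


def use_local_tika_jar(jar_path):
    """Set TIKA_SERVER_JAR to a file:// URI for a locally stored server jar.

    Hypothetical helper for illustration; the README only requires that the
    environment variable holds a file:///<yourpath>/tika-server.jar value.
    """
    # as_uri() requires an absolute path, so resolve the path first.
    jar_uri = Path(jar_path).resolve().as_uri()
    os.environ["TIKA_SERVER_JAR"] = jar_uri
    return jar_uri


# Hypothetical location; replace with wherever the jar was copied.
jar_uri = use_local_tika_jar("/opt/tika/tika-server.jar")
print(jar_uri)

# With the variable set, the library can be imported and used as usual
# (this part needs the tika package and Java installed, so it is left
# commented out here):
# from tika import parser
# parsed = parser.from_file("/path/to/file")
```

Setting the variable in the shell before starting Python (`export TIKA_SERVER_JAR="file:///<yourpath>/tika-server.jar"`) should have the same effect; doing it in code mainly helps when the path is only known at runtime.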
@@ -58,11 +64,11 @@ print(parsed["content"])
 
 Parser Interface
 ----------------------
-The parser interface extracts text and metadata using the /rmeta
+The parser interface extracts text and metadata using the /rmeta
 interface. This is one of the better ways to get the internal XHTML
 content extracted.
 
-Note:
+Note:
 ![Alert Icon](https://github.com/adam-p/markdown-here/raw/master/src/common/images/icon28.png "Alert")
 The parser interface needs the following environment variable set on the console for printing of the extracted content.
 ```export PYTHONIOENCODING=utf8```
@@ -85,7 +91,7 @@ Specify Output Format To XHTML
 ---------------------
 The parser interface is optionally able to output the content as XHTML rather than plain text.
 
-Note:
+Note:
 ![Alert Icon](https://github.com/adam-p/markdown-here/raw/master/src/common/images/icon28.png "Alert")
 The parser interface needs the following environment variable set on the console for printing of the extracted content.
 ```export PYTHONIOENCODING=utf8```
@@ -129,7 +135,7 @@ print(detector.from_file('/path/to/file'))
 Config Interface
 ----------------------
 The config interface allows you to inspect the Tika Server environment's
-configuration including what parsers, mime types, and detectors the
+configuration including what parsers, mime types, and detectors the
 server has been configured with.
 
 ```
@@ -143,7 +149,7 @@ print(config.getDetectors())
 
 Language Detection Interface
 ---------------------------------
-The language detection interface provides a 2 character language
+The language detection interface provides a 2 character language
 code texted based on the text in provided file.
 
 ```
@@ -187,10 +193,10 @@ Changing the Tika Classpath
 ---------------------------
 You can update the classpath that Tika server uses by
 setting the classpath as a set of ':' delimited strings.
-For example if you want to get Tika-Python working with
+For example if you want to get Tika-Python working with
 [GeoTopicParsing](http://wiki.apache.org/tika/GeoTopicParser),
 you can do this, replace paths below with your own paths, as
-identified [here](http://wiki.apache.org/tika/GeoTopicParser)
+identified [here](http://wiki.apache.org/tika/GeoTopicParser)
 and make sure that you have done this:
 
 kill Tika server (if already running):
