-This makes Apache Tika available as a Python library,
+This makes Apache Tika available as a Python library,
 installable via Setuptools, Pip and Easy Install.
 
 To use this library, you need to have Java 7+ installed on your
@@ -22,8 +22,14 @@ Installation (with pip)
 
 Installation (without pip)
 --------------------------
-1.`python setup.py build`
-2.`python setup.py install`
+1.`python setup.py build`
+2.`python setup.py install`
+
+Airgap Environment Setup
+------------------------
+To get this working in a disconnected environment, download a tika server file and set the TIKA_SERVER_JAR environment variable to TIKA_SERVER_JAR="file:///<yourpath>/tika-server.jar" which successfully tells `python-tika` to "download" this file and move it to `/tmp/tika-server.jar` and run as background process.
+
+This is the only way to run `python-tika` without internet access. Without this set, the default is to check the tika version and pull latest every time from Apache.
 
 Environment Variables
 ---------------------
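
The airgap setup added in this hunk comes down to pointing TIKA_SERVER_JAR at a local `file://` URI. A minimal sketch of doing that from Python follows. It assumes the jar has already been copied to the machine, and `configure_offline_tika` is a hypothetical helper name used for illustration, not part of `python-tika`.

```python
import os
from pathlib import Path


def configure_offline_tika(jar_path):
    """Set TIKA_SERVER_JAR to a file:// URI for a locally stored server jar.

    Hypothetical helper for illustration; python-tika does not ship it.
    """
    # Resolve to an absolute path, since Path.as_uri() rejects relative paths.
    jar = Path(jar_path).expanduser().resolve()
    if not jar.is_file():
        raise FileNotFoundError(f"Tika server jar not found: {jar}")

    # Produces the file:///<yourpath>/tika-server.jar form from the README text.
    uri = jar.as_uri()
    os.environ["TIKA_SERVER_JAR"] = uri
    return uri
```

Calling the helper before `tika` is imported is the safer order, on the assumption that the library may read its environment variables at import time. Exporting the variable in the shell before starting Python achieves the same thing.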
@@ -58,11 +64,11 @@ print(parsed["content"])
 
 Parser Interface
 ----------------------
-The parser interface extracts text and metadata using the /rmeta
+The parser interface extracts text and metadata using the /rmeta
 interface. This is one of the better ways to get the internal XHTML