Commit 6abeda7

Merge pull request #427 from griffin-rickle/improvement/hash_verification_algo
Allow users to choose between md5 and sha1 for JAR checksum verification
2 parents: e3cc7aa + 70b128f

2 files changed: +18 -12 lines changed

README.md

Lines changed: 12 additions & 9 deletions
@@ -39,15 +39,18 @@ These are read once, when tika/tika.py is initially loaded and used throughout a
 2. `TIKA_SERVER_JAR` - set to the full URL to the remote Tika server jar to download and cache.
 3. `TIKA_SERVER_ENDPOINT` - set to the host (local or remote) for the running Tika server jar.
 4. `TIKA_CLIENT_ONLY` - if set to True, then `TIKA_SERVER_JAR` is ignored, and relies on the value for `TIKA_SERVER_ENDPOINT` and treats Tika like a REST client.
-5. `TIKA_TRANSLATOR` - set to the fully qualified class name (defaults to Lingo24) for the Tika translator implementation.
-6. `TIKA_SERVER_CLASSPATH` - set to a string (delimited by ':' for each additional path) to prepend to the Tika server jar path.
-7. `TIKA_LOG_PATH` - set to a directory with write permissions and the `tika.log` and `tika-server.log` files will be placed in this directory.
-8. `TIKA_PATH` - set to a directory with write permissions and the `tika_server.jar` file will be placed in this directory.
-9. `TIKA_JAVA` - set the Java runtime name, e.g., `java` or `java9`
-10. `TIKA_STARTUP_SLEEP` - number of seconds (`float`) to wait per check if Tika server is launched at runtime
-11. `TIKA_STARTUP_MAX_RETRY` - number of checks (`int`) to attempt for Tika server startup if launched at runtime
-12. `TIKA_JAVA_ARGS` - set java runtime arguments, e.g, `-Xmx4g`
-13. `TIKA_LOG_FILE` - set the filename for the log file. default: `tika.log`. if it is an empty string (`''`), no log file is created.
+3. `TIKA_JAR_HASH_ALGO` - set to `sha1` when running on FIPS-compliant systems; default value is `md5`.
+4. `TIKA_SERVER_ENDPOINT` - set to the host (local or remote) for the running Tika server jar.
+5. `TIKA_CLIENT_ONLY` - if set to True, then `TIKA_SERVER_JAR` is ignored, and relies on the value for `TIKA_SERVER_ENDPOINT` and treats Tika like a REST client.
+6. `TIKA_TRANSLATOR` - set to the fully qualified class name (defaults to Lingo24) for the Tika translator implementation.
+7. `TIKA_SERVER_CLASSPATH` - set to a string (delimited by ':' for each additional path) to prepend to the Tika server jar path.
+8. `TIKA_LOG_PATH` - set to a directory with write permissions and the `tika.log` and `tika-server.log` files will be placed in this directory.
+9. `TIKA_PATH` - set to a directory with write permissions and the `tika_server.jar` file will be placed in this directory.
+10. `TIKA_JAVA` - set the Java runtime name, e.g., `java` or `java9`
+11. `TIKA_STARTUP_SLEEP` - number of seconds (`float`) to wait per check if Tika server is launched at runtime
+12. `TIKA_STARTUP_MAX_RETRY` - number of checks (`int`) to attempt for Tika server startup if launched at runtime
+13. `TIKA_JAVA_ARGS` - set java runtime arguments, e.g, `-Xmx4g`
+14. `TIKA_LOG_FILE` - set the filename for the log file. default: `tika.log`. if it is an empty string (`''`), no log file is created.
 
 Testing it out
 ==============
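
Since the README hunk states that these variables are read once, when tika/tika.py is first loaded, a caller would presumably need to set `TIKA_JAR_HASH_ALGO` before the first import of the library. The following is a minimal sketch of that ordering, not code from this commit. It does not import `tika` itself; it only repeats the `os.getenv` lookup that the commit adds, and the variable name `hash_algo` is illustrative.

```python
import os

# Set the variable before the first `import tika`, because the README says
# the environment is read once at module load time.
os.environ["TIKA_JAR_HASH_ALGO"] = "sha1"

# Same lookup the commit adds to tika/tika.py, with `md5` as the fallback.
hash_algo = os.getenv("TIKA_JAR_HASH_ALGO", "md5")
print(hash_algo)
```

Exporting the variable in the shell before starting the Python process should have the same effect.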

tika/tika.py

Lines changed: 6 additions & 3 deletions
@@ -173,6 +173,7 @@ def make_content_disposition_header(fn):
 TikaServerJar = os.getenv(
     'TIKA_SERVER_JAR',
     "http://search.maven.org/remotecontent?filepath=org/apache/tika/tika-server-standard/"+TikaVersion+"/tika-server-standard-"+TikaVersion+".jar")
+TikaJarHashAlgo=os.getenv('TIKA_JAR_HASH_ALGO', 'md5')
 ServerHost = "localhost"
 Port = "9998"
 ServerEndpoint = os.getenv(
@@ -609,9 +610,11 @@ def checkJarSig(tikaServerJar, jarPath):
     :param jarPath:
     :return: ``True`` if the signature of the jar matches
     '''
-    if not os.path.isfile(jarPath + ".md5"):
-        getRemoteJar(tikaServerJar + ".md5", jarPath + ".md5")
-    m = hashlib.md5()
+    localChecksumPath = '.'.join([jarPath, TikaJarHashAlgo])
+    if not os.path.isfile(localChecksumPath):
+        remoteChecksum = '.'.join([tikaServerJar, TikaJarHashAlgo])
+        getRemoteJar(remoteChecksum, localChecksumPath)
+    m = hashlib.new(TikaJarHashAlgo)
     with open(jarPath, 'rb') as f:
         binContents = f.read()
         m.update(binContents)
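
The hunk above picks the digest with `hashlib.new(TikaJarHashAlgo)` and builds the checksum file name by appending the algorithm name to the jar path. The standalone sketch below illustrates that pattern. It is not the body of `checkJarSig`: the helper `verify_checksum`, the file `example.jar`, and the chunked read are assumptions made for illustration (the commit reads the whole file at once), and the sketch skips the download step that `getRemoteJar` handles.

```python
import hashlib
import os
import tempfile


def verify_checksum(file_path, algo="md5"):
    """Hypothetical helper: compare a file's digest with `<file_path>.<algo>`."""
    digest = hashlib.new(algo)  # algorithm chosen by name, as in the commit
    with open(file_path, "rb") as f:
        # Read in chunks so large files are not held in memory all at once.
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    # Sidecar path built the same way as `localChecksumPath` in the hunk.
    with open(".".join([file_path, algo]), "r") as f:
        expected = f.read().strip()
    return expected == digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    jar_path = os.path.join(tmp, "example.jar")  # placeholder, not a real jar
    payload = b"not a real jar"
    with open(jar_path, "wb") as f:
        f.write(payload)
    # A matching sha1 sidecar and a deliberately wrong md5 sidecar.
    with open(jar_path + ".sha1", "w") as f:
        f.write(hashlib.sha1(payload).hexdigest() + "\n")
    with open(jar_path + ".md5", "w") as f:
        f.write("0" * 32 + "\n")
    ok = verify_checksum(jar_path, "sha1")
    bad = verify_checksum(jar_path, "md5")
```

Here `ok` comes from a matching `.sha1` file and `bad` from a mismatched `.md5` file. On a FIPS-restricted Python build the `md5` call may be rejected, which is the situation the new `TIKA_JAR_HASH_ALGO` setting is meant to work around.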
