1 file changed: +31 -0 lines changed

@@ -91,6 +91,37 @@ parsed = parser.from_file('/path/to/file', 'http://tika:9998/tika')
 string_parsed = parser.from_buffer('Good evening, Dave', 'http://tika:9998/tika')
 ```
 
+You can also pass a binary stream:
+```
+with open(file, 'rb') as file_obj:
+    response = tika.parser.from_file(file_obj)
+```
+
+Gzip compression
+---------------------
+Since Tika 1.24.1, gzip compression of input and output streams is supported.
+
+Input compression can be achieved with gzip or zlib:
+```
+import zlib
+
+with open(file, 'rb') as file_obj:
+    parsed = tika.parser.from_buffer(zlib.compress(file_obj.read()))
+
+...
+
+import gzip
+
+with open(file, 'rb') as file_obj:
+    parsed = tika.parser.from_buffer(gzip.compress(file_obj.read()))
+```
+
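Note (not part of the diff): the two input snippets differ only in the container format produced by the standard library. The following is a minimal sketch of that difference using only `gzip` and `zlib`, with no Tika server involved. `compress_for_upload` is a hypothetical helper introduced here for illustration; it is not part of tika-python.

```python
import gzip
import zlib


def compress_for_upload(data: bytes, method: str = "gzip") -> bytes:
    """Compress raw document bytes with gzip or zlib (hypothetical helper)."""
    if method == "gzip":
        return gzip.compress(data)
    if method == "zlib":
        return zlib.compress(data)
    raise ValueError(f"unsupported method: {method!r}")


# Repetitive sample payload so the size reduction is easy to see.
payload = b"Good evening, Dave" * 100

gz = compress_for_upload(payload, "gzip")
zl = compress_for_upload(payload, "zlib")

# gzip output starts with the magic bytes 1f 8b; zlib output at default
# settings starts with the byte 0x78.
print(gz[:2].hex(), zl[:1].hex())

# Both formats round-trip back to the original bytes.
assert gzip.decompress(gz) == payload
assert zlib.decompress(zl) == payload
```

Either result is a `bytes` object, which is the kind of value the snippets above pass to `from_buffer`.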
+And output compression is requested with the `Accept-Encoding` header:
+```
+with open(file, 'rb') as file_obj:
+    parsed = tika.parser.from_file(file_obj, headers={'Accept-Encoding': 'gzip, deflate'})
+```
+
 Specify Output Format To XHTML
 ---------------------
 The parser interface is optionally able to output the content as XHTML rather than plain text.