Commit a183472

Update README.md
1 parent 6eacf56 commit a183472

1 file changed: +31 -0 lines changed

README.md

Lines changed: 31 additions & 0 deletions
@@ -91,6 +91,37 @@ parsed = parser.from_file('/path/to/file', 'http://tika:9998/tika')
 string_parsed = parser.from_buffer('Good evening, Dave', 'http://tika:9998/tika')
 ```
 
+You can also pass a binary stream:
+```
+with open(file, 'rb') as file_obj:
+    response = tika.parser.from_file(file_obj)
+```
+
+Gzip compression
+---------------------
+Since Tika 1.24.1, gzip compression of input and output streams is supported.
+
+Input compression can be achieved with gzip or zlib:
+```
+import zlib
+
+with open(file, 'rb') as file_obj:
+    response = tika.parser.from_buffer(zlib.compress(file_obj.read()))
+
+...
+
+import gzip
+
+with open(file, 'rb') as file_obj:
+    response = tika.parser.from_buffer(gzip.compress(file_obj.read()))
+```
+
+To request compressed output, send the `Accept-Encoding` header:
+```
+with open(file, 'rb') as file_obj:
+    response = tika.parser.from_file(file_obj, headers={'Accept-Encoding': 'gzip, deflate'})
+```
+
 Specify Output Format To XHTML
 ---------------------
 The parser interface is optionally able to output the content as XHTML rather than plain text.
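
The gzip section added in this commit offers `zlib.compress` and `gzip.compress` as alternatives for compressing input. As a minimal sketch, using only the Python standard library and a made-up payload (the `payload` bytes and all variable names here are illustrative, not part of the README), the following shows that the two calls produce different container formats around the same deflate data. It does not contact a Tika server, so it says nothing about which format a given server accepts.

```python
import gzip
import zlib

# Hypothetical sample payload standing in for the bytes read from a file.
payload = b"Good evening, Dave. " * 200

gzip_bytes = gzip.compress(payload)
zlib_bytes = zlib.compress(payload)

# gzip output (RFC 1952) always begins with the magic bytes 0x1f 0x8b.
is_gzip = gzip_bytes[:2] == b"\x1f\x8b"

# zlib output (RFC 1950) begins with a two-byte header: the low nibble of
# the first byte is 8 (deflate), and the header read as a big-endian
# 16-bit integer is a multiple of 31.
zlib_is_deflate = (zlib_bytes[0] & 0x0F) == 8
zlib_header_ok = ((zlib_bytes[0] << 8) | zlib_bytes[1]) % 31 == 0

# Each format round-trips only through its own decompressor.
gzip_roundtrip = gzip.decompress(gzip_bytes) == payload
zlib_roundtrip = zlib.decompress(zlib_bytes) == payload

print("original size:", len(payload))
print("gzip size:", len(gzip_bytes), "gzip magic present:", is_gzip)
print("zlib size:", len(zlib_bytes), "zlib header valid:", zlib_header_ok)
```

Checking the leading bytes like this is one way to confirm which format a buffer is in before sending it; the choice between the two is otherwise left to the README's guidance.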
