Commit 34ed6a2

Feature/chunked transfer encoding (#24)
Add support for chunked transfer encoding
1 parent 2e4c843 commit 34ed6a2

10 files changed

+271
-136
lines changed

LICENSE

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-Copyright 2018 MGH & BWH Center for Clinical Data Science
+Copyright 2020 MGH Computational Pathology

 Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

docs/development.rst

Lines changed: 11 additions & 5 deletions
@@ -7,7 +7,7 @@ Source code is available at Github and can be cloned via git:

 .. code-block:: none

-    git clone https://github.com/clindatsci/dicomweb-client ~/dicomweb-client
+    git clone https://github.com/mghcomputationalpathology/dicomweb-client ~/dicomweb-client

 The :mod:`dicomweb_client` package can be installed in *develop* mode for local development:

@@ -30,19 +30,25 @@ Before creating a pull request on Github, read the coding style guideline, run t
 Coding style
 ------------

-Code must comply with `PEP 8 <https://www.python.org/dev/peps/pep-0008/>`_. The `flake8 <http://flake8.pycqa.org/en/latest/>`_ package is used to enforce compliance.
+Code must comply with `PEP 8 <https://www.python.org/dev/peps/pep-0008/>`_.
+The `flake8 <http://flake8.pycqa.org/en/latest/>`_ package is used to enforce compliance.

-The project uses `numpydoc <https://github.com/numpy/numpydoc/>`_ for documenting code according to `PEP 257 <https://www.python.org/dev/peps/pep-0257/>`_ docstring conventions. Further information and examples for the NumPy style can be found at the `NumPy Github repository <https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt>`_ and the website of the `Napoleon sphinx extension <https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_numpy.html#example-numpy>`_.
+The project uses `numpydoc <https://github.com/numpy/numpydoc/>`_ for documenting code according to `PEP 257 <https://www.python.org/dev/peps/pep-0257/>`_ docstring conventions.
+Further information and examples for the NumPy style can be found at the `NumPy Github repository <https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt>`_ and the website of the `Napoleon sphinx extension <https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_numpy.html#example-numpy>`_.

-All API classes, functions and modules must be documented (including "private" functions and methods). Each docstring must describe input parameters and return values. Types must be specified using type hints as specified by `PEP 484 <https://www.python.org/dev/peps/pep-0484/>`_ (see `typing <https://docs.python.org/3/library/typing.html>`_ module).
+All API classes, functions and modules must be documented (including "private" functions and methods).
+Each docstring must describe input parameters and return values.
+Types must be specified using type hints as specified by `PEP 484 <https://www.python.org/dev/peps/pep-0484/>`_ (see `typing <https://docs.python.org/3/library/typing.html>`_ module) in both the function definition and the docstring.


 .. _running-tests:

 Running tests
 -------------

-The project uses `pytest <http://doc.pytest.org/en/latest/>`_ to write and runs unit tests. Tests should be placed in a separate ``tests`` folder within the package root folder. Files containing actual test code should follow the pattern ``test_*.py``.
+The project uses `pytest <http://doc.pytest.org/en/latest/>`_ to write and run unit tests.
+Tests should be placed in a separate ``tests`` folder within the package root folder.
+Files containing actual test code should follow the pattern ``test_*.py``.

 Install requirements:

docs/installation.rst

Lines changed: 2 additions & 3 deletions
@@ -8,7 +8,7 @@ Installation guide
 Requirements
 ------------

-* `Python <https://www.python.org/>`_ (version 2.7 or higher)
+* `Python <https://www.python.org/>`_ (version 3.5 or higher)
 * Python package manager `pip <https://pip.pypa.io/en/stable/>`_

 For support of image formats:
@@ -32,6 +32,5 @@ Source code available at Github:

 .. code-block:: none

-    git clone https://github.com/clindatsci/dicomweb-client ~/dicomweb-client
+    git clone https://github.com/mghcomputationalpathology/dicomweb-client ~/dicomweb-client
     pip install ~/dicomweb-client
-

src/dicomweb_client/__init__.py

Lines changed: 2 additions & 1 deletion
@@ -1,3 +1,4 @@
-__version__ = '0.20.0'
+__version__ = '0.21.0rc'
+

 from dicomweb_client.api import DICOMwebClient

src/dicomweb_client/api.py

Lines changed: 115 additions & 24 deletions
@@ -5,6 +5,7 @@
 import logging
 import email
 import six
+import xml.etree.ElementTree as ET
 from collections import OrderedDict
 from io import BytesIO
 from urllib.parse import quote_plus, urlparse
@@ -177,6 +178,41 @@ def load_json_dataset(dataset: Dict[str, dict]) -> pydicom.dataset.Dataset:
     return ds


+def _load_xml_dataset(dataset: ET) -> pydicom.dataset.Dataset:
+    '''Loads DICOM Data Set in DICOM XML format.
+
+    Parameters
+    ----------
+    dataset: xml.etree.ElementTree
+        element tree
+
+    Returns
+    -------
+    pydicom.dataset.Dataset
+        data set
+
+    '''
+    ds = pydicom.Dataset()
+    for element in dataset:
+        keyword = element.attrib['keyword']
+        vr = element.attrib['vr']
+        if vr == 'SQ':
+            value = [
+                _load_xml_dataset(item)
+                for item in element
+            ]
+        else:
+            value = list(element)
+            if len(value) == 1:
+                value = value[0].text.strip()
+            elif len(value) > 1:
+                value = [v.text.strip() for v in value]
+            else:
+                value = None
+        setattr(ds, keyword, value)
+    return ds
+
+
 class DICOMwebClient(object):

     '''Class for connecting to and interacting with a DICOMweb RESTful service.
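
The traversal in `_load_xml_dataset` can be tried without pydicom. The following is a minimal sketch that assumes the response uses the `DicomAttribute` / `Item` / `Value` element layout of the DICOM XML model; the payload and the helper name `xml_to_dict` are hypothetical and not part of this commit. It collects values into a plain dict and keeps every value as a string.

```python
import xml.etree.ElementTree as ET

# Hypothetical store response body (illustration only, not taken from a server).
PAYLOAD = b'''<?xml version="1.0" encoding="UTF-8"?>
<NativeDicomModel>
  <DicomAttribute tag="00081198" vr="SQ" keyword="FailedSOPSequence">
    <Item number="1">
      <DicomAttribute tag="00081155" vr="UI" keyword="ReferencedSOPInstanceUID">
        <Value number="1">1.2.3.4</Value>
      </DicomAttribute>
      <DicomAttribute tag="00081197" vr="US" keyword="FailureReason">
        <Value number="1">272</Value>
      </DicomAttribute>
    </Item>
  </DicomAttribute>
</NativeDicomModel>'''


def xml_to_dict(node):
    '''Convert the attribute elements below ``node`` into a plain dict.'''
    result = {}
    for element in node:
        keyword = element.attrib['keyword']
        if element.attrib['vr'] == 'SQ':
            # Each child of a sequence attribute is an item that itself
            # contains attribute elements, so recurse into it.
            value = [xml_to_dict(item) for item in element]
        else:
            # Guard against empty value elements, whose text is None.
            texts = [(v.text or '').strip() for v in element]
            if len(texts) == 1:
                value = texts[0]
            elif texts:
                value = texts
            else:
                value = None
        result[keyword] = value
    return result


failed = xml_to_dict(ET.fromstring(PAYLOAD))['FailedSOPSequence']
print(failed)
```

Unlike the function in the diff, the sketch tolerates a `Value` element without text; whether real servers emit such elements is not stated in the commit.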
@@ -216,7 +252,8 @@ def __init__(
         headers: Optional[Dict[str, Union[str, Sequence[str]]]] = None,
         callback: Optional[Callable] = None,
         auth: Optional[requests.auth.AuthBase] = None,
-        gcp_service_account_key_file: Optional[str] = None
+        gcp_service_account_key_file: Optional[str] = None,
+        chunk_size: Optional[int] = None
     ) -> None:
         '''
         Parameters
@@ -256,6 +293,10 @@ def __init__(
             JSON format to be used for authentication with Google Cloud
             Healthcare services
             (see `Google Cloud Healthcare API authentication <https://cloud.google.com/healthcare/docs/how-tos/authentication>`)
+        chunk_size: int, optional
+            maximum number of bytes per data chunk using chunked transfer
+            encoding (helpful for storing and retrieving large objects or large
+            collections of objects such as studies or series)

         ''' # noqa
         logger.debug('initialize HTTP session')
@@ -341,6 +382,7 @@ def __init__(
                     'No password provided for user "{0}".'.format(username)
                 )
             self._session.auth = (username, password)
+        self._chunk_size = chunk_size

     def _get_gcp_session(
         self,
@@ -648,7 +690,9 @@ def _http_get(
             params = {}
         url += self._build_query_string(params)
         logger.debug('GET: {} {}'.format(url, headers))
-        response = self._session.get(url=url, headers=headers)
+        # Setting stream allows for retrieval of data in chunks using
+        # the iter_content() method
+        response = self._session.get(url=url, headers=headers, stream=True)
         try:
             response.raise_for_status()
         except requests.exceptions.HTTPError as error:
@@ -710,10 +754,11 @@ def _decode_multipart_message(
             message parts

         '''
-        header = ''
-        for key, value in headers.items():
-            header += '{}: {}\n'.format(key, value)
-        message = email.message_from_bytes(header.encode() + body)
+        header = ''.join([
+            '{}: {}\n'.format(key, value)
+            for key, value in headers.items()
+        ]).encode()
+        message = email.message_from_bytes(header + body)
         elements = []
         for part in message.walk():
             if part.get_content_maintype() == 'multipart':
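
The header-plus-body trick used in `_decode_multipart_message` can be reproduced with the standard library alone. This is a rough sketch under stated assumptions: the boundary string, the header dict and the part contents are made up for illustration, the body is assumed to begin with an empty line that separates it from the prepended headers, and skipping the multipart container is a guess at what the elided part of the loop does.

```python
import email

# Hypothetical response headers and body (illustration only).
headers = {
    'Content-Type': (
        'multipart/related; type="application/dicom"; '
        'boundary=boundary-example'
    ),
}
body = (
    b'\r\n--boundary-example\r\n'
    b'Content-Type: application/dicom\r\n\r\n'
    b'first part'
    b'\r\n--boundary-example\r\n'
    b'Content-Type: application/dicom\r\n\r\n'
    b'second part'
    b'\r\n--boundary-example--'
)

# Serialize the HTTP headers into header lines and prepend them to the
# body, so that the email parser sees one complete MIME message.
header = ''.join(
    '{}: {}\n'.format(key, value) for key, value in headers.items()
).encode()
message = email.message_from_bytes(header + body)

# walk() yields the multipart container first and then each part;
# only the leaf parts carry a payload.
parts = [
    part.get_payload(decode=True)
    for part in message.walk()
    if part.get_content_maintype() != 'multipart'
]
print(parts)
```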
@@ -997,8 +1042,17 @@ def _http_get_multipart_application_dicom(
             ),
         }
         response = self._http_get(url, params, headers)
+        with response as r:
+            if self._chunk_size is not None:
+                logger.info('retrieve data in chunks')
+                content = b''.join([
+                    chunk
+                    for chunk in r.iter_content(chunk_size=self._chunk_size)
+                ])
+            else:
+                content = r.content
         datasets = self._decode_multipart_message(
-            response.content,
+            content,
             response.headers
         )
         return [pydicom.dcmread(BytesIO(ds)) for ds in datasets]
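
The read-in-chunks-then-join pattern above can be illustrated without a network connection. In this sketch, `iter_chunks` is a hypothetical stand-in for `iter_content()` and a `BytesIO` object stands in for the streamed response; neither is part of the commit.

```python
from io import BytesIO


def iter_chunks(stream, chunk_size):
    '''Yield successive blocks of at most ``chunk_size`` bytes.'''
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


payload = bytes(range(256)) * 10  # 2560 bytes of stand-in data
chunks = list(iter_chunks(BytesIO(payload), chunk_size=1000))
content = b''.join(chunks)
print([len(c) for c in chunks])
```

Note that joining the chunks still places the complete payload in memory, so in this form chunking bounds the size of each individual read rather than the total memory used.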
@@ -1357,16 +1411,53 @@ def _http_post(

         '''
         logger.debug('POST: {} {}'.format(url, headers))
-        response = self._session.post(url=url, data=data, headers=headers)
+
+        def serve_data_chunks(data):
+            for offset in range(0, len(data), self._chunk_size):
+                end = offset + self._chunk_size
+                yield data[offset:end]
+
+        if self._chunk_size is not None and len(data) > self._chunk_size:
+            logger.info('store data in chunks using chunked transfer encoding')
+            chunked_headers = dict(headers)
+            chunked_headers['Transfer-Encoding'] = 'chunked'
+            chunked_headers['Cache-Control'] = 'no-cache'
+            chunked_headers['Connection'] = 'Keep-Alive'
+            data_chunks = serve_data_chunks(data)
+            response = self._session.post(
+                url=url,
+                data=data_chunks,
+                headers=chunked_headers
+            )
+        else:
+            response = self._session.post(url=url, data=data, headers=headers)
         logger.debug('request status code: {}'.format(response.status_code))
-        response.raise_for_status()
+        try:
+            response.raise_for_status()
+        except requests.exceptions.HTTPError as error:
+            raise HTTPError(error)
+        except requests.exceptions.ConnectionError as error:
+            raise HTTPError(error.args[0])
+        if not response.ok:
+            logger.warning('storage was not successful for all instances')
+            payload = response.content
+            tree = ET.fromstring(payload)
+            dataset = _load_xml_dataset(tree)
+            failed_sop_sequence = getattr(dataset, 'FailedSOPSequence', [])
+            for failed_sop_item in failed_sop_sequence:
+                logger.error(
+                    'storage of instance {} failed: "{}"'.format(
+                        failed_sop_item.ReferencedSOPInstanceUID,
+                        failed_sop_item.FailureReason
+                    )
+                )
         return response

     def _http_post_multipart_application_dicom(
         self,
         url: str,
         data: bytes
-    ) -> Union[None, Dict[str, dict]]:
+    ) -> Union[None, Dict[str, dict]]:
         '''Performs a HTTP POST request with a multipart payload with
         "application/dicom" media type.

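
The chunked upload path can be sketched in isolation. The helper `build_request` below is hypothetical and only prepares the body and headers that would be handed to an HTTP client; it performs no request. According to the Requests documentation, passing a generator as `data` causes the body to be sent with chunked transfer encoding, which is what the generator in `_http_post` relies on.

```python
def serve_data_chunks(data, chunk_size):
    '''Yield consecutive slices of ``data`` of at most ``chunk_size`` bytes.'''
    for offset in range(0, len(data), chunk_size):
        yield data[offset:offset + chunk_size]


def build_request(data, headers, chunk_size=None):
    '''Return the (body, headers) pair for a POST request.

    Hypothetical helper: the body becomes a generator of chunks only
    when a chunk size is set and the payload exceeds it.
    '''
    if chunk_size is not None and len(data) > chunk_size:
        # Copy the headers so that the caller's dict is left untouched.
        chunked_headers = dict(headers)
        chunked_headers['Transfer-Encoding'] = 'chunked'
        return serve_data_chunks(data, chunk_size), chunked_headers
    return data, headers


base_headers = {'Content-Type': 'application/dicom'}
body, request_headers = build_request(b'x' * 10, base_headers, chunk_size=4)
chunks = list(body)
print(chunks, request_headers)
```

Payloads that fit into a single chunk are returned unchanged together with the original headers, mirroring the `else` branch in the diff.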
@@ -1380,7 +1471,7 @@ def _http_post_multipart_application_dicom(
         Returns
         -------
         Dict[str, dict]
-            information about stored instances in DICOM JSON format
+            information about stored instances

         '''
         content_type = (
@@ -1389,20 +1480,20 @@ def _http_post_multipart_application_dicom(
             'boundary=0f3cf5c0-70e0-41ef-baef-c6f9f65ec3e1'
         )
         content = self._encode_multipart_message(data, content_type)
-        self._http_post(
+        response = self._http_post(
             url,
             content,
             headers={'Content-Type': content_type}
         )
-        # FIXME: return information
-        # http://dicom.nema.org/medical/dicom/current/output/chtml/part18/chapter_I.html
-        # response = self._http_post(
-        #     url,
-        #     content,
-        #     headers={'Content-Type': content_type}
-        # )
-        # response.content
-        return {}
+        if response.content:
+            if (response.headers['Content-Type'] == 'application/dicom+json' or
+                    response.headers['Content-Type'] == 'application/json'):
+                return load_json_dataset(response.json())
+            elif (response.headers['Content-Type'] == 'application/dicom+xml' or
+                    response.headers['Content-Type'] == 'application/xml'):
+                tree = ET.fromstring(response.content)
+                return _load_xml_dataset(tree)
+        return None

     def search_for_studies(
         self,
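
The dispatch on the response media type compares the `Content-Type` header by exact string equality. Servers may append parameters such as `charset`; whether the services targeted by this client do so is not stated in the commit. The following sketch shows one possible way to compare only the media type. The helper names are hypothetical, and the parsed results are plain JSON and ElementTree objects rather than pydicom data sets.

```python
import json
import xml.etree.ElementTree as ET
from email.message import Message


def media_type(content_type):
    '''Return the lowercased "type/subtype" without any parameters.'''
    msg = Message()
    msg['Content-Type'] = content_type
    return msg.get_content_type()


def parse_store_response(content, content_type):
    '''Parse a store response body according to its media type.'''
    if not content:
        return None
    kind = media_type(content_type)
    if kind in ('application/dicom+json', 'application/json'):
        return json.loads(content)
    if kind in ('application/dicom+xml', 'application/xml'):
        return ET.fromstring(content)
    # Unknown media types are ignored, like the final "return None" above.
    return None


parsed = parse_store_response(
    b'{"00081199": {"vr": "SQ"}}',
    'application/dicom+json; charset=utf-8'
)
print(parsed)
```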
@@ -2039,16 +2130,16 @@ def store_instances(
         Returns
         -------
         Dict[str, dict]
-            information about status of stored instances in DICOM JSON format
+            information about status of stored instances

         '''
         url = self._get_studies_url('stow', study_instance_uid)
         encoded_datasets = list()
-        # TODO: can we do this more memory efficient? Concatenations?
         for ds in datasets:
             with BytesIO() as b:
                 pydicom.dcmwrite(b, ds)
-                encoded_datasets.append(b.getvalue())
+                encoded_ds = b.getvalue()
+                encoded_datasets.append(encoded_ds)
         return self._http_post_multipart_application_dicom(
             url,
             encoded_datasets
