Commit fd22fdd

Merge pull request #301 from pbashyal-nmdp/fix-XX-cwd-redux
Fix XX CWD redux
2 parents e460282 + 8096ea2 commit fd22fdd

10 files changed: +83 -35 lines changed

Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ LABEL MAINTAINER="Pradeep Bashyal"

 WORKDIR /app

-ARG PY_ARD_VERSION=1.0.10
+ARG PY_ARD_VERSION=1.0.11

 COPY requirements.txt /app
 RUN pip install --no-cache-dir --upgrade pip && \

README.md

Lines changed: 60 additions & 27 deletions
@@ -8,10 +8,19 @@ Swiss army knife of **HLA** Nomenclature

 ### `py-ard` is ARD reduction for HLA in Python

-Human leukocyte antigen (HLA) genes encode cell surface proteins that are important for immune regulation. Exons encoding the Antigen Recognition Domain (ARD) are the most polymorphic region of HLA genes and are important for donor/recipient [HLA matching](https://bethematch.org/patients-and-families/before-transplant/find-a-donor/hla-matching/). The history of allele typing methods has played a major role in determining resolution and ambiguity of reported HLA values. Although HLA [nomenclature](https://www.theatlantic.com/magazine/archive/2023/04/clint-smith-nomenclature-poem/673097/) has not always conformed to the same standard, it is now defined by [The WHO Nomenclature Committee for Factors of the HLA System](https://hla.alleles.org/nomenclature/committee.html). `py-ard` is aware of the variation in historical resolutions and grouping and is able to translate from one representation to another based on alleles published quarterly by [IPD/IMGT-HLA](https://github.com/ANHIG/IMGTHLA/).
-
+Human leukocyte antigen (HLA) genes encode cell surface proteins that are important for immune regulation. Exons
+encoding the Antigen Recognition Domain (ARD) are the most polymorphic region of HLA genes and are important for
+donor/recipient [HLA matching](https://bethematch.org/patients-and-families/before-transplant/find-a-donor/hla-matching/).
+The history of allele typing methods has played a major role in determining resolution and ambiguity of reported HLA
+values. Although
+HLA [nomenclature](https://www.theatlantic.com/magazine/archive/2023/04/clint-smith-nomenclature-poem/673097/) has not
+always conformed to the same standard, it is now defined
+by [The WHO Nomenclature Committee for Factors of the HLA System](https://hla.alleles.org/nomenclature/committee.html). `py-ard`
+is aware of the variation in historical resolutions and grouping and is able to translate from one representation to
+another based on alleles published quarterly by [IPD/IMGT-HLA](https://github.com/ANHIG/IMGTHLA/).

 ## Table of Contents
+
 1. [Installation](#installation)
 * [Install From PyPi](#install-from-pypi)
 * [Install With Homebrew](#install-with-homebrew)
@@ -31,15 +40,17 @@ Human leukocyte antigen (HLA) genes encode cell surface proteins that are import
 5. [Docker Deployment](#docker-deployment-of-py-ard-rest-web-service)

 ## Installation
+
 `py-ard` works with Python 3.8 and higher.

 ### Install from PyPi

 ```shell
 pip install py-ard
 ```
-Note: With `py-ard` version *1.0.0* and higher, the redux API has changed. If your use requires the older API, please install with `pip install py-ard==0.9.2`

+Note: With `py-ard` version *1.0.0* and higher, the redux API has changed. If your use requires the older API, please
+install with `pip install py-ard==0.9.2`

 ### Install With Homebrew

@@ -75,7 +86,8 @@ See [Our Contribution Guide](CONTRIBUTING.rst) for open source contribution to `

 ### Using `py-ard` from Python code

-`py-ard` can be used in a program to reduce/expand HLA GL String representation. If pyard discovers an invalid Allele, it'll throw an Invalid Exception, not silently return an empty result.
+`py-ard` can be used in a program to reduce/expand HLA GL String representation. If pyard discovers an invalid Allele,
+it'll throw an Invalid Exception, not silently return an empty result.

 #### Initialize `py-ard`

@@ -85,7 +97,6 @@ Import `pyard` package.
 import pyard
 ```

-
 Initialize `ARD` object with a version of IMGT HLA database

 ```python
@@ -94,7 +105,10 @@ import pyard
 ard = pyard.init('3510')
 ```

-When processing a large numbers of typings, it's helpful to have a cache of previously calculated reductions to make similar typings reduce faster. The cache size of pre-computed reductions can be changed from the default of 1,000 by setting `cache_size` argument. This increases the memory footprint but will significantly increase the processing times for large number of reductions.
+When processing a large numbers of typings, it's helpful to have a cache of previously calculated reductions to make
+similar typings reduce faster. The cache size of pre-computed reductions can be changed from the default of 1,000 by
+setting `cache_size` argument. This increases the memory footprint but will significantly increase the processing times
+for large number of reductions.

 ```python
 import pyard
@@ -103,7 +117,8 @@ max_cache_size = 1_000_000
 ard = pyard.init('3510', cache_size=max_cache_size)
 ```

-By default, the IPD-IMGT/HLA data is stored locally in `$TMPDIR/pyard`. This may be removed when your computer restarts. You can specify a different, more permanent directory for the cached data.
+By default, the IPD-IMGT/HLA data is stored locally in `$TMPDIR/pyard`. This may be removed when your computer restarts.
+You can specify a different, more permanent directory for the cached data.

 ```python
 import pyard.ard
@@ -126,13 +141,15 @@ ard = pyard.init()
 ```

 You can check the current version of IPD-IMGT/HLA database.
+
 ```python
 ard.get_db_version()
 ```

 ### Reduce Typings

-**Note**: Previous to version of 1.0.0 release of `py-ard`, there was `redux` and `redux_gl` methods on `ard`. They have been consolidated so that `redux` handles both GL Strings and individual alleles.
+**Note**: Previous to version of 1.0.0 release of `py-ard`, there was `redux` and `redux_gl` methods on `ard`. They have
+been consolidated so that `redux` handles both GL Strings and individual alleles.

 Reduce a single locus HLA Typing by specifying the allele/MAC/XX code and the reduction method to `redux` method.

@@ -167,16 +184,16 @@ ard.redux('B14', 'lg')

 ## Valid Reduction Types

-| Reduction Type | Description                                     |
-|----------------|-------------------------------------------------|
-| `G`            | Reduce to G Group Level                         |
-| `P`            | Reduce to P Group Level                         |
-| `lg`           | Reduce to 2 field ARD level (append `g`)        |
-| `lgx`          | Reduce to 2 field ARD level                     |
-| `W`            | Reduce/Expand to 4 field WHO nomenclature level |
-| `exon`         | Reduce/Expand to 3 field level                  |
-| `U2`           | Reduce to 2 field unambiguous level             |
-| `S`            | Reduce to Serological level                     |
+| Reduction Type | Description                                               |
+|----------------|-----------------------------------------------------------|
+| `G`            | Reduce to G Group Level                                   |
+| `P`            | Reduce to P Group Level                                   |
+| `lg`           | Reduce to 2 field ARD level (append `g`)                  |
+| `lgx`          | Reduce to 2 field ARD level                               |
+| `W`            | Reduce/Expand to full field(4,3,2) WHO nomenclature level |
+| `exon`         | Reduce/Expand to 3 field level                            |
+| `U2`           | Reduce to 2 field unambiguous level                       |
+| `S`            | Reduce to Serological level                               |

 ### Perform DRB1 blending with DRB3, DRB4 and DRB5

@@ -195,6 +212,7 @@ looking up MAC representation. See [MAC Service UI](https://hml.nmdp.org/MacUI/)
 ### Expand MAC

 You can also use `py-ard` to expand MAC codes. Use `expand_mac` method on `ard`.
+
 ```python
 ard.expand_mac('HLA-A*01:BC')
 # 'HLA-A*01:02/HLA-A*01:03'
@@ -212,6 +230,7 @@ ard.lookup_mac('A*01:02/A*01:01/A*01:03')
 ### CWD Reduction

 Reduce a MAC code or an allele list GL String to CWD reduced list.
+
 ```python
 ard.cwd_redux("B*15:01:01/B*15:01:03/B*15:04/B*15:07/B*15:26N/B*15:27")
 # => B*15:01/B*15:07
@@ -226,19 +245,24 @@ ard.lookup_mac(ard.cwd_redux("B*15:01:01/B*15:01:03/B*15:04/B*15:07/B*15:26N/B*1

 ### Using `py-ard` from R code

-`py-ard` works well from `R` as well. Please see [Using pyard from R language](https://github.com/nmdp-bioinformatics/py-ard/wiki/Using-pyard-library-from-R-language) page for detailed walkthrough.
+`py-ard` works well from `R` as well. Please
+see [Using pyard from R language](https://github.com/nmdp-bioinformatics/py-ard/wiki/Using-pyard-library-from-R-language)
+page for detailed walkthrough.

 ## Command Line Tools

-Various command line interface (CLI) tools are available to use for managing local IPD-IMGT/HLA cache database, running impromptu reduction queries and batch processing of CSV files.
+Various command line interface (CLI) tools are available to use for managing local IPD-IMGT/HLA cache database, running
+impromptu reduction queries and batch processing of CSV files.

-For all tools, use `--imgt-version` and `--data-dir` to specify the IPD-IMGT/HLA database version and the directory where the SQLite files are created.
+For all tools, use `--imgt-version` and `--data-dir` to specify the IPD-IMGT/HLA database version and the directory
+where the SQLite files are created.

 ### `pyard-import` Import the latest IPD-IMGT/HLA database

 `pyard-import` helps with importing and reinstalling of prepared IPD-IMGT/HLA and MAC data.

 Use `pyard-import -h` to see all the options available.
+
 ```shell
 $ pyard-import -h
 usage: pyard-import [-h] [--list] [-i IMGT_VERSION] [-d DATA_DIR] [--v2-to-v3-mapping V2_V3_MAPPING] [--refresh-mac] [--re-install] [--skip-mac]
@@ -313,13 +337,15 @@ $ pyard-import --imgt-version 3150 --skip-mac

 Show the statuses of all `py-ard` databases

-`pyard-status` goes through all the available databases and checks all the tables that should be available. This is very helpful to show all the databases, number of rows in each table, any missing tables and the stored IPD-IMGT/HLA version.
+`pyard-status` goes through all the available databases and checks all the tables that should be available. This is very
+helpful to show all the databases, number of rows in each table, any missing tables and the stored IPD-IMGT/HLA version.

 ```shell
 $ pyard-status
 ```

 Use ` --data-dir` to specify an alternate directory for cached database files.
+
 ```shell
 $ pyard-status --data-dir ~/.pyard/
 IMGT DB Version: Latest (3440)
@@ -348,7 +374,9 @@ Size: 533.37MB

 ### `pyard` Redux quickly

-`pyard` command can be used for quick reductions from the command line. Use `--help` option to see all the available options.
+`pyard` command can be used for quick reductions from the command line. Use `--help` option to see all the available
+options.
+
 ```shell
 $ pyard --help
 usage: pyard [-h] [-v] [-d DATA_DIR] [-i IMGT_VERSION] [-g GL_STRING]
@@ -371,7 +399,8 @@ options:

 ```

-Reduce from command line by specifying any typing with `-g` or `--gl` option and the reduction method with `-r` or `--redux-type` option.
+Reduce from command line by specifying any typing with `-g` or `--gl` option and the reduction method with `-r`
+or `--redux-type` option.

 ```shell
 $ pyard -g 'A*01:AB' -r lgx
@@ -429,14 +458,16 @@ B14 = B64/B65

 ### `pyard-csv-reduce` Batch Reduce a CSV file

-`pyard-csv-reduce` can be used to batch process a CSV file with HLA typings. See [documentation](extras/README.md) for detailed information about all the options.
-
+`pyard-csv-reduce` can be used to batch process a CSV file with HLA typings. See [documentation](extras/README.md) for
+detailed information about all the options.

 ## `py-ard` REST Web Service

 Run `py-ard` as a service so that it can be accessed as a REST service endpoint.

-To start in debug mode, you can run the `app.py` script. The endpoint should then be available at [localhost:8080](http://0.0.0.0:8080)
+To start in debug mode, you can run the `app.py` script. The endpoint should then be available
+at [localhost:8080](http://0.0.0.0:8080)
+
 ```shell
 $ python app.py
 * Serving Flask app 'app'
@@ -453,13 +484,15 @@ Press CTRL+C to quit
 For deploying to production, build a Docker image and use that image for deploying to a server.

 Build the docker image:
+
 ```shell
 make docker-build
 ```

 builds a Docker image named `pyard-service:latest`

 Build the docker and run it with:
+
 ```shell
 make docker
 ```

api-spec.yaml

Lines changed: 2 additions & 1 deletion
@@ -2,7 +2,7 @@ openapi: 3.0.3
 info:
   title: ARD Reduction
   description: Reduce to ARD Level
-  version: "1.0.10"
+  version: "1.0.11"
 servers:
   - url: 'http://localhost:8080'
 tags:
@@ -65,6 +65,7 @@ paths:
 - W
 - exon
 - U2
+- S
 example: "lgx"
 responses:
 200:

api.py

Lines changed: 1 addition & 1 deletion
@@ -126,7 +126,7 @@ def cwd_redux_controller():
             return {"message": "gl_string and reduction_method not provided"}, 404
         # Perform redux
         try:
-            cwd = ard.cwd_redux(ard.redux(gl_string, "lgx"))
+            cwd = ard.cwd_redux(gl_string)
         except PyArdError as e:
             return {"message": e.message}, 400
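The controller change above passes the raw GL String to `cwd_redux` instead of pre-reducing it with `lgx`. A framework-free sketch of that flow follows. It assumes a hypothetical `cwd_redux_handler` function, a stand-in `PyArdError` class, a toy `fake_cwd_redux` reducer, and an illustrative response body; only the status codes are taken from the hunk.

```python
from typing import Callable, Optional, Tuple


class PyArdError(Exception):
    """Stand-in for the library's error type; only `.message` is assumed."""

    def __init__(self, message: str):
        super().__init__(message)
        self.message = message


def cwd_redux_handler(
    payload: Optional[dict], cwd_redux: Callable[[str], str]
) -> Tuple[dict, int]:
    # Missing input maps to 404, mirroring the context line in the hunk.
    if not payload or "gl_string" not in payload:
        return {"message": "gl_string not provided"}, 404
    gl_string = payload["gl_string"]
    try:
        # The raw GL String goes straight to cwd_redux, with no prior lgx step.
        cwd = cwd_redux(gl_string)
    except PyArdError as e:
        return {"message": e.message}, 400
    # Hypothetical response body, chosen only for this sketch.
    return {"gl_string": gl_string, "cwd": cwd}, 200


def fake_cwd_redux(gl_string: str) -> str:
    # Toy reducer for demonstration: rejects empty input, echoes the rest.
    if gl_string == "":
        raise PyArdError("empty GL String")
    return gl_string


print(cwd_redux_handler({"gl_string": "B*15:01"}, fake_cwd_redux))
print(cwd_redux_handler({"gl_string": ""}, fake_cwd_redux))
print(cwd_redux_handler(None, fake_cwd_redux))
```

Injecting the reducer as a parameter keeps the sketch runnable without an IPD-IMGT/HLA database; a real controller would call the `ard` object directly.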

pyard/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@
 from .misc import get_imgt_db_versions as db_versions

 __author__ = """NMDP Bioinformatics"""
-__version__ = "1.0.10"
+__version__ = "1.0.11"


 def init(

pyard/ard.py

Lines changed: 5 additions & 2 deletions
@@ -328,8 +328,11 @@ def _sorted_unique_gl(self, gls: List[str], delim: str) -> str:

         if delim == "+":
             # No need to make unique. eg. homozygous cases are valid for SLUGs
+            non_empty_gls = filter(lambda s: s != "", gls)
             return delim.join(
-                sorted(gls, key=functools.cmp_to_key(self.smart_sort_comparator))
+                sorted(
+                    non_empty_gls, key=functools.cmp_to_key(self.smart_sort_comparator)
+                )
             )

         # generate a unique list over a delimiter
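
The hunk above drops empty strings before the `+`-delimited parts are sorted and joined. A minimal standalone sketch of that idea follows. It uses Python's default string ordering in place of `smart_sort_comparator`, and the function name `join_non_empty` is hypothetical rather than part of `py-ard`.

```python
from typing import List


def join_non_empty(gls: List[str], delim: str = "+") -> str:
    """Sort and join GL String parts, skipping parts that reduced to ''."""
    # Assumption: plain lexicographic sorting stands in for py-ard's
    # smart_sort_comparator, which is not reproduced here.
    non_empty_gls = filter(lambda s: s != "", gls)
    return delim.join(sorted(non_empty_gls))


# Without the filter, an empty part sorts first and leaves a dangling delimiter.
unfiltered = "+".join(sorted(["B8", "", "B7"]))
filtered = join_non_empty(["B8", "", "B7"])
print(repr(unfiltered))
print(repr(filtered))
```

Homozygous input such as `["A1", "A1"]` is left intact, since the filter removes only empty strings and does not deduplicate.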
@@ -838,7 +841,7 @@ def cwd_redux(self, allele_list_gl):
         for allele in allele_list_gl.split("/"):
             if self.is_mac(allele):
                 alleles.extend(self.expand_mac(allele).split("/"))
-            elif is_2_field_allele(allele) and not self.is_mac(allele):
+            elif is_2_field_allele(allele) and not self.is_XX(allele):
                 alleles.append(allele)
             else:
                 alleles.extend(self.redux(allele, "lgx").split("/"))
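
The `cwd_redux` hunk changes which branch an XX code takes. The sketch below models only that branching. The predicates `is_mac`, `is_xx` and `is_2_field_allele` here are simplified string checks written for this illustration, not the library's database-backed versions, and `branch_before` / `branch_after` are hypothetical names.

```python
def is_2_field_allele(allele: str) -> bool:
    # Simplified stand-in: exactly two colon-separated fields.
    return len(allele.split(":")) == 2


def is_xx(allele: str) -> bool:
    # Simplified stand-in: a two-field name whose second field is "XX".
    return is_2_field_allele(allele) and allele.split(":")[1] == "XX"


def is_mac(allele: str) -> bool:
    # Simplified stand-in: an alphabetic second field other than "XX".
    # The real check is assumed to consult the MAC table instead.
    fields = allele.split(":")
    return len(fields) == 2 and fields[1].isalpha() and fields[1] != "XX"


def branch_before(allele: str) -> str:
    """Branch selected by the removed condition."""
    if is_mac(allele):
        return "expand_mac"
    elif is_2_field_allele(allele) and not is_mac(allele):
        return "append as-is"
    return "redux lgx"


def branch_after(allele: str) -> str:
    """Branch selected by the added condition."""
    if is_mac(allele):
        return "expand_mac"
    elif is_2_field_allele(allele) and not is_xx(allele):
        return "append as-is"
    return "redux lgx"


for allele in ["C*04:KBG", "B*15:01", "B*15:01:01", "DRB1*14:XX"]:
    print(allele, "|", branch_before(allele), "->", branch_after(allele))
```

Under these assumptions only the XX code changes branch: it was previously appended unchanged and is now sent through the `lgx` reduction, which is consistent with the `DRB1*14:XX` scenario added in `tests/features/cwd.feature`.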

setup.cfg

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 [bumpversion]
-current_version = 1.0.10
+current_version = 1.0.11
 commit = True
 tag = True

setup.py

Lines changed: 1 addition & 1 deletion
@@ -36,7 +36,7 @@

 setup(
     name="py-ard",
-    version="1.0.10",
+    version="1.0.11",
     description="ARD reduction for HLA with Python",
     long_description=readme,
     long_description_content_type="text/markdown",

tests/features/cwd.feature

Lines changed: 6 additions & 0 deletions
@@ -45,3 +45,9 @@ Feature: CWD Reduction
       | C*04:KBG | C*04:01/C*04:09N |
       | C*04:01:01G/C*04:09N | C*04:01/C*04:09N |
       | B*15:01/B*15:01N/B*15:102/B*15:104 | B*15:01/B*15:01N |
+
+  Scenario: CWD reduction of XX alleles
+
+    Given the GL String we want to find CWD of is "DRB1*14:XX"
+    When we find CWD alleles for the GL String
+    Then the CWD alleles should be "DRB1*14:01/DRB1*14:02/DRB1*14:03/DRB1*14:04/DRB1*14:05/DRB1*14:06/DRB1*14:07/DRB1*14:08/DRB1*14:09/DRB1*14:10/DRB1*14:11/DRB1*14:12/DRB1*14:13/DRB1*14:14/DRB1*14:15/DRB1*14:16/DRB1*14:17/DRB1*14:18/DRB1*14:19/DRB1*14:20/DRB1*14:21/DRB1*14:22/DRB1*14:24/DRB1*14:25/DRB1*14:28/DRB1*14:29/DRB1*14:33/DRB1*14:48/DRB1*14:61/DRB1*14:70"

tests/features/serology_redux.feature

Lines changed: 5 additions & 0 deletions
@@ -24,3 +24,8 @@ Feature: Serology Reduction
       | B*15:01/B*15:02/B*15:03/B*15:04 | S | B15/B62/B70/B72/B75 |
       | B*15:10 | S | B15/B70/B71 |
       | A*24:03/A*24:10/A*24:23/A*24:33/A*24:374 | S | A9/A24/A2403 |
+
+
+    Examples: Skip Loci that don't have Serology mappings
+      | Allele | Level | Redux Serology |
+      | A*01:01+A*01:01^B*08:ASXJP+B*07:02^C*02:02+C*07:HTGM^DPB1*28:01:01G+DPB1*296:01 | S | A1+A1^B7+B8^Cw2+Cw7 |
