|
4 | 4 |
|
5 | 5 | **Example Scripts to batch reduce HLA typings from a CSV File** |
6 | 6 |
|
7 | | -`pyard-reduce-csv` command can be used with a config file(that describes ways |
8 | | -to reduce the file) can be used to take a CSV file with HLA typing data and |
9 | | -reduce certain columns and produce a new CSV or an Excel file. |
10 | | - |
11 | | -Install `py-ard` and use `pyard-reduce-csv` command specifying the changes in a JSON |
12 | | -config file and running `pyard-reduce-csv -c <config-file>` will produce result based |
13 | | -on the configuration in the config file. |
| 7 | +The `pyard-reduce-csv` command takes a CSV file with HLA typing data and a config file (that describes how to reduce the
| 8 | +file), reduces the specified columns, and produces a new CSV or Excel file.
14 | 9 |
|
| 10 | +Install `py-ard`, specify the changes in a JSON config file, and run
| 11 | +`pyard-reduce-csv -c <config-file>` to produce a resulting file based on the configuration in the config file.
15 | 12 |
|
16 | 13 | See [Example JSON config file](reduce_conf.json). |
17 | 14 |
|
18 | | - |
19 | 15 | ### Input CSV filename |
| 16 | + |
20 | 17 | `in_csv_filename` Directory path and file name of the Input CSV file |
21 | 18 |
|
22 | 19 | ### Output CSV filename |
| 20 | + |
23 | 21 | `out_csv_filename` Directory path and file name of the Reduced Output CSV file |
24 | 22 |
|
25 | 23 | ### CSV Columns to read |
| 24 | + |
26 | 25 | `columns_from_csv` The column names to read from CSV file |
27 | 26 |
|
28 | 27 | ```json |
29 | 28 | [ |
30 | | - "nmdp_id", |
31 | | - "r_a_typ1", |
32 | | - "r_a_typ2", |
33 | | - "r_b_typ1", |
34 | | - "r_b_typ2", |
35 | | - "r_c_typ1", |
36 | | - "r_c_typ2", |
37 | | - "r_drb1_typ1", |
38 | | - "r_drb1_typ2", |
39 | | - "r_dpb1_typ1", |
40 | | - "r_dpb1_typ2" |
41 | | - ] |
| 29 | + "nmdp_id", |
| 30 | + "r_a_typ1", |
| 31 | + "r_a_typ2", |
| 32 | + "r_b_typ1", |
| 33 | + "r_b_typ2", |
| 34 | + "r_c_typ1", |
| 35 | + "r_c_typ2", |
| 36 | + "r_drb1_typ1", |
| 37 | + "r_drb1_typ2", |
| 38 | + "r_dpb1_typ1", |
| 39 | + "r_dpb1_typ2" |
| 40 | +] |
42 | 41 | ``` |
43 | 42 |
|
44 | 43 | ### CSV Columns to reduce |
| 44 | + |
45 | 45 | `columns_to_reduce_in_csv` List of columns which have typing information and need to be reduced. |
46 | 46 |
|
47 | | -**NOTE**: The locus is the 2nd term in the column name |
48 | | -E.g., for column `column R_DRB1_type1`, `DPB1` is the locus name |
| 47 | +**Important**: The locus is the 2nd term in the column name separated by `_`. The program uses this to figure out the
| 48 | +locus name for the typings in that column.
| 49 | + |
| 50 | +E.g., for column `R_DRB1_type1`, `DRB1` is the locus name
49 | 51 |
|
50 | 52 | ```json |
51 | 53 | [ |
52 | | - "r_a_typ1", |
53 | | - "r_a_typ2", |
54 | | - "r_b_typ1", |
55 | | - "r_b_typ2", |
56 | | - "r_c_typ1", |
57 | | - "r_c_typ2", |
58 | | - "r_drb1_typ1", |
59 | | - "r_drb1_typ2", |
60 | | - "r_dpb1_typ1", |
61 | | - "r_dpb1_typ2" |
62 | | - ], |
| 54 | + "r_a_typ1", |
| 55 | + "r_a_typ2", |
| 56 | + "r_b_typ1", |
| 57 | + "r_b_typ2", |
| 58 | + "r_c_typ1", |
| 59 | + "r_c_typ2", |
| 60 | + "r_drb1_typ1", |
| 61 | + "r_drb1_typ2", |
| 62 | + "r_dpb1_typ1", |
| 63 | + "r_dpb1_typ2" |
| 64 | +] |
63 | 65 | ``` |
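
The locus rule described above can be sketched in Python. This is only an illustration, not the actual `py-ard` code: the helper name `locus_from_column` is hypothetical, and upper-casing the result is an assumption.

```python
def locus_from_column(column_name: str) -> str:
    """Return the locus encoded as the 2nd `_`-separated term of a column name."""
    terms = column_name.split("_")
    if len(terms) < 2 or not terms[1]:
        raise ValueError(f"no locus term in column name: {column_name!r}")
    # Upper-casing is an assumption, so that `r_drb1_typ1` yields `DRB1`.
    return terms[1].upper()


# Column names taken from the `columns_to_reduce_in_csv` example.
columns_to_reduce = ["r_a_typ1", "r_b_typ2", "r_drb1_typ1", "r_dpb1_typ2"]
loci = {column: locus_from_column(column) for column in columns_to_reduce}
```

A column name without a second term (for example a bare `nmdp`) raises `ValueError` in this sketch, which is one way to catch misnamed columns before a reduction run.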
64 | 66 |
|
65 | | - |
66 | 67 | ### Redux Options |
67 | | -`redux_type` Reduction Type |
68 | 68 |
|
69 | | -Valid Options: `G`, `lg` and `lgx` |
| 69 | +`redux_type` Reduction Type |
70 | 70 |
|
71 | | -### Compression Options |
72 | | -`apply_compression` Compression to use for output file |
| 71 | +Valid Options are: |
73 | 72 |
|
74 | | -Valid options: `'gzip'`, `'zip'` or `null` |
| 73 | +| Reduction Type | Description | |
| 74 | +|----------------|-------------------------------------------------| |
| 75 | +| `G` | Reduce to G Group Level | |
| 76 | +| `lg` | Reduce to 2 field ARD level (append `g`) | |
| 77 | +| `lgx` | Reduce to 2 field ARD level | |
| 78 | +| `W` | Reduce/Expand to 3 field WHO nomenclature level | |
| 79 | +| `exon` | Reduce/Expand to exon level | |
75 | 80 |
|
76 | | -### Verbose log Options |
77 | | -`log_comment` Show verbose log ? |
78 | 81 |
|
79 | | -Valid options: `true` or `false` |
| 82 | +### Kinds of typings to reduce |
80 | 83 |
|
81 | | -### Types of typings to reduce |
82 | 84 | ```json |
83 | | - "verbose_log": true, |
84 | | - "reduce_serology": false, |
85 | | - "reduce_v2": true, |
86 | | - "reduce_3field": true, |
87 | | - "reduce_P": true, |
88 | | - "reduce_XX": false, |
89 | | - "reduce_MAC": true, |
| 85 | +"reduce_serology": false, |
| 86 | +"reduce_v2": true, |
| 87 | +"convert_v2_to_v3": false, |
| 88 | +"reduce_3field": true, |
| 89 | +"reduce_P": true, |
| 90 | +"reduce_XX": false, |
| 91 | +"reduce_MAC": true, |
90 | 92 | ``` |
91 | 93 | Valid options: `true` or `false` |
92 | 94 |
|
93 | | - |
94 | 95 | ### Locus Name in Allele |
95 | | -`locus_in_allele_name` |
96 | | -Is locus name present in allele ? E.g. A*01:01 vs 01:01 |
| 96 | + |
| 97 | +`locus_in_allele_name` |
| 98 | +Is the locus name present in the allele? E.g. `A*01:01` vs `01:01`
97 | 99 |
|
98 | 100 | Valid options: `true` or `false` |
99 | 101 |
|
100 | 102 | ### Output Format |
| 103 | + |
101 | 104 | `output_file_format` Format of the output file |
102 | 105 |
|
103 | | -Valid options: `csv` or `xlsx` |
| 106 | +Valid options: `csv` or `xlsx` |
| 107 | + |
| 108 | +For Excel output, the `openpyxl` library needs to be installed. Install with:
| 109 | +```shell |
| 110 | + pip install openpyxl |
| 111 | +``` |
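
A quick way to check for `openpyxl` before requesting `xlsx` output is sketched below. The helper names `excel_output_available` and `check_output_format` are hypothetical and are not part of `py-ard`; only the standard library is used.

```python
import importlib.util


def excel_output_available() -> bool:
    """Return True when the `openpyxl` package can be found for import."""
    return importlib.util.find_spec("openpyxl") is not None


def check_output_format(output_file_format: str) -> None:
    """Fail early if `xlsx` is requested but `openpyxl` is missing."""
    if output_file_format == "xlsx" and not excel_output_available():
        raise RuntimeError("xlsx output needs openpyxl: pip install openpyxl")


check_output_format("csv")  # csv output never needs openpyxl
```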
| 112 | + |
104 | 113 |
|
105 | | -### Create New Column |
106 | | -`new_column_for_redux` Add a separate column for processed column or replace |
107 | | -the current column. Creates a `reduced_` version of the column. |
| 114 | +### Create New Column |
| 115 | + |
| 116 | +`new_column_for_redux` Whether to add a separate column for the reduced values. When `true`, creates a `reduced_` version of each processed column. Otherwise, the same column is replaced with the reduced version.
108 | 117 |
|
109 | 118 | Valid options: `true`, `false` |
110 | 119 |
|
111 | 120 | ### Map to DRBX |
112 | | -`map_drb345_to_drbx` Map to DRBX Typings based on DRB3, DRB4 and DRB5 typings. |
| 121 | + |
| 122 | +`map_drb345_to_drbx` Map to DRBX typings based on DRB3, DRB4 and DRB5 typings using the [WMDA method](https://www.nature.com/articles/1705672).
113 | 123 |
|
114 | 124 | Valid options: `true` or `false` |
| 125 | + |
| 126 | +### Compression Options |
| 127 | + |
| 128 | +`apply_compression` Compression to use for output file. Applies only to CSV files. |
| 129 | + |
| 130 | +Valid options: `'gzip'`, `'zip'` or `null` |
| 131 | + |
| 132 | +### Verbose log Options |
| 133 | + |
| 134 | +`verbose_log` Show verbose log?
| 135 | + |
| 136 | +Valid options: `true` or `false` |
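
Putting the options together, a config can be assembled and sanity-checked in Python before it is handed to `pyard-reduce-csv -c <config-file>`. This is a minimal sketch under stated assumptions: the file names are hypothetical, `validate_config` is not part of `py-ard`, and the check that every reduced column is also listed in `columns_from_csv` is an assumption rather than documented behavior.

```python
import json

VALID_REDUX_TYPES = {"G", "lg", "lgx", "W", "exon"}
VALID_OUTPUT_FORMATS = {"csv", "xlsx"}
VALID_COMPRESSION = {"gzip", "zip", None}
BOOLEAN_KEYS = [
    "reduce_serology", "reduce_v2", "convert_v2_to_v3", "reduce_3field",
    "reduce_P", "reduce_XX", "reduce_MAC", "locus_in_allele_name",
    "new_column_for_redux", "map_drb345_to_drbx", "verbose_log",
]

config = {
    "in_csv_filename": "sample.csv",          # hypothetical input path
    "out_csv_filename": "reduced_sample.csv",  # hypothetical output path
    "columns_from_csv": ["nmdp_id", "r_a_typ1", "r_a_typ2"],
    "columns_to_reduce_in_csv": ["r_a_typ1", "r_a_typ2"],
    "redux_type": "lgx",
    "reduce_serology": False,
    "reduce_v2": True,
    "convert_v2_to_v3": False,
    "reduce_3field": True,
    "reduce_P": True,
    "reduce_XX": False,
    "reduce_MAC": True,
    "locus_in_allele_name": False,
    "output_file_format": "csv",
    "new_column_for_redux": False,
    "map_drb345_to_drbx": False,
    "apply_compression": None,  # serialized as JSON null
    "verbose_log": True,
}


def validate_config(cfg: dict) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if cfg.get("redux_type") not in VALID_REDUX_TYPES:
        problems.append("redux_type must be one of G, lg, lgx, W, exon")
    if cfg.get("output_file_format") not in VALID_OUTPUT_FORMATS:
        problems.append("output_file_format must be csv or xlsx")
    if cfg.get("apply_compression") not in VALID_COMPRESSION:
        problems.append("apply_compression must be gzip, zip or null")
    for key in BOOLEAN_KEYS:
        if not isinstance(cfg.get(key), bool):
            problems.append(f"{key} must be true or false")
    # Assumption: a column can only be reduced if it is also read from the CSV.
    missing = set(cfg.get("columns_to_reduce_in_csv", [])) - set(cfg.get("columns_from_csv", []))
    if missing:
        problems.append(f"columns not read from CSV: {sorted(missing)}")
    return problems


problems = validate_config(config)
config_json = json.dumps(config, indent=2)  # text to save as the config file
```

Saving `config_json` to a file and passing that file with `-c` follows the usage described at the top of this document.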