|
1 | 1 | # Extras |
2 | 2 |
|
3 | | -# Batch Script for CSV File |
| 3 | +# Script to batch process a CSV File |
4 | 4 |
|
5 | 5 | **Example Scripts to batch reduce HLA typings from a CSV File** |
6 | 6 |
|
7 | | -`reduce_csv.py` and `conf.py` scripts can be used to take a CSV file with HLA |
8 | | -typing data and reduce certain columns and produce a new CSV and Excel file. |
9 | | - |
10 | | -For most use case, installing `py-ard`, specifying the changes in `conf.py` file |
11 | | -and running `python reduce_csv.py` will produce result based on the configuration |
12 | | -in the `conf.py`. |
13 | | - |
14 | | - |
15 | | -```python |
16 | | -# |
17 | | -# configurations for processing CSV files |
18 | | -# |
19 | | - |
20 | | -# The column names that are in CSV |
21 | | -# The output file will have these columns |
22 | | -all_columns_in_csv = [ |
23 | | - "nmdp_id", "r_a_typ1", "r_a_typ2", "r_b_typ1", "r_b_typ2", "r_c_typ1", "r_c_typ2", "r_drb1_typ1", "r_drb1_typ2", |
24 | | - "r_dpb1_typ1", "r_dpb1_typ2" |
25 | | -] |
26 | | - |
27 | | -# |
28 | | -# List of columns which have typing information and need to be reduced. |
29 | | -# The locus is the 2nd term in the column name |
30 | | -# Eg: For column R_DRB1_type1, DPB1 is the locus name |
31 | | -# |
32 | | -columns_to_reduce_in_csv = [ |
33 | | - "r_a_typ1", "r_a_typ2", "r_b_typ1", "r_b_typ2", "r_c_typ1", "r_c_typ2", "r_drb1_typ1", "r_drb1_typ2", "r_dpb1_typ1", |
| 7 | +The `pyard-reduce-csv` command, together with a config file (that describes how |
| 8 | +to reduce the file), takes a CSV file with HLA typing data, reduces |
| 9 | +selected columns, and produces a new CSV or Excel file. |
| 10 | + |
| 11 | +Install `py-ard`, specify the changes in a JSON config file, and run |
| 12 | +`pyard-reduce-csv -c <config-file>` to produce results based |
| 13 | +on the configuration in that file. |
| 14 | + |
| 15 | + |
| 16 | +See [Example JSON config file](reduce_conf.json). |
| 17 | + |
| 18 | + |
| 19 | +### Input CSV filename |
| 20 | +`in_csv_filename` Directory path and file name of the Input CSV file |
| 21 | + |
| 22 | +### Output CSV filename |
| 23 | +`out_csv_filename` Directory path and file name of the Reduced Output CSV file |
| 24 | + |
| 25 | +### CSV Columns to read |
| 26 | +`columns_from_csv` The column names to read from CSV file |
| 27 | + |
| 28 | +```json |
| 29 | + [ |
| 30 | + "nmdp_id", |
| 31 | + "r_a_typ1", |
| 32 | + "r_a_typ2", |
| 33 | + "r_b_typ1", |
| 34 | + "r_b_typ2", |
| 35 | + "r_c_typ1", |
| 36 | + "r_c_typ2", |
| 37 | + "r_drb1_typ1", |
| 38 | + "r_drb1_typ2", |
| 39 | + "r_dpb1_typ1", |
| 40 | + "r_dpb1_typ2" |
| 41 | + ] |
| 42 | +``` |
| 43 | + |
| 44 | +### CSV Columns to reduce |
| 45 | +`columns_to_reduce_in_csv` List of columns which have typing information and need to be reduced. |
| 46 | + |
| 47 | +**NOTE**: The locus is the 2nd term in the column name |
| 48 | +E.g., for column `R_DRB1_type1`, `DRB1` is the locus name |
| 49 | + |
| 50 | +```json |
| 51 | + [ |
| 52 | + "r_a_typ1", |
| 53 | + "r_a_typ2", |
| 54 | + "r_b_typ1", |
| 55 | + "r_b_typ2", |
| 56 | + "r_c_typ1", |
| 57 | + "r_c_typ2", |
| 58 | + "r_drb1_typ1", |
| 59 | + "r_drb1_typ2", |
| 60 | + "r_dpb1_typ1", |
34 | 61 | "r_dpb1_typ2" |
35 | | -] |
36 | | - |
37 | | -# |
38 | | -# Configuration options to ARD reduction of a CSV file |
39 | | -# |
40 | | -ard_config = { |
41 | | - # All Columns in the CSV file |
42 | | - "csv_in_column_names": all_columns_in_csv, |
43 | | - |
44 | | - # Columns to check for typings |
45 | | - "columns_to_check": columns_to_reduce_in_csv, |
46 | | - |
47 | | - # How should the typings be reduced |
48 | | - # Valid Options: |
49 | | - # - G |
50 | | - # - lg |
51 | | - # - lgx |
52 | | - "redux_type": "lgx", |
53 | | - |
54 | | - # Input CSV filename |
55 | | - "in_csv_filename": "sample.csv", |
56 | | - |
57 | | - # Output CSV filename |
58 | | - "out_csv_filename": 'clean_sample.csv', |
59 | | - |
60 | | - # Use compression |
61 | | - # Valid options |
62 | | - # - 'gzip' |
63 | | - # - 'zip' |
64 | | - # - None |
65 | | - "apply_compression": 'gzip', |
66 | | - |
67 | | - # Show verbose log |
68 | | - # Valid options: |
69 | | - # - True |
70 | | - # - False |
71 | | - "verbose_log": True, |
72 | | - |
73 | | - # What to reduce ? |
74 | | - "reduce_serology": False, |
75 | | - "reduce_v2": True, |
76 | | - "reduce_3field": True, |
77 | | - "reduce_P": True, |
78 | | - "reduce_XX": False, |
79 | | - "reduce_MAC": True, |
80 | | - |
81 | | - # Is locus name present in allele |
82 | | - # Eg. A*01:01 vs 01:01 |
83 | | - "locus_in_allele_name": False, |
84 | | - |
85 | | - # Format |
86 | | - # Valid options: |
87 | | - # - csv |
88 | | - # - xlsx |
89 | | - "output_file_format": 'csv', |
90 | | - |
91 | | - # Add a separate column for processed column |
92 | | - "new_column_for_redux": False, |
93 | | -} |
| 62 | + ], |
94 | 63 | ``` |
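
The note above says the locus is the 2nd term in the column name. A minimal sketch of that rule follows, assuming terms are separated by underscores and that the locus should be upper-cased; `locus_from_column` is a hypothetical helper written for illustration, not a function from `py-ard`.

```python
def locus_from_column(column_name: str) -> str:
    """Return the locus, taken as the 2nd underscore-separated term.

    Hypothetical helper: the separator and the upper-casing are
    assumptions made for this illustration.
    """
    terms = column_name.split("_")
    if len(terms) < 2:
        raise ValueError(f"no locus term in column name: {column_name!r}")
    return terms[1].upper()


# Column names taken from the config fragment above
columns_to_reduce = ["r_a_typ1", "r_drb1_typ2", "r_dpb1_typ1"]
loci = [locus_from_column(name) for name in columns_to_reduce]
print(loci)  # ['A', 'DRB1', 'DPB1']
```

A column name that does not follow the `<prefix>_<locus>_<suffix>` pattern would yield the wrong locus under this rule, so it is worth checking the column names before listing them in `columns_to_reduce_in_csv`.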
95 | 64 |
|
96 | | -The included sample CSV file `sample.csv` can be processed using the script. |
97 | 65 |
|
98 | | -```shell |
| 66 | +### Redux Options |
| 67 | +`redux_type` Reduction Type |
| 68 | + |
| 69 | +Valid Options: `G`, `lg` and `lgx` |
| 70 | + |
| 71 | +### Compression Options |
| 72 | +`apply_compression` Compression to use for output file |
99 | 73 |
|
| 74 | +Valid options: `'gzip'`, `'zip'` or `null` |
| 75 | + |
| 76 | +### Verbose log Options |
| 77 | +`log_comment` Show verbose log ? |
| 78 | + |
| 79 | +Valid options: `true` or `false` |
| 80 | + |
| 81 | +### Types of typings to reduce |
| 82 | +```json |
| 83 | + "verbose_log": true, |
| 84 | + "reduce_serology": false, |
| 85 | + "reduce_v2": true, |
| 86 | + "reduce_3field": true, |
| 87 | + "reduce_P": true, |
| 88 | + "reduce_XX": false, |
| 89 | + "reduce_MAC": true, |
100 | 90 | ``` |
| 91 | +Valid options: `true` or `false` |
| 92 | + |
| 93 | + |
| 94 | +### Locus Name in Allele |
| 95 | +`locus_in_allele_name` |
| 96 | +Whether the locus name is present in the allele name, e.g. `A*01:01` vs `01:01` |
| 97 | + |
| 98 | +Valid options: `true` or `false` |
| 99 | + |
| 100 | +### Output Format |
| 101 | +`output_file_format` Format of the output file |
| 102 | + |
| 103 | +Valid options: `csv` or `xlsx` |
| 104 | + |
| 105 | +### Create New Column |
| 106 | +`new_column_for_redux` Add a separate column for each processed column, or replace |
| 107 | +the current column. The separate column is a `reduced_` version of the original column. |
| 108 | + |
| 109 | +Valid options: `true`, `false` |
| 110 | + |
| 111 | +### Map to DRBX |
| 112 | +`map_drb345_to_drbx` Map to DRBX Typings based on DRB3, DRB4 and DRB5 typings. |
| 113 | + |
| 114 | +Valid options: `true` or `false` |
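
The options described above can be collected into a single JSON config file. The sketch below builds one in Python using only the option names from this document. It assumes all options sit at the top level of one JSON object; the file name `my_reduce_conf.json` and the value chosen for `map_drb345_to_drbx` are arbitrary picks for illustration. The linked `reduce_conf.json` remains the reference for the actual layout.

```python
import json
import tempfile
from pathlib import Path

# Columns to read from the input CSV (names from the example above)
columns_from_csv = [
    "nmdp_id",
    "r_a_typ1", "r_a_typ2",
    "r_b_typ1", "r_b_typ2",
    "r_c_typ1", "r_c_typ2",
    "r_drb1_typ1", "r_drb1_typ2",
    "r_dpb1_typ1", "r_dpb1_typ2",
]

# Every column except the ID column carries typing data to reduce
columns_to_reduce = [c for c in columns_from_csv if c != "nmdp_id"]

config = {
    "in_csv_filename": "sample.csv",
    "out_csv_filename": "clean_sample.csv",
    "columns_from_csv": columns_from_csv,
    "columns_to_reduce_in_csv": columns_to_reduce,
    "redux_type": "lgx",           # one of G, lg, lgx
    "apply_compression": "gzip",   # "gzip", "zip", or None (JSON null)
    "verbose_log": True,
    "reduce_serology": False,
    "reduce_v2": True,
    "reduce_3field": True,
    "reduce_P": True,
    "reduce_XX": False,
    "reduce_MAC": True,
    "locus_in_allele_name": False,
    "output_file_format": "csv",   # "csv" or "xlsx"
    "new_column_for_redux": False,
    "map_drb345_to_drbx": False,   # arbitrary choice for this sketch
}

# Assumed file name; written to the temp directory to avoid clutter
config_path = Path(tempfile.gettempdir()) / "my_reduce_conf.json"
config_path.write_text(json.dumps(config, indent=2))
print(f"Wrote {config_path}")
```

The resulting file can then be passed to the command as `pyard-reduce-csv -c <config-file>`. Generating the file from Python keeps the two column lists in sync, since `columns_to_reduce_in_csv` is derived from `columns_from_csv`.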