Commit bc8d537

Merge pull request #14 from mmaiers-nmdp/directory
Directory
2 parents dc34bab + 8c7e3ea commit bc8d537

78 files changed, +393982 -595988 lines changed

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -116,6 +116,7 @@ venv.bak/
 .spyproject
 .idea
 .vscode
+*.swp

 # Rope project settings
 .ropeproject
@@ -137,3 +138,5 @@ allure_report/

 # cython temp files
 grim/**/*.c
+
+output/

Makefile

Lines changed: 1 addition & 0 deletions
@@ -90,6 +90,7 @@ docker: docker-build ## build a docker image and run the service

 install: clean ## install the package to the active Python's site-packages
 pip install --upgrade pip
+python3 setup.py build_ext --inplace
 python setup.py install
 pip install -r requirements.txt
 pip install -r requirements-tests.txt

grim/conf/README.md renamed to conf/README.md

Lines changed: 2 additions & 2 deletions
@@ -4,8 +4,8 @@
 | --- | --- |
 | populations | The population to consider them frequencies. |
 | priority | The coefficient values that define the priority matrix. |
-| loci_map| Loci full name Mapping for indexes. |
-| freq_trim_threshold | The numerator in the frequency threshold. |
+| loci_map| Loci full name Mapping for indexes. |
+| freq_trim_threshold | The numerator in the frequency threshold. |
 | factor_missing_data | factor to haplotype frequency in plan B in missing data case |
 | Plan_B_Matrix | matrix arranged by the most probable possibilities for recombination. The first element in the matrix should be the full haplotype. the indexes are corresponding to loci_map|
 | planb| True - use plan B anc C. False - use only Plan A. |

grim/conf/minimal-configuration.json renamed to conf/minimal-configuration.json

Lines changed: 6 additions & 4 deletions
@@ -1,7 +1,6 @@
 {
 "populations": [
-"FILII",
-"NAMER"
+"CAU"
 ],
 "freq_trim_threshold": 1e-5,
 "priority": {
@@ -37,12 +36,15 @@
 "number_of_pop_results": 100,
 "output_MUUG": true,
 "output_haplotypes": true,
-"graph_files_path": "output/csv" ,
+"freq_data_dir": "data/freqs" ,
+"pops_count_file": "graph_generation/output/pop_ratio.txt" ,
+"freq_file": "graph_generation/output/hpf.csv" ,
+"graph_files_path": "graph_generation/output/csv/" ,
 "node_csv_file": "nodes.csv",
 "edges_csv_file": "edges.csv",
 "info_node_csv_file": "info_node.csv",
 "top_links_csv_file": "top_links.csv",
-"imputation_in_file": "validation/simulation/data/simulated_donor.csv",
+"imputation_in_file": "data/subjects/donor.csv",
 "imputation_out_umug_freq_filename": "don.umug",
 "imputation_out_umug_pops_filename": "don.umug.pops",
 "imputation_out_hap_freq_filename": "don.pmug",

data/freqs/CAU.freqs.gz

17.2 KB
Binary file not shown.
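
The updated `conf/minimal-configuration.json` now points at several locations, including `data/freqs`, where `CAU.freqs.gz` is added. As a rough sketch that is not part of this commit, the configured paths could be checked before a run. The helper names `load_config` and `missing_paths` are hypothetical, and treating every listed key as a filesystem path is an assumption. Paths under `graph_generation/output/` look like generated outputs, so they may legitimately be absent until graph generation has run.

```python
import json
import os

# Keys copied from the configuration diff; treating each one as a
# filesystem path relative to the repository root is an assumption.
PATH_KEYS = [
    "freq_data_dir",
    "pops_count_file",
    "freq_file",
    "graph_files_path",
    "imputation_in_file",
]


def load_config(path):
    """Read a JSON configuration file into a dict."""
    with open(path) as handle:
        return json.load(handle)


def missing_paths(config, base_dir="."):
    """Return {key: configured path} for path keys that do not exist under base_dir."""
    missing = {}
    for key in PATH_KEYS:
        if key not in config:
            continue  # key not configured, nothing to check
        if not os.path.exists(os.path.join(base_dir, config[key])):
            missing[key] = config[key]
    return missing
```

From the repository root, `missing_paths(load_config("conf/minimal-configuration.json"))` would then list any configured locations that are not present yet.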

data/subjects/donor.csv

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+D1,A*01:02+A*02:01/A*03:01^B*15:01+B*15:01,CAU,CAU
File renamed without changes.
File renamed without changes.

grim/imputation/graph_generation/README.bug renamed to graph_generation/README.bug

Lines changed: 0 additions & 1 deletion
@@ -21,4 +21,3 @@ $ cut -f1,2 -d',' output/csv/nemo/edges.csv |sort |uniq -c |sort -rn |more
 539 117913,117365
 539 117884,117365
 515 117918,117370
-

grim/imputation/graph_generation/README.md renamed to graph_generation/README.md

Lines changed: 10 additions & 11 deletions
@@ -13,25 +13,25 @@
 ```

 - Python 3
-- On MacOS install with
+- On MacOS install with
 ```
 brew install python3
 ```

 - Install Neo4J
-- On MacOS install with
+- On MacOS install with
 ```
 brew install neo4j
 ```

 - Setup NEO4J_HOME
-
+
 Point NEO4J_HOME to the root of the NEO4J directory.
 ```
 export NEO4J_HOME=/usr/local/Cellar/neo4j/3.2.2/libexec
 ```

-### Linux
+### Linux
 - JDK 8
 - Install JDK 1.8 from Oracle
 - add JAVA_HOME to ~/.bash_profile
@@ -51,11 +51,11 @@
 ```

 - Point NEO4J_HOME to the root of the uncompressed NEO4J directory and add the following line to ~/.bash_profile
-
+
 ```
 export NEO4J_HOME=path/to/neo4j-community-3.5.7
 ```
-
+


 # Using Makefile
@@ -99,12 +99,12 @@ make nemo

 To use a different set of frequencies use the following procedure:

-- Starting in the graph generator directory, convert the data from frequency format to hpf (haplotype, population, frequency).
-```
-python nemo_to_hpf_csv.py
+- Starting in the graph generator directory, convert the data from frequency format to hpf (haplotype, population, frequency).
+```
+python nemo_to_hpf_csv.py
 ```

-- This program looks for a data/NEMO2011 directory and reads the individual frequency files and generates this csv:
+- This program looks for a data/NEMO2011 directory and reads the individual frequency files and generates this csv:
 ```
 output/hpf.csv
 ```
@@ -120,4 +120,3 @@ To use a different set of frequencies use the following procedure:
 └── top_links.csv
 ```
 Note: there is an option to trim the frequency set below a frequency threshold. If the trimming threshold is 1e-6 it will take 9m35s to generate the graph csv files on a mid-2015 MacBook Pro (2.5 GHz Intel Core i7) and will result in 1,088,817 nodes (159MB), 14,868,976 edges (2.0GB)and 5,947,591 top links (108MB).
-
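
The README note above mentions trimming the frequency set below a threshold. A minimal sketch of that idea follows, assuming a headerless `haplotype,population,frequency` CSV layout for `hpf.csv` and that rows at or above the threshold are kept. Both points are assumptions, the names `read_hpf` and `trim_hpf` are hypothetical, and the sample haplotypes and frequencies are made up for illustration; the repository's actual trimming logic may differ.

```python
import csv
import io


def read_hpf(text):
    """Parse headerless 'haplotype,population,frequency' CSV text (layout assumed)."""
    return [(hap, pop, float(freq)) for hap, pop, freq in csv.reader(io.StringIO(text))]


def trim_hpf(rows, threshold):
    """Keep rows whose frequency is at or above the threshold."""
    return [(hap, pop, freq) for hap, pop, freq in rows if freq >= threshold]


# Illustrative rows only; these values are not taken from the repository data.
sample = "A*01:01~B*08:01,CAU,0.05\nA*02:01~B*15:01,CAU,0.0000004\n"
kept = trim_hpf(read_hpf(sample), threshold=1e-6)
```

With a threshold of 1e-6 only the first sample row survives. A lower threshold keeps more haplotypes, which is consistent with the large node and edge counts quoted in the note.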
