Commit 0b7cea9

🎨 README updated
1 parent 5f33ddb commit 0b7cea9

1 file changed: +61 -54 lines changed

README.md

Lines changed: 61 additions & 54 deletions
@@ -61,24 +61,32 @@ This pipeline enables robust reconstruction of critical protein regions, advanci
 
 | Folder / File | Description |
 |----------------|-------------|
-| `environment.linux.yml` | Conda environment for Linux systems |
-| `environment.osx-arm64.yaml` | Conda environment for macOS (Apple Silicon) |
-| `src/instanexus/` | Core InstaNexus package (modules + CLI) |
-| `src/instanexus/__main__.py` | Entry point for CLI (`instanexus` command) |
-| `src/instanexus/script_dbg.py` | De Bruijn Graph-based assembly |
-| `src/instanexus/script_greedy.py` | Greedy-based peptide assembly |
-| `src/opt/` | Grid search and optimization workflows |
+| `docs/` | Sphinx documentation, tutorials, and images |
 | `fasta/` | FASTA reference and contaminant sequences |
 | `inputs/` | Example input CSV files |
 | `json/` | Metadata and parameter configuration files |
-| `notebooks/` | Jupyter notebooks for analysis and visualization |
-| `images/` | Logos and workflow figures |
 | `outputs/` | Generated results (created during execution) |
+| `src/instanexus/` | Core InstaNexus package |
+| `src/instanexus/main.py` | **Master orchestrator** (runs the full pipeline) |
+| `src/instanexus/preprocessing.py` | Module for data cleaning |
+| `src/instanexus/assembly.py` | Module for sequence assembly |
+| `src/instanexus/clustering.py` | Module for clustering (mmseqs2) |
+| `src/instanexus/alignment.py` | Module for alignment (clustalo) |
+| `src/instanexus/consensus.py` | Module for consensus generation |
+| `src/instanexus/opt/` | Grid search and optimization workflows |
+| `tests/` | Pytest unit and integration tests |
+| `environment.linux.yml` | Conda environment for Linux |
+| `environment.osx-arm64.yaml` | Conda environment for macOS |
+| `pyproject.toml` | Package metadata, dependencies, and entry point |
 
 ---
 
 ## Installation
 
+InstaNexus requires Python 3.11+, Conda, **MMseqs2**, and **Clustal Omega**.
+
+We strongly recommend installing these dependencies in a dedicated conda environment.
+
 - [Conda](https://docs.conda.io/en/latest/)
 - [MMseqs2](https://github.com/soedinglab/MMseqs2)
 - [Clustal Omega](https://www.ebi.ac.uk/Tools/msa/clustalo/)
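Before following either installation option, it can help to confirm the prerequisites listed above. The script below is a minimal sketch, not part of the InstaNexus repository: the helper names `env_file_for`, `has_tool`, `python_ok`, and `check_requirements` are hypothetical, and it assumes MMseqs2 and Clustal Omega are exposed on `PATH` as `mmseqs` and `clustalo`. It maps the current platform to one of the two Conda environment files from the table and reports any missing tool.

```bash
#!/usr/bin/env bash
# Hypothetical prerequisite check for InstaNexus (not shipped with the project).
# Assumption: MMseqs2 and Clustal Omega are installed as `mmseqs` and `clustalo`.

# Map an OS/architecture pair (as reported by `uname -s` / `uname -m`)
# to the Conda environment file listed in the README table.
env_file_for() {
  local os="$1" arch="$2"
  case "$os/$arch" in
    Linux/*)      echo "environment.linux.yml" ;;
    Darwin/arm64) echo "environment.osx-arm64.yaml" ;;
    *)            return 1 ;;  # no environment file is listed for this platform
  esac
}

# Succeed if a command is available on PATH.
has_tool() {
  command -v "$1" >/dev/null 2>&1
}

# Succeed if the given interpreter reports Python 3.11 or newer.
python_ok() {
  "$1" -c 'import sys; sys.exit(0 if sys.version_info >= (3, 11) else 1)' >/dev/null 2>&1
}

# Print the matching `conda env create` command and report missing prerequisites.
# Returns non-zero if anything is missing.
check_requirements() {
  local missing=0 env_file tool
  if env_file="$(env_file_for "$(uname -s)" "$(uname -m)")"; then
    echo "conda env create -f $env_file"
  else
    echo "no environment file listed for $(uname -s)/$(uname -m)" >&2
    missing=1
  fi
  for tool in conda mmseqs clustalo; do
    if ! has_tool "$tool"; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  if ! python_ok python3; then
    echo "python3 is missing or older than 3.11" >&2
    missing=1
  fi
  return "$missing"
}
```

Source the file and call `check_requirements`; it prints the matching `conda env create` command and returns non-zero if anything is missing. Platforms without a listed environment file (for example, Intel macOS) are reported rather than guessed.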
@@ -93,91 +101,90 @@ This pipeline enables robust reconstruction of critical protein regions, advanci
 
 Follow these steps to clone the repository and set up the environment using Conda:
 
-### 1. Clone the repository
+### Option 1: Install from PyPI
 
-To clone and set up the environment:
+1. Create and activate your conda environment.
+2. Install the package directly from PyPI:
 
 ```bash
-git clone git@github.com:Multiomics-Analytics-Group/InstaNexus.git
-cd instanexus
+pip install instanexus
 ```
 
-### 2. Create and activate the Conda environment
+### Option 2: Install from Source (for Developers)
+If you want to modify or contribute to the code, you can install it from the source repository:
 
-Create instanexus conda environment for linux.
+#### Clone the repository:
 
 ```bash
-conda env create -f environment.linux.yml
+git clone git@github.com:Multiomics-Analytics-Group/InstaNexus.git
+cd InstaNexus
 ```
 
-Create instanexus conda environment for OS.
-
+#### Create and activate the Conda environment:
 ```bash
+# For Linux
+conda env create -f environment.linux.yml
+# For macOS (Apple Silicon)
 conda env create -f environment.osx-arm64.yaml
-```
-
-Activate:
 
-```bash
 conda activate instanexus
 ```
 
----
-
-### 3. Install InstaNexus as a local package
-
-```
+#### Install the package in editable mode:
+```bash
 pip install -e .
 ```
 
-Then verify the CLI installation:
-
-```
-instanexus --version
+#### Verify the installation
+```bash
+instanexus --help
 ```
 
 ---
 
 ## Command-line usage
 
-After activating the environment, you can run InstaNexus directly from the terminal:
-```bash
-instanexus --help
-```
-
-### Run De Bruijn graph assembly
-
-```
-instanexus dbg --input_csv inputs/sample.csv --chain light --folder_outputs outputs --reference
-```
+After installation, the `instanexus` command (registered through the `[project.scripts]` entry point in `pyproject.toml`) runs the entire InstaNexus pipeline.
 
-### Run greedy assembly
+All parameters for preprocessing, assembly, clustering, and consensus are provided in a single call. The pipeline will automatically create a unique, timestamped output folder for that specific combination of parameters.
 
-```
-instanexus greedy --input_csv inputs/sample.csv --folder_outputs outputs
+```bash
+instanexus --help
 ```
 
+### Example: Run the full pipeline
+This command runs the complete workflow:
 
+1. Preprocesses the input CSV.
 
+2. Assembles using `dbg` (De Bruijn graph).
 
----
+3. Clusters the resulting scaffolds.
 
-## Hyperparameter Optimization
+4. Aligns the clusters.
 
-To launch the hyperparameter grid search, run the following command from the project root (the folder containing ```src/``` and ```json/```):
+5. Generates consensus sequences.
 
 ```bash
-python -m src.opt.gridsearch
+instanexus \
+  --input-csv inputs/bsa.csv \
+  --folder-outputs outputs \
+  --metadata-json-path json/sample_metadata.json \
+  --contaminants-fasta-path fasta/contaminants.fasta \
+  --assembly-mode dbg \
+  --conf 0.9 \
+  --kmer-size 7 \
+  --size-threshold 12 \
+  --min-overlap 3 \
+  --min-seq-id 0.85 \
+  --coverage 0.8
 ```
-**Adjusting Parameters**
 
-Grid search parameters for both the De Bruijn graph (dbg) and Greedy (greedy) assembly methods are defined in:
+The results for this specific run will be saved in a unique directory, such as `outputs/bsa/dbg_c0.9_ks7_mo3_ts12/`.
 
-```bash
-json/gridsearch_params.json
-```
 
-To test more (or fewer) combinations, edit the arrays for each parameter in this file.
+
+---
 
 ## License
 
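The documented output folder `outputs/bsa/dbg_c0.9_ks7_mo3_ts12/` suggests that the folder name combines the input file stem with the assembly mode and parameters. Reading that example, `c` appears to stand for `--conf`, `ks` for `--kmer-size`, `mo` for `--min-overlap`, and `ts` for `--size-threshold`; this mapping is inferred from the single example, not from the InstaNexus source. The hypothetical `run_dir` helper below reproduces the pattern so you can predict where a run's results will land. It does not model any timestamp, `--min-seq-id`, or `--coverage`, since none of them appear in the example path.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of InstaNexus): predict a run's output folder,
# following the single documented example outputs/bsa/dbg_c0.9_ks7_mo3_ts12/.
# Inferred abbreviations: c = --conf, ks = --kmer-size,
# mo = --min-overlap, ts = --size-threshold.
run_dir() {
  local folder_outputs="$1" input_csv="$2" mode="$3"
  local conf="$4" kmer_size="$5" min_overlap="$6" size_threshold="$7"
  local sample
  sample="$(basename "$input_csv" .csv)"  # e.g. inputs/bsa.csv -> bsa
  echo "${folder_outputs}/${sample}/${mode}_c${conf}_ks${kmer_size}_mo${min_overlap}_ts${size_threshold}"
}

# Parameters from the full-pipeline example command.
run_dir outputs inputs/bsa.csv dbg 0.9 7 3 12  # prints outputs/bsa/dbg_c0.9_ks7_mo3_ts12
```

Because the name encodes every assembly parameter that differs between runs, sweeping one value (say `--kmer-size`) should yield one sibling folder per value rather than overwriting earlier results, assuming the inferred naming holds.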