@@ -61,24 +61,32 @@ This pipeline enables robust reconstruction of critical protein regions, advanci
6161
6262| Folder / File | Description |
6363| ----------------| -------------|
64- | ` environment.linux.yml ` | Conda environment for Linux systems |
65- | ` environment.osx-arm64.yaml ` | Conda environment for macOS (Apple Silicon) |
66- | ` src/instanexus/ ` | Core InstaNexus package (modules + CLI) |
67- | ` src/instanexus/__main__.py ` | Entry point for CLI (` instanexus ` command) |
68- | ` src/instanexus/script_dbg.py ` | De Bruijn Graph-based assembly |
69- | ` src/instanexus/script_greedy.py ` | Greedy-based peptide assembly |
70- | ` src/opt/ ` | Grid search and optimization workflows |
64+ | ` docs/ ` | Sphinx documentation, tutorials, and images |
7165| ` fasta/ ` | FASTA reference and contaminant sequences |
7266| ` inputs/ ` | Example input CSV files |
7367| ` json/ ` | Metadata and parameter configuration files |
74- | ` notebooks/ ` | Jupyter notebooks for analysis and visualization |
75- | ` images/ ` | Logos and workflow figures |
7668| ` outputs/ ` | Generated results (created during execution) |
69+ | ` src/instanexus/ ` | Core InstaNexus package |
70+ | ` src/instanexus/main.py ` | **Master orchestrator** (runs the full pipeline) |
71+ | ` src/instanexus/preprocessing.py ` | Module for data cleaning |
72+ | ` src/instanexus/assembly.py ` | Module for sequence assembly |
73+ | ` src/instanexus/clustering.py ` | Module for clustering (mmseqs2) |
74+ | ` src/instanexus/alignment.py ` | Module for alignment (clustalo) |
75+ | ` src/instanexus/consensus.py ` | Module for consensus generation |
76+ | ` src/instanexus/opt/ ` | Grid search and optimization workflows |
77+ | ` tests/ ` | Pytest unit and integration tests |
78+ | ` environment.linux.yml ` | Conda environment for Linux |
79+ | ` environment.osx-arm64.yaml ` | Conda environment for macOS |
80+ | ` pyproject.toml ` | Package metadata, dependencies, and entry point |
7781
7882---
7983
8084## Installation
8185
86+ InstaNexus requires Python 3.11+, Conda, **MMseqs2**, and **Clustal Omega**.
87+
88+ We strongly recommend installing these dependencies in a dedicated conda environment.
89+
8290- [Conda](https://docs.conda.io/en/latest/)
8391- [MMseqs2](https://github.com/soedinglab/MMseqs2)
8492- [Clustal Omega](https://www.ebi.ac.uk/Tools/msa/clustalo/)
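
A quick way to confirm these prerequisites before installing is a small check script. The following is a minimal sketch, not part of InstaNexus: the function name `missing_requirements` is hypothetical, and the executable names `mmseqs` and `clustalo` are assumptions based on how MMseqs2 and Clustal Omega are commonly installed.

```python
import shutil
import sys

# Assumed executable names: MMseqs2 is commonly installed as `mmseqs`
# and Clustal Omega as `clustalo`. Adjust if your installation differs.
REQUIRED_TOOLS = ("conda", "mmseqs", "clustalo")
MIN_PYTHON = (3, 11)


def missing_requirements(tools=REQUIRED_TOOLS, min_python=MIN_PYTHON):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    # Compare only (major, minor) against the required minimum version.
    if sys.version_info[:2] < min_python:
        required = ".".join(str(part) for part in min_python)
        problems.append(f"Python {required}+ required")
    # shutil.which returns None when the executable is not on PATH.
    for tool in tools:
        if shutil.which(tool) is None:
            problems.append(f"'{tool}' not found on PATH")
    return problems


for problem in missing_requirements():
    print(problem)
```

Running the script inside the activated conda environment prints one line per missing prerequisite and nothing when everything is available.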
@@ -93,91 +101,90 @@ This pipeline enables robust reconstruction of critical protein regions, advanci
93101
94102Follow these steps to clone the repository and set up the environment using Conda:
95103
96- ### 1. Clone the repository
104+ ### Option 1: Install from PyPI
97105
98- To clone and set up the environment:
106+ 1. Create and activate your conda environment.
107+ 2. Install the package directly from PyPI:
99108
100109``` bash
101- git clone git@github.com:Multiomics-Analytics-Group/InstaNexus.git
102- cd instanexus
110+ pip install instanexus
103111```
104112
105- ### 2. Create and activate the Conda environment
113+ ### Option 2: Install from Source (for Developers)
114+ If you want to modify or contribute to the code, you can install it from the source repository:
106115
107- Create instanexus conda environment for linux.
116+ #### Clone the repository:
108117
109118``` bash
110- conda env create -f environment.linux.yml
119+ git clone git@github.com:Multiomics-Analytics-Group/InstaNexus.git
120+ cd InstaNexus
111121```
112122
113- Create instanexus conda environment for OS.
114-
123+ #### Create and activate the Conda environment:
115124``` bash
125+ # For Linux
126+ conda env create -f environment.linux.yml
127+ # For macOS (Apple Silicon)
116128conda env create -f environment.osx-arm64.yaml
117- ```
118-
119- Activate:
120129
121- ``` bash
122130conda activate instanexus
123131```
124132
125- ---
126-
127- ### 3. Install InstaNexus as a local package
128-
129- ```
133+ #### Install the package in editable mode:
134+ ``` bash
130135pip install -e .
131136```
132137
133- Then verify the CLI installation:
134-
135- ```
136- instanexus --version
138+ #### Verify the installation
139+ ``` bash
140+ instanexus --help
137141```
138142
139143---
140144
141145## Command-line usage
142146
143- After activating the environment, you can run InstaNexus directly from the terminal:
144- ``` bash
145- instanexus --help
146- ```
147-
148- ### Run De Bruijn graph assembly
149-
150- ```
151- instanexus dbg --input_csv inputs/sample.csv --chain light --folder_outputs outputs --reference
152- ```
147+ After installation, the ` instanexus ` command (registered through the ` [project.scripts] ` entry point in ` pyproject.toml `) runs the entire InstaNexus pipeline.
153148
154- ### Run greedy assembly
149+ All parameters for preprocessing, assembly, clustering, and consensus are provided in a single call. The pipeline will automatically create a unique, timestamped output folder for that specific combination of parameters.
155150
156- ```
157- instanexus greedy --input_csv inputs/sample.csv --folder_outputs outputs
151+ ``` bash
152+ instanexus --help
158153```
159154
155+ Example: Run the full pipeline
156+ This command runs the complete workflow:
160157
158+ Preprocesses the input CSV.
161159
160+ Assembles using dbg (De Bruijn graph).
162161
163- ---
162+ Clusters the resulting scaffolds.
164163
165- ## Hyperparameter Optimization
164+ Aligns the clusters.
166165
167- To launch the hyperparameter grid search, run the following command from the project root (the folder containing ``` src/ ``` and ``` json/ ``` ):
166+ Generates consensus sequences.
168167
169168``` bash
170- python -m src.opt.gridsearch
169+ instanexus \
170+ --input-csv inputs/bsa.csv \
171+ --folder-outputs outputs \
172+ --metadata-json-path json/sample_metadata.json \
173+ --contaminants-fasta-path fasta/contaminants.fasta \
174+ --assembly-mode dbg \
175+ --conf 0.9 \
176+ --kmer-size 7 \
177+ --size-threshold 12 \
178+ --min-overlap 3 \
179+ --min-seq-id 0.85 \
180+ --coverage 0.8
171181```
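
Because every parameter is passed in a single call, repeating the run with different settings amounts to rebuilding the same argument list. The sketch below assembles the command shown above for several k-mer sizes, using only the flags from that example. The helper names `build_command` and `sweep_kmer_sizes` are hypothetical, and the k-mer sizes 6 and 8 are arbitrary illustration values, not recommended settings.

```python
import shlex

# Flag names and values copied from the example command in this README.
BASE_PARAMS = {
    "input-csv": "inputs/bsa.csv",
    "folder-outputs": "outputs",
    "metadata-json-path": "json/sample_metadata.json",
    "contaminants-fasta-path": "fasta/contaminants.fasta",
    "assembly-mode": "dbg",
    "conf": 0.9,
    "kmer-size": 7,
    "size-threshold": 12,
    "min-overlap": 3,
    "min-seq-id": 0.85,
    "coverage": 0.8,
}


def build_command(params, executable="instanexus"):
    """Turn a {flag: value} mapping into an argv list such as ['instanexus', '--conf', '0.9']."""
    argv = [executable]
    for flag, value in params.items():
        argv += [f"--{flag}", str(value)]
    return argv


def sweep_kmer_sizes(kmer_sizes, base=BASE_PARAMS):
    """Build one command per k-mer size, leaving all other parameters unchanged."""
    return [build_command({**base, "kmer-size": k}) for k in kmer_sizes]


# Print the commands instead of running them, so the sketch has no side effects.
for argv in sweep_kmer_sizes([6, 7, 8]):
    print(shlex.join(argv))
```

To execute the commands rather than print them, each list can be passed to `subprocess.run(argv, check=True)` from the project root.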
172- ** Adjusting Parameters**
173182
174- Grid search parameters for both the De Bruijn graph (dbg) and Greedy (greedy) assembly methods are defined in:
183+ The results for this specific run will be saved in a unique directory, such as: ``` outputs/bsa/dbg_c0.9_ks7_mo3_ts12/ ```
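
The example path suggests a naming scheme built from the input file name and the run parameters. The sketch below reproduces that one example only. The function `run_directory` is hypothetical, the mapping of `c`, `ks`, `mo`, and `ts` to `--conf`, `--kmer-size`, `--min-overlap`, and `--size-threshold` is inferred from the example values, and any timestamp component is not modeled. The actual InstaNexus logic may differ.

```python
from pathlib import Path


def run_directory(folder_outputs, input_csv, assembly_mode,
                  conf, kmer_size, min_overlap, size_threshold):
    """Guess the run directory from the parameters (inferred from one example, not from the source code)."""
    # Assumed abbreviations: c = conf, ks = kmer-size, mo = min-overlap, ts = size-threshold.
    run_name = (
        f"{assembly_mode}_c{conf}_ks{kmer_size}"
        f"_mo{min_overlap}_ts{size_threshold}"
    )
    # The sample subfolder is assumed to be the input CSV name without its extension.
    return Path(folder_outputs) / Path(input_csv).stem / run_name


print(run_directory("outputs", "inputs/bsa.csv", "dbg", 0.9, 7, 3, 12).as_posix())
```

A helper like this can be handy for locating results after several runs, provided the scheme is first checked against the folders the pipeline actually creates.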
175184
176- ``` bash
177- json/gridsearch_params.json
178- ```
179185
180- To test more (or fewer) combinations, edit the arrays for each parameter in this file.
186+
187+ ---
181188
182189## License
183190