Commit c0d26f5

Merge pull request #320 from bbglab/s3-samtools-access
docs: add Samtools section with setup instructions for S3 access
2 parents 9152fa7 + cd6a94d commit c0d26f5

2 files changed

+63
-8
lines changed

docs/BBGProtocols/BBglab_data_organization.md

Lines changed: 12 additions & 8 deletions
@@ -9,12 +9,16 @@ the **[BBGlab Exit protocol](https://drive.google.com/file/d/1BnhLZCygroJ-dfamuZ
99

1010
You MUST add all the relevant information about your finished or ongoing project in:
1111

12-
1. **Project Compilation file**: this file indicates all the paths in the cluster that you have been working in across all your projects. For that:
13-
1. Copy the **[Project Compilation template](https://docs.google.com/spreadsheets/d/1jJleTek9eP4S6CCe5fO8_M4-vLuKhumgKjlQ58jP_rc/edit?gid=0#gid=0)** and store it in the [Projects Personal Spreadsheet folder](https://drive.google.com/drive/folders/14SS8kvBcCrPsdwg3ETbTn-c9Erwu702a?usp=drive_link). It should be one file per BBGlab member stored.
14-
2. Change the name of the document with your information as follow: `ProjectCompilation-202X-NameSurname`
15-
3. Modify the document by removing the rows showing the examples and add your own entries.
12+
1. **Project Compilation file**: this file indicates all the paths in the cluster that you have been
13+
working in across all your projects. For that:
14+
1. Copy the **[Project Compilation template](https://docs.google.com/spreadsheets/d/1jJleTek9eP4S6CCe5fO8_M4-vLuKhumgKjlQ58jP_rc/edit?gid=0#gid=0)**
15+
and store it in the [Projects Personal Spreadsheet folder](https://drive.google.com/drive/folders/14SS8kvBcCrPsdwg3ETbTn-c9Erwu702a?usp=drive_link).
16+
There should be one file stored per BBGlab member.
17+
2. Rename the document with your information as follows: `ProjectCompilation-202X-NameSurname`
18+
3. Modify the document by removing the example rows and adding your own entries.
1619

17-
2. **[BBGlab datasets file](https://bbglab.github.io/bbgwiki/Datasets/Datasets_BBGLAB/)**: includes all the information about the datasets we use (both internal or external).
20+
2. **[BBGlab datasets file](https://bbglab.github.io/bbgwiki/Datasets/Datasets_BBGLAB/)**: includes all the
21+
information about the datasets we use (both internal and external).
1822

1923
> **It is essential to fill all these files so that all your project data is updated and stored. It is the
2024
> responsibility of ALL the users involved in the project to keep it updated!**
@@ -90,13 +94,13 @@ Everywhere where you store files (Cluster, Drive, Cloud, Computer)
9094

9195
**Store** only essential files in the cluster by ensuring you **erase** **intermediate or temporary** files
9296
that are no longer needed. **Archive** the essential files from completed projects to keep the cluster clean and
93-
manageable, find how to archive files [here](https://bbglab.github.io/bbgwiki/Datasets/Archive_data/).
97+
manageable. Learn how to archive files [in our archiving guide](https://bbglab.github.io/bbgwiki/Datasets/Archive_data/).
9498

9599
## Track and Manage Your Code with GitHub
96100

97101
It is highly recommended to **create a GitHub repository** for your project code and regularly update it to track
98102
changes, share it with others, review it... Find all documentation on how to work with GitHub repositories
99-
[here](https://docs.github.com/en/repositories/creating-and-managing-repositories/about-repositories).
103+
[in the GitHub documentation](https://docs.github.com/en/repositories/creating-and-managing-repositories/about-repositories).
100104

101105
## Be environmentally friendly
102106

@@ -115,7 +119,7 @@ make it more efficiently with many tools like:
115119
- [nf-co2footprint](https://github.com/nextflow-io/nf-co2footprint)
116120
- [CodeCarbon](https://codecarbon.io/)
117121
- [carbontracker](https://github.com/lfwa/carbontracker)
118-
- Check out more [here](https://github.com/GreenAlgorithms/GreenAlgorithms4HPC)
122+
- Check out more [green algorithms resources](https://github.com/GreenAlgorithms/GreenAlgorithms4HPC)
119123

120124
Check Loïc Lannelongue's [talk](<https://summit.nextflow.io/2024/barcelona/agenda/10-30--towards-environmentally-sustainable-computational-science/>) <!-- markdownlint-disable MD013 -->
121125
at Nextflow Summit 2024 in Barcelona to learn more about this.

docs/Cluster_basics/s3.md

Lines changed: 51 additions & 0 deletions
@@ -15,6 +15,7 @@
1515
- [boto3](#boto3)
1616
- [s3fs](#s3fs)
1717
- [dask](#dask)
18+
- [Samtools](#samtools)
1819

1920
## What is a S3 Storage?
2021

@@ -153,6 +154,8 @@ download files, but you cannot modify existing objects or upload new ones from t
153154
and log in with your LDAP credentials.
154155

155156
2. Verify that your credentials are valid, or create new ones if they have expired.
157+
The credentials are stored in `~/.config/rclone/rclone.conf`
158+
(the same file used when working from [the terminal](#terminal-from-irb-cluster)).
156159
![s3 open onDemand credentials](../assets/images/s3-ooo-credentials.png)
157160

158161
3. Once the credentials are valid, click on the "Home Directory System" section.
@@ -303,3 +306,51 @@ df = dd.read_csv("s3://bbg/data/example/file.vcf",
303306
# Display the first few rows
304307
print(df.head())
305308
```
309+
310+
### Samtools
311+
312+
First, make sure your S3 credentials are ready, as explained in the [How to access](#how-to-access) section.
313+
314+
Create a conda environment:
315+
316+
```bash
317+
$ conda create -n samtools-s3 -c bioconda -c conda-forge samtools
318+
$ conda activate samtools-s3
319+
# Check that samtools was compiled with S3 support:
320+
$ samtools --version | grep -E "S3=yes|Amazon S3:"
321+
Features: build=configure libcurl=yes S3=yes GCS=yes libdeflate=yes lzma=yes bzip2=yes plugins=yes plugin-path=/home/bbg/mgrau/apps/miniconda3.7/envs/samtools-s3/libexec/htslib: htscodecs=1.6.6
322+
Amazon S3: s3+https, s3, s3+http
323+
```
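
The same check can be scripted. The following is a minimal sketch, not part of the committed docs: `has_s3_support` is a hypothetical helper that scans the text printed by `samtools --version` for a `Features:` line containing `S3=yes`, as in the sample output above.

```python
import re


def has_s3_support(version_output: str) -> bool:
    """Return True if a 'Features:' line in the given text reports S3=yes.

    `version_output` is assumed to be the text printed by `samtools --version`.
    """
    for line in version_output.splitlines():
        # The Features line may be indented, so strip before matching.
        if line.strip().startswith("Features:") and re.search(r"\bS3=yes\b", line):
            return True
    return False
```

One way to feed it real output is `subprocess.run(["samtools", "--version"], capture_output=True, text=True).stdout`, run inside the activated environment.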
324+
325+
Create a config file (replace access_key/secret_key/access_token with your `~/.config/rclone/rclone.conf` values):
326+
327+
```bash
328+
$ cat ~/.hts/s3cfg_minio
329+
[default]
330+
access_key = XXX
331+
secret_key = YYYY
332+
access_token = ZZZZ
333+
host_base = irbminio.irbbarcelona.pcb.ub.es:9000
334+
host_bucket = irbminio.irbbarcelona.pcb.ub.es:9000
335+
use_https = False
336+
signature_v2 = False
337+
```
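
Copying the three values by hand is error-prone, so the config file above could also be generated. This is a sketch under stated assumptions, not the method used in the commit: it assumes the rclone remote stores its credentials under the rclone S3 option names `access_key_id`, `secret_access_key`, and `session_token`, and the remote name passed in (for example `minio`) is hypothetical, so check the section names in your own `rclone.conf`.

```python
import configparser


def rclone_to_s3cfg(rclone_text: str, remote: str, host: str) -> str:
    """Build the text of an s3cfg-style file from the contents of rclone.conf.

    Assumptions (not confirmed by the docs): the remote section uses the
    rclone S3 keys access_key_id, secret_access_key and session_token.
    """
    # Disable interpolation so '%' characters in secrets are kept verbatim.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(rclone_text)
    section = parser[remote]  # raises KeyError if the remote does not exist
    lines = [
        "[default]",
        f"access_key = {section['access_key_id']}",
        f"secret_key = {section['secret_access_key']}",
        f"access_token = {section['session_token']}",
        f"host_base = {host}",
        f"host_bucket = {host}",
        "use_https = False",
        "signature_v2 = False",
    ]
    return "\n".join(lines) + "\n"
```

The returned text can be written to `~/.hts/s3cfg_minio`. Because the credentials expire, the file has to be regenerated whenever new ones are created.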
338+
339+
Add two new entries to your `.bashrc`:
340+
341+
```bash
342+
# For samtools S3 access
343+
export HTS_S3_S3CFG="$HOME/.hts/s3cfg_minio"
344+
export HTS_S3_ADDRESS_STYLE="path"
345+
```
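
Before running samtools, it can help to confirm that both variables are visible in the current shell session. The sketch below is illustrative only: `missing_hts_settings` is a hypothetical helper that reports which of the two variables from the snippet above are unset or empty.

```python
from typing import List, Mapping

# The two variables exported in the .bashrc snippet above.
REQUIRED_VARS = ("HTS_S3_S3CFG", "HTS_S3_ADDRESS_STYLE")


def missing_hts_settings(env: Mapping[str, str]) -> List[str]:
    """Return the names of required variables that are unset or empty in `env`."""
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_hts_settings(os.environ)` after `import os` returns an empty list when both entries are set. A non-empty list usually means `.bashrc` has not been re-sourced yet.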
346+
347+
Then test the setup:
348+
349+
```bash
350+
samtools view -H s3+http://bbg-scratch/test/test.bam
351+
```
352+
353+
## Reference
354+
355+
- Miguel Grau
356+
- Federica Brando
