380 changes: 193 additions & 187 deletions Cargo.lock

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions data/README.md
@@ -15,10 +15,10 @@ crypt4gh keygen --sk c4gh/keys/bob.sec --pk c4gh/keys/bob.pub
Files were encrypted by running:

```sh
crypt4gh encrypt --sk c4gh/keys/alice.sec --recipient_pk c4gh/keys/bob.pub < bam/htsnexus_test_NA12878.bam > c4gh/htsnexus_test_NA12878.bam.c4gh
crypt4gh encrypt --sk c4gh/keys/alice.sec --recipient_pk c4gh/keys/bob.pub < bam/seraseq_cebpa_larger.bam > c4gh/seraseq_cebpa_larger.bam.c4gh
crypt4gh encrypt --sk c4gh/keys/alice.sec --recipient_pk c4gh/keys/bob.pub < bcf/sample1-bcbio-cancer.bcf > c4gh/sample1-bcbio-cancer.bcf.c4gh
crypt4gh encrypt --sk c4gh/keys/alice.sec --recipient_pk c4gh/keys/bob.pub < bcf/spec-v4.3.bcf > c4gh/spec-v4.3.bcf.c4gh
crypt4gh encrypt --sk c4gh/keys/alice.sec --recipient_pk c4gh/keys/bob.pub < cram/htsnexus_test_NA12878.cram > c4gh/htsnexus_test_NA12878.cram.c4gh
crypt4gh encrypt --sk c4gh/keys/alice.sec --recipient_pk c4gh/keys/bob.pub < cram/seraseq_cebpa_larger.cram > c4gh/seraseq_cebpa_larger.cram.c4gh
crypt4gh encrypt --sk c4gh/keys/alice.sec --recipient_pk c4gh/keys/bob.pub < vcf/sample1-bcbio-cancer.vcf.gz > c4gh/sample1-bcbio-cancer.vcf.gz.c4gh
crypt4gh encrypt --sk c4gh/keys/alice.sec --recipient_pk c4gh/keys/bob.pub < vcf/spec-v4.3.vcf.gz > c4gh/spec-v4.3.vcf.gz.c4gh
```
Binary file removed data/bam/htsnexus_test_NA12878.bam
Binary file not shown.
Binary file removed data/bam/htsnexus_test_NA12878.bam.bai
Binary file not shown.
Binary file removed data/bam/htsnexus_test_NA12878.bam.gzi
Binary file not shown.
Binary file added data/bam/seraseq_cebpa_larger.bam
Binary file not shown.
Binary file added data/bam/seraseq_cebpa_larger.bam.bai
Binary file not shown.
Binary file removed data/cram/htsnexus_test_NA12878.cram
Binary file not shown.
Binary file removed data/cram/htsnexus_test_NA12878.cram.crai
Binary file not shown.
Binary file added data/cram/seraseq_cebpa_larger.cram
Binary file not shown.
Binary file added data/cram/seraseq_cebpa_larger.cram.crai
Binary file not shown.
4 changes: 2 additions & 2 deletions docker/examples/file/README.md
@@ -13,7 +13,7 @@ This launches a `File` htsget-actix server serving data from the [`data`][data]
The htsget-rs server can then be queried:

```sh
curl http://127.0.0.1:8080/reads/data/bam/htsnexus_test_NA12878
curl http://127.0.0.1:8080/reads/data/bam/seraseq_cebpa_larger
```

Which outputs:
@@ -23,7 +23,7 @@ Which outputs:
"format": "BAM",
"urls": [
{
"url": "http://0.0.0.0:8081/data/bam/htsnexus_test_NA12878.bam",
"url": "http://0.0.0.0:8081/data/bam/seraseq_cebpa_larger.bam",
"headers": {
"Range": "bytes=0-2596770"
}
6 changes: 3 additions & 3 deletions docker/examples/minio/README.md
@@ -32,7 +32,7 @@ docker compose up
Then:

```sh
curl http://127.0.0.1:8080/reads/bam/htsnexus_test_NA12878
curl http://127.0.0.1:8080/reads/bam/seraseq_cebpa_larger
```

Outputs:
@@ -43,7 +43,7 @@ Outputs:
"format": "BAM",
"urls": [
{
"url": "http://data.minio:9000/bam/htsnexus_test_NA12878.bam?x-id=GetObject&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=user%2F20240320%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20240320T014007Z&X-Amz-Expires=1000&X-Amz-SignedHeaders=host%3Brange&X-Amz-Signature=33a75bd6363ccbfd5ce8edf7e102a5edff8ca7cee17e3c654db01a880e98072d",
"url": "http://data.minio:9000/bam/seraseq_cebpa_larger.bam?x-id=GetObject&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=user%2F20240320%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20240320T014007Z&X-Amz-Expires=1000&X-Amz-SignedHeaders=host%3Brange&X-Amz-Signature=33a75bd6363ccbfd5ce8edf7e102a5edff8ca7cee17e3c654db01a880e98072d",
"headers": {
"Range": "bytes=0-2596770"
}
@@ -59,7 +59,7 @@ Outputs:
The URL tickets can then be fetched within the compose network context:

```sh
docker exec -it minio curl -H "Range: bytes=0-2596770" "http://data.minio:9000/bam/htsnexus_test_NA12878.bam?x-id=GetObject&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=user%2F20240320%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20240320T014007Z&X-Amz-Expires=1000&X-Amz-SignedHeaders=host%3Brange&X-Amz-Signature=33a75bd6363ccbfd5ce8edf7e102a5edff8ca7cee17e3c654db01a880e98072d"
docker exec -it minio curl -H "Range: bytes=0-2596770" "http://data.minio:9000/bam/seraseq_cebpa_larger.bam?x-id=GetObject&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=user%2F20240320%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20240320T014007Z&X-Amz-Expires=1000&X-Amz-SignedHeaders=host%3Brange&X-Amz-Signature=33a75bd6363ccbfd5ce8edf7e102a5edff8ca7cee17e3c654db01a880e98072d"
```

[path-style-deprecated]: https://aws.amazon.com/blogs/aws/amazon-s3-path-deprecation-plan-the-rest-of-the-story/
12 changes: 6 additions & 6 deletions htsget-actix/benches/request_benchmarks.rs
@@ -234,8 +234,8 @@ fn criterion_benchmark(c: &mut Criterion) {
bench_pair(
&mut group,
"[LIGHT] simple request",
format_url(&htsget_rs_url, "reads/data/bam/htsnexus_test_NA12878"),
format_url(&htsget_refserver_url, "reads/htsnexus_test_NA12878"),
format_url(&htsget_rs_url, "reads/data/bam/seraseq_cebpa_larger"),
format_url(&htsget_refserver_url, "reads/seraseq_cebpa_larger"),
&json_content,
);

@@ -254,8 +254,8 @@ fn criterion_benchmark(c: &mut Criterion) {
bench_pair(
&mut group,
"[LIGHT] with region",
format_url(&htsget_rs_url, "reads/data/bam/htsnexus_test_NA12878"),
format_url(&htsget_refserver_url, "reads/htsnexus_test_NA12878"),
format_url(&htsget_rs_url, "reads/data/bam/seraseq_cebpa_larger"),
format_url(&htsget_refserver_url, "reads/seraseq_cebpa_larger"),
&json_content,
);

@@ -281,8 +281,8 @@ fn criterion_benchmark(c: &mut Criterion) {
bench_pair(
&mut group,
"[LIGHT] with two regions",
format_url(&htsget_rs_url, "reads/data/bam/htsnexus_test_NA12878"),
format_url(&htsget_refserver_url, "reads/htsnexus_test_NA12878"),
format_url(&htsget_rs_url, "reads/data/bam/seraseq_cebpa_larger"),
format_url(&htsget_refserver_url, "reads/seraseq_cebpa_larger"),
&json_content,
);

10 changes: 5 additions & 5 deletions htsget-axum/README.md
@@ -122,7 +122,7 @@ cargo run -p htsget-axum --features experimental -- --config htsget-config/examp
Crypt4GH encrypted byte ranges can be queried:

```sh
curl 'http://localhost:8080/reads/data/c4gh/htsnexus_test_NA12878?referenceName=11&start=5000000&end=5050000'
curl 'http://localhost:8080/reads/data/c4gh/seraseq_cebpa_larger?referenceName=11&start=5000000&end=5050000'
```

The output consists of the Crypt4GH header, which includes the original header, the edit lists, and the re-encrypted header that
@@ -137,7 +137,7 @@ the recipient can use to decrypt bytes:
"url": "data:;base64,Y3J5cHQ0Z2gBAAAAAwAAAA=="
},
{
"url": "http://127.0.0.1:8081/data/c4gh/htsnexus_test_NA12878.bam.c4gh",
"url": "http://127.0.0.1:8081/data/c4gh/seraseq_cebpa_larger.bam.c4gh",
"headers": {
"Range": "bytes=16-123"
}
@@ -146,13 +146,13 @@ the recipient can use to decrypt bytes:
"url": "data:;base64,bAAAAAAAAABPIoRdk+d+ifp2PWRFeXoe6Z9kPOj+HrREhzxZ3QiDa2SYh+0Gy8aKpFic4MtTa+ywMpkHziJgojVbcmbvBAr3G7o01lDubsBW98aQ/U1AcalIUCp0fGNkrtdTBN4NaVNIdtQmbAAAAAAAAABPIoRdk+d+ifp2PWRFeXoe6Z9kPOj+HrREhzxZ3QiDa+xJ+yh+52zHvw8qQXMyCtqT6jTFvaYhRPw/6ZzvOdt98YPQgCcTIut58VeTGmR3ien0TdcQFxmfE10MH4qapF2blgjX"
},
{
"url": "http://127.0.0.1:8081/data/c4gh/htsnexus_test_NA12878.bam.c4gh",
"url": "http://127.0.0.1:8081/data/c4gh/seraseq_cebpa_larger.bam.c4gh",
"headers": {
"Range": "bytes=124-1114711"
}
},
{
"url": "http://127.0.0.1:8081/data/c4gh/htsnexus_test_NA12878.bam.c4gh",
"url": "http://127.0.0.1:8081/data/c4gh/seraseq_cebpa_larger.bam.c4gh",
"headers": {
"Range": "bytes=2557120-2598042"
}
@@ -165,7 +165,7 @@ the recipient can use to decrypt bytes:
For example, using a [htsget client][htsget-client], the data can be concatenated, and then decrypted using the [Crypt4GH CLI][crypt4gh-cli]:

```sh
htsget 'http://localhost:8080/reads/data/c4gh/htsnexus_test_NA12878?referenceName=11&start=5000000&end=5050000' > out.c4gh
htsget 'http://localhost:8080/reads/data/c4gh/seraseq_cebpa_larger?referenceName=11&start=5000000&end=5050000' > out.c4gh
crypt4gh decrypt --sk data/c4gh/keys/alice.sec < out.c4gh > out.bam
samtools view out.bam
```
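
The first `data:` URL in the response above appears to carry the fixed-size start of the Crypt4GH header. As a minimal sketch, assuming the layout from the Crypt4GH specification (an 8-byte `crypt4gh` magic string, a 4-byte little-endian version, and a 4-byte little-endian header packet count), it can be decoded using only the Rust standard library. The names `decode_base64`, `HeaderPrefix`, and `parse_header_prefix` are hypothetical and are not part of htsget-rs:

```rust
/// Decode standard base64 (with optional `=` padding).
/// A small illustrative decoder, not a validated implementation.
fn decode_base64(input: &str) -> Option<Vec<u8>> {
    const ALPHABET: &[u8] =
        b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    let mut out = Vec::new();
    let mut buffer: u32 = 0;
    let mut bits: u32 = 0;
    for byte in input.bytes().filter(|b| *b != b'=') {
        // Each base64 character contributes 6 bits.
        let value = ALPHABET.iter().position(|c| *c == byte)? as u32;
        buffer = (buffer << 6) | value;
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((buffer >> bits) as u8);
            // Keep only the bits that have not been emitted yet.
            buffer &= (1 << bits) - 1;
        }
    }
    Some(out)
}

/// The fixed 16-byte start of a Crypt4GH header (assumed layout, see above).
#[derive(Debug, PartialEq)]
struct HeaderPrefix {
    version: u32,
    packet_count: u32,
}

/// Parse the magic string, version, and header packet count.
fn parse_header_prefix(bytes: &[u8]) -> Option<HeaderPrefix> {
    if bytes.len() < 16 || !bytes.starts_with(b"crypt4gh") {
        return None;
    }
    let version = u32::from_le_bytes([bytes[8], bytes[9], bytes[10], bytes[11]]);
    let packet_count = u32::from_le_bytes([bytes[12], bytes[13], bytes[14], bytes[15]]);
    Some(HeaderPrefix { version, packet_count })
}

fn main() {
    // The first `url` value from the JSON response shown above.
    let url = "data:;base64,Y3J5cHQ0Z2gBAAAAAwAAAA==";
    let payload = url.strip_prefix("data:;base64,").expect("not a base64 data URL");
    let bytes = decode_base64(payload).expect("invalid base64");
    println!("{:?}", parse_header_prefix(&bytes));
}
```

A client would normally not need to do this, since concatenating the URL responses in order and passing the result to `crypt4gh decrypt` is enough. The sketch only illustrates why the response starts with an inline `data:` URL rather than a byte range of the original file.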
6 changes: 3 additions & 3 deletions htsget-config/README.md
@@ -30,7 +30,7 @@ cargo run --all-features -p htsget-axum -- --config <your_config_file.toml>
This will serve files under the [`data`][data] directory:

```sh
curl 'http://localhost:8080/reads/bam/htsnexus_test_NA12878'
curl 'http://localhost:8080/reads/bam/seraseq_cebpa_larger'
```

Locations allow htsget-rs access to bioinformatics files and indexes. Instead of local files, htsget-rs can access
@@ -55,8 +55,8 @@ locations = [ "file://data/bam", "file://data/cram" ]
This allows htsget-rs to serve data only when the request also contains the prefix:

```sh
curl 'http://localhost:8080/reads/bam/htsnexus_test_NA12878'
curl 'http://localhost:8080/reads/cram/htsnexus_test_NA12878?format=CRAM'
curl 'http://localhost:8080/reads/bam/seraseq_cebpa_larger'
curl 'http://localhost:8080/reads/cram/seraseq_cebpa_larger?format=CRAM'
```

Locations can be mixed, and don't all need to have the same directory or resource:
14 changes: 7 additions & 7 deletions htsget-http/src/lib.rs
@@ -134,10 +134,10 @@ mod tests {
let request = HashMap::new();

let mut expected_response_headers = Headers::default();
expected_response_headers.insert("Range".to_string(), "bytes=0-2596798".to_string());
expected_response_headers.insert("Range".to_string(), "bytes=0-986643".to_string());

let request = Request::new(
"bam/htsnexus_test_NA12878".to_string(),
"bam/seraseq_cebpa_larger".to_string(),
request,
Default::default(),
);
@@ -154,7 +154,7 @@ mod tests {
request.insert("format".to_string(), "VCF".to_string());

let request = Request::new(
"bam/htsnexus_test_NA12878".to_string(),
"bam/seraseq_cebpa_larger".to_string(),
request,
Default::default(),
);
@@ -189,7 +189,7 @@ mod tests {

#[tokio::test]
async fn post_request() {
let request = Request::new_with_id("bam/htsnexus_test_NA12878".to_string());
let request = Request::new_with_id("bam/seraseq_cebpa_larger".to_string());
let body = PostRequest {
format: None,
class: None,
@@ -200,7 +200,7 @@ mod tests {
};

let mut expected_response_headers = Headers::default();
expected_response_headers.insert("Range".to_string(), "bytes=0-2596798".to_string());
expected_response_headers.insert("Range".to_string(), "bytes=0-986643".to_string());

assert_eq!(
post(get_searcher(), body, request, Endpoint::Reads).await,
@@ -210,7 +210,7 @@ mod tests {

#[tokio::test]
async fn post_variants_request_with_reads_format() {
let request = Request::new_with_id("bam/htsnexus_test_NA12878".to_string());
let request = Request::new_with_id("bam/seraseq_cebpa_larger".to_string());
let body = PostRequest {
format: Some("BAM".to_string()),
class: None,
@@ -265,7 +265,7 @@ mod tests {
JsonResponse::from(Response::new(
Bam,
vec![
Url::new("http://127.0.0.1:8081/bam/htsnexus_test_NA12878.bam".to_string())
Url::new("http://127.0.0.1:8081/bam/seraseq_cebpa_larger.bam".to_string())
.with_headers(headers),
],
))
8 changes: 4 additions & 4 deletions htsget-search/README.md
@@ -78,19 +78,19 @@ produce minimal byte ranges. For example, consider this [file][example-file]:
* Using just this data, the following query with:
* `referenceName=11`, `start=5015000`, and `end=5050000`
* Would produce these byte ranges:
* `bytes=0-4667`
* `bytes=0-38969`
* `bytes=256721-1065951`
* However, an equally valid response, with smaller byte ranges is:
* `bytes=0-4667`
* `bytes=0-38969`
* `bytes=256721-647345`
* `bytes=824361-842100`
* `bytes=977196-996014`

To produce the smallest byte ranges, htsget-rs can search through GZI files and regular index files. It does not
read data from the underlying target file.
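
To make the difference between the two responses listed above concrete, the following sketch parses `Range` header values and sums the bytes a client would download for each response. It is illustrative only: `ByteRange`, `parse_range`, and `total_bytes` are hypothetical names, not htsget-rs APIs, and the range values are copied from the list above without checking them against the actual file:

```rust
/// An inclusive byte range, as used in an HTTP `Range: bytes=start-end` header.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ByteRange {
    start: u64,
    end: u64,
}

impl ByteRange {
    /// Number of bytes covered; both ends are inclusive.
    fn size(&self) -> u64 {
        self.end - self.start + 1
    }
}

/// Parse a value such as `bytes=256721-647345`.
/// Returns `None` for malformed input or when start > end.
fn parse_range(header: &str) -> Option<ByteRange> {
    let (start, end) = header.strip_prefix("bytes=")?.split_once('-')?;
    let range = ByteRange {
        start: start.parse().ok()?,
        end: end.parse().ok()?,
    };
    (range.start <= range.end).then_some(range)
}

/// Total bytes downloaded for a set of ranges, or `None` if any is malformed.
fn total_bytes(headers: &[&str]) -> Option<u64> {
    headers
        .iter()
        .map(|header| parse_range(header).map(|range| range.size()))
        .sum()
}

fn main() {
    // Ranges copied from the two example responses in the list above.
    let coarse = ["bytes=0-38969", "bytes=256721-1065951"];
    let minimal = [
        "bytes=0-38969",
        "bytes=256721-647345",
        "bytes=824361-842100",
        "bytes=977196-996014",
    ];

    println!("coarse response:  {:?} bytes", total_bytes(&coarse));
    println!("minimal response: {:?} bytes", total_bytes(&minimal));
}
```

The second response uses more URLs but skips the gaps between blocks that contain no matching records, which is the trade-off the GZI search is meant to exploit.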

[example-file]: ../data/bam/htsnexus_test_NA12878.bam
[example-index]: ../data/bam/htsnexus_test_NA12878.bam.bai
[example-file]: ../data/bam/seraseq_cebpa_larger.bam
[example-index]: ../data/bam/seraseq_cebpa_larger.bam.bai

## Benchmarks

16 changes: 8 additions & 8 deletions htsget-search/benches/search_benchmarks.rs
@@ -45,39 +45,39 @@ fn criterion_benchmark(c: &mut Criterion) {
bench_query(
&mut group,
"[LIGHT] Bam query all",
Query::new_with_default_request("bam/htsnexus_test_NA12878", Bam),
Query::new_with_default_request("bam/seraseq_cebpa_larger", Bam),
);
bench_query(
&mut group,
"[LIGHT] Bam query specific",
Query::new_with_default_request("bam/htsnexus_test_NA12878", Bam)
.with_reference_name("11")
Query::new_with_default_request("bam/seraseq_cebpa_larger", Bam)
.with_reference_name("chr19")
.with_start(4999977)
.with_end(5008321),
);
bench_query(
&mut group,
"[LIGHT] Bam query header",
Query::new_with_default_request("bam/htsnexus_test_NA12878", Bam).with_class(Header),
Query::new_with_default_request("bam/seraseq_cebpa_larger", Bam).with_class(Header),
);

bench_query(
&mut group,
"[LIGHT] Cram query all",
Query::new_with_default_request("cram/htsnexus_test_NA12878", Cram),
Query::new_with_default_request("cram/seraseq_cebpa_larger", Cram),
);
bench_query(
&mut group,
"[LIGHT] Cram query specific",
Query::new_with_default_request("cram/htsnexus_test_NA12878", Cram)
.with_reference_name("11")
Query::new_with_default_request("cram/seraseq_cebpa_larger", Cram)
.with_reference_name("chr19")
.with_start(4999977)
.with_end(5008321),
);
bench_query(
&mut group,
"[LIGHT] Cram query header",
Query::new_with_default_request("cram/htsnexus_test_NA12878", Cram).with_class(Header),
Query::new_with_default_request("cram/seraseq_cebpa_larger", Cram).with_class(Header),
);

bench_query(