Commit e4ee614

pinin4fjords and claude committed
Document RSeQC inner_distance limitation for large chromosomes
Add documentation explaining the bx-python BitSet limitation that affects genomes with chromosomes >500 Mb (commonly plant genomes). Provide clear workaround by excluding inner_distance from rseqc_modules parameter.

Closes #608

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
1 parent eb418bd commit e4ee614

2 files changed: +18 -0 lines changed

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -15,6 +15,7 @@ Special thanks to the following for their contributions to the release:

 - [PR #1608](https://github.com/nf-core/rnaseq/pull/1608) - Bump version after release 3.21.0
 - [PR #1617](https://github.com/nf-core/rnaseq/pull/1617) - Update bbmap/bbsplit module
+- [PR #1624](https://github.com/nf-core/rnaseq/pull/1624) - Document RSeQC inner_distance limitation for genomes with large chromosomes (>500 Mb), such as plant genomes

 ## [[3.21.0](https://github.com/nf-core/rnaseq/releases/tag/3.21.0)] - 2025-09-18

docs/usage.md

Lines changed: 17 additions & 0 deletions
@@ -378,6 +378,23 @@ This pipeline uses featureCounts to generate QC metrics based on [biotype](http:

 Please get in touch with us on the #rnaseq channel in the [nf-core Slack workspace](https://nf-co.re/join) if you are having problems or need any advice.

+#### Large chromosomes (plant genomes)
+
+Genomes with very large chromosomes (>500 Mb), such as plant genomes, may encounter failures in the RSeQC `inner_distance` module due to a known limitation in the underlying bx-python library. The bx-python BitSet implementation has a maximum capacity of approximately 537 million bases, which can be exceeded by chromosomes in organisms like wheat, barley, and other plants.
+
+If you encounter an error message similar to `IndexError: [coordinate] is larger than the size of this BitSet (536870912)`, you can work around this by excluding the `inner_distance` module from RSeQC analysis:
+
+```bash
+--rseqc_modules 'bam_stat,infer_experiment,junction_annotation,junction_saturation,read_distribution,read_duplication'
+```
+
+This removes `inner_distance` from the default list of RSeQC modules while retaining all other quality control metrics. Note that the inner_distance metric is only relevant for paired-end data and provides information about fragment size distribution.
+
+For more information, see the upstream issues:
+
+- [nf-core/rnaseq#608](https://github.com/nf-core/rnaseq/issues/608)
+- [bxlab/bx-python#67](https://github.com/bxlab/bx-python/issues/67)
+
 ### iGenomes (not recommended)

 If the `--genome` parameter is provided (e.g. `--genome GRCh37`) then the FASTA and GTF files (and existing indices) will be automatically obtained from AWS-iGenomes unless these have already been downloaded locally in the path specified by `--igenomes_base`.
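
Note on the "Large chromosomes" section added above: the documented workaround is applied only after the `IndexError` appears. It may be possible to spot affected genomes beforehand by comparing sequence lengths in a FASTA index against the BitSet size quoted in the error message (536870912, i.e. 2^29). The following is a minimal sketch of that idea, not part of the commit or the pipeline. It assumes a samtools-style `.fai` index (tab-separated, name in column 1, length in column 2), and the function names `oversized_chromosomes`, `drop_module` and `rseqc_modules_for` are hypothetical. Treat the length comparison as a heuristic: the section only says such genomes "may" fail, so a long chromosome does not guarantee the error.

```bash
#!/usr/bin/env bash
# Sketch only: the helper names below are illustrative assumptions,
# not part of nf-core/rnaseq, RSeQC or bx-python.

# BitSet size quoted in the IndexError message (2^29).
BITSET_MAX=536870912

# Print "name<TAB>length" for every sequence in a samtools-style .fai index
# whose length exceeds BITSET_MAX (column 1 = name, column 2 = length).
oversized_chromosomes() {
    local fai="$1"
    awk -F'\t' -v max="$BITSET_MAX" \
        '($2 + 0) > (max + 0) { print $1 "\t" $2 }' "$fai"
}

# Remove one module name from a comma-separated module list.
drop_module() {
    local modules="$1" drop="$2"
    printf '%s\n' "$modules" | tr ',' '\n' | grep -vxF -- "$drop" | paste -sd, -
}

# Print the module list to use for a given index: the list without
# inner_distance if any sequence is oversized, otherwise the list unchanged.
rseqc_modules_for() {
    local fai="$1" modules="$2"
    if [ -n "$(oversized_chromosomes "$fai")" ]; then
        drop_module "$modules" inner_distance
    else
        printf '%s\n' "$modules"
    fi
}
```

With these functions sourced, `oversized_chromosomes genome.fa.fai` lists the sequences that cross the limit, and `rseqc_modules_for genome.fa.fai "$modules"` prints a value suitable for `--rseqc_modules`, where `$modules` holds whatever comma-separated module list is currently in use and `genome.fa.fai` is a placeholder path.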
