Commit 4e47864

Enlarge the sampling range for column determination in FilterGTF script.
1 parent da99418 commit 4e47864

2 files changed: +5 -4 lines changed

CHANGELOG.md

Lines changed: 3 additions & 2 deletions
@@ -7,8 +7,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### Enhancements and fixes
 
-- [PR #1126](https://github.com/nf-core/rnaseq/pull/1126) - Fixes error when transcript_fasta not provided and skip_gtf_filter set to true
-- [#1125](https://github.com/nf-core/rnaseq/issues/1125) - Pipeline fails if transcript_fasta not provided and skip_gtf_filter = true
+- [[#1125](https://github.com/nf-core/rnaseq/issues/1125)][[#1126](https://github.com/nf-core/rnaseq/pull/1126)] - Pipeline fails if transcript_fasta not provided and `skip_gtf_filter = true`.
+- [[#1127](https://github.com/nf-core/rnaseq/pull/)] - Enlarge sampling to determine the number of columns in `filter_gtf.py` script.
+
 
 ## [[3.13.1](https://github.com/nf-core/rnaseq/releases/tag/3.13.1)] - 2023-11-17
 

bin/filter_gtf.py

Lines changed: 2 additions & 2 deletions
@@ -23,14 +23,14 @@ def extract_fasta_seq_names(fasta_name: str) -> Set[str]:
 def tab_delimited(file: str) -> float:
     """Check if file is tab-delimited and return median number of tabs."""
     with open(file, "r") as f:
-        data = f.read(1024)
+        data = f.read(102400)
         return statistics.median(line.count("\t") for line in data.split("\n"))
 
 
 def filter_gtf(fasta: str, gtf_in: str, filtered_gtf_out: str, skip_transcript_id_check: bool) -> None:
     """Filter GTF file based on FASTA sequence names."""
     if tab_delimited(gtf_in) != 8:
-        raise ValueError("Invalid GTF file: Expected 8 tab-separated columns.")
+        raise ValueError("Invalid GTF file: Expected nine tab-separated columns.")
 
     seq_names_in_genome = extract_fasta_seq_names(fasta)
     logger.info(f"Extracted chromosome sequence names from {fasta}")
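
The commit does not say which inputs motivated the larger sample. One plausible case, assumed here purely for illustration, is a GTF whose leading comment header is longer than 1024 characters: every sampled line then contains zero tabs, the median is 0, and the `!= 8` check rejects a valid file. The sketch below reproduces the median-of-tab-counts logic from the diff on a synthetic in-memory file. `median_tabs`, the header contents, and the record are hypothetical and not part of the pipeline.

```python
import io
import statistics


def median_tabs(stream, sample_chars):
    """Median tab count per line over the first `sample_chars` characters.

    Mirrors the logic of tab_delimited() in the diff, but reads from any
    text stream so it can be exercised without touching the filesystem.
    """
    data = stream.read(sample_chars)
    return statistics.median(line.count("\t") for line in data.split("\n"))


# Hypothetical input: 20 comment lines (about 1.4k characters in total)
# followed by 2000 well-formed records with nine columns, i.e. eight tabs.
header = "".join(f"#!note-{i} " + "x" * 60 + "\n" for i in range(20))
record = "\t".join(
    ["chr1", "src", "exon", "1", "100", ".", "+", ".", 'gene_id "g1";']
) + "\n"
gtf_text = header + record * 2000

# The old sample size sees only header lines; the new one reaches the records.
small = median_tabs(io.StringIO(gtf_text), 1024)
large = median_tabs(io.StringIO(gtf_text), 102400)
print(small, large)
```

With the 1024-character sample the median reflects only comment lines, while the 102400-character sample is dominated by records with eight tabs. This is also why the reworded error message says nine columns even though the comparison is against 8: nine tab-separated columns contain eight tab characters.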
