fq_data_preprocess bug #501

@Rruier

Description

Please make sure these conditions are met

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of omicverse.
  • (optional) I have confirmed this bug exists on the main branch of omicverse.

What happened?

Single-end FASTQ files cannot be processed: fq_data_preprocess raises an error at the fastp step. It seems that the fastp step does not take single-end sequencing into account.
Separately, for some of my samples, STAR finishes the mapping phase but then crashes during coordinate-sorted BAM generation. The corresponding Log.out message is included at the end of the Error output section below.

Minimal code sample

fastq_files = ["/data1/jpy/H1/SRP554854/fastq_data/SRR31869821.fastq.gz"]
result = fq_data_preprocess(
    fastq_files=fastq_files,
    work_dir="/data1/jpy/H1/SRP554854/work_se",
    genome="human",
    threads=20,
)

Error output

--------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[5], line 2
      1 fastq_files=["/data1/jpy/H1/SRP554854/fastq_data/SRR31869821.fastq.gz"]
----> 2 result = fq_data_preprocess(
      3     fastq_files=fastq_files,                    
      4     work_dir="/data1/jpy/H1/SRP554854/work_se",  
      5     genome="human",
      6     threads=8, 
      7 )

File ~/miniconda3/envs/ovgit/lib/python3.11/site-packages/omicverse/bulk/_alignment/__init__.py:439, in fq_data_preprocess(fastq_files, config, input_type, with_align, work_dir, threads, genome, sample_prefix, **kwargs)
    436 fastq_pairs: List[Tuple[str, Path, Optional[Path]]] = pipeline._parse_fastq_input(fastq_files)
    438 # Run: start from FASTQ and enter the unified workflow (fastp -> STAR -> featureCounts)
--> 439 return pipeline.run_from_fastq(
    440     fastq_pairs,
    441     with_align=with_align
    442 )

File ~/miniconda3/envs/ovgit/lib/python3.11/site-packages/omicverse/bulk/_alignment/alignment.py:983, in Alignment.run_from_fastq(self, fastq_pairs, with_align, align_index)
    971 """
    972 Execute the full pipeline starting from FASTQ inputs.
    973 
   (...)    980     A dictionary describing the processing outcomes.
    981 """
    982 # Run QC immediately.
--> 983 fastqs_qc = self.fastp(fastq_pairs)
    985 result: dict[str, Any] = {
    986     "type": "fastq_direct",
    987     "fastq_input": fastq_pairs,
    988     "fastq_qc": fastqs_qc,
    989 }
    991 if with_align:

File ~/miniconda3/envs/ovgit/lib/python3.11/site-packages/omicverse/bulk/_alignment/alignment.py:760, in Alignment.fastp(self, fq_pairs)
    758 if not hasattr(_qc_fastp, "fastp_batch"):
    759     raise RuntimeError("qc_fastp.fastp_batch(...) not found. Please expose it.")
--> 760 return _qc_fastp.fastp_batch(
    761     pairs=fq_pairs,
    762     out_root=str(self.cfg.qc_root),
    763     threads=self.cfg.threads
    764 )

File ~/miniconda3/envs/ovgit/lib/python3.11/site-packages/omicverse/bulk/_alignment/qc_fastp.py:124, in fastp_batch(pairs, out_root, threads, max_workers)
    122 if errors:
    123     msg = "; ".join([f"{s}:{m}" for s, m in errors])
--> 124     raise RuntimeError(f"fastp_batch failed for {len(errors)} samples: {msg}")
    126 # Preserve the original order.
    127 order = {s: i for i, (s, _, _) in enumerate(pairs)}

RuntimeError: fastp_batch failed for 1 samples: SRR31869821fastq:expected str, bytes or os.PathLike object, not NoneType
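
A possible reading of this message, not confirmed against the omicverse source: "expected str, bytes or os.PathLike object, not NoneType" is what Python raises when None is passed to a path function such as os.fspath or pathlib.Path, which would fit the second-read slot of a single-end sample being None. The sketch below shows one way a fastp command could be built so that the read-2 options are only added for paired-end input. The helper name build_fastp_cmd and the output file names are hypothetical; only the fastp options -i, -o, -I, -O, -w, -j and -h are taken from the fastp command line.

```python
from pathlib import Path
from typing import List, Optional


def build_fastp_cmd(
    sample: str,
    fq1: Path,
    fq2: Optional[Path],
    out_dir: Path,
    threads: int = 4,
) -> List[str]:
    """Build a fastp command line; fq2=None means single-end input.

    Hypothetical helper for illustration, not omicverse's implementation.
    """
    out_dir = Path(out_dir)
    cmd = [
        "fastp",
        "-i", str(fq1),                                      # read 1 input
        "-o", str(out_dir / f"{sample}_1.clean.fastq.gz"),   # read 1 output
        "-w", str(threads),                                  # worker threads
        "-j", str(out_dir / f"{sample}.fastp.json"),         # JSON report
        "-h", str(out_dir / f"{sample}.fastp.html"),         # HTML report
    ]
    # Only touch the read 2 path when it exists, so None never reaches str/Path.
    if fq2 is not None:
        cmd += ["-I", str(fq2), "-O", str(out_dir / f"{sample}_2.clean.fastq.gz")]
    return cmd
```

The returned list could be passed to subprocess.run(cmd, check=True). For the sample in this report, fq2 would be None and the command would contain no -I/-O options.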


Nov 20 20:55:28 ..... finished mapping
RAM after mapping:
VmPeak: 33084520 kB; VmSize: 32925040 kB; VmHWM: 32252412 kB; VmRSS: 32250288 kB;
RAM after freeing genome index memory:
VmPeak: 33084520 kB; VmSize: 3491304 kB; VmHWM: 32252412 kB; VmRSS: 2816552 kB;
Nov 20 20:55:33 ..... started sorting BAM
Max memory needed for sorting = 84579752686
EXITING because of fatal ERROR: not enough memory for BAM sorting:
SOLUTION: re-run STAR with at least --limitBAMsortRAM 85579752686
Nov 20 20:55:33 ...... FATAL ERROR, exiting
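
The STAR log itself names the value it wants for --limitBAMsortRAM. As a rough sketch, the suggested number could be read out of Log.out and passed back on a re-run. The function name suggested_bam_sort_ram is hypothetical, and whether fq_data_preprocess forwards extra STAR arguments is not something this report establishes.

```python
import re
from typing import Optional

# Matches the hint STAR prints, e.g. "--limitBAMsortRAM 85579752686".
_LIMIT_PATTERN = re.compile(r"--limitBAMsortRAM\s+(\d+)")


def suggested_bam_sort_ram(log_text: str) -> Optional[int]:
    """Return the byte count STAR suggests in its log, or None if absent."""
    match = _LIMIT_PATTERN.search(log_text)
    return int(match.group(1)) if match else None
```

For the log above this gives 85579752686 bytes (about 85.6 GB), so the machine would need at least that much free RAM for sorting. If that is not available, another option on the STAR side is to write unsorted output with --outSAMtype BAM Unsorted and sort the file afterwards with samtools sort.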

Versions

🔬 Starting plot initialization...
🧬 Detecting GPU devices…
✅ NVIDIA CUDA GPUs detected: 4
  • [CUDA 0] NVIDIA RTX A6000 Memory: 47.4 GB | Compute: 8.6
  • [CUDA 1] NVIDIA RTX A6000 Memory: 47.4 GB | Compute: 8.6
  • [CUDA 2] NVIDIA RTX A6000 Memory: 47.4 GB | Compute: 8.6
  • [CUDA 3] NVIDIA RTX A6000 Memory: 47.4 GB | Compute: 8.6

🔖 Version: 1.7.9rc1
📚 Tutorials: https://omicverse.readthedocs.io/
✅ plot_set complete.

