[Question] How to generate FASTG/GFA with IDs matching final.contigs.fa? #389

@Sharkbio

Background
I am using MEGAHIT to assemble metagenomic data and need to use the assembly graph (FASTG/GFA) for downstream analysis (specifically for GraphBin). Since MEGAHIT does not produce these files by default, I used megahit_toolkit to convert intermediate contigs:

```bash
megahit_toolkit contig2fastg 141 intermediate_contigs/k141.contigs.fa > k141.fastg
```

The Problem
The contig IDs in the generated k141.fastg do not match the IDs in final.contigs.fa.

Example in final.contigs.fa: >k141_126935

Example in k141.fastg: >NODE_1_length_151_cov_1.0000_ID_1:NODE_536013_length_143_cov_1.0000_ID_1072025;

Because of this mismatch, downstream tools like GraphBin cannot map the binning results (based on final.contigs.fa) back to the assembly graph.
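
To make the mismatch concrete, here is a minimal sketch that pulls the identifiers out of both files so they can be compared side by side. The function names `fasta_ids` and `fastg_nodes` are hypothetical helpers, not part of MEGAHIT, and the FASTG parser assumes every header follows the `NODE_<n>_length_<len>_cov_<cov>_ID_<id>` layout seen in the example above.

```bash
#!/usr/bin/env bash

# Print the ID of every FASTA record: the first whitespace-delimited
# token of each header line, without the leading '>'.
fasta_ids() {
    awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$1"
}

# Print node number, length, coverage and ID (tab-separated) for the
# first node named in each FASTG header.
# Assumed header layout (taken from the example in this issue):
#   >NODE_<n>_length_<len>_cov_<cov>_ID_<id>[']:<neighbours>;
fastg_nodes() {
    awk -v q="'" '/^>/ {
        h = substr($0, 2)          # drop the leading ">"
        sub(/[:;].*$/, "", h)      # drop the neighbour list and trailing ";"
        sub(q "$", "", h)          # drop a trailing apostrophe, if present
        n = split(h, f, "_")
        if (n == 8 && f[1] == "NODE")
            print f[2] "\t" f[4] "\t" f[6] "\t" f[8]
    }' "$1"
}
```

Running `fasta_ids final.contigs.fa | head` and `fastg_nodes k141.fastg | head` shows the two naming schemes next to each other: the first yields names such as `k141_126935`, the second only numeric node fields, so there is no shared key to join on.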

Questions to the Developers
Why do IDs change? Does megahit_toolkit contig2fastg rename nodes during the conversion process, or is it using an internal indexing system?

Standard Workflow: What is the recommended way to generate a GFA or FASTG file that maintains 100% ID consistency with the final assembly output?

Alternative: Is there a way to output the assembly graph directly during the megahit run instead of converting intermediate files post-assembly?
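
In the example header, the ID fields are consistent with sequential numbering (ID 1 = 2×1 − 1 and ID 1072025 = 2×536013 − 1), which hints at an internal index rather than a hash. If `NODE_<n>` simply means "the n-th record of the input FASTA", which is an assumption I have not confirmed, a lookup table could be built with a sketch like the following. `fasta_index_table` is a hypothetical helper, not a MEGAHIT tool.

```bash
#!/usr/bin/env bash

# Number the records of a FASTA file in file order and print
# index, contig ID and sequence length as tab-separated columns.
# ASSUMPTION (unverified): index n corresponds to NODE_<n> in the FASTG
# produced from the same input file.
fasta_index_table() {
    awk '
        function emit() { if (name != "") print idx "\t" name "\t" len }
        /^>/ { emit(); idx++; name = substr($1, 2); len = 0; next }
        { len += length($0) }      # sequences may span several lines
        END { emit() }
    ' "$1"
}
```

For example, `fasta_index_table intermediate_contigs/k141.contigs.fa > k141.index.tsv` gives one row per contig. Comparing the length column against the `length_<len>` field of the matching `NODE_<n>` header is a quick sanity check: if the values disagree, either the ordering assumption is wrong or the FASTG length is computed differently.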
