[Question] How to generate FASTG/GFA with IDs matching final.contigs.fa?

Background
I am using MEGAHIT to assemble metagenomic data and need to use the assembly graph (FASTG/GFA) for downstream analysis (specifically for GraphBin). Since MEGAHIT does not produce these files by default, I used megahit_toolkit to convert intermediate contigs:

Bash

megahit_toolkit contig2fastg 141 intermediate_contigs/k141.contigs.fa > k141.fastg
The Problem
The contig IDs in the generated k141.fastg do not match the IDs in final.contigs.fa.

Example in final.contigs.fa: >k141_126935

Example in k141.fastg: >NODE_1_length_151_cov_1.0000_ID_1:NODE_536013_length_143_cov_1.0000_ID_1072025;

Because of this mismatch, downstream tools like GraphBin cannot map the binning results (based on final.contigs.fa) back to the assembly graph.
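For reference, here is a minimal sketch of how the mismatch can be checked. It assumes that FASTA IDs end at the first whitespace and that FASTG record names end at the first `:` or `;` (with an optional trailing `'` marking the reverse strand). The helper names `fasta_ids`, `fastg_ids`, and `shared_id_count` are hypothetical and are not part of MEGAHIT or GraphBin.

```shell
# Print the ID of each FASTA record: the text after '>' up to the first whitespace.
fasta_ids() {
  grep '^>' "$1" | sed -e 's/^>//' -e 's/[[:space:]].*$//' | sort -u
}

# Print the node name of each FASTG record: drop '>', then everything from the
# first ':' or ';' onward (the adjacency list), then a trailing "'" if present.
fastg_ids() {
  grep '^>' "$1" | sed -e 's/^>//' -e 's/[:;].*$//' -e "s/'\$//" | sort -u
}

# Count the IDs that appear in both files (arguments: FASTA file, FASTG file).
shared_id_count() {
  ids_a=$(mktemp)
  ids_b=$(mktemp)
  fasta_ids "$1" > "$ids_a"
  fastg_ids "$2" > "$ids_b"
  comm -12 "$ids_a" "$ids_b" | wc -l | tr -d ' '
  rm -f "$ids_a" "$ids_b"
}
```

Calling `shared_id_count final.contigs.fa k141.fastg` prints the number of IDs the two files have in common. With the headers shown above, none of the names would match.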

Questions to the Developers
Why do IDs change? Does megahit_toolkit contig2fastg rename nodes during the conversion process, or is it using an internal indexing system?

Standard Workflow: What is the recommended way to generate a GFA or FASTG file that maintains 100% ID consistency with the final assembly output?

Alternative: Is there a way to output the assembly graph directly during the megahit run instead of converting intermediate files post-assembly?

[Question] How to generate FASTG/GFA with IDs matching final.contigs.fa? #389