Extreme memory consumption when reading certain SRA records? #31

I am reading sequence data from SRA records by first downloading the SRA record with the `prefetch` command and then iterating through the file using the C++ interface (version 2.10.8), i.e.:
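```cpp
#include <ncbi-vdb/NGS.hpp>     // ncbi::NGS::openReadCollection
#include <ngs/ReadCollection.hpp>
#include <ngs/ReadIterator.hpp>
#include <ngs/Read.hpp>
#include <cstdint>
#include <string>

// Open the run by its SRA accession
ngs::ReadCollection run = ncbi::NGS::openReadCollection("DRR001375");
const uint64_t num_read = run.getReadCount(ngs::Read::all);
ngs::ReadIterator run_iter = run.getReadRange(1, num_read, ngs::Read::all);

size_t read_count = 0;
while( run_iter.nextRead() ){

	++read_count;
	// A read (spot) can hold several fragments, e.g. both mates of a pair
	while( run_iter.nextFragment() ){

		const std::string seq = run_iter.getFragmentBases().toString();

		// Process the read sequence (process_read is user-defined) ...
		process_read(seq);
	}
}
```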
In general, this approach seems to work well. However, I have noticed that for some SRA records (like ERR191522), there is (a) **significant memory consumption** and (b) a **dramatic slow-down** when iterating through the file. The following plot shows the speed (in reads per second) and memory consumption (from `/proc/meminfo`, reported as a fraction of total system memory):
<img width="590" alt="image" src="https://user-images.githubusercontent.com/36452131/88502053-9de9c080-cf8a-11ea-8755-a9f057fd16fa.png">

Other SRA records seem to be fine. For DRR001375, the following graph shows fairly constant speed and memory usage:
<img width="657" alt="image" src="https://user-images.githubusercontent.com/36452131/88502211-0042c100-cf8b-11ea-9e7e-8d1bd4e9efb8.png">
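For anyone who wants to reproduce plots like the ones above, here is a minimal sketch of how reads per second and memory usage could be sampled from inside the read loop. It assumes that "memory consumption" means `1 - MemAvailable/MemTotal` from `/proc/meminfo`; the issue does not say which fields were actually used. The names `meminfo_kb`, `used_fraction`, `read_file` and `ProgressSampler` are hypothetical helpers, not part of the NGS API.

```cpp
#include <chrono>
#include <cstddef>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

// Return the value (in kB) of a "Key:   value kB" line in /proc/meminfo-style text, or -1 if absent.
long long meminfo_kb(const std::string &text, const std::string &key) {
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        // Match "Key:" exactly so that e.g. "MemTotal" does not match a longer key
        if (line.compare(0, key.size() + 1, key + ":") == 0) {
            std::istringstream fields(line.substr(key.size() + 1));
            long long kb = -1;
            fields >> kb;
            return kb;
        }
    }
    return -1;
}

// Assumption: "used" memory is MemTotal minus MemAvailable (MemAvailable needs a recent kernel).
double used_fraction(const std::string &text) {
    const long long total = meminfo_kb(text, "MemTotal");
    const long long avail = meminfo_kb(text, "MemAvailable");
    if (total <= 0 || avail < 0) return -1.0;  // fields missing: report -1
    return 1.0 - static_cast<double>(avail) / static_cast<double>(total);
}

// Read a whole file into a string; returns "" if it cannot be opened.
std::string read_file(const std::string &path) {
    std::ifstream f(path);
    std::ostringstream ss;
    if (f) ss << f.rdbuf();
    return ss.str();
}

// Prints "reads <TAB> reads/s since last sample <TAB> used memory fraction" to stderr
// every `every` reads.
class ProgressSampler {
public:
    explicit ProgressSampler(std::size_t every)
        : every_(every), last_count_(0), last_time_(Clock::now()) {}

    void tick(std::size_t read_count) {
        if (every_ == 0 || read_count % every_ != 0) return;
        const auto now = Clock::now();
        const double secs = std::chrono::duration<double>(now - last_time_).count();
        const double rate = secs > 0 ? (read_count - last_count_) / secs : 0.0;
        std::cerr << read_count << '\t' << rate << '\t'
                  << used_fraction(read_file("/proc/meminfo")) << '\n';
        last_count_ = read_count;
        last_time_ = now;
    }

private:
    using Clock = std::chrono::steady_clock;
    std::size_t every_;
    std::size_t last_count_;
    Clock::time_point last_time_;
};
```

Calling `sampler.tick(read_count)` right after `++read_count` in the loop yields one sample per `every` reads. Note that `/proc/meminfo` is system-wide, so the fraction also reflects other processes on the machine; per-process figures (e.g. `VmRSS` in `/proc/self/status`) would isolate the reader itself.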

Is there a way to read SRA records like ERR191522 *without* the large memory consumption? If not, is there a way to identify in advance which SRA records will exhibit this behavior? The available RAM on cluster instances can easily be exhausted while processing a single SRA record.
