
Load Imbalance in Substitute Matrix #1

@esaliya

The substitute matrix S shows a high load imbalance: 113.14 in the log below, compared with about 3.12 for A and 2.57 for AS. Fixing this may require maintaining a randomized mapping of k-mers to k-mer IDs.
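One way to realize the randomized mapping is sketched below in Python. It is not the project's implementation, and all function names are hypothetical. It assumes that load imbalance is the maximum over the mean of per-process nonzero counts (the log does not say how it is computed) and uses a plain 2D p x p block distribution, ignoring the third grid dimension t. Instead of storing a full random permutation table over all 25^6 = 244,140,625 k-mer IDs (about 1 GB at 4 bytes per entry), it uses an affine bijection x -> (a*x + b) mod n with gcd(a, n) = 1, which needs no table.

```python
import random
from math import gcd


def load_imbalance(counts):
    """Max over mean of per-process nonzero counts (assumed definition)."""
    mean = sum(counts) / len(counts)
    return max(counts) / mean


def affine_permutation(n, a, b):
    """Bijection x -> (a*x + b) mod n on [0, n); requires gcd(a, n) == 1."""
    if gcd(a, n) != 1:
        raise ValueError("a must be coprime to n for the map to be a permutation")
    return lambda x: (a * x + b) % n


def random_affine_permutation(n, seed=None):
    """Pick a random multiplier coprime to n and a random offset; no O(n) table."""
    rng = random.Random(seed)
    a = rng.randrange(1, n)
    while gcd(a, n) != 1:
        a = rng.randrange(1, n)
    return affine_permutation(n, a, rng.randrange(n))


def block_counts(entries, n, p, remap=None):
    """Nonzeros per process when an n x n matrix is split into p x p contiguous blocks."""
    block = -(-n // p)  # ceil(n / p) rows/columns per block
    counts = [0] * (p * p)
    for row, col in entries:
        if remap is not None:
            # Relabel both row and column k-mer IDs with the same permutation.
            row, col = remap(row), remap(col)
        counts[(row // block) * p + col // block] += 1
    return counts


if __name__ == "__main__":
    # Toy input: 100 IDs on a 2 x 2 grid, all nonzeros among IDs 0..9.
    n, p = 100, 2
    entries = [(r, c) for r in range(10) for c in range(10)]
    print(load_imbalance(block_counts(entries, n, p)))  # 4.0
    remap = affine_permutation(n, 37, 0)
    print(load_imbalance(block_counts(entries, n, p, remap)))  # 1.44
```

On the toy input, the contiguous layout puts all 100 nonzeros on one process (imbalance 4.0 on a 2 x 2 grid), while the remapped IDs split them 36/24/24/16 (imbalance 1.44). For the run in this issue one would call random_affine_permutation(25 ** 6). An affine map mixes less thoroughly than a truly random permutation, so whether it is enough for real k-mer distributions would need to be measured.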

See the email thread "About Large Runs" from 12/29/2019. Here is the log from a run that shows this effect and then fails with std::bad_alloc.

Process Grid (p x p x t): 68 x 68 x 2

INFO: Program started on Sat Dec 28 20:04:12 2019

INFO: Job ID knl_fa_shuff_subs25/knl_fa_shuff_subs25_c61ed871-1547-4285-8082-f05137282334
Parameters...
  Input file (-i):                     /global/cscratch1/sd/esaliya/data/isolates/archaea/sanitized_2728834_impure_2729008_len_lte_2000_in_shuffled_isolates_proteins_archaea.fasta
  Original sequence count (-c):        2728834
  Kmer length (k):                     6
  Kmer stride (s):                     1
  Overlap in bytes (-O):               10000
  Max seed count (--sc):               1
  Gap open penalty (-g):               -11
  Gap extension penalty (-e):          -2
  Overlap file (--of):                 None
  Alignment file (--af):               knl_fa_shuff_subs25/knl_fa_shuff_subs25_align.txt
  Alignment write frequency (--afreq): 100000
  No align (--na):                     False
  Full align (--fa):                   True
  Xdrop align (--xa):                  False
  Banded align (--ba):                 False
  Index map (--idxmap):                knl_fa_shuff_subs25_archaea_idx_map.txt
  Alphabet (--alph):                   0
  Use substitute kmers (--subs):       True | sub kmers: 25
Creating fileknl_fa_shuff_subs25_archaea_idx_map.txt with 41438932 bytes
File knl_fa_shuff_subs25_archaea_idx_map.txt is actually 41438932 bytes seen from process 4623

INFO: Modfied sequence count
  Final sequence count: 2728822 (0.000440% removed)
Matrix A: 
Load imbalance: 3.118424
As a whole: 2728822 rows and 244140625 columns and 718716196 nonzeros
Matrix At: As a whole: 244140625 rows and 2728822 columns and 718716196 nonzeros
Matrix S: 
Load imbalance: 113.142021
As a whole: 244140625 rows and 244140625 columns and 723834658 nonzeros
Matrix AS: 
Load imbalance: 2.567925
As a whole: 2728822 rows and 244140625 columns and 10751320837 nonzeros
terminate called after throwing an instance of 'std::bad_alloc'
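For reference, the sizes in the log work out as follows. The column count of A, and both dimensions of S, equal 25^6, which is consistent with k = 6 over a 25-symbol alphabet (the alphabet size is inferred from the numbers, not stated in the log). AS has roughly 15 times as many nonzeros as A:

$$
25^{6} = 244{,}140{,}625, \qquad
\frac{\operatorname{nnz}(AS)}{\operatorname{nnz}(A)} = \frac{10{,}751{,}320{,}837}{718{,}716{,}196} \approx 14.96
$$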
