Some basic questions about binsparse specification #55

@srobertp

Hi all,

I was just reading through the specification and had several comments and questions. Some are related and others are not, so we can split them into separate questions and discussions ... I will go back through and try to link to places in the spec repo as well.

  1. I understand why we use 0-based indexing for C/C++, but the format may be implemented in Fortran, where 1-based is normal, right? Doesn't MATLAB also use 1-based indexing? Would it be better to make the index base part of the definition? Or is there a reason you want it always 0-based? I suppose that is simpler, but a CSR traversal with an index base ind of 0 or 1 (or, for distributed matrices, possibly a large value k > 0 for the start of the local rows?):
auto row_st = pointers_to_1[row] - ind;
auto row_en = pointers_to_1[row+1] - ind;
for (int j = row_st; j < row_en; ++j) {
    auto col = indices_1[j] - ind;
    auto val = values[j];
}

isn't much different from the always-0-based version:

auto row_st = pointers_to_1[row];
auto row_en = pointers_to_1[row+1];
for (int j = row_st; j < row_en; ++j) {
    auto col = indices_1[j];
    auto val = values[j];
}
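
For concreteness, the two loops above can be folded into one routine that takes the index base as data. This is only a minimal sketch: the `CsrView` struct and the `spmv` function are hypothetical names for illustration and are not part of the binsparse specification. The array names `pointers_to_1`, `indices_1`, `values` and the base `ind` follow the snippets above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical in-memory view of a CSR matrix (illustration only).
struct CsrView {
    std::vector<std::int64_t> pointers_to_1;  // length nrows + 1, offset by ind
    std::vector<std::int64_t> indices_1;      // column indices, offset by ind
    std::vector<double> values;               // one value per stored entry
    std::int64_t ind;                         // index base: 0, 1, ...
};

// Computes y = A * x, subtracting the base from every pointer and column index.
std::vector<double> spmv(const CsrView& a, const std::vector<double>& x) {
    if (a.pointers_to_1.empty()) {
        return {};
    }
    const std::size_t nrows = a.pointers_to_1.size() - 1;
    std::vector<double> y(nrows, 0.0);
    for (std::size_t row = 0; row < nrows; ++row) {
        const std::int64_t row_st = a.pointers_to_1[row] - a.ind;
        const std::int64_t row_en = a.pointers_to_1[row + 1] - a.ind;
        for (std::int64_t j = row_st; j < row_en; ++j) {
            const std::int64_t col = a.indices_1[j] - a.ind;
            assert(col >= 0 && static_cast<std::size_t>(col) < x.size());
            y[row] += a.values[j] * x[col];
        }
    }
    return y;
}
```

With the 2x2 matrix [[1, 2], [0, 3]] and x = (1, 1), the 0-based arrays (pointers {0, 2, 3}, indices {0, 1, 1}) and the 1-based arrays (pointers {1, 3, 4}, indices {1, 2, 2}) both give y = (3, 3).
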
  2. What might it look like to have a distributed range of a matrix stored in separate files, possibly on separate systems? Is there a way to encode that this file contains CSR data for rows 90 through 180 of a larger matrix with nRows 10000 …?

  3. number_of_stored_values is semantically cleaner than number_of_non_zeros or nnz, but is there a shortened form or acronym for it?

  4. "Fill" is fascinating, so it allows you to store a matrix like

[ D       ones ]
[ ones  D      ]

very efficiently eh :)
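
To make the idea concrete, here is a minimal sketch of an element lookup with a fill value. It assumes that "fill" means the value of every position that is not explicitly stored. The `FilledMatrix` struct and `element_at` function are hypothetical names for illustration, not the spec's API.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>

// Hypothetical in-memory model of a matrix with a fill value (illustration only).
struct FilledMatrix {
    std::int64_t nrows;
    std::int64_t ncols;
    double fill_value;  // assumed: value of every position that is not stored
    std::map<std::pair<std::int64_t, std::int64_t>, double> stored;
};

// Returns the stored value at (i, j) if there is one, otherwise the fill value.
double element_at(const FilledMatrix& m, std::int64_t i, std::int64_t j) {
    assert(i >= 0 && i < m.nrows && j >= 0 && j < m.ncols);
    const auto it = m.stored.find(std::make_pair(i, j));
    return it != m.stored.end() ? it->second : m.fill_value;
}
```

With fill_value set to 1, positions inside the two "ones" blocks need no stored entry at all; only positions whose value differs from 1 have to appear in `stored`.
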

  5. Do DMATR/DMATC take a leading dimension that is different from nrows/ncols? Or do they allow introducing a leading dimension when reading the matrix in from a file?

  6. Would you ever support reading or writing a CSR4 representation? Is it possible as a custom format? CSR4 technically allows for rearranging rows simply or using submatrices … honestly it isn't the most important format, but it does show up in some library implementations … you have two pointer arrays, for start and end, instead of one … and each has length nrows instead of nrows+1 …

auto row_st = pointers_to_1_st[row] - ind;  // start of this row's entries
auto row_en = pointers_to_1_en[row] - ind;  // end of this row's entries (exclusive)
for (int j = row_st; j < row_en; ++j) {
    auto col = indices_1[j] - ind;
    auto val = values[j];
}
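
If CSR4 were only handled at the edges, one option would be to compact it into ordinary CSR on read. The following is a minimal sketch under that assumption: the `Csr` struct and `csr4_to_csr` function are hypothetical names for illustration, and the output is always written 0-based.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for ordinary (3-array) CSR, always 0-based here.
struct Csr {
    std::vector<std::int64_t> pointers_to_1;  // length nrows + 1
    std::vector<std::int64_t> indices_1;
    std::vector<double> values;
};

// Copies each row's [start, end) range in row order, so rows that were
// rearranged or taken from a submatrix end up contiguous in the output.
Csr csr4_to_csr(const std::vector<std::int64_t>& pointers_to_1_st,
                const std::vector<std::int64_t>& pointers_to_1_en,
                const std::vector<std::int64_t>& indices_1,
                const std::vector<double>& values,
                std::int64_t ind) {
    assert(pointers_to_1_st.size() == pointers_to_1_en.size());
    assert(indices_1.size() == values.size());
    Csr out;
    out.pointers_to_1.push_back(0);
    for (std::size_t row = 0; row < pointers_to_1_st.size(); ++row) {
        const std::int64_t row_st = pointers_to_1_st[row] - ind;
        const std::int64_t row_en = pointers_to_1_en[row] - ind;
        assert(0 <= row_st && row_st <= row_en);
        assert(static_cast<std::size_t>(row_en) <= values.size());
        for (std::int64_t j = row_st; j < row_en; ++j) {
            out.indices_1.push_back(indices_1[j] - ind);
            out.values.push_back(values[j]);
        }
        out.pointers_to_1.push_back(static_cast<std::int64_t>(out.values.size()));
    }
    return out;
}
```

For example, if row 1 is stored before row 0 (start {1, 0}, end {3, 1}, indices {1, 0, 1}, values {30, 10, 20}, ind 0), the result is pointers {0, 2, 3}, indices {0, 1, 1}, values {10, 20, 30}.
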
  7. I am fascinated by the idea of having a COOC format distinct from COOR (aka COO), but it makes sense :) Do you allow unsorted COO at all? I don't want you to have to implement matrix sorting in an implementation, but it is something to consider, and COOR vs COOC is a nice way to make the unique sorted state clear, whereas plain COO doesn't have one …

  8. Why are we requiring that elements must not be duplicated? Even for COO, it says "Pairs must not be duplicated" … will you provide functionality to compress duplicates in some way?
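
As an illustration of what such a helper could look like, here is a minimal sketch that sorts COO entries row-major and merges repeated (row, col) pairs. Summing the duplicates is an assumed convention (the common one when assembling matrices), and `CooEntry` and `sum_duplicates` are hypothetical names, not anything the spec defines.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical COO entry (illustration only).
struct CooEntry {
    std::int64_t row;
    std::int64_t col;
    double value;
};

// Sorts by (row, col) and adds together entries that share the same pair,
// so the result is in row-major order with no duplicated pairs.
std::vector<CooEntry> sum_duplicates(std::vector<CooEntry> entries) {
    std::sort(entries.begin(), entries.end(),
              [](const CooEntry& a, const CooEntry& b) {
                  return std::tie(a.row, a.col) < std::tie(b.row, b.col);
              });
    std::vector<CooEntry> out;
    for (const CooEntry& e : entries) {
        if (!out.empty() && out.back().row == e.row && out.back().col == e.col) {
            out.back().value += e.value;  // assumed convention: duplicates are summed
        } else {
            out.push_back(e);
        }
    }
    return out;
}
```

Given the entries (1, 0, 2.0), (0, 1, 1.0), (1, 0, 3.0), this returns (0, 1, 1.0) followed by (1, 0, 5.0).
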

  9. Data types: have you considered adding bfloat16 or TF32 as supported real types?

  10. The word "structure" in sparse contexts often means something a little different to me than the pattern of symmetric_lower or skew_symmetric_upper … to me it refers to everything but the values … would you consider changing the name "structure" to "pattern" in the specification? I was thinking the structure section was going to be about storing a matrix without values, as in "structure_only", but I was mistaken :)
    On second thought, I suppose it isn't the worst name :D, so keeping it as is is fine, but it is just slightly unexpected to me … is there a way to mark that there are no corresponding values at all?

Best Regards,
Spencer
