Skip to content

Conversation

@majosm
Copy link
Collaborator

@majosm majosm commented Jan 14, 2026

Adds CSRMatmul array type for multiplication by sparse matrices in compressed sparse row format.

cc @lukeolson

@majosm
Copy link
Collaborator Author

majosm commented Jan 14, 2026

@inducer I started looking through the loopy codegen code to try to figure out how to generate temporaries for the reduction bounds to avoid islpy errors (as is done in the loopy example), but I'm having some trouble figuring out how best to do that. Seems like I need to create the temporaries inside InlinedExpressionGenMapper.map_reduce() before adding the reduction domain to the kernel, but I'm not sure how to make sure the temporaries get the right scope, dependencies, inames, etc. Any suggestions on how I could approach this? Should I maybe be doing it inside CodeGenMapper.map_index_lambda by collecting the reduction info ahead of time and creating temporaries before calling the exprgen_mapper?

@inducer
Copy link
Owner

inducer commented Jan 14, 2026

Yes, working inside of CodeGenMapper.map_index_lambda seems like a possibly promising route. I'm not sur what info you would need to collect ahead of time though.

@majosm majosm force-pushed the sparse-matrix branch 4 times, most recently from 21e7cb4 to 6b7a9a4 Compare January 22, 2026 20:45
@majosm
Copy link
Collaborator Author

majosm commented Jan 22, 2026

@inducer I think this is ready for a look. I'm not sure how to fix the lingering doc build errors.

Here's what the results are looking like. Code:

A_pl = pt.make_csr_matrix(  # noqa: N806
    shape=(n-2, n),
    elem_values=pt.make_placeholder("A_elem_values", (3*(n-2),)),
    elem_col_indices=pt.make_placeholder("A_elem_col_indices", (3*(n-2),)),
    row_starts=pt.make_placeholder("A_row_starts", (n-1,)))
u_pl = pt.make_placeholder("u", n)
prog = pt.generate_loopy(A_pl @ u_pl)

Kernel:

---------------------------------------------------------------------------
KERNEL: _pt_kernel
---------------------------------------------------------------------------
ARGUMENTS:
A_elem_values: type: np:dtype('float64'), shape: (54), dim_tags: (N0:stride:1), offset: <class 'loopy.typing.auto'> in aspace: global
A_elem_col_indices: type: np:dtype('float64'), shape: (54), dim_tags: (N0:stride:1), offset: <class 'loopy.typing.auto'> in aspace: global
A_row_starts: type: np:dtype('float64'), shape: (19), dim_tags: (N0:stride:1), offset: <class 'loopy.typing.auto'> in aspace: global
u: type: np:dtype('float64'), shape: (20), dim_tags: (N0:stride:1), offset: <class 'loopy.typing.auto'> in aspace: global
_pt_out: type: np:dtype('float64'), shape: (18), dim_tags: (N0:stride:1) out aspace: global
---------------------------------------------------------------------------
DOMAINS:
{  :  }
{ [_pt_temp_dim0] : 0 <= _pt_temp_dim0 <= 17 }
  [_pt_sum_r0_lbound, _pt_sum_r0_ubound] -> { [_pt_sum_r0] : _pt_sum_r0_lbound <= _pt_sum_r0 < _pt_sum_r0_ubound }
{ [_pt_out_dim0] : 0 <= _pt_out_dim0 <= 17 }
---------------------------------------------------------------------------
INAME TAGS:
_pt_out_dim0: None
_pt_sum_r0: None
_pt_temp_dim0: None
---------------------------------------------------------------------------
TEMPORARIES:
_pt_sum_r0_lbound: type: np:dtype('int64'), shape: () aspace: global
_pt_sum_r0_ubound: type: np:dtype('int64'), shape: () aspace: global
_pt_temp: type: np:dtype('float64'), shape: (18), dim_tags: (N0:stride:1) aspace: global
---------------------------------------------------------------------------
INSTRUCTIONS:
    for _pt_temp_dim0
↱     _pt_sum_r0_lbound = A_row_starts[_pt_temp_dim0]  {id=_pt_sum_r0_lbound_store}
│↱    _pt_sum_r0_ubound = A_row_starts[_pt_temp_dim0 + 1]  {id=_pt_sum_r0_ubound_store}
└└↱   _pt_temp[_pt_temp_dim0] = reduce(sum, [_pt_sum_r0], A_elem_values[_pt_sum_r0]*u[A_elem_col_indices[_pt_sum_r0]])  {id=_pt_temp_store}
  │ end _pt_temp_dim0
  │ for _pt_out_dim0
  └   _pt_out[_pt_out_dim0] = _pt_temp[_pt_out_dim0]  {id=_pt_out_store}
    end _pt_out_dim0
---------------------------------------------------------------------------

@majosm majosm requested a review from inducer January 22, 2026 22:01
@majosm majosm marked this pull request as ready for review January 22, 2026 22:19

redn_bounds = ReductionBoundsCollector()(idx_expr)

store_redn_bound_temps = any(
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, this could be a per-bound decision.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do the actual temporary generation per bound, but I need to know ahead of time if any temporaries will be created so I can define the inames up front.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants