Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
56 commits
Select commit Hold shift + click to select a range
bd04f66
Add prototype for sptrsv
anyzelman Apr 3, 2025
7e139b0
simplify spec
anyzelman Apr 4, 2025
01f7ed0
First non-vectorised reference sptrsv implementation, not yet tested
anyzelman Apr 4, 2025
e8b921e
Add unmasked sptrsv to base spec
anyzelman Apr 4, 2025
f28db62
Bugfixes, remove compiler warnings, and some debug tracing
anyzelman Apr 4, 2025
6181cc3
Add test for new sptrsv function
anyzelman Apr 4, 2025
8b11500
Warn about experimental nature of new primitive
anyzelman Apr 4, 2025
2bb446b
Fix cast error
anyzelman Apr 4, 2025
ef2c4e4
Simplify code a bit
anyzelman Apr 4, 2025
ef89921
Enable new sptrsv unit test
anyzelman Apr 4, 2025
e28e1d4
Add parallel (OpenMP) sptrsv, which may take a schedule such as produ…
anyzelman Apr 4, 2025
20ea13f
Delete from move constructor should not occur
anyzelman Apr 4, 2025
45b9f46
If we set sptrsvSchedule to default in initialize(...), then we do no…
anyzelman Apr 4, 2025
2413a52
Coding blind for a sec here, apologies -- will test it once I have my…
anyzelman Apr 7, 2025
32905e5
Fix compile error and some more defensive programming
anyzelman Apr 7, 2025
98206d0
Add internal functions to set an Sptrsv schedule. Next up: test the p…
anyzelman Apr 7, 2025
42773ca
OSP integration error: input schedule has slightly different structure
anyzelman Apr 8, 2025
b680b6e
I still did not get it, but this should now be correct
anyzelman Apr 8, 2025
6f6cdec
Forgot to update friend signature
anyzelman Apr 8, 2025
ddf70bf
Bugfix
anyzelman Apr 8, 2025
18e7926
Bug Fixes
Apr 9, 2025
6617cad
There can be multiple ranges per superstep per thread that OSP return…
anyzelman Apr 10, 2025
fba4d08
Bugfixes in previous commit, apologies
anyzelman Apr 10, 2025
2dfa329
Update sptrsv implementation to make use of new schedule field. Tests OK
anyzelman Apr 10, 2025
bc6939e
Update schedule ingestion accordingly
anyzelman Apr 10, 2025
708959c
Forgot to update friend declaration
anyzelman Apr 10, 2025
bfe58a8
Should allow for empty ranges
anyzelman Apr 10, 2025
25684b9
I am not enthused about the exceptions that the schedule format appea…
anyzelman Apr 10, 2025
834e375
There can be processes without any range
anyzelman Apr 10, 2025
471f847
Partial undo of one or two commits ago -- it should not happen that t…
anyzelman Apr 10, 2025
0cb6295
Found the likely culprit!
anyzelman Apr 10, 2025
64c64ee
Also update friend declaration
anyzelman Apr 10, 2025
dd17fba
Additional hardening of schedule ingestion logic
anyzelman Apr 10, 2025
a5b5454
Metabugfix
anyzelman Apr 10, 2025
f9bf41d
Silly bug causing difficult-to-track memory errors, apologies
anyzelman Apr 10, 2025
573c80d
Fix typo
anyzelman Apr 10, 2025
4f42d4e
Move assertion to before writing to the location the assert checks
anyzelman Apr 10, 2025
fc968dc
Fix meta-bug
anyzelman Apr 11, 2025
3f5c4c1
Add back support for simple schedules, as the other OSP sptrsv varian…
anyzelman Apr 11, 2025
cd2c967
I rehashed a classic performance bug there, apologies -- now fixed!
anyzelman Apr 11, 2025
3e0eda8
Another tricky performance bug-- and a silly mistake once more...
anyzelman Apr 11, 2025
ad39299
fixed bug in blas2.hpp
tonibohnlein Apr 14, 2025
ba98f0b
Missing optimisation introduced
anyzelman Apr 14, 2025
220b6f8
Changes towards supporting sptrsv on sorted matrices
anyzelman Apr 14, 2025
abc397b
Move index computation into for-loop
anyzelman Apr 14, 2025
4ec4720
Tried and failed vectorisation
anyzelman Apr 14, 2025
19b9473
Const-reference passing where possible
anyzelman Apr 14, 2025
90c12b8
Add in CRS column-sorting as a pre-processing / tuning step for SpTrsv
anyzelman Apr 16, 2025
0a0fe6f
Remove debug statements, also sort CCS, and parallelise the sorting
anyzelman Apr 16, 2025
3853424
Save testing code, assuming sorted arrays
anyzelman Apr 16, 2025
e68d23a
Now that we are sorting, we can re-enable the earlier sorting code
anyzelman Apr 17, 2025
555bc5f
Make sorted-ness configurable
anyzelman Apr 17, 2025
e23d8ee
Make unsorted the default (no sorting overhead
anyzelman Apr 17, 2025
fd1ce27
Make forward a template parameter rather than a run-time switch
anyzelman Apr 17, 2025
a346aa1
Optimisations and bugfixes
anyzelman Apr 17, 2025
763b89f
More robust implementation of sorting mode 1
anyzelman Apr 17, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
143 changes: 143 additions & 0 deletions include/graphblas/base/blas2.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -337,6 +337,149 @@ namespace grb {
return UNSUPPORTED;
}

/**
* Computes \f$ x \f$ from \f$ Tx=b \f$.
*
* \warning This is an experimental feature.
*
* Here, \f$ T, b \f$ are given while \f$ T \f$ additionally must be either
* lower- or upper-triangular. The output \f$ x \f$ may furthermore be masked.
*
* @param[in,out] xb On input: the vector b. On output: the vector x.
* When #grb::EXECUTE is given this vector must have
* sufficient capacity to store both any pre-existing
* nonzeroes in \f$ b \f$ plus the nonzeroes in the (masked)
* result of \f$ x \f$.
* When #grb::RESIZE is given, the vector contents are
* unchanged, however its capacity may be enlarged in order
* to ensure the above-described condition.
*
* \note On input, \a xb may contain both implicit and explicit zeroes.
*
* \note On output, \a xb may contain both implicit and explicit zeroes, even
* if on input it did not.
*
* @param[in] mask The mask that acts on \f$ x \f$. This vector has either
* size zero or size equal to that of \a x.
*
* \note If \a mask has size zero, it is interpreted as though the operation
* unmasked.
*
* @param[in] T The upper- or lower-triangular input matrix. This must be
* a square matrix with size equal to that of \a xb.
* @param[in] forward Whether to perform forward substitution (i.e., \f$ T \f$
* is lower-triangular), or to perform backward substitution
* instead (i.e., \f$ T \f$ is upper-triangular).
*
* \note With the structural information ALP/Dense passes as template
* information, \a forward would not be required.
*
* @param[in] semiring The semiring under which to perform the SpTrsv.
* @param[in] subtraction The inverse of the additive operator of \a semiring.
* @param[in] division The inverse of the multiplicative operator of
* \a semiring.
*
* \note That is, this operation in fact requires a field structure and not a
* semiring. A future extension to ALP may provide such a structure
* explicitly.
*
* @param[in] phase The requested phase of the computation. Only
* #grb::EXECUTE and #grb::RESIZE are supported.
*
* \todo Expand documentation.
*/
template<
Descriptor descr = descriptors::no_operation,
class Semiring,
class Subtraction,
class Division,
typename IOType, typename InputType1, typename InputType2,
typename Coords, typename RIT, typename CIT, typename NIT,
Backend backend
>
RC sptrsv(
Vector< IOType, backend, Coords > &xb,
const Vector< InputType2, backend, Coords > &mask,
const Matrix< InputType1, backend, RIT, CIT, NIT > &T,
const bool forward,
const Semiring &semiring = Semiring(),
const Subtraction &subtraction = Subtraction(),
const Division &division = Division(),
const Phase &phase = EXECUTE,
const typename std::enable_if<
grb::is_semiring< Semiring >::value &&
grb::is_operator< Subtraction >::value &&
grb::is_operator< Division >::value &&
!grb::is_object< IOType >::value &&
!grb::is_object< InputType1 >::value &&
!grb::is_object< InputType2 >::value,
void >::type * const = nullptr
) {
#ifdef _DEBUG
std::cerr << "Selected backend does not implement masked grb::sptrsv\n";
#endif
#ifndef NDEBUG
const bool selected_backend_does_not_support_masked_sptrsv = false;
assert( selected_backend_does_not_support_masked_sptrsv );
#endif
(void) xb;
(void) mask;
(void) T;
(void) semiring;
(void) subtraction;
(void) division;
(void) phase;
return UNSUPPORTED;
}

/**
* Computes \f$ x \f$ from \f$ Tx = b \f$, unmasked variant.
*
* \warning This is an experimental feature.
*
* \todo Extend documentation
*/
template<
Descriptor descr = descriptors::no_operation,
class Semiring,
class Subtraction,
class Division,
typename IOType, typename InputType1,
typename Coords, typename RIT, typename CIT, typename NIT,
Backend backend
>
RC sptrsv(
Vector< IOType, backend, Coords > &xb,
const Matrix< InputType1, backend, RIT, CIT, NIT > &T,
const bool forward,
const Semiring &semiring = Semiring(),
const Subtraction &subtraction = Subtraction(),
const Division &division = Division(),
const Phase &phase = EXECUTE,
const typename std::enable_if<
grb::is_semiring< Semiring >::value &&
grb::is_operator< Subtraction >::value &&
grb::is_operator< Division >::value &&
!grb::is_object< IOType >::value &&
!grb::is_object< InputType1 >::value,
void >::type * const = nullptr
) {
#ifdef _DEBUG
std::cerr << "Selected backend does not implement masked grb::sptrsv\n";
#endif
#ifndef NDEBUG
const bool selected_backend_does_not_support_unmasked_sptrsv = false;
assert( selected_backend_does_not_support_unmasked_sptrsv );
#endif
(void) xb;
(void) T;
(void) semiring;
(void) subtraction;
(void) division;
(void) phase;
return UNSUPPORTED;
}

/**
* Executes an arbitrary element-wise user-defined function \a f on all
* nonzero elements of a given matrix \a A.
Expand Down
236 changes: 236 additions & 0 deletions include/graphblas/reference/SptrsvSchedule.hpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,236 @@

/*
* Copyright 2021 Huawei Technologies Co., Ltd.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

/*
* @author A. N. Yzelman
* @date 4th of April, 2025
*/

#ifndef _H_GRB_REFERENCE_SPTRSVSCHEDULE
#define _H_GRB_REFERENCE_SPTRSVSCHEDULE

#include <limits>
#include <vector>

#include <graphblas/utils/autodeleter.hpp>


namespace grb {

namespace internal {

template< typename NIT >
class SptrsvSchedule {

private:

/** One deleter for each chunk of thread-local data. */
std::vector< utils::AutoDeleter< char > > _deleters;

/**
* Fixed-size buffer for realising the default schedule.
*
* The first two positions are reserved for the range of the default
* schedule, while the last position is reserved for the end-position of
* the default schedule.
*/
NIT default_schedule[ 3 ];

/** Computes and initialises a trivial schedule. */
void initTrivial( const NIT n ) {
// dynamic sanity check
assert( data[ 0 ] == nullptr );
// set trivial schedule
default_schedule[ 0 ] = 0;
default_schedule[ 1 ] = n;
default_schedule[ 2 ] = 1;
nRanges[ 0 ] = 1;
// set pointers to trivial schedule
data[ 0 ] = reinterpret_cast< char * >( &(default_schedule[0]) );
endPositions[ 0 ] = reinterpret_cast< char * >( &(default_schedule[2]) );
// verify trivial schedule
#ifndef NDEBUG
{
NIT * const interpreted = reinterpret_cast< NIT * >(data[0]);
assert( interpreted[ 0 ] == 0 );
assert( interpreted[ 1 ] == n );
}
{
NIT * const interpreted = reinterpret_cast< NIT * >(endPositions[0]);
assert( *interpreted == 1 );
}
#endif
// note that this is a trivial schedule
is_simple = true;

// the default schedule does not (cannot) sort input matrices
is_sorted = false;
}

/**
* Implementation of move construction and assignment.
*/
void moveImpl( SptrsvSchedule &&toMove ) {
_deleters = std::move( toMove._deleters );
default_schedule = std::move( toMove.default_schedule );
is_sorted = toMove.is_sorted;
is_simple = toMove.is_simple;
supersteps = toMove.supersteps;
nThreads = toMove.nThreads;
data = std::move( toMove.data );
endPositions = std::move( toMove.endPositions );
nRanges = std::move( toMove.nRanges );
toMove.is_sorted = true;
toMove.is_simple = false;
toMove.supersteps = 0;
toMove.nThreads = 0;
}


public:

/** Whether the matrix is sorted as part of tuning. */
bool is_sorted;

/** Whether the schedule is simple.*/
bool is_simple;

/** Number of schedule steps. */
size_t supersteps;

/**
* The number of threads the schedule is designed for.
*/
size_t nThreads;

/** One data pointer per thread. */
std::vector< char * > data;

/** One end-position array per thread. */
std::vector< char * > endPositions;

/** The total number of ranges per thread. */
std::vector< NIT > nRanges;

/** Move constructor. */
SptrsvSchedule( SptrsvSchedule &&toMove ) {
moveImpl( toMove );
}

/**
* Base constructor.
*
* @param[in] n The size of the linear system.
*
* Initialises the schedule to the trivial always-valid schedule, which is
* a single-thread single-superstep schedule that assigns all work to the
* first (and only) thread.
*/
SptrsvSchedule( const NIT n ) :
_deleters( 1 ), is_sorted( true ), is_simple( false ),
supersteps( 1 ), nThreads( 1 ),
data( 1 ), endPositions( 1 ), nRanges( 1 )
{
data[ 0 ] = nullptr;
endPositions[ 0 ] = nullptr;
initTrivial( n );
}

/**
* Specialised constructor for multiple threads.
*
* @param[in] n The size of the linear system.
* @param[in] T The number of threads a schedule should be created for.
*
* Initialises the schedule to the trivial always-valid schedule, which is
* a single-superstep schedule that assigns all work to the first thread.
*/
SptrsvSchedule( const NIT n, const size_t T ) :
_deleters( 2 * T ), is_sorted( true ), is_simple( false ),
supersteps( 1 ), nThreads( T ),
data( T, nullptr ), endPositions( T, nullptr ), nRanges( T, 0 )
{
if( T == 0 || T > std::numeric_limits< int >::max() ) {
throw std::runtime_error( "Invalid number of threads" );
}
initTrivial( n );
}

/**
* Base destructor.
*
* Dynamic memory is freed through the #_deleters.
*/
~SptrsvSchedule() {
// sanity checks only
assert( nThreads != 0 );
if( nThreads == 1 ) {
assert( supersteps == 1 );
}
}

/** Move assignment. */
SptrsvSchedule& operator=( SptrsvSchedule &&toMove ) {
moveImpl( toMove );
return *this;
}

/**
* Allocates a thread-local chunk of data.
*
* Must be called from within the thread that will use it(!)
*/
void alloc( const size_t s, const size_t nRanges_in ) {
assert( s < nThreads );
assert( supersteps > 0 );
assert( _deleters.size() >= s );
assert( data.size() >= s );
assert( data[ s ] == nullptr );
assert( endPositions[ s ] == nullptr );
if( nRanges[ s ] != nRanges_in ) {
throw std::runtime_error( "Given nRanges differs from the one given at "
"allocation time" );
}
grb::RC rc = grb::SUCCESS;
if( nRanges_in > 0 ) {
rc = utils::alloc(
"grb::internal::SptrsvSchedule (default constructor)",
"default thread-local data allocation, variant I",
data[ s ], 2 * nRanges_in * sizeof(NIT), false, _deleters[ s ],
endPositions[ s ], supersteps * sizeof( NIT ), false, _deleters[ s + nThreads ]
);
} else {
data[ s ] = nullptr;
rc = utils::alloc(
"grb::internal::SptrsvSchedule (default constructor)",
"default thread-local data allocation, variant II",
endPositions[ s ], supersteps * sizeof( NIT ), false, _deleters[ s + nThreads ]
);
}
if( rc != grb::SUCCESS ) {
throw std::bad_alloc();
}
}

};

} // end namespace `grb::internal'

} // end namespace `grb'

#endif // _H_GRB_REFERENCE_SPTRSVSCHEDULE

Loading