Commit 67ace8b

Merge branch 'main' into refactor/exec-place-unified-grid
2 parents 2404589 + ab58dd0 commit 67ace8b

66 files changed: +10163 -245 lines changed

Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
---
name: libcudacxx-style
description: Make the code in libcudacxx/include, cudax/include compliant with the coding style
---

# libcudacxx Style

## Naming style

- Macros: macro style, e.g. `MY_MACRO`.
- Template parameters: CamelCase, e.g. `MyParameter`.
- All other symbols: snake case, e.g. `my_variable`.

All non-public symbols must be C++ reserved identifiers:

- `_` prefix for macros and template parameters, e.g. `_MY_MACRO`, `_MyParameter`.
- `__` prefix for all other symbols, e.g. `__my_variable`.

- Avoid single letter names for template parameters. Wrong: `_T`, correct: `_Tp`.
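
The naming rules above can be sketched as follows. This is only an illustration: every name in it is hypothetical and none of them exists in libcudacxx. Reserved identifiers of this form are intended for library implementation code, not for user code.

```cpp
// Non-public macro: macro style with a single leading underscore.
#define _MY_MACRO 4

// Non-public function, template parameter, and function parameter:
// `_Tp` (not `_T`) for the template parameter, `__` prefix for everything else.
template <class _Tp>
[[nodiscard]] constexpr _Tp __my_helper(const _Tp __my_variable) noexcept
{
  return __my_variable * _Tp{_MY_MACRO};
}

// Public function: snake case without a prefix. Its template parameter and
// function parameter are still non-public symbols, so they keep their prefixes.
template <class _Tp>
[[nodiscard]] constexpr _Tp my_function(const _Tp __value) noexcept
{
  return ::__my_helper(__value);
}
```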

## Variables

- All variables that are not modified must use `const`. This includes variables initialized by casts (`static_cast`, `reinterpret_cast`, `bit_cast`), function return values, and loop-invariant computations.
- All variables that can be evaluated at compile time must use `constexpr`.
- Consider using plural names for arrays, spans, and lists, e.g. `int values[4]` instead of `int value[4]`.
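
A minimal sketch of these variable rules, assuming the standard library (`::std::`) as a stand-in for the `::cuda::std::` equivalents so that it compiles without CCCL. The function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: checks whether a pointer is aligned to `__alignment` bytes.
[[nodiscard]] inline bool __is_aligned(const void* const __ptr, const ::std::size_t __alignment) noexcept
{
  // Initialized by a cast and never modified afterwards, so it is `const`.
  const auto __address = reinterpret_cast<::std::uintptr_t>(__ptr);
  return __address % __alignment == 0;
}

// Hypothetical helper: sums a small table that is known at compile time.
[[nodiscard]] constexpr int __sum_of_values() noexcept
{
  // Evaluated at compile time, so it is `constexpr`; plural name for an array.
  constexpr int __values[4] = {1, 2, 3, 4};
  int __sum = 0; // modified in the loop, so it cannot be `const`
  for (const int __value : __values)
  {
    __sum += __value;
  }
  return __sum;
}
```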

## Functions

Declaration/Definition:

- All functions must be marked `_CCCL_HOST_API`, `_CCCL_DEVICE_API`, or `_CCCL_API`.
- Non-template, non-`constexpr` functions must use `inline`.
- Most functions with a non-void return type shall use `[[nodiscard]]`. Exceptions are functions with known side effects, e.g. `cuda::std::copy`.
- All functions that do not throw exceptions must use `noexcept`.
- `constexpr` must be used for all functions that don't depend on run-time features, e.g. pointers.
- If the return type is not explicit (`auto`), then a trailing return type is strongly preferred, e.g. `auto abs(float) -> float`.
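
The declaration rules might look like this in practice. This is a sketch under stated assumptions: `_CCCL_API` is defined here as an empty stand-in so the snippet compiles on its own (in the library the macro comes from the CCCL configuration headers), and both function names are hypothetical.

```cpp
// Stand-in only: the real `_CCCL_API` is provided by the library configuration.
#ifndef _CCCL_API
#  define _CCCL_API
#endif

// Template function with no run-time dependencies: `constexpr`, `noexcept`,
// and `[[nodiscard]]` because it returns a value and has no side effects.
template <class _Tp>
[[nodiscard]] _CCCL_API constexpr _Tp __clamp_to_zero(const _Tp __value) noexcept
{
  return __value < _Tp{0} ? _Tp{0} : __value;
}

// Non-template, non-`constexpr` function (it reads through a pointer): `inline`.
// The return type is written with `auto`, so a trailing return type is used.
[[nodiscard]] _CCCL_API inline auto __first_byte(const void* const __ptr) noexcept -> unsigned char
{
  return *static_cast<const unsigned char*>(__ptr);
}
```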

Function call:

- All calls to free functions must be fully qualified starting from the global namespace, e.g. `::cuda::ceil_div`. This includes calls to functions defined in the same namespace, e.g. inside `cuda::`, call `::cuda::ceil_div(...)`, not `ceil_div(...)`. This does not apply to (static) member functions of classes.
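
A short sketch of the qualification rule. The namespace `demo` and both functions are hypothetical stand-ins for `cuda` and `::cuda::ceil_div`, used so the snippet is self-contained.

```cpp
namespace demo // hypothetical namespace standing in for `cuda`
{
// Simplified stand-in for a public free function such as `::cuda::ceil_div`.
[[nodiscard]] constexpr int ceil_div(const int __a, const int __b) noexcept
{
  return (__a + __b - 1) / __b;
}

[[nodiscard]] constexpr int __num_blocks(const int __num_items, const int __block_size) noexcept
{
  // Fully qualified from the global namespace, even though `ceil_div`
  // is declared in the same namespace as the caller.
  return ::demo::ceil_div(__num_items, __block_size);
}
} // namespace demo
```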

## Types

- Type names must be fully qualified, except when they are already declared in the current namespace.
- This includes standard integer type aliases (`::cuda::std::size_t`, `::cuda::std::uintptr_t`, `::cuda::std::int32_t`, etc.) and any other `cuda::std` or standard library types. A local `using` declaration (e.g. `using ::cuda::std::size_t;`) is acceptable to avoid repetition within a function body.
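
The same idea applied to types, again as a sketch: `::std::` types stand in for their `::cuda::std::` counterparts, and the namespace and function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

namespace demo // hypothetical namespace
{
// Type names are fully qualified from the global namespace.
[[nodiscard]] constexpr ::std::size_t
__padded_size(const ::std::size_t __size, const ::std::size_t __alignment) noexcept
{
  return (__size + __alignment - 1) / __alignment * __alignment;
}

[[nodiscard]] inline bool __same_address(const void* const __lhs, const void* const __rhs) noexcept
{
  // A local using-declaration avoids repeating the qualification in the body.
  using ::std::uintptr_t;
  return reinterpret_cast<uintptr_t>(__lhs) == reinterpret_cast<uintptr_t>(__rhs);
}
} // namespace demo
```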

## Headers

- All header inclusions must use the syntax `<header>`.
- Files must include all headers related to the symbols that they are using.
- Relying on transitive header inclusion is not allowed.
- Unneeded headers must be removed.
- Include the most specific header available, e.g. `#include <cuda/std/__type_traits/is_array.h>`.
- Headers in `cuda/std/__cccl/` must not be included directly (they are provided by `__config` or the prologue/epilogue mechanism).

- All headers must have the correct license. If the file is ported from LLVM libc++ then we *must* use the LLVM license.
- All headers must have an include guard with the correct name: the uppercase full path from the root, separated by `_`.
- The closing `#endif` always carries a comment repeating the guard name.
- Right after the include guard, the code must include:
```cpp
#include <cuda/std/detail/__config>

#if defined(_CCCL_IMPLICIT_SYSTEM_HEADER_GCC)
# pragma GCC system_header
#elif defined(_CCCL_IMPLICIT_SYSTEM_HEADER_CLANG)
# pragma clang system_header
#elif defined(_CCCL_IMPLICIT_SYSTEM_HEADER_MSVC)
# pragma system_header
#endif // no system header
```
- The last included header must be `#include <cuda/std/__cccl/prologue.h>` before the code, and `#include <cuda/std/__cccl/epilogue.h>` at the end of a file.

## Comments

- Commented-out code without a description is not allowed.
- Use Doxygen-style `//! @brief` comments.
- When a function is documented with Doxygen, it must include: `//! @brief`, `//! @param[in/out/in,out]` for every parameter, and `//! @return` for non-void functions.
- The `@brief/@param/@return` description must accurately reflect the current functionality of the function.
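
A documented function following these rules could look like the sketch below. The function is hypothetical and exists only to show one `@param` entry per parameter, including an output parameter, plus `@return`.

```cpp
//! @brief Divides `__value` by `__divisor` and also reports the remainder.
//! @param[in] __value The dividend.
//! @param[in] __divisor The divisor, which must be non-zero.
//! @param[out] __remainder Receives `__value % __divisor`.
//! @return The quotient `__value / __divisor`.
[[nodiscard]] constexpr int
__div_with_remainder(const int __value, const int __divisor, int& __remainder) noexcept
{
  __remainder = __value % __divisor;
  return __value / __divisor;
}
```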

## General guidelines

- The code must reuse `cuda/` or `cuda/std` functionality as much as possible, including macros.
- Try to use modern C++ as much as possible. The repository supports C++17, but many more recent features have been backported as functions and macros.

## Prevent compiler errors and improve compatibility

- Never allow lambda expressions in device-only or host-device code.
- Protect host-only code with `#if !_CCCL_COMPILER(NVRTC)`.
- Remove unused code, variables, functions, types, template parameters, headers, etc.
- Variables that are unsigned, or that can become unsigned after template instantiation, must not check for negative values directly. Use `cuda::std::is_unsigned_v<T> ? false : (var < 0)` instead.
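
The last rule can be wrapped in a small helper, sketched here with `::std::is_unsigned_v` standing in for `::cuda::std::is_unsigned_v` so that it compiles without CCCL. The helper name is hypothetical.

```cpp
#include <type_traits>

// Hypothetical helper: returns whether `__value` is negative, using the
// guarded expression from the rule above so that unsigned instantiations
// never rely on the result of `__value < 0`.
template <class _Tp>
[[nodiscard]] constexpr bool __is_negative(const _Tp __value) noexcept
{
  return ::std::is_unsigned_v<_Tp> ? false : (__value < 0);
}
```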
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
---
name: libcudacxx-style
description: Make the code in libcudacxx/include, cudax/include compliant with the coding style
---

The skill content is in .agent/skills/licudacxx-style/SKILL.md

cudax/examples/stf/CMakeLists.txt

Lines changed: 15 additions & 1 deletion
@@ -21,28 +21,42 @@ set(
   01-axpy-launch.cu
   01-axpy-parallel_for.cu
   binary_fhe.cu
+  binary_fhe_stackable.cu
   09-dot-reduce.cu
   cfd.cu
   custom_data_interface.cu
   fdtd_mgpu.cu
+  fdtd_while.cu
+  fdtd_repeat_n.cu
   frozen_data_init.cu
   graph_algorithms/degree_centrality.cu
   graph_algorithms/jaccard.cu
   graph_algorithms/pagerank.cu
+  graph_algorithms/pagerank_batched.cu
+  graph_algorithms/pagerank_while.cu
   graph_algorithms/tricount.cu
+  graph_scope.cu
   heat.cu
   heat_mgpu.cu
   jacobi.cu
   jacobi_pfor.cu
+  jacobi_stackable.cu
+  jacobi_stackable_raii.cu
+  jacobi_update_cond.cu
   launch_histogram.cu
   launch_scan.cu
   launch_sum.cu
   launch_sum_cub.cu
+  linear_algebra/burger.cu
+  linear_algebra/burger_sensitivity.cu
+  linear_algebra/cg_csr.cu
+  linear_algebra/cg_csr_stackable.cu
   logical_gates_composition.cu
   mandelbrot.cu
   parallel_for_2D.cu
   pi.cu
   scan.cu
+  sqrt_newton_stackable.cu
   standalone-launches.cu
   word_count.cu
   word_count_reduce.cu
@@ -52,9 +66,9 @@ set(
 set(
   stf_example_mathlib_sources
   linear_algebra/06-pdgemm.cu
+  linear_algebra/06-pdgemm-stackable.cu
   linear_algebra/07-cholesky.cu
   linear_algebra/07-potri.cu
-  linear_algebra/cg_csr.cu
   linear_algebra/cg_dense_2D.cu
   linear_algebra/strassen.cu
 )
Lines changed: 240 additions & 0 deletions
@@ -0,0 +1,240 @@
//===----------------------------------------------------------------------===//
//
// Part of CUDASTF in CUDA C++ Core Libraries,
// under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
// SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES.
//
//===----------------------------------------------------------------------===//

/**
 * @file
 * @brief A toy example to illustrate how we can compose logical operations over encrypted data
 */

#include <cuda/experimental/stf.cuh>

using namespace cuda::experimental::stf;

#include <memory>

class ciphertext;

class plaintext
{
public:
  plaintext(const stackable_ctx& ctx)
      : ctx(ctx)
  {}

  plaintext(stackable_ctx& ctx, ::std::vector<char> v)
      : values(mv(v))
      , ctx(ctx)
      , ld(ctx.logical_data(values.data(), values.size()))
  {}

  auto& set_symbol(const std::string& s)
  {
    ld.set_symbol(s);
    symbol = s;

    return *this;
  }

  const std::string& get_symbol() const
  {
    return symbol;
  }

  // This will asynchronously fill vector v
  void convert_to_vector(std::vector<char>& v)
  {
    ctx.host_launch(ld.read()).set_symbol("to_vector")->*[&](auto dl) {
      v.resize(dl.size());
      for (size_t i = 0; i < dl.size(); i++)
      {
        v[i] = dl(i);
      }
    };
  }

  ciphertext encrypt() const;

private:
  std::vector<char> values;
  mutable stackable_ctx ctx;
  ::std::string symbol;

public:
  mutable stackable_logical_data<slice<char>> ld;
};

class ciphertext
{
public:
  ciphertext() = default;

  // We need a deep-copy semantic
  ciphertext(const ciphertext& other)
      : ctx(other.ctx)
      , symbol(other.symbol)
  {
    copy_content(ctx, other, *this);
  }

  ciphertext(const stackable_ctx& ctx)
      : ctx(ctx)
  {}

  ciphertext(ciphertext&&)            = default;
  ciphertext& operator=(ciphertext&&) = default;

  static void copy_content(stackable_ctx& ctx, const ciphertext& src, ciphertext& dst)
  {
    dst.ld = ctx.logical_data(src.ld.shape());
    ctx.parallel_for(src.ld.shape(), src.ld.read(), dst.ld.write()).set_symbol("copy")->*
      [] __device__(size_t i, auto src, auto dst) {
        dst(i) = src(i);
      };
  }

  auto& set_symbol(std::string s)
  {
    ld.set_symbol(s);
    symbol = mv(s);

    return *this;
  }

  const std::string& get_symbol() const
  {
    return symbol;
  }

  plaintext decrypt() const
  {
    plaintext p(ctx);
    p.ld = ctx.logical_data(shape_of<slice<char>>(ld.shape().size()));
    ctx.parallel_for(ld.shape(), ld.read(), p.ld.write()).set_symbol("decrypt")->*
      [] __device__(size_t i, auto cipher_data, auto plain_data) {
        plain_data(i) = static_cast<char>(cipher_data(i) >> 32);
      };
    return p;
  }

  // Copy assignment operator
  // We need a deep-copy semantic
  ciphertext& operator=(const ciphertext& other)
  {
    if (this != &other)
    {
      ctx    = other.ctx;
      symbol = other.symbol;
      copy_content(ctx, other, *this);
    }
    return *this;
  }

  ciphertext operator|(const ciphertext& other) const
  {
    ciphertext result(ctx);
    result.ld = ctx.logical_data(ld.shape());

    ctx.parallel_for(ld.shape(), ld.read(), other.ld.read(), result.ld.write()).set_symbol("OR")->*
      [] __device__(size_t i, auto d_c1, auto d_c2, auto d_res) {
        d_res(i) = d_c1(i) | d_c2(i);
      };

    return result;
  }

  ciphertext operator&(const ciphertext& other) const
  {
    ciphertext result(ctx);
    result.ld = ctx.logical_data(ld.shape());

    ctx.parallel_for(ld.shape(), ld.read(), other.ld.read(), result.ld.write()).set_symbol("AND")->*
      [] __device__(size_t i, auto d_c1, auto d_c2, auto d_res) {
        d_res(i) = d_c1(i) & d_c2(i);
      };

    return result;
  }

  ciphertext operator~() const
  {
    ciphertext result(ctx);
    result.ld = ctx.logical_data(ld.shape());

    ctx.parallel_for(ld.shape(), ld.read(), result.ld.write()).set_symbol("NOT")->*
      [] __device__(size_t i, auto d_c, auto d_res) {
        d_res(i) = ~d_c(i);
      };

    return result;
  }

  mutable stackable_logical_data<slice<uint64_t>> ld;

private:
  mutable stackable_ctx ctx;
  ::std::string symbol;
};

ciphertext plaintext::encrypt() const
{
  ciphertext c(ctx);
  c.ld = ctx.logical_data(shape_of<slice<uint64_t>>(ld.shape().size()));

  ctx.parallel_for(ld.shape(), ld.read(), c.ld.write()).set_symbol("encrypt")->*
    [] __device__(size_t i, auto dptxt, auto dctxt) {
      // A super safe encryption !
      dctxt(i) = ((uint64_t) (dptxt(i)) << 32 | 0x4);
    };

  return c;
}

template <typename T>
T circuit(const T& a, const T& b)
{
  return ~((a | ~b) & (~a | b));
}

int main()
{
  stackable_ctx ctx;

  const std::vector<char> vA{3, 3, 2, 2, 17};
  plaintext pA(ctx, std::vector<char>(vA));
  pA.set_symbol("A");

  const std::vector<char> vB{1, 7, 7, 7, 49};
  plaintext pB(ctx, std::vector<char>(vB));
  pB.set_symbol("B");

  auto s_encrypt = ctx.dot_section("encrypt");
  auto eA        = pA.encrypt().set_symbol("A");
  auto eB        = pB.encrypt().set_symbol("B");
  s_encrypt.end();

  ctx.push();

  auto s_circuit = ctx.dot_section("circuit");
  auto out       = circuit(eA, eB);
  s_circuit.end();

  ctx.pop();

  std::vector<char> v_out;
  out.decrypt().convert_to_vector(v_out);

  ctx.finalize();

  for (size_t i = 0; i < v_out.size(); i++)
  {
    char expected = circuit(vA[i], vB[i]);
    EXPECT(expected == v_out[i]);
  }
}
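
The final check in `main` works because `circuit` is a bitwise XOR and the toy encryption keeps the plaintext byte in the upper 32 bits of each word. The host-only sketch below reproduces that arithmetic without CUDASTF. `toy_encrypt` and `toy_decrypt` are hypothetical names that copy the expressions from the `encrypt` and `decrypt` kernels above; they are not part of the example or of the library.

```cpp
#include <cstdint>

// Same expression as `circuit` in the example: NOT(XNOR(a, b)), i.e. a bitwise XOR.
template <typename T>
constexpr T circuit(const T& a, const T& b)
{
  return ~((a | ~b) & (~a | b));
}

// Hypothetical host-side copy of the "encrypt" kernel body.
constexpr std::uint64_t toy_encrypt(const char plain)
{
  return (static_cast<std::uint64_t>(plain) << 32) | 0x4;
}

// Hypothetical host-side copy of the "decrypt" kernel body.
constexpr char toy_decrypt(const std::uint64_t cipher)
{
  return static_cast<char>(cipher >> 32);
}
```

The constant `0x4` in the low bits is identical in both operands, so it cancels under XOR, and the shifted plaintext bits combine exactly as they would on the unencrypted values.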
