Fix bugs found while building CPython with pcc:
- Bug A: Preprocessor drops line after #define starting with /* comment */
- Bug B: Inline asm +r operand numbering and constraint handling (6 sub-fixes)
- Bug C: Float constant expressions in global initializers
- Bug D: Arrow expressions in eval_static_address
- Bug E: Inliner drops second half of 16-byte struct returns (two-reg ABI)
- Bug 5: u32 overflow in size_bits for large types
- Bug 6: Directive::Zero(u32) → Zero(usize) for large zero-fills
CPython now compiles fully and _freeze_module works at -O0/-O1. A separate inliner bug (Bug F) remains at -O2+ when functions with 10+ IR instructions are inlined.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… trigger Bisection reveals the CPython _freeze_module -O2 crash is caused specifically by inlining PyTuple_SET_ITEM (size 19, void, non-inline-hinted). Skipping its inlining produces a working binary. Standalone tests with the same pattern pass — the bug only manifests in complex CPython contexts with many other inlined functions in the same caller. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…TEM inlining Key finding: inlining PyTuple_SET_ITEM into any single caller works fine. The crash only occurs when it's inlined into multiple callers across many files simultaneously (35 callers total). This points to an interaction between the module-level inlining pass and subsequent codegen phases, not a single-function inlining correctness issue. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…t -O2 The ALWAYS_INLINE_SIZE * 2 rule at -O2 inlined non-inline-hinted functions of size 11-20 with multiple call sites. This triggered a latent bug causing uninitialized stack values in large callers (CPython's _PySys_InitCore segfaulted via PyTuple_SET_ITEM inlining). Fix: Remove the multi-call-site rule. Single-call-site and inline-hinted functions are still aggressively inlined at O2. The underlying inliner defect (likely in block reordering or register allocation liveness after complex inlining) remains for future investigation. Verified: CPython _freeze_module passes at -O0 through -O3 with 0 valgrind errors. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…initialized Root cause: emit_copy_with_type() stored 32-bit values to stack slots with movl (4 bytes), leaving the upper 4 bytes of the 8-byte slot uninitialized. When a subsequent 64-bit load read the full 8 bytes, it picked up garbage in the upper half. This manifested when a C `int` value (e.g. `int pos` in make_version_info) was inlined into a function that passes it as `Py_ssize_t` (64-bit) to PyTuple_SET_ITEM. The inlined store.64 to the index local read back garbage from the uninitialized upper bytes. Fix: In emit_copy_with_type(), store 32-bit values to stack using 64-bit width. On x86-64, movl to a 32-bit register zero-extends to 64-bit, so the subsequent movq to stack writes all 8 bytes correctly. Also removes the workaround that disabled multi-call-site inlining at -O2 (that rule now works correctly) and cleans up dead debug code. Verified: CPython _freeze_module passes at -O0 through -O3 with 0 valgrind errors. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Bug F: The linearizer didn't insert implicit integer widening conversions when passing a narrower argument (e.g., int) to a wider parameter (e.g., Py_ssize_t). After inlining, the 32-bit pseudo was used in 64-bit contexts with garbage upper bits. Fix: insert sign/zero extension at call sites when the actual argument type is narrower than the formal parameter.
Bug G: emit_assign() only used block-copy for structs > 64 bits. Structs of exactly 64 bits (like PyCompilerFlags: 2 ints) fell through to linearize_expr(), which returns the struct's ADDRESS, not its value. The address was stored at the target instead of the dereferenced struct data. Fix: use block-copy for ALL struct sizes (target_size > 0).
Also reverts the incorrect Bug F workaround (64-bit stack stores for 32-bit copies), which clobbered adjacent stack data.
Verified: _freeze_module output now byte-identical to gcc at -O3.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… slot stale bytes
Bug H (Zext): emit_extend() for Opcode::Zext loaded 8-bit values at 32-bit width via .max(32), causing sign-extended char values (e.g., 0xF3 → -13) to pass through unmasked. Fix: AND-mask to source width after loading. This fixes (unsigned char) casts and CPython's marshal type code parsing.
Bug I (stack stores): Created store_to_stack_slot() that always writes 64-bit to regalloc stack slots (which are 8 bytes each). Updated emit_move_to_loc() and emit_store() offset-0 local stores to use it. Prevents stale upper bytes from subsequent 64-bit loads.
Also adds implicit integer widening at call sites (Bug F) when the actual argument type is narrower than the formal parameter type.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ring config init After fixing Bug H (Zext), _bootstrap_python gets past marshal parsing into importlib execution. New crash: NULL pointer dereference in _Py_Instrument called from config_init_stdio_encoding. Likely another codegen correctness issue exposed by getting further in CPython init. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Progress: _bootstrap_python now loads frozen importlib and begins executing bytecode. Crashes in _PyDict_GetItemWithError from LOAD_BUILD_CLASS instruction — dict or builtins pointer corruption from pcc codegen. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
pcc ignored __attribute__((packed)) on struct definitions, causing packed
structs to have wrong sizes. CPython's tracemalloc_frame (packed: 12 bytes)
was laid out as 16 bytes, shifting all fields after it in _PyRuntimeState
by 8 bytes — corrupting every runtime data access.
Fix: Parse packed attribute at all three positions on struct definitions
(before tag, between tag and {, after }). Pass packed flag to
compute_struct_layout() which suppresses alignment padding between members
and in trailing padding.
Also includes store_to_stack_slot() function and emit_store offset-0
widening for Bug I.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ilure The _bootstrap_python crashes in the opcode dispatch switch of ceval.c. pcc defines __GNUC__=4 which makes Py_UNREACHABLE() expand to __builtin_unreachable() instead of Py_FatalError(). The switch on uint8_t opcode doesn't match valid opcodes — likely a switch statement codegen bug for large case counts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Switch codegen used insn.size.max(32) to load the switch variable, upgrading uint8_t (8-bit) to 32-bit. emit_move with size=32 did a movl from a stack slot that was stored as 8-bit, reading 3 garbage bytes. This caused CPython's opcode dispatch (switch on uint8_t opcode) to read wrong opcodes and hit __builtin_unreachable(). Fix: load at insn.size (actual type width), then CMP at B32. For sub-32 bit values, emit_move uses Movzx (zero-extending load) which correctly reads only the stored bytes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All function prologues now zero the stack frame using rep stosq after allocating stack space but before storing arguments. This ensures all 8-byte regalloc stack slots start as zero, so narrow writes (8/16/32-bit) leave zero in the unwritten upper bytes instead of garbage from prior stack usage. Implementation: save RDI/RCX to R10/R11 scratch registers, set up rep stosq (RDI=RSP, RCX=qwords, RAX=0), zero the frame, restore RDI/RCX. Added RepStosq LIR instruction type. This is a comprehensive fix for the class of bugs where pcc codegen writes narrow values to 8-byte stack slots and later loads them at wider widths. Verified: _freeze_module produces 0 valgrind errors. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…f value linearize_expr for Deref of struct/union ALWAYS returned the pointer address, even for small types (<= 64 bits). Callers that used the result as a value (e.g., _Py_CODEUNIT word = *next_instr) stored the 64-bit pointer where a 16-bit value was expected, causing total data corruption. Fix: only return the address for large structs (> 64 bits). For small structs/unions, emit a Load instruction to read the value through the pointer, matching how scalars are handled. This fixes CPython's bytecode opcode dispatch — the switch now receives correct opcode values and no longer hits __builtin_unreachable(). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Struct types must return addresses from Deref (needed for member-offset access like s->field). Only union types <= 64 bits get the value-load treatment, since unions are accessed as whole values. Also updates bugs.md with Bug M analysis: 4 specific CPython files (pylifecycle, import, ceval, sysmodule) break _bootstrap_python when pcc-compiled, while other files work fine. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
pcc emitted compilation errors during linearization (e.g., "unsupported expression in global initializer") but exited with code 0, producing corrupt .o files with missing/zero data. Build systems like CPython's Makefile don't detect the error and link the corrupt objects, causing mysterious runtime crashes. Fix: add diag::has_error() check after linearization in process_file(). pcc now correctly fails compilation when linearization produces errors. The "unsupported expression" error itself (in pycore_pyerrors.h's Py_CLEAR macro) still needs to be fixed separately — it prevents compilation of pylifecycle.c, import.c, ceval.c, sysmodule.c. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…heck Three fixes for global initializer handling:
1. Added diag::has_error() check after linearization in main.rs. Previously pcc emitted errors during linearization but exited 0, producing corrupt .o files that caused mysterious runtime crashes.
2. Added eval_static_address() as fallback in ast_init_to_ir() catch-all. Handles complex &global.field->subfield address chains used in CPython's _PyRuntimeState_INIT macro (PYDBGRAW_ALLOC etc.).
3. Added Conditional (ternary) expression handling in ast_init_to_ir(). CPython's _Py_LATIN1_CHR() macro uses compile-time ternaries in static initializers: 'r' < 128 ? &ascii['r'] : &latin1['r'-128].
All CPython .c files now compile without errors.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ks _bootstrap_python From a clean gcc baseline, swapping ANY single CPython file to pcc-compiled causes _bootstrap_python to crash. This is NOT stale stack bytes (zeroing doesn't fix), NOT struct layout (all match), NOT the linker, NOT pyconfig.h. The issue is a fundamental codegen pattern that affects virtually every file. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ROOT CAUSE CONFIRMED: pcc passes struct parameters > 16 bytes as hidden pointers (sret-style, pointer in RDI), while the SysV AMD64 ABI requires them to be passed BY VALUE on the stack (MEMORY class). Evidence: PyStatus_Exception(PyStatus status) — GCC reads status from 16(%rbp) (stack argument), PCC reads from (%rdi) (pointer dereference). PyStatus is 32 bytes, which triggers MEMORY classification in the ABI. This is the root cause of the systematic crash where ANY pcc-compiled file breaks _bootstrap_python — every file uses PyStatus extensively. Fix requires changes to: ABI classification (classify_param), caller codegen (push struct bytes to stack), callee codegen (receive via IncomingArg), and callee linearizer (remove pointer conversion). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two fixes:
1. Large struct params (> 16 bytes) now passed by value on the stack per SysV AMD64 ABI MEMORY class, instead of incorrectly as hidden pointers. Caller pushes all qwords; callee reads via IncomingArg with SymAddr. Fixes PyStatus (32 bytes) ABI mismatch that broke every CPython init function.
2. Remove incorrect 32→64 bit store widening at offset 0 in emit_store. A 32-bit store to a struct's first field was widened to 64-bit, clobbering the adjacent field at offset 4. Fixes compound literal + designated initializer failures.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Store-widening (32→64 bit at offset 0) now only applies to scalar locals (type size ≤ 32 bits). For structs, the first field at offset 0 gets exact-size stores to avoid clobbering the adjacent field. Uses sym_type_sizes map populated during function codegen to distinguish scalar locals from struct fields. Also: removed dead regalloc post-pass code. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pre-existing bug: stale caller-saved register across call in goto-dispatch loops. Blocks CPython make test. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace implicit phi operand references with explicit PhiSource instructions in predecessor blocks, following sparse's OP_PHISOURCE design. This makes phi data flow visible to all passes without special-casing phi_list semantics.
Key changes:
- Add Opcode::PhiSource to IR with back-pointer to owning phi
- SSA pass (fill_phi_operands) emits PhiSource in predecessors
- Linearizer emits PhiSource for ternary/logical-or/logical-and phis
- Phi elimination converts PhiSource→Copy with proper sequentialization
- Inliner clone_instruction handles PhiSource remapping
- DCE excludes PhiSource back-pointer from use tracking
- Regalloc removes dead phi interval extension code (PhiSource handles it)
- Remove decl_block dominance filter from SSA phi insertion
- Fix Function::dominates warning (#[cfg(test)])
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Bug P — integer argument sign-extension: when passing a 32-bit int literal (e.g. -1) to a 64-bit long parameter, the linearizer emitted a widening conversion but kept the original 32-bit type in arg_types_vec, causing codegen to emit movl (zero-extending) instead of movq. Fix: update arg_type to the formal parameter type after widening.
Bug Q — function pointer dereference as no-op: *func_ptr must be a no-op in C (C99 6.5.3.2). Two fixes: (1) the linearizer Deref handler now returns src for TypeKind::Function; (2) the parser's Deref type computation keeps the function type instead of computing base_type (the return type).
-O flag handling: pcc kept the first -O flag and discarded subsequent ones. The GCC convention is last-wins. CPython passes -O3 then -O0; pcc was compiling at -O3.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
In `cond ? long_count : -1`, the int literal -1 was not promoted to the 64-bit result type. pcc generated `movl $1; negl` (32-bit) producing 0x00000000FFFFFFFF instead of 0xFFFFFFFFFFFFFFFF. This broke CPython's string search (fastsearch default_find returns `mode == 2 ? count : -1`) causing str.find() to return 4294967295 instead of -1 for multi-char not-found, which broke Python's regex module and blocked the deepfreeze build step. Fix: insert emit_convert() for narrower operands in both the Select (pure/cmov) and control-flow (impure/phi) paths of Conditional expression linearization. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The cbr instruction never captured the condition value's size, so emit_cbr defaulted to testl (32-bit test) via insn.size.max(32). When the condition is a 64-bit AND result (e.g., value & 0x8080...80), and only the upper 32 bits are non-zero, testl sees zero and skips the branch. This broke CPython's UTF-8 decoder (ascii_decode fast path) which checks 8-byte chunks via `if (value & ASCII_CHAR_MASK)`. Non-ASCII bytes in the upper half of a chunk were missed, causing the decoder to advance past multi-byte UTF-8 lead bytes. Fix: when insn.size is 0 (unset), use 64-bit testq instead of 32-bit. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
R1: ternary int-to-long promotion (str.find/regex)
R2: 64-bit cbr conditional (UTF-8 decoder)
R3: finalization crash in _PyArg_Fini (still open)
R4: deepfreeze struct init produces wrong opcodes (workaround: disable)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extended store-widening: ALL 32-bit stores at offset 0 to local variables are now widened to 64-bit, except for struct/union types where the first field must keep exact size. Previously, only scalar locals with type <= 32 bits were widened. This missed cases where a 32-bit intermediate (e.g., from int-to- pointer conversion) was stored into a 64-bit local (long, pointer), truncating the upper 32 bits. This caused the CPython finalization crash: signal handler PyObject* pointers were stored as 32-bit, losing the upper address bits. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CPython 3.12.9 compiled entirely by pcc at -O0:
- Compiles, links, starts, runs Python code correctly
- Imports standard library modules (os, json partial, re)
- Crashes during Py_FinalizeEx in _PySignal_Fini (pointer truncation)
- Blocks make test due to exit code 139 on every process
Bugs fixed this session: P (arg sign-ext), Q (funcptr deref), R1 (ternary promotion), R2 (64-bit cbr), R3 partial (store-widening), -O flag handling.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Update TODO.md status tables for atomics, alignment, TLS, and other C11 features to reflect current implementation state. Update BUILTIN.md to document all implemented builtins (memory, FP math, FP constants, stack introspection, C11 atomics). Trim README.md not-yet-implemented list to match reality. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wire __attribute__((aligned(N))) into the explicit_align infrastructure alongside _Alignas. Add stdalign.h and __has_feature(c_alignas).
Parser:
- AttributeList::get_alignment() extracts aligned/__aligned__ attrs
- skip_extensions() now wires aligned attrs into pending_alignas
- Struct-level aligned attr tracked across all 4 attribute positions
- Typedef alignment: add explicit_align field to Type, apply at all typedef binding sites, skip type deduplication for aligned types
- validated_explicit_align() propagates typedef alignment to variables
- pending_alignas cleared at parse_external_decl entry to prevent leakage
Type system:
- Type::explicit_align overrides natural alignment in TypeTable::alignment()
- Type constructors cleaned up to use ..Default::default()
x86-64 backend:
- Pad callee_saved_offset to multiple of 16 for correct local alignment
- Track max_local_align in regalloc; round stack_size to max alignment
- Dynamic stack alignment (>16): emit andq $-N,%rsp; address locals via RSP-relative offsets. Refactor all 24 callee_saved_offset sites to use stack_mem() helper for correct RSP/RBP switching.
aarch64 backend:
- Track max_local_align in regalloc; over-allocate frame for >16
- Compute aligned base register (x19) in prologue for over-aligned locals
- Wire stack_base_reg()/stack_mem()/stack_mem_plus() into all 16 codegen sites that access local variables
Tests:
- 20-section integration mega-test: globals, locals, struct members, struct tags, typedef propagation, stdalign.h, callee-saved pressure, dynamic alignment (32/64), combined _Alignas+attr, typedef-as-member, non-power-of-2 ignored, trailing struct attr
- Optimized (-O1) integration test
- cfg-guarded aarch64 over-aligned locals test (x19 path)
- 13 parser unit tests covering all alignment mechanisms
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The stack_offset() formula for use_aligned_base mode was wrong: stack_alloc_size + offset produced offsets that didn't preserve alignment (e.g., x19+31 for a 32-byte aligned local, which is NOT 32-aligned). Fix: compute base_rounded = stack_alloc_size - (max_align - 1), then use base_rounded + offset. Since base_rounded is a multiple of max_align (from regalloc rounding) and regalloc aligns each local's position, the result preserves alignment. Also fix regalloc stack_size() to round the base to max_align (not just 16) before adding over-allocation padding, ensuring base_rounded is correctly aligned.
Before: str w9, [x19, #31]  (31 is NOT 32-aligned)
After:  str w9, [x19]       (0 is 32-aligned)
Verified: all three CI-failing aarch64 tests now generate valid assembly that passes aarch64-linux-gnu-as.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Multi-char constants in #if: pack all chars big-endian (GCC-compatible) instead of using only the first character
- #line directive: macro-expand tokens before parsing (C99 6.10.4)
- #pragma STDC: recognize FP_CONTRACT/FENV_ACCESS/CX_LIMITED_RANGE with ON/OFF/DEFAULT arguments; warn on invalid combinations
- Stringification (#): new stringify_arg() escapes " and \ in string/char literals per C99 6.10.3.2p2, replacing tokens_to_text at stringify sites
- _Pragma: document C99 6.10.9p1 destringification requirement; pragmas are no-ops so the result is correctly discarded
- Lexer: add C99 6.4.6 digraph support (<: :> <% %> %: %:%:)
- #line directive: implement line_offset/line_file_override for __LINE__ and __FILE__ builtin macros
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Four root causes fixed:
1. NaN comparison: aarch64 fcmp uses Slt/Sle condition codes which
return true for NaN (N≠V when V=1). Changed to Ult/Ule (lo/ls)
which are NaN-safe (C=1 for NaN → false).
2. FP ternary select: emit_select only handled integers via CSEL.
Added emit_select_fp using BCond/B branches to load FP values,
mirroring x86_64's branch-based approach.
3. HFA struct passing: {double,double} structs classified as
ArgClass::Hfa on aarch64 weren't recognized by is_two_sse checks.
Extended linearizer (2 locations), store_args_to_stack prologue,
and setup_register_args to handle HFA count=2 alongside complex.
4. macOS compatibility:
- Added _Nonnull/_Nullable qualifier parsing (consumed, no effect)
- Added __builtin_object_size (returns (size_t)-1) and 13 fortified
__builtin___*_chk builtins (strip prefix, call libc __*_chk)
- Made inline asm test arch-portable with #[cfg(target_arch)]
- Added StringTable::lookup() for immutable name resolution
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- HFA-2 struct return: emit_ret now detects Hfa{count:2} returns via
ABI classification. Two-source path moves GP values to V0/V1 via
FmovFromGp; single-source path loads from struct address. Fixes
codegen_two_sse_struct_abi exit 1.
- Nullability qualifiers: added _Nonnull/_Nullable/_Null_unspecified
to 3 inline pointer-qualifier parsers in parser.rs (parse_declarator
and parse_function_def) that were separate from consume_type_qualifiers.
Fixes codegen_ternary_fptr_return_type parse error on macOS stdlib.h.
- New test: codegen_nullability_qualifiers — portable integration test
covering nullability on pointer declarators, function params, and
function pointer typedefs.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Nullability qualifiers: extract single-source-of-truth function is_nullability_qualifier() in parse/mod.rs. All 5 call sites now reference it. Also handle qualifiers in skip_extensions() so they're consumed after function declarator param lists (fixes stdlib.h parse error at line 540).
- Inliner: clone_instruction now remaps pseudos inside asm_data (outputs[].pseudo and inputs[].pseudo). Previously inlined inline asm kept stale callee pseudo IDs, causing wrong register allocation and 32-bit truncation on aarch64.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Added skip_extensions() before expect_special(b';') at two more declaration paths in parse_external_decl (non-function declarations and declarations with initializers). These paths lacked attribute/ nullability consumption, causing "expected ';'" on macOS headers. Also improved expect_special error diagnostic to show the actual token found, aiding future debugging of header parse failures. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds codegen_ternary_fptr_diag test that preprocesses #include <stdlib.h> with -E and prints lines around 540 plus any lines containing __v. Always passes — diagnostic output will appear in CI stderr to identify the exact macOS header construct causing "expected ';', found '__v'". Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Previous diag test used eprintln which CI swallowed. Now builds a diagnostic string and panics with it on macOS, ensuring the output appears in CI test failure logs. Shows lines 530-555 of preprocessed stdlib.h plus all lines containing __v. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add complete software-emulated 128-bit integer support across all
compiler layers. Int128 values live on the stack (16-byte aligned,
never in GP registers); operations load lo/hi 64-bit halves into
scratch registers, operate, and store back.
Type system: ExprKind::Int128Lit(i128), PseudoKind::Val128(i128),
Initializer::Int128(i128), Loc::Imm128(i128) on both backends.
ABI: Int128 classified as Direct{[Integer, Integer], 128} on both
SysV AMD64 and AAPCS64 (two GP registers for param/return).
Runtime library: int128_divmod() for __divti3/__modti3/__udivti3/
__umodti3, int128_convert() for 20 float<->int128 conversion
functions. Division/modulo and float conversions dispatch to rtlib
calls from the linearizer.
x86-64 backend: add/adc, sub/sbb, mulq+imulq cross products,
shld/shrd shifts (constant and variable with >=64 branching),
eq/ne via xor+or, ordered comparisons via hi-first branching,
neg (not+add+adc), not, zext/sext/trunc. New LIR instructions:
Adc, Sbb, Mul1, Shld, Shrd. Register allocator detects int128
pseudos by TypeKind (excluding comparison results and Load
addresses) and forces 16-byte aligned stack slots. Mul constrains
RAX/RDX as clobbered registers.
aarch64 backend: adds/adc, subs/sbc, mul/umulh/madd, branch-based
ordered comparisons (replacing incorrect ccmp approach), eq/ne via
eor+orr, negs/ngc, mvn, shifts with zero-amount short-circuit.
New LIR instructions: Adds, Adc, Subs, Sbc, Umulh, MAdd, Negs,
Ngc. CBR handles 128-bit stack values by ORing both halves.
Constant evaluation: eval_const_expr returns Option<i128> (was i64),
enabling full-precision compile-time arithmetic for __int128 types.
Parser folds (__int128)constant → Int128Lit at the AST level.
InstCombine: full 128-bit support via FoldToConst128(i128) variant,
const_val128(), and create_const128_pseudo(). All algebraic
identities (x-x, x*0, x+0, x*1, etc.) and constant folding work
correctly for 128-bit values.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
BlockItem::Statement(Stmt) was 256 bytes vs Declaration at 24 bytes. Box the Stmt to reduce enum size from 256 to 32 bytes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The aarch64 regalloc hardcoded alignment=8 for all local variables, violating __int128's natural 16-byte alignment. LDP/STP on macOS aarch64 enforces strict alignment, causing SIGBUS crashes. Fix: use types.alignment() for Sym locals (returns 16 for Int128), and force int128 pseudos to 16-byte aligned stack slots before the main linear scan (matching the x86-64 pattern). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Four bugs found by auditing cc/arch/ for ignored type system attributes:
1. x86-64 Sym alignment: replace size>=16?16:8 heuristic with types.alignment(typ), matching the aarch64 fix. Correctly handles __attribute__((aligned(N))) on typedefs.
2. store_args_to_stack: __int128 params use two GP registers per ABI but were treated as single-register. Now stores both halves and increments int_arg_idx by 2. Fixed in both x86-64 and aarch64.
3. emit_ret: __int128 return values now load both halves into RAX+RDX (x86-64). The linearizer didn't set two_reg_return for Int128, so only the lo half was returned.
4. aarch64 multi-reg return stack allocation: replaced open-coded stack_offset manipulation with alloc_stack_slot using types.alignment(typ) and types.size_bits(typ).
Tests: codegen_int128_param_return and _optimized cover int128 function params (including mixed with regular args) and return values.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three aarch64-specific fixes for macOS CI failures:
1. Variable shift amount: load_int128() was called for the shift amount (a regular int), causing LDP from an 8-byte slot. Now uses emit_move() for a 64-bit load of the shift operand.
2. allocate_arguments: Int128 params consume two GP registers per AAPCS64 but were assigned a single register. Now allocates a 16-byte aligned stack slot and reserves both GP registers.
3. emit_ret: Int128 return values need lo→X0, hi→X1 via LDP. Added an Int128 branch before the generic integer return path.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When a compile_and_run test fails, re-compile with -S and print the generated assembly to stderr. This makes aarch64-only failures debuggable from CI logs without needing access to the runner. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comparison results (cset → 0 or 1) were stored as 32-bit (str w__) but CBR loads them as 64-bit (ldr x__). The upper 32 bits contained stack garbage, causing wrong branch decisions when reused stack slots had non-zero upper halves. Fix: store comparison results as 64-bit. cset already zero-extends into the full 64-bit register, so str x__ is correct and matches what CBR expects. This was a pre-existing bug exposed by int128 tests which generate more stack-spilled comparison results. Affects both regular and int128 comparison paths. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
setup_register_args in call.rs treated __int128 args as single-register, loading only the lo half and advancing int_arg_idx by 1. This shifted all subsequent arguments to wrong registers. Fix: detect Int128 arg type, load both halves via LDP into two consecutive GP registers, advance int_arg_idx by 2. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
store_args_to_stack stored X1:X2 to the local variable's stack slot, but the IR then generates a Copy from the arg pseudo (at a different slot) to the local, overwriting the correct value with uninitialized data. Fix: store to the arg pseudo's allocated location so the IR's Copy correctly transfers the value. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fix four int128 codegen bugs and add comprehensive test coverage:
Bug 1: Ternary ?: with int128 operands used the Select instruction, which caps at 64 bits. Add a size<=64 guard to force the branch+phi path.
Bug 2: x86_64 stack-spilled int128 call args pushed only 8 bytes instead of 16. Add an int128 branch in push_stack_args, fix stack qword counting in classify_call_args, and copy incoming stack int128 params to local slots in the callee prologue (store_args_to_stack).
Bug 3: aarch64 stack-spilled int128 call args allocated 8 bytes instead of 16. Fix the allocation calculation and add an LDP+STR pair for 16-byte stack arg stores.
Bug 4: Global int128 loads on x86_64 double-dereferenced the GOT address (treating the int128 value as a pointer). Add a Loc::Global branch in emit_load that loads the address without dereferencing.
Add 11 new tests: ternary, many_args (stack spill), divmod, globals, ptr_deref, compound_assign, shift_boundaries, float_convert, struct_array, inc_dec, optimized_mega.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
emit_struct_store used ADD/LEA to compute the destination address from a Stack location, always treating it as a direct local variable address. When the destination is a spilled pointer (e.g., `*p = val` where p is on the stack), the code must load the pointer value from the slot first, not take the address of the slot itself. Add is_symbol check (matching compute_mem_addr) to distinguish local variables from spilled pointers. For spilled pointers, emit LDR/MOV to load the pointer value before using it as the store destination. Fixes codegen_int128_ptr_deref and codegen_int128_struct_array on aarch64-apple-darwin. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Ari4ka approved these changes on Mar 23, 2026.