CC cleanups and optimization#570

Merged
jgarzik merged 18 commits into main from updates
Mar 23, 2026

Conversation

Contributor

@jgarzik commented Mar 23, 2026

No description provided.

jgarzik and others added 13 commits March 22, 2026 22:57
Add cc/ir/hwmap.rs, a new pass running after linearize and before
optimize. It inspects each IR instruction and decides whether the
target handles it natively (Legal), via a runtime library call
(LibCall), or via a comparison-expansion pattern (CmpLibCall).

Phase 1: infrastructure — HwMapAction enum, TargetHwMap trait,
X86_64HwMap/Aarch64HwMap, wired into pipeline.

Phase 2a: int128 div/mod → LibCall (__divti3 etc.) on all targets.
Phase 2b: int128↔float conversions → LibCall (__floattisf etc.).
Phase 2c: long double ops on aarch64/Linux → LibCall/CmpLibCall
  (__addtf3, __negtf2, __lttf2+SetLt, __extendsftf2, etc.).
  x86_64 and macOS aarch64 remain Legal (native x87 / ld==double).

Removes ~350 lines of rtlib decision logic from the linearizer and
~380 lines of dead RtlibNames methods + tests from rtlib.rs.
30 unit tests in hwmap.rs; all 159 integration tests pass.
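As a rough sketch of the shape this commit describes — the names HwMapAction, TargetHwMap, X86_64HwMap, and __divti3/__modti3 come from the commit message itself, while the signatures and string-keyed dispatch are invented for illustration:

```rust
/// How the target handles a given IR instruction.
#[derive(Debug, PartialEq)]
enum HwMapAction {
    /// Native hardware support; leave the instruction alone.
    Legal,
    /// Replace the op with a runtime library call, e.g. __divti3.
    LibCall(&'static str),
    /// Call a comparison helper returning an int, then compare that
    /// result against zero (e.g. __lttf2 followed by SetLt).
    CmpLibCall(&'static str),
}

trait TargetHwMap {
    fn classify(&self, opcode: &str, is_int128: bool) -> HwMapAction;
}

struct X86_64HwMap;

impl TargetHwMap for X86_64HwMap {
    fn classify(&self, opcode: &str, is_int128: bool) -> HwMapAction {
        match (opcode, is_int128) {
            // No single-instruction 128-bit divide on x86-64.
            ("div", true) => HwMapAction::LibCall("__divti3"),
            ("mod", true) => HwMapAction::LibCall("__modti3"),
            _ => HwMapAction::Legal,
        }
    }
}
```

The optimize pass then never sees target-illegal instructions, and the backends only receive ops they can emit directly.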

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
--dump-ir now accepts an optional stage name:
  post-linearize, post-hwmap, post-opt (default), post-lower, all
Bare --dump-ir remains backward compatible (dumps post-opt).

--dump-ir-func=<name> filters output to a single function.

Also add codegen_float16_mega integration test covering arithmetic,
negation, comparisons, float/double conversions, and compound
assignment for _Float16 on x86-64.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
On x86-64, Float16 arithmetic/negation/comparisons have no native
hardware support. Previously the linearizer intercepted these and
emitted a promote-operate-truncate pattern inline. Now the linearizer
emits generic FAdd/FSub/FMul/FDiv/FNeg/FCmpO* with Float16 type, and
the hwmap pass detects these on x86-64 and expands them:

  - Arithmetic: __extendhfsf2(left), __extendhfsf2(right), native
    float op, __truncsfhf2(result)
  - Negation: __extendhfsf2(src), native FNeg, __truncsfhf2(result)
  - Comparisons: __extendhfsf2(left), __extendhfsf2(right), native
    float compare (no truncate — result is int)

AArch64 has native FP16 and remains Legal.

Removes ~230 lines from linearizer (5 helper methods + 3 intercepts).
Removes dead RtlibNames::float16_needs_softfloat().
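The three expansion patterns can be sketched against a toy string-based IR — the helper names __extendhfsf2/__truncsfhf2 are from the commit message, but the IR syntax and function are invented for illustration:

```rust
// Expand one Float16 binary op into the extend / operate / (truncate)
// sequence described above. Comparisons skip the truncation because
// their result is an int, not a Float16.
fn expand_f16_binop(op: &str, lhs: &str, rhs: &str, is_compare: bool) -> Vec<String> {
    let mut seq = vec![
        format!("t0 = call __extendhfsf2({lhs})"), // promote lhs to f32
        format!("t1 = call __extendhfsf2({rhs})"), // promote rhs to f32
        format!("t2 = {op}.f32 t0, t1"),           // native f32 operation
    ];
    if !is_compare {
        seq.push(String::from("t3 = call __truncsfhf2(t2)")); // demote result
    }
    seq
}
```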

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
complex_mul_name() and complex_div_name() centralize the
target-dependent function name selection (__mulsc3/__muldc3/__mulxc3/
__multc3 etc.) in hwmap.rs. The linearizer calls these instead of
RtlibNames methods.

Removes complex_mul(), complex_div(), longdouble_is_double() from
RtlibNames and their tests. Adds 6 unit tests in hwmap.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New opcodes: Lo64, Hi64, Pair64, AddC, AdcC, SubC, SbcC, UMulHi.
These enable shared int128 decomposition in hwmap.rs instead of
duplicated per-backend code.

hwmap now expands int128 operations into 64-bit sequences:
  - Bitwise (And/Or/Xor): Lo64+Hi64 each operand, 64-bit op, Pair64
  - Neg: SubC(0,lo), SbcC(0,hi,carry), Pair64
  - Not: Lo64+Hi64, Not each, Pair64
  - Add: AddC(lo,lo), AdcC(hi,hi,carry), Pair64
  - Sub: SubC(lo,lo), SbcC(hi,hi,borrow), Pair64
  - Mul: cross-product via Mul+UMulHi+Add
  - Eq/Ne: xor+or reduction, then 64-bit compare
  - Ordered comparisons: hi compare + Select(hi_eq, lo_cmp, hi_cmp)
  - Zext/Sext: Pair64 with zero/sign-extended halves

Backend support for all 8 opcodes on x86_64 and aarch64.
Fix x86_64 int128_pseudos to not mark Lo64/Hi64/Pair64 operands
incorrectly (caused Sext misrouting to emit_int128_extend).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Conversion opcodes (Sext, Zext, Trunc, FCvtS, FCvtU, SCvtF, UCvtF,
FCvtF) now display as e.g. sext.32to64 instead of sext.64, making
the source width visible in --dump-ir output.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Now that hwmap expands int128 add/sub/mul/bitwise/neg/not/comparisons/
zext/sext into 64-bit sequences using Lo64/Hi64/Pair64/AddC/etc.,
the per-backend implementations are dead code.

x86_64: removed emit_int128_mul (~120 lines), emit_int128_div (~50),
  emit_int128_compare (~145), emit_int128_unary (~55), Zext/Sext-to-128
  in emit_int128_extend (~75), Add/Sub/And/Or/Xor in emit_int128_binop
  (~100), int128_src2_lo/hi_operand helpers (~50). Total: ~645 lines.

aarch64: removed Add/Sub/And/Or/Xor/Mul from emit_int128_binop (~140),
  emit_int128_div (~50), emit_int128_compare (~120), emit_int128_unary
  (~50), Zext/Sext-to-128 in emit_extend (~75). Removed dead LIR
  variants MAdd/Negs/Ngc (~60). Total: ~550 lines.

Only int128 shifts (Shl/Lsr/Asr) remain in the backends — these
require arch-specific branching (SHLD/SHRD vs LSL+LSR+ORR).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fixes from self-review:

1. expand_insn: explicit dispatch instead of fallthrough to
   expand_float16. Panics on unhandled Expand with context.

2. map_int128_expand: comparison arms now verify insn.typ is
   Int128, preventing misrouting of 128-bit long double comparisons
   if map_common ordering ever changes.

3. Remove 7 duplicate func.add_pseudo calls after alloc_reg64
   (which already registers the pseudo).

4. Factor out extract_halves() helper — replaces ~120 lines of
   repeated Lo64+Hi64 pairs across 8 expansion arms.

5. Simplify Float16 comparison detection — remove redundant inner
   matches!() and dead `let _ = typ`.

6. Validate --dump-ir stage names early with helpful error message.

7. Add codegen_int128_carry_chain_optimized integration test that
   exercises AddC→AdcC carry propagation, SubC→SbcC borrow, and
   cross-boundary multiply under -O2. Confirms optimizer does not
   break the flag-chain invariant.

8. Add explanatory comments on int128_load_lo/hi Loc::Reg handling.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move arch-specific instruction mapping logic from the monolithic
cc/ir/hwmap.rs into cc/arch/{x86_64,aarch64}/mapping.rs, with shared
trait and helpers in cc/arch/mapping.rs. This follows the existing
cc/arch/codegen.rs pattern of shared trait + per-arch implementations.

Key design changes:
- Replace HwMapAction enum + shared dispatcher with ArchMapper trait
  where arch code directly builds replacement IR via MappedInsn::Replace
- Split expand_int128 monolith into 10 individual functions
- Split expand_float16 into 3 individual functions
- Float16 soft-float handling stays in x86_64 mapper only
- Long double rtlib handling stays in aarch64 mapper only
- Shared test helpers extracted into test_helpers module

Pure refactoring — zero behavioral changes. All 161 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add Function::create_reg_pseudo() and Instruction::call_with_abi() as
canonical IR primitives for passes that synthesize instructions. These
replace duplicated patterns across mapping.rs (alloc_reg64), lower.rs
(open-coded pseudo alloc), and the 6+ copies of the ABI-classify-and-
build-call sequence (build_rtlib_call, build_rtlib_call_explicit,
RtlibCallParams — all deleted).

Migrate int128 constant shifts (Shl/Lsr/Asr) from backend codegen to
the mapping pass as expand_int128_const_shl/lsr/asr. Variable shifts
remain in backends (require arch-specific branching).

Migrate Float16 conversions from linearizer (emit_float16_convert_call)
to the x86_64 mapping pass. The linearizer now emits standard FCvtF/
FCvtS/FCvtU/SCvtF/UCvtF ops; the mapper lowers Float16 variants to
rtlib calls with proper compiler-rt vs libgcc ABI dispatch.

Fix uint128 large constant sign-extension: (*v >> 64) as i64 sign-
extends for values like 0xFFFFFFFFFFFFFFFF; change to as u64 as i64
in three x86_64 locations.
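A hypothetical reconstruction of the pitfall (not the compiler's actual code): once the low half of an unsigned constant sits in a signed 64-bit container, widening through a signed type sign-fills the upper bits, while routing through u64 zero-fills them:

```rust
// Buggy shape: the signed widening drags the sign bit into the hi half.
fn hi_half_buggy(lo: i64) -> i64 {
    ((lo as i128) >> 64) as i64
}

// Fixed shape: go through u64 first so the widening zero-extends.
fn hi_half_fixed(lo: i64) -> i64 {
    ((lo as u64 as i128) >> 64) as i64
}
```

For a constant like 0xFFFFFFFFFFFFFFFF the low half is -1 when viewed signed, so the buggy form reports a hi half of all-ones instead of zero.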

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The mapping pass now expands all int128 binops except variable shifts
(Shl/Lsr/Asr). Replace the silent `_ => {}` catch-all in
emit_int128_binop with a panic so any unexpanded opcode reaching the
backend is caught immediately instead of silently dropped.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…p table

Add QUOTE and COMMENT flags to the character classification, build the
table via const fn at compile time, and use pre-computed class bits in
get_special() to dispatch string literals and comments. Includes unit
tests covering every byte category.
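A minimal sketch of such a table — QUOTE and COMMENT are named in the commit message, while IDENT and the table layout are assumptions added for illustration:

```rust
// Character-class bit flags.
const QUOTE: u8 = 1 << 0;   // '"' and '\''
const COMMENT: u8 = 1 << 1; // '/' may start "//" or "/* */"
const IDENT: u8 = 1 << 2;   // identifier start: [A-Za-z_]

// Built once at compile time; no runtime match on each byte.
const fn build_class_table() -> [u8; 256] {
    let mut t = [0u8; 256];
    t[b'"' as usize] |= QUOTE;
    t[b'\'' as usize] |= QUOTE;
    t[b'/' as usize] |= COMMENT;
    t[b'_' as usize] |= IDENT;
    let mut c = b'a';
    while c <= b'z' {
        t[c as usize] |= IDENT;
        t[(c - b'a' + b'A') as usize] |= IDENT; // uppercase twin
        c += 1;
    }
    t
}

const CLASS: [u8; 256] = build_class_table();

fn is_quote(b: u8) -> bool { CLASS[b as usize] & QUOTE != 0 }
fn starts_comment(b: u8) -> bool { CLASS[b as usize] & COMMENT != 0 }
```

Dispatch in the hot path then becomes a single indexed load plus a bit test instead of a multi-arm match.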

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pre-intern ~240 well-known strings (C keywords, builtins, attribute names,
preprocessor directives) at StringTable creation time. Each gets a
deterministic StringId and a u32 tag bitmask, replacing string comparisons
with integer comparisons in all hot parser/preprocessor paths.

New file cc/kw.rs provides: define_keywords! macro, 14 tag bit constants,
DECL_START composite, has_tag()/tags() query API, and 11 unit tests.

Converted dispatch sites: is_declaration_start (43 arms), parse_type_specifiers
(35 arms), parse_statement (12 arms), try_parse_type_name (25 arms),
consume_type_qualifiers (5 arms), builtin dispatch (60 arms),
handle_directive (14 arms), is_type_keyword, is_nullability_qualifier,
is_attribute_keyword, is_asm_keyword, is_static_assert, is_builtin,
is_supported_attribute, sizeof/alignof, pointer/array qualifier parsing
(4 sites), and parse_asm_statement qualifiers.
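The tag scheme can be sketched as follows — the constant names loosely follow the commit message, but the table contents and the index-as-StringId convention are assumptions for illustration:

```rust
// Hypothetical tag bits; the real kw.rs defines 14 of these.
const TAG_TYPE: u32 = 1 << 0;      // int, char, ...
const TAG_QUALIFIER: u32 = 1 << 1; // const, volatile, ...
const TAG_STORAGE: u32 = 1 << 2;   // static, extern, ...
const DECL_START: u32 = TAG_TYPE | TAG_QUALIFIER | TAG_STORAGE;

// Pre-interned in declaration order, so a keyword's StringId (its index
// here) is deterministic across runs.
const KEYWORDS: &[(&str, u32)] = &[
    ("int", TAG_TYPE),
    ("const", TAG_QUALIFIER),
    ("static", TAG_STORAGE),
    ("return", 0),
];

// Membership test is one index plus one bitwise AND — no string compare.
fn has_tag(id: usize, tag: u32) -> bool {
    KEYWORDS.get(id).map_or(false, |&(_, t)| t & tag != 0)
}
```

A 43-arm `is_declaration_start` match collapses to `has_tag(id, DECL_START)` under this scheme.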

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jgarzik jgarzik requested a review from Copilot March 23, 2026 14:12
@jgarzik jgarzik self-assigned this Mar 23, 2026
@jgarzik jgarzik added the enhancement New feature or request label Mar 23, 2026
@jgarzik jgarzik changed the title from "Updates" to "CC cleanups and optimization" Mar 23, 2026
Copilot AI left a comment

Pull request overview

This PR refactors the cc (pcc) compiler to replace many string-based keyword/builtin/directive checks with deterministic StringId/tag lookups, and centralizes target-specific lowering decisions into a new “mapping” pass that expands complex IR ops into simpler sequences or rtlib calls.

Changes:

  • Add a pre-interned keyword system (cc/kw.rs) with tag-based membership checks, and update parser/preprocessor/builtin checks to use StringId comparisons instead of &str.
  • Introduce an architecture mapping pass (shared + per-arch mappers) to lower int128 ops, Float16 soft-float on x86-64, and long double rtlib calls on aarch64/Linux.
  • Enhance developer tooling and tests: staged --dump-ir, lexer char-classification table + tests, and expanded codegen/IR tests for Float16 and int128 behaviors.

Reviewed changes

Copilot reviewed 29 out of 29 changed files in this pull request and generated 3 comments.

Summary per file:
cc/token/preprocess.rs Switch directive dispatch and __has_* evaluation to StringId/tag-based checks.
cc/token/lexer.rs Replace match-based char classification with a compile-time lookup table; update tokenization and add unit tests.
cc/tests/codegen/misc.rs Add large codegen regression tests for Float16 and int128 carry/shift/constant behaviors.
cc/strings.rs Pre-intern all keyword strings at startup to ensure deterministic StringId assignments.
cc/rtlib.rs Remove complex/longdouble/int128 rtlib name selection helpers and their tests (logic moved elsewhere).
cc/parse/parser.rs Replace many keyword string comparisons with crate::kw StringId constants/tags.
cc/parse/mod.rs Change nullability qualifier predicate to operate on StringId via tags.
cc/main.rs Add kw module, recursion limit, staged --dump-ir with optional function filter, and integrate mapping pass in pipeline.
cc/lib.rs Export kw module and set recursion limit for macro-heavy keyword generation.
cc/kw.rs New keyword/tag system: deterministic IDs + bitmask tags + tests.
cc/ir/test_linearize.rs Update Float16 conversion tests to expect IR conversion opcodes (mapping pass performs rtlib lowering).
cc/ir/mod.rs Add new opcodes for int128 decomposition, add Instruction::call_with_abi, improve instruction display for conversions, add Function::create_reg_pseudo.
cc/ir/lower.rs Use Function::create_reg_pseudo helper when creating temporaries.
cc/ir/linearize.rs Remove direct rtlib emission/Float16 soft-float helpers and rely on mapping pass; route complex rtlib name selection through mapping utilities.
cc/builtins.rs Add is_builtin_id fast-path using keyword tags.
cc/arch/x86_64/regalloc.rs Adjust int128 pseudo tracking for new decomposition opcodes.
cc/arch/x86_64/mod.rs Expose x86-64 mapping module.
cc/arch/x86_64/mapping.rs New x86-64 mapper: expand int128 ops, map int128 div/mod + int128↔float, and expand Float16 soft-float sequences.
cc/arch/x86_64/expression.rs Remove backend-handled int128 ops now expanded by mapping pass; add support for new decomposition ops and fix hi-half constant handling.
cc/arch/x86_64/codegen.rs Dispatch new decomposition opcodes (Lo64/Hi64/Pair64/AddC/AdcC/SubC/SbcC/UMulHi).
cc/arch/mod.rs Export the new shared mapping module.
cc/arch/codegen.rs Fix hi-half emission of 128-bit immediates to avoid sign-extension artifacts.
cc/arch/aarch64/mod.rs Expose aarch64 mapping module.
cc/arch/aarch64/mapping.rs New aarch64 mapper: expand int128 ops, map int128 div/mod + int128↔float, and lower long double ops via rtlib on Linux.
cc/arch/aarch64/lir.rs Remove AArch64 LIR ops that were used by backend int128 handling (now mapping-expanded).
cc/arch/aarch64/expression.rs Limit backend int128 handling to shifts + truncation-from-128; add support for new decomposition ops.
cc/arch/aarch64/codegen.rs Dispatch new decomposition opcodes (Lo64/Hi64/Pair64/AddC/AdcC/SubC/SbcC/UMulHi).

jgarzik and others added 5 commits March 23, 2026 14:23
Use `_` as the name for keyword entries that are only needed for interning
and tag-based membership (nullability qualifiers, attribute names, fortified
builtins, etc.). The define_ids! macro skips pub const emission for `_`
entries, eliminating 71 dead_code warnings without #[allow(dead_code)].

Also fix redundant_closure clippy lint in is_nullability_qualifier call,
and remove unused tags() function.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Keyword ID determinism is a correctness requirement — wrong IDs cause
silent misparsing in release builds. Use unconditional assert_eq!.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extract first_tok early and return false if args[0] is empty, avoiding
unwrap() panic on __has_builtin()/__has_attribute() with no tokens.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…sync

Cross-checks that every entry in the SUPPORTED_BUILTINS string list is
pre-interned in kw.rs with the BUILTIN tag, preventing the two sources
from silently diverging.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
emit_float_to_float() only recognized Float↔Double conversions in its
needs_convert check, causing Float16↔Float/Double conversions to emit
fmov (bit-copy) instead of fcvt (type conversion). The float16 bit
pattern was zero-extended to 32 bits rather than properly converted,
producing wrong values for all float16 arithmetic on aarch64.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jgarzik jgarzik merged commit aecc388 into main Mar 23, 2026
6 checks passed
@jgarzik jgarzik deleted the updates branch March 23, 2026 14:59
