Skip to content

Conversation

RobinMalfait
Copy link
Member

@RobinMalfait RobinMalfait commented Oct 6, 2025

This PR introduces a new canonicalizeCandidates function on the internal Design System.

The big motivation to moving this to the core tailwindcss package is that we can use this in various places:

  • The Raycast extension
  • The VS Code extension / language server
  • 3rd party tools that use the Tailwind CSS design system APIs

This PR looks very big, but I think it's best to go over the changes commit by commit. Basically all of these steps already existed in the upgrade tool, but are now moved to our core tailwindcss package.

Here is a list of all the changes:

  • Added a new canonicalizeCandidates function to the design system
  • Moved various migration steps to the core package. I inlined them in the same file and because of that I noticed a specific pattern (more on this later).
  • Moved printCandidate tests to the tailwindcss package
  • Setup tests for canonicalizeCandidates based on the existing tests in the upgrade tool.

I noticed that all the migrations followed a specific pattern:

  1. Parse the raw candidate into a Candidate[] AST
  2. In a loop, try to migrate the Candidate to a new Candidate (this often handled both the Candidate and its Variant[])
  3. If something changed, print the new Candidate back to a string, and pass it to the next migration step.

While this makes sense in isolation, we are doing a lot of repeated work by parsing, modifying, and printing the candidate multiple times. This let me to introduce the big refactor commit. This changes the steps to:

  1. Up front, parse the raw candidate into a Candidate[] once.
  2. Strip the variants and the important marker from the candidate. This means that each migration step only has to deal with the base utility and not care about the variants or the important marker. We can re-attach these afterwards.
  3. Instead of a rawCandidate: string, each migration step receives an actual Candidate object (or a Variant object).
  4. I also split up the migration steps for the Candidate and the Variant[].

All of this means that there is a lot less work that needs to be done. We can also cache results between migrations. So [@media_print]:flex and [@media_print]:block will result in print:flex and print:block respectively, but the [@media_print] part is only migrated once across both candidates.

One migration step relied on the postcss-selector-parser package to parse selectors and attribute selectors. I didn't want to introduce a package just for this, so instead used our own SelectorParser in the migration and wrote a small AttributeSelectorParser that can parse the attribute selector into a little data structure we can work with instead.

If we want, we can split this PR up into smaller pieces, but since the biggest chunk is moving existing code around, I think it's fairly doable to review as long as you go commit by commit.


With this new API, we can turn:

[
  'bg-red-500',
  'hover:bg-red-500',
  '[@media_print]:bg-red-500',
  'hover:[@media_print]:bg-red-500',
  'bg-red-500/100',
  'hover:bg-red-500/100',
  '[@media_print]:bg-red-500/100',
  'hover:[@media_print]:bg-red-500/100',
  'bg-[var(--color-red-500)]',
  'hover:bg-[var(--color-red-500)]',
  '[@media_print]:bg-[var(--color-red-500)]',
  'hover:[@media_print]:bg-[var(--color-red-500)]',
  'bg-[var(--color-red-500)]/100',
  'hover:bg-[var(--color-red-500)]/100',
  '[@media_print]:bg-[var(--color-red-500)]/100',
  'hover:[@media_print]:bg-[var(--color-red-500)]/100',
  'bg-(--color-red-500)',
  'hover:bg-(--color-red-500)',
  '[@media_print]:bg-(--color-red-500)',
  'hover:[@media_print]:bg-(--color-red-500)',
  'bg-(--color-red-500)/100',
  'hover:bg-(--color-red-500)/100',
  '[@media_print]:bg-(--color-red-500)/100',
  'hover:[@media_print]:bg-(--color-red-500)/100',
  'bg-[color:var(--color-red-500)]',
  'hover:bg-[color:var(--color-red-500)]',
  '[@media_print]:bg-[color:var(--color-red-500)]',
  'hover:[@media_print]:bg-[color:var(--color-red-500)]',
  'bg-[color:var(--color-red-500)]/100',
  'hover:bg-[color:var(--color-red-500)]/100',
  '[@media_print]:bg-[color:var(--color-red-500)]/100',
  'hover:[@media_print]:bg-[color:var(--color-red-500)]/100',
  'bg-(color:--color-red-500)',
  'hover:bg-(color:--color-red-500)',
  '[@media_print]:bg-(color:--color-red-500)',
  'hover:[@media_print]:bg-(color:--color-red-500)',
  'bg-(color:--color-red-500)/100',
  'hover:bg-(color:--color-red-500)/100',
  '[@media_print]:bg-(color:--color-red-500)/100',
  'hover:[@media_print]:bg-(color:--color-red-500)/100',
  '[background-color:var(--color-red-500)]',
  'hover:[background-color:var(--color-red-500)]',
  '[@media_print]:[background-color:var(--color-red-500)]',
  'hover:[@media_print]:[background-color:var(--color-red-500)]',
  '[background-color:var(--color-red-500)]/100',
  'hover:[background-color:var(--color-red-500)]/100',
  '[@media_print]:[background-color:var(--color-red-500)]/100',
  'hover:[@media_print]:[background-color:var(--color-red-500)]/100'
]

Into their canonicalized form:

[
  'bg-red-500',
  'hover:bg-red-500',
  'print:bg-red-500',
  'hover:print:bg-red-500'
]

The list is also unique, so we won't end up with bg-red-500 bg-red-500 twice.

While the canonicalization itself is fairly fast, we still pay a ~1s startup cost for some migrations (once, and cached for the entire lifetime of the design system). I would like to keep improving the performance and the kinds of migrations we do, but I think this is a good start.

The cost we pay is for:

  1. Generating a full list of all possible utilities based on the getClassList suggestions API.
  2. Generating a full list of all possible variants.

The canonicalization step for this list takes ~2.9ms on my machine.

Just for fun, if you use the getClassList API for intellisense that generates all the suggestions and their modifiers, you get a list of 263788 classes. If you canonicalize all of these, it takes ~500ms in total. So roughly ~1.9μs per candidate.

This new API doesn't result in a performance difference for normal Tailwind CSS builds.

The other potential concern is file size of the package. The generated tailwindcss.tgz file changed like this:

- 684652 bytes (684.65 kB)
+ 749169 bytes (749.17 kB)

So the package increased by ~65 kB which I don't think is the end of the world, but it is important for the @tailwindcss/browser build which we don't want to grow unnecessarily. For this reason we remove some of the code for the design system conditionally such that you don't pay this cost in an environment where you will never need this API.

The @tailwindcss/browser build looks like this:

`dist/index.global.js 255.14 KB` (main)
`dist/index.global.js 272.61 KB` (before this change)
`dist/index.global.js 252.83 KB` (after this change, even smaller than on `main`)

@RobinMalfait RobinMalfait force-pushed the feat/canonicalize-candidates branch from 94e1b8e to 6a59311 Compare October 6, 2025 12:46
@philipp-spiess
Copy link
Member

I feel like this would better be present in another package or a separate and completely isolated file, not the core one. Specifically since core is part of the browser build which is pretty sensitive to code size and this is moving a bunch of code into it now. Have you thought about this?

This PR also adds cloneAstNode which should probably be in a separate PR with its own description.

@philipp-spiess
Copy link
Member

philipp-spiess commented Oct 6, 2025

I see that you mentioned file size, how the hell does core have more than 3MB 🤔 That seems very unexpected.

I think something with your measurement is wrong: https://bundlephobia.com/package/@tailwindcss/[email protected]

// Parse a dimension such as `64rem` into `[64, 'rem']`.
export const dimensions = new DefaultMap((input) => {
let match = /^(?<value>-?(?:\d*\.)?\d+)(?<unit>[a-z]+|%)$/i.exec(input)
let match = DIMENSION_REGEX.exec(input)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only reason I touched this is because I noticed it before I wanted to move it from the upgrade package to core.

_userConfig: Config | null,
rawCandidate: string,
): Promise<string> {
): string {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was annotating migrations to know whether it was a v3 → v4 upgrade, or if it was a v4 optimization. Noticed that this was using async which is not necessary in this case.


function themeToVar(designSystem: DesignSystem, rawCandidate: string): string {
let convert = createConverter(designSystem)
let convert = converterCache.get(designSystem)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you look at this commit, use the ?w=1 GitHub flag. The only real changes here are:

  1. Re-indent because of the wrapping DefaultMap
  2. The removal of the prettyPrint flag, that was only used in the CSS part (which still exists in the CSS version where whitespace is fine.

}

function pathToVariableName(path: string) {
function pathToVariableName(path: string, shouldPrefix = true) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was actually a bug we shipped and only noticed it because the canonicalization tests run all examples in multiple catagories:

  1. The candidate on its own
  2. The candidate with a variant in front
  3. The candidate with an important ! at the end
  4. The candidate with a tw: prefix in front

Then I noticed that this broke a test. Without this, the following candidate would be using --tw- as well:

- tw:bg-[--theme(--color-red-500/50%)]/50
+ tw:bg-[--theme(--tw-color-red-500/50%)]/50

The problem is that the --theme(…) does take the value without the --tw- flag: https://play.tailwindcss.com/GMFKMFGLFz

You could argue that this is a --theme(…) bug and that we should handle it there. But this is the behavior of --theme(…) today, so didn't want to break it.

@@ -1,13 +1,13 @@
import { substituteAtApply } from '../../../../tailwindcss/src/apply'
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was kind of a smell that all these imports in the upgrade package were using tailwindcss internals 😅

return candidate
}

candidate.root = `bg-linear-to-${direction}`
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already cloned the candidate when it came out of the design system, so we can mutate it freely without causing issues.

}
}

function baseCandidate<T extends Candidate>(candidate: T) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before running the canonicalization steps, we already drop the important flag and variants so each step is already dealing with the base candidate (utility).

Comment on lines 849 to 859
for (let readonlyCandidate of designSystem.parseCandidate(rawCandidate)) {
// The below logic makes use of mutation. Since candidates in the
// DesignSystem are cached, we can't mutate them directly.
let candidate = structuredClone(readonlyCandidate) as Writable<typeof readonlyCandidate>

// Create a basic stripped candidate without variants or important flag. We
// will re-add those later but they are irrelevant for what we are trying to
// do here (and will increase cache hits because we only have to deal with
// the base utility, nothing more).
let targetCandidate = baseCandidate(candidate)
let targetCandidateString = printUnprefixedCandidate(designSystem, targetCandidate)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar here, we already are dealing with the base candidate so we can just drop all of this entirely.


for (let candidate of parseCandidate(designSystem, rawCandidate)) {
let clone = structuredClone(candidate) as Writable<typeof candidate>
let changed = false
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We were tracking whether something changed or not, and if it did we would re-print the Candidate AST, otherwise we would just re-use the incoming rawCandidate.

Now, we are directly mutating the incoming candidate: Candidate which we will return as well, so this code can be dropped.

Comment on lines +14 to +15
const ATTRIBUTE_REGEX =
/\[\s*(?<attribute>[a-zA-Z_-][a-zA-Z0-9_-]*)\s*((?<operator>[*|~^$]?=)\s*(?<quote>['"])?\s*(?<value>.*?)\4\s*(?<sensitivity>[is])?\s*)?\]/
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one isn't even 100% correct, but it's slower already so didn't want to make it more correct 😅

@RobinMalfait RobinMalfait marked this pull request as ready for review October 6, 2025 13:23
@RobinMalfait RobinMalfait requested a review from a team as a code owner October 6, 2025 13:23
@RobinMalfait RobinMalfait marked this pull request as draft October 6, 2025 13:29
@thecrypticace
Copy link
Contributor

Yeah core isn't 3.5MB right now. It's about 680KB (which is still a lot but that's because CJS + ESM duplication).

The build process doesn't clear the dist folder so it probably got a bunch of chunks from previous builds bundled into the .tgz folder on Robin's local machine.

@RobinMalfait RobinMalfait force-pushed the feat/canonicalize-candidates branch from 6a59311 to ccbd48d Compare October 6, 2025 15:00
While going through the migrations, I noticed that this was still marked
as async even though nothing is async about this. Simplified it to a
simple sync function instead.
This is likely temporary, but needed to know which migrations were sync
or async, and which migrations are an _actual_ migration from v3 → v4 or
which ones are mere optimizations in v4 land.

The v4 optimizations will be moved to the `canonicalizeCandidates` logic
straight on the design system.
After all migrations happened, we run the canonicalization step to
perform the last bit of migrations
We will likely clean this up later. But just trying to move things
aroudn first.
If you use `canonicalizeCandidates(['bg-red-500', 'bg-red-500/100'])`
they would both canonicalize to `bg-red-500`. No need to return
duplicates here.
Note: this commit on its own _will_ have some failing tests because the
`migrate-arbitrary-utilities.test.ts` file combined some other
migrations as well.

Eventually once those are ported over, will make the tests pass.

Another thing to notice is that we updated some tests. The more we
canonicalize candidates, the more tests will need upgrading to an even
more canonical form.
This test setup already tests various cases including prefixes and
important strategies.
We were also converting the current charcode back to a string, but we
already had that information in `input[i]`
The part where we upgrade old v3 syntax like `group-[]:` still exists in
the `@tailwindcss/upgrade` package.

We were relying on `postcss-selector-parser`, but the copied part is now
using our own `SelectorParser`.

I also added a temporary attribute selector parser to make everything
pass. Don't look at the code for that _too_ much because we will swap it
out in a later commit 😅
A bunch of migrations were moved to `canonicalizeCandidates`, so we
still have to run that at the end.

Also used:
```css
@theme {
  --*: initial;
}
```

To make the design system a bit faster.
Note: we made sure that all the tests passed before this commit, and we
also made sure to not touch any of the tests as part of this commit
ensuring that everything still works as expected. Aka, an actual
refactor without behavior change.

This is a big commit and a big refactor but that's mainly because there
are some re-indents that are happening. Will maybe split up this commit
but then it won't be an atomic (working) commit anymore.

One thing I noticed is that all our "migrations" essentially parsed a
raw candidate, did some work and then stringified the candidate AST
again.

That's one of the reasons I inlined the migrations in previous commits
in a single file. Looking at the file now, it's a little bit silly that
we parse and print over and over again.

Parsing a raw candidate also results in an an array of `Candidate` AST
nodes, so we always had a loop going on.

Another thing I noticed is that often we need to work with a "base"
candidate, that's essentially the candidate without the variants and
without the important flag.

That's where this refactor comes in:

1. We parse each candidate _once_
2. When there are variants, we canonicalize each variant separately and
   store the results in a separate cache.
3. When there are variants _or_ the important flag is present, we
   canonicalize the base utility separately and later re-attach the
   variants and important flag at the end. This means that not a single
   canonicalization step has to deal with variants or important flag.
4. Since we only want to parse the candidate once on the outside, we
   will pass a cloned `Candidate` to each canonicalize step to prevent
   parsing and printing over and over again.
5. Only at the end will we re-print the `Candidate`

This process has a few benefits:

1. We are handling candidates and variants separately and caching them
   separately. This means that we can increase cache hits because
   variants from candidate A and candidate B can be re-used. E.g.:
   `[@media_print]:flex` and `[@media_print]:underline` will map to
   `print:flex` and `print:underline` where `[@media_print]` is only
   handled once.
2. This also means that we could simplify some canonicalization steps
   because they don't have to worry about variants and the important
   flag.
3. There is no parsing & looping over the parsed candidates array going
   on.
The the scenario of creating signatures for a component, we always want
to respect the important flag if it is given. Therefore I think that we
can drop the `designSystem.compileAstNodes(candidate,
CompileAstFlags.None)` which saves us some time.

Also removed the `structuredClone` call. Whenever we handle `@apply`
which essentially happened the step before, we already start from a
cloned node internally. This call just meant we were doing it again.
This implementation is also 2-3x faster than the original one.

```
AttributeSelectorParser.parse - src/attribute-selector-parser.bench.ts > parsing
  2.24x faster than REGEX.test(…)
  2.59x faster than ….match(REGEX)
  2.84x faster than parseAttributeSelector
```
@RobinMalfait RobinMalfait force-pushed the feat/canonicalize-candidates branch from ccbd48d to 226ae47 Compare October 6, 2025 15:49
@RobinMalfait RobinMalfait marked this pull request as ready for review October 6, 2025 17:02
@RobinMalfait
Copy link
Member Author

Yeah core isn't 3.5MB right now. It's about 680KB (which is still a lot but that's because CJS + ESM duplication).

The build process doesn't clear the dist folder so it probably got a bunch of chunks from previous builds bundled into the .tgz folder on Robin's local machine.

Yep, that was the reason why! I'm not that stupid and I did clear them manually before. But I'm stupid enough not to realise that Turborepo just put them all back when running a build as if nothing ever happened. 😅

ast.push(node)
}
}
buffer = String.fromCharCode(currentChar)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking we did this in purpose for perf reasons but it might not actually make a difference compared to everything else.

}

// Update the candidate with the new value
replaceObject(candidate, replacementCandidate)
Copy link
Contributor

@thecrypticace thecrypticace Oct 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think all the uses of replaceObject(candidate, replacementCandidate); return replacementCandidate can be just return replacementCandidate right?

It'd be cool if we could get rid of replaceObject entirely for variants too but that's probably not realistic.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You know... very likely. Let me test that out!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to merge this with the main selector parser or nah?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't because we don't need it anywhere else. But we can make it part of the SelectorParser and introduce a new attribute kind 🤔

I think the postcss-selector-parser actually has a selector kind that is just a group, each node is class, id, attribute etc.

The only other spot in core where we use the selector parser is inside of the compat plugin API

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm kinda indifferent — it just feels odd to have a "selector parser" and a separate "attribute selector parser". Reads like the selector parser doesn't handle attributes at all. (it does just doesn't split out their "parts").

Maybe @philipp-spiess can weigh in on what he thinks.

Copy link
Contributor

@thecrypticace thecrypticace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall things look good 💯

Massive PR but will be awesome to have this stuff in core. If we can speed up the initialization step enough I'll be able to make this a "default on" diagnostic for intellisense which would be pretty cool.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants