Commit 96a02d5

wip
1 parent a5b5d4e commit 96a02d5

5 files changed, +203 -103 lines changed

packages/route-pattern/README.md

Lines changed: 39 additions & 103 deletions
@@ -14,8 +14,8 @@ It does not discuss the algorithms nor data structures used by the matching engi
 ## Non-goals

 - Matching URL fragments (`#section`)
-- Matching the URL port (`:8080`)
-- Matching the URL credentials (`user:pass@`)
+- Matching URL port (`:8080`)
+- Matching URL credentials (`user:pass@`)
 - Caching
 - Request/Response handling

@@ -64,7 +64,7 @@ You can use any combination of these to create a route pattern, for example:
 // ...and so on...
 ```

-**Delimiters:** Route patterns use the first occurrences of `://`, `/`, and `?` as delimiters to split a route pattern into its parts.
+**Part delimiters:** Route patterns use the first occurrences of `://`, `/`, and `?` as delimiters to split a route pattern into its parts.
 Pathname-only route patterns are the most common, so route patterns are assumed to be pathname-only unless `://` or `?` are present.
 As a result, hostnames must begin with `://` and searches must begin with `?` to distinguish both from pathnames.

@@ -90,107 +90,35 @@ However, omitting a pathname means "match the 'empty' pathname" (namely `""` and
 // ✗ doesn't match: https://api.example.com/users
 ```

-
 ## Pattern modifiers

-Before describing [wildcards](#wildcards), [params](#params), and [optionals](#optionals),
-its important to note that each pattern modifier applies only in the same part of the URL where it appears.
+Each pattern modifier — [param](#params), [glob](#globs), or [optional](#optionals) — applies only in the same part of the URL where it appears.
 As a result:

-- Wildcards and params do not match characters that appear outside of their part of the route pattern
+- Params and globs do not match characters that appear outside of their part of the route pattern
 - Optionals must begin and end within the same part of the route pattern

-### Wildcards
-
-| | protocol | hostname | pathname | search |
-| ---------- | -------- | -------- | -------- | ------ |
-| Supported? |||||
-
-Wildcards match dynamic parts of a URL.
-
-Route patterns support two types of wildcards:
-
-- `*` ("star") for matching anything _within_ a segment
-- `**` ("star star") for matching anything, even across multiple segments
-
-As a result, wildcards correspond to these regular expressions:
-
-| | `*` | `**` |
-| -------- | --------- | ------ |
-| hostname | `/[^.]*/` | `/.*/` |
-| pathname | `/[^/]*/` | `/.*/` |
-
-```ts
-'/files/*';
-// ✓ matches: /files/photo.jpg
-// ✗ doesn't match: /files/2023/photo.jpg
-
-'/docs/**';
-// ✓ matches: /docs/api/v1/intro.html
-// ✗ doesn't match: /docs (no trailing content)
-
-'://*.example.com';
-// ✓ matches: ://cdn.example.com
-// ✗ doesn't match: ://api.staging.example.com
-
-'://**.api.com';
-// ✓ matches: ://tenant.v1.api.com
-// ✗ doesn't match: ://api.com (no prefix)
-```
-
-Route patterns can have multiple wildcards, even within the same segment.
-
-```ts
-'/assets/**/static/**/*.css';
-// ✓ matches: /assets/v2/themes/static/dark/main.css
-// ✗ doesn't match: /assets/v2/themes/static/main.js
-
-'://us-**.cdn.com/cars/*-*';
-// ✓ matches: ://us-east.staging.cdn.com/cars/audi-a4.jpg
-// ✗ doesn't match: ://us-east.staging.cdn.com/cars/toyota.jpg
-```
-
-Wildcards only match characters within the same part of the URL:
-
-```ts
-'://api.**/users';
-// ✓ matches: ://api.example.com/users
-// ✗ doesn't match: ://api.example.com/123/users
-```
-
 ### Params

 | | protocol | hostname | pathname | search |
 | ---------- | -------- | -------- | -------- | ------ |
 | Supported? |||||

-Params, like wildcards, match dynamic parts of the URL but they also give you access to the matched values.
-
-A param is written as:
-
-- `:` followed by a name for capturing anything within a segment (similar to `*`)
-- `::` followed by a name for capturing anything, even across multiple segments (similar to `**`)
-
-**Note:** Param names must be [JavaScript identifiers](#javascript-identifier).
+Params match dynamic parts of a URL within a segment.

-As a result, params correspond to these regular expressions:
-
-| | `:<name>` | `::<name>` |
-| -------- | ----------- | ---------- |
-| hostname | `/([^.]*)/` | `/(.*)/` |
-| pathname | `/([^/]*)/` | `/(.*)/` |
+They are written as a `:` optionally followed by a [JavaScript identifier](#javascript-identifier) that acts as its name:

 ```ts
 'products/:id';
 // /products/wireless-headphones → { id: 'wireless-headphones' }
 // /products/123 → { id: '123' }
+```

-// ❌ Error - missing or invalid param name
-'products/:123';
+When a param name is not given, the matched value won't be returned:

-'docs/::path';
-// /docs/api/v1/intro.html → { path: 'api/v1/intro.html' }
-// /docs/guide → { path: 'guide' }
+```ts
+'products/:-shoes';
+// /products/tennis-shoes -> {}
 ```

 Param names must be unique:
@@ -204,13 +132,9 @@ Param names must be unique:

 // ❌ Bad - duplicate param name across hostname and pathname
 '://:region.api.example.com/users/:region';
-
-// ✅ Good - `::` param captures across segments
-'files/::path/download';
-// /files/2023/photos/vacation.jpg/download → { path: '2023/photos/vacation.jpg' }
 ```

-Params can be mixed with static text, wildcards, and even other params:
+Params can be mixed with static text and even other params:

 ```ts
 'users/@:id';
@@ -222,22 +146,34 @@ Params can be mixed with static text, wildcards, and even other params:
 'api/v:major.:minor-:channel';
 // /api/v2.1-beta → { major: '2', minor: '1', channel: 'beta' }

-'://:region.:env.api.example.com';
-// us-east.staging.api.example.com → { region: 'us-east', env: 'staging' }
+'://us-:region.:env.api.example.com';
+// us-east.staging.api.example.com → { region: 'east', env: 'staging' }
+```
+
+### Globs
+
+| | protocol | hostname | pathname | search |
+| ---------- | -------- | -------- | -------- | ------ |
+| Supported? |||||
+
+Globs match dynamic parts of a URL, but — unlike [params](#params) — they are not limited to a single segment.

-'cdn/::path/*.jpg';
-// /cdn/images/2023/vacation.jpg → { path: 'images/2023' }
+They are written as a `*` optionally followed by a [JavaScript identifier](#javascript-identifier) that acts as its name:

-'://::subdomain.api.com/:version/*';
-// tenant.v1.api.com/v2/users → { subdomain: 'tenant.v1', version: 'v2' }
+```ts
+// todo
+```
+
+When a glob name is not given, the matched value won't be returned:
+
+```ts
+// todo
 ```

-Params only match characters within the same part of the URL:
+Globs share a namespace with params:

 ```ts
-'://api.::domain/users';
-// ✓ matches: ://api.example.com/users → { domain: 'example.com' }
-// ✗ doesn't match: ://api.example.com/123/users
+// todo
 ```

 ### Optionals
@@ -274,23 +210,23 @@ Optionals can span any characters and contain static text, params, or wildcards:
 // /users/sarah/settings/profile → { id: 'sarah', section: 'profile' }
 // /users/sarah/settings/profile/edit → { id: 'sarah', section: 'profile' }

-'users/:userId(/files/*)';
+'users/:userId(/files/:)';
 // /users/sarah → { userId: 'sarah' }
 // /users/sarah/files/document.pdf → { userId: 'sarah' }

-'users/:userId(/docs/**)';
+'users/:userId(/docs/*)';
 // /users/sarah → { userId: 'sarah' }
 // /users/sarah/docs/projects/readme.md → { userId: 'sarah' }

-'users/:userId(/files/::path)';
+'users/:userId(/files/*path)';
 // /users/sarah → { userId: 'sarah' }
 // /users/sarah/files/projects/docs/readme.md → { userId: 'sarah', path: 'projects/docs/readme.md' }

 '://(www.)shop.example.com';
 // shop.example.com → {}
 // www.shop.example.com → {}

-'://(*.)api.example.com(/v*)';
+'://(:.)api.example.com(/v:)';
 // api.example.com → {}
 // cdn.api.example.com/v2 → {}
 ```
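This commit drops the `*`/`**` wildcards and `::` params in favor of two modifiers: `:name` params that stay within one segment and `*name` globs that are not limited to one. A rough sketch of what that could mean for a single part, assuming the regex mapping from the removed README tables still holds (`[^/]*` for pathnames or `[^.]*` for hostnames within a segment, `.*` across segments) and that `(...)` compiles to an optional group. `compilePart` and `matchPart` are hypothetical names, not this package's API, and the real matching engine may work quite differently:

```ts
// Hypothetical sketch, not this package's matcher. Assumptions: a param `:name`
// matches within one segment, a glob `*name` matches across segments, `( ... )`
// is optional, and a bare `:` or `*` matches without capturing.
const IDENT = '[a-zA-Z_$][a-zA-Z_$0-9]*';

function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// `separator` is `/` for pathnames and `.` for hostnames.
function compilePart(source: string, separator: '/' | '.'): RegExp {
  const segment = `[^${escapeRegExp(separator)}]*`;
  const modifierRE = new RegExp(`^(?::(${IDENT})?|\\*(${IDENT})?|[()])`);
  let pattern = '';
  let index = 0;
  while (index < source.length) {
    const match = modifierRE.exec(source.slice(index));
    if (match === null) {
      // Static text: escape it so characters like `.` match literally.
      pattern += escapeRegExp(source.charAt(index));
      index += 1;
      continue;
    }
    const [text, paramName, globName] = match;
    if (text === '(') pattern += '(?:';
    else if (text === ')') pattern += ')?';
    else if (text.startsWith(':')) pattern += paramName ? `(?<${paramName}>${segment})` : segment;
    else pattern += globName ? `(?<${globName}>.*)` : '.*';
    index += text.length;
  }
  return new RegExp(`^${pattern}$`);
}

function matchPart(
  source: string,
  separator: '/' | '.',
  input: string,
): Record<string, string> | null {
  const match = compilePart(source, separator).exec(input);
  if (match === null) return null;
  const params: Record<string, string> = {};
  // Skip names whose optional group did not take part in the match.
  for (const [name, value] of Object.entries(match.groups ?? {})) {
    if (value !== undefined) params[name] = value;
  }
  return params;
}
```

Under these assumptions `matchPart('us-:region.:env.api.example.com', '.', 'us-east.staging.api.example.com')` gives `{ region: 'east', env: 'staging' }`, as in the README, and an unnamed modifier such as the `:` in `'products/:-shoes'` compiles to a non-capturing fragment, so nothing is returned for it.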
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+import * as assert from 'node:assert/strict';
+import { describe, it } from 'node:test';
+
+import { parseProtocol } from './parse.ts';
+
+describe('parse', () => {
+  it('parses protocol', () => {
+    assert.deepStrictEqual(parseProtocol('http(s)'), [
+      { span: [0, 4], type: 'text', value: 'http' },
+      { span: [4, 3], type: 'optional', nodes: [{ span: [5, 1], type: 'text', value: 's' }] },
+    ]);
+  });
+});
Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
+import { lexHostname, lexPathname, lexProtocol, type Token } from './tokenize.ts';
+
+type Optional = { type: 'optional'; nodes: Array<Token>; span: [number, number] };
+type Node = Token | Optional;
+
+function parseOptionals(tokens: Iterable<Token>) {
+  const nodes: Array<Node> = [];
+
+  let optional: Optional | null = null;
+  for (const token of tokens) {
+    if (token.type === '(') {
+      if (optional) {
+        throw new Error(`Nested paren: ${optional.span[0]} ${token.span[0]}`);
+      }
+      optional = { type: 'optional', nodes: [], span: token.span };
+      continue;
+    }
+    if (token.type === ')') {
+      if (!optional) {
+        throw new Error(`Unbalanced paren: ${token.span[0]}`);
+      }
+      optional.span[1] = optional.nodes.reduce((acc, node) => acc + node.span[1], 0) + 2;
+      nodes.push(optional);
+      optional = null;
+      continue;
+    }
+    (optional?.nodes ?? nodes).push(token);
+  }
+  return nodes;
+}
+
+export function parseProtocol(source: string) {
+  return parseOptionals(lexProtocol(source));
+}
+
+export function parseHostname(source: string) {
+  return parseOptionals(lexHostname(source));
+}
+
+export function parsePathname(source: string) {
+  return parseOptionals(lexPathname(source));
+}
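`parseOptionals` folds the tokens between `(` and `)` into one `optional` node whose span runs from `(` through `)`. As written, a pattern with an unclosed `(` (for example `http(s`) silently drops every token after the `(`, and the optional node reuses the `(` token's span array and then mutates it. A standalone sketch of the same grouping pass with both points addressed; `groupOptionals` and `ParsedNode` are hypothetical names, and the token type is re-declared so the block runs on its own:

```ts
// Hypothetical standalone version of the optional-grouping pass. Token shapes
// mirror tokenize.ts; the unclosed-paren check and the copied span are additions.
type Span = [number, number];
type Token = { span: Span } & (
  | { type: 'text'; value: string }
  | { type: 'param' | 'glob'; name: string }
  | { type: '(' | ')' }
);
type Optional = { type: 'optional'; nodes: Array<Token>; span: Span };
type ParsedNode = Token | Optional;

function groupOptionals(tokens: Iterable<Token>): Array<ParsedNode> {
  const nodes: Array<ParsedNode> = [];
  let optional: Optional | null = null;
  for (const token of tokens) {
    if (token.type === '(') {
      if (optional) throw new Error(`Nested paren: ${optional.span[0]} ${token.span[0]}`);
      // Fresh span array, so fixing up the length below leaves the `(` token untouched.
      optional = { type: 'optional', nodes: [], span: [token.span[0], 0] };
    } else if (token.type === ')') {
      if (!optional) throw new Error(`Unbalanced paren: ${token.span[0]}`);
      // Length runs from `(` through `)`, inclusive.
      optional.span[1] = token.span[0] + 1 - optional.span[0];
      nodes.push(optional);
      optional = null;
    } else {
      (optional?.nodes ?? nodes).push(token);
    }
  }
  // Reaching the end with an open optional means a `(` was never closed.
  if (optional) throw new Error(`Unclosed paren: ${optional.span[0]}`);
  return nodes;
}
```

Computing the length from the position of `)` gives the same value as the commit's "sum of inner spans plus 2", provided the tokens are contiguous, which `tokenize` guarantees by buffering every unmatched character into a text token.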
Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
+import * as assert from 'node:assert/strict';
+import { describe, it } from 'node:test';
+
+import { lexProtocol, lexHostname, lexPathname } from './tokenize.ts';
+
+describe('lex', () => {
+  it('lexes protocol', () => {
+    assert.deepStrictEqual(Array.from(lexProtocol('http(s)')), [
+      { span: [0, 4], type: 'text', value: 'http' },
+      { span: [4, 1], type: '(' },
+      { span: [5, 1], type: 'text', value: 's' },
+      { span: [6, 1], type: ')' },
+    ]);
+  });
+  it('lexes hostname', () => {
+    assert.deepStrictEqual(Array.from(lexHostname('(*tenant.:sub.)remix.run')), [
+      { span: [0, 1], type: '(' },
+      { span: [1, 7], type: 'glob', name: 'tenant' },
+      { span: [8, 1], type: 'text', value: '.' },
+      { span: [9, 4], type: 'param', name: 'sub' },
+      { span: [13, 1], type: 'text', value: '.' },
+      { span: [14, 1], type: ')' },
+      { span: [15, 9], type: 'text', value: 'remix.run' },
+    ]);
+  });
+  it('lexes pathname', () => {
+    assert.deepStrictEqual(Array.from(lexPathname('/products/:id(/v:version/*path)')), [
+      { span: [0, 10], type: 'text', value: '/products/' },
+      { span: [10, 3], type: 'param', name: 'id' },
+      { span: [13, 1], type: '(' },
+      { span: [14, 2], type: 'text', value: '/v' },
+      { span: [16, 8], type: 'param', name: 'version' },
+      { span: [24, 1], type: 'text', value: '/' },
+      { span: [25, 5], type: 'glob', name: 'path' },
+      { span: [30, 1], type: ')' },
+    ]);
+  });
+});
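The README now says param names must be unique and that globs share a namespace with params, but that section is still a `// todo` and nothing in this commit enforces it. One way to check it over the lexed parts, sketched as a hypothetical helper (`assertUniqueNames`). The token type mirrors the shapes in the tests above, minus `span`, except that `name` is optional, since the README allows a bare `:` or `*` while the current lexers throw `todo: missing ... name`:

```ts
// Hypothetical helper, not part of this commit. Params and globs share one
// namespace, so each name may appear at most once across every part of a pattern.
// `name` is optional here because the README allows a bare `:` or `*`.
type NamedToken =
  | { type: 'text'; value: string }
  | { type: '(' | ')' }
  | { type: 'param' | 'glob'; name?: string };

function assertUniqueNames(parts: Array<Iterable<NamedToken>>): void {
  const seen = new Set<string>();
  for (const tokens of parts) {
    for (const token of tokens) {
      if (token.type !== 'param' && token.type !== 'glob') continue;
      // Unnamed params and globs capture nothing, so they cannot clash.
      if (token.name === undefined) continue;
      if (seen.has(token.name)) {
        throw new Error(`Duplicate ${token.type} name: ${token.name}`);
      }
      seen.add(token.name);
    }
  }
}
```

Because the lexers' `Token` type should be structurally assignable to `NamedToken`, passing one generator per part (for instance from `lexHostname` and `lexPathname`) ought to type-check; it would reject the README's `'://:region.api.example.com/users/:region'` example once that pattern is split into its hostname and pathname.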
Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
+export type Token = { span: [number, number] } & (
+  | { type: 'text'; value: string }
+  | { type: 'param'; name: string }
+  | { type: 'glob'; name: string }
+  | { type: '(' | ')' }
+);
+
+type Lexer = (source: string, index: number) => Token | null;
+
+function* tokenize(source: string, lexers: Array<Lexer>): Generator<Token> {
+  let buffer = '';
+
+  let index = 0;
+  while (index < source.length) {
+    let token: Token | null = null;
+    for (const lex of lexers) {
+      token = lex(source, index);
+      if (token) break;
+    }
+
+    if (token) {
+      if (buffer) {
+        yield { type: 'text', value: buffer, span: [index - buffer.length, buffer.length] };
+        buffer = '';
+      }
+      yield token;
+      index += token.span[1];
+    } else {
+      buffer += source[index];
+      index += 1;
+    }
+  }
+
+  if (buffer) {
+    yield { type: 'text', value: buffer, span: [source.length - buffer.length, buffer.length] };
+  }
+}
+
+const parensLexer: Lexer = (source, index) => {
+  const char = source[index];
+  if (char === '(' || char === ')') {
+    return { type: char, span: [index, 1] };
+  }
+  return null;
+};
+
+const identifierRE = /[a-zA-Z_$][a-zA-Z_$0-9]*/;
+
+const paramRE = new RegExp('^:(' + identifierRE.source + ')?');
+const paramLexer: Lexer = (source, index) => {
+  const match = paramRE.exec(source.slice(index));
+  if (!match) return null;
+  const name = match[1];
+  if (name === undefined) throw new Error('todo: missing param name');
+  return { type: 'param', name, span: [index, match[0].length] };
+};
+
+const globRE = new RegExp('^\\*(' + identifierRE.source + ')?');
+const globLexer: Lexer = (source, index) => {
+  const match = globRE.exec(source.slice(index));
+  if (!match) return null;
+  const name = match[1];
+  if (name === undefined) throw new Error('todo: missing glob name');
+  return { type: 'glob', name, span: [index, match[0].length] };
+};
+
+export const lexProtocol = (source: string) => tokenize(source, [parensLexer]);
+export const lexHostname = (source: string) =>
+  tokenize(source, [paramLexer, globLexer, parensLexer]);
+export const lexPathname = (source: string) =>
+  tokenize(source, [paramLexer, globLexer, parensLexer]);
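Each lexer in this commit takes a single part of a route pattern, and there is no search lexer yet. A minimal sketch of splitting a full pattern into parts by the README's **Part delimiters** rule: the first `?` starts the search, the first `://` ends the protocol, and the first `/` after that starts the pathname. `splitRoutePattern` and `RouteParts` are hypothetical; keeping the leading `/` on the pathname and treating an empty protocol as "not given" are assumptions:

```ts
// Hypothetical sketch of the README's "Part delimiters" rule, not this package's
// parser. Assumptions: the pathname keeps its leading `/`, and an empty protocol
// (as in `://api.example.com`) is treated as "not given".
type RouteParts = { protocol?: string; hostname?: string; pathname: string; search?: string };

function splitRoutePattern(source: string): RouteParts {
  let rest = source;

  // The first `?` starts the search.
  let search: string | undefined;
  const searchStart = rest.indexOf('?');
  if (searchStart !== -1) {
    search = rest.slice(searchStart + 1);
    rest = rest.slice(0, searchStart);
  }

  // Without `://` there is no protocol or hostname: the rest is the pathname.
  let protocol: string | undefined;
  let hostname: string | undefined;
  const protocolEnd = rest.indexOf('://');
  if (protocolEnd !== -1) {
    protocol = rest.slice(0, protocolEnd) || undefined;
    rest = rest.slice(protocolEnd + 3);
    // The first `/` after the hostname starts the pathname.
    const pathnameStart = rest.indexOf('/');
    hostname = pathnameStart === -1 ? rest : rest.slice(0, pathnameStart);
    rest = pathnameStart === -1 ? '' : rest.slice(pathnameStart);
  }

  return { protocol, hostname, pathname: rest, search };
}
```

Each resulting part could then be handed to `lexProtocol`, `lexHostname`, or `lexPathname`. A pattern like `'://api.example.com'` comes back with an empty pathname, matching the README's note that omitting a pathname means "match the empty pathname".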
