|
1 | 1 | # URL Tools |
2 | 2 |
|
3 | | -Processing, normalizing, and de-duplicating large piles of URLs can be a pain, |
4 | | -particularly if you're trying to distinguish "real" unique URLs from the many |
5 | | -variations that can appear in the wild. URLs with anchor links, query params |
6 | | -in different orders, social sharing and analytics campaign cruft, accidental |
7 | | -references to staging servers… You get the idea. |
| 3 | +Processing, normalizing, and de-duplicating large piles of URLs can be a pain, particularly if you're trying to distinguish "real" unique URLs from the many variations that can appear in the wild. URLs with anchor links, query params in different orders, social sharing and analytics campaign cruft, accidental references to staging servers… You get the idea. |
8 | 4 |
|
9 | | -URL Tools is a helper library whose sole purpose is making that process just a |
10 | | -little less frustrating. It consists of four major pieces: |
| 5 | +URL Tools is a helper library whose sole purpose is making that process just a little less frustrating. It consists of four major pieces: |
11 | 6 |
|
12 | | -- `ParsedUrl`, a wrapper for the standard WHATWG `URL` class that mixes in the |
13 | | - domain and subdomain parsing from `tldjs`. Serializing a `ParsedUrl` object to |
14 | | - JSON also produces a broken out collection of its individual properties, rather |
15 | | - than spitting out the `href` property, as is the `URL` class's habit. |
16 | | -- `ParsedUrlSet`, a collection class that automatically parses, normalizes, and |
17 | | - de-duplicates sets of existing Urls. It's a bit janky, since ES6's `Set` |
18 | | - implementation only supports value comparison. As such, you can put `ParsedUrl` |
19 | | - objects *into* the set, but after normalization they're stored as simple strings. |
20 | | - It also keeps track of the URLs that it rejects as unparseable. |
21 | | -- A light set of helper functions for common filtering and normalizing operations, |
22 | | - including sorting querystring parameters, stripping social sharing cruft, |
23 | | - remapping 'ww1', 'ww2', etc. subdomains to a single canonical one, identifying |
24 | | - web vs. non-web URLs, flagging urls on public hosting like S3, and more. |
25 | | -# Todo |
26 | | -- [ ] A richer set of filters |
27 | | -- [ ] Chainable filtering and transforming for ParsedUrlSet |
| 7 | +- `ParsedUrl`, a wrapper for the standard WHATWG `URL` class that mixes in the domain and subdomain parsing from [`tldjs`](https://www.npmjs.com/package/tldjs). Serializing a `ParsedUrl` object to JSON also produces a broken-out collection of its individual properties, rather than spitting out the `href` property, as is the `URL` class's habit. |
| 8 | +- `ParsedUrlSet`, a collection class that automatically parses, normalizes, and de-duplicates sets of existing URLs. It's a bit janky, since ES6's `Set` implementation compares objects by reference, so only primitive values like strings de-duplicate reliably. As such, you can put `ParsedUrl` objects *into* the set, but after normalization they're stored as simple strings. It also keeps track of the URLs that it rejects as unparseable. |
| 9 | +- A light set of helper functions for common filtering and normalizing operations, including sorting querystring parameters, stripping social sharing cruft, remapping 'ww1', 'ww2', etc. subdomains to a single canonical one, identifying web vs. non-web URLs, flagging URLs on public hosting like S3, and more. These can be combined or defined on the fly to serve as the normalizer function for a `ParsedUrlSet`. |
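
To make the normalize-then-de-duplicate idea concrete, here is a minimal sketch that uses only the standard WHATWG `URL` class. It is not URL Tools' actual API: `normalizeUrl` and `dedupe` are hypothetical names, and the specific rules (dropping `utm_*` params, mapping `ww1`, `ww2`, etc. to `www`) are assumptions chosen for illustration.

```javascript
// Hypothetical helper, not part of URL Tools: normalizes one URL string
// using only the built-in WHATWG URL and URLSearchParams APIs.
function normalizeUrl(input) {
  const url = new URL(input); // throws a TypeError if the input is unparseable

  // Anchor links don't change which resource is being referenced.
  url.hash = '';

  // Strip analytics campaign cruft (assumed here to mean utm_* parameters).
  for (const key of [...url.searchParams.keys()]) {
    if (key.startsWith('utm_')) url.searchParams.delete(key);
  }

  // Sort the remaining params so that ordering differences disappear.
  url.searchParams.sort();

  // Remap 'ww1', 'ww2', etc. subdomains to a single canonical 'www'.
  url.hostname = url.hostname.replace(/^ww\d+\./, 'www.');

  return url.href;
}

// Hypothetical helper: de-duplicates by storing normalized strings in a Set,
// and keeps track of the inputs that could not be parsed.
function dedupe(inputs) {
  const unique = new Set();
  const rejected = [];
  for (const input of inputs) {
    try {
      unique.add(normalizeUrl(input));
    } catch {
      rejected.push(input);
    }
  }
  return { unique: [...unique], rejected };
}

const result = dedupe([
  'https://ww2.example.com/page?b=2&a=1&utm_source=twitter#top',
  'https://www.example.com/page?a=1&b=2',
  'not a url',
]);
console.log(result);
```

Storing the normalized `href` strings rather than the `URL` objects is what lets `Set` collapse the duplicates, which mirrors the reason given above for `ParsedUrlSet` keeping plain strings internally.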