Commit b3f9b16

Merge pull request #336 from ua-parser/expand-readme
Expand the readme with a small summary of the spec
2 parents 48ea1a0 + d572c0b commit b3f9b16

1 file changed

+89
-0
lines changed

README.md

Lines changed: 89 additions & 0 deletions
@@ -79,6 +79,95 @@ To cross-build for different Scala versions:
sbt +publishLocal
```

### Details about the implementation using the `regexes.yaml` file

This project uses the `regexes.yaml` file from the [ua-parser/uap-core](https://github.com/ua-parser/uap-core/) repository to
perform user-agent string parsing according to the [documented specification](https://github.com/ua-parser/uap-core/blob/master/docs/specification.md).
The file is included as a git submodule in the `core` directory.

A summary of that specification follows.

#### Summary

This implementation (like others) works by applying three independent, ordered rule lists to the same input user-agent
string:

- User agent parser (`user_agent_parsers` definitions): provides the "browser" name and version.
- OS parser (`os_parsers` definitions): provides the operating system name and version.
- Device parser (`device_parsers` definitions): provides the device family and, optionally, the brand and model.

Each list is evaluated top to bottom. The first matching regex wins, and evaluation of that list stops immediately.

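The first-match-wins rule can be sketched in Scala as follows. This is a minimal illustration and not this library's actual code: `Rule`, `label`, and `firstMatch` are hypothetical names, and the two sample patterns are made up for the example.

```scala
import scala.util.matching.Regex

object FirstMatchSketch {
  // Hypothetical rule type: a compiled pattern plus a label identifying the rule.
  final case class Rule(pattern: Regex, label: String)

  // Walk the rules in file order and stop at the first regex that matches
  // anywhere in the input. The iterator is lazy, so later rules are never tried.
  def firstMatch(rules: List[Rule], userAgent: String): Option[(Rule, Regex.Match)] =
    rules.iterator
      .map(rule => rule.pattern.findFirstMatchIn(userAgent).map(m => (rule, m)))
      .collectFirst { case Some(hit) => hit }

  def main(args: Array[String]): Unit = {
    val specific = Rule("""Firefox/(\d+)""".r, "specific")
    val generic  = Rule("""(\w+)/(\d+)""".r, "generic")
    val ua       = "Mozilla/5.0 Firefox/115"

    // Both rules match this string, so the order of the list decides the result.
    println(firstMatch(List(specific, generic), ua).map(_._1.label)) // Some(specific)
    println(firstMatch(List(generic, specific), ua).map(_._1.label)) // Some(generic)
  }
}
```

Because order decides the outcome, more specific rules generally need to appear before more general ones. Each of the three lists would run this kind of search independently.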
#### Data file format

At a high level, `regexes.yaml` is a YAML map with the following top-level keys:

- `user_agent_parsers:`
- `os_parsers:`
- `device_parsers:`

Each value is a YAML list. Each list item is a small map that always contains a `regex` field and may contain `*_replacement`
fields.

##### Examples:

User agent parser example:
```yaml
user_agent_parsers:
  - regex: '(Namoroka|Shiretoko|Minefield)/(\d+)\.(\d+)\.(\d+(?:pre|))'
    family_replacement: 'Firefox ($1)'
```

OS parser example:
```yaml
os_parsers:
  - regex: 'CFNetwork/.{0,100} Darwin/22\.([0-5])\.\d+'
    os_replacement: 'iOS'
    os_v1_replacement: '16'
    os_v2_replacement: '$1'
```

Device parser example:
```yaml
device_parsers:
  - regex: '; *(PEDI)_(PLUS)_(W) Build'
    device_replacement: 'Odys $1 $2 $3'
    brand_replacement: 'Odys'
    model_replacement: '$1 $2 $3'
```

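The `$1`, `$2`, ... placeholders in the `*_replacement` values refer to capturing groups of the matched regex. A minimal sketch of that substitution, assuming single-digit placeholders and treating missing groups as empty strings (both are assumptions of this sketch; `expand` is a hypothetical helper, not part of this library):

```scala
import scala.util.matching.Regex

object ReplacementSketch {
  // Matches a placeholder such as $1 .. $9 inside a replacement template.
  private val Placeholder = """\$(\d)""".r

  // Replace every $n in the template with capture group n of the match.
  // Groups that do not exist or did not participate become empty strings.
  def expand(template: String, m: Regex.Match): String =
    Placeholder.replaceAllIn(template, (ph: Regex.Match) => {
      val idx = ph.group(1).toInt
      val value =
        if (idx >= 1 && idx <= m.groupCount) Option(m.group(idx)).getOrElse("") else ""
      Regex.quoteReplacement(value) // keep "$" and "\" in captured text literal
    })

  def main(args: Array[String]): Unit = {
    val deviceRegex = """; *(PEDI)_(PLUS)_(W) Build""".r
    val ua = "Mozilla/5.0 (Linux; Android 4.1; PEDI_PLUS_W Build/JRO03C)"
    deviceRegex.findFirstMatchIn(ua).foreach { m =>
      println(expand("Odys $1 $2 $3", m)) // Odys PEDI PLUS W
      println(expand("$1 $2 $3", m))      // PEDI PLUS W
    }
  }
}
```

The sample user-agent string is made up to match the device rule shown above.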
#### Capturing groups and default field mapping

The spec’s core idea is to put capturing groups `(...)` in each regex to extract parts of the UA string. If a rule does not
supply replacements, fields are assigned by group order.

#### User agent default mapping

If a user agent rule matches and provides no replacements, its groups map as follows:

- group 1: _family_
- group 2: _major_
- group 3: _minor_
- group 4: _patch_

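The positional mapping can be sketched as below. The `UserAgent` case class and `fromGroups` helper are hypothetical names for illustration, and falling back to "Other" when group 1 is absent is an assumption of this sketch.

```scala
import scala.util.matching.Regex

object UserAgentDefaultsSketch {
  // Hypothetical result type; version parts are optional because a rule
  // may define fewer than four capturing groups.
  final case class UserAgent(
      family: String,
      major: Option[String],
      minor: Option[String],
      patch: Option[String]
  )

  // Capture group i, or None when it is absent, unmatched, or empty.
  private def group(m: Regex.Match, i: Int): Option[String] =
    if (i >= 1 && i <= m.groupCount) Option(m.group(i)).filter(_.nonEmpty) else None

  // Default mapping: group 1 -> family, 2 -> major, 3 -> minor, 4 -> patch.
  def fromGroups(m: Regex.Match): UserAgent =
    UserAgent(group(m, 1).getOrElse("Other"), group(m, 2), group(m, 3), group(m, 4))

  def main(args: Array[String]): Unit = {
    val rule = """(Firefox)/(\d+)\.(\d+)""".r // made-up rule with three groups
    rule.findFirstMatchIn("Mozilla/5.0 Firefox/115.0").foreach { m =>
      println(fromGroups(m)) // UserAgent(Firefox,Some(115),Some(0),None)
    }
  }
}
```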
#### OS default mapping

Similarly, the groups of an OS rule without replacements map as follows:

- group 1: _family_
- group 2: _major_
- group 3: _minor_
- group 4: _patch_
- group 5: _patchMinor_

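When a rule does supply replacements, they take precedence over the positional groups. The sketch below illustrates that precedence for the first three OS fields only, using the OS rule from the examples section. `OsRule`, `Os`, and `parse` are hypothetical names, and the handling of empty values is an assumption of this sketch rather than something the summary above prescribes.

```scala
import scala.util.matching.Regex

object OsMappingSketch {
  // Hypothetical model of one `os_parsers` entry (patch fields omitted for brevity).
  final case class OsRule(
      pattern: Regex,
      osReplacement: Option[String] = None,
      osV1Replacement: Option[String] = None,
      osV2Replacement: Option[String] = None
  )

  final case class Os(family: String, major: Option[String], minor: Option[String])

  private val Placeholder = """\$(\d)""".r

  // Capture group i, or None when it is absent, unmatched, or empty.
  private def group(m: Regex.Match, i: Int): Option[String] =
    if (i >= 1 && i <= m.groupCount) Option(m.group(i)).filter(_.nonEmpty) else None

  // A replacement template, when present, wins over positional group i.
  private def field(template: Option[String], m: Regex.Match, i: Int): Option[String] =
    template match {
      case Some(t) =>
        val expanded = Placeholder.replaceAllIn(t, (ph: Regex.Match) =>
          Regex.quoteReplacement(group(m, ph.group(1).toInt).getOrElse("")))
        Some(expanded.trim).filter(_.nonEmpty)
      case None => group(m, i)
    }

  def parse(rule: OsRule, userAgent: String): Option[Os] =
    rule.pattern.findFirstMatchIn(userAgent).map { m =>
      Os(
        family = field(rule.osReplacement, m, 1).getOrElse("Other"),
        major = field(rule.osV1Replacement, m, 2),
        minor = field(rule.osV2Replacement, m, 3)
      )
    }

  def main(args: Array[String]): Unit = {
    val rule = OsRule(
      """CFNetwork/.{0,100} Darwin/22\.([0-5])\.\d+""".r,
      osReplacement = Some("iOS"),
      osV1Replacement = Some("16"),
      osV2Replacement = Some("$1")
    )
    // Made-up user-agent string that matches the rule above.
    println(parse(rule, "MyApp/1.0 CFNetwork/1404.0.5 Darwin/22.3.0"))
    // Some(Os(iOS,Some(16),Some(3)))
  }
}
```

Here group 1 captures the Darwin minor version, which the rule redirects into the OS minor field through `os_v2_replacement: '$1'`; without replacements, group 1 would have been read as the family.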
#### Device default mapping

Devices are slightly different: if no replacements are given, the first capturing group defines both the device family and the model.
The brand is left undefined unless the rule provides `brand_replacement`, and the model may also be undefined depending on the rule and implementation.

If no regex matches, the value for family shall be "Other", and brand and model shall not be defined.
Leading and trailing whitespace shall be trimmed from the result.

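A minimal sketch of the device defaults, the "Other" fallback, and the trimming rule, for rules without any replacements. `Device` and `parse` are hypothetical names, the sample pattern is made up, and using "Other" when a rule matches but has no usable first group is an assumption of this sketch.

```scala
import scala.util.matching.Regex

object DeviceFallbackSketch {
  final case class Device(family: String, brand: Option[String], model: Option[String])

  // Try the patterns in order; the first one that matches decides the result.
  def parse(patterns: List[Regex], userAgent: String): Device =
    patterns.iterator
      .map(_.findFirstMatchIn(userAgent))
      .collectFirst { case Some(m) => m }
      .map { m =>
        // Without replacements, group 1 (trimmed) is both family and model.
        val first =
          if (m.groupCount >= 1) Option(m.group(1)).map(_.trim).filter(_.nonEmpty)
          else None
        Device(first.getOrElse("Other"), None, first)
      }
      .getOrElse(Device("Other", None, None)) // no rule matched at all

  def main(args: Array[String]): Unit = {
    val patterns = List(""";( *Nexus \d+ *)Build""".r)
    println(parse(patterns, "Linux; Android 6.0; Nexus 5 Build/MRA58N"))
    // Device(Nexus 5,None,Some(Nexus 5))
    println(parse(patterns, "curl/8.0"))
    // Device(Other,None,None)
  }
}
```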
### Maintainers

* Piotr Adamski ([@mcveat](https://twitter.com/mcveat)) (Author. Based on the java implementation by Steve Jiang [@sjiang](https://twitter.com/sjiang) and using agent data from BrowserScope)
