Commit ad3eef8

Introduce Rules::getPublicSuffix
Because Rules::resolve tries to comply with the Public Suffix List algorithm, some exceptions are not always caught. Rules::getPublicSuffix is stricter than Rules::resolve and only returns the public suffix information for a given domain. The Domain and PublicSuffix value objects now implement two modifiers, ::toUnicode and ::toAscii, to convert their value according to the IDN UTS 46 algorithm. Domain names are still not validated, but if the conversion cannot be performed an Exception is thrown.
1 parent f3f4275 commit ad3eef8

12 files changed: +582 -149 lines changed

CHANGELOG.md

Lines changed: 28 additions & 1 deletion
@@ -1,6 +1,33 @@
 # Changelog

-All Notable changes to `PHP Domain Parser` will be documented in this file
+All Notable changes to the `PHP Domain Parser` **5.x** series will be documented in this file
+
+## Next - TBD
+
+### Added
+
+- `Pdp\Rules::supports` returns a boolean to indicate whether a given section is supported
+- `Pdp\Rules::getPublicSuffix` returns a `Pdp\PublicSuffix` value object
+- `Pdp\Rules::__set_state` is implemented
+- `Pdp\Domain::getSection` returns a string containing the section name used to determine the public suffix
+- `Pdp\Domain::toUnicode` returns a `Pdp\Domain` with its value converted to its Unicode form
+- `Pdp\Domain::toAscii` returns a `Pdp\Domain` with its value converted to its ASCII form
+- `Pdp\PublicSuffix::getSection` returns a string containing the section name used to determine the public suffix
+- `Pdp\PublicSuffix::toUnicode` returns a `Pdp\PublicSuffix` with its value converted to its Unicode form
+- `Pdp\PublicSuffix::toAscii` returns a `Pdp\PublicSuffix` with its value converted to its ASCII form
+
+### Fixed
+
+- `Pdp\Domain::getDomain` returns the normalized form of the domain name
+- `Pdp\PublicSuffix` is no longer internal.
+
+### Deprecated
+
+- None
+
+### Removed
+
+- None

 ## 5.1.0 - 2017-12-18

README.md

Lines changed: 38 additions & 0 deletions
@@ -64,10 +64,17 @@ final class Rules
     public static function createFromPath(string $path, $context = null): self
     public static function createFromString(string $content): self
     public function __construct(array $rules)
+    public function supports(string $section): bool
+    public function getPublicSuffix(string $domain = null, string $section = self::ALL_DOMAINS): PublicSuffix
     public function resolve(string $domain = null, string $section = self::ALL_DOMAINS): Domain
 }
 ~~~

+**NEW IN VERSION 5.2:**
+
+- `Rules::supports` returns a boolean to tell whether the specific section is present in the `Rules` object;
+- `Rules::getPublicSuffix` returns a `PublicSuffix` object determined from the `Rules` object;
+
 **NEW IN VERSION 5.1:**

 - `Rules::createFromString` expects a string content which follows [the PSL format](https://publicsuffix.org/list/#list-format);
@@ -104,9 +111,18 @@ final class Domain implements JsonSerializable
     public function isKnown(): bool;
     public function isICANN(): bool;
     public function isPrivate(): bool;
+    public function getSection(): string;
+    public function toUnicode(): self;
+    public function toAscii(): self;
 }
 ~~~

+**NEW IN VERSION 5.2:**
+
+- `Domain::getSection` returns the section name if present, otherwise an empty string;
+- `Domain::toUnicode` returns an instance with the domain converted to its Unicode representation;
+- `Domain::toAscii` returns an instance with the domain converted to its ASCII representation;
+
 **THIS EXAMPLE ILLUSTRATES HOW THE OBJECT WORKS BUT SHOULD BE AVOIDED IN PRODUCTION**

 ~~~php
@@ -169,6 +185,28 @@ If the domain name or some of its part are seriously malformed or unrecognized,
 - `Pdp\Domain::isICANN` returns `true` if the public suffix is found using a PSL which includes the ICANN DOMAINS section;
 - `Pdp\Domain::isPrivate` returns `true` if the public suffix is found using a PSL which includes the PRIVATE DOMAINS section;

+The `Rules::getPublicSuffix` method expects the same arguments as `Rules::resolve` but returns a `Pdp\PublicSuffix` object instead.
+
+~~~php
+<?php
+
+final class PublicSuffix implements Countable, JsonSerializable
+{
+    public function getContent(): ?string
+    public function isKnown(): bool;
+    public function isICANN(): bool;
+    public function isPrivate(): bool;
+    public function getSection(): string;
+    public function toUnicode(): self;
+    public function toAscii(): self;
+}
+~~~
+
+While `Rules::resolve` will only throw an exception if the section value is invalid, `Rules::getPublicSuffix` is more restrictive and will additionally throw if:
+
+- the domain is invalid or seriously malformed;
+- the public suffix cannot be normalized and converted using the domain encoding.
+
 **WARNING:**

 **`Pdp\Rules::resolve` does not validate the submitted host. You are required to use a host validator prior to using this library.**
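
Because `Rules::getPublicSuffix` throws where `Rules::resolve` stays lenient, calling code may want to decide explicitly what a failed lookup means. The following is only a sketch: the helper name `publicSuffixOrNull` is hypothetical, `$rules` is assumed to be a `Pdp\Rules` instance, and `\Throwable` is caught on purpose because the exact exception class is not visible in this diff.

```php
<?php

/**
 * Hypothetical helper (not part of the library): returns the public
 * suffix of $host as a string, or null when the lookup throws.
 *
 * $rules is assumed to expose getPublicSuffix(string $domain) whose
 * result has getContent(), as documented in the README hunk above.
 */
function publicSuffixOrNull($rules, string $host): ?string
{
    try {
        return $rules->getPublicSuffix($host)->getContent();
    } catch (\Throwable $e) {
        // Invalid domain or unconvertible suffix: report "no suffix".
        return null;
    }
}
```

With the real library this would be called with an object built by `Pdp\Rules::createFromPath` or `Pdp\Rules::createFromString`. Swallowing the exception is a design choice for the sketch; logging or rethrowing may suit production code better.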

data/pdp-PSL-FULL-5a3cc7f81795bb2e48e848af42d287b4.cache

Lines changed: 1 addition & 1 deletion
Large diffs are not rendered by default.

src/Converter.php

Lines changed: 3 additions & 0 deletions
@@ -17,6 +17,9 @@
  * Public Suffix List Parser.
  *
  * This class converts the Public Suffix List into an associative, multidimensional array
+ *
+ * @author Jeremy Kendall <[email protected]>
+ * @author Ignace Nyamagana Butera <[email protected]>
  */
 final class Converter
 {

src/Domain.php

Lines changed: 96 additions & 40 deletions
@@ -49,6 +49,14 @@ final class Domain implements JsonSerializable
      */
     private $subDomain;

+    /**
+     * {@inheritdoc}
+     */
+    public static function __set_state(array $properties): self
+    {
+        return new self($properties['domain'], $properties['publicSuffix']);
+    }
+
     /**
      * New instance.
      *
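
For readers unfamiliar with `__set_state`: it is the static hook that code generated by PHP's `var_export()` calls to rebuild an object. The sketch below uses a hypothetical stand-in class `Label` rather than the real `Domain`, and `eval()` only to keep the round trip in one file.

```php
<?php

// Hypothetical stand-in value object; the real Domain takes more arguments.
final class Label
{
    private $value;

    public function __construct(string $value)
    {
        $this->value = $value;
    }

    // var_export() emits a call to this method and passes every
    // property (private ones included) keyed by property name.
    public static function __set_state(array $properties): self
    {
        return new self($properties['value']);
    }

    public function getValue(): string
    {
        return $this->value;
    }
}

// $exported holds PHP source that calls Label::__set_state(array(...)).
$exported = var_export(new Label('example.com'), true);

// Evaluating that source rebuilds an equivalent object.
$restored = eval('return '.$exported.';');
```

In practice the exported source would more likely be written to a file and loaded with `include` than passed to `eval()`.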
@@ -57,12 +65,36 @@ final class Domain implements JsonSerializable
      */
     public function __construct($domain = null, PublicSuffix $publicSuffix = null)
     {
+        if (false !== strpos((string) $domain, '%')) {
+            $domain = rawurldecode($domain);
+        }
+
+        if (null !== $domain) {
+            $domain = strtolower($domain);
+        }
+
         $this->domain = $domain;
-        $this->publicSuffix = $publicSuffix ?? new PublicSuffix();
+        $this->publicSuffix = $this->setPublicSuffix($publicSuffix);
         $this->registrableDomain = $this->setRegistrableDomain();
         $this->subDomain = $this->setSubDomain();
     }

+    /**
+     * Filter the PublicSuffix
+     *
+     * @param PublicSuffix|null $publicSuffix
+     *
+     * @return PublicSuffix
+     */
+    private function setPublicSuffix(PublicSuffix $publicSuffix = null): PublicSuffix
+    {
+        if (null === $publicSuffix || null === $this->domain) {
+            return new PublicSuffix();
+        }
+
+        return $publicSuffix;
+    }
+
     /**
      * Compute the registrable domain part.
      *
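
The two normalisation steps added to the constructor can be tried in isolation. This is a minimal sketch that copies the logic into a free function; the name `normalizeHost` is hypothetical and does not exist in the library.

```php
<?php

/**
 * Mirrors the constructor's normalisation: percent-decode when a '%'
 * is present, then lowercase. A null input stays null.
 */
function normalizeHost($domain)
{
    if (false !== strpos((string) $domain, '%')) {
        $domain = rawurldecode($domain);
    }

    if (null !== $domain) {
        // strtolower() works byte-wise; on PHP 8.2+ it folds ASCII
        // letters only, earlier versions depend on the current locale.
        $domain = strtolower($domain);
    }

    return $domain;
}

$plain = normalizeHost('WWW.Example.COM');     // 'www.example.com'
$encoded = normalizeHost('www.%65xample.com'); // 'www.example.com'
$missing = normalizeHost(null);                // null
```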
@@ -82,29 +114,7 @@ private function setRegistrableDomain()
         $domainLabels = explode('.', $this->domain);
         $registrableDomain = implode('.', array_slice($domainLabels, count($domainLabels) - $nbLabelsToRemove));

-        return $this->normalize($registrableDomain);
-    }
-
-    /**
-     * Normalizes the domain according to its representation.
-     *
-     * @param string $domain
-     *
-     * @return string|null
-     */
-    private function normalize(string $domain)
-    {
-        $func = 'idn_to_utf8';
-        if (false !== strpos($domain, 'xn--')) {
-            $func = 'idn_to_ascii';
-        }
-
-        $domain = $func($domain, 0, INTL_IDNA_VARIANT_UTS46);
-        if (false === $domain) {
-            return null;
-        }
-
-        return strtolower($domain);
+        return $registrableDomain;
     }

     /**
@@ -127,23 +137,19 @@ private function setSubDomain()

         $subDomain = implode('.', array_slice($domainLabels, 0, $countLabels - $nbLabelsToRemove));

-        return $this->normalize($subDomain);
+        return $subDomain;
     }

     /**
      * {@inheritdoc}
      */
     public function jsonSerialize()
     {
-        return [
+        return array_merge([
             'domain' => $this->domain,
             'registrableDomain' => $this->registrableDomain,
             'subDomain' => $this->subDomain,
-            'publicSuffix' => $this->publicSuffix->getContent(),
-            'isKnown' => $this->publicSuffix->isKnown(),
-            'isICANN' => $this->publicSuffix->isICANN(),
-            'isPrivate' => $this->publicSuffix->isPrivate(),
-        ];
+        ], $this->publicSuffix->jsonSerialize());
     }

     /**
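
The `jsonSerialize` change delegates the suffix-related keys to the nested object and merges them in. The sketch below shows that composition pattern only; `SuffixInfo`, `HostInfo` and the key set they emit are hypothetical stand-ins, since the keys returned by `PublicSuffix::jsonSerialize` are not shown in this diff.

```php
<?php

// Hypothetical stand-in for the nested value object.
final class SuffixInfo implements JsonSerializable
{
    public function jsonSerialize(): array
    {
        return ['publicSuffix' => 'co.uk', 'isKnown' => true];
    }
}

// Hypothetical stand-in for the outer object.
final class HostInfo implements JsonSerializable
{
    private $suffix;

    public function __construct(SuffixInfo $suffix)
    {
        $this->suffix = $suffix;
    }

    public function jsonSerialize(): array
    {
        // String keys of the second array are appended after the first;
        // if a key exists in both, the value from the second array wins.
        return array_merge([
            'domain' => 'www.example.co.uk',
        ], $this->suffix->jsonSerialize());
    }
}

$json = json_encode(new HostInfo(new SuffixInfo()));
// {"domain":"www.example.co.uk","publicSuffix":"co.uk","isKnown":true}
```

One consequence of this design is that any key later added to the nested object's output appears in the outer JSON without touching the outer class.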
@@ -154,14 +160,6 @@ public function __debugInfo()
         return $this->jsonSerialize();
     }

-    /**
-     * {@inheritdoc}
-     */
-    public static function __set_state(array $properties)
-    {
-        return new self($properties['domain'], $properties['publicSuffix']);
-    }
-
     /**
      * Returns the full domain name.
      *
@@ -214,7 +212,7 @@ public function getPublicSuffix()
     }

     /**
-     * Tells whether the public suffix has a matching rule in a Public Suffix List.
+     * Tells whether the public suffix has matched a rule in a Public Suffix List.
      *
      * @return bool
      */
@@ -242,4 +240,62 @@ public function isPrivate(): bool
     {
         return $this->publicSuffix->isPrivate();
     }
+
+    /**
+     * Returns the public suffix section name used to determine the public suffix.
+     *
+     * @return string
+     */
+    public function getSection(): string
+    {
+        return $this->publicSuffix->getSection();
+    }
+
+    /**
+     * Converts the domain to its IDNA ASCII form.
+     *
+     * This method MUST retain the state of the current instance, and return
+     * an instance with its content converted to its IDNA ASCII form
+     *
+     * @throws Exception if the domain can not be converted to ASCII using IDN UTS46 algorithm
+     *
+     * @return self
+     */
+    public function toAscii(): self
+    {
+        if (null === $this->domain || false !== strpos($this->domain, 'xn--')) {
+            return $this;
+        }
+
+        $domain = idn_to_ascii($this->domain, 0, INTL_IDNA_VARIANT_UTS46, $arr);
+        if (!$arr['errors']) {
+            return new self($domain, $this->publicSuffix->toAscii());
+        }
+
+        throw new Exception(sprintf('The following domain `%s` can not be converted to ascii', $this->domain));
+    }
+
+    /**
+     * Converts the domain to its IDNA UTF8 form.
+     *
+     * This method MUST retain the state of the current instance, and return
+     * an instance with its content converted to its IDNA UTF8 form
+     *
+     * @throws Exception if the domain can not be converted to Unicode using IDN UTS46 algorithm
+     *
+     * @return self
+     */
+    public function toUnicode(): self
+    {
+        if (null === $this->domain || false === strpos($this->domain, 'xn--')) {
+            return $this;
+        }
+
+        $domain = idn_to_utf8($this->domain, 0, INTL_IDNA_VARIANT_UTS46, $arr);
+        if (!$arr['errors']) {
+            return new self($domain, $this->publicSuffix->toUnicode());
+        }
+
+        throw new Exception(sprintf('The following domain `%s` can not be converted to unicode', $this->domain));
+    }
 }
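
The conversion logic of `toAscii` and `toUnicode` can be exercised outside the class. What follows is a rough sketch under stated assumptions: the function names `hostToAscii` and `hostToUnicode` are hypothetical, `\InvalidArgumentException` stands in for the library's own exception class, and the `intl` extension must be loaded.

```php
<?php

/**
 * Hypothetical free-function version of the toAscii() logic.
 */
function hostToAscii(string $host): string
{
    // Same shortcut as the diff: a host that already contains an
    // "xn--" label is returned untouched.
    if (false !== strpos($host, 'xn--')) {
        return $host;
    }

    $ascii = idn_to_ascii($host, 0, INTL_IDNA_VARIANT_UTS46, $info);
    if (false === $ascii || !empty($info['errors'])) {
        throw new \InvalidArgumentException(sprintf('`%s` can not be converted to ASCII', $host));
    }

    return $ascii;
}

/**
 * Hypothetical free-function version of the toUnicode() logic.
 */
function hostToUnicode(string $host): string
{
    // Nothing to decode when no "xn--" label is present.
    if (false === strpos($host, 'xn--')) {
        return $host;
    }

    $unicode = idn_to_utf8($host, 0, INTL_IDNA_VARIANT_UTS46, $info);
    if (false === $unicode || !empty($info['errors'])) {
        throw new \InvalidArgumentException(sprintf('`%s` can not be converted to Unicode', $host));
    }

    return $unicode;
}
```

With `intl` available, `hostToAscii("b\u{e9}b\u{e9}.be")` yields `xn--bb-bjab.be` and `hostToUnicode('xn--bb-bjab.be')` reverses it. Unlike the diff, the sketch also treats a `false` return value as a failure, which is a defensive addition rather than something the commit does.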

src/Installer.php

Lines changed: 1 addition & 1 deletion
@@ -40,7 +40,7 @@ public static function updateLocalCache(Event $event = null)

         require $vendor.'/autoload.php';

-        $io->write('Updating your Public Suffix List ICANN Section local cache.');
+        $io->write('Updating your Public Suffix List local cache.');
         if (!extension_loaded('curl')) {
             $io->writeError([
                 '😓 😓 😓 Your local cache could not be updated. 😓 😓 😓',

src/Manager.php

Lines changed: 3 additions & 0 deletions
@@ -18,6 +18,9 @@
  *
  * This class obtains, writes, caches, and returns PHP representations
  * of the Public Suffix List ICANN section
+ *
+ * @author Jeremy Kendall <[email protected]>
+ * @author Ignace Nyamagana Butera <[email protected]>
  */
 final class Manager
 {
