Commit 7d8a4f7

Improve Rules code
- Constants are now attached to the `Rules` class instead of the `PublicSuffix` class
- The `Parser` class is renamed `Converter` to avoid BC break usage
- The `PublicSuffix` class is made internal
1 parent 9e3e894 commit 7d8a4f7

11 files changed: 192 additions and 131 deletions

CHANGELOG.md

Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
+# Changelog
+
+All Notable changes to `PHP Domain Parser` will be documented in this file
+
+## 4.0.0 - TBD
+
+### Added
+
+- `Pdp\Exception` a base exception for the library
+- `Pdp\Rules` a class to resolve domain name against the public suffix list
+- `Pdp\Domain` an immutable value object to represents a parsed domain name
+- `Pdp\Installer` a class to enable improve PSL maintenance
+- `Pdp\Cache` a PSR-16 file cache implementation to cache a local copy of the PSL
+- `Pdp\Manager` a class to enable managing PSL sources and `Rules` objects creation
+- `Pdp\Converter` a class to convert the PSL into a PHP array
+
+### Fixed
+
+- Domain class with invalid domain names improved supported
+- idn_* conversion error better handled
+
+### Deprecated
+
+- None
+
+### Removed
+
+- URL Parsing capabilities API is removed
+- `Pdp\PublicSuffixList` class replaced by the `Pdp\Rules` class
+- `Pdp\PublicSuffixManager` class replaced by the `Pdp\Manager` class
+- `Pdp\HttpAdapter\HttpAdapterInterface` interface replaced by the `Pdp\HttpClient` interface
+- `Pdp\HttpAdapter\CurlHttpAdapter` class replaced by the `Pdp\CurlHttpClient` class

README.md

Lines changed: 17 additions & 32 deletions
@@ -18,10 +18,8 @@ Consider the domain www.pref.okinawa.jp. In this domain, the
 *public suffix* portion is **okinawa.jp**, the *registrable domain* is
 **pref.okinawa.jp**, and the *subdomain* is **www**. You can't regex that.

-Other similar libraries focus primarily on URL building, parsing, and
-manipulation and additionally include public suffix domain parsing. PHP Domain
-Parser was built around accurate Public Suffix List based parsing from the very
-beginning, adding a URL object simply for the sake of completeness.
+PHP Domain Parser was built around accurate Public Suffix List based parsing from the very
+beginning. For URL parsing, building or manipulation please refer to [better libraries](https://packagist.org/packages/sabre/uri?q=uri%20rfc3986&p=0) who are more focus on those area of development

 System Requirements
 -------
@@ -50,7 +48,6 @@ Documentation

 ### Public Suffix Manager

-
 ~~~php
 <?php

@@ -240,40 +237,28 @@ namespace Pdp;

 final class Rules
 {
+    const ALL_DOMAINS = 'ALL_DOMAINS';
+    const ICANN_DOMAINS = 'ICANN_DOMAINS';
+    const PRIVATE_DOMAINS = 'PRIVATE_DOMAINS';
+
     public function __construct(array $rules)
-    public function resolve(string $domain = null, string $type = Domain::UNKNOWN_DOMAIN): Domain
+    public function resolve(string $domain = null, string $type = self::ALL_DOMAINS): Domain
 }
 ~~~

 The `Rules` constructor expects a `array` representation of the Public Suffix List. This `array` representation is constructed by the `Manager` and stored using a PSR-16 compliant cache.

-The `Rules` class resolves the submitted domain against the parsed rules from the PSL. This is done using the `Rules::resolve` method which returns a `Pdp\Domain` object. The method expect a valid domain and you can optionnally specify against which section of rules you want to validate the given domain. By default all section are used (ie `PRIVATE` and `ICANN`) if the submitted section is invalid or unknown, the resolver will fallback to use the entire list.
-
-~~~php
-<?php
-
-final class PublicSuffix
-{
-
-    const ICANN = 'ICANN_DOMAIN';
-    const PRIVATE = 'PRIVATE_DOMAIN';
-    const ALL = 'ALL_DOMAIN';
-    const UNKNOWN = 'UNKNOWN_DOMAIN';
+The `Rules` class resolves the submitted domain against the parsed rules from the PSL. This is done using the `Rules::resolve` method which returns a `Pdp\Domain` object. The method expects

-    public function __construct(?string $domain = null, string $type = self::UNKNOWN);
-    public function getContent(): ?string
-    public function isKnown(): bool;
-    public function isICANN(): bool;
-    public function isPrivate(): bool;
-}
-~~~
+- a valid domain name as a string
+- a string to optionnally specify which section of the PSL you want to validate the given domain against.
+By default all sections are used `Rules::ALL_DOMAINS` but you can validate your domain against the ICANN only section (`Rules::ICANN_DOMAINS` or the private section (`Rules::PRIVATE_DOMAINS`) of the PSL.

 ~~~php
 <?php

 final class Domain
 {
-    public function __construct(?string $domain = null, PublicSuffix $publicSuffix);
     public function getDomain(): ?string
     public function getPublicSuffix(): ?string
     public function getRegistrableDomain(): ?string
@@ -286,7 +271,7 @@ final class Domain

 The `Domain` getters method always return normalized value according to the domain status against the PSL rules.

-<p class="message-notice"><code>Domain::isValid</code> status depends on the PSL rules used. For the same domain, depending on the rules used a domain public suffix may be valid or not.</p>
+<p class="message-notice"><code>Domain::isKnown</code> status depends on the PSL rules used. For the same domain, depending on the rules used a domain public suffix may be known or not.</p>

 ~~~php
 <?php
@@ -305,23 +290,23 @@ $domain->getDomain(); //returns 'www.ulb.ac.be'
 $domain->getPublicSuffix(); //returns 'ac.be'
 $domain->getRegistrableDomain(); //returns 'ulb.ac.be'
 $domain->getSubDomain(); //returns 'www'
-$domain->isValid(); //returns true
+$domain->isKnown(); //returns true
 $domain->isICANN(); //returns true
 $domain->isPrivate(); //returns false

-//let's resolve the same URI againts the PRIVATE DOMAIN SECTION
+//let's resolve the same domain against the PRIVATE DOMAIN SECTION

-$domain = $rules->resolve('www.ulb.ac.be', Domain::PRIVATE_DOMAIN);
+$domain = $rules->resolve('www.ulb.ac.be', Rules::PRIVATE_DOMAINS);
 $domain->getDomain(); //returns 'www.ulb.ac.be'
 $domain->getPublicSuffix(); //returns 'be'
 $domain->getRegistrableDomain(); //returns 'ac.be'
 $domain->getSubDomain(); //returns 'www.ulb'
-$domain->isValid(); //returns false
+$domain->isKnown(); //returns false
 $domain->isICANN(); //returns false
 $domain->isPrivate(); //returns false
 ~~~

-<p class="message-warning"><strong>Warning:</strong> Some people use the PSL to determine what is a valid domain name and what isn't. This is dangerous, particularly in these days where new gTLDs are arriving at a rapid pace, if your software does not regularly receive PSL updates, because it will erroneously think new gTLDs are not valid. The DNS is the proper source for this information. If you must use it for this purpose, please do not bake static copies of the PSL into your software with no update mechanism.</p>
+<p class="message-warning"><strong>Warning:</strong> Some people use the PSL to determine what is a valid domain name and what isn't. This is dangerous, particularly in these days where new gTLDs are arriving at a rapid pace, if your software does not regularly receive PSL updates, it may erroneously think new gTLDs are not known. The DNS is the proper source for this information. If you must use it for this purpose, please do not bake static copies of the PSL into your software with no update mechanism.</p>

 Contributing
 -------

composer.json

Lines changed: 5 additions & 3 deletions
@@ -29,11 +29,12 @@
     "keywords": [
         "Public Suffix List",
         "domain parsing",
-        "url parsing"
+        "icann",
+        "idn",
+        "psl"
     ],
     "require": {
         "php": ">=7.0",
-        "ext-curl": "*",
         "ext-intl": "*",
         "psr/simple-cache": "^1"
     },
@@ -43,7 +44,8 @@
         "friendsofphp/php-cs-fixer": "^2.7"
     },
     "suggest": {
-        "psr/simple-cache-implementation": "To enable using other cache providers"
+        "psr/simple-cache-implementation": "To enable using other cache providers",
+        "ext-curl": "To use the package http client"
     },
     "autoload": {
         "psr-4": {

data/pdp-PSL-FULL-5a3cc7f81795bb2e48e848af42d287b4.cache

Lines changed: 1 addition & 1 deletion
Large diffs are not rendered by default.

src/Parser.php renamed to src/Converter.php

Lines changed: 8 additions & 11 deletions
@@ -16,9 +16,9 @@
 /**
  * Public Suffix List Parser.
  *
- * This class parses the Public Suffix List
+ * This class convert the Public Suffix List into an associative, multidimensional array
  */
-final class Parser
+final class Converter
 {
     /**
      * Convert the Public Suffix List into
@@ -28,12 +28,9 @@ final class Parser
      *
      * @return array
      */
-    public function parse(string $content): array
+    public function convert(string $content): array
     {
-        $rules = [
-            PublicSuffix::ICANN => [],
-            PublicSuffix::PRIVATE => [],
-        ];
+        $rules = [Rules::ICANN_DOMAINS => [], Rules::PRIVATE_DOMAINS => []];
         $file = new SplTempFileObject();
         $file->fwrite($content);
         $file->setFlags(SplTempFileObject::DROP_NEW_LINE | SplTempFileObject::READ_AHEAD | SplTempFileObject::SKIP_EMPTY);
@@ -59,18 +56,18 @@ public function parse(string $content): array
     private function getSection(string $section, string $line): string
     {
         if ($section == '' && strpos($line, '// ===BEGIN ICANN DOMAINS===') === 0) {
-            return PublicSuffix::ICANN;
+            return Rules::ICANN_DOMAINS;
         }

-        if ($section == PublicSuffix::ICANN && strpos($line, '// ===END ICANN DOMAINS===') === 0) {
+        if ($section == Rules::ICANN_DOMAINS && strpos($line, '// ===END ICANN DOMAINS===') === 0) {
             return '';
         }

         if ($section == '' && strpos($line, '// ===BEGIN PRIVATE DOMAINS===') === 0) {
-            return PublicSuffix::PRIVATE;
+            return Rules::PRIVATE_DOMAINS;
         }

-        if ($section == PublicSuffix::PRIVATE && strpos($line, '// ===END PRIVATE DOMAINS===') === 0) {
+        if ($section == Rules::PRIVATE_DOMAINS && strpos($line, '// ===END PRIVATE DOMAINS===') === 0) {
             return '';
         }
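
The section tracking in `getSection` can be read as a small state machine driven by the PSL marker comments. The following is a minimal standalone sketch of that idea, not the library's code: the top-level constants stand in for `Rules::ICANN_DOMAINS` and `Rules::PRIVATE_DOMAINS`, the functions `nextSection` and `splitBySection` are hypothetical names, the fall-through `return $section` is an assumption (the diff does not show the end of `getSection`), and the result is a flat list per section rather than the multidimensional array the real `Converter` is described as producing.

```php
<?php

// Hypothetical stand-ins for Rules::ICANN_DOMAINS and Rules::PRIVATE_DOMAINS.
const ICANN_DOMAINS = 'ICANN_DOMAINS';
const PRIVATE_DOMAINS = 'PRIVATE_DOMAINS';

/**
 * Return the section the reader is in after seeing $line.
 * An empty string means "outside any section".
 */
function nextSection(string $section, string $line): string
{
    if ($section === '' && strpos($line, '// ===BEGIN ICANN DOMAINS===') === 0) {
        return ICANN_DOMAINS;
    }
    if ($section === ICANN_DOMAINS && strpos($line, '// ===END ICANN DOMAINS===') === 0) {
        return '';
    }
    if ($section === '' && strpos($line, '// ===BEGIN PRIVATE DOMAINS===') === 0) {
        return PRIVATE_DOMAINS;
    }
    if ($section === PRIVATE_DOMAINS && strpos($line, '// ===END PRIVATE DOMAINS===') === 0) {
        return '';
    }

    // Assumption: any other line leaves the current section unchanged.
    return $section;
}

/**
 * Group rule lines by section, skipping comments and blank lines.
 */
function splitBySection(string $content): array
{
    $rules = [ICANN_DOMAINS => [], PRIVATE_DOMAINS => []];
    $section = '';
    foreach (preg_split('/\R/', $content) as $line) {
        $line = trim($line);
        $section = nextSection($section, $line);
        if ($section === '' || $line === '' || strpos($line, '//') === 0) {
            continue;
        }
        $rules[$section][] = $line;
    }

    return $rules;
}

$psl = <<<PSL
// ===BEGIN ICANN DOMAINS===
be
ac.be
// ===END ICANN DOMAINS===
// ===BEGIN PRIVATE DOMAINS===
github.io
// ===END PRIVATE DOMAINS===
PSL;

$rules = splitBySection($psl);
```

With this input, `be` and `ac.be` land under the ICANN key and `github.io` under the private key, which is the same two-key shape that `Manager::refreshRules` checks for emptiness.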

src/Domain.php

Lines changed: 20 additions & 14 deletions
@@ -28,7 +28,7 @@ final class Domain
     private $domain;

     /**
-     * @var string|null
+     * @var PublicSuffix
      */
     private $publicSuffix;

@@ -52,23 +52,26 @@ public function __construct($domain = null, PublicSuffix $publicSuffix)
     {
         $this->domain = $domain;
         $this->publicSuffix = $publicSuffix;
-        $this->setRegistrableDomain();
-        $this->setSubDomain();
+        $this->registrableDomain = $this->setRegistrableDomain();
+        $this->subDomain = $this->setSubDomain();
     }

     /**
-     * Compute the registrable domain part
+     * Compute the registrable domain part.
+     *
+     * @return string|null
      */
     private function setRegistrableDomain()
     {
         if (!$this->hasRegistrableDomain()) {
             return;
         }

-        $countLabelsToRemove = count($this->publicSuffix) + 1;
+        $nbLabelsToRemove = count($this->publicSuffix) + 1;
         $domainLabels = explode('.', $this->domain);
-        $domain = implode('.', array_slice($domainLabels, count($domainLabels) - $countLabelsToRemove));
-        $this->registrableDomain = $this->normalize($domain);
+        $registrableDomain = implode('.', array_slice($domainLabels, count($domainLabels) - $nbLabelsToRemove));
+
+        return $this->normalize($registrableDomain);
     }

     /**
@@ -105,23 +108,26 @@ private function normalize(string $domain)
     }

     /**
-     * Compute the sub domain part
+     * Compute the sub domain part.
+     *
+     * @return string|null
      */
     private function setSubDomain()
     {
         if (!$this->hasRegistrableDomain()) {
-            return;
+            return null;
         }

+        $nbLabelsToRemove = count($this->publicSuffix) + 1;
         $domainLabels = explode('.', $this->domain);
         $countLabels = count($domainLabels);
-        $countLabelsToRemove = count($this->publicSuffix) + 1;
-        if ($countLabels === $countLabelsToRemove) {
-            return;
+        if ($countLabels === $nbLabelsToRemove) {
+            return null;
         }

-        $domain = implode('.', array_slice($domainLabels, 0, $countLabels - $countLabelsToRemove));
-        $this->subDomain = $this->normalize($domain);
+        $domain = implode('.', array_slice($domainLabels, 0, $countLabels - $nbLabelsToRemove));
+
+        return $this->normalize($domain);
     }

     /**
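
The label arithmetic shared by `setRegistrableDomain` and `setSubDomain` may be easier to follow in isolation. Below is a minimal sketch under stated assumptions: `splitDomain` is a hypothetical helper, the public suffix is passed as a plain string and its label count is taken with `explode` (the real class calls `count()` on a `PublicSuffix` object), the guard for too-short domains is a guess at what `hasRegistrableDomain()` covers, and normalization is left out.

```php
<?php

/**
 * Split a domain into [registrable domain, sub domain] for a given public suffix.
 * Either element is null when that part does not exist.
 */
function splitDomain(string $domain, string $publicSuffix): array
{
    $domainLabels = explode('.', $domain);
    $countLabels = count($domainLabels);

    // The registrable domain is the public suffix plus one extra label.
    $nbLabelsToRemove = count(explode('.', $publicSuffix)) + 1;

    // Assumption: a domain no longer than its suffix has nothing registrable.
    if ($countLabels < $nbLabelsToRemove) {
        return [null, null];
    }

    // Keep the last $nbLabelsToRemove labels.
    $registrableDomain = implode('.', array_slice($domainLabels, $countLabels - $nbLabelsToRemove));

    // Whatever precedes the registrable domain, if anything, is the sub domain.
    $subDomain = $countLabels === $nbLabelsToRemove
        ? null
        : implode('.', array_slice($domainLabels, 0, $countLabels - $nbLabelsToRemove));

    return [$registrableDomain, $subDomain];
}

list($registrableDomain, $subDomain) = splitDomain('www.ulb.ac.be', 'ac.be');
```

For `www.ulb.ac.be` with suffix `ac.be` this yields `ulb.ac.be` and `www`; with suffix `be` it yields `ac.be` and `www.ulb`, matching the two README examples in this commit.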

src/Manager.php

Lines changed: 4 additions & 4 deletions
@@ -96,11 +96,11 @@ private function getCacheKey(string $str): string
      */
     public function refreshRules(string $source_url = self::PSL_URL): bool
     {
-        static $parser;
-        $parser = $parser ?? new Parser();
+        static $converter;
+        $converter = $converter ?? new Converter();
         $content = $this->http->getContent($source_url);
-        $rules = $parser->parse($content);
-        if (empty($rules[PublicSuffix::ICANN]) || empty($rules[PublicSuffix::PRIVATE])) {
+        $rules = $converter->convert($content);
+        if (empty($rules[Rules::ICANN_DOMAINS]) || empty($rules[Rules::PRIVATE_DOMAINS])) {
             return false;
         }

src/PublicSuffix.php

Lines changed: 7 additions & 15 deletions
@@ -16,16 +16,12 @@
 /**
  * Public Suffix Value Object
  *
- * @author Jeremy Kendall <[email protected]>
  * @author Ignace Nyamagana Butera <[email protected]>
+ *
+ * @internal
  */
 final class PublicSuffix implements Countable
 {
-    const ICANN = 'ICANN_DOMAIN';
-    const PRIVATE = 'PRIVATE_DOMAIN';
-    const ALL = 'ALL_DOMAIN';
-    const UNKNOWN = 'UNKNOWN_DOMAIN';
-
     /**
      * @var string|null
      */
@@ -34,21 +30,17 @@ final class PublicSuffix implements Countable
     /**
      * @var string
      */
-    private $type = '';
+    private $type;

     /**
      * New instance.
      *
      * @param string|null $publicSuffix
      * @param string $type
      */
-    public function __construct(string $publicSuffix = null, string $type = self::UNKNOWN)
+    public function __construct(string $publicSuffix = null, string $type = '')
     {
         $this->publicSuffix = $publicSuffix;
-        if (!in_array($type, [self::UNKNOWN, self::PRIVATE, self::ICANN], true)) {
-            $type = self::UNKNOWN;
-        }
-
         $this->type = $type;
     }

@@ -89,7 +81,7 @@ public function count()
      */
     public function isKnown(): bool
     {
-        return self::UNKNOWN !== $this->type;
+        return '' !== $this->type;
     }

     /**
@@ -109,7 +101,7 @@ public function isKnown(): bool
      */
     public function isICANN(): bool
     {
-        return self::ICANN === $this->type;
+        return Rules::ICANN_DOMAINS === $this->type;
     }

     /**
@@ -129,7 +121,7 @@ public function isICANN(): bool
      */
     public function isPrivate(): bool
     {
-        return self::PRIVATE === $this->type;
+        return Rules::PRIVATE_DOMAINS === $this->type;
     }

     /**
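
After this change the type flag uses an empty string as its "unknown" sentinel and compares against constants owned by another class. The sketch below illustrates that pattern only: `SectionNames` and `SuffixType` are hypothetical stand-ins for `Rules` and `PublicSuffix`, and nothing beyond the three predicates shown in the diff is modelled.

```php
<?php

// Hypothetical stand-in for the constants now declared on Rules.
final class SectionNames
{
    const ICANN_DOMAINS = 'ICANN_DOMAINS';
    const PRIVATE_DOMAINS = 'PRIVATE_DOMAINS';
}

// Hypothetical stand-in for the type handling in PublicSuffix.
final class SuffixType
{
    /** @var string */
    private $type;

    public function __construct(string $type = '')
    {
        // No validation: the value is stored as given, as in the new constructor.
        $this->type = $type;
    }

    public function isKnown(): bool
    {
        // The empty string is the "unknown" sentinel.
        return '' !== $this->type;
    }

    public function isICANN(): bool
    {
        return SectionNames::ICANN_DOMAINS === $this->type;
    }

    public function isPrivate(): bool
    {
        return SectionNames::PRIVATE_DOMAINS === $this->type;
    }
}

$unknown = new SuffixType();
$icann = new SuffixType(SectionNames::ICANN_DOMAINS);
$other = new SuffixType('SOMETHING_ELSE');
```

One consequence worth noting: because the `in_array` check is gone, a value such as `'SOMETHING_ELSE'` reports `isKnown()` as true while both `isICANN()` and `isPrivate()` are false. That is presumably acceptable because the class is now marked `@internal` and is expected to be built only by the library itself, though the commit does not state this.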
