Improve the ABP regex handling

I'm opening an issue since the pull request is closed and this is easier to discuss here.

I found where the ABP regex handling actually needs to be modified; the place I changed earlier was the wrong location.

I'm assuming the change needs to go in the code [here](https://github.com/serverless-dns/blocklists/blob/bd8e7431fc3001160fd637760f4c1585d743298a/download.py#L134).

It could possibly be changed to:

```
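import re


def extractDomains(txt, rgx, groupindex):
    domainlist = set()
    regexc = re.compile(rgx, re.M)

    for match in regexc.finditer(txt):
        g = match.groups()
        if len(g) <= groupindex:
            continue
        g2 = g[groupindex]
        # A group that did not take part in the match is None; skip it.
        if g2 is None:
            continue
        g2 = g2.strip()

        if g2.startswith("||") and g2.endswith("^"):
            # ABP-style entry: drop the leading "||" and the trailing "^",
            # then prefix "*." so it reads like a wildcard entry.
            domain = g2[2:-1].replace("^", "*")
            domain = "*." + domain
            domainlist.add(domain)

        else:
            # Keep any other non-empty entry that does not end with a dot.
            if g2 and g2[-1] != '.':
                domainlist.add(g2)

    if len(domainlist) <= 0:
        return ""

    return "\n".join(domainlist)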
```

The point of this change is to replace the leading `||` with `*.` and remove the trailing `^`, so the entry is treated like a standard wildcard entry. It doesn't handle rules that end with an option such as `$thirdparty`, but it should cover the standard DNS ABP cases.
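
To make the intended behaviour concrete, here is a minimal sketch that applies the same `||` / `^` rewrite to a few sample entries. The helper name `abp_to_wildcard` and the sample domains are made up for illustration; they are not part of `download.py`.

```python
def abp_to_wildcard(entry):
    """Hypothetical helper mirroring the ABP branch proposed above."""
    entry = entry.strip()
    if entry.startswith("||") and entry.endswith("^"):
        # Drop the leading "||" and trailing "^", then prefix "*.".
        return "*." + entry[2:-1].replace("^", "*")
    # Anything else (including rules with a "$..." suffix) is left unchanged.
    return entry


samples = ["||example.com^", "example.org", "||ads.example.net^$thirdparty"]
converted = [abp_to_wildcard(s) for s in samples]

for before, after in zip(samples, converted):
    print(before, "->", after)
```

The last sample shows the gap mentioned above: because the rule doesn't end with `^`, it passes through unconverted.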


Improve the ABP regex handling #150
