
Commit df355e6

Committed by Pietro Albini
Fix strip_urls not stripping domains with dashes (fixes #32)
Before this commit, a URL with dashes in the domain was not stripped correctly by the botogram.utils.strip_urls function, which caused issues in the syntax detector (it recognized URLs as Markdown). This commit fixes the regular expression used to strip URLs, and also adds some unit tests for that function to prevent issues like this one in the future.
1 parent e70a442 commit df355e6


3 files changed (+30, -1 lines)
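The commit message links the regex bug to the syntax detector. The sketch below is one plausible illustration of that failure, assuming `strip_urls` simply deletes every URL match and that the detector then searches the remaining text for Markdown markers such as `_` or `*`. `naive_markdown_check`, its marker set, and the sample URL path are hypothetical and are not botogram's actual detector; only the two URL patterns are copied from the `botogram/utils.py` diff.

```python
import re

# URL pattern before and after this commit, copied from the diff
OLD_URL_RE = re.compile(r"https?://(-\.)?([^\s/?\.#-]+\.?)+(/[^\s]*)?")
NEW_URL_RE = re.compile(r"https?://(-\.)?([^\s/?\.#]+\.?)+(/[^\s]*)?")

# Hypothetical detector rule: any of these characters suggests Markdown
_MARKDOWN_MARKERS = re.compile(r"[*_`\[]")


def naive_markdown_check(text, url_re):
    """Hypothetical: strip URLs with url_re, then look for Markdown markers."""
    stripped = url_re.sub("", text)
    return bool(_MARKDOWN_MARKERS.search(stripped))


message = "See http://www.ubuntu-it.org/some_page_name"
print(naive_markdown_check(message, OLD_URL_RE))  # True
print(naive_markdown_check(message, NEW_URL_RE))  # False
```

With the old pattern the match ends at the first dash, so `-it.org/some_page_name` is left in the text and its underscores look like Markdown emphasis; the new pattern removes the whole URL, path included.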


botogram/utils.py

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@
 _username_re = re.compile(r"\@([a-zA-Z0-9_]{5}[a-zA-Z0-9_]*)")
 _command_re = re.compile(r"^\/[a-zA-Z0-9_]+(\@[a-zA-Z0-9_]{5}[a-zA-Z0-9_]*)?$")
 _email_re = re.compile(r"[a-zA-Z0-9_\.\+\-]+\@[a-zA-Z0-9_\.\-]+\.[a-zA-Z]+")
-_url_re = re.compile(r"https?://(-\.)?([^\s/?\.#-]+\.?)+(/[^\s]*)?")
+_url_re = re.compile(r"https?://(-\.)?([^\s/?\.#]+\.?)+(/[^\s]*)?")
 
 # This small piece of global state will track if logbook was configured
 _logger_configured = False
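The fix removes `-` from the negated character class `[^\s/?\.#-]` that matches each domain label. With the dash excluded, a label stops at the first `-`, and the optional path group `(/[^\s]*)?` cannot start there either, so the whole match simply ends early. A quick comparison using only the standard `re` module (the helper `matched_url` is illustrative, not part of botogram):

```python
import re

# URL pattern before and after this commit, copied from the diff
OLD_URL_RE = re.compile(r"https?://(-\.)?([^\s/?\.#-]+\.?)+(/[^\s]*)?")
NEW_URL_RE = re.compile(r"https?://(-\.)?([^\s/?\.#]+\.?)+(/[^\s]*)?")


def matched_url(pattern, text):
    """Return the part of text that pattern matches at its start."""
    return pattern.match(text).group(0)


for url in ["http://www.example.com", "http://www.ubuntu-it.org"]:
    print(url)
    print("  old:", matched_url(OLD_URL_RE, url))
    print("  new:", matched_url(NEW_URL_RE, url))
```

For `http://www.example.com` both patterns match the whole string; for `http://www.ubuntu-it.org` the old one matches only `http://www.ubuntu`, which is why stripping left `-it.org` behind.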

docs/changelog.rst

Lines changed: 3 additions & 0 deletions
@@ -19,13 +19,16 @@ botogram 0.2
 * Renamed ``Bot.init_shared_memory`` to ``Bot.prepare_memory``
 * Renamed ``Component.add_shared_memory_initializer`` to
   ``Component.add_memory_preparer``
+* Fix the syntax detector checking URLs with dashes in the domain (`issue 32`_)
 
 The following things are now **deprecated**:
 
 * ``Bot.init_shared_memory``, and it will be removed in botogram 1.0
 * ``Component.add_shared_memory_initializer``, and it will be removed in
   botogram 1.0
 
+.. _issue 32: https://github.com/pietroalbini/botogram/issues/32
+
 .. _changelog-0.1.2:
 
 botogram 0.1.2

tests/test_utils.py

Lines changed: 26 additions & 0 deletions
@@ -17,6 +17,32 @@
 """
 
 
+def test_strip_urls():
+    # Standard HTTP url
+    assert botogram.utils.strip_urls("http://www.example.com") == ""
+
+    # Standard HTTPS url
+    assert botogram.utils.strip_urls("https://www.example.com") == ""
+
+    # HTTP url with dashes in the domain (issue #32)
+    assert botogram.utils.strip_urls("http://www.ubuntu-it.org") == ""
+
+    # HTTP url with a path
+    assert botogram.utils.strip_urls("http://example.com/~john/d/a.txt") == ""
+
+    # Standard email address
+    assert botogram.utils.strip_urls("[email protected]") == ""
+
+    # Email address with a comment (+something)
+    assert botogram.utils.strip_urls("[email protected]") == ""
+
+    # Email address with subdomains
+    assert botogram.utils.strip_urls("[email protected]") == ""
+
+    # Email address with dashes in the domain name (issue #32)
+    assert botogram.utils.strip_urls("[email protected]") == ""
+
+
 def test_format_docstr():
     # This docstring needs lots of cleanup...
     res = botogram.utils.format_docstr(STRANGE_DOCSTRING)
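The new test checks one input per `assert`. A table-driven variant is sketched below and runs without botogram installed: `strip_urls` here is a hypothetical stand-in that assumes the real function deletes every match of `_url_re` and then of `_email_re` (both patterns copied from `botogram/utils.py` after this commit). The URL inputs come from the test above; the email addresses and the sentence case are made-up examples.

```python
import re

# Patterns copied from botogram/utils.py after this commit
_email_re = re.compile(r"[a-zA-Z0-9_\.\+\-]+\@[a-zA-Z0-9_\.\-]+\.[a-zA-Z]+")
_url_re = re.compile(r"https?://(-\.)?([^\s/?\.#]+\.?)+(/[^\s]*)?")


def strip_urls(string):
    """Hypothetical stand-in: delete every URL, then every email address."""
    string = _url_re.sub("", string)
    return _email_re.sub("", string)


# (input, expected output) pairs
CASES = [
    ("http://www.example.com", ""),
    ("https://www.example.com", ""),
    ("http://www.ubuntu-it.org", ""),          # dashed domain (issue #32)
    ("http://example.com/~john/d/a.txt", ""),
    ("john@example.com", ""),                  # made-up address
    ("john+bots@mail.ubuntu-it.org", ""),      # made-up address with a dash
    ("Docs at http://www.ubuntu-it.org today", "Docs at  today"),
]


def test_strip_urls_table():
    for text, expected in CASES:
        # Include the input in the failure message to spot the broken case
        assert strip_urls(text) == expected, text


test_strip_urls_table()
```

Keeping the cases in a list makes adding a regression input, like the dashed domain from issue #32, a one-line change.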
