Commit dde965b

✨ Make text in resp-text optional (IMAP4rev2)
In RFC3501 (IMAP4rev1):
    resp-text = ["[" resp-text-code "]" SP] text

In RFC9051 (IMAP4rev2):
    resp-text = ["[" resp-text-code "]" SP] [text]

And in RFC9051 Appendix E:
    23. resp-text ABNF non-terminal was updated to allow for empty text.

In the spirit of Appendix E.23 (and based on some actual server responses I've seen over the years), I've leniently re-interpreted this as also allowing us to drop the trailing `SP` char after `[resp-text-code parsable code data]`, like so:
    resp-text = "[" resp-text-code "]" [SP [text]] / [text]

Actually, the original parser already _mostly_ behaved this way, because the original regexps for `T_TEXT` used `*` and not `+`. But, as I updated the parser in many other places to more closely match the RFCs, that broke this behavior.

This commit originally came _after_ many many other changes. While rebasing, I moved this commit first because that simplified later commits.

Also:
* ♻️ Add `Patterns` module, to organize regexps.
* ♻️ Use `Patterns::CharClassSubtraction` refinement to simplify exceptions.
* ♻️ Add `ParserUtils::Generator#def_char_matchers` to define `SP`, `LBRA`, `RBRA`.
* ♻️ Add `ParserUtils#{match,accept}_re` to replace `TEXT`, `CTEXT` lex states.
* ♻️ Remove unused `lex_state` kwarg from match
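
To make the difference between the three grammars concrete, here is a rough sketch that encodes each one as an anchored regexp and reports how a sample string splits into code and text. This is not how the parser works. The constant names, the `split_resp_text` helper, and the simplified `CODE` and `TEXT` patterns are all assumptions made for illustration.

```ruby
# Illustration only: simplified stand-ins, not the parser's own patterns.
TEXT = /[^\r\n]+/     # simplified: any run of chars other than CR and LF
CODE = /[^\]\r\n]+/   # simplified: any run of chars other than "]", CR, LF

RFC3501 = /\A(?:\[(?<code>#{CODE})\] )?(?<text>#{TEXT})\z/
RFC9051 = /\A(?:\[(?<code>#{CODE})\] )?(?<text>#{TEXT})?\z/
LENIENT = /\A(?:\[(?<code>#{CODE})\](?: (?<text>#{TEXT})?)?|(?<plain>#{TEXT})?)\z/

# Hypothetical helper: returns [code, text], or nil when the grammar rejects str.
def split_resp_text(grammar, str)
  m = grammar.match(str) or return nil
  text = m[:text]
  text ||= m[:plain] if m.names.include?("plain")
  [m[:code], text.to_s]
end

["[ALERT] disk full", "[ALERT] ", "[ALERT]", ""].each do |sample|
  puts format("%-22p rev1=%p rev2=%p lenient=%p", sample,
              split_resp_text(RFC3501, sample),
              split_resp_text(RFC9051, sample),
              split_resp_text(LENIENT, sample))
end
```

Under the two RFC grammars, a bare `[ALERT]` with no trailing space can only be read as plain text, so the response code is lost. The lenient form is the only one of the three that still recognizes the code there.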
1 parent bd2ddc0 commit dde965b

2 files changed: +159 -95 lines changed


lib/net/imap/response_parser.rb

Lines changed: 86 additions & 78 deletions
@@ -9,6 +9,7 @@ class IMAP < Protocol
     # Parses an \IMAP server response.
     class ResponseParser
       include ParserUtils
+      extend ParserUtils::Generator

       # :call-seq: Net::IMAP::ResponseParser.new -> Net::IMAP::ResponseParser
       def initialize
@@ -38,9 +39,6 @@ def parse(str)

       EXPR_BEG = :EXPR_BEG # the default, used in most places
       EXPR_DATA = :EXPR_DATA # envelope, body(structure), namespaces
-      EXPR_TEXT = :EXPR_TEXT # text, after 'resp-text-code "]"'
-      EXPR_RTEXT = :EXPR_RTEXT # resp-text, before "["
-      EXPR_CTEXT = :EXPR_CTEXT # resp-text-code, after 'atom SP'

       T_SPACE = :SPACE # atom special
       T_ATOM = :ATOM # atom (subset of astring chars)
@@ -60,6 +58,60 @@ def parse(str)
       T_TEXT = :TEXT # any char except CRLF
       T_EOF = :EOF # end of response string

+      module Patterns
+
+        module CharClassSubtraction
+          refine Regexp do
+            def -(rhs); /[#{source}&&[^#{rhs.source}]]/n.freeze end
+          end
+        end
+        using CharClassSubtraction
+
+        # From RFC5234, "Augmented BNF for Syntax Specifications: ABNF"
+        # >>>
+        #   ALPHA = %x41-5A / %x61-7A ; A-Z / a-z
+        #   CHAR = %x01-7F
+        #   CRLF = CR LF
+        #           ; Internet standard newline
+        #   CTL = %x00-1F / %x7F
+        #           ; controls
+        #   DIGIT = %x30-39
+        #           ; 0-9
+        #   DQUOTE = %x22
+        #           ; " (Double Quote)
+        #   HEXDIG = DIGIT / "A" / "B" / "C" / "D" / "E" / "F"
+        #   OCTET = %x00-FF
+        #   SP = %x20
+        module RFC5234
+          ALPHA = /[A-Za-z]/n
+          CHAR = /[\x01-\x7f]/n
+          CRLF = /\r\n/n
+          CTL = /[\x00-\x1F\x7F]/n
+          DIGIT = /\d/n
+          DQUOTE = /"/n
+          HEXDIG = /\h/
+          OCTET = /[\x00-\xFF]/n # not using /./m for embedding purposes
+          SP = / /n
+        end
+
+        include RFC5234
+
+        # resp-specials = "]"
+        RESP_SPECIALS = /[\]]/n
+
+        # TEXT-CHAR = <any CHAR except CR and LF>
+        TEXT_CHAR = CHAR - /[\r\n]/
+
+        # resp-text-code = ... / atom [SP 1*<any TEXT-CHAR except "]">]
+        CODE_TEXT_CHAR = TEXT_CHAR - RESP_SPECIALS
+        CODE_TEXT = /#{CODE_TEXT_CHAR}+/n
+
+        # RFC3501:
+        #   text = 1*TEXT-CHAR
+        TEXT_rev1 = /#{TEXT_CHAR}+/
+
+      end
+
       # the default, used in most places
       BEG_REGEXP = /\G(?:\
 (?# 1: SPACE )( +)|\
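
The `CharClassSubtraction` refinement added above builds a new character class by intersecting the left-hand class with the negation of the right-hand one (`[lhs&&[^rhs]]`). The following is a small standalone sketch of the same idea. The `PatternsDemo` wrapper module is a name invented here so the refinement stays scoped to the example.

```ruby
module CharClassSubtraction
  refine Regexp do
    # lhs - rhs: a character class matching whatever lhs matches, minus rhs.
    def -(rhs); /[#{source}&&[^#{rhs.source}]]/n.freeze end
  end
end

module PatternsDemo
  using CharClassSubtraction

  CHAR           = /[\x01-\x7f]/n
  TEXT_CHAR      = CHAR - /[\r\n]/        # CHAR except CR and LF
  CODE_TEXT_CHAR = TEXT_CHAR - /[\]]/     # TEXT-CHAR except "]"
end

# Subtraction can be chained, because each result is itself a character class.
puts PatternsDemo::CODE_TEXT_CHAR.source
puts PatternsDemo::TEXT_CHAR.match?("]")       # "]" is still a TEXT-CHAR
puts PatternsDemo::CODE_TEXT_CHAR.match?("]")  # but not a code-text char
```

Because the operator is a refinement, `Regexp#-` exists only in scopes that call `using`, so the rest of the program does not see it.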
@@ -90,20 +142,18 @@ def parse(str)
 (?# 7: RPAR )(\)))/ni

       # text, after 'resp-text-code "]"'
-      TEXT_REGEXP = /\G(?:\
-(?# 1: TEXT )([^\x00\r\n]*))/ni
-
-      # resp-text, before "["
-      RTEXT_REGEXP = /\G(?:\
-(?# 1: LBRA )(\[)|\
-(?# 2: TEXT )([^\x00\r\n]*))/ni
+      TEXT_REGEXP = /\G(#{Patterns::TEXT_rev1})/n

       # resp-text-code, after 'atom SP'
-      CTEXT_REGEXP = /\G(?:\
-(?# 1: TEXT )([^\x00\r\n\]]*))/ni
+      CTEXT_REGEXP = /\G(#{Patterns::CODE_TEXT})/n

       Token = Struct.new(:symbol, :value)

+      def_char_matchers :SP, " ", :T_SPACE
+
+      def_char_matchers :lbra, "[", :T_LBRA
+      def_char_matchers :rbra, "]", :T_RBRA
+
       # atom = 1*ATOM-CHAR
       #
       # TODO: match atom entirely by regexp (in the "lexer")
@@ -1143,20 +1193,27 @@ def namespace_response_extensions
       # text = 1*TEXT-CHAR
       # TEXT-CHAR = <any CHAR except CR and LF>
       def text
-        match(T_TEXT, lex_state: EXPR_TEXT).value
+        match_re(TEXT_REGEXP, "text")[0]
       end

-      # resp-text = ["[" resp-text-code "]" SP] text
+      # an "accept" version of #text
+      def text?
+        accept_re(TEXT_REGEXP)&.[](0)
+      end
+
+      # RFC3501:
+      #   resp-text = ["[" resp-text-code "]" SP] text
+      # RFC9051:
+      #   resp-text = ["[" resp-text-code "]" SP] [text]
+      #
+      # We leniently re-interpret this as
+      #   resp-text = "[" resp-text-code "]" [SP [text]] / [text]
       def resp_text
-        token = match(T_LBRA, T_TEXT, lex_state: EXPR_RTEXT)
-        case token.symbol
-        when T_LBRA
-          code = resp_text_code
-          match(T_RBRA)
-          accept_space # violating RFC
-          ResponseText.new(code, text)
-        when T_TEXT
-          ResponseText.new(nil, token.value)
+        if lbra?
+          code = resp_text_code; rbra
+          ResponseText.new(code, SP? && text? || "")
+        else
+          ResponseText.new(nil, text? || "")
         end
       end

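
The expression `SP? && text? || ""` groups as `(SP? && text?) || ""`: text is attempted only after a space was consumed, and both a missing space and missing text fall back to the empty string. Below is a minimal standalone sketch of that control flow. It uses `StringScanner` from the standard library in place of the parser's own `@str`/`@pos` handling, and the names `ResponseTextSketch` and `parse_resp_text` as well as the simplified code pattern are assumptions for illustration.

```ruby
require "strscan"

# Hypothetical stand-in for ResponseText, for illustration only.
ResponseTextSketch = Struct.new(:code, :text)

def parse_resp_text(str)
  s = StringScanner.new(str)
  if s.skip(/\[/)
    # Simplified resp-text-code: any run of chars other than "]", CR, LF.
    code = s.scan(/[^\]\r\n]+/) or raise ArgumentError, "invalid resp-text-code"
    s.skip(/\]/) or raise ArgumentError, 'expected "]"'
    # Same shape as `SP? && text? || ""` in the diff above.
    text = s.skip(/ /) && s.scan(/[^\r\n]+/) || ""
    ResponseTextSketch.new(code, text)
  else
    ResponseTextSketch.new(nil, s.scan(/[^\r\n]+/) || "")
  end
end

["[ALERT] disk full", "[ALERT] ", "[ALERT]", "hello", ""].each do |sample|
  p [sample, parse_resp_text(sample).to_a]
end
```

The sketch does not check for leftover input after the resp-text. In the real parser, whatever follows is left for the caller to match.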
@@ -1198,15 +1255,19 @@ def resp_text_code
           token = lookahead
           if token.symbol == T_SPACE
             shift_token
-            token = match(T_TEXT, lex_state: EXPR_CTEXT)
-            result = ResponseCode.new(name, token.value)
+            result = ResponseCode.new(name, text_chars_except_rbra)
           else
             result = ResponseCode.new(name, nil)
           end
         end
         return result
       end

+      # 1*<any TEXT-CHAR except "]">
+      def text_chars_except_rbra
+        match_re(CTEXT_REGEXP, '1*<any TEXT-CHAR except "]">')[0]
+      end
+
       def charset_list
         result = []
         if accept(T_SPACE)
@@ -1447,21 +1508,6 @@ def nil_atom

       SPACES_REGEXP = /\G */n

-      # This advances @pos directly so it's safe before changing @lex_state.
-      def accept_space
-        if @token
-          if @token.symbol == T_SPACE
-            shift_token
-            " "
-          end
-        elsif @str[@pos] == " "
-          @pos += 1
-          " "
-        end
-      end
-
-      alias SP? accept_space
-
       # The RFC is very strict about this and usually we should be too.
       # But skipping spaces is usually a safe workaround for buggy servers.
       #
@@ -1549,44 +1595,6 @@ def next_token
             @str.index(/\S*/n, @pos)
             parse_error("unknown token - %s", $&.dump)
           end
-        when EXPR_TEXT
-          if @str.index(TEXT_REGEXP, @pos)
-            @pos = $~.end(0)
-            if $1
-              return Token.new(T_TEXT, $+)
-            else
-              parse_error("[Net::IMAP BUG] TEXT_REGEXP is invalid")
-            end
-          else
-            @str.index(/\S*/n, @pos)
-            parse_error("unknown token - %s", $&.dump)
-          end
-        when EXPR_RTEXT
-          if @str.index(RTEXT_REGEXP, @pos)
-            @pos = $~.end(0)
-            if $1
-              return Token.new(T_LBRA, $+)
-            elsif $2
-              return Token.new(T_TEXT, $+)
-            else
-              parse_error("[Net::IMAP BUG] RTEXT_REGEXP is invalid")
-            end
-          else
-            @str.index(/\S*/n, @pos)
-            parse_error("unknown token - %s", $&.dump)
-          end
-        when EXPR_CTEXT
-          if @str.index(CTEXT_REGEXP, @pos)
-            @pos = $~.end(0)
-            if $1
-              return Token.new(T_TEXT, $+)
-            else
-              parse_error("[Net::IMAP BUG] CTEXT_REGEXP is invalid")
-            end
-          else
-            @str.index(/\S*/n, @pos) #/
-            parse_error("unknown token - %s", $&.dump)
-          end
         else
           parse_error("invalid @lex_state - %s", @lex_state.inspect)
         end

lib/net/imap/response_parser/parser_utils.rb

Lines changed: 73 additions & 17 deletions
@@ -8,26 +8,58 @@ class ResponseParser
       # (internal API, subject to change)
       module ParserUtils # :nodoc:

-        private
+        module Generator
+
+          LOOKAHEAD = "(@token ||= next_token)"
+          SHIFT_TOKEN = "(@token = nil)"
+
+          # we can skip lexer for single character matches, as a shortcut
+          def def_char_matchers(name, char, token)
+            match_name = name.match(/\A[A-Z]/) ? "#{name}!" : name
+            char = char.dump
+            class_eval <<~RUBY, __FILE__, __LINE__ + 1
+              # frozen_string_literal: true

-        def match(*args, lex_state: @lex_state)
-          if @token && lex_state != @lex_state
-            parse_error("invalid lex_state change to %s with unconsumed token",
-                        lex_state)
+              # like accept(token_symbols); returns token or nil
+              def #{name}?
+                if @token&.symbol == #{token}
+                  #{SHIFT_TOKEN}
+                  #{char}
+                elsif !@token && @str[@pos] == #{char}
+                  @pos += 1
+                  #{char}
+                end
+              end
+
+              # like match(token_symbols); returns token or raises parse_error
+              def #{match_name}
+                if @token&.symbol == #{token}
+                  #{SHIFT_TOKEN}
+                  #{char}
+                elsif !@token && @str[@pos] == #{char}
+                  @pos += 1
+                  #{char}
+                else
+                  parse_error("unexpected %s (expected %p)",
+                              @token&.symbol || @str[@pos].inspect, #{char})
+                end
+              end
+            RUBY
           end
-          begin
-            @lex_state, original_lex_state = lex_state, @lex_state
-            token = lookahead
-            unless args.include?(token.symbol)
-              parse_error('unexpected token %s (expected %s)',
-                          token.symbol.id2name,
-                          args.collect {|i| i.id2name}.join(" or "))
-            end
-            shift_token
-            return token
-          ensure
-            @lex_state = original_lex_state
+
+        end
+
+        private
+
+        def match(*args)
+          token = lookahead
+          unless args.include?(token.symbol)
+            parse_error('unexpected token %s (expected %s)',
+                        token.symbol.id2name,
+                        args.collect {|i| i.id2name}.join(" or "))
           end
+          shift_token
+          token
         end

         # like match, but does not raise error on failure.
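
`def_char_matchers` generates a pair of methods per character through `class_eval` with a heredoc: a `name?` variant that returns the character or `nil`, and a raising variant named `name` (or `name!` when the name starts with a capital letter, since `SP` alone would read as a constant). The following is a reduced sketch of that technique with the lexer and token handling removed. `MatcherGenerator`, `TinyParser`, and the use of `ArgumentError` are assumptions made for this example.

```ruby
module MatcherGenerator
  # Defines NAME? (returns the char or nil) and NAME / NAME! (returns the
  # char or raises). Capitalized names get the "!" suffix.
  def def_char_matchers(name, char)
    match_name = name.match?(/\A[A-Z]/) ? "#{name}!" : name
    char = char.dump # quoted Ruby literal, safe to paste into generated source
    class_eval <<~RUBY, __FILE__, __LINE__ + 1
      def #{name}?
        if @str[@pos] == #{char}
          @pos += 1
          #{char}
        end
      end

      def #{match_name}
        #{name}? or raise ArgumentError, "expected " + #{char}.inspect
      end
    RUBY
  end
end

class TinyParser
  extend MatcherGenerator

  def_char_matchers :SP,   " "
  def_char_matchers :lbra, "["
  def_char_matchers :rbra, "]"

  attr_reader :pos

  def initialize(str)
    @str = str
    @pos = 0
  end
end

parser = TinyParser.new("[ ]")
p [parser.lbra?, parser.SP?, parser.rbra, parser.pos]
```

Generating the methods as source text means each one compares against a literal character instead of looking up an argument at call time. That is presumably the "shortcut" the comment in the diff refers to.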
@@ -42,6 +74,14 @@ def accept(*args)
           end
         end

+        # To be used conditionally:
+        #   assert_no_lookahead if Net::IMAP.debug
+        def assert_no_lookahead
+          @token.nil? or
+            parse_error("assertion failed: expected @token.nil?, actual %s: %p",
+                        @token.symbol, @token.value)
+        end
+
         # like accept, without consuming the token
         def lookahead?(*symbols)
           @token if symbols.include?((@token ||= next_token)&.symbol)
@@ -51,6 +91,22 @@ def lookahead
           @token ||= next_token
         end

+        def accept_re(re)
+          assert_no_lookahead if Net::IMAP.debug
+          re.match(@str, @pos) and @pos = $~.end(0)
+          $~
+        end
+
+        def match_re(re, name)
+          assert_no_lookahead if Net::IMAP.debug
+          if re.match(@str, @pos)
+            @pos = $~.end(0)
+            $~
+          else
+            parse_error("invalid #{name}")
+          end
+        end
+
         def shift_token
           @token = nil
         end
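
`accept_re` and `match_re` depend on the regexps starting with `\G`: `Regexp#match(str, pos)` on its own searches forward from `pos`, while `\G` pins the match to exactly that offset. The sketch below shows that behavior in a small standalone cursor class. `CursorSketch` and its error type are assumptions for this example, and the lookahead assertion is omitted.

```ruby
class CursorSketch
  attr_reader :pos

  def initialize(str)
    @str = str
    @pos = 0
  end

  # Returns MatchData and advances @pos, or returns nil and leaves @pos alone.
  def accept_re(re)
    md = re.match(@str, @pos) or return nil
    @pos = md.end(0)
    md
  end

  # Like accept_re, but raises when the pattern does not match at @pos.
  def match_re(re, name)
    accept_re(re) or raise ArgumentError, "invalid #{name} at offset #{@pos}"
  end
end

cursor = CursorSketch.new("OK done")
p cursor.match_re(/\G[A-Z]+/, "atom")[0]
p cursor.accept_re(/\G[a-z]+/)             # anchored at the space, so no match
p(/[a-z]+/.match("OK done", 2)&.[](0))     # unanchored: skips ahead to "done"
```

Without the `\G` anchor, a pattern could silently skip over characters it was never meant to consume. That is why the sketch keeps the anchor in every pattern passed to the cursor.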
