Merge pull request #258 from mnishz/pattern_for_Vim_8.1

h-east · web-flow · commit 97801bea9a2f · 2018-06-25T13:45:38.000+09:00
Update pattern from Vim 8.0 to 8.1
diff --git a/doc/pattern.jax b/doc/pattern.jax
@@ -1,4 +1,4 @@
-*pattern.txt*   For Vim バージョン 8.0.  Last change: 2017 Mar 05
+*pattern.txt*   For Vim バージョン 8.1.  Last change: 2018 Mar 13
 
 
 		  VIMリファレンスマニュアル	  by Bram Moolenaar
@@ -897,7 +897,7 @@ $	パターンの末尾、または "\|"、"\)"、"\n" ('magic' on時) の前に
 	ります。Vim は自動的にマッチの強調表示を更新したりしません。
 	|/\%#| で検索した後にカーソルを動かした場合と似ています。
 
-						*/\%l* */\%>l* */\%<l*
+						*/\%l* */\%>l* */\%<l* *E951*
 \%23l	指定した行にマッチします。
 \%<23l	指定した行より上にマッチします。
 \%>23l	指定した行より下にマッチします。
@@ -1065,33 +1065,40 @@ x	特別な意味のない文字は、その文字自身とマッチします
 	ド全体がパターンとなることに注意してください。例えば、":s/[/x/" は
 	"[/x" を検索します。置換はおこなわれません。"[" を検索して "x" に置換
 	するのではありません！
+
+								*E944* *E945*
 	コレクション文字列の先頭が "^" の場合、コレクションに含まれている文字
 	以外の文字がマッチします。"[^xyz]" は 'x'、'y'、'z' 以外の文字にマッチ
 	します。
 	- 2 つの文字で '-' を挟んで、ASCII 文字の範囲を指定できます。たとえ
-	  ば、"[0-9]" はすべての数字にマッチします。非ASCII 文字も指定できます
-	  が 2 つの文字の値の差が 256 を超えてはなりません。
+	  ば、"[0-9]" はすべての数字にマッチします。例えば [c-a] のように最初
+	  の文字が後ろの文字よりも大きい場合は E944 が発生します。非ASCII 文字
+	  も指定できますが、古い正規表現エンジンでは 2 つの文字の値の差が 256
+	  を超えてはなりません。例えば re=1 をセットした後に [\u3000-\u4000]
+	  で検索すると E945 が発生します。先頭に \%#=2 を追加することでこれを
+	  回避できます。
 	- 文字クラス表現を使って、その文字クラスが含んでいる文字を取り込むこと
 	  ができます。次の文字クラスがサポートされています。
-			  名前		含んでいるもの ~
-*[:alnum:]*		  [:alnum:]     ASCII の英数字
-*[:alpha:]*		  [:alpha:]     ASCII の英字
-*[:blank:]*		  [:blank:]     スペースと Tab 文字
-*[:cntrl:]*		  [:cntrl:]     コントロール文字
-*[:digit:]*		  [:digit:]     10 進数字
-*[:graph:]*		  [:graph:]     スペース以外の印字可能文字
-*[:lower:]*		  [:lower:]     小文字英字 ('ignorecase' がオンのとき
+		  名前	      関数	含んでいるもの ~
+*[:alnum:]*	  [:alnum:]   isalnum	ASCII の英数字
+*[:alpha:]*	  [:alpha:]   isalpha	ASCII の英字
+*[:blank:]*	  [:blank:]   		スペースとタブ
+*[:cntrl:]*	  [:cntrl:]   iscntrl	ASCII コントロール文字
+*[:digit:]*	  [:digit:]   		10 進数字、'0' から '9'
+*[:graph:]*	  [:graph:]   isgraph	スペース以外の ASCII 印字可能文字
+*[:lower:]*	  [:lower:]   (1)	小文字英字 ('ignorecase' がオンのとき
 					はすべての英字)
-*[:print:]*		  [:print:]     スペースを含む印字可能文字
-*[:punct:]*		  [:punct:]     ASCII の句読点
-*[:space:]*		  [:space:]     空白文字 (スペース、Tab、改ページ文字)
-*[:upper:]*		  [:upper:]     大文字英字 ('ignorecase' がオンのとき
+*[:print:]*	  [:print:]   (2)	スペースを含む印字可能文字
+*[:punct:]*	  [:punct:]   ispunct	ASCII の句読点
+*[:space:]*	  [:space:]   		空白文字: スペース、タブ、復帰コード、
+					改行コード、垂直タブ、改ページ
+*[:upper:]*	  [:upper:]   (3)	大文字英字 ('ignorecase' がオンのとき
 					はすべての英字)
-*[:xdigit:]*		  [:xdigit:]    16 進数字
-*[:return:]*		  [:return:]	<CR> 文字
-*[:tab:]*		  [:tab:]	<Tab> 文字
-*[:escape:]*		  [:escape:]	<Esc> 文字
-*[:backspace:]*		  [:backspace:]	<BS> 文字
+*[:xdigit:]*	  [:xdigit:]  		16 進数字: 0-9, a-f, A-F
+*[:return:]*	  [:return:]  		<CR> 文字
+*[:tab:]*	  [:tab:]     		<Tab> 文字
+*[:escape:]*	  [:escape:]  		<Esc> 文字
+*[:backspace:]*	  [:backspace:]		<BS> 文字
 	  角カッコで囲んだ文字クラス表現を、コレクションの角カッコ内に書きま
 	  す。たとえば、"[-./[:alnum:]_~]\+" は、UNIX のファイル名として妥当な
 	  パターンです。このパターンは、'-'、'.'、'/'、英数字、'_'、'~'、のど
@@ -1101,6 +1108,13 @@ x	特別な意味のない文字は、その文字自身とマッチします
 	  文字にも作用します。|two-engines| を参照。将来的にはこれらの項目は、
 	  マルチバイト文字に作用するようになるでしょう。現状 "alpha" の全てを
 	  得るには [[:lower:][:upper:]] を使う事ができます。
+
+	  "関数" 列はどのライブラリ関数が使われるかを示しています。実装はシス
+	  テムに依存します。特殊なものは以下の通りです:
+	  (1) ASCII には islower()、それ以外には、|+multi_byte| 機能付きでビル
+	  ドされた場合 Vim の組み込みルールが使用されます。
+	  (2) Vim の組み込みルールが使用されます。
+	  (3) (1)と同じですが、代わりに isupper() が使用されます。
 							*/[[=* *[==]*
 	- 等価クラス。これはその文字とほぼ同じ文字にマッチします。例えば、アク
 	  セントを無視するなど。これは Unicode、latin1、latin9 でのみ機能しま
@@ -1142,7 +1156,9 @@ x	特別な意味のない文字は、その文字自身とマッチします
 	- コレクションを使ったマッチングは遅くなることがあります。コレクション
 	  の文字と、テキストの文字を、それぞれ 1 文字ずつ比較する必要があるか
 	  らです。同じ意味のアトムが他にある場合は、それを使ってください。たと
-	  えば、"\d" は "[0-9]" よりも速く、同じ文字にマッチします。
+	  えば、"\d" は "[0-9]" よりも速く、同じ文字にマッチします。ただし新し
+	  い |NFA| 正規表現エンジンにおけるこれらの取り扱いは、古いものよりも
+	  高速です。
 
 						*/\%[]* *E69* *E70* *E369*
 \%[]	任意にマッチするアトム列です。これは常にマッチします。アトム単位で最長
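The collection range rules this commit adds to pattern.txt (E944 for a reversed range such as [c-a], E945 when the old engine meets a range whose values are more than 256 apart) boil down to a small check. This is a hedged Python sketch, not Vim's C source: the function name `check_range` and its `engine` argument (1 for re=1, the old engine; 2 for re=2, the NFA engine) are hypothetical, and the limit is assumed to be tested as `end > start + 256`.

```python
from typing import Optional


def check_range(start: str, end: str, engine: int = 2) -> Optional[str]:
    """Return the error a collection range such as [a-z] would raise, or None.

    Hypothetical model: engine=1 stands for the old backtracking engine
    (re=1) and engine=2 for the NFA engine (re=2).
    """
    lo, hi = ord(start), ord(end)
    if lo > hi:
        return "E944"  # reversed range such as [c-a], in either engine
    if engine == 1 and hi > lo + 256:
        return "E945"  # old engine only; "more than 256 apart" (assumed test)
    return None


print(check_range("0", "9"))                      # None
print(check_range("c", "a"))                      # E944
print(check_range("\u3000", "\u4000", engine=1))  # E945
print(check_range("\u3000", "\u4000", engine=2))  # None
```

Under these assumptions [\u3000-\u4000] spans 0x1000 (4096) code points, so only the old engine rejects it, which is why prepending \%#=2 (forcing the NFA engine) makes the search work.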
diff --git a/en/pattern.txt b/en/pattern.txt
@@ -1,4 +1,4 @@
-*pattern.txt*   For Vim version 8.0.  Last change: 2017 Mar 05
+*pattern.txt*   For Vim version 8.1.  Last change: 2018 Mar 13
 
 
 		  VIM REFERENCE MANUAL    by Bram Moolenaar
@@ -905,7 +905,7 @@ $	At end of pattern or in front of "\|", "\)" or "\n" ('magic' on):
 	becomes invalid.  Vim doesn't automatically update the matches.
 	Similar to moving the cursor for "\%#" |/\%#|.
 
-						*/\%l* */\%>l* */\%<l*
+						*/\%l* */\%>l* */\%<l* *E951*
 \%23l	Matches in a specific line.
 \%<23l	Matches above a specific line (lower line number).
 \%>23l	Matches below a specific line (higher line number).
@@ -1076,34 +1076,40 @@ x	A single character, with no special meaning, matches itself
 	":s/[/x/" searches for "[/x" and replaces it with nothing.  It does
 	not search for "[" and replaces it with "x"!
 
+								*E944* *E945*
 	If the sequence begins with "^", it matches any single character NOT
 	in the collection: "[^xyz]" matches anything but 'x', 'y' and 'z'.
 	- If two characters in the sequence are separated by '-', this is
 	  shorthand for the full list of ASCII characters between them.  E.g.,
-	  "[0-9]" matches any decimal digit.  Non-ASCII characters can be
-	  used, but the character values must not be more than 256 apart.
+	  "[0-9]" matches any decimal digit.  If the starting character
+	  exceeds the ending character, e.g. [c-a], E944 occurs.  Non-ASCII
+	  characters can be used, but in the old regexp engine the character
+	  values must not be more than 256 apart: with re=1, searching for
+	  [\u3000-\u4000] gives an E945 error.  Prepending \%#=2 avoids it.
 	- A character class expression is evaluated to the set of characters
 	  belonging to that character class.  The following character classes
 	  are supported:
-			  Name		Contents ~
-*[:alnum:]*		  [:alnum:]     ASCII letters and digits
-*[:alpha:]*		  [:alpha:]     ASCII letters
-*[:blank:]*		  [:blank:]     space and tab characters
-*[:cntrl:]*		  [:cntrl:]     control characters
-*[:digit:]*		  [:digit:]     decimal digits
-*[:graph:]*		  [:graph:]     printable characters excluding space
-*[:lower:]*		  [:lower:]     lowercase letters (all letters when
+		  Name	      Func	Contents ~
+*[:alnum:]*	  [:alnum:]   isalnum	ASCII letters and digits
+*[:alpha:]*	  [:alpha:]   isalpha	ASCII letters
+*[:blank:]*	  [:blank:]     	space and tab
+*[:cntrl:]*	  [:cntrl:]   iscntrl	ASCII control characters
+*[:digit:]*	  [:digit:]     	decimal digits '0' to '9'
+*[:graph:]*	  [:graph:]   isgraph	ASCII printable characters excluding
+					space
+*[:lower:]*	  [:lower:]   (1)	lowercase letters (all letters when
 					'ignorecase' is used)
-*[:print:]*		  [:print:]     printable characters including space
-*[:punct:]*		  [:punct:]     ASCII punctuation characters
-*[:space:]*		  [:space:]     whitespace characters
-*[:upper:]*		  [:upper:]     uppercase letters (all letters when
+*[:print:]*	  [:print:]   (2)	printable characters including space
+*[:punct:]*	  [:punct:]   ispunct	ASCII punctuation characters
+*[:space:]*	  [:space:]     	whitespace characters: space, tab, CR,
+					NL, vertical tab, form feed
+*[:upper:]*	  [:upper:]   (3)	uppercase letters (all letters when
 					'ignorecase' is used)
-*[:xdigit:]*		  [:xdigit:]    hexadecimal digits
-*[:return:]*		  [:return:]	the <CR> character
-*[:tab:]*		  [:tab:]	the <Tab> character
-*[:escape:]*		  [:escape:]	the <Esc> character
-*[:backspace:]*		  [:backspace:]	the <BS> character
+*[:xdigit:]*	  [:xdigit:]    	hexadecimal digits: 0-9, a-f, A-F
+*[:return:]*	  [:return:]		the <CR> character
+*[:tab:]*	  [:tab:]		the <Tab> character
+*[:escape:]*	  [:escape:]		the <Esc> character
+*[:backspace:]*	  [:backspace:]		the <BS> character
 	  The brackets in character class expressions are additional to the
 	  brackets delimiting a collection.  For example, the following is a
 	  plausible pattern for a UNIX filename: "[-./[:alnum:]_~]\+" That is,
@@ -1114,6 +1120,13 @@ x	A single character, with no special meaning, matches itself
 	  regexp engine.  See |two-engines|.  In the future these items may
 	  work for multi-byte characters.  For now, to get all "alpha"
 	  characters you can use: [[:lower:][:upper:]].
+
+	  The "Func" column shows which library function is used; its
+	  implementation depends on the system.  Numbered entries are special:
+	  (1) Uses islower() for ASCII and Vim's built-in rules for other
+	  characters when built with the |+multi_byte| feature.
+	  (2) Uses Vim's built-in rules.
+	  (3) As (1), but using isupper().
 							*/[[=* *[==]*
 	- An equivalence class.  This means that characters are matched that
 	  have almost the same meaning, e.g., when ignoring accents.  This
@@ -1153,7 +1166,8 @@ x	A single character, with no special meaning, matches itself
 	- Matching with a collection can be slow, because each character in
 	  the text has to be compared with each character in the collection.
 	  Use one of the other atoms above when possible.  Example: "\d" is
-	  much faster than "[0-9]" and matches the same characters.
+	  much faster than "[0-9]" and matches the same characters.  However,
+	  the new |NFA| regexp engine handles this faster than the old one.
 
 						*/\%[]* *E69* *E70* *E369*
 \%[]	A sequence of optionally matched atoms.  This always matches.
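The character class table and the UNIX filename example "[-./[:alnum:]_~]\+" can be read as plain set membership tests. The Python sketch below is a hypothetical, ASCII-only model: the `CLASSES` table and the `match_collection` function are illustrations, not Vim internals. Each class is a set built from the "Contents" column, [:lower:] and [:upper:] widen to all letters when `ignorecase` is true, a leading "^" negates the collection, a reversed range raises E944, and the multibyte behaviour in notes (1) to (3) is left out.

```python
import string

# Hypothetical ASCII-only model of the [:name:] classes in the table above.
# The sets follow the "Contents" column; multibyte rules are not modelled.
CLASSES = {
    "alnum": set(string.ascii_letters + string.digits),
    "alpha": set(string.ascii_letters),
    "blank": {" ", "\t"},
    "cntrl": {chr(c) for c in range(32)} | {chr(127)},
    "digit": set(string.digits),
    "graph": {chr(c) for c in range(33, 127)},  # printable, excluding space
    "lower": set(string.ascii_lowercase),
    "print": {chr(c) for c in range(32, 127)},  # printable, including space
    "punct": set(string.punctuation),
    "space": set(" \t\r\n\v\f"),                # space, tab, CR, NL, VT, FF
    "upper": set(string.ascii_uppercase),
    "xdigit": set(string.hexdigits),            # 0-9, a-f, A-F
    "return": {"\r"},
    "tab": {"\t"},
    "escape": {"\x1b"},
    "backspace": {"\b"},
}


def match_collection(body: str, ch: str, ignorecase: bool = False) -> bool:
    """Return True if ch matches the collection "[" + body + "]"."""
    negate = body.startswith("^")
    if negate:
        body = body[1:]
    matched = False
    i = 0
    while i < len(body):
        if body.startswith("[:", i):
            # Character class expression such as [:alnum:].
            end = body.index(":]", i + 2)
            name = body[i + 2:end]
            if ignorecase and name in ("lower", "upper"):
                name = "alpha"  # all letters when 'ignorecase' is used
            matched |= ch in CLASSES[name]
            i = end + 2
        elif i + 2 < len(body) and body[i + 1] == "-":
            # Range such as 0-9; a reversed range is rejected.
            start, stop = body[i], body[i + 2]
            if start > stop:
                raise ValueError("E944")
            matched |= start <= ch <= stop
            i += 3
        else:
            matched |= ch == body[i]  # literal character
            i += 1
    return matched != negate
```

For instance, `match_collection("[:lower:][:upper:]", "Q")` is true, which mirrors the [[:lower:][:upper:]] workaround for getting all "alpha" characters.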