tanloong
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 7 additions & 1 deletion
diff --git a/README.md b/README.md
Lines changed: 8 additions & 8 deletions
diff --git a/README_zh_cn.md b/README_zh_cn.md
Lines changed: 8 additions & 8 deletions
diff --git a/README_zh_tw.md b/README_zh_tw.md
Lines changed: 8 additions & 8 deletions
diff --git a/neosca/about.py b/neosca/about.py
Lines changed: 1 addition & 1 deletion
diff --git a/neosca/depends_installer.py b/neosca/depends_installer.py
Lines changed: 18 additions & 17 deletions
diff --git a/neosca/lca/lca.py b/neosca/lca/lca.py
Lines changed: 6 additions & 1 deletion
diff --git a/neosca/lca/main.py b/neosca/lca/main.py
Lines changed: 2 additions & 1 deletion
diff --git a/neosca/neosca.py b/neosca/neosca.py
Lines changed: 6 additions & 5 deletions
@@ -1,6 +1,12 @@
 <div align="center"><h1>Changelog</h1></div>
 
-## [0.0.49](https://github.com/tanloong/neosca/releases/tag/0.0.48) (19 August 2023)
+## [0.0.50](https://github.com/tanloong/neosca/releases/tag/0.0.50) (23 August 2023)
+
+### Bug fixes
+
++ Fix `--reserve-matched` not working since 0.0.48
+
+## [0.0.49](https://github.com/tanloong/neosca/releases/tag/0.0.49) (19 August 2023)
 
 ### Bug fixes
 
 
@@ -17,7 +17,7 @@
 [繁體中文](https://github.com/tanloong/neosca/blob/master/README_zh_tw.md) |
 English
 
-NeoSCA is a fork of [Xiaofei Lu](http://personal.psu.edu/xxl13/index.html)'s [L2 Syntactic Complexity Analyzer](http://personal.psu.edu/xxl13/downloads/l2sca.html) (L2SCA), with added support for Windows and an improved command-line interface for easier usage. NeoSCA accepts written English texts and computes the following measures:
+NeoSCA is a fork of [Xiaofei Lu](http://personal.psu.edu/xxl13/index.html)'s [L2 Syntactic Complexity Analyzer](http://personal.psu.edu/xxl13/downloads/l2sca.html) (L2SCA), with added support for Windows and an improved command-line interface for easier usage. NeoSCA is written by Tan, Long (谭龙). It accepts written English texts and computes the following measures:
 
 <details>
 
@@ -168,8 +168,8 @@ This ensures that the entire filename including the spaces, is interpreted as a
 Specify the input directory after `nsca`.
 
 ```
-nsca samples/ # analyze every txt/docx file under the "samples/" directory
-nsca samples/ --ftype txt # analyze only txt files under "samples/"
+nsca samples/              # analyze every txt/docx file under the "samples/" directory
+nsca samples/ --ftype txt  # analyze only txt files under "samples/"
 nsca samples/ --ftype docx # analyze only docx files under "samples/"
 ```
 
@@ -184,8 +184,8 @@ You can also use [wildcards](https://www.gnu.org/savannah-checkouts/gnu/clisp/im
 
 ```sh
 cd ./samples/
-nsca sample*.txt # every file whose name starts with "sample" and ends with ".txt"
-nsca sample[1-9].txt sample10.txt # sample1.txt -- sample10.txt
+nsca sample*.txt                                           # every file whose name starts with "sample" and ends with ".txt"
+nsca sample[1-9].txt sample10.txt                          # sample1.txt -- sample10.txt
 nsca sample10[1-9].txt sample1[1-9][0-9].txt sample200.txt # sample101.txt -- sample200.txt
 ```
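
To check which file names the bracket patterns above select, here is a minimal Python sketch using the standard-library `fnmatch` module. Its handling of `*` and `[1-9]` is close to shell globbing for these simple cases, though not identical in general. The file names are hypothetical and `nsca` itself is not invoked.

```python
from fnmatch import fnmatchcase

# Hypothetical file names, used only to illustrate the patterns.
names = [f"sample{i}.txt" for i in range(1, 201)]


def select(patterns, candidates):
    """Return the candidates that match at least one pattern, in order."""
    return [n for n in candidates if any(fnmatchcase(n, p) for p in patterns)]


# sample1.txt -- sample10.txt
first_ten = select(["sample[1-9].txt", "sample10.txt"], names)

# sample101.txt -- sample200.txt
second_hundred = select(
    ["sample10[1-9].txt", "sample1[1-9][0-9].txt", "sample200.txt"], names
)

print(first_ten[0], first_ten[-1], len(first_ten))
print(second_hundred[0], second_hundred[-1], len(second_hundred))
```

Each bracket expression matches exactly one character, which is why the range 101 to 200 needs three separate patterns.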
 
@@ -414,7 +414,7 @@ BibTeX
 
 ```BibTeX
 @misc{tan2022neosca,
-title        = {NeoSCA: A Fork of L2 Syntactic Complexity Analyzer, version 0.0.49},
+title        = {NeoSCA: A Fork of L2 Syntactic Complexity Analyzer, version 0.0.50},
 author       = {Long Tan},
 howpublished = {\url{https://github.com/tanloong/neosca}},
 year         = {2022}
@@ -429,7 +429,7 @@ year         = {2022}
 APA (7th edition)
 </summary>
 
-<pre>Tan, L. (2022). <i>NeoSCA</i> (version 0.0.49) [Computer software]. Github. https://github.com/tanloong/neosca</pre>
+<pre>Tan, L. (2022). <i>NeoSCA</i> (version 0.0.50) [Computer software]. Github. https://github.com/tanloong/neosca</pre>
 
 </details>
 
@@ -439,7 +439,7 @@ APA (7th edition)
 MLA (9th edition)
 </summary>
 
-<pre>Tan, Long. <i>NeoSCA</i>. version 0.0.49, GitHub, 2022, https://github.com/tanloong/neosca.</pre>
+<pre>Tan, Long. <i>NeoSCA</i>. version 0.0.50, GitHub, 2022, https://github.com/tanloong/neosca.</pre>
 
 </details>
 
 
@@ -17,7 +17,7 @@
 [繁體中文](https://github.com/tanloong/neosca/blob/master/README_zh_tw.md) |
 [English](https://github.com/tanloong/neosca#readme)
 
-NeoSCA 是 [Xiaofei Lu](http://personal.psu.edu/xxl13/index.html) 的 [L2 Syntactic Complexity Analyzer (L2SCA)](http://personal.psu.edu/xxl13/downloads/l2sca.html) 的复刻版，添加了对 Windows 的支持和更多的命令行选项。NeoSCA 对英文语料统计以下内容：
+NeoSCA 是 [Xiaofei Lu](http://personal.psu.edu/xxl13/index.html) 的 [L2 Syntactic Complexity Analyzer (L2SCA)](http://personal.psu.edu/xxl13/downloads/l2sca.html) 的复刻版，添加了对 Windows 的支持和更多的命令行选项，作者谭龙。NeoSCA 对英文语料统计以下内容：
 
 <details>
 
@@ -161,8 +161,8 @@ nsca "./samples/sample 1.txt"
 在 `nsca` 的右边指定输入文件夹。
 
 ```
-nsca samples/ # 分析 samples/ 文件夹下所有的 txt 和 docx 文件
-nsca samples/ --ftype txt # 只分析 txt 文件
+nsca samples/              # 分析 samples/ 文件夹下所有的 txt 和 docx 文件
+nsca samples/ --ftype txt  # 只分析 txt 文件
 nsca samples/ --ftype docx # 只分析 docx 文件
 ```
 
@@ -177,8 +177,8 @@ nsca sample1.txt sample2.txt
 
 ```sh
 cd ./samples/
-nsca sample*.txt # 指定所有文件名以 “sample” 开头并且以 “.txt” 结尾的文件
-nsca sample[1-9].txt sample10.txt # sample1.txt -- sample10.txt
+nsca sample*.txt                                           # 指定所有文件名以 “sample” 开头并且以 “.txt” 结尾的文件
+nsca sample[1-9].txt sample10.txt                          # sample1.txt -- sample10.txt
 nsca sample10[1-9].txt sample1[1-9][0-9].txt sample200.txt # sample101.txt -- sample200.txt
 ```
 
@@ -404,7 +404,7 @@ BibTeX
 
 ```BibTeX
 @misc{tan2022neosca,
-title        = {NeoSCA: A Fork of L2 Syntactic Complexity Analyzer, version 0.0.49},
+title        = {NeoSCA: A Fork of L2 Syntactic Complexity Analyzer, version 0.0.50},
 author       = {Long Tan},
 howpublished = {\url{https://github.com/tanloong/neosca}},
 year         = {2022}
@@ -419,7 +419,7 @@ year         = {2022}
 APA (7th edition)
 </summary>
 
-<pre>Tan, L. (2022). <i>NeoSCA</i> (version 0.0.49) [Computer software]. Github. https://github.com/tanloong/neosca</pre>
+<pre>Tan, L. (2022). <i>NeoSCA</i> (version 0.0.50) [Computer software]. Github. https://github.com/tanloong/neosca</pre>
 
 </details>
 
@@ -429,7 +429,7 @@ APA (7th edition)
 MLA (9th edition)
 </summary>
 
-<pre>Tan, Long. <i>NeoSCA</i>. version 0.0.49, GitHub, 2022, https://github.com/tanloong/neosca.</pre>
+<pre>Tan, Long. <i>NeoSCA</i>. version 0.0.50, GitHub, 2022, https://github.com/tanloong/neosca.</pre>
 
 </details>
 
 
@@ -17,7 +17,7 @@
 [繁體中文](https://github.com/tanloong/neosca/blob/master/README_zh_tw.md) |
 [English](https://github.com/tanloong/neosca#readme)
 
-NeoSCA 是 [Xiaofei Lu](http://personal.psu.edu/xxl13/index.html) 的 [L2 Syntactic Complexity Analyzer (L2SCA)](http://personal.psu.edu/xxl13/downloads/l2sca.html) 的復刻版，添加了對 Windows 的支持和更多的命令行選項。NeoSCA 對英文語料統計以下內容：
+NeoSCA 是 [Xiaofei Lu](http://personal.psu.edu/xxl13/index.html) 的 [L2 Syntactic Complexity Analyzer (L2SCA)](http://personal.psu.edu/xxl13/downloads/l2sca.html) 的復刻版，添加了對 Windows 的支持和更多的命令行選項，作者譚龍。NeoSCA 對英文語料統計以下內容：
 
 <details>
 
@@ -161,8 +161,8 @@ nsca "./samples/sample 1.txt"
 在 `nsca` 的右邊指定輸入文件夾。
 
 ```
-nsca samples/ # 分析 samples/ 文件夾下所有的 txt 和 docx 文件
-nsca samples/ --ftype txt # 只分析 txt 文件
+nsca samples/              # 分析 samples/ 文件夾下所有的 txt 和 docx 文件
+nsca samples/ --ftype txt  # 只分析 txt 文件
 nsca samples/ --ftype docx # 只分析 docx 文件
 ```
 
@@ -177,8 +177,8 @@ nsca sample1.txt sample2.txt
 
 ```sh
 cd ./samples/
-nsca sample*.txt # 指定所有文件名以 「sample」 開頭並且以 「.txt」 結尾的文件
-nsca sample[1-9].txt sample10.txt # sample1.txt -- sample10.txt
+nsca sample*.txt                                           # 指定所有文件名以 「sample」 開頭並且以 「.txt」 結尾的文件
+nsca sample[1-9].txt sample10.txt                          # sample1.txt -- sample10.txt
 nsca sample10[1-9].txt sample1[1-9][0-9].txt sample200.txt # sample101.txt -- sample200.txt
 ```
 
@@ -404,7 +404,7 @@ BibTeX
 
 ```BibTeX
 @misc{tan2022neosca,
-title        = {NeoSCA: A Fork of L2 Syntactic Complexity Analyzer, version 0.0.49},
+title        = {NeoSCA: A Fork of L2 Syntactic Complexity Analyzer, version 0.0.50},
 author       = {Long Tan},
 howpublished = {\url{https://github.com/tanloong/neosca}},
 year         = {2022}
@@ -419,7 +419,7 @@ year         = {2022}
 APA (7th edition)
 </summary>
 
-<pre>Tan, L. (2022). <i>NeoSCA</i> (version 0.0.49) [Computer software]. Github. https://github.com/tanloong/neosca</pre>
+<pre>Tan, L. (2022). <i>NeoSCA</i> (version 0.0.50) [Computer software]. Github. https://github.com/tanloong/neosca</pre>
 
 </details>
 
@@ -429,7 +429,7 @@ APA (7th edition)
 MLA (9th edition)
 </summary>
 
-<pre>Tan, Long. <i>NeoSCA</i>. version 0.0.49, GitHub, 2022, https://github.com/tanloong/neosca.</pre>
+<pre>Tan, Long. <i>NeoSCA</i>. version 0.0.50, GitHub, 2022, https://github.com/tanloong/neosca.</pre>
 
 </details>
 
 
@@ -1,4 +1,4 @@
 #!/usr/bin/env python3
 # -*- coding=utf-8 -*-
 
-__version__ = "0.0.49"
+__version__ = "0.0.50"
@@ -4,6 +4,7 @@
 import logging
 import lzma
 import os
+import os.path as os_path
 import re
 import shutil
 import subprocess
@@ -120,7 +121,7 @@ def _get_normalized_archive_ext(self, file: str) -> str:
             raise ValueError(f"Error: {file} has unexpected extension.")
 
     def _extract_files(self, file: str, file_ending: str, destination_folder: str) -> str:
-        if not os.path.isfile(file):
+        if not os_path.isfile(file):
             raise ValueError(f"Error: {file} is not a regular file.")
 
         start_listing = set(os.listdir(destination_folder))
@@ -142,26 +143,26 @@ def _extract_files(self, file: str, file_ending: str, destination_folder: str) -
         end_listing = set(os.listdir(destination_folder))
         unzipped_directory = end_listing.difference(start_listing).pop()
 
-        return os.path.join(destination_folder, unzipped_directory)
+        return os_path.join(destination_folder, unzipped_directory)
 
     def _path_parse(self, file_path: str) -> _Path:
-        dirname = os.path.dirname(file_path)
-        base = os.path.basename(file_path)
-        name, ext = os.path.splitext(base)
+        dirname = os_path.dirname(file_path)
+        base = os_path.basename(file_path)
+        name, ext = os_path.splitext(base)
         return _Path(dir=dirname, base=base, name=name, ext=ext)
 
     def _unpack_jars(self, fs_path: str, java_bin_path: str) -> None:
-        if os.path.isdir(fs_path):
+        if os_path.isdir(fs_path):
             for f in os.listdir(fs_path):
-                current_path = os.path.join(fs_path, f)
+                current_path = os_path.join(fs_path, f)
                 self._unpack_jars(current_path, java_bin_path)
             return
-        elif os.path.isfile(fs_path):
-            file_ext = os.path.splitext(fs_path)[-1]
+        elif os_path.isfile(fs_path):
+            file_ext = os_path.splitext(fs_path)[-1]
             if file_ext.endswith("pack"):
                 p = self._path_parse(fs_path)
-                name = os.path.join(p.dir, p.name)
-                tool_path = os.path.join(java_bin_path, _UNPACK200)
+                name = os_path.join(p.dir, p.name)
+                tool_path = os_path.join(java_bin_path, _UNPACK200)
                 try:
                     subprocess.run(
                         [tool_path, _UNPACK200_ARGS, f"{name}.pack", f"{name}.jar"],
@@ -178,15 +179,15 @@ def _decompress_archive(
         self, archive_path: str, file_extension: str, target_dir: str
     ) -> str:
         logging.info(f"Decompressing {archive_path} to {target_dir}...")
-        if not os.path.isdir(target_dir):
+        if not os_path.isdir(target_dir):
             os.makedirs(target_dir)
 
-        archive_path = os.path.normpath(archive_path)
+        archive_path = os_path.normpath(archive_path)
 
-        if os.path.isfile(archive_path):
+        if os_path.isfile(archive_path):
             unzipped_directory = self._extract_files(archive_path, file_extension, target_dir)
             return unzipped_directory
-        elif os.path.isdir(archive_path):
+        elif os_path.isdir(archive_path):
             return archive_path
         else:
             raise ValueError(f"Error: {archive_path} is neither a directory nor a file.")
@@ -252,7 +253,7 @@ def _download(self, download_url: str, name: str) -> str:
         else:
             filename = urllib.parse.urlparse(download_url).path.rpartition("/")[-1]
             # e.g. stanford-tregex-4.2.0.zip, stanford-parser-4.2.0.zip
-        filename = os.path.join(tempfile.gettempdir(), filename)  # type: ignore
+        filename = os_path.join(tempfile.gettempdir(), filename)  # type: ignore
         try:
             opener = urllib.request.build_opener()
             opener.addheaders = list(self.headers.items())
@@ -314,7 +315,7 @@ def install_java(
         jdk_archive = self._download(url, name=JAVA)
         jdk_ext = self._get_normalized_archive_ext(jdk_archive)
         jdk_dir = self._decompress_archive(jdk_archive, jdk_ext, target_dir)
-        jdk_bin = os.path.join(jdk_dir, "bin")
+        jdk_bin = os_path.join(jdk_dir, "bin")
         self._unpack_jars(jdk_dir, jdk_bin)
         if jdk_archive:
             os.remove(jdk_archive)
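
The `os_path` alias used throughout this file is just another name for the same module object as `os.path`, so the renaming should not change behaviour. A minimal sketch of what `_path_parse` computes follows. The `_Path` namedtuple here is an assumed stand-in, since its real definition is not part of this diff, and the example path is made up.

```python
import os
import os.path as os_path
from collections import namedtuple

# Assumed shape of _Path; the actual definition lies outside this diff.
_Path = namedtuple("_Path", ["dir", "base", "name", "ext"])


def path_parse(file_path: str) -> _Path:
    """Split a path into directory, base name, stem, and extension."""
    dirname = os_path.dirname(file_path)
    base = os_path.basename(file_path)
    name, ext = os_path.splitext(base)
    return _Path(dir=dirname, base=base, name=name, ext=ext)


# The alias and the attribute refer to the same module object.
same_module = os_path is os.path

# Hypothetical packed jar path, as _unpack_jars might encounter.
parsed = path_parse("jdk/lib/rt.jar.pack")
print(same_module, parsed)
```

Only the last suffix is split off by `splitext`, so joining `dir` and `name` gives the stem that `_unpack_jars` reuses for both the `.pack` and `.jar` file names.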
 
@@ -155,7 +155,12 @@ def run_on_ifile(
 
                 if lemma not in easy_words:
                     slex_count_map[lemma] = slex_count_map.get(lemma, 0) + 1
-            elif pos == "VERB" and lemma not in ("be", "have"):
+            # No need to filter out auxiliary verbs here: the VERB tag covers
+            #  main verbs (content verbs) only. Auxiliary verbs and verbal
+            #  copulas (in the narrow sense) are tagged AUX instead.
+            #  See the Universal Dependencies definition of VERB:
+            #  https://universaldependencies.org/u/pos/VERB.html
+            elif pos == "VERB":
                 verb_count_map[lemma] = verb_count_map.get(lemma, 0) + 1
                 lex_count_map[lemma] = lex_count_map.get(lemma, 0) + 1
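
A minimal sketch of the effect of dropping the lemma filter. The `(lemma, pos)` pairs below are tagged by hand following the Universal Dependencies convention, not produced by spaCy, and `count_verbs` is a hypothetical helper rather than a function from `lca.py`.

```python
from typing import Dict, Iterable, Tuple


def count_verbs(tokens: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count lemmas whose tag is VERB; AUX tokens are skipped by the tag alone."""
    counts: Dict[str, int] = {}
    for lemma, pos in tokens:
        if pos == "VERB":
            counts[lemma] = counts.get(lemma, 0) + 1
    return counts


# Hand-tagged tokens for "She has been running and had fun."
tokens = [
    ("she", "PRON"),
    ("have", "AUX"),   # "has": auxiliary
    ("be", "AUX"),     # "been": auxiliary
    ("run", "VERB"),
    ("and", "CCONJ"),
    ("have", "VERB"),  # "had": main verb
    ("fun", "NOUN"),
]

verb_counts = count_verbs(tokens)
print(verb_counts)
```

Under these assumed tags, the auxiliary uses of "have" and "be" are excluded by the tag check, while "have" as a main verb is now counted, which the earlier `lemma not in ("be", "have")` condition would have skipped.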
 
 
@@ -76,7 +76,7 @@ def install_spacy(self) -> SCAProcedureResult:
         import subprocess
         from subprocess import CalledProcessError
 
-        command = [sys.executable, "-m", "pip", "install", "spacy"]
+        command = [sys.executable, "-m", "pip", "install", "-U", "spacy"]
         try:
             subprocess.run(command, check=True, capture_output=False)
         except CalledProcessError as e:
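
The two changes in this file concern how spaCy is installed and how its presence is detected. A minimal sketch of both ideas, using hypothetical helper names (`build_pip_install_command`, `is_importable`) that do not exist in NeoSCA. Nothing is installed when this runs; it only builds the argument list and probes for modules.

```python
import importlib.util
import sys
from typing import List


def build_pip_install_command(package: str, upgrade: bool = True) -> List[str]:
    """Build a pip invocation that targets the running interpreter."""
    command = [sys.executable, "-m", "pip", "install"]
    if upgrade:
        # -U is short for --upgrade: also upgrade an already installed package.
        command.append("-U")
    command.append(package)
    return command


def is_importable(module_name: str) -> bool:
    """Report whether a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


command = build_pip_install_command("spacy")
print(command[1:])
print(is_importable("json"))
```

Calling pip through `sys.executable -m pip` keeps the install in the same environment as the running program. The diff itself detects a missing package by catching `ModuleNotFoundError` on import, and `find_spec` is shown here only as an alternative that avoids the import.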
@@ -94,6 +94,7 @@ def check_spacy(self):
         try:
             logging.info("Trying to load spaCy...")
             import spacy  # type: ignore # noqa: F401 'en_core_web_sm' imported but unused
+            import en_core_web_sm  # type: ignore # noqa: F401 'en_core_web_sm' imported but unused
         except ModuleNotFoundError:
             is_install = get_yes_or_no(
                 "Running LCA requires spaCy. Do you want me to install it for you?"
 
@@ -1,6 +1,7 @@
 import json
 import logging
 import os
+import os.path as os_path
 import sys
 from typing import Dict, List, Optional, Set, Tuple
 
@@ -85,10 +86,10 @@ def ensure_stanford_parser_initialized(self) -> None:
 
     def already_parsed(self, ofile_parsed: str, ifile: str) -> bool:
         has_been_parsed = False
-        is_exist = os.path.exists(ofile_parsed)
+        is_exist = os_path.exists(ofile_parsed)
         if is_exist:
-            is_not_empty = os.path.getsize(ofile_parsed) > 0
-            is_parsed_newer_than_input = os.path.getmtime(ofile_parsed) > os.path.getmtime(ifile)
+            is_not_empty = os_path.getsize(ofile_parsed) > 0
+            is_parsed_newer_than_input = os_path.getmtime(ofile_parsed) > os_path.getmtime(ifile)
             if is_not_empty and is_parsed_newer_than_input:
                 has_been_parsed = True
         return has_been_parsed
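
A standalone sketch of the freshness check in `already_parsed`, written as a plain function so it can be run outside the class. The temporary file names and the timestamps set with `os.utime` are made up for the demonstration.

```python
import os
import os.path as os_path
import tempfile


def already_parsed(ofile_parsed: str, ifile: str) -> bool:
    """True if the parsed file exists, is non-empty, and is newer than the input."""
    if not os_path.exists(ofile_parsed):
        return False
    is_not_empty = os_path.getsize(ofile_parsed) > 0
    is_parsed_newer_than_input = os_path.getmtime(ofile_parsed) > os_path.getmtime(ifile)
    return is_not_empty and is_parsed_newer_than_input


with tempfile.TemporaryDirectory() as tmp:
    ifile = os_path.join(tmp, "sample.txt")
    ofile_parsed = os_path.splitext(ifile)[0] + ".parsed"

    with open(ifile, "w", encoding="utf-8") as f:
        f.write("This is a sample.")
    missing = already_parsed(ofile_parsed, ifile)  # no .parsed file yet

    with open(ofile_parsed, "w", encoding="utf-8") as f:
        f.write("(ROOT (S ...))")
    os.utime(ifile, (1_000, 1_000))         # input modified first
    os.utime(ofile_parsed, (2_000, 2_000))  # parse written later
    fresh = already_parsed(ofile_parsed, ifile)

    os.utime(ifile, (3_000, 3_000))         # input edited after parsing
    stale = already_parsed(ofile_parsed, ifile)

print(missing, fresh, stale)
```

The comparison is strict, so a parsed file with exactly the same modification time as its input is treated as stale and parsed again.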
@@ -125,7 +126,7 @@ def parse_ifile(self, ifile: str) -> Optional[str]:
             # assume input as parse trees
             return self.io.read_txt(ifile, is_guess_encoding=False)
 
-        ofile_parsed = os.path.splitext(ifile)[0] + ".parsed"
+        ofile_parsed = os_path.splitext(ifile)[0] + ".parsed"
         has_been_parsed = self.already_parsed(ofile_parsed=ofile_parsed, ifile=ifile)
         if has_been_parsed:
             logging.info(
@@ -142,7 +143,7 @@ def parse_ifile(self, ifile: str) -> Optional[str]:
         try:
             trees = self.parse_text(text, ofile_parsed)
         except KeyboardInterrupt:
-            if os.path.exists(ofile_parsed):
+            if os_path.exists(ofile_parsed):
                 os.remove(ofile_parsed)
             sys.exit(1)
         else: