### New features
+ Add support for the PTB POS tagset in `nsca-lca`
+ Add `--text` option for `nsca-lca`
### Improvements
+ Add formulae in README for LCA measures
### Bug fixes
+ Fix the Chinese JDK mirror not being used correctly
NeoSCA is a fork of [Xiaofei Lu](http://personal.psu.edu/xxl13/index.html)'s [L2 Syntactic Complexity Analyzer](http://personal.psu.edu/xxl13/downloads/l2sca.html) (L2SCA), with added support for Windows and an improved command-line interface for easier usage. NeoSCA is written by Tan, Long (谭龙). It accepts written English texts and computes the following measures:
@@ -252,7 +252,7 @@ There are two approaches to define a structure: using `tregex_pattern` or `value
 + [Powerpoint tutorial](https://nlp.stanford.edu/software/tregex/The_Wonderful_World_of_Tregex.ppt) for Tregex by Galen Andrew
-`value_source` specifies an arithmetic operation on values of other structures to calculate the value of the a structure. `value_source` can include names of other structures, integers, decimals, `+`, `-`, `*`, `/`, `(` and `)`. `value_source` is tokenized using Python's standard library `tokenize`, which is specifically designed for Python source code. The name of a structure that is referred to in a `value_source` should adhere to the naming convention of Python variables (composed of *letters*, *numbers*, and *underscores*, cannot start with a *number*; *letters* refer to those characters defined in the Unicode character database as "Letter", such as English letters and Chinese characters), or otherwise the name will not be correctly recognized.
+`value_source` specifies an arithmetic operation on values of other structures to calculate the value of the structure being defined. `value_source` can include names of other structures, integers, decimals, `+`, `-`, `*`, `/`, `(` and `)`. `value_source` is tokenized using Python's standard library `tokenize`, which is specifically designed for Python source code. The name of a structure that is referred to in a `value_source` should adhere to the naming convention of Python variables (composed of *letters*, *numbers*, and *underscores*, cannot start with a *number*; *letters* refer to those characters defined in the Unicode character database as "Letter", such as English letters and Chinese characters), or otherwise the name will not be correctly recognized.
 
 The `value_source` definition can be nested, which means that dependent structures in turn can also be defined through `value_source` and rely on others, forming a tree-like relationship. But the terminal structures must be defined by `tregex_pattern` to avoid circular definition.
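The naming rule in the hunk above can be illustrated with a minimal sketch that runs a `value_source` string through Python's `tokenize` module. The helper `split_value_source` and the structure names `VP1`, `VP2`, `T`, and `从句` are hypothetical and chosen only for illustration; this is not NeoSCA's own parsing code.

```python
import io
import tokenize


def split_value_source(value_source: str) -> list[tuple[str, str]]:
    """Label each token of an arithmetic expression (hypothetical helper)."""
    labelled = []
    readline = io.StringIO(value_source).readline
    for tok in tokenize.generate_tokens(readline):
        if tok.type == tokenize.NAME:
            # A run of letters, digits, and underscores that does not start with a digit
            labelled.append(("name", tok.string))
        elif tok.type == tokenize.NUMBER:
            labelled.append(("number", tok.string))
        elif tok.type == tokenize.OP:
            labelled.append(("operator", tok.string))
        # NEWLINE and ENDMARKER tokens carry no information here, so they are skipped
    return labelled
```

Under these assumptions, `split_value_source("(VP1 + VP2) / T")` yields three names, while a name containing a hyphen such as `VP-1` is split into a name, an operator, and a number, which is why such names would not be recognized as a single structure.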
@@ -395,7 +395,54 @@ NeoSCA has a Tregex command line interface `nsca-tregex`, which behaves similar
 
 #### Lexical complexity analysis
 
-NeoSCA provides an `nsca-lca` command to do the lexical complexity analysis mirroring the functionality of [LCA (Lexical Complexity Analyzer)](https://sites.psu.edu/xxl13/lca/).
+NeoSCA provides an `nsca-lca` command to do the lexical complexity analysis mirroring the functionality of [LCA (Lexical Complexity Analyzer)](https://sites.psu.edu/xxl13/lca/). Below are the available measures:
+
+<!-- {{{ LCA measures -->
+<details>
+<summary>
+Measures of Lexical Density and Sophistication
+</summary>
+
+|Measure|Formula|
+|-|-|
+|Lexical Density||
+|Lexical Sophistication-I||
+|Lexical Sophistication-II||
+|Verb Sophistication-I||
+|Corrected Verb Sophistication-I||
+|Verb Sophistication-II||
+
+</details>
+
+<details>
+<summary>
+Measures of Lexical Variation
+</summary>
+
+|Measure|Formula|
+|-|-|
+|Number of Different Words||
+|Number of Different Words (first 50 words)||
+|Number of Different Words (expected random 50)||
+|Number of Different Words (expected sequence 50)||
+|Type-Token Ratio||
+|Mean Segmental Type-Token Ratio (50)||
+|Corrected Type-Token Ratio||
+|Root Type-Token Ratio||
+|Bilogarithmic Type-Token Ratio||
+|Uber Index||
+|Lexical Word Variation||
+|Verb Variation-I||
+|Squared Verb Variation-I||
+|Corrected Verb Variation-I||
+|Verb Variation-II||
+|Noun Variation||
+|Adjective Variation||
+|Adverb Variation||
+|Modifier Variation||
+
+</details>
+<!-- }}} -->
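The Formula column did not survive in this view, so the following is only a rough sketch of four of the variation measures listed above, using their commonly cited definitions with N as the number of word tokens and T as the number of word types. The function name `type_token_measures` is hypothetical, the input is assumed to be an already tokenized (and, if desired, lemmatized) word list, and nothing here is taken from the `nsca-lca` implementation.

```python
import math


def type_token_measures(tokens: list[str]) -> dict[str, float]:
    """Compute a few type-token based measures (illustrative sketch only)."""
    n = len(tokens)       # N: number of word tokens
    t = len(set(tokens))  # T: number of distinct word types
    if n < 2:
        # log(N) is zero for N = 1, so the bilogarithmic ratio would be undefined
        raise ValueError("need at least two tokens")
    return {
        "Type-Token Ratio": t / n,
        "Corrected Type-Token Ratio": t / math.sqrt(2 * n),
        "Root Type-Token Ratio": t / math.sqrt(n),
        "Bilogarithmic Type-Token Ratio": math.log(t) / math.log(n),
    }
```

The other measures in the table need part-of-speech tags or a list of sophisticated words, so they are left out of this sketch.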
 
 ```sh
 nsca-lca sample.txt # single input file
@@ -414,7 +461,7 @@ BibTeX
 
 ```BibTeX
 @misc{tan2022neosca,
-title = {NeoSCA: A Fork of L2 Syntactic Complexity Analyzer, version 0.0.51},
+title = {NeoSCA: A Fork of L2 Syntactic Complexity Analyzer, version 0.0.52},