Commit 6d87975

0.1
Simple version
0 parents

5 files changed: +59893 -0 lines changed

LICENSE

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2015 unabashed

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

README.md

Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
# Yoficator

> A Russian text yoficator (ёфикатор).

## What does it do?
It conservatively replaces `е` with `ё` wherever the word is unambiguously spelled with the latter. No context is used; it relies entirely on the absence of a dictionary entry for a corresponding "truly `е`" homograph.

Yoficating Russian text removes some unnecessary ambiguities in both meaning and pronunciation.

To learn more, check Wikipedia in [English](https://en.wikipedia.org/wiki/Yoficator) or [Russian](https://ru.wikipedia.org/wiki/Ёфикатор).
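
The following is a minimal Python sketch of the lookup rule above. It is not the code in `yoficator.py`. It assumes the dictionary has already been loaded as a list of lowercase word forms, some spelled with `ё`; this commit view doesn't show how `yoficator.dic` is actually laid out. The names `build_yo_map`, `match_case` and `yoficate` are hypothetical. An `е`-spelling is mapped to its `ё`-spelling only in two cases: the `е`-spelling is not a word in its own right, and exactly one `ё`-word collapses to it.

```python
import re
from collections import defaultdict

# Cyrillic word tokens; Ё/ё lie outside the А-я range, so list them explicitly.
WORD_RE = re.compile(r"[А-Яа-яЁё]+")


def build_yo_map(words):
    """Map е-spellings to ё-spellings when the replacement is unambiguous.

    `words` is a hypothetical stand-in for the contents of yoficator.dic:
    an iterable of lowercase word forms, some of them spelled with ё.
    """
    vocab = set(words)
    candidates = defaultdict(set)
    for word in vocab:
        if "ё" in word:
            candidates[word.replace("ё", "е")].add(word)
    return {
        plain: next(iter(yo_forms))
        for plain, yo_forms in candidates.items()
        # Skip "truly е" homographs (все vs всё) and keys that could
        # turn into more than one ё-word.
        if plain not in vocab and len(yo_forms) == 1
    }


def match_case(template, word):
    """Copy the capitalization pattern of `template` onto `word`."""
    if template.isupper():
        return word.upper()
    if template[0].isupper():
        return word.capitalize()
    return word


def yoficate(text, yo_map):
    """Replace every word found in `yo_map`, keeping everything else as-is."""
    def replace(match):
        token = match.group(0)
        yo = yo_map.get(token.lower())
        return match_case(token, yo) if yo else token

    return WORD_RE.sub(replace, text)


if __name__ == "__main__":
    # Toy vocabulary built from the words in tests/yoficator.txt.
    words = ["все", "всё", "щётка", "произнёс", "ещё", "её"]
    yo_map = build_yo_map(words)
    print(yoficate("Где ее книга?", yo_map))  # Где её книга?
    print(yoficate("Ambiguous, stays е: все", yo_map))  # unchanged
```

With this toy vocabulary, `все` is left alone because it is a word in its own right. Meanwhile `щетка`, `произнес`, `еще` and `ее` become `щётка`, `произнёс`, `ещё` and `её`, which matches the expectations in the test file.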

## Usage
Depends on `yoficator.dic`, the dictionary used for the lookup; keep it in the same folder as `yoficator.py`.

`yoficator.py [text-file-in-Russian | string-in-Russian]`

## Examples
Running the command without arguments parses the test file:

`yoficator.py`

Or pass it a file or a string:
```bash
yoficator.py russianfile.txt # prints to STDOUT
yoficator.py russianfile.txt > russianfile-yoficated.txt
yoficator.py "Где ее книга?"
```
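
The following is a sketch of the argument handling described under Usage and Examples. It is based only on what this README states. With no argument, the script reads the bundled test file. If the argument is an existing path, it reads that file. Anything else is treated as Russian text. The function names, the `base_dir` parameter, UTF-8 input and the `tests/yoficator.txt` default path are all assumptions for illustration. The real `yoficator.py` may do this differently.

```python
import os

# Hypothetical layout: the dictionary sits next to the script, and the
# bundled test file lives at tests/yoficator.txt (as added in this commit).
DICTIONARY_NAME = "yoficator.dic"
TEST_FILE = os.path.join("tests", "yoficator.txt")


def dictionary_path(base_dir):
    """Locate yoficator.dic in the script's own folder, not the CWD."""
    return os.path.join(base_dir, DICTIONARY_NAME)


def read_file(path):
    # Assumes input files are UTF-8 encoded.
    with open(path, encoding="utf-8") as f:
        return f.read()


def read_input(argv, base_dir):
    """Return the text to yoficate from command-line arguments.

    No argument   -> contents of the bundled test file.
    Existing file -> contents of that file.
    Anything else -> the argument itself, treated as Russian text.
    """
    if len(argv) < 2:
        return read_file(os.path.join(base_dir, TEST_FILE))
    arg = argv[1]
    if os.path.isfile(arg):
        return read_file(arg)
    return arg
```

In a script, `base_dir` would typically be `os.path.dirname(os.path.abspath(__file__))`. That way the dictionary and the test file are found relative to `yoficator.py` regardless of the current working directory. `read_input(sys.argv, base_dir)` would then be called from an `if __name__ == "__main__":` guard, and the result printed to STDOUT.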

## Limitations
* Because the code is conservative and ignores context, it won't change a word that has a "truly `е`" homograph. Thus `все` is never corrected, because both `все` and `всё` exist as different words.
* It may wrongly yoficate text in other Cyrillic-based languages, such as Bulgarian, Ukrainian and Belarusian.
* It's not the fastest thing in the world, mind you, but it does the job.

tests/yoficator.txt

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
(Tests the parsing)
Ambiguous, stays е: все
Unambiguous, changes: щетка, произнес, еще, ее
