Commit edeee85

Merge branch 'main' into split_p
2 parents f43c778 + 5d2c89a commit edeee85

10 files changed: +87 -32 lines changed

.github/workflows/make_test_ebook.yaml

Lines changed: 4 additions & 0 deletions
@@ -37,6 +37,10 @@ jobs:
         run: |
           python3 make_book.py --book_name "test_books/the_little_prince.txt" --test --test_num 20 --model google

+      - name: make txt book test with batch_size
+        run: |
+          python3 make_book.py --book_name "test_books/the_little_prince.txt" --test --batch_size 30 --test_num 20 --model google
+

       - name: make openai key ebook test
         if: env.OPENAI_API_KEY != null

README-CN.md

Lines changed: 7 additions & 2 deletions
@@ -15,7 +15,7 @@ bilingual_book_maker is an AI translation tool that uses ChatGPT to help users

 ## Usage

-1. `pip install -r requirements.txt`
+1. `pip install -r requirements.txt` or `pip install -U bbook_maker`
 2. Use `--openai_key` to specify the OpenAI API key. If you have several keys, separate them with commas (xxx,xxx,xxx) to reduce errors caused by API call limits.
    Alternatively, set the environment variable `BMM_OPENAI_API_KEY` to skip this option.
 3. A sample book, `test_books/animal_farm.epub`, is included locally for testing.
@@ -42,9 +42,11 @@ bilingual_book_maker is an AI translation tool that uses ChatGPT to help users
 16. When the translation finishes, a bilingual book named ${book_name}_bilingual.epub is generated.
 17. If an error occurs, or you interrupt the command with `CTRL+C` and do not want to continue translating, a book named ${book_name}_bilingual_temp.epub is generated; simply rename it to whatever you like.
 18. To translate untagged strings in an e-book, use the `--allow_navigable_strings` parameter, which adds navigable strings to the translation queue. **Note: where possible, look for a better-formatted e-book instead.**
-
+19. Use the `--batch_size` parameter to specify the number of lines translated per batch (default is 10; currently only effective for txt files).
 ### Examples

+**If you installed with `pip install bbook_maker`, all of the commands below can be changed to `bbook args`.**
+
 ```shell
 # If you want a quick test
 python3 make_book.py --book_name test_books/animal_farm.epub --openai_key ${openai_key} --test
@@ -70,6 +72,9 @@ python3 make_book.py --book_from kobo --device_path /tmp/kobo

 # Translate a txt file
 python3 make_book.py --book_name test_books/the_little_prince.txt --test
+# Translate a txt file, aggregating multiple lines per batch
+python3 make_book.py --book_name test_books/the_little_prince.txt --test --batch_size 20
+
 ```

 A more beginner-friendly example

README.md

Lines changed: 6 additions & 5 deletions
@@ -1,7 +1,3 @@
-This forked added Google Translate support, only supported translate to `zh-CN`.
-Usage: make sure to add `--model google` in the command.
-
-
 **[中文](./README-CN.md) | English**

 # bilingual_book_maker
@@ -19,7 +15,7 @@ The bilingual_book_maker is an AI translation tool that uses ChatGPT to assist u

 ## Use

-1. `pip install -r requirements.txt`
+1. `pip install -r requirements.txt` or `pip install -U bbook_maker`
 2. Use `--openai_key` option to specify OpenAI API key. If you have multiple keys, separate them by commas (xxx,xxx,xxx) to reduce errors caused by API call limits.
    Or, just set environment variable `BMM_OPENAI_API_KEY` instead.
 3. A sample book, `test_books/animal_farm.epub`, is provided for testing purposes.
@@ -45,9 +41,12 @@ The bilingual_book_maker is an AI translation tool that uses ChatGPT to assist u
 16. Once the translation is complete, a bilingual book named `${book_name}_bilingual.epub` is generated.
 17. If an error occurs or you interrupt the translation by pressing `CTRL+C`, a book named `${book_name}_bilingual_temp.epub` is generated. You can simply rename it to any desired name.
 18. If you want to translate strings in an e-book that aren't labeled with any tags, you can use the `--allow_navigable_strings` parameter. This will add the strings to the translation queue. **Note that it's best to look for e-books that are more standardized if possible.**
+19. Use the `--batch_size` parameter to specify the number of lines for batch translation (default is 10, currently only effective for txt files).

 ### Examples

+**Note: if you installed with `pip install bbook_maker`, all of the commands below can be changed to `bbook args`.**
+
 ```shell
 # Test quickly
 python3 make_book.py --book_name test_books/animal_farm.epub --openai_key ${openai_key} --test --language zh-hans
@@ -76,6 +75,8 @@ python3 make_book.py --book_from kobo --device_path /tmp/kobo

 # translate txt file
 python3 make_book.py --book_name test_books/the_little_prince.txt --test --language zh-hans
+# translate txt file, aggregating multiple lines per batch
+python3 make_book.py --book_name test_books/the_little_prince.txt --test --batch_size 20
 ```

 More understandable example

book_maker/cli.py

Lines changed: 10 additions & 2 deletions
@@ -1,12 +1,12 @@
 import argparse
+import json
 import os
 from os import environ as env
-import json

+import book_maker.obok as obok
 from book_maker.loader import BOOK_LOADER_DICT
 from book_maker.translator import MODEL_DICT
 from book_maker.utils import LANGUAGES, TO_LANGUAGE_CODE
-import book_maker.obok as obok


 def parse_prompt_arg(prompt_arg):
@@ -156,6 +156,13 @@ def main():
         metavar="PROMPT_ARG",
         help="used for customizing the prompt. It can be the prompt template string, or a path to the template file. The valid placeholders are `{text}` and `{language}`.",
     )
+    parser.add_argument(
+        "--batch_size",
+        dest="batch_size",
+        type=int,
+        default=10,
+        help="how many lines are aggregated into one translation request (this option currently only applies to txt files)",
+    )

     options = parser.parse_args()
     PROXY = options.proxy
@@ -219,6 +226,7 @@ def main():
         translate_tags=options.translate_tags,
         allow_navigable_strings=options.allow_navigable_strings,
         prompt_config=parse_prompt_arg(options.prompt_arg),
+        batch_size=options.batch_size,
     )
     e.make_bilingual_book()
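
The new `--batch_size` option is declared with `type=int`, so it also accepts zero and negative values. Those would not slice the book as intended later on: a step of 0 makes `range` raise `ValueError`, and a negative step yields no batches at all. One possible guard at parse time is sketched below, assuming a hypothetical `positive_int` helper that is not part of this commit; only the new option is reproduced, not the full parser.

```python
import argparse


def positive_int(value):
    """Hypothetical argparse type (not in the commit): accept only integers >= 1."""
    number = int(value)  # argparse reports a ValueError raised here as an invalid value
    if number < 1:
        raise argparse.ArgumentTypeError(
            f"expected a positive integer, got {value!r}"
        )
    return number


def build_parser():
    # Only the option added in this commit is shown; the real parser defines many more.
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--batch_size",
        dest="batch_size",
        type=positive_int,
        default=10,
        help="how many lines are aggregated into one translation request (txt files only)",
    )
    return parser


if __name__ == "__main__":
    options = build_parser().parse_args(["--batch_size", "30"])
    print(options.batch_size)
```

With this type in place, `--batch_size 0` would end with an argparse usage error instead of failing inside the loader.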

book_maker/loader/__init__.py

Lines changed: 0 additions & 1 deletion
@@ -1,5 +1,4 @@
 from book_maker.loader.epub_loader import EPUBBookLoader
-
 from book_maker.loader.txt_loader import TXTBookLoader

 BOOK_LOADER_DICT = {

book_maker/loader/epub_loader.py

Lines changed: 3 additions & 1 deletion
@@ -9,9 +9,10 @@
 from rich import print
 from tqdm import tqdm

-from .base_loader import BaseBookLoader
 from book_maker.utils import prompt_config_to_kwargs

+from .base_loader import BaseBookLoader
+

 class EPUBBookLoader(BaseBookLoader):
     def __init__(
@@ -21,6 +22,7 @@ def __init__(
         key,
         resume,
         language,
+        batch_size,
         model_api_base=None,
         is_test=False,
         test_num=5,

book_maker/loader/txt_loader.py

Lines changed: 22 additions & 8 deletions
@@ -1,9 +1,10 @@
 import sys
 from pathlib import Path

-from .base_loader import BaseBookLoader
 from book_maker.utils import prompt_config_to_kwargs

+from .base_loader import BaseBookLoader
+

 class TXTBookLoader(BaseBookLoader):
     def __init__(
@@ -13,6 +14,7 @@ def __init__(
         key,
         resume,
         language,
+        batch_size,
         translate_tags,
         allow_navigable_strings,
         model_api_base=None,
@@ -32,6 +34,7 @@ def __init__(
         self.bilingual_result = []
         self.bilingual_temp_result = []
         self.test_num = test_num
+        self.batch_size = batch_size

         try:
             with open(f"{txt_name}", "r", encoding="utf-8") as f:
@@ -57,17 +60,22 @@ def make_bilingual_book(self):
         p_to_save_len = len(self.p_to_save)

         try:
-            for i in self.origin_book:
-                if self._is_special_text(i):
+            sliced_list = [
+                self.origin_book[i : i + self.batch_size]
+                for i in range(0, len(self.origin_book), self.batch_size)
+            ]
+            for i in sliced_list:
+                batch_text = "".join(i)
+                if self._is_special_text(batch_text):
                     continue
                 if self.resume and index < p_to_save_len:
                     pass
                 else:
-                    temp = self.translate_model.translate(i)
+                    temp = self.translate_model.translate(batch_text)
                     self.p_to_save.append(temp)
-                    self.bilingual_result.append(i)
+                    self.bilingual_result.append(batch_text)
                     self.bilingual_result.append(temp)
-                index += 1
+                index += self.batch_size
                 if self.is_test and index > self.test_num:
                     break

@@ -85,8 +93,14 @@ def make_bilingual_book(self):

     def _save_temp_book(self):
         index = 0
-        for i in range(0, len(self.origin_book)):
-            self.bilingual_temp_result.append(self.origin_book[i])
+        sliced_list = [
+            self.origin_book[i : i + self.batch_size]
+            for i in range(0, len(self.origin_book), self.batch_size)
+        ]
+
+        for i in range(0, len(sliced_list)):
+            batch_text = "".join(sliced_list[i])
+            self.bilingual_temp_result.append(batch_text)
             if self._is_special_text(self.origin_book[i]):
                 continue
             if index < len(self.p_to_save):
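
The slicing used in both `make_bilingual_book` and `_save_temp_book` can be tried in isolation. The sketch below is a standalone illustration rather than the loader itself: `slice_into_batches` is a hypothetical helper name, and the sample lines are assumed to keep their trailing newline so that `"".join` preserves line boundaries.

```python
def slice_into_batches(lines, batch_size):
    """Group consecutive lines into lists of at most batch_size items."""
    if batch_size < 1:
        # range() rejects a step of 0, and a negative step would yield no batches.
        raise ValueError("batch_size must be a positive integer")
    return [lines[i : i + batch_size] for i in range(0, len(lines), batch_size)]


# Assumed sample input: seven lines, each keeping its newline character.
lines = [f"line {n}\n" for n in range(1, 8)]

batches = slice_into_batches(lines, 3)
batch_texts = ["".join(batch) for batch in batches]

for text in batch_texts:
    print(repr(text))
```

The last batch holds only the leftover lines, so it can be shorter than `batch_size`; any counter advanced by `batch_size` per batch counts requested lines, not lines actually read.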

book_maker/obok.py

Lines changed: 11 additions & 11 deletions
@@ -164,19 +164,19 @@
 __version__ = "4.0.0"
 __about__ = "Obok v{0}\nCopyright © 2012-2020 Physisticated et al.".format(__version__)

-import sys
-import os
-import subprocess
-import sqlite3
 import base64
 import binascii
-import re
-import zipfile
 import hashlib
-import xml.etree.ElementTree as ET
-import string
+import os
+import re
 import shutil
+import sqlite3
+import string
+import subprocess
+import sys
 import tempfile
+import xml.etree.ElementTree as ET
+import zipfile

 can_parse_xml = True
 try:
@@ -199,14 +199,14 @@ def _load_crypto_libcrypto():
     from ctypes import (
         CDLL,
         POINTER,
-        c_void_p,
+        Structure,
         c_char_p,
         c_int,
         c_long,
-        Structure,
         c_ulong,
-        create_string_buffer,
+        c_void_p,
         cast,
+        create_string_buffer,
     )
     from ctypes.util import find_library

book_maker/translator/chatgptapi_translator.py

Lines changed: 1 addition & 2 deletions
@@ -1,12 +1,11 @@
 import time
+from os import environ

 import openai
-from os import environ

 from .base_translator import Base
 from ..utils import num_tokens_from_messages

-
 PROMPT_ENV_MAP = {
     "user": "BBM_CHATGPTAPI_USER_MSG_TEMPLATE",
     "system": "BBM_CHATGPTAPI_SYS_MSG",

setup.py

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
+#!/usr/bin/env python3
+from setuptools import find_packages, setup
+
+setup(
+    name="bbook_maker",
+    description="The bilingual_book_maker is an AI translation tool that uses ChatGPT to assist users in creating multi-language versions of epub/txt files and books.",
+    version="0.1.0",
+    license="MIT",
+    author="yihong0618",
+    author_email="zouzou0208@gmail.com",
+    packages=find_packages(),
+    url="https://github.com/yihong0618/bilingual_book_maker",
+    python_requires=">=3.7",
+    install_requires=["bs4", "openai", "requests", "ebooklib", "rich", "tqdm"],
+    classifiers=[
+        "Programming Language :: Python :: 3",
+        "License :: OSI Approved :: MIT License",
+        "Operating System :: OS Independent",
+    ],
+    entry_points={
+        "console_scripts": ["bbook_maker = book_maker.cli:main"],
+    },
+)
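
The `console_scripts` entry in this `setup.py` maps a command name to a `module:function` target, so an installed `bbook_maker` command would call `main()` from `book_maker/cli.py`. A small sketch of how such a spec string breaks down follows; `parse_console_script` is a hypothetical helper written for illustration, not a setuptools API.

```python
def parse_console_script(spec):
    """Split 'name = package.module:function' into (name, module, function)."""
    name, _, target = spec.partition("=")
    module, _, func = target.strip().partition(":")
    return name.strip(), module, func


command, module, func = parse_console_script("bbook_maker = book_maker.cli:main")
print(f"running `{command}` calls {func}() from {module}")
```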
