122 changes: 122 additions & 0 deletions Colab_billingual_book_maker.ipynb
@@ -0,0 +1,122 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyM7W9GpwUBwN2PfJ+WmtExZ",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/zitotw/bilingual_book_maker/blob/main/Colab_billingual_book_maker.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"[bilingual_book_maker on GitHub](https://github.com/yihong0618/bilingual_book_maker.git)"
],
"metadata": {
"id": "1ePfQoAx4LK8"
}
},
{
"cell_type": "code",
"source": [
"#@title Connect Google Drive { vertical-output: true}\n",
"from google.colab import drive\n",
"drive.mount('/content/gdrive')\n",
"\n",
"import os\n",
"# Absolute paths, so the checks do not depend on the current directory\n",
"directory = '/content/gdrive/My Drive/project'\n",
"subdirectory = '/content/gdrive/My Drive/project/bilingual_book_maker'\n",
"%cd /content\n",
"!ls\n",
"\n",
"if os.path.exists(directory):\n",
"    print(\"project folder already exists\")\n",
"    if os.path.exists(subdirectory):\n",
"        print(\"bilingual_book_maker folder already exists\")\n",
"        %cd /content/gdrive/My Drive/project/bilingual_book_maker\n",
"        !ls\n",
"    else:\n",
"        # Project folder exists but the repository has not been cloned yet\n",
"        %cd /content/gdrive/My Drive/project/\n",
"        !git clone https://github.com/yihong0618/bilingual_book_maker.git\n",
"        %cd bilingual_book_maker\n",
"        !ls\n",
"else:\n",
"    # First run: create the project folder, then clone the repository into it\n",
"    os.makedirs(directory)\n",
"    print(\"Folder created successfully\")\n",
"    %cd /content/gdrive/My Drive/project/\n",
"    !git clone https://github.com/yihong0618/bilingual_book_maker.git\n",
"    %cd bilingual_book_maker\n",
"    !ls"
],
"metadata": {
"id": "6lliBhIVSqA0"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"#@title Install the package { vertical-output: true}\n",
"!pip install -r requirements.txt"
],
"metadata": {
"id": "_hG5eJ_aS5BI"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Manually upload the book you would like to translate to Google Drive."
],
"metadata": {
"id": "Ziib9EuP3Pim"
}
},
{
"cell_type": "markdown",
"source": [
"Learn more about the commands in the bilingual_book_maker [README](https://github.com/yihong0618/bilingual_book_maker.git)."
],
"metadata": {
"id": "kKjw_A3i3WSx"
}
},
{
"cell_type": "code",
"source": [
"#@title Translation { vertical-output: true}\n",
"# For the make_book.py options, see the bilingual_book_maker README:\n",
"# https://github.com/yihong0618/bilingual_book_maker.git\n",
"!python3 make_book.py --book_name the_little_prince.txt --openai_key $'YourAPI' --model deeplfree --prompt prompt_template_sample.json --test"
],
"metadata": {
"id": "TGVmfqdr3pdP"
},
"execution_count": null,
"outputs": []
}
]
}
12 changes: 11 additions & 1 deletion README-CN.md
@@ -1,3 +1,6 @@
**中文 | [English](./README.md)**
[![litellm](https://img.shields.io/badge/%20%F0%9F%9A%85%20liteLLM-OpenAI%7CAzure%7CAnthropic%7CPalm%7CCohere%7CReplicate%7CHugging%20Face-blue?color=green)](https://github.com/BerriAI/litellm)

# bilingual_book_maker

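bilingual_book_maker is an AI translation tool that uses ChatGPT to help users create multilingual versions of epub/txt/srt files and books. It is intended only for translating epub/txt books that have entered the public domain, not copyrighted works. Please read the project's **[disclaimer](./disclaimer.md)** before use.
@@ -19,7 +22,7 @@ bilingual_book_maker is an AI translation tool that uses ChatGPT to help users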
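- Use `--openai_key` to specify your OpenAI API key. If you have several keys, separate them with commas (xxx,xxx,xxx) to reduce errors caused by API rate limits.
 Alternatively, set the environment variable `BBM_OPENAI_API_KEY` to skip this option.
- A sample book, `test_books/animal_farm.epub`, is included in the repository for testing.
- The default model is [GPT-3.5-turbo](https://openai.com/blog/introducing-chatgpt-and-whisper-apis), the model ChatGPT currently uses. Use `--model gpt3` for the gpt3 model.
- The default model is [GPT-3.5-turbo](https://openai.com/blog/introducing-chatgpt-and-whisper-apis), the model ChatGPT currently uses. Use `--model gpt4` for the gpt4 model, or `--model gpt3` for the gpt3 model. With the gpt4 model, `--use_context` additionally generates a passage of summarized context on every translation call.
- You can translate through a wrapped DeepL API, which is a paid service. Get a token from [DeepL Translator](https://rapidapi.com/splintPRO/api/dpl-translator), then use `--model deepl --deepl_key ${deepl_key}`
- You can use DeepL free: `--model deeplfree`
- You can translate with the [Claude](https://console.anthropic.com/docs) model: `--model claude --claude_key ${claude_key}`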
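@@ -45,6 +48,13 @@ bilingual_book_maker is an AI translation tool that uses ChatGPT to help users
 You can also configure the `system` and `user` role prompts with the following environment variables: `BBM_CHATGPTAPI_USER_MSG_TEMPLATE` and `BBM_CHATGPTAPI_SYS_MSG`.
 This parameter can be either a prompt template string or the path to a template `.txt` file.
- Use the `--batch_size` parameter to set the number of lines translated per batch (default 10; currently only takes effect for txt files).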
- `--accumulated_num` sets how many tokens to accumulate before starting a translation. gpt3.5 limits total_token to 4090. For example, with `--accumulated_num 1600`, OpenAI might output 2200 tokens,
 and the system and user messages might take another 200 tokens: 1600+2200+200=4000, which is close to the limit. You have to choose the value yourself;
 there is no way to know whether the limit will be reached before the request is sent.
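- `--use_context` makes the GPT4 model generate a summary. At the very start of a translation, it summarizes the entire first passage (whose size depends on `--accumulated_num`). For each later passage, it revises and extends the previous passage's summary, producing a continuously evolving summary that carries the important information of the whole book and improves consistency between translated passages.
- `--translation_style` example: `--translation_style "color: #808080; font-style: italic;"`
- `--retranslate` "$translated_filepath" "file_name_in_epub" "start_str" "end_str"(optional)<br>
 Retranslates the range marked by start_str through end_str.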
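
The `--accumulated_num` arithmetic in the list above can be sketched in Python. This only illustrates the budgeting and is not bilingual_book_maker's own code: the function name `remaining_budget` and the default overhead of 200 are hypothetical, while the 4090 limit and the 1600/2200/200 figures are the ones quoted in that bullet.

```python
# Hypothetical helper illustrating the --accumulated_num budget.
# It is not part of bilingual_book_maker.
TOTAL_TOKEN_LIMIT = 4090  # gpt3.5 total_token limit quoted in the README


def remaining_budget(accumulated_num: int, expected_output: int,
                     message_overhead: int = 200) -> int:
    """Return the tokens left under the limit; a negative value means over budget."""
    used = accumulated_num + expected_output + message_overhead
    return TOTAL_TOKEN_LIMIT - used


# The README example: 1600 accumulated + 2200 output + 200 overhead = 4000 tokens.
print(remaining_budget(1600, 2200))  # 90 tokens of headroom
```

Because the output length is only known after the request returns, `expected_output` is a guess, which is why the value has to be chosen conservatively.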

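The rolling summary described for `--use_context` might look roughly like the sketch below. It is a guess at the control flow only, under stated assumptions: `translate_with_context`, `fake_translate`, and `fake_summarize` are hypothetical names, the stand-ins replace real model calls, and the actual project may build or order the summary differently.

```python
from typing import Callable, List


def translate_with_context(
    chunks: List[str],
    translate: Callable[[str, str], str],
    summarize: Callable[[str, str], str],
) -> List[str]:
    """Translate chunks in order while carrying a rolling summary forward."""
    summary = ""  # no context exists before the first chunk
    translations = []
    for chunk in chunks:
        translations.append(translate(chunk, summary))
        # Revise the previous summary with the new chunk so it keeps evolving.
        summary = summarize(summary, chunk)
    return translations


# Stand-ins for the model calls (assumptions, for demonstration only).
def fake_translate(text: str, context: str) -> str:
    return f"[{context}] {text.upper()}"


def fake_summarize(previous: str, text: str) -> str:
    return (previous + " " + text.split()[0]).strip()


result = translate_with_context(["alpha one", "beta two"], fake_translate, fake_summarize)
print(result)  # ['[] ALPHA ONE', '[alpha] BETA TWO']
```

Passing the model calls in as functions keeps the loop testable without an API key; each chunk is translated with the summary of everything before it.
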
### Examples
