Commit 8023ea8

Merge branch 'dev'
2 parents db533bb + 735b804 commit 8023ea8

61 files changed: +1315 -799 lines changed

README.md

Lines changed: 30 additions & 48 deletions
@@ -2,28 +2,31 @@
 
 **wiki2book** is a tool to create good-looking eBooks from one or more Wikipedia articles.
 
-The goal is to create eBooks (EPUB files) as beautiful as real books from a couple of Wikipedia articles.
-Therefore, wiki2book is specifically implemented to create such books by implementing awareness for Wikipedia- and website-specific features (more on that below).
+The goal is to create eBooks (EPUB files) as beautiful as real books from a given list of Wikipedia articles.
+To achieve this, wiki2book contains specific treatments of Wikipedia- and website-specific content of the articles and therefore provides different results than general converters (more on this below).
 This should make reading Wikipedia articles even more fun and may create a whole new readership for this awesome and imperceptibly large database of knowledge.
 
+eBook of the German article about astronomy on a Tolino eBook-reader:
 <p align="center">
-<img src="photo.JPG" alt="eBook of the German article about astronomy on a Tolino eBook-reader."/>
+<img src="photo.JPG"/>
 </p>
 
-### Why not simply using pandoc?
+### Why not simply use pandoc?
 
 Good question.
 
-[Pandoc](https://pandoc.org/epub.html) and others like [wb2pdf](https://mediawiki2latex.wmflabs.org/) or [percollate](https://github.com/danburzo/percollate) as well) are great and yes, they can convert mediawiki to EPUB.
-In fact, wiki2book relies on pandoc to turn HTML into EPUB because pandoc is well known and it's a simple program call.
+[Pandoc](https://pandoc.org/epub.html) and other converters, like [wb2pdf](https://mediawiki2latex.wmflabs.org/) or [percollate](https://github.com/danburzo/percollate), are great and yes, they can convert mediawiki to EPUB.
+In fact, wiki2book relies by default on pandoc to turn HTML into EPUB because pandoc does this quite well.
 
-However, there are always things missing in these tools, for example rendering math, downloading images, evaluating templates or a proper handling of tables.
-They also don't do any eBook-specific assumptions, e.g. ignoring ebook-unsuitable styles or not evaluating Wikipedia-oriented templates.
+However, when converting mediawiki to EPUB, there are always things missing when using these tools.
+For example, the correct rendering math code, downloading and embedding images, evaluating templates or a proper handling of tables.
 
-Most existing tools are furthermore rather general purpose, which is not beneficial for the very specific task of converting Wikipedia articles to beautiful offline eBooks.
+They are also rather general purpose and don't do any eBook-specific assumptions, e.g. ignoring ebook-unsuitable styles or Wikipedia-oriented templates.
 
 Another feature missing in all of these tools: You cannot turn multiple articles into a ready-to-read eBook.
-But wiki2book has exactly this functionality called "projects" as described below.
+This also includes adding a title mage, table-of-content, custom styles, etc.
+
+Wiki2book is a tool adressing all these issues and nice features to generate beautiful looking eBooks.
 
 # Installation
 
@@ -34,20 +37,25 @@ But wiki2book has exactly this functionality called "projects" as described belo
 # Usage
 
 Currently only a CLI (_command line interface_) version of wiki2book exists, so nothing with a GUI.
-Wiki2book need a configuration file (s. the [configs](./configs) folder), currently only a German config file exists.
+Wiki2book uses configuration files, project files and CLI arguments to be configured.
+Use the `--help` flag or the [documentation](./doc/configuration.md) for further information.
 
 ## Preliminaries
 
-You need the following tools and fonts:
+You need the following tools and fonts when using the default configuration and styles:
+
+* ImageMagick (to have the `convert` command)
+* Pandoc (when using the `pandoc` output driver). See notes on pandoc versions 2 and 3 below.
+* DejaVu fonts in `/usr/share/fonts/TTF/DejaVuSans*.ttf` (is used by the default style in this repo but can be replaced to any other font).
+
+When enabling the conversion of SVGs to PNGs or when using the math converter "internal", then wiki2book uses the tool `rsvg-convert` by default.
 
-1. ImageMagick (to have the `convert` command)
-2. *Optional:*
-   * Pandoc (when using the `pandoc` output driver). See notes on pandoc versions 2 and 3 below.
-   * DejaVu fonts in `/usr/share/fonts/TTF/DejaVuSans*.ttf` (is used by the default style in this repo but can be replaced to any other font).
+The usage of external tools can be configured, e.g. to use explicit paths to executables or to use a custom script.
+See [doc/configuration](./doc/configuration.md#configure-external-tool-calls) for further details.
 
 ## CLI
 
-The CLI contains three sub-commands that generate an EPUB file from different sources (s. below for examples and details on each sub-command):
+The CLI contains three sub-commands that generate an EPUB file from different sources:
 
 1. Project: `wiki2book project ./path/to/project.json`
 2. Article: `wiki2book article "article name"`
@@ -69,41 +77,15 @@ To overcome this, pass the argument `--pandoc-data-dir ./pandoc/data` to wiki2bo
 
 Alternatively install pandoc 3, which [avoids CSS3 parameters](https://github.com/jgm/pandoc/blob/3.0/data/epub.css#L166:L169).
 
-### Examples
-
-In the following there are working example calls to wiki2book.
-
-The necessary parameters used below (see `./wiki2book -h` for more information):
-
-* `-c`: Wiki2book configuration file
-* `-s`: Specifies an existing style sheet file
-
-#### Project
-
-Use the following command to build the German project about astronomy:
-
-`./wiki2book project -c configs/de.json ./projects/de/astronomie/astronomie.json`
-
-#### Single article
-
-Render a single article by using the `article` sub-command:
-
-`./wiki2book article -c configs/de.json -s projects/style.css "Erde"`
-
-#### Standalone
-
-Use the following command to render the file
-
-`./wiki2book standalone -c configs/de.json -s projects/style.css ./integration-tests/test-real-article-Erde.mediawiki`
-
 # Contribute
 
 ## Issues, bugs, ideas
 
-Feel free to open [a new issue](https://github.com/hauke96/wiki2book/issues/new/choose).
-But keep in mind:
-This is a hobby-project and my time is limited.
-Things with less or no use for me personally will get a lower priority.
+Feel free to open [a new issue](https://github.com/hauke96/wiki2book/issues/new/choose) and filling out the issue-template.
+
+Please keep in mind:
+1. This is a hobby-project and my time is limited.
+2. Things with less or no use for me personally will get a lower priority.
 
 ## Development
 
configs/de.json

Lines changed: 1 addition & 2 deletions
@@ -108,6 +108,5 @@
 "verweis",
 "zentriert"
 ],
-"math-converter": "rsvg",
-"convert-pdfs-to-images": true
+"math-converter": "internal"
 }
Lines changed: 1 addition & 0 deletions
@@ -152,6 +152,7 @@ div.hanging-indent {
 td, th {
 padding: 0.25rem;
 border: 1px solid #a2a2a2;
+text-align: left;
 }
 
 th {

doc/README.md

Lines changed: 2 additions & 2 deletions
@@ -1,10 +1,10 @@
 This is the documentation of the codebase for *wiki2book*.
 
-## User documentation
+# User documentation
 
 * [Configuration](configuration.md): Documentation of config file, project file and CLI arguments.
 
-## Technical and internal documentation
+# Technical and internal documentation
 
 * [Structure & Architecture](architecture.md): Description of the overall architecture, layers and packages.
 * [Rendering math](rendering-math.md): Documentation on how to render math code (TeX code) using the Wikipedia API.

doc/architecture.md

Lines changed: 2 additions & 2 deletions
@@ -14,10 +14,10 @@ This is a description of the code structure and architecture.
 
 # Architecture
 
-To generate an EPUB eBook the following high-level steps are executed:
+To generate an eBook based on a project file, the following high-level steps are executed:
 
 1. Read the given project file
-2. For each Wikipedia article in the project, do the following:
+2. This might be executed in parallel, depending on the config: For each Wikipedia article in the project, do the following:
    1. Download the wikitext of the article.
    2. The wikitext is tokenized, resulting in the tokenized text and a token map.
       During this step, templates are evaluated and math is rendered to an SVG.
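
The changed step 2 says the per-article work may run in parallel depending on the config. As a rough illustration only, and not wiki2book's actual code, the following Go sketch shows one common way to process a list of articles with a bounded number of goroutines. The names `processArticle` and `processAll` and the `workers` parameter are hypothetical; the real per-article steps (download, tokenize, render) are replaced by a placeholder.

```go
package main

import (
	"fmt"
	"sync"
)

// processArticle is a hypothetical stand-in for the per-article steps
// (download the wikitext, tokenize it, ...). Here it only builds a label.
func processArticle(title string) string {
	return "processed:" + title
}

// processAll runs processArticle for every title with at most `workers`
// goroutines active at once. Results keep the order of the input titles.
func processAll(titles []string, workers int) []string {
	if workers < 1 {
		workers = 1 // fall back to sequential processing
	}
	results := make([]string, len(titles))
	sem := make(chan struct{}, workers) // counting semaphore
	var wg sync.WaitGroup

	for i, title := range titles {
		wg.Add(1)
		sem <- struct{}{} // blocks while `workers` goroutines are busy
		go func(i int, title string) {
			defer wg.Done()
			defer func() { <-sem }()
			// Each goroutine writes to its own slice index, so no lock is needed.
			results[i] = processArticle(title)
		}(i, title)
	}

	wg.Wait()
	return results
}

func main() {
	for _, r := range processAll([]string{"Erde", "Astronomie"}, 2) {
		fmt.Println(r)
	}
}
```

Setting `workers` to 1 gives the sequential behaviour, which is one way a config switch between parallel and non-parallel processing could be modelled.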
