- [Supported vs. Unsupported Formats](docs/unsupported-formats.md)
- [HTML or HTM](docs/html.md)
- [Markdown](docs/markdown.md)
- [RST](docs/rst.md)
- [SGML](docs/sgml.md)
- [Troubleshooting](docs/troubleshooting.md)
- [Licence](docs/LICENCE.md)

pgEdge Document Loader is a command-line tool for loading documents from various formats into PostgreSQL databases. Full documentation is available at:
The pgEdge Document Loader automatically converts documents (HTML, Markdown, reStructuredText, and SGML/DocBook) to Markdown format and loads them into a PostgreSQL database with extracted metadata.

**Features**

- **Multiple Format Support**: HTML, Markdown, reStructuredText, and SGML/DocBook
    - **HTML** (`.html`, `.htm`) - Extracts title from `<title>` tag
    - **Markdown** (`.md`) - Extracts title from first `#` heading
    - **reStructuredText** (`.rst`) - Extracts title from underlined headings
    - **SGML/DocBook** (`.sgml`, `.sgm`, `.xml`) - Extracts title from `<title>` or `<refentrytitle>` tags
- **Automatic Conversion**: All formats converted to Markdown
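
As an illustration of the Markdown title rule above, the following sketch prints the text of the first level-one `#` heading in a file. The `md_title` function is a hypothetical helper and not part of Document Loader; the tool's actual extraction logic may differ. The sketch also makes no attempt to skip `#` lines that appear inside fenced code blocks.

```bash
#!/usr/bin/env bash
# md_title FILE
# Hypothetical helper (not part of pgedge-docloader): print the text of the
# first level-one ATX heading ("# Title") found in FILE, or nothing if the
# file has no such heading. Deeper headings ("## ...") are ignored.
md_title() {
    awk '/^# +/ { sub(/^# +/, ""); print; exit }' "$1"
}
```

Calling `md_title README.md` on a file whose first level-one heading is `# Getting Started` would print `Getting Started`.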

Before invoking Document Loader, you must configure a Postgres database and create a table with the [appropriate columns](/docs/database-setup.md) to hold the extracted documentation content:

```sql
CREATE TABLE documents (
    ...
);
```

**Invoking pgedge-docloader**

When invoking `pgedge-docloader`, you can [specify configuration preferences on the command line](/docs/configuration.md#specifying-options-on-the-command-line) or in a [configuration file](/docs/configuration.md#specifying-options-in-a-configuration-file).

The following command [invokes Document Loader on the command line](/docs/usage.md):

```bash
# Load Markdown files into PostgreSQL
pgedge-docloader \
    ... \
    --col-file-name filename
```

To manage deployment preferences in a [configuration file](/docs/configuration.md#specifying-options-in-a-configuration-file), save your deployment details in a file, and then include the `--config` option when invoking `pgedge-docloader`:

Permission to use, copy, modify, and distribute this software and its documentation for any purpose, without fee, and without a written agreement is hereby granted, provided that the above copyright notice and this paragraph and the following two paragraphs appear in all copies.

IN NO EVENT SHALL pgEdge, Inc. BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS DOCUMENTATION, EVEN IF pgEdge, Inc. HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

pgEdge, Inc. SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS IS" BASIS, AND pgEdge, Inc. HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.

Database passwords are never stored in a configuration file. The tool obtains passwords in this order of priority:

1. pgEdge Document Loader first checks the `PGPASSWORD` environment variable:

    ```bash
    export PGPASSWORD=mypassword
    pgedge-docloader --config config.yml
    ```

2. It then checks the [`~/.pgpass` file](https://www.postgresql.org/docs/18/libpq-pgpass.html) for an entry:

    ```
    localhost:5432:mydb:myuser:mypassword
    ```

    Your `~/.pgpass` file must be readable and writable only by its owner:

    ```bash
    chmod 600 ~/.pgpass
    ```

    !!! note

        If a password is required but not provided through `PGPASSWORD` or `.pgpass`, PostgreSQL will return an authentication error with a clear message.

3. If Document Loader doesn't find a password in the two previous locations, it then attempts passwordless authentication. This allows PostgreSQL to use configured authentication methods such as:

    - Trust authentication
    - Peer authentication
    - Certificate-based authentication (using `db-sslcert` and `db-sslkey`)

    If no password is found and an alternative authentication method is not configured, the tool will prompt:

    ```bash
    pgedge-docloader --config config.yml
    Enter database password: ****
    ```
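
The lookup order above can be sketched as a small shell function that reports which source would supply a password. This is only an illustration under stated assumptions: `password_source` and its optional fifth argument (an alternative path to the password file) are hypothetical names, the function is not part of Document Loader, and the matching is simplified (it honours the `*` wildcard but not backslash-escaped colons).

```bash
#!/usr/bin/env bash
# password_source HOST PORT DB USER [PGPASS_FILE]
# Hypothetical helper (not part of pgedge-docloader). Prints which source in
# the lookup order would supply a password:
#   PGPASSWORD -> the environment variable is set and non-empty
#   pgpass     -> a matching line exists in the password file
#   none       -> neither; passwordless authentication or a prompt applies
password_source() {
    local host=$1 port=$2 db=$3 user=$4
    local pgpass=${5:-$HOME/.pgpass}
    local h p d u rest

    # Step 1: the environment variable takes priority.
    if [ -n "${PGPASSWORD:-}" ]; then
        echo "PGPASSWORD"
        return 0
    fi

    # Step 2: look for a matching host:port:database:user:password line.
    # Simplification: backslash-escaped colons are not handled.
    if [ -r "$pgpass" ]; then
        while IFS=: read -r h p d u rest || [ -n "$h" ]; do
            # Skip blank lines and comments.
            case $h in ''|'#'*) continue ;; esac
            if { [ "$h" = "*" ] || [ "$h" = "$host" ]; } &&
               { [ "$p" = "*" ] || [ "$p" = "$port" ]; } &&
               { [ "$d" = "*" ] || [ "$d" = "$db" ]; } &&
               { [ "$u" = "*" ] || [ "$u" = "$user" ]; }; then
                echo "pgpass"
                return 0
            fi
        done < "$pgpass"
    fi

    # Step 3: no password found.
    echo "none"
}
```

For example, `password_source localhost 5432 mydb myuser` prints `pgpass` when `PGPASSWORD` is unset and `~/.pgpass` contains the entry shown in step 2.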

### Using an Environment Variable to Specify a Password

```bash
export PGPASSWORD=mypassword
pgedge-docloader --config config.yml
```
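
Because `export` leaves the variable set for the rest of the shell session, it can be useful to scope the password to a single command. The sketch below shows one way to do that; `run_with_password` is a hypothetical wrapper name, not something Document Loader provides.

```bash
#!/usr/bin/env bash
# run_with_password PASSWORD COMMAND [ARGS...]
# Hypothetical wrapper: sets PGPASSWORD only in the environment of COMMAND,
# so the variable is not left exported in the calling shell.
run_with_password() {
    local password=$1
    shift
    PGPASSWORD=$password "$@"
}

# Example use (commented out so the sketch has no side effects):
#   read -rsp 'Database password: ' pw; echo
#   run_with_password "$pw" pgedge-docloader --config config.yml
```

Reading the password with `read -s` also keeps it out of the shell history, which a literal `export PGPASSWORD=...` line does not.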

### Using the .pgpass File to Store a Password

Create `~/.pgpass`:

```
localhost:5432:mydb:myuser:mypassword
```

Set permissions:

```bash
chmod 600 ~/.pgpass
```
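
To confirm that the permissions took effect, a check along the following lines may help. It is a sketch only: `pgpass_mode_ok` is a hypothetical helper, and it assumes either GNU `stat` (`-c '%a'`) or BSD/macOS `stat` (`-f '%Lp'`) is available.

```bash
#!/usr/bin/env bash
# pgpass_mode_ok FILE
# Hypothetical check: succeed only if FILE grants no access to group or
# others (for example mode 600). Tries GNU stat first, then BSD/macOS stat.
pgpass_mode_ok() {
    local mode
    mode=$(stat -c '%a' "$1" 2>/dev/null || stat -f '%Lp' "$1" 2>/dev/null) || return 1
    case $mode in
        *00|0) return 0 ;;   # group and other permission digits are both 0
        *)     return 1 ;;
    esac
}

# Example use (commented out so the sketch has no side effects):
#   pgpass_mode_ok ~/.pgpass || echo "run: chmod 600 ~/.pgpass" >&2
```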

## Using an SSL/TLS Connection

Include the following options to connect using SSL/TLS with client certificates:

```bash
pgedge-docloader \
    --source ./docs \
    --db-host secure.example.com \
    --db-name mydb \
    --db-user myuser \
    --db-table documents \
    --db-sslmode verify-full \
    --db-sslcert ./certs/client.pem \
    --db-sslkey ./certs/client-key.pem \
    --db-sslrootcert ./certs/ca.pem \
    --col-doc-content content \
    --col-file-name filename
```

The supported SSL modes are:

- `disable` - No SSL
- `allow` - Try non-SSL first, fall back to SSL
- `prefer` - Try SSL first, fall back to non-SSL (default)
- `require` - Require SSL, but don't verify certificates
- `verify-ca` - Require SSL and verify that the server certificate is signed by a trusted CA
- `verify-full` - Require SSL and verify both the server certificate and the hostname
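
A wrapper script could validate the mode before calling the tool, so a typo such as `verify_full` is caught early. The following is a minimal sketch; `is_supported_sslmode` is a hypothetical helper name, not a Document Loader feature.

```bash
#!/usr/bin/env bash
# is_supported_sslmode MODE
# Hypothetical pre-flight check: succeed only if MODE is one of the six
# SSL modes listed above.
is_supported_sslmode() {
    case $1 in
        disable|allow|prefer|require|verify-ca|verify-full) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use (commented out so the sketch has no side effects):
#   is_supported_sslmode "$mode" || { echo "unsupported sslmode: $mode" >&2; exit 1; }
```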