
Commit 119df5c

Merge pull request #3 from pgEdge/online_docs
Thanks for the documentation improvements!
2 parents b3483e9 + c0407bc commit 119df5c

20 files changed

+942
-1084
lines changed

README.md

Lines changed: 43 additions & 100 deletions
@@ -2,33 +2,36 @@
22

33
[![CI](https://github.com/pgEdge/pgedge-docloader/actions/workflows/ci.yml/badge.svg)](https://github.com/pgEdge/pgedge-docloader/actions/workflows/ci.yml)
44

5-
## Table of Contents
6-
- [Installation Guide](docs/installation.md)
7-
- [Configuration](docs/configuration.md)
8-
- [Usage Examples](docs/usage.md)
9-
- [Database Setup](docs/database-setup.md)
10-
- [Supported Formats](docs/supported-formats.md)
11-
- [Troubleshooting](docs/troubleshooting.md)
5+
- [Introduction](docs/index.md)
6+
- [Best Practices](docs/best_practices.md)
7+
- Installing pgEdge Document Loader
8+
- [Configuring the Postgres Database](docs/database-setup.md)
9+
- [Installing Document Loader](docs/installation.md)
10+
- [Document Loader Configuration](docs/configuration.md)
11+
- [pgEdge Document Loader Quickstart](docs/quickstart.md)
12+
- Using pgEdge Document Loader
13+
- [Using Document Loader](docs/usage.md)
14+
- [Using Custom Metadata Columns](docs/metadata.md)
15+
- [Updating a Document](docs/updating.md)
16+
- [Managing Authentication](docs/authentication.md)
17+
- Supported Formats
18+
- [Supported vs. Unsupported Formats](docs/unsupported-formats.md)
19+
- [HTML or HTM](docs/html.md)
20+
- [Markdown](docs/markdown.md)
21+
- [RST](docs/rst.md)
22+
- [SGML](docs/sgml.md)
23+
- [Troubleshooting](docs/troubleshooting.md)
24+
- [Licence](docs/LICENCE.md)
1225

1326
pgEdge Document Loader is a command-line tool for loading documents from various formats into PostgreSQL databases. Full documentation is available at:
14-
[https://pgedge.github.io/pgedge-docloader](https://pgedge.github.io/pgedge-docloader)
15-
16-
## Overview
1727

18-
The pgEdge Document Loader automatically converts documents (HTML, Markdown,
19-
reStructuredText, and SGML/DocBook) to Markdown format and loads them into a
20-
PostgreSQL database with extracted metadata.
28+
[https://pgedge.github.io/pgedge-docloader](https://pgedge.github.io/pgedge-docloader)
2129

22-
## Features
30+
The pgEdge Document Loader automatically converts documents (HTML, Markdown, reStructuredText, and SGML/DocBook) to Markdown format and loads them into a PostgreSQL database with extracted metadata.
2331

24-
- **Multiple Format Support**: HTML, Markdown, reStructuredText, and
25-
SGML/DocBook
32+
**Features**
2633

27-
- **HTML** (`.html`, `.htm`) - Extracts title from `<title>` tag
28-
- **Markdown** (`.md`) - Extracts title from first `#` heading
29-
- **reStructuredText** (`.rst`) - Extracts title from underlined headings
30-
- **SGML/DocBook** (`.sgml`, `.sgm`, `.xml`) - Extracts title from
31-
`<title>` or `<refentrytitle>` tags
34+
- **Multiple Format Support**: HTML, Markdown, reStructuredText, and SGML/DocBook
3235
- **Automatic Conversion**: All formats converted to Markdown
3336
- **Metadata Extraction**: Titles, filenames, timestamps
3437
- **Flexible Input**: Single file, directory, or glob patterns (including `**` recursive matching)
@@ -39,15 +42,13 @@ PostgreSQL database with extracted metadata.
3942
- **Secure**: Password from environment, .pgpass, or interactive prompt
4043
- **Configuration Files**: Reusable YAML configuration
4144

42-
## Prerequisites
45+
## Document Loader Quickstart
4346

4447
Before installing and using pgEdge Document Loader, download and install:
4548

4649
- Go 1.21 or later
4750
- PostgreSQL 12 or later
4851

49-
## Quick Start
50-
5152
Getting started with pgEdge Document Loader involves three steps:
5253

5354
1. Install the tool.
@@ -56,7 +57,7 @@ Getting started with pgEdge Document Loader involves three steps:
5657

5758
**Installing pgEdge Document Loader**
5859

59-
Use the following commands to download and build `pgedge-docloader`:
60+
Use the following commands to [download and build `pgedge-docloader`](/docs/installation.md):
6061

6162
```bash
6263
git clone https://github.com/pgedge/pgedge-docloader.git
@@ -67,7 +68,7 @@ make install
6768

6869
**Creating a Postgres Table**
6970

70-
Create a table in your Postgres database that has the [appropriate columns](https://github.com/pgEdge/pgedge-docloader/blob/main/docs/configuration.md#column-mappings) to hold the extracted documentation content:
71+
Before invoking Document Loader, you must configure a Postgres database and create a table with the [appropriate columns](/docs/database-setup.md) to hold the extracted documentation content:
7172

7273
```sql
7374
CREATE TABLE documents (
@@ -83,7 +84,9 @@ CREATE TABLE documents (
8384

8485
**Invoking pgedge-docloader**
8586

86-
When invoking `pgedge-docloader`, you can [specify preferences on the command line](#command-line-options), or with a configuration file. Use the following form on the command line:
87+
When invoking `pgedge-docloader`, you can [specify configuration preferences on the command line](/docs/configuration.md#specifying-options-on-the-command-line), or with a [configuration file](/docs/configuration.md#specifying-options-in-a-configuration-file).
88+
89+
The following command [invokes Document Loader on the command line](/docs/usage.md):
8790

8891
```bash
8992
# Load Markdown files into PostgreSQL
@@ -97,7 +100,7 @@ pgedge-docloader \
97100
--col-file-name filename
98101
```
99102

100-
To manage deployment preferences in a [configuration file](https://github.com/pgEdge/pgedge-docloader/blob/main/docs/configuration.md#configuration), save your deployment details in a file, and then include the `--config` keyword when invoking `pgedge-docloader`:
103+
To manage deployment preferences in a [configuration file](/docs/configuration.md#specifying-options-in-a-configuration-file), save your deployment details in a file, and then include the `--config` keyword when invoking `pgedge-docloader`:
101104

102105
```bash
103106
# Create config.yml
@@ -117,95 +120,35 @@ export PGPASSWORD=mypassword
117120
pgedge-docloader --config config.yml
118121
```
119122

120-
## Command-Line Options
123+
For a more comprehensive walkthrough, see the [Quickstart Guide](/docs/quickstart.md).
121124

122-
When invoking `pgedge-docloader` on the command line, you can include the following options:
125+
## Developer Notes
123126

124-
```
125-
Flags:
126-
-c, --config string Path to configuration file
127-
-s, --source string Source file, directory, or glob pattern
128-
--strip-path Strip path from filename
129-
--db-host string Database host (default "localhost")
130-
--db-port int Database port (default 5432)
131-
--db-name string Database name
132-
--db-user string Database user
133-
--db-table string Database table name
134-
--col-doc-title string Column for document title
135-
--col-doc-content string Column for document content
136-
--col-source-content string Column for source content
137-
--col-file-name string Column for file name
138-
--col-file-modified string Column for file modified timestamp
139-
--col-row-created string Column for row created timestamp
140-
--col-row-updated string Column for row updated timestamp
141-
-u, --update Update existing rows or insert new ones
142-
```
127+
This project is under active development. See the documentation for the latest
128+
features and updates.
143129

144-
## Examples
130+
The pgEdge Document Loader Makefile includes targets that run the test suite and invoke the Go linter. Use the following commands:
145131

146-
To load content from a specified directory (./documentation):
132+
**Running Tests**
147133

148134
```bash
149-
pgedge-docloader --source ./documentation --config config.yml
135+
make test
150136
```
151137

152-
To load content that matches a specified pattern (`"./docs/**/*.md"`):
138+
**Linting**
153139

154140
```bash
155-
pgedge-docloader --source "./docs/**/*.md" --config config.yml
141+
make lint
156142
```
157-
Include ** to enforce recursive matching across all subdirectories; for example:
158-
159-
* `docs/**/*.md` - matches all .md files in docs and all subdirectories.
160-
* `docs/*.md` - matches only .md files directly in docs (not subdirectories).
161143

162-
To insert new rows and update existing rows (keeping your table in sync with documentation updates), include the following command syntax:
163-
164-
```bash
165-
pgedge-docloader --source ./docs --config config.yml --update
166-
```
144+
Your contributions are welcome! Please feel free to submit issues and pull requests.
167145

168146
## Support
169147

170-
To review documentation or to open an issue, visit:
171-
172148
- Documentation: [https://pgedge.github.io/pgedge-docloader](https://pgedge.github.io/pgedge-docloader)
173149
- Issues: [GitHub Issues](https://github.com/pgedge/pgedge-docloader/issues)
174150

175-
## Development
176-
177-
Use the following command to create the binary:
178-
179-
```bash
180-
make build
181-
```
182-
183-
The `make` command creates the binary at `bin/pgedge-docloader`. To test locally without installing:
184-
185-
```bash
186-
./bin/pgedge-docloader --help
187-
```
188-
189-
To run project tests, use the following command:
190-
191-
```bash
192-
make test
193-
```
194-
195-
To run the `golangci-lint` linter:
196-
197-
```bash
198-
make lint
199-
```
200-
201-
## Contributing
202-
203-
Contributions are welcome! Please feel free to submit issues and pull requests.
204-
205151
## License
206152

207-
This project is licensed under the PostgreSQL License. See [LICENCE.md](LICENCE.md) for details.
208-
209-
## Project Status
210-
211-
This project is under active development. See the documentation for the latest features and updates.
153+
This project is licensed under the PostgreSQL License. See
154+
[LICENCE.md](LICENCE.md) for details.

docs/LICENCE.md

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
1+
The PostgreSQL License
2+
3+
Portions copyright (c) 2025, pgEdge, Inc.
4+
5+
Permission to use, copy, modify, and distribute this software and its documentation for any purpose, without fee, and without a written agreement is hereby granted, provided that the above copyright notice and this paragraph and the following two paragraphs appear in all copies.
6+
7+
IN NO EVENT SHALL pgEdge, Inc. BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS DOCUMENTATION, EVEN IF pgEdge, Inc. HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
8+
9+
pgEdge, Inc. SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS IS" BASIS, AND pgEdge, Inc. HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.

docs/authentication.md

Lines changed: 91 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,91 @@
1+
# Password Options
2+
3+
## Specifying a Password
4+
5+
Database passwords are never stored in a configuration file. The tool obtains passwords in this order of priority:
6+
7+
1. pgEdge Document Loader first checks the `PGPASSWORD` environment variable:
8+
9+
```bash
10+
export PGPASSWORD=mypassword
11+
pgedge-docloader --config config.yml
12+
```
13+
14+
2. It then checks the [`~/.pgpass` file](https://www.postgresql.org/docs/18/libpq-pgpass.html) for an entry:
15+
16+
```
17+
localhost:5432:mydb:myuser:mypassword
18+
```
19+
20+
Your `~/.pgpass` file must have proper permissions:
21+
22+
```bash
23+
chmod 600 ~/.pgpass
24+
```
25+
26+
!!! note
27+
28+
If a password is required but not provided through `PGPASSWORD` or `.pgpass`, PostgreSQL will return an authentication error with a clear message.
29+
30+
3. If Document Loader doesn't find a password in the two previous locations, it then attempts passwordless authentication. This allows PostgreSQL to use configured authentication methods such as:
31+
32+
- Trust authentication
33+
- Peer authentication
34+
- Certificate-based authentication (using `db-sslcert` and `db-sslkey`)
35+
36+
If no password is found and an alternative authentication method is not configured, the tool will prompt:
37+
38+
```bash
39+
pgedge-docloader --config config.yml
40+
Enter database password: ****
41+
```
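The lookup order described above can be sketched in a few lines of Python. This is only an illustration of the documented priority (environment variable first, then `.pgpass`), not the tool's actual implementation. The function names `lookup_pgpass` and `resolve_password` are hypothetical, and the parser is simplified: it honors `*` wildcards but does not handle backslash-escaped colons.

```python
from pathlib import Path


def lookup_pgpass(path, host, port, dbname, user):
    """Return the password from the first matching .pgpass line, or None."""
    try:
        lines = Path(path).read_text().splitlines()
    except OSError:
        return None  # A missing or unreadable file simply yields no password.
    wanted = (host, str(port), dbname, user)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Format: hostname:port:database:username:password
        fields = line.split(":", 4)
        if len(fields) != 5:
            continue
        # A "*" field matches any value.
        if all(f in ("*", w) for f, w in zip(fields[:4], wanted)):
            return fields[4]
    return None


def resolve_password(env, pgpass_path, host, port, dbname, user):
    """Follow the documented order: PGPASSWORD first, then .pgpass."""
    if env.get("PGPASSWORD"):
        return env["PGPASSWORD"], "PGPASSWORD"
    password = lookup_pgpass(pgpass_path, host, port, dbname, user)
    if password is not None:
        return password, ".pgpass"
    # Nothing found: a caller would try passwordless authentication next.
    return None, "none"
```

Passing the environment as a plain mapping (for example `dict(os.environ)`) keeps the sketch easy to exercise without touching real credentials.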
42+
43+
### Using an Environment Variable to Specify a Password
44+
45+
```bash
46+
export PGPASSWORD=mypassword
47+
pgedge-docloader --config config.yml
48+
```
49+
50+
### Using the .pgpass File to Store a Password
51+
52+
Create `~/.pgpass`:
53+
54+
```
55+
localhost:5432:mydb:myuser:mypassword
56+
```
57+
58+
Set permissions:
59+
60+
```bash
61+
chmod 600 ~/.pgpass
62+
```
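The PostgreSQL client library (libpq) ignores a password file whose permissions allow group or world access, which is why the `chmod 600` step matters. The sketch below shows one way to verify the mode before running the loader on a Unix-like system. The helper name `pgpass_permissions_ok` is hypothetical, and whether Document Loader itself performs this check is not stated in this documentation.

```python
import os
import stat


def pgpass_permissions_ok(path):
    """Return True if the file grants no access to group or others."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # 0600 (or stricter) means every group and world permission bit is clear.
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```

For example, `pgpass_permissions_ok(os.path.expanduser("~/.pgpass"))` returns `False` for a file left at mode `0644`.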
63+
64+
## Using an SSL/TLS Connection
65+
66+
Include the following options to connect using SSL/TLS with client certificates:
67+
68+
```bash
69+
pgedge-docloader \
70+
--source ./docs \
71+
--db-host secure.example.com \
72+
--db-name mydb \
73+
--db-user myuser \
74+
--db-table documents \
75+
--db-sslmode verify-full \
76+
--db-sslcert ./certs/client.pem \
77+
--db-sslkey ./certs/client-key.pem \
78+
--db-sslrootcert ./certs/ca.pem \
79+
--col-doc-content content \
80+
--col-file-name filename
81+
```
82+
83+
The supported SSL modes are:
84+
85+
- `disable` - No SSL
86+
- `allow` - Try a non-SSL connection first, fall back to SSL if that fails
87+
- `prefer` - Try SSL, fall back to non-SSL (default)
88+
- `require` - Require SSL, but don't verify certificates
89+
- `verify-ca` - Require SSL and verify CA certificate
90+
- `verify-full` - Require SSL and verify certificate and hostname
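
As a rough illustration of how these settings fit together, the sketch below validates an SSL mode against the list above and assembles a libpq-style keyword/value connection string. It assumes the `--db-ssl*` options correspond to the standard libpq parameters `sslmode`, `sslcert`, `sslkey`, and `sslrootcert`. The helper names are hypothetical and this is not how Document Loader builds its connection internally.

```python
VALID_SSL_MODES = ("disable", "allow", "prefer", "require", "verify-ca", "verify-full")


def quote_value(value):
    """Single-quote a value, backslash-escaping quotes and backslashes."""
    text = str(value)
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_conninfo(host, dbname, user, port=5432, sslmode="prefer",
                   sslcert=None, sslkey=None, sslrootcert=None):
    """Build a libpq keyword/value string, rejecting unknown SSL modes."""
    if sslmode not in VALID_SSL_MODES:
        raise ValueError(f"unsupported sslmode: {sslmode!r}")
    params = {
        "host": host,
        "port": port,
        "dbname": dbname,
        "user": user,
        "sslmode": sslmode,
        "sslcert": sslcert,
        "sslkey": sslkey,
        "sslrootcert": sslrootcert,
    }
    # Unset (None) parameters are omitted so the client defaults apply.
    return " ".join(f"{k}={quote_value(v)}" for k, v in params.items() if v is not None)
```

Rejecting an unknown mode up front gives a clearer error than letting the connection attempt fail later.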
91+

docs/best_practices.md

Lines changed: 28 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,28 @@
1+
# Best Practices
2+
3+
When preparing documents for extraction, you should ensure that each document type has the expected properties.
4+
5+
**HTML Documents**
6+
7+
HTML documents should:
8+
9+
- have a `<title>` tag for proper title extraction.
10+
- use semantic HTML for efficient Markdown conversion.
11+
- avoid complex layouts that don't translate well to Markdown.
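
To check in advance whether an HTML file carries a usable `<title>`, a small script along these lines may help. It is a sketch built on Python's standard `html.parser` module. The class and function names are hypothetical, and the tool's own extraction logic may differ.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text of the first <title> element encountered."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = None
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and self.title is None:
            self.in_title = True

    def handle_data(self, data):
        if self.in_title:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.title = "".join(self._parts).strip()


def extract_html_title(html):
    """Return the document title, or None if no <title> is present."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title
```

A `None` result is a hint that the document needs a `<title>` tag before loading.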
12+
13+
**Markdown Documents**
14+
15+
Markdown documents should:
16+
17+
- use a single level-1 heading (`#`) at the top of each file for title extraction.
18+
- place YAML frontmatter before the title (if using frontmatter).
19+
- follow standard Markdown syntax for best results.
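
The following sketch shows one way to preview which title a Markdown file would yield under the conventions above: skip a leading YAML frontmatter block, then take the first level-1 heading. The function name is hypothetical, the logic is an approximation rather than the tool's actual parser, and it does not account for `#` lines inside fenced code blocks.

```python
def extract_markdown_title(text):
    """Return the first level-1 heading after optional YAML frontmatter."""
    lines = text.splitlines()
    start = 0
    # Frontmatter is delimited by a "---" line at the very top and a closing "---".
    if lines and lines[0].strip() == "---":
        for i in range(1, len(lines)):
            if lines[i].strip() == "---":
                start = i + 1
                break
    for line in lines[start:]:
        stripped = line.strip()
        # "# Title" is level 1; "## Title" does not start with "# ".
        if stripped.startswith("# "):
            return stripped[2:].strip()
    return None
```

Skipping the frontmatter first prevents a YAML comment such as `# draft` from being mistaken for the title.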
20+
21+
**reStructuredText Documents**
22+
23+
reStructuredText documents should:
24+
25+
- use standard RST heading underline formats.
26+
- avoid complex directives that may not convert well.
27+
- test conversion with sample documents.
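
One lightweight way to test sample documents is to confirm that a title can be found from a standard underlined heading. The sketch below treats a text line followed by a line of one repeated punctuation character (at least as long as the text) as a heading. The function name is hypothetical and the rule is a simplification of the full reStructuredText specification, not the tool's converter.

```python
import string

ADORNMENT_CHARS = set(string.punctuation)


def is_adornment(line):
    """True if the line is one punctuation character repeated."""
    return bool(line) and len(set(line)) == 1 and line[0] in ADORNMENT_CHARS


def extract_rst_title(text):
    """Return the first underlined heading in the text, or None."""
    lines = [line.strip() for line in text.splitlines()]
    for title, underline in zip(lines, lines[1:]):
        if not title or is_adornment(title):
            continue  # Skip blank lines and overline rows.
        if is_adornment(underline) and len(underline) >= len(title):
            return title
    return None
```

Headings written with both an overline and an underline are handled as well, because the overline row is skipped before the title line is examined.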
28+
