
Commit c02dd2e

Add GitHub Action to check docs (#425)
* Add GitHub Action to check docs
* Temporary change to see GH Actions result on fork
* Fix links
1 parent 9bea0ee commit c02dd2e

26 files changed (+11353, -119 lines)

.github/workflows/docs.yml

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+name: Docs
+
+on:
+  push:
+    branches:
+      - '4.0'
+      # temporary: allow to see GH Actions result on fork
+      - '4-0-gh-actions'
+  pull_request:
+    branches:
+      - '*'
+
+jobs:
+  build:
+    runs-on: ubuntu-latest
+
+    steps:
+      - uses: actions/checkout@v2
+
+      - name: Use Node.js 14
+        uses: actions/setup-node@v1
+        with:
+          node-version: '14'
+      - run: npm install
+        working-directory: 'doc'
+      - run: npm run build
+        working-directory: 'doc'
+      - run: npm run lint
+        working-directory: 'doc'
+      - run: npm run lint:links
+        working-directory: 'doc'

.github/workflows/notify.yml

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+name: Trigger Publish
+
+on:
+  push:
+    paths:
+      - 'doc/docs'
+    branches:
+      - '4.0'
+
+jobs:
+  trigger_publish:
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Trigger Developer Event
+        uses: peter-evans/repository-dispatch@master
+        with:
+          token: ${{ secrets.BUILD_ACCESS_TOKEN }}
+          repository: neo4j-documentation/docs-refresh
+          event-type: neo4j-streams

doc/docs.yml

Lines changed: 5 additions & 3 deletions
@@ -1,13 +1,12 @@
 site:
   title: Neo4j Streams Integrations User Guide
-  url: /neo4j-streams-docs
 content:
   sources:
   - url: ../
     branches: HEAD
     start_path: doc/docs
 output:
-  dir: ./build/site/developer
+  dir: ./build/site/labs/
 ui:
   bundle:
     #url: https://github.com/neo4j-documentation/docs-refresh/raw/master/ui/build/ui-bundle.zip
@@ -17,5 +16,8 @@ urls:
   html_extension_style: indexify
 asciidoc:
   attributes:
+    page-disabletracking: true
+    experimental: ''
+    page-cdn: /static/assets
     page-theme: labs
-    page-disabletracking: true
+    page-canonical-root: /labs

doc/docs/antora.yml

Lines changed: 12 additions & 1 deletion
@@ -10,7 +10,18 @@ asciidoc:
     theme: docs
     version: 4.0.2
     copyright: Neo4j Inc.
-    common-license-page-uri: https://neo4j.com/docs/license/
     page-product: Neo4j Streams
     environment: streams.sink
     page-pagination: true
+    url-common-license-page: https://neo4j.com/docs/license/
+    url-confluent-blog: https://www.confluent.io/blog
+    url-confluent-download: https://www.confluent.io/dowload/
+    url-confluent-kafka: https://docs.confluent.io/current/kafka
+    url-confluent-cloud: https://docs.confluent.io/current/cloud
+    url-confluent-install: https://docs.confluent.io/current/installation
+    url-confluent-quickstart: https://docs.confluent.io/platform/current/quickstart
+    url-confluent-clients: https://docs.confluent.io/platform/current/clients
+    url-confluent-ksql: https://docs.confluent.io/platform/current/ksqldb/index.html
+    url-confluent-java-client: https://docs.confluent.io/clients-kafka-java/current/overview.html
+    url-confluent-hub-neo4j: https://www.confluent.io/hub/neo4j/kafka-connect-neo4j
+    url-confluent-hub-datagen: https://www.confluent.io/hub/confluentinc/kafka-connect-datagen

doc/docs/modules/ROOT/pages/architecture.adoc

Lines changed: 1 addition & 1 deletion
@@ -18,4 +18,4 @@ image::graph-etl.png[align="center"]
 
 Using neo4j-streams is a form of graph ETL. And it pays to separate out these two pieces (the extraction and the transformation) and handle them separately if we want to do this in a performant and easy to maintain manner. If we are producing records from Neo4j back out to Kafka, it's still the same challenge, just in the opposite direction.
 
-https://www.confluent.io/blog/building-real-time-streaming-etl-pipeline-20-minutes/[Streaming ETL is nothing new for Kafka] -- it is one of the platform's core use cases. A big complicating factor for Neo4j-streams is that not many people have done it for graph before neo4j-streams.
+{url-confluent-blog}/building-real-time-streaming-etl-pipeline-20-minutes/[Streaming ETL is nothing new for Kafka] -- it is one of the platform's core use cases. A big complicating factor for Neo4j-streams is that not many people have done it for graph before neo4j-streams.

Lines changed: 2 additions & 2 deletions
@@ -1,10 +1,10 @@
 = Optimizing Kafka
 
-Neo4j can't ingest fast if Kafka isn't set up correctly. While this isn't a common source of problems, it has come up. Confluent has https://www.confluent.io/blog/optimizing-apache-kafka-deployment/[good overall documentation] on optimizing Kafka that is worth being familiar with.
+Neo4j can't ingest fast if Kafka isn't set up correctly. While this isn't a common source of problems, it has come up. Confluent has {url-confluent-blog}/optimizing-apache-kafka-deployment/[good overall documentation] on optimizing Kafka that is worth being familiar with.
 
 The main trade offs are these, and they have to make sense at the Kafka layer before they can make sense for Neo4j.
 
 * Do you want to optimize for high throughput, which is the rate that data is moved from producers to brokers or brokers to consumers?
-* Do you want to optimize for low latency, which is the elapsed time moving messages end-to-end (from producers to brokers to consumers)?
+* Do you want to optimize for low latency, which is the elapsed time moving messages end-to-end (from producers to brokers to consumers)?
 * Do you want to optimize for high durability, which guarantees that messages that have been committed will not be lost?
 * Do you want to optimize for high availability, which minimizes downtime in case of unexpected failures? Kafka is a distributed system, and it is designed to tolerate failures.
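
The trade-offs listed in this file map onto a handful of standard Kafka client settings. As an illustrative sketch only (not part of this commit; the values are placeholders, and the `kafka.` pass-through prefix is the same convention the docs use elsewhere):

----
# Illustrative sketch, not from the commit; tune values per workload.
# Durability: wait for all in-sync replicas to acknowledge each write.
kafka.acks=all
# Throughput vs. latency: allow the producer to batch records briefly before sending.
kafka.linger.ms=20
kafka.batch.size=65536
# Throughput: compress batches on the wire at the cost of some CPU.
kafka.compression.type=lz4
----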

doc/docs/modules/ROOT/pages/architecture/throughput.adoc

Lines changed: 7 additions & 6 deletions
@@ -34,24 +34,25 @@ https://www.oreilly.com/library/view/kafka-the-definitive/9781491936153/ch04.htm
 
 Some generalities for Neo4j:
 
-* Batch size (`neo4j.batch.size`) - the number of messages to include in a single transactional batch.
-* Max Poll Records (`kafka.max.poll.records`) - the number of records to use per transaction in Neo4j. There is a tradeoff between memory usage and total transactional overhead.
+* Batch size (`neo4j.batch.size`) - the number of messages to include in a single transactional batch.
+* Max Poll Records (`kafka.max.poll.records`) - the number of records to use per transaction in Neo4j. There is a tradeoff between memory usage and total transactional overhead.
 ** Fewer larger batches is faster to import data into Neo4j overall, but requires more memory.
-** **The smaller the payload the larger the batch in general (via unwind)**. A default to start with is 1000 and work your way up. If you are only creating nodes via unwind you can go much higher (20k as a start). Then go for a lower number for the relationship merges (back to 1000-5000).
+** **The smaller the payload the larger the batch in general (via unwind)**. A default to start with is 1000 and work your way up. If you are only creating nodes via unwind you can go much higher (20k as a start). Then go for a lower number for the relationship merges (back to 1000-5000).
 Each batch represents a transaction in memory, and consider that the message size * the batch size is an important factor in determining how much heap you need for your transactions.
 * Fetch bytes (`kafka.max.partition.fetch.bytes`) The maximum amount of data per-partition the server will return. Records are fetched in batches by the consumer. If the first record batch in the first non-empty partition of the fetch is larger than this limit, the batch will still be returned to ensure that the consumer can make progress.
 
-Every time the kafka client calls the poll() operation, it’s limited by these factors. The first is the maximum number of bytes you can pull, so as to constrain your memory overhead. The second is how many records you might want in a batch. Note that at this layer you have no idea how many bytes/record. The default for the batch size is 1mb. So say you have 200kb records (big json files). If you leave batch size at 1mb default, then you’ll never have more than 5 records/tx. The max poll records constrains the other aspect. Finally, you may wish to read into or adjust kafka.max.poll.interval.ms to constrain the amount of time spent polling in advanced scenarios. https://docs.confluent.io/current/clients/consumer.html=group-configuration[See this documentation] for more information on that setting.
+Every time the kafka client calls the poll() operation, it’s limited by these factors. The first is the maximum number of bytes you can pull, so as to constrain your memory overhead. The second is how many records you might want in a batch. Note that at this layer you have no idea how many bytes/record. The default for the batch size is 1mb. So say you have 200kb records (big json files). If you leave batch size at 1mb default, then you’ll never have more than 5 records/tx. The max poll records constrains the other aspect. Finally, you may wish to read into or adjust kafka.max.poll.interval.ms to constrain the amount of time spent polling in advanced scenarios.
+{url-confluent-clients}/consumer.html#group-configuration[See this documentation] for more information on that setting.
 
 A logical setting might be to set max poll records = your desired transactional batch size, set neo4j.batch.size to the same number. In general you can leave kafka.max.partition.fetch.bytes the same, but if you need to adjust it for memory reasons, it should be equal to max poll records * number of bytes/record on average, + 10% or so.
 
-https://docs.confluent.io/current/installation/configuration/consumer-configs.html=cp-config-consumer[Important Kafka Consumer Configuration Elements & Their Explanations]
+{url-confluent-install}/configuration/consumer-configs.html[Important Kafka Consumer Configuration Elements & Their Explanations]
 
 (Use these with neo4j-streams by prepending with "kafka." in the config)
 
 == Kafka Partitioning Strategy
 
-A big factor can be how the Kafka topic is set up. See https://www.confluent.io/blog/how-choose-number-topics-partitions-kafka-cluster/[how to choose the number of topics/partitions in a Kafka Cluster].
+A big factor can be how the Kafka topic is set up. See {url-confluent-blog}/how-choose-number-topics-partitions-kafka-cluster/[how to choose the number of topics/partitions in a Kafka Cluster].
 
 image::kafka-partitions.png[align="center"]
 
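The throughput advice above pairs `kafka.max.poll.records` with `neo4j.batch.size` and only touches fetch bytes when memory demands it. A minimal sketch (not part of this commit; the numbers are assumptions to tune against your own record sizes):

----
# Illustrative sketch; start around 1000 and work upward as the text suggests.
neo4j.batch.size=1000
# Keep max poll records equal to the transactional batch size.
kafka.max.poll.records=1000
# Adjust fetch bytes only for memory reasons:
# roughly max.poll.records * average bytes per record, plus ~10%.
# e.g. 1000 records * 1024 bytes * 1.1 = 1126400
kafka.max.partition.fetch.bytes=1126400
----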

doc/docs/modules/ROOT/pages/architecture/transformations.adoc

Lines changed: 2 additions & 2 deletions
@@ -11,15 +11,15 @@ KSQL is the best available method of transforming streams from one format to ano
 
 The downside to KSQL is that it may not work everywhere. Because it's a Confluent Enterprise feature, you won't find it in Amazon MSK or other open source kafka installations.
 
-https://docs.confluent.io/current/ksql/docs/quickstart.html[KSQL Documentation can be found here.]
+{url-confluent-ksql}[KSQL Documentation can be found here.]
 
 == KStreams
 
 KStreams is a Java API that allows for rich transformation of streams in any way you can design. It is more akin to the Neo4j Traversal API, in that you can do whatever you can imagine, but it requires custom code to do so. Typically KStreams programs are small apps which might read from one topic, transform, and write to another. And so for our purposes with graphs, KStreams serves the same architectural purpose as KSQL, it's just more powerful, and requires custom code.
 
 In contrast to KSQL which is only available for Confluent Enterprise & Confluent Cloud, the KStreams API should be available with any open source kafka.
 
-https://kafka.apache.org/documentation/=streamsapi[KStreams Documentation can be found here]
+https://kafka.apache.org/documentation/#streamsapi[KStreams Documentation can be found here]
 
 
 

doc/docs/modules/ROOT/pages/cloud.adoc

Lines changed: 3 additions & 3 deletions
@@ -3,8 +3,8 @@
 
 [[confluent_cloud]]
 Configuring a connection to a Confluent Cloud instance should follow
-link:https://docs.confluent.io/current/cloud/using/config-client.html#java-client[Confluent's Java Client]
-configuration advice, and the advice in <<_kafka_settings, Kafka Settings>> section.
+link:{url-confluent-java-client}[Confluent's Java Client]
+configuration advice, and the advice in xref:quickstart.adoc#kafka-settings[Kafka settings] section.
 At a minimum, to configure this, you will need:
 
 * `BOOTSTRAP_SERVER_URL`
@@ -26,4 +26,4 @@ kafka.retry.backoff.ms=500
 ----
 
 Make sure to replace `BOOTSTRAP_SERVER_URL`, `API_SECRET`, and `API_KEY` with the values that Confluent Cloud
-gives you when you generate an API access key.
+gives you when you generate an API access key.
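
For orientation, the properties block that cloud.adoc refers to (only partially visible in this diff) amounts to pointing the Kafka client at Confluent Cloud over SASL_SSL, with the API key and secret supplied as PLAIN credentials. A hedged sketch, not the literal contents of the file:

----
# Illustrative sketch; see cloud.adoc for the authoritative template.
# Replace BOOTSTRAP_SERVER_URL, API_KEY, and API_SECRET with your Confluent Cloud values.
kafka.bootstrap.servers=BOOTSTRAP_SERVER_URL
kafka.security.protocol=SASL_SSL
kafka.sasl.mechanism=PLAIN
kafka.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="API_KEY" password="API_SECRET";
kafka.retry.backoff.ms=500
----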

doc/docs/modules/ROOT/pages/consumer.adoc

Lines changed: 2 additions & 2 deletions
@@ -104,7 +104,7 @@ The Neo4j Streams Plugin provides several means to handle processing errors.
 It can fail fast or log errors with different detail levels.
 Another way is to re-route all the data and errors that for something reason it wasn't able to ingest to a `Dead Letter Queue`.
 
-NOTE: It behaves by default like Kafka Connect, see this https://www.confluent.io/blog/kafka-connect-deep-dive-error-handling-dead-letter-queues[blog post^]
+NOTE: It behaves by default like Kafka Connect, see this {url-confluent-blog}/kafka-connect-deep-dive-error-handling-dead-letter-queues/[blog post^]
 
 * fail fast (abort) by default
 * need to configure dead-letter-queue topic to enable
@@ -202,4 +202,4 @@ The Neo4j Streams plugin supports 2 deserializers:
 
 You can define them independently for `Key` and `Value` as specified in the Configuration paragraph
 
-include::consumer-configuration.adoc[]
+include::consumer-configuration.adoc[]
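
To make the dead-letter-queue behaviour concrete: as the NOTE says, the plugin follows the Kafka Connect error-handling model, so the configuration looks roughly like the sketch below. The property names and the topic name are assumptions here; the included consumer-configuration.adoc is the authoritative reference.

----
# Illustrative sketch; property names assume the Kafka Connect-style error handling
# described in the linked blog post, and "neo4j-sink-dlq" is a placeholder topic name.
# Keep processing on bad records instead of failing fast...
streams.sink.errors.tolerance=all
# ...and re-route them, with context headers, to a dead letter queue topic.
streams.sink.errors.deadletterqueue.topic.name=neo4j-sink-dlq
streams.sink.errors.deadletterqueue.context.headers.enable=true
# Optionally log the failures and the offending messages as well.
streams.sink.errors.log.enable=true
streams.sink.errors.log.include.messages=true
----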
