Using neo4j-streams is a form of graph ETL, and it pays to separate the extraction and the transformation and handle them independently if we want the pipeline to be performant and easy to maintain. If we are producing records from Neo4j back out to Kafka, the challenge is the same, just in the opposite direction.
{url-confluent-blog}/building-real-time-streaming-etl-pipeline-20-minutes/[Streaming ETL is nothing new for Kafka] -- it is one of the platform's core use cases. A big complicating factor is that few people had done streaming ETL for graphs before neo4j-streams.
Neo4j can't ingest fast if Kafka isn't set up correctly. While this isn't a common source of problems, it has come up. Confluent has {url-confluent-blog}/optimizing-apache-kafka-deployment/[good overall documentation] on optimizing Kafka that is worth being familiar with.
The main trade-offs are the following, and they have to make sense at the Kafka layer before they can make sense for Neo4j.
* Do you want to optimize for high throughput, which is the rate that data is moved from producers to brokers or brokers to consumers?
* Do you want to optimize for low latency, which is the elapsed time moving messages end-to-end (from producers to brokers to consumers)?
* Do you want to optimize for high durability, which guarantees that messages that have been committed will not be lost?
* Do you want to optimize for high availability, which minimizes downtime in case of unexpected failures? Kafka is a distributed system, and it is designed to tolerate failures.
* Batch size (`neo4j.batch.size`) - the number of messages to include in a single transactional batch.
* Max Poll Records (`kafka.max.poll.records`) - the number of records to use per transaction in Neo4j. There is a tradeoff between memory usage and total transactional overhead.
** Fewer, larger batches import data into Neo4j faster overall, but require more memory.
** **The smaller the payload, the larger the batch can be in general (via `UNWIND`)**. A good default to start with is 1000; work your way up from there. If you are only creating nodes via `UNWIND` you can go much higher (20k as a starting point). Then use a lower number for the relationship merges (back to 1000-5000).
Each batch represents a transaction held in memory, so the message size multiplied by the batch size is an important factor in determining how much heap you need for your transactions. For example, 1,000 messages of roughly 200 KB each means about 200 MB of payload in a single transaction.
* Fetch bytes (`kafka.max.partition.fetch.bytes`) - the maximum amount of data per partition the server will return. Records are fetched in batches by the consumer. If the first record batch in the first non-empty partition of the fetch is larger than this limit, the batch will still be returned to ensure that the consumer can make progress.
Every time the Kafka client calls the `poll()` operation, it is limited by these factors. The first is the maximum number of bytes you can pull, which constrains your memory overhead. The second is how many records you want in a batch; note that at this layer you have no idea how many bytes each record takes. The default fetch size is 1 MB, so if you have 200 KB records (big JSON documents) and leave that default in place, you will never get more than 5 records per transaction. The max poll records setting constrains the other aspect. Finally, you may wish to read up on or adjust `kafka.max.poll.interval.ms` to constrain the amount of time spent polling in advanced scenarios.
{url-confluent-clients}/consumer.html#group-configuration[See this documentation] for more information on that setting.
A logical starting point is to set max poll records equal to your desired transactional batch size and set `neo4j.batch.size` to the same number. In general you can leave `kafka.max.partition.fetch.bytes` alone, but if you need to adjust it for memory reasons, it should be roughly max poll records * the average number of bytes per record, plus 10% or so.
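For illustration, here is a sketch of how those settings could fit together, using the property names mentioned above. The values and the ~200 KB average record size are assumptions to tune for your own workload and deployment.

[source,properties]
----
# Target roughly 1,000 records per transaction in Neo4j
neo4j.batch.size=1000

# Let each poll hand back up to the same number of records
kafka.max.poll.records=1000

# Usually left at its default; if memory is a concern, size it as
# max poll records * average bytes per record, plus ~10% headroom
# (1,000 * 200 KB is about 200 MB, so roughly 220 MB here)
#kafka.max.partition.fetch.bytes=220000000
----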
{url-confluent-install}/configuration/consumer-configs.html[Important Kafka Consumer Configuration Elements & Their Explanations]
(Use these with neo4j-streams by prepending `kafka.` in the config; for example, `max.poll.records` becomes `kafka.max.poll.records`.)
== Kafka Partitioning Strategy
A big factor can be how the Kafka topic is set up. See {url-confluent-blog}/how-choose-number-topics-partitions-kafka-cluster/[how to choose the number of topics/partitions in a Kafka Cluster].
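As an illustration, a topic can be created with an explicit partition count and replication factor up front. The broker address, topic name, and counts below are placeholders; size them using the guidance in that post.

[source,bash]
----
bin/kafka-topics.sh --create \
  --bootstrap-server localhost:9092 \
  --topic neo4j-ingest \
  --partitions 6 \
  --replication-factor 3
----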
doc/docs/modules/ROOT/pages/architecture/transformations.adoc

KSQL is the best available method of transforming streams from one format to another.
The downside to KSQL is that it may not work everywhere. Because it's a Confluent Enterprise feature, you won't find it in Amazon MSK or other open-source Kafka installations.
{url-confluent-ksql}[KSQL Documentation can be found here.]
== KStreams
KStreams is a Java API that allows for rich transformation of streams in any way you can design. It is more akin to the Neo4j Traversal API, in that you can do whatever you can imagine, but it requires custom code to do so. Typically KStreams programs are small apps which might read from one topic, transform, and write to another. And so for our purposes with graphs, KStreams serves the same architectural purpose as KSQL; it's just more powerful, and requires custom code.
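For example, here is a minimal KStreams sketch of that read-transform-write shape, assuming string-serialized messages and placeholder topic names (`raw-events`, `events-for-neo4j`):

[source,java]
----
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class ReshapeForNeo4j {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "reshape-for-neo4j");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Read from one topic, drop empty payloads, reshape each value,
        // and write to the topic the Neo4j sink consumes.
        KStream<String, String> source = builder.stream("raw-events");
        source.filter((key, value) -> value != null && !value.isBlank())
              .mapValues(value -> value.trim())   // stand-in for a real transformation
              .to("events-for-neo4j");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
----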
In contrast to KSQL, which is only available for Confluent Enterprise and Confluent Cloud, the KStreams API should be available with any open-source Kafka.
https://kafka.apache.org/documentation/#streamsapi[KStreams Documentation can be found here]
doc/docs/modules/ROOT/pages/consumer.adoc

The Neo4j Streams Plugin provides several means to handle processing errors.
It can fail fast or log errors with different detail levels.
Another way is to re-route any data and errors that it was for some reason unable to ingest to a `Dead Letter Queue`.
NOTE: It behaves by default like Kafka Connect; see this {url-confluent-blog}/kafka-connect-deep-dive-error-handling-dead-letter-queues/[blog post^]
* fail fast (abort) by default
* need to configure a dead-letter-queue topic to enable it (a sketch follows below)
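Here is a minimal sketch of what enabling the dead letter queue could look like, assuming the plugin mirrors the Kafka Connect error-handling property names under a `streams.sink.` prefix; verify the exact keys and topic name against the version you are running.

[source,properties]
----
# Tolerate bad records instead of aborting, log them,
# and route them (with context headers) to a DLQ topic
streams.sink.errors.tolerance=all
streams.sink.errors.log.enable=true
streams.sink.errors.log.include.messages=true
streams.sink.errors.deadletterqueue.topic.name=neo4j-dlq
streams.sink.errors.deadletterqueue.context.headers.enable=true
----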
0 commit comments