[[introduction]]
== Introduction

ifdef::env-docs[]
[abstract]
--
This chapter provides an introduction to the Neo4j Streams Library, and instructions for installation.
--
endif::env-docs[]

Many users and customers want to integrate Kafka and other streaming solutions with Neo4j, either to ingest data into the graph from other sources, or to send update events (change data capture - CDC) to the event log for later consumption.

This extension was developed to satisfy these use cases, with more to come.

The project is composed of several parts:

* Neo4j Streams Procedure: a procedure to send a payload to a topic
* Neo4j Streams Producer: a transaction event handler that sends data to a Kafka topic
* Neo4j Streams Consumer: a Neo4j application that ingests data from Kafka topics into Neo4j via templated Cypher statements
* Kafka-Connect Plugin: a plugin for the Confluent Platform that allows ingesting data into Neo4j from Kafka topics via Cypher queries.
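For example, the procedure can be called from Cypher to publish an ad-hoc payload to a topic (the topic name `my-topic` here is just a placeholder):

[source,cypher]
----
CALL streams.publish('my-topic', 'Hello from Neo4j')
----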

[[before_begin]]
=== Before you Begin

Neo4j Streams can run in two modes:

* As a **Neo4j plugin**, neo4j-streams runs inside of the database, and can both consume and produce messages
to Kafka.
* As a **Kafka Connect worker**, neo4j-streams is deployed separately from the Neo4j database. At this time,
the Connect worker can be used to push data to Neo4j (Neo4j as the consumer) but does not yet support
change data capture (CDC) coming _from_ Neo4j.

Experienced Neo4j users will likely prefer running the software as a Neo4j plugin. Kafka administrators
may prefer using the Kafka Connect method.

The remainder of the introduction assumes you are running Neo4j Streams as a Neo4j plugin.
More information on the alternative Kafka Connect method can be link:/kafka-connect/[found in this section].

[[installation]]
=== Installation

Download the latest release jar from https://github.com/neo4j-contrib/neo4j-streams/releases/latest

Copy it into `$NEO4J_HOME/plugins` and configure the relevant connections.

[[configuration]]
=== Configuration Example

Configuring neo4j-streams comes in three different parts, depending on your needs:

. *Required*: Configuring a connection to Kafka
. _Optional_: Configuring Neo4j to ingest from Kafka (link:/consumer[Consumer])
. _Optional_: Configuring Neo4j to produce records to Kafka (link:/producer[Producer])

Below is a complete example configuration using the plugin in both modes, assuming Kafka is running
on localhost. See the relevant subsections to adjust the configuration as necessary.

.neo4j.conf
[source,ini]
----
kafka.zookeeper.connect=localhost:2181
kafka.bootstrap.servers=localhost:9092

streams.sink.enabled=true
streams.sink.topic.cypher.topic-name=MERGE (n:Person { id: event.id }) SET n += event

streams.source.enabled=true
streams.source.topic.nodes.new-person-topic=Person{*}
streams.source.topic.relationships.who-knows-who=KNOWS{*}
streams.source.schema.polling.interval=10000
----
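
To make the sink template above concrete, here is a minimal Python sketch of its upsert semantics (the `graph` dict is a stand-in for Neo4j; this is illustrative, not the plugin's implementation):

```python
# Simulate how the sink applies each Kafka event with the Cypher template
#   MERGE (n:Person { id: event.id }) SET n += event
# The `graph` dict below is a hypothetical stand-in for the Neo4j store.

graph = {}  # node id -> node properties

def apply_event(event: dict) -> dict:
    # MERGE: match the node on `id`, creating it if absent
    node = graph.setdefault(event["id"], {"id": event["id"]})
    # SET n += event: merge every event property onto the node
    node.update(event)
    return node

# Two events for the same id: the second updates the node rather than duplicating it
apply_event({"id": 1, "name": "Alice"})
apply_event({"id": 1, "name": "Alice", "age": 34})
print(graph)  # {1: {'id': 1, 'name': 'Alice', 'age': 34}}
```

Because the template merges on `id`, replaying the same event is idempotent: the node is updated in place rather than duplicated.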

The rest of this section deals with overall plugin configuration.

[[kafka_settings]]
=== Kafka Settings

Any configuration option that starts with `kafka.` will be passed to the underlying Kafka driver. Neo4j
Streams uses the official Confluent Kafka producer and consumer Java clients.
Configuration settings that are valid for those clients will also work for Neo4j Streams.

For example, in the Kafka documentation linked below, the configuration setting named `batch.size` should be stated as
`kafka.batch.size` in Neo4j Streams.
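
This prefix convention can be sketched in a few lines of Python (a simplified illustration of the rule, not the plugin's actual code):

```python
# Illustrate how `kafka.`-prefixed settings map to Kafka client properties.
# Settings without the prefix (e.g. `streams.*`) are not passed to the driver.

def kafka_client_config(neo4j_conf: dict) -> dict:
    """Strip the `kafka.` prefix and drop unrelated settings."""
    prefix = "kafka."
    return {
        key[len(prefix):]: value
        for key, value in neo4j_conf.items()
        if key.startswith(prefix)
    }

conf = {
    "kafka.bootstrap.servers": "localhost:9092",
    "kafka.batch.size": "16384",
    "streams.sink.enabled": "true",  # plugin setting, not a Kafka driver setting
}
print(kafka_client_config(conf))
# {'bootstrap.servers': 'localhost:9092', 'batch.size': '16384'}
```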

The following are common configuration settings you may wish to use. _This is not a complete
list_. The full list of configuration options and reference material is available from Confluent's
site for link:https://docs.confluent.io/current/installation/configuration/consumer-configs.html#cp-config-consumer[consumer configurations] and
link:https://docs.confluent.io/current/installation/configuration/producer-configs.html#cp-config-producer[producer configurations].

.Most Common Needed Configuration Settings
|===
|Setting Name |Description |Default Value

|kafka.max.poll.records
|The maximum number of records to pull per batch from Kafka. Increasing this number will mean
larger transactions in Neo4j memory and may improve throughput.
|500

|kafka.buffer.memory
|The total bytes of memory the producer can use to buffer records waiting to be sent. Use this to adjust
how much memory the plugin may require to hold messages not yet delivered to Neo4j.
|33554432

|kafka.batch.size
|(Producer only) The producer will attempt to batch records together into fewer requests whenever multiple records are being sent to the same partition. This helps performance on both the client and the server. This configuration controls the default batch size in bytes.
|16384

|kafka.max.partition.fetch.bytes
|(Consumer only) The maximum amount of data per-partition the server will return. Records are fetched in batches by the consumer. If the first record batch in the first non-empty partition of the fetch is larger than this limit, the batch will still be returned to ensure that the consumer can make progress.
|1048576

|kafka.group.id
|A unique string that identifies the consumer group this consumer belongs to.
|N/A
|===

[[confluent_cloud]]
=== Confluent Cloud

Configuring a connection to a Confluent Cloud instance should follow
link:https://docs.confluent.io/current/cloud/using/config-client.html#java-client[Confluent's Java Client]
configuration advice, together with the `kafka.` prefix rule described above. At a minimum, you will need:

* `bootstrap_server_url`
* `api-key`
* `api-secret`
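
Translated into the plugin's `kafka.` prefix convention, a Confluent Cloud connection would look roughly like the following sketch. The SASL settings are the standard Confluent Java client ones; the server address, API key, and secret shown are placeholders you must replace with your own cluster's values:

[source,ini]
----
kafka.bootstrap.servers=<BOOTSTRAP_SERVER_URL>:9092
kafka.security.protocol=SASL_SSL
kafka.sasl.mechanism=PLAIN
kafka.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="<API_KEY>" password="<API_SECRET>";
----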

[[configuration_plugin]]
=== Plugin Configuration

Any configuration option that starts with `streams.` controls how the plugin itself behaves. For a full
list of options available, see the documentation subsections on the producer and consumer.

[[configuration_docker]]
=== A Note on Running Neo4j in Docker

When Neo4j is run in Docker, some special considerations apply; please see
link:https://neo4j.com/docs/operations-manual/current/docker/configuration/[Neo4j Docker Configuration]
for more information. In particular, the configuration format used in `neo4j.conf` looks different.

Please note that the Neo4j Docker image uses a naming convention; you can override every `neo4j.conf` property by prefixing it with `NEO4J_` and applying the following transformations:

* a single underscore is converted to a double underscore: `_ -> __`
* a dot is converted to a single underscore: `.` -> `_`

Example:

* `dbms.memory.heap.max_size=8G` -> `NEO4J_dbms_memory_heap_max__size: 8G`
* `dbms.logs.debug.level=DEBUG` -> `NEO4J_dbms_logs_debug_level: DEBUG`
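
The two transformations above can be sketched as a small Python helper (illustrative only; note that underscores must be doubled *before* dots are converted, or the two rules collide):

```python
# Convert a neo4j.conf property name to its Docker environment variable form.
# Order matters: double existing underscores first, then turn dots into underscores.

def to_docker_env(property_name: str) -> str:
    return "NEO4J_" + property_name.replace("_", "__").replace(".", "_")

print(to_docker_env("dbms.memory.heap.max_size"))
# NEO4J_dbms_memory_heap_max__size
print(to_docker_env("dbms.logs.debug.level"))
# NEO4J_dbms_logs_debug_level
```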

For more information and examples see the link:/docker[Docker section] of the documentation.

[[restart]]
=== Restart Neo4j

Once the plugin is installed and configured, restarting the database will make it active.
If you have configured Neo4j to consume from Kafka, it will begin processing messages immediately.