Commit 6896669

Merge pull request #215 from moxious/3.5
Documentation improvements
2 parents 7012a81 + ec2334a commit 6896669

7 files changed: +339 -36 lines changed

doc/asciidoc/consumer/index.adoc

Lines changed: 3 additions & 2 deletions

@@ -1,10 +1,11 @@
 [[consumer]]
-== Neo4j Streams Consumer
+== Consumer: Kafka -> Neo4j

 ifdef::env-docs[]
 [abstract]
 --
-This chapter describes the Neo4j Streams Consumer in the Neo4j Streams Library.
+This chapter describes the Neo4j Streams Consumer in the Neo4j Streams Library. Use this section
+to configure Neo4j to consume data from Kafka and create new nodes and relationships.
 --
 endif::env-docs[]

doc/asciidoc/developing/index.adoc

Lines changed: 28 additions & 0 deletions

@@ -0,0 +1,28 @@
+[[developing]]
+== Developing Neo4j Streams
+
+ifdef::env-docs[]
+[abstract]
+--
+This chapter describes setting up Neo4j Streams for local development.
+--
+endif::env-docs[]
+
+[[build-locally]]
+=== Build locally
+
+----
+mvn clean install
+----
+
+1. Copy `<project_dir>/target/neo4j-streams-<VERSION>.jar` into `$NEO4J_HOME/plugins`
+2. Restart Neo4j
+
+[[gendocs]]
+=== Generating this Documentation
+
+1. `cd docs && ./gradlew clean packageHTML`
+2. `cd build/html && python3 -m http.server`
+3. Browse to http://localhost:8000/

doc/asciidoc/index.adoc

Lines changed: 68 additions & 29 deletions

@@ -1,7 +1,7 @@
 = Neo4j Streaming Data Integrations User Guide v{docs-version}
 :toc: left
 :experimental:
-:toclevels: 2
+:toclevels: 3
 :sectid:
 :sectlinks:
 :img: https://github.com/neo4j-contrib/neo4j-streams/raw/gh-pages/3.4/images
@@ -11,73 +11,108 @@ ifdef::backend-html5[(C) {copyright}]

 License: link:{common-license-page-uri}[Creative Commons 4.0]

-
 [abstract]
 --
 This is the user guide for Neo4j Streams version {docs-version}, authored by the Neo4j Labs Team.
 --

 The guide covers the following areas:

+* <<quickstart>> -- Get started fast with the most common scenarios
 * <<introduction>> -- An introduction to Neo4j Streams
 * <<producer>> -- Sends transaction event handler events to a Kafka topic
 * <<consumer>> -- Ingests events from a Kafka topic into Neo4j
 * <<procedures>> -- Procedures for consuming and producing Kafka events
-* <<docker>> -- Docker Compose files for local testing
+* <<docker>> -- Docker Compose files for local testing; example configurations
 * <<kafka-connect>> -- Kafka Connect Sink plugin
+* <<cluster>> -- Using with Neo4j Causal Cluster
+* <<developing>> -- Developing Neo4j Streams

-
-[[introduction]]
-== Introduction
+[[quickstart]]
+== Quick Start

 ifdef::env-docs[]
 [abstract]
 --
-This chapter provides an introduction to the Neo4j Streams Library, and instructions for installation.
+Get started fast for common scenarios, using neo4j-streams as a plugin.
 --
 endif::env-docs[]

+=== Install the Plugin

-Many user and customers want to integrate Kafka and other streaming solutions with Neo4j.
-Either to ingest data into the graph from other sources.
-Or to send update events (change data capture - CDC) to the event log for later consumption.
+* Download the latest release jar from https://github.com/neo4j-contrib/neo4j-streams/releases/latest
+* Copy it into `$NEO4J_HOME/plugins` and configure the relevant connections

-This extension was developed to satisfy all these use-cases and more to come.
+=== Configure Kafka Connection

-The project is composed of several parts:
+If you are running locally or against a standalone machine, configure `neo4j.conf` to point to that server:

-* Neo4j Streams Procedure: a procedure to send a payload to a topic
-* Neo4j Streams Producer: a transaction event handler events that sends data to a Kafka topic
-* Neo4j Streams Consumer: a Neo4j application that ingest data from Kafka topics into Neo4j via templated Cypher Statements
-* Kafka-Connect Plugin: a plugin for the Confluent Platform that allows to ingest data into Neo4j, from Kafka topics, via Cypher queries.
+.neo4j.conf
+[source,ini]
+----
+kafka.zookeeper.connect=localhost:2181
+kafka.bootstrap.servers=localhost:9092
+----
+
+If you are using Confluent Cloud (managed Kafka), you can connect to Kafka this way, filling
+in your own `CONFLUENT_CLOUD_ENDPOINT`, `CONFLUENT_API_KEY`, and `CONFLUENT_API_SECRET`:

-[[installation]]
-=== Installation
+.neo4j.conf
+[source,ini]
+----
+kafka.bootstrap.servers=<<CONFLUENT_CLOUD_ENDPOINT_HERE>>
+kafka.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="<<CONFLUENT_API_KEY_HERE>>" password="<<CONFLUENT_API_SECRET_HERE>>";
+kafka.ssl.endpoint.identification.algorithm=https
+kafka.security.protocol=SASL_SSL
+kafka.sasl.mechanism=PLAIN
+kafka.request.timeout.ms=20000
+kafka.retry.backoff.ms=500
+----

-Download the latest release jar from https://github.com/neo4j-contrib/neo4j-streams/releases/latest
+=== Decide: Consumer, Producer, or Both

-Copy it into `$NEO4J_HOME/plugins` and configure the relevant connections.
+Follow one or both of the subsections below, according to your use case:

-The minimal setup in your `neo4j.conf` is:
+==== Consumer

+Take data from Kafka and store it in Neo4j (Neo4j as a data sink) by adding configuration such as:
+
+.neo4j.conf
+[source,ini]
 ----
-kafka.zookeeper.connect=localhost:2181
-kafka.bootstrap.servers=localhost:9092
+streams.sink.enabled=true
+streams.sink.topic.cypher.my-ingest-topic=MERGE (n:Label {id: event.id}) ON CREATE SET n += event.properties
 ----

-For each module there are additional configs that are explained in the individual sections.
+This will process every message that arrives on `my-ingest-topic` with the given Cypher statement. When
+that statement executes, the `event` variable it references is bound to the received message, so this
+sample will create a `(:Label)` node in the graph with the given ID, copying all of the properties in
+the source message.
+
+For full details on what you can do here, see the link:/consumer[Consumer Section] of the documentation.
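
To make the `event` binding concrete, here is a hedged illustration (the message shape and values are made up for this example): a JSON record like the following arriving on `my-ingest-topic`

[source,json]
----
{
  "id": 42,
  "properties": {
    "name": "Alice",
    "city": "Berlin"
  }
}
----

would be handled as `MERGE (n:Label {id: 42})`, and on first creation the node would receive the `name` and `city` properties from `event.properties`.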

+==== Producer

-[[build-locally]]
-==== Build locally
+Produce data from Neo4j and send it to a Kafka topic (Neo4j as a source) by adding configuration such as:

+.neo4j.conf
+[source,ini]
 ----
-mvn clean install
+streams.source.topic.nodes.my-nodes-topic=Person{*}
+streams.source.topic.relationships.my-rels-topic=KNOWS{*}
+streams.source.enabled=true
+streams.source.schema.polling.interval=10000
 ----

-1. Copy `<project_dir>/target/neo4j-streams-<VERSION>.jar` into `$NEO4J_HOME/plugins`
-2. Restart Neo4j
+This will produce all graph nodes labeled `(:Person)` onto the topic `my-nodes-topic` and all
+relationships of type `-[:KNOWS]->` onto the topic named `my-rels-topic`. Further, schema changes will
+be polled every 10,000 ms, which affects how quickly the database picks up new indexes and schema changes.

+The expressions `Person{\*}` and `KNOWS{*}` are _patterns_. You can find documentation on how to change
+these in the link:/producer/#_patterns[Patterns section].
+
+For full details on what you can do here, see the link:/producer[Producer Section] of the documentation.
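
As an illustrative sketch, a pattern can also restrict which properties are published rather than sending all of them with `{*}`; the property names here are hypothetical, and the Patterns section linked above is the authoritative reference for this syntax:

[source,ini]
----
# Publish only the name and surname properties of Person nodes (illustrative)
streams.source.topic.nodes.my-nodes-topic=Person{name,surname}
----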
+include::introduction/index.adoc[]

 include::producer/index.adoc[]

@@ -88,3 +123,7 @@ include::procedures/index.adoc[]
 include::docker/index.adoc[]

 include::kafka-connect/index.adoc[]
+
+include::neo4j-cluster/index.adoc[]
+
+include::developing/index.adoc[]

doc/asciidoc/introduction/index.adoc

Lines changed: 164 additions & 0 deletions

@@ -0,0 +1,164 @@
+[[introduction]]
+== Introduction
+
+ifdef::env-docs[]
+[abstract]
+--
+This chapter provides an introduction to the Neo4j Streams Library, and instructions for installation.
+--
+endif::env-docs[]
+
+Many users and customers want to integrate Kafka and other streaming solutions with Neo4j,
+either to ingest data into the graph from other sources,
+or to send update events (change data capture, CDC) to the event log for later consumption.
+
+This extension was developed to satisfy all these use cases and more to come.
+
+The project is composed of several parts:
+
+* Neo4j Streams Procedure: a procedure to send a payload to a topic
+* Neo4j Streams Producer: a transaction event handler that sends data to a Kafka topic
+* Neo4j Streams Consumer: a Neo4j application that ingests data from Kafka topics into Neo4j via templated Cypher statements
+* Kafka-Connect Plugin: a plugin for the Confluent Platform that allows ingesting data into Neo4j, from Kafka topics, via Cypher queries.
+
+[[before_begin]]
+=== Before you Begin
+
+Neo4j Streams can run in two modes:
+
+* As a **Neo4j plugin**, neo4j-streams runs inside of the database, and can both consume and produce messages
+to Kafka.
+* As a **Kafka Connect worker**, neo4j-streams is deployed separately from the Neo4j database. At this time,
+the Connect worker can be used to push data to Neo4j (Neo4j as the consumer) but does not yet support
+change data capture (CDC) coming _from_ Neo4j.
+
+Experienced Neo4j users will likely prefer running the software as a Neo4j plugin. Kafka administrators
+may prefer the Kafka Connect method.
+
+The remainder of the introduction assumes you are running Neo4j Streams as a Neo4j plugin.
+More information on the alternative Kafka Connect method can be link:/kafka-connect/[found in this section].
+
+[[installation]]
+=== Installation
+
+Download the latest release jar from https://github.com/neo4j-contrib/neo4j-streams/releases/latest
+
+Copy it into `$NEO4J_HOME/plugins` and configure the relevant connections.
+
+[[configuration]]
+=== Configuration Example
+
+Configuring neo4j-streams comes in three different parts, depending on your need:
+
+. *Required*: Configuring a connection to Kafka
+. _Optional_: Configuring Neo4j to ingest from Kafka (link:/consumer[Consumer])
+. _Optional_: Configuring Neo4j to produce records to Kafka (link:/producer[Producer])
+
+Below is a complete example of configuring the plugin in both modes, assuming Kafka is running
+on localhost. See the relevant subsections to adjust the configuration as necessary.
+
+.neo4j.conf
+[source,ini]
+----
+kafka.zookeeper.connect=localhost:2181
+kafka.bootstrap.servers=localhost:9092
+
+streams.sink.enabled=true
+streams.sink.topic.cypher.topic-name=MERGE (n:Person { id: event.id }) SET n += event
+
+streams.source.enabled=true
+streams.source.topic.nodes.new-person-topic=Person{*}
+streams.source.topic.relationships.who-knows-who=KNOWS{*}
+streams.source.schema.polling.interval=10000
+----
+
+The rest of this section deals with overall plugin configuration.
+
+[[kafka_settings]]
+=== Kafka Settings
+
+Any configuration option that starts with `kafka.` will be passed to the underlying Kafka driver. Neo4j
+Streams uses the official Confluent Kafka producer and consumer Java clients.
+Configuration settings which are valid for those clients will also work for Neo4j Streams.
+
+For example, in the Kafka documentation linked below, the configuration setting named `batch.size`
+should be stated as `kafka.batch.size` in Neo4j Streams.
+
+The following are common configuration settings you may wish to use. _This is not a complete
+list_. The full list of configuration options and reference material is available from Confluent's
+site for link:https://docs.confluent.io/current/installation/configuration/consumer-configs.html#cp-config-consumer[consumer configurations] and
+link:https://docs.confluent.io/current/installation/configuration/producer-configs.html#cp-config-producer[producer configurations].
+
+.Most Common Needed Configuration Settings
+|===
+|Setting Name |Description |Default Value
+
+|kafka.max.poll.records
+|The maximum number of records to pull per batch from Kafka. Increasing this number will mean
+larger transactions in Neo4j memory and may improve throughput.
+|500
+
+|kafka.buffer.memory
+|The total bytes of memory the producer can use to buffer records waiting to be sent. Use this to adjust
+how much memory the plugin may require to hold messages not yet delivered to Neo4j.
+|33554432
+
+|kafka.batch.size
+|(Producer only) The producer will attempt to batch records together into fewer requests whenever multiple records are being sent to the same partition. This helps performance on both the client and the server. This configuration controls the default batch size in bytes.
+|16384
+
+|kafka.max.partition.fetch.bytes
+|(Consumer only) The maximum amount of data per-partition the server will return. Records are fetched in batches by the consumer. If the first record batch in the first non-empty partition of the fetch is larger than this limit, the batch will still be returned to ensure that the consumer can make progress.
+|1048576
+
+|kafka.group.id
+|A unique string that identifies the consumer group this consumer belongs to.
+|N/A
+|===
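
As a hedged sketch of how these passthrough settings combine in `neo4j.conf` (the values are illustrative starting points, not recommendations, and the group name is made up):

[source,ini]
----
# Pull larger batches per poll, at the cost of larger Neo4j transactions
kafka.max.poll.records=1000
# Name the consumer group this Neo4j instance joins (name is illustrative)
kafka.group.id=neo4j-sink-example
----
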
+[[confluent_cloud]]
+=== Confluent Cloud
+
+Configuring a connection to a Confluent Cloud instance should follow
+link:https://docs.confluent.io/current/cloud/using/config-client.html#java-client[Confluent's Java Client]
+configuration advice, together with the Kafka settings advice just above. At a minimum, you will need:
+
+* `bootstrap_server_url`
+* `api-key`
+* `api-secret`
+
+A complete worked example is given in the Quick Start section of this guide.
+
+[[configuration_plugin]]
+=== Plugin Configuration
+
+Any configuration option that starts with `streams.` controls how the plugin itself behaves. For a full
+list of the options available, see the documentation subsections on the producer and consumer.
+
+[[configuration_docker]]
+=== A Note on Running Neo4j in Docker
+
+When Neo4j is run in Docker, some special considerations apply; please see
+link:https://neo4j.com/docs/operations-manual/current/docker/configuration/[Neo4j Docker Configuration]
+for more information. In particular, the configuration format used in `neo4j.conf` looks different.
+
+Please note that the Neo4j Docker image uses a naming convention: you can override every `neo4j.conf` property by prefixing it with `NEO4J_` and applying the following transformations:
+
+* a single underscore is converted to a double underscore: `_ -> __`
+* a dot is converted to a single underscore: `.` -> `_`
+
+Example:
+
+* `dbms.memory.heap.max_size=8G` -> `NEO4J_dbms_memory_heap_max__size: 8G`
+* `dbms.logs.debug.level=DEBUG` -> `NEO4J_dbms_logs_debug_level: DEBUG`
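
Applying those same transformations to this plugin's settings gives a minimal Docker Compose sketch like the following; the image tag and Kafka address are assumptions for illustration only:

[source,yaml]
----
# Illustrative docker-compose service; adjust the image and addresses for your setup
services:
  neo4j:
    image: neo4j:3.5
    environment:
      # kafka.bootstrap.servers -> NEO4J_kafka_bootstrap_servers (dots become underscores)
      NEO4J_kafka_bootstrap_servers: "kafka:9092"
      # streams.sink.enabled -> NEO4J_streams_sink_enabled
      NEO4J_streams_sink_enabled: "true"
----
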
+For more information and examples, see the link:/docker[Docker section] of the documentation.
+
+[[restart]]
+=== Restart Neo4j
+
+Once the plugin is installed and configured, restarting the database will make it active.
+If you have configured Neo4j to consume from Kafka, it will begin processing messages immediately.
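
For a standalone installation, a restart typically looks like this, assuming the `$NEO4J_HOME` layout used earlier in this guide:

[source,shell]
----
# Restart so the jar in $NEO4J_HOME/plugins is picked up
$NEO4J_HOME/bin/neo4j restart
----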
