There are two primary components to Grabbit: a client and a server that run in the two CQ instances that you want to copy to and from (respectively).
+Any server with Grabbit installed acts as a Grabbit peer that can send content to, or receive content from, another Grabbit peer.
+
+To pull content into a server, a new job needs to be created on the receiving server. To do this, use the RESTful API exposed by Grabbit: send a `PUT /grabbit/job` request with a configuration specifying the server to pull from, the paths to pull, and so on. This is outlined in more detail in link:Running.adoc[Running Grabbit].

 A recommended systems layout style is to have all content from a production publisher copied down to a staging "data warehouse" (DW) server to which all lower environments (beta, continuous integration, developer workstations, etc.) will connect. This way minimal load is placed on Production, and additional DW machines can be added to scale out if needed, each of which can grab from the "main" DW.
-The client sends an HTTP(S) GET request with a content path and "last grab time" to the server and receives a protobuf stream of all the content below it that has changed. The client's BasicAuth credentials are used to create the JCR Session, so the client can never see content they don't have explicit access to. There are a number of ways to tune how the client works, including specifying multiple focused paths, parallel or serial execution, JCR Session batch size (the number of nodes to cache before flushing to disk), etc.
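
The job-creation step described in the added text (a `PUT /grabbit/job` request sent to the receiving server) can be sketched in Python as follows. This is only an illustration under stated assumptions: the host names, credentials, and paths are hypothetical, and both the JSON content type and the use of Basic authentication against the receiving server are assumptions rather than documented behavior. The sketch builds the request without sending it.

```python
import base64
import json
import urllib.request


def build_job_request(receiver_url: str, config: dict,
                      username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a PUT /grabbit/job request for the receiving server."""
    body = json.dumps(config).encode("utf-8")
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=receiver_url.rstrip("/") + "/grabbit/job",
        data=body,
        method="PUT",
        headers={
            # Assumption: the receiving server accepts Basic authentication.
            "Authorization": f"Basic {token}",
            # Assumption: the configuration is submitted as JSON.
            "Content-Type": "application/json",
        },
    )


# Hypothetical configuration: which server to pull from and which paths to pull.
job_config = {
    "serverHost": "sender.example.com",
    "serverPort": "4503",
    "serverUsername": "admin",
    "serverPassword": "admin",
    "pathConfigurations": [{"path": "/content/someContent"}],
}

request = build_job_request("http://receiver.example.com:4502", job_config, "admin", "admin")
# Passing `request` to urllib.request.urlopen() would submit the job.
```
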
docs/Monitoring.adoc (5 additions & 5 deletions)
@@ -25,7 +25,7 @@ A job status has the following format :
 ```

 A couple of points worth noting here:
-`"exitCode"` can have 4 states - `UNKNOWN`, `COMPLETED`, `FAILED`, or `VALIDATION_FAILED`. `UNKNOWN` means the job is still running. `COMPLETED` means that the job was completed successfully. `FAILED` means the job failed. `VALIDATION_FAILED` means the job was aborted due to client configuration; This could mean that although the configuration was valid, Grabbit refused to perform some work due to imminent introduction of unintended consequences.
+`"exitCode"` can have 4 states - `UNKNOWN`, `COMPLETED`, `FAILED`, or `VALIDATION_FAILED`. `UNKNOWN` means the job is still running. `COMPLETED` means that the job was completed successfully. `FAILED` means the job failed. `VALIDATION_FAILED` means the job was aborted due to configuration. This could mean that, although the configuration was valid, Grabbit refused to sync a path, for example one with a non-existent parent path. Grabbit will not implicitly write parent nodes.
 `"jcrNodesWritten"` : This indicates how many nodes have been written so far (increments by 1000)
 `"timeTaken"` : This will indicate the total time taken to complete content grab for `currentPath`

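
As an illustration of how the four `exitCode` states might be handled by a monitoring script, here is a minimal Python sketch. The helper names `is_finished` and `describe` are hypothetical and not part of Grabbit; the sketch takes only the `exitCode` string, so it makes no assumption about the rest of the job status layout.

```python
# The four exit codes named in the text, with the meaning the text gives each.
EXIT_CODE_MEANINGS = {
    "UNKNOWN": "job is still running",
    "COMPLETED": "job completed successfully",
    "FAILED": "job failed",
    "VALIDATION_FAILED": "job was aborted due to configuration",
}


def is_finished(exit_code: str) -> bool:
    """Return True once the job has stopped, whether or not it succeeded."""
    if exit_code not in EXIT_CODE_MEANINGS:
        raise ValueError(f"unexpected exitCode: {exit_code!r}")
    return exit_code != "UNKNOWN"


def describe(exit_code: str) -> str:
    """Return a human-readable summary for a known exit code."""
    if exit_code not in EXIT_CODE_MEANINGS:
        raise ValueError(f"unexpected exitCode: {exit_code!r}")
    return f"{exit_code}: {EXIT_CODE_MEANINGS[exit_code]}"
```

A polling loop would keep checking the job status while `is_finished` returns `False`, and treat `FAILED` and `VALIDATION_FAILED` as distinct error cases afterwards.
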
@@ -36,9 +36,9 @@ __Sample of a real Grabbit Job status__

 image::../assets/jobStatus.png[Job Status]

-Two loggers are predefined for Grabbit. One for Grabbit Server and the other for Grabbit Client.
-They are link:grabbit/src/main/content/SLING-INF/content/apps/grabbit/config/org.apache.sling.commons.log.LogManager.factory.config-com.twcable.grabbit.server.batch.xml[batch-server.log] and link:grabbit/src/main/content/SLING-INF/content/apps/grabbit/config/org.apache.sling.commons.log.LogManager.factory.config-com.twcable.grabbit.client.batch.xml[batch-client.log] respectively.
-These log files are for anything logged in **com.twcable.grabbit.server.batch** and **com.twcable.grabbit.client.batch** packages.
+Two loggers are predefined for Grabbit: one for content send operations and one for content receive operations.
+They are link:../src/main/content/SLING-INF/content/apps/grabbit/config/org.apache.sling.commons.log.LogManager.factory.config-com.twcable.grabbit.send.xml[grabbit-send.log] and link:../src/main/content/SLING-INF/content/apps/grabbit/config/org.apache.sling.commons.log.LogManager.factory.config-com.twcable.grabbit.receive.xml[grabbit-receive.log] respectively.
+These log files are for anything logged in the **com.twcable.grabbit.server** and **com.twcable.grabbit.client** packages.

-If you want to see what nodes are being written on the Grabbit Client, change the logging for `batch-client.log` above to `DEBUG` or `TRACE`.
+If you want to see what nodes are being written to a receiving server, change the log level for `grabbit-receive.log` above to `DEBUG` or `TRACE`.
docs/Running.adoc (10 additions & 11 deletions)
@@ -1,6 +1,6 @@
 == Running

-Make sure Grabbit Package is installed on both client and server. You can download the package from image:https://api.bintray.com/packages/twcable/aem/Grabbit/images/download.svg[title = "Download", link = "https://bintray.com/twcable/aem/Grabbit/_latestVersion"]
+Make sure the Grabbit package is installed on both the server you are sending content from and the server you are sending content to. You can download the package from image:https://api.bintray.com/packages/twcable/aem/Grabbit/images/download.svg[title = "Download", link = "https://bintray.com/twcable/aem/Grabbit/_latestVersion"]

 Once that is done, you need just 2 files to sync content between the servers:

@@ -12,7 +12,7 @@ Once that is done, you need just 2 files to sync content between the servers:
 link:../grabbit.sh[This] shell script can be used to initiate new Grabbit jobs, or monitor existing jobs.

 - Run grabbit.sh
-- Enter connection details to your Grabbit "client" server (The server you wish to pull content into)
+- Enter connection details to your receiving server (the server you wish to pull content into)
@@ -118,27 +118,26 @@ The corresponding `YAML` configuration for the JSON above will look something li
 - someContent/someOtherExcludeContent
 workflowConfigIds : *damWorkflows
 ```
-
 ===== Required fields

-* __serverHost__: The server that the client should get its content from.
-* __serverPort__: The port to connect to on the server that the client should use.
-* __serverUsername__: The username the client should use to authenticate against the server.
-* __serverPassword__: The password the client should use to authenticate against the server.
+* __serverHost__: The host of the sending server, from which content is pulled.
+* __serverPort__: The port to connect to on the host above.
+* __serverUsername__: Username for sending server authentication.
+* __serverPassword__: Password for sending server authentication.
 * __pathConfigurations__: The list of paths and their options to pull from the server.
 ** __path__: The path to recursively grab content from.

 ===== Optional fields

-* __serverScheme__: string. The protocol the client should use when connecting to the server. Supported options are `http` and `https`. Defaults to `http`.
-* __deltaContent__: boolean, ```true``` syncs only 'delta' or changed content. Changed content is determined by comparing one of a number of date properties including jcr:lastModified, cq:lastModified, or jcr:created Date with the last successful Grabbit sync date. Nodes without any of previously mentioned date properties will always be synced even with deltaContent on, and if a node's data is changed without updating a date property (ie, from CRX/DE), the change will not be detected. Most common throughput bottlenecks are usually handled by delta sync for cases such as large DAM trees; but if your case warrants a more fine tuned use of delta sync, you may consider adding mix:lastModified to nodes not usually considered for exclusion, such as extremely large unstructured trees. The deltaContent flag __only__ applies to changes made on the server - changes to the client environment will not be detected (and won't be overwritten if changes were made on the client's path but not on the server).
+* __serverScheme__: string. The protocol to use when connecting to the sending server. Supported options are `http` and `https`. Defaults to `http`.
+* __deltaContent__: boolean, ```true``` syncs only 'delta' or changed content. Changed content is determined by comparing one of a number of date properties, including jcr:lastModified, cq:lastModified, or jcr:created, with the last successful Grabbit sync date. Nodes without any of the previously mentioned date properties will always be synced even with deltaContent on, and if a node's data is changed without updating a date property (for example, from CRX/DE), the change will not be detected. Most common throughput bottlenecks are usually handled by delta sync for cases such as large DAM trees; but if your case warrants a more fine-tuned use of delta sync, you may consider adding mix:lastModified to nodes not usually considered for exclusion, such as extremely large unstructured trees. The deltaContent flag __only__ applies to changes made on the sending server; changes to the receiving environment will not be detected (and won't be overwritten if changes were made on the receiving path but not on the sending path).
 * __batchSize__: integer. Used to specify the number of nodes in one batch. Defaults to 100.
-* __deleteBeforeWrite__: boolean. Before the client retrieves content, should content under each path be cleared? When used in combination with excludePaths, nodes indicated by excludePaths will not be deleted
+* __deleteBeforeWrite__: boolean. Before the receiving server retrieves content, should content under each path be cleared? When used in combination with excludePaths, nodes indicated by excludePaths will not be deleted.
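
The `deltaContent` rule described above (compare a node's date properties with the last successful sync date, and always sync nodes that have none of them) can be sketched as follows. This is a hedged illustration, not Grabbit's implementation: the function name `should_sync` is hypothetical, and taking the most recent of the available date properties is an assumption, since the text only says that "one of" them is compared.

```python
from datetime import datetime, timezone
from typing import Mapping, Optional

# Date properties named in the text.
DATE_PROPERTIES = ("jcr:lastModified", "cq:lastModified", "jcr:created")


def should_sync(properties: Mapping[str, datetime],
                last_sync: Optional[datetime]) -> bool:
    """Decide whether a node would be copied under delta sync."""
    if last_sync is None:
        # Assumption: with no earlier successful sync, everything is copied.
        return True
    dates = [properties[name] for name in DATE_PROPERTIES if name in properties]
    if not dates:
        # Nodes without any of the date properties are always synced.
        return True
    # Assumption: the most recent available date is the one compared.
    return max(dates) > last_sync


last_sync = datetime(2016, 1, 10, tzinfo=timezone.utc)
changed = {"jcr:lastModified": datetime(2016, 1, 12, tzinfo=timezone.utc)}
unchanged = {"jcr:created": datetime(2016, 1, 1, tzinfo=timezone.utc)}
undated = {}
```

Under these assumptions, `changed` and `undated` would be synced and `unchanged` would be skipped, which also shows why an edit that does not touch a date property goes undetected.
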

 Under path configurations

 ** __excludePaths__: This allows excluding specific subpaths from what will be retrieved from the parent path. See more detail below.
-** __workflowConfigIds__: Before the client retrieves content for the path from the server, it will make sure that the specified workflows are disabled. They will be re-enabled when all content specifying that workflow has finished copying. (Grabbit handles the situation of multiple paths specifying "overlapping" workflows.) This is particularly useful for areas like the DAM where a number of relatively expensive workflows will just "redo" what is already being copied.
+** __workflowConfigIds__: Before the receiving server retrieves content for the path from the sending server, it will make sure that the specified workflows are disabled. They will be re-enabled when all content specifying that workflow has finished copying. (Grabbit handles the situation of multiple paths specifying "overlapping" workflows.) This is particularly useful for areas like the DAM where a number of relatively expensive workflows will just "redo" what is already being copied.
 ** __deleteBeforeWrite__: Individual path override for the global deleteBeforeWrite setting.
 ** __deltaContent__: boolean. Individual path override for the global deltaContent setting. Functionality is the same, but on a path-by-path basis, instead of applying to all path configurations. No matter what the global setting is, specifying this field will override it. If not specified, the path will sync according to the global setting.
 ** __batchSize__: integer. Individual path override for the global batchSize configuration. Functionality is the same, but on a path-by-path basis. No matter what the global setting is, specifying this field will override it. If not specified, the path will sync according to the global setting.
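
To make the required fields and the per-path override rule concrete, here is a small Python sketch. The helper names `missing_required_fields` and `effective_setting` are hypothetical, as are the host, credentials, and paths in the example; only the field names and the two defaults (`http` and 100) come from the text above.

```python
# Required top-level fields and documented defaults, taken from the text.
REQUIRED_FIELDS = ("serverHost", "serverPort", "serverUsername",
                   "serverPassword", "pathConfigurations")
GLOBAL_DEFAULTS = {"serverScheme": "http", "batchSize": 100}


def missing_required_fields(config: dict) -> list:
    """List required fields absent from the config, including each path's `path`."""
    missing = [name for name in REQUIRED_FIELDS if name not in config]
    for index, path_config in enumerate(config.get("pathConfigurations", [])):
        if "path" not in path_config:
            missing.append(f"pathConfigurations[{index}].path")
    return missing


def effective_setting(config: dict, path_config: dict, name: str):
    """Resolve a setting: per-path value first, then global, then documented default."""
    if name in path_config:
        return path_config[name]
    if name in config:
        return config[name]
    # Returns None when the text states no default for this setting.
    return GLOBAL_DEFAULTS.get(name)


# Hypothetical configuration with one per-path override.
config = {
    "serverHost": "sender.example.com",
    "serverPort": "4503",
    "serverUsername": "admin",
    "serverPassword": "admin",
    "deltaContent": True,
    "pathConfigurations": [
        {"path": "/content/someContent"},
        {"path": "/content/dam/someDam", "batchSize": 500, "deltaContent": False},
    ],
}
```

With this configuration, the first path inherits `deltaContent` from the global setting and falls back to the default `batchSize` of 100, while the second path overrides both.
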
src/main/content/SLING-INF/content/apps/grabbit/config/org.apache.sling.commons.log.LogManager.factory.config-com.twcable.grabbit.receive.xml
src/main/content/SLING-INF/content/apps/grabbit/config/org.apache.sling.commons.log.LogManager.factory.config-com.twcable.grabbit.send.xml