You may wish to peruse the `diffsync` [GitHub topic](https://github.com/topics/diffsync) for examples of projects using this library.
25
+
> You may wish to peruse the `diffsync` [GitHub topic](https://github.com/topics/diffsync) for examples of projects using this library.
26
26
27
-
# Getting started
27
+
# Documentation
28
28
29
-
To be able to properly compare different datasets, DiffSync relies on a shared data model that both systems must use.
30
-
Specifically, each system or dataset must provide a `DiffSync` "adapter" subclass, which in turn represents its dataset as instances of one or more `DiffSyncModel` data model classes.
29
+
The documentation is available [on Read The Docs](https://diffsync.readthedocs.io/en/latest/index.html).
31
30
32
-
When comparing two systems, DiffSync detects the intersection between the two systems (which data models they have in common, and which attributes are shared between each pair of data models) and uses this intersection to compare and/or synchronize the data.
31
+
# Installation
33
32
34
-
## Define your model with DiffSyncModel
33
+
### Option 1: Install from PyPI.
35
34
36
-
`DiffSyncModel` is based on [Pydantic](https://pydantic-docs.helpmanual.io/) and uses Python type annotations to define the format of each attribute.
37
-
Each `DiffSyncModel` subclass supports the following class-level attributes:
38
-
- `_modelname` - Defines the type of the model; used to identify common models between different systems (Mandatory)
39
-
- `_identifiers` - List of instance field names used as primary keys for this object (Mandatory)
40
-
- `_shortname` - List of instance field names to use for a shorter name (Optional)
41
-
- `_attributes` - List of non-identifier instance field names for this object; used to identify the fields in common between data models for different systems (Optional)
42
-
- `_children` - Dict of `{<model_name>: <field_name>}` indicating which fields store references to child data model instances. (Optional)
43
-
44
-
> DiffSyncModel instances must be uniquely identified by their unique ID (or, in database terminology, [natural key](https://en.wikipedia.org/wiki/Natural_key)), which is composed of the union of all fields defined in `_identifiers`. The unique ID must be globally meaningful (such as a unique instance name or slug), as it is used to identify object correspondence between differing systems or data sets. It **must not** be a value that is only locally meaningful to a specific data set, such as a database primary key value.
45
-
46
-
> Only fields listed in `_identifiers`, `_attributes`, or `_children` will be potentially included in comparison and synchronization between systems or data sets. Any other fields will be ignored; this allows for a model to additionally contain fields that are only locally relevant (such as database primary key values) and therefore are irrelevant to comparison and synchronization.
47
-
48
-
```python
49
-
from typing import List, Optional
50
-
from diffsync import DiffSyncModel
51
-
52
-
class Site(DiffSyncModel):
53
-
    _modelname = "site"
54
-
    _identifiers = ("name",)
55
-
    _shortname = ()
56
-
    _attributes = ("contact_phone",)
57
-
    _children = {"device": "devices"}
58
-
59
-
    name: str
60
-
    contact_phone: Optional[str]
61
-
    devices: List = list()
62
-
    database_pk: Optional[int]  # not listed in _identifiers/_attributes/_children as it's only locally significant
63
35
```
64
-
65
-
### Relationship between models
66
-
67
-
Currently the relationships between models are very loose by design. Instead of storing an object, it's recommended to store the unique id of an object and retrieve it from the store as needed. The `add_child()` API of `DiffSyncModel` provides this behavior as a default.
68
-
69
-
## Define your system adapter with DiffSync
70
-
71
-
A `DiffSync` "adapter" subclass must reference each model it supports as a class-level attribute named after the model's `_modelname`, and must define a `top_level` attribute that indicates where the diff and the synchronization start. In the example below, `"site"` is the only top-level model, so the synchronization engine checks all known `Site` instances and all children of each `Site`. Because `Device`s are children of `Site`s, as shown in the code above, this is exactly the intended logic.
72
-
73
-
```python
74
-
from diffsync import DiffSync
75
-
76
-
class BackendA(DiffSync):
77
-
78
-
    site = Site
79
-
    device = Device
80
-
81
-
    top_level = ["site"]
36
+
$ pip install diffsync
82
37
```
83
38
84
-
It's up to the implementer to populate the `DiffSync`'s internal cache with the appropriate data. The example below uses the `load()` method to populate the cache, but this is not mandatory; the cache can be populated in other ways.
85
-
86
-
## Store data in a `DiffSync` object
87
-
88
-
To add a site to the local cache/store, you need to pass a valid `DiffSyncModel` object to the `add()` function.
89
-
90
-
```python
91
-
class BackendA(DiffSync):
92
-
    [...]
93
-
94
-
    def load(self):
95
-
        # Store an individual object
96
-
        site = self.site(name="nyc")
97
-
        self.add(site)
98
-
99
-
        # Store an object and define it as a child of another object
### Option 2: Install from a GitHub branch, such as main as shown below.
103
40
```
104
-
105
-
## Update remote system on sync
106
-
107
-
When data synchronization is performed via `sync_from()` or `sync_to()`, DiffSync automatically updates the in-memory
108
-
`DiffSyncModel` objects of the receiving adapter. The implementer of this class is responsible for ensuring that any remote system or data store is updated correspondingly. There are two usual ways to do this, depending on whether it's more
109
-
convenient to manage individual records (as in a database) or modify the entire data store in one pass (as in a file-based data store).
110
-
111
-
### Manage individual records
112
-
113
-
To update individual records in a remote system, you need to extend your `DiffSyncModel` class(es) to define your own `create`, `update` and/or `delete` methods for each model.
114
-
A `DiffSyncModel` instance stores a reference to its parent `DiffSync` adapter instance in case you need to use it to look up other model instances from the `DiffSync`'s cache.
115
-
116
-
```python
117
-
class Device(DiffSyncModel):
118
-
    [...]
119
-
120
-
    @classmethod
121
-
    def create(cls, diffsync, ids, attrs):
122
-
        ## TODO add your own logic here to create the device on the remote system
123
-
        # Call the super().create() method to create the in-memory DiffSyncModel instance
Pull requests are welcomed and automatically built and tested against multiple versions of Python through GitHub Actions.
139
46
140
-
If you prefer to update the entire remote system with the final state after performing all individual create/update/delete operations (as might be the case if your "remote system" is a single YAML or JSON file), the easiest place to implement this logic is in the `sync_complete()` callback method that is automatically invoked by DiffSync upon completion of a sync operation.
47
+
The project follows Network to Code software development guidelines and leverages the following:
- Black, Pylint, Bandit, flake8, and pydocstyle for Python linting and formatting.
50
+
- pytest, coverage, and unittest for unit tests.
51
+
52
+
# Questions
53
+
Please see the [documentation](https://diffsync.readthedocs.io/en/latest/index.html) for details on how to use `diffsync`. For any additional questions or comments, feel free to swing by the [Network to Code slack channel](https://networktocode.slack.com/) (channel #networktocode). Sign up [here](http://slack.networktocode.com/).
To be able to properly compare different datasets, DiffSync relies on a shared data model that both systems must use.
5
+
Specifically, each system or dataset must provide a `DiffSync` "adapter" subclass, which in turn represents its dataset as instances of one or more `DiffSyncModel` data model classes.
6
+
7
+
When comparing two systems, DiffSync detects the intersection between the two systems (which data models they have in common, and which attributes are shared between each pair of data models) and uses this intersection to compare and/or synchronize the data.
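
As a rough illustration of what "intersection" means here, the sketch below computes the models that two systems share and, for each shared model, the attributes that both define. The dictionaries and the `shared_schema` helper are hypothetical stand-ins written for this example; they are not part of the DiffSync API.

```python
from typing import Dict, Set

# Hypothetical description of two systems: model name -> attribute names.
# These dicts stand in for real adapters; they are not DiffSync objects.
system_a: Dict[str, Set[str]] = {
    "site": {"name", "contact_phone"},
    "device": {"name", "role", "serial"},
}
system_b: Dict[str, Set[str]] = {
    "site": {"name", "contact_phone", "address"},
    "device": {"name", "role"},
    "rack": {"name"},
}


def shared_schema(a: Dict[str, Set[str]], b: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Return the models both systems define, each with the attributes both share."""
    return {model: a[model] & b[model] for model in a.keys() & b.keys()}


# Only "site" and "device" are common; "rack", "serial", and "address" drop out.
common = shared_schema(system_a, system_b)
```

Anything outside this intersection (a model or attribute known to only one side) is simply not part of the comparison in this sketch.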
8
+
9
+
## Define your model with DiffSyncModel
10
+
11
+
`DiffSyncModel` is based on [Pydantic](https://pydantic-docs.helpmanual.io/) and uses Python type annotations to define the format of each attribute.
12
+
Each `DiffSyncModel` subclass supports the following class-level attributes:
13
+
- `_modelname` - Defines the type of the model; used to identify common models between different systems (Mandatory)
14
+
- `_identifiers` - List of instance field names used as primary keys for this object (Mandatory)
15
+
- `_shortname` - List of instance field names to use for a shorter name (Optional)
16
+
- `_attributes` - List of non-identifier instance field names for this object; used to identify the fields in common between data models for different systems (Optional)
17
+
- `_children` - Dict of `{<model_name>: <field_name>}` indicating which fields store references to child data model instances. (Optional)
18
+
19
+
> DiffSyncModel instances must be uniquely identified by their unique ID (or, in database terminology, [natural key](https://en.wikipedia.org/wiki/Natural_key)), which is composed of the union of all fields defined in `_identifiers`. The unique ID must be globally meaningful (such as a unique instance name or slug), as it is used to identify object correspondence between differing systems or data sets. It **must not** be a value that is only locally meaningful to a specific data set, such as a database primary key value.
20
+
21
+
> Only fields listed in `_identifiers`, `_attributes`, or `_children` will be potentially included in comparison and synchronization between systems or data sets. Any other fields will be ignored; this allows for a model to additionally contain fields that are only locally relevant (such as database primary key values) and therefore are irrelevant to comparison and synchronization.
22
+
23
+
```python
24
+
from typing import List, Optional
25
+
from diffsync import DiffSyncModel
26
+
27
+
class Site(DiffSyncModel):
28
+
    _modelname = "site"
29
+
    _identifiers = ("name",)
30
+
    _shortname = ()
31
+
    _attributes = ("contact_phone",)
32
+
    _children = {"device": "devices"}
33
+
34
+
    name: str
35
+
    contact_phone: Optional[str]
36
+
    devices: List = list()
37
+
    database_pk: Optional[int]  # not listed in _identifiers/_attributes/_children as it's only locally significant
38
+
```
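
To make the two notes above concrete, here is a minimal, standard-library-only sketch of the idea: the unique ID is derived from the `_identifiers` fields alone, and only the listed fields take part in comparison. `SiteRecord`, `unique_id()`, `comparable()`, and the `"__"` separator are assumptions made for this illustration; this is not the real `DiffSyncModel` implementation.

```python
from dataclasses import dataclass
from typing import ClassVar, Optional, Tuple


@dataclass
class SiteRecord:
    """Hypothetical stand-in for a DiffSyncModel subclass (not the real class)."""

    _identifiers: ClassVar[Tuple[str, ...]] = ("name",)
    _attributes: ClassVar[Tuple[str, ...]] = ("contact_phone",)

    name: str
    contact_phone: Optional[str] = None
    database_pk: Optional[int] = None  # locally significant only, never compared

    def unique_id(self) -> str:
        # Natural key: built only from the fields named in _identifiers.
        return "__".join(str(getattr(self, field)) for field in self._identifiers)

    def comparable(self) -> dict:
        # Only fields named in _attributes take part in the comparison.
        return {field: getattr(self, field) for field in self._attributes}


# The same site as stored in two systems that use different primary keys.
in_a = SiteRecord(name="nyc", contact_phone="555-0100", database_pk=1)
in_b = SiteRecord(name="nyc", contact_phone="555-0100", database_pk=42)
```

In this sketch `in_a` and `in_b` have the same unique ID and the same comparable attributes, even though their `database_pk` values differ.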
39
+
40
+
### Relationship between models
41
+
42
+
Currently the relationships between models are very loose by design. Instead of storing an object, it's recommended to store the unique id of an object and retrieve it from the store as needed. The `add_child()` API of `DiffSyncModel` provides this behavior as a default.
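
The pattern described above (keep the child's unique ID on the parent and look the child up when needed) can be sketched with plain dictionaries. The `store` dict and the `children_of` helper are hypothetical and only illustrate the idea; they do not reproduce `add_child()` or the adapter's internal cache.

```python
from typing import Dict, List

# Hypothetical in-memory store keyed by unique ID; it stands in for the
# adapter's internal cache and is not the DiffSync API.
store: Dict[str, dict] = {}

site = {"name": "nyc", "devices": []}  # "devices" holds IDs, not objects
device = {"name": "nyc-rtr-01", "role": "router"}

store[site["name"]] = site
store[device["name"]] = device

# Link the child by recording only its unique ID on the parent.
site["devices"].append(device["name"])


def children_of(parent: dict) -> List[dict]:
    """Resolve the stored IDs back into records when they are needed."""
    return [store[uid] for uid in parent["devices"]]
```

Because the parent holds only IDs, a child can be replaced or updated in the store without touching the parent record.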
43
+
44
+
## Define your system adapter with DiffSync
45
+
46
+
A `DiffSync` "adapter" subclass must reference each model it supports as a class-level attribute named after the model's `_modelname`, and must define a `top_level` attribute that indicates where the diff and the synchronization start. In the example below, `"site"` is the only top-level model, so the synchronization engine checks all known `Site` instances and all children of each `Site`. Because `Device`s are children of `Site`s, as shown in the code above, this is exactly the intended logic.
47
+
48
+
```python
49
+
from diffsync import DiffSync
50
+
51
+
class BackendA(DiffSync):
52
+
53
+
    site = Site
54
+
    device = Device
55
+
56
+
    top_level = ["site"]
57
+
```
58
+
59
+
It's up to the implementer to populate the `DiffSync`'s internal cache with the appropriate data. The example below uses the `load()` method to populate the cache, but this is not mandatory; the cache can be populated in other ways.
60
+
61
+
## Store data in a `DiffSync` object
62
+
63
+
To add a site to the local cache/store, you need to pass a valid `DiffSyncModel` object to the `add()` function.
64
+
65
+
```python
66
+
class BackendA(DiffSync):
67
+
    [...]
68
+
69
+
    def load(self):
70
+
        # Store an individual object
71
+
        site = self.site(name="nyc")
72
+
        self.add(site)
73
+
74
+
        # Store an object and define it as a child of another object
When data synchronization is performed via `sync_from()` or `sync_to()`, DiffSync automatically updates the in-memory
83
+
`DiffSyncModel` objects of the receiving adapter. The implementer of this class is responsible for ensuring that any remote system or data store is updated correspondingly. There are two usual ways to do this, depending on whether it's more
84
+
convenient to manage individual records (as in a database) or modify the entire data store in one pass (as in a file-based data store).
85
+
86
+
### Manage individual records
87
+
88
+
To update individual records in a remote system, you need to extend your `DiffSyncModel` class(es) to define your own `create`, `update` and/or `delete` methods for each model.
89
+
A `DiffSyncModel` instance stores a reference to its parent `DiffSync` adapter instance in case you need to use it to look up other model instances from the `DiffSync`'s cache.
90
+
91
+
```python
92
+
class Device(DiffSyncModel):
93
+
    [...]
94
+
95
+
    @classmethod
96
+
    def create(cls, diffsync, ids, attrs):
97
+
        ## TODO add your own logic here to create the device on the remote system
98
+
        # Call the super().create() method to create the in-memory DiffSyncModel instance
        ## TODO add your own logic here to update the device on the remote system
103
+
        # Call the super().update() method to update the in-memory DiffSyncModel instance
104
+
        return super().update(attrs)
105
+
106
+
    def delete(self):
107
+
        ## TODO add your own logic here to delete the device on the remote system
108
+
        # Call the super().delete() method to remove the DiffSyncModel instance from its parent DiffSync adapter
109
+
        super().delete()
110
+
        return self
111
+
```
112
+
113
+
### Bulk/batch modifications
114
+
115
+
If you prefer to update the entire remote system with the final state after performing all individual create/update/delete operations (as might be the case if your "remote system" is a single YAML or JSON file), the easiest place to implement this logic is in the `sync_complete()` callback method that is automatically invoked by DiffSync upon completion of a sync operation.
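
The sketch below illustrates the bulk pattern with a standard-library-only stand-in: individual operations only touch in-memory state, and the whole data set is serialized once at the end. `FileBackedStore` and `run_sync` are hypothetical names used for this illustration, and the `sync_complete()` shown here takes no arguments for simplicity; check the DiffSync documentation for the real callback's parameters.

```python
import json
from typing import Dict, List, Optional, Tuple


class FileBackedStore:
    """Hypothetical stand-in for an adapter whose "remote system" is one JSON document."""

    def __init__(self) -> None:
        self.records: Dict[str, dict] = {}
        self.written: Optional[str] = None  # what would be written to the file

    def create(self, uid: str, attrs: dict) -> None:
        self.records[uid] = dict(attrs)

    def update(self, uid: str, attrs: dict) -> None:
        self.records[uid].update(attrs)

    def delete(self, uid: str) -> None:
        del self.records[uid]

    def sync_complete(self) -> None:
        # Serialize the final state once, instead of after every operation.
        self.written = json.dumps(self.records, sort_keys=True)


def run_sync(store: FileBackedStore, changes: List[Tuple[str, str, dict]]) -> None:
    """Apply each change in memory, then invoke the completion callback once."""
    for action, uid, attrs in changes:
        if action == "delete":
            store.delete(uid)
        else:
            getattr(store, action)(uid, attrs)
    store.sync_complete()


store = FileBackedStore()
run_sync(
    store,
    [
        ("create", "nyc", {"contact_phone": "555-0100"}),
        ("create", "sfo", {"contact_phone": None}),
        ("update", "nyc", {"contact_phone": "555-0199"}),
        ("delete", "sfo", {}),
    ],
)
```

After `run_sync` returns, `store.written` holds a single JSON document reflecting only the final state, which is the point of deferring the write to the completion callback.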