Commit 3489934

Cleanup readme
1 parent 3c3060f commit 3489934

3 files changed

+145
-119
lines changed

README.md

Lines changed: 16 additions & 116 deletions
@@ -22,132 +22,32 @@ A.sync_from(B)
2222
A.sync_to(B)
2323
```
2424

25-
You may wish to peruse the `diffsync` [GitHub topic](https://github.com/topics/diffsync) for examples of projects using this library.
25+
> You may wish to peruse the `diffsync` [GitHub topic](https://github.com/topics/diffsync) for examples of projects using this library.
2626
27-
# Getting started
27+
# Documentation
2828

29-
To be able to properly compare different datasets, DiffSync relies on a shared data model that both systems must use.
30-
Specifically, each system or dataset must provide a `DiffSync` "adapter" subclass, which in turn represents its dataset as instances of one or more `DiffSyncModel` data model classes.
29+
The documentation is available [here](https://diffsync.readthedocs.io/en/latest/index.html).
3130

32-
When comparing two systems, DiffSync detects the intersection between the two systems (which data models they have in common, and which attributes are shared between each pair of data models) and uses this intersection to compare and/or synchronize the data.
31+
# Installation
3332

34-
## Define your model with DiffSyncModel
33+
### Option 1: Install from PyPI.
3534

36-
`DiffSyncModel` is based on [Pydantic](https://pydantic-docs.helpmanual.io/) and is using Python typing to define the format of each attribute.
37-
Each `DiffSyncModel` subclass supports the following class-level attributes:
38-
- `_modelname` - Defines the type of the model; used to identify common models between different systems (Mandatory)
39-
- `_identifiers` - List of instance field names used as primary keys for this object (Mandatory)
40-
- `_shortname` - List of instance field names to use for a shorter name (Optional)
41-
- `_attributes` - List of non-identifier instance field names for this object; used to identify the fields in common between data models for different systems (Optional)
42-
- `_children` - Dict of `{<model_name>: <field_name>}` indicating which fields store references to child data model instances. (Optional)
43-
44-
> DiffSyncModel instances must be uniquely identified by their unique ID (or, in database terminology, [natural key](https://en.wikipedia.org/wiki/Natural_key)), which is composed of the union of all fields defined in `_identifiers`. The unique ID must be globally meaningful (such as an unique instance name or slug), as it is used to identify object correspondence between differing systems or data sets. It **must not** be a value that is only locally meaningful to a specific data set, such as a database primary key value.
45-
46-
> Only fields listed in `_identifiers`, `_attributes`, or `_children` will be potentially included in comparison and synchronization between systems or data sets. Any other fields will be ignored; this allows for a model to additionally contain fields that are only locally relevant (such as database primary key values) and therefore are irrelevant to comparison and synchronization.
47-
48-
```python
49-
from typing import List, Optional
50-
from diffsync import DiffSyncModel
51-
52-
class Site(DiffSyncModel):
53-
_modelname = "site"
54-
_identifiers = ("name",)
55-
_shortname = ()
56-
_attributes = ("contact_phone",)
57-
_children = {"device": "devices"}
58-
59-
name: str
60-
contact_phone: Optional[str]
61-
devices: List = list()
62-
database_pk: Optional[int] # not listed in _identifiers/_attributes/_children as it's only locally significant
6335
```
64-
65-
### Relationship between models
66-
67-
Currently the relationships between models are very loose by design. Instead of storing an object, it's recommended to store the unique id of an object and retrieve it from the store as needed. The `add_child()` API of `DiffSyncModel` provides this behavior as a default.
68-
69-
## Define your system adapter with DiffSync
70-
71-
A `DiffSync` "adapter" subclass must reference each model available at the top of the object by its modelname and must have a `top_level` attribute defined to indicate how the diff and the synchronization should be done. In the example below, `"site"` is the only top level object so the synchronization engine will only check all known `Site` instances and all children of each Site. In this case, as shown in the code above, `Device`s are children of `Site`s, so this is exactly the intended logic.
72-
73-
```python
74-
from diffsync import DiffSync
75-
76-
class BackendA(DiffSync):
77-
78-
site = Site
79-
device = Device
80-
81-
top_level = ["site"]
36+
$ pip install diffsync
8237
```
8338

84-
It's up to the implementer to populate the `DiffSync`'s internal cache with the appropriate data. In the example below we are using the `load()` method to populate the cache but it's not mandatory, it could be done differently.
85-
86-
## Store data in a `DiffSync` object
87-
88-
To add a site to the local cache/store, you need to pass a valid `DiffSyncModel` object to the `add()` function.
89-
90-
```python
91-
class BackendA(DiffSync):
92-
[...]
93-
94-
def load(self):
95-
# Store an individual object
96-
site = self.site(name="nyc")
97-
self.add(site)
98-
99-
# Store an object and define it as a child of another object
100-
device = self.device(name="rtr-nyc", role="router", site_name="nyc")
101-
self.add(device)
102-
site.add_child(device)
39+
### Option 2: Install from a GitHub branch, such as main as shown below.
10340
```
104-
105-
## Update remote system on sync
106-
107-
When data synchronization is performed via `sync_from()` or `sync_to()`, DiffSync automatically updates the in-memory
108-
`DiffSyncModel` objects of the receiving adapter. The implementer of this class is responsible for ensuring that any remote system or data store is updated correspondingly. There are two usual ways to do this, depending on whether it's more
109-
convenient to manage individual records (as in a database) or modify the entire data store in one pass (as in a file-based data store).
110-
111-
### Manage individual records
112-
113-
To update individual records in a remote system, you need to extend your `DiffSyncModel` class(es) to define your own `create`, `update` and/or `delete` methods for each model.
114-
A `DiffSyncModel` instance stores a reference to its parent `DiffSync` adapter instance in case you need to use it to look up other model instances from the `DiffSync`'s cache.
115-
116-
```python
117-
class Device(DiffSyncModel):
118-
[...]
119-
120-
@classmethod
121-
def create(cls, diffsync, ids, attrs):
122-
## TODO add your own logic here to create the device on the remote system
123-
# Call the super().create() method to create the in-memory DiffSyncModel instance
124-
return super().create(ids=ids, diffsync=diffsync, attrs=attrs)
125-
126-
def update(self, attrs):
127-
## TODO add your own logic here to update the device on the remote system
128-
# Call the super().update() method to update the in-memory DiffSyncModel instance
129-
return super().update(attrs)
130-
131-
def delete(self):
132-
## TODO add your own logic here to delete the device on the remote system
133-
# Call the super().delete() method to remove the DiffSyncModel instance from its parent DiffSync adapter
134-
super().delete()
135-
return self
41+
$ pip install git+https://github.com/networktocode/diffsync.git@main
13642
```
13743

138-
### Bulk/batch modifications
44+
# Contributing
45+
Pull requests are welcomed and automatically built and tested against multiple versions of Python through TravisCI.
13946

140-
If you prefer to update the entire remote system with the final state after performing all individual create/update/delete operations (as might be the case if your "remote system" is a single YAML or JSON file), the easiest place to implement this logic is in the `sync_complete()` callback method that is automatically invoked by DiffSync upon completion of a sync operation.
47+
The project follows the Network to Code software development guidelines and leverages the following:
14148

142-
```python
143-
class BackendA(DiffSync):
144-
[...]
145-
146-
def sync_complete(self, source: DiffSync, diff: Diff, flags: DiffSyncFlags, logger: structlog.BoundLogger):
147-
## TODO add your own logic to update the remote system now.
148-
# The various parameters passed to this method are for your convenience in implementing more complex logic, and
149-
# can be ignored if you do not need them.
150-
#
151-
# The default DiffSync.sync_complete() method does nothing, but it's always a good habit to call super():
152-
super().sync_complete(source, diff, flags, logger)
153-
```
49+
- Black, Pylint, Bandit, flake8, and pydocstyle for Python linting and formatting.
50+
- pytest, coverage, and unittest for unit tests.
51+
52+
# Questions
53+
Please see the [documentation](https://diffsync.readthedocs.io/en/latest/index.html) for details on how to use `diffsync`. For any additional questions or comments, feel free to swing by the [Network to Code Slack](https://networktocode.slack.com/) (channel #networktocode). Sign up [here](http://slack.networktocode.com/).
Lines changed: 128 additions & 0 deletions
@@ -0,0 +1,128 @@
1+
2+
# Getting started
3+
4+
To be able to properly compare different datasets, DiffSync relies on a shared data model that both systems must use.
5+
Specifically, each system or dataset must provide a `DiffSync` "adapter" subclass, which in turn represents its dataset as instances of one or more `DiffSyncModel` data model classes.
6+
7+
When comparing two systems, DiffSync detects the intersection between the two systems (which data models they have in common, and which attributes are shared between each pair of data models) and uses this intersection to compare and/or synchronize the data.
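
As a rough illustration of the intersection idea described above, the sketch below uses plain Python sets. It does not use the `diffsync` library and is not how DiffSync is implemented; the names `system_a`, `system_b`, and `shared_schema` are hypothetical and exist only for this example.

```python
# Hypothetical, library-free illustration of "intersection between two systems".
# Each system is described as a mapping of model name -> set of attribute names.
system_a = {"site": {"name", "contact_phone"}, "device": {"name", "role"}}
system_b = {"site": {"name", "contact_phone", "address"}, "vlan": {"vid"}}


def shared_schema(a, b):
    """Return the models both systems define, each with the attributes they share."""
    return {model: a[model] & b[model] for model in a.keys() & b.keys()}


common = shared_schema(system_a, system_b)
print(common)
```

Here only `site` is common to both systems, and only `name` and `contact_phone` would be candidates for comparison; `address`, `device`, and `vlan` fall outside the intersection.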
8+
9+
## Define your model with DiffSyncModel
10+
11+
`DiffSyncModel` is based on [Pydantic](https://pydantic-docs.helpmanual.io/) and uses Python typing to define the format of each attribute.
12+
Each `DiffSyncModel` subclass supports the following class-level attributes:
13+
- `_modelname` - Defines the type of the model; used to identify common models between different systems (Mandatory)
14+
- `_identifiers` - List of instance field names used as primary keys for this object (Mandatory)
15+
- `_shortname` - List of instance field names to use for a shorter name (Optional)
16+
- `_attributes` - List of non-identifier instance field names for this object; used to identify the fields in common between data models for different systems (Optional)
17+
- `_children` - Dict of `{<model_name>: <field_name>}` indicating which fields store references to child data model instances. (Optional)
18+
19+
> DiffSyncModel instances must be uniquely identified by their unique ID (or, in database terminology, [natural key](https://en.wikipedia.org/wiki/Natural_key)), which is composed of the union of all fields defined in `_identifiers`. The unique ID must be globally meaningful (such as a unique instance name or slug), as it is used to identify object correspondence between differing systems or data sets. It **must not** be a value that is only locally meaningful to a specific data set, such as a database primary key value.
20+
21+
> Only fields listed in `_identifiers`, `_attributes`, or `_children` will be potentially included in comparison and synchronization between systems or data sets. Any other fields will be ignored; this allows for a model to additionally contain fields that are only locally relevant (such as database primary key values) and therefore are irrelevant to comparison and synchronization.
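
The two notes above can be pictured with a small library-free sketch. The helpers `unique_id` and `comparable_fields`, and the `"__"` separator, are assumptions made for illustration only and are not taken from the `diffsync` API.

```python
# Hypothetical helpers; they only illustrate the rules stated in the notes above.
def unique_id(record, identifiers):
    """Build a natural key by joining the identifier fields in order (separator is an assumption)."""
    return "__".join(str(record[field]) for field in identifiers)


def comparable_fields(record, identifiers, attributes):
    """Keep only the fields that take part in comparison; everything else is ignored."""
    return {field: record[field] for field in (*identifiers, *attributes) if field in record}


site_record = {"name": "nyc", "contact_phone": "555-0100", "database_pk": 42}

site_uid = unique_id(site_record, ("name",))
site_view = comparable_fields(site_record, ("name",), ("contact_phone",))
print(site_uid, site_view)
```

Note that `database_pk` never appears in `site_view`: it is only locally meaningful, so it neither identifies the object nor takes part in the comparison.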
22+
23+
```python
24+
from typing import List, Optional
25+
from diffsync import DiffSyncModel
26+
27+
class Site(DiffSyncModel):
28+
_modelname = "site"
29+
_identifiers = ("name",)
30+
_shortname = ()
31+
_attributes = ("contact_phone",)
32+
_children = {"device": "devices"}
33+
34+
name: str
35+
contact_phone: Optional[str]
36+
devices: List = list()
37+
database_pk: Optional[int] # not listed in _identifiers/_attributes/_children as it's only locally significant
38+
```
39+
40+
### Relationship between models
41+
42+
Currently the relationships between models are very loose by design. Instead of storing an object, it's recommended to store the unique id of an object and retrieve it from the store as needed. The `add_child()` API of `DiffSyncModel` provides this behavior as a default.
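
To make the "store the unique id, not the object" recommendation concrete, here is a minimal sketch using a plain dictionary as the store. The `store` layout and the `add` and `children_of` helpers are hypothetical and do not reflect how `DiffSync` or `add_child()` work internally.

```python
# Hypothetical in-memory store keyed by (model name, unique id).
store = {}


def add(modelname, uid, data):
    """Register a record in the store under its model name and unique id."""
    store[(modelname, uid)] = data


def children_of(parent_model, parent_uid, field, child_model):
    """Resolve the child ids kept on the parent into full records, on demand."""
    parent = store[(parent_model, parent_uid)]
    return [store[(child_model, child_uid)] for child_uid in parent[field]]


add("site", "nyc", {"name": "nyc", "devices": []})
add("device", "rtr-nyc", {"name": "rtr-nyc", "role": "router", "site_name": "nyc"})

# The parent keeps only the child's unique id, not a reference to the object.
store[("site", "nyc")]["devices"].append("rtr-nyc")

nyc_devices = children_of("site", "nyc", "devices", "device")
print(nyc_devices)
```

Keeping only ids on the parent means a child record can be replaced or updated in the store without touching every parent that points at it.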
43+
44+
## Define your system adapter with DiffSync
45+
46+
A `DiffSync` "adapter" subclass must reference each model available at the top of the object by its modelname and must have a `top_level` attribute defined to indicate how the diff and the synchronization should be done. In the example below, `"site"` is the only top level object so the synchronization engine will only check all known `Site` instances and all children of each Site. In this case, as shown in the code above, `Device`s are children of `Site`s, so this is exactly the intended logic.
47+
48+
```python
49+
from diffsync import DiffSync
50+
51+
class BackendA(DiffSync):
52+
53+
site = Site
54+
device = Device
55+
56+
top_level = ["site"]
57+
```
58+
59+
It's up to the implementer to populate the `DiffSync`'s internal cache with the appropriate data. The example below uses a `load()` method to populate the cache, but this is not mandatory; the cache can be populated in other ways.
60+
61+
## Store data in a `DiffSync` object
62+
63+
To add a site to the local cache/store, you need to pass a valid `DiffSyncModel` object to the `add()` function.
64+
65+
```python
66+
class BackendA(DiffSync):
67+
[...]
68+
69+
def load(self):
70+
# Store an individual object
71+
site = self.site(name="nyc")
72+
self.add(site)
73+
74+
# Store an object and define it as a child of another object
75+
device = self.device(name="rtr-nyc", role="router", site_name="nyc")
76+
self.add(device)
77+
site.add_child(device)
78+
```
79+
80+
## Update remote system on sync
81+
82+
When data synchronization is performed via `sync_from()` or `sync_to()`, DiffSync automatically updates the in-memory
83+
`DiffSyncModel` objects of the receiving adapter. The implementer of this class is responsible for ensuring that any remote system or data store is updated correspondingly. There are two usual ways to do this, depending on whether it's more
84+
convenient to manage individual records (as in a database) or modify the entire data store in one pass (as in a file-based data store).
85+
86+
### Manage individual records
87+
88+
To update individual records in a remote system, you need to extend your `DiffSyncModel` class(es) to define your own `create`, `update` and/or `delete` methods for each model.
89+
A `DiffSyncModel` instance stores a reference to its parent `DiffSync` adapter instance in case you need to use it to look up other model instances from the `DiffSync`'s cache.
90+
91+
```python
92+
class Device(DiffSyncModel):
93+
[...]
94+
95+
@classmethod
96+
def create(cls, diffsync, ids, attrs):
97+
## TODO add your own logic here to create the device on the remote system
98+
# Call the super().create() method to create the in-memory DiffSyncModel instance
99+
return super().create(ids=ids, diffsync=diffsync, attrs=attrs)
100+
101+
def update(self, attrs):
102+
## TODO add your own logic here to update the device on the remote system
103+
# Call the super().update() method to update the in-memory DiffSyncModel instance
104+
return super().update(attrs)
105+
106+
def delete(self):
107+
## TODO add your own logic here to delete the device on the remote system
108+
# Call the super().delete() method to remove the DiffSyncModel instance from its parent DiffSync adapter
109+
super().delete()
110+
return self
111+
```
112+
113+
### Bulk/batch modifications
114+
115+
If you prefer to update the entire remote system with the final state after performing all individual create/update/delete operations (as might be the case if your "remote system" is a single YAML or JSON file), the easiest place to implement this logic is in the `sync_complete()` callback method that is automatically invoked by DiffSync upon completion of a sync operation.
116+
117+
```python
118+
class BackendA(DiffSync):
119+
[...]
120+
121+
def sync_complete(self, source: DiffSync, diff: Diff, flags: DiffSyncFlags, logger: structlog.BoundLogger):
122+
## TODO add your own logic to update the remote system now.
123+
# The various parameters passed to this method are for your convenience in implementing more complex logic, and
124+
# can be ignored if you do not need them.
125+
#
126+
# The default DiffSync.sync_complete() method does nothing, but it's always a good habit to call super():
127+
super().sync_complete(source, diff, flags, logger)
128+
```

docs/source/getting_started/index.rst

Lines changed: 1 addition & 3 deletions
@@ -2,6 +2,4 @@
22
Getting Started
33
###############
44

5-
.. mdinclude:: ../../../README.md
6-
:start-line: 28
7-
:end-line: 153
5+
.. mdinclude:: 01-getting-started.md
