-
Notifications
You must be signed in to change notification settings - Fork 195
Fix #840: Add GraphML serializer #1464
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This commit adds a GraphML serializer:
```python
def write_graphml(
graphs: list[PyGraph | PyDiGraph],
keys: list[tuple[str, Domain, str, Type, Any]],
path: str,
/,
compression: str | None = ...,
) -> None: ...
```
`keys` is a list of tuples: id, domain, name of the key, type, and
default value. This commit also introduces the
`read_graphml_with_keys` function, which returns the key definitions
in the same format, along with the list of parsed graphs.
The implementation preserves the ids of graphs, nodes, and edges when
possible. If some ids conflict, fresh ids are generated in the written
GraphML file. The `read_graphml` function has also been updated to
store the graph id in the graph attributes, just like node and edge
ids are stored in the corresponding attributes.
The `write_graphml` function supports gzip compression, as does
`read_graphml`.
Note that the JSON node-link serializer (the other part of Qiskit#840) was
already implemented in Qiskit#1091.
Compared to Qiskit#1462:
- Keys are passed explicitly instead of being inferred (which allows
to use the types `float` and `int`, and to use default values);
- Attributes for graphs, nodes, and edges are taken from the weight of elements, instead of relying on callbacks. This allows write_graphml to act as a proper reciprocal of read_graphml. Round-trip tests have been added.
- IDs are taken from attributes when possible, instead of being generated from indices.
- Multiple graphs can be written to the same file.
- Gzip compression is supported.
- Tests have been added.
Regarding @IvanIsCoding's
comment (Qiskit#1462 (comment)),
about using https://github.com/jonasbb/petgraph-graphml:
- Rustworkx's `graphml.rs` introduces an internal `Graph` data
structure, which is used for `read_graphml`. It is natural to
have `write_graphml` rely on the same data structure.
- `petgraph-graphml` generates ids from indices, which prevents us
from preserving ids accross the `read_graphml`/`write_graphml` round
trip.
IvanIsCoding
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will add more comments later, but I agree that reusing the existing code from reading XML is the best approach
src/graphml.rs
Outdated
| Ok(()) | ||
| } | ||
|
|
||
| fn get_keys(&self, domain: Domain) -> &IndexMap<String, Key> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use DictMap instead of IndexMap
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function returns the key_for_* fields of struct GraphML, which are defined as IndexMap with the default std::hash::RandomState. If I change it here to DictMap, I would also need to change the type in the definition of struct GraphML, which is not introduced in this PR.
Should I go ahead and change it here, or would it be better to open an issue to update the definition of struct GraphML in a separate PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DictMap is just IndexMap with a faster hasher. It should be safe to replace
Pull Request Test Coverage Report for Build 15707571026Details
💛 - Coveralls |
IvanIsCoding
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, this is pretty close to being merged but some comments on my opinion of supporting GraphML.
- For
read_graphml, we don't know which library wrote it so we should do our best to support most things specified in the standard. - For
write_graphml, I think it's ok to have a simplified version. As long as we write a subset that is valid by the standard and is understood by other libraries, we should be fine. I don't think we should support all the expressiviness of GraphML like serializing multiple graphs.
My vision for this PR is to have code where rustworkx.write_graphml(graph, "graph.xml") just works.
| b"edge" => Ok(Domain::Edge), | ||
| b"graph" => Ok(Domain::Graph), | ||
| b"all" => Ok(Domain::All), | ||
| _ => Err(()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to include a return message explaining why a call to this failed though? Right now that is not possible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intention was for the failure cause to be known and owned by the caller, who can inject the error context that makes sense for their usage. As shown in this usage, the function returns a Result<..., ()>, and the responsibility of interpreting or enriching the error lies with the calling code. Typing ensures that this is done properly, since there is no From<()> defined for the error types we use.
src/graphml.rs
Outdated
| pygraph: &StablePyGraph<Ty>, | ||
| attrs: &PyObject, | ||
| ) -> PyResult<Self> { | ||
| let mut attrs: Option<std::collections::HashMap<String, Value>> = attrs.extract(py).ok(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use HashMap from https://crates.io/crates/hashbrown or our own DictMap (which is just an alias for IndexMap with a fast hasher)
src/graphml.rs
Outdated
| Ok(()) | ||
| } | ||
|
|
||
| fn get_keys(&self, domain: Domain) -> &IndexMap<String, Key> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DictMap is just IndexMap with a faster hasher. It should be safe to replace
|
Another comment for
Why am I mentioning this? For this PR it doesn't matter and it's something we can fix later. But ideally, if we called |
|
I have replaced all occurrences of |
IvanIsCoding
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. There are some minor details like adding a docstring, making the function in rustworkx/__init__.py consistent with others, etc. I will send a PR to fix those later.
For all intents and purposes, this completes 99% of the requirements for serialiizing graphs to XML. It is a great addition and it should be awarded the bounty for #840.
* Fix Qiskit#840: Add GraphML serializer This commit adds a GraphML serializer: ```python def write_graphml( graphs: list[PyGraph | PyDiGraph], keys: list[tuple[str, Domain, str, Type, Any]], path: str, /, compression: str | None = ..., ) -> None: ... ``` `keys` is a list of tuples: id, domain, name of the key, type, and default value. This commit also introduces the `read_graphml_with_keys` function, which returns the key definitions in the same format, along with the list of parsed graphs. The implementation preserves the ids of graphs, nodes, and edges when possible. If some ids conflict, fresh ids are generated in the written GraphML file. The `read_graphml` function has also been updated to store the graph id in the graph attributes, just like node and edge ids are stored in the corresponding attributes. The `write_graphml` function supports gzip compression, as does `read_graphml`. Note that the JSON node-link serializer (the other part of Qiskit#840) was already implemented in Qiskit#1091. Compared to Qiskit#1462: - Keys are passed explicitly instead of being inferred (which allows to use the types `float` and `int`, and to use default values); - Attributes for graphs, nodes, and edges are taken from the weight of elements, instead of relying on callbacks. This allows write_graphml to act as a proper reciprocal of read_graphml. Round-trip tests have been added. - IDs are taken from attributes when possible, instead of being generated from indices. - Multiple graphs can be written to the same file. - Gzip compression is supported. - Tests have been added. Regarding @IvanIsCoding's comment (Qiskit#1462 (comment)), about using https://github.com/jonasbb/petgraph-graphml: - Rustworkx's `graphml.rs` introduces an internal `Graph` data structure, which is used for `read_graphml`. It is natural to have `write_graphml` rely on the same data structure. - `petgraph-graphml` generates ids from indices, which prevents us from preserving ids accross the `read_graphml`/`write_graphml` round trip. * Fix clippy comments * Prefix types with GraphML Suggested by @IvanIsCoding: Qiskit#1464 (comment) * Black * Add release notes * Fix stubs error * Add documentation * Remove read_graphml_with_keys / write_graphml for single graph only * Use `DictMap` everywhere Suggested by @IvanIsCoding: Qiskit#1464 (comment) * rustfmt and clippy * Remove unused math module (ruff check) * Use `from __future__ import annotations` for Python <3.10 * Add stub for `write_graphml` * Remove `read_graphml_with_keys` from documentation * Apply suggestions from code review * Update rustworkx/__init__.py * Apply suggestions from code review --------- Co-authored-by: Ivan Carvalho <[email protected]>
This commit adds a GraphML serializer:
keysis a list of tuples: id, domain, name of the key, type, and default value. This commit also introduces theread_graphml_with_keysfunction, which returns the key definitions in the same format, along with the list of parsed graphs.The implementation preserves the ids of graphs, nodes, and edges when possible. If some ids conflict, fresh ids are generated in the written GraphML file. The
read_graphmlfunction has also been updated to store the graph id in the graph attributes, just like node and edge ids are stored in the corresponding attributes.The
write_graphmlfunction supports gzip compression, as doesread_graphml.Note that the JSON node-link serializer (the other part of #840) was already implemented in #1091.
Compared to #1462:
Keys are passed explicitly instead of being inferred (which allows to use the types
floatandint, and to use default values);Attributes for graphs, nodes, and edges are taken from the weight of elements, instead of relying on callbacks. This allows write_graphml to act as a proper reciprocal of read_graphml. Round-trip tests have been added.
IDs are taken from attributes when possible, instead of being generated from indices.
Multiple graphs can be written to the same file.
Gzip compression is supported.
Tests have been added.
Regarding @IvanIsCoding's
comment (#1462 (comment)), about using https://github.com/jonasbb/petgraph-graphml:
Rustworkx's
graphml.rsintroduces an internalGraphdata structure, which is used forread_graphml. It is natural to havewrite_graphmlrely on the same data structure.petgraph-graphmlgenerates ids from indices, which prevents us from preserving ids accross theread_graphml/write_graphmlround trip.