This project proposes a system that analyzes different types of handwritten diagrams and converts them into well-rendered images through a textual syntax. The goal is a tool that takes scanned or photographed sketches of diagrams and automatically generates code that renders the same diagrams digitally, so that they can later be integrated or modified.
- Install D2 CLI from official repository: https://github.com/terrastruct/d2
Check the d2 installation:
d2 --version
- Install dependencies using Python 3.12
conda create --name diagram python=3.12
conda activate diagram
pip3 install torch opencv-python matplotlib requests pillow pandas torchvision numpy shapely transformers sentencepiece protobuf torchmetrics scikit-learn
- Check DIAGRAM installation
python src/main.py -h
CLI options can be displayed with the -h flag.
The CLI is in src/main.py:
python src/main.py -h
CLI parameters are:
- --input path/to/image1.png path/to/image2.png ...: provides input images (one or more)
- --classifier path/to/classifier_weights.pth: weights of classifier network
- --bbox-detector path/to/object_detector_weights.pth: weights of object detection network
- --outputs-dir-path path/to/output_dir: directory in which outputs will be dumped
- --then-compile: flag to compile markup language file into images
- --markups d2-lang mermaid: to specify markup languages
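The parameters above could be declared with argparse roughly as follows. This is a minimal sketch, not the project's actual src/main.py: the function name build_parser, the help strings, the default for --markups, and which flags are required are all assumptions; only the flag names come from the list above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags."""
    parser = argparse.ArgumentParser(description="Handwritten diagram to markup")
    # One or more image paths.
    parser.add_argument("--input", nargs="+", required=True, help="input images")
    parser.add_argument("--classifier", required=True, help="classifier weights (.pth)")
    parser.add_argument("--bbox-detector", required=True, help="object detector weights (.pth)")
    parser.add_argument("--outputs-dir-path", required=True, help="directory for outputs")
    # Boolean switch: present means True.
    parser.add_argument("--then-compile", action="store_true", help="compile markup into images")
    # Assumed default: emit every known markup language.
    parser.add_argument("--markups", nargs="+", choices=["d2-lang", "mermaid"],
                        default=["d2-lang", "mermaid"], help="target markup languages")
    return parser


args = build_parser().parse_args([
    "--input", "demo/easy_graph.png",
    "--classifier", "demo/classifier_weights.pth",
    "--bbox-detector", "demo/object_detector_weights.pth",
    "--outputs-dir-path", "demo/outcome",
    "--then-compile",
])
print(args.input, args.then_compile, args.markups)
```

Note that argparse exposes dashed flags with underscores, so --bbox-detector is read as args.bbox_detector.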
We advise you to tune thresholds based on your drawing style. We have tuned thresholds for common diagrams.
For example:
--element_arrow_distance_threshold 260
python src/main.py --input demo/easy_graph.png --classifier demo/classifier_weights.pth --bbox-detector demo/object_detector_weights.pth --outputs-dir-path demo/outcome --then-compile --element_arrow_distance_threshold 150
python src/main.py --input demo/hard_graph.png --classifier demo/classifier_weights.pth --bbox-detector demo/object_detector_weights.pth --outputs-dir-path demo/outcome --then-compile --element_arrow_distance_threshold 250
Warning
An extra node will be found, because a self arrow is recognized as a node. You can remove it manually from the .d2 markup file.
Non-Maximum Suppression does not help here, because the "Node" label has a higher score than the "Arrow" label.
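To see why Non-Maximum Suppression does not remove the spurious node, consider a minimal sketch of greedy, class-agnostic NMS. It is illustrative only: the boxes, scores, and the 0.5 IoU threshold are made-up values, and the project's detector may apply NMS differently. When a "Node" box and an "Arrow" box overlap heavily, NMS keeps whichever scores higher, so the "Node" detection survives and the "Arrow" is the one discarded.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=0.5):
    """Greedy class-agnostic NMS over (label, score, box) tuples."""
    kept = []
    # Visit detections from highest to lowest score.
    for det in sorted(detections, key=lambda d: d[1], reverse=True):
        # Keep a detection only if it does not overlap a kept one too much.
        if all(iou(det[2], k[2]) <= iou_threshold for k in kept):
            kept.append(det)
    return kept


# Hypothetical detections for one self arrow: both labels fire on the same region.
detections = [
    ("Node", 0.90, (100, 100, 160, 160)),
    ("Arrow", 0.60, (102, 98, 158, 162)),
]
kept = nms(detections)
print([label for label, _, _ in kept])  # ['Node']
```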
python src/main.py --input demo/easy_flowchart.png --classifier demo/classifier_weights.pth --bbox-detector demo/object_detector_weights.pth --outputs-dir-path demo/outcome --then-compile --element_arrow_distance_threshold 350
python src/main.py --input demo/hard_flowchart.png --classifier demo/classifier_weights.pth --bbox-detector demo/object_detector_weights.pth --outputs-dir-path demo/outcome --then-compile --element_arrow_distance_threshold 350
System's components:
- Preprocessor: preprocesses images, e.g. straightens them
- Classifier: classifies input images (e.g. graph-diagram, flowchart)
- Extractor: extracts and builds an agnostic representation of the input diagram
- Transducer: converts the agnostic representation of a diagram into a specific markup language (e.g. Mermaid)
- Compiler: produces an image from a markup language file
- Orchestrator: manages the other components
The classifier network is used to determine which extraction module to use.
Each extractor is specialized for a single type of diagram.
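The routing described above can be sketched as a lookup from the classifier's label to an extractor. Everything in this sketch is hypothetical (the function names, the EXTRACTORS registry, the stub classifier, and the returned dictionaries); it only illustrates the dispatch idea, not the project's code.

```python
from typing import Callable, Dict


def extract_graph(image_path: str) -> dict:
    """Stand-in for the graph-diagram extractor."""
    return {"type": "graph-diagram", "source": image_path}


def extract_flowchart(image_path: str) -> dict:
    """Stand-in for the flowchart extractor."""
    return {"type": "flowchart", "source": image_path}


# One extractor per diagram type, keyed by the classifier's output label.
EXTRACTORS: Dict[str, Callable[[str], dict]] = {
    "graph-diagram": extract_graph,
    "flowchart": extract_flowchart,
}


def orchestrate(image_path: str, classify: Callable[[str], str]) -> dict:
    """Classify the image, then forward it to the matching extractor."""
    label = classify(image_path)
    if label not in EXTRACTORS:
        raise ValueError(f"no extractor registered for diagram type {label!r}")
    return EXTRACTORS[label](image_path)


# A lambda stands in for the classifier network here.
result = orchestrate("demo/easy_graph.png", classify=lambda path: "graph-diagram")
print(result["type"])  # graph-diagram
```

Keeping the mapping in a dictionary means a new diagram type only needs a new entry, without touching the orchestration logic.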
For example, given an input image of a graph:
The classifier outputs graph-diagram, so the orchestrator forwards the input image to the graph diagram extractor.
The graph diagram extractor produces the graph's adjacency matrix, where the nodes and external nodes (to handle arrows starting from nowhere) are represented in the rows and columns. The value is a non-negative integer indicating the number of connections (the matrix position indicates the source and destination). Additionally, it generates lookup data structures for arrow annotations and node text.
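A minimal sketch of such a representation follows. It assumes that rows are sources and columns are destinations, and that external nodes simply take the indices after the real nodes; the function name, the edge-list input, and the dictionary layout of the lookups are illustrative assumptions rather than the extractor's real interface.

```python
def build_adjacency(num_nodes, edges):
    """Return (matrix, arrow_labels) for edges given as (src, dst, label)."""
    matrix = [[0] * num_nodes for _ in range(num_nodes)]
    arrow_labels = {}
    for src, dst, label in edges:
        # Each cell counts the arrows going from src (row) to dst (column).
        matrix[src][dst] += 1
        if label:
            arrow_labels.setdefault((src, dst), []).append(label)
    return matrix, arrow_labels


# Nodes 0 and 1 are real nodes; node 2 is an external node, i.e. the
# placeholder source of an arrow that starts from nowhere.
node_text = {0: "A", 1: "B"}
edges = [(2, 0, None), (0, 1, "go"), (0, 1, None)]
matrix, arrow_labels = build_adjacency(3, edges)

for row in matrix:
    print(row)
```

Here matrix[0][1] is 2 because two arrows go from node 0 to node 1, and only one of them carries an annotation.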
The orchestrator then sends this representation to the corresponding transducers for that type of diagram (those selected by the user, or all available ones), which translate it into the target markup language. For example, in Mermaid, the output might be something like:
flowchart TD
0(Y = X
X =T)
1{X > Y}
2(T = Y)
3(( ))
4(( ))
1-->|Else|4
3-->1
2-->0
0-->4
1-->|Then|2
Then, the markup language file is compiled using the associated compiler.
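As a rough illustration of the transducer step, the sketch below turns an adjacency matrix plus lookups into Mermaid flowchart text. The function name, the input layout (rows as sources, columns as destinations), and the use of round brackets for every labelled node are assumptions. The example output above also uses other shapes, such as curly braces for decisions, which this sketch does not attempt to reproduce.

```python
def to_mermaid(matrix, node_text, arrow_labels):
    """Render a graph as Mermaid flowchart source (simplified sketch)."""
    lines = ["flowchart TD"]
    for node in range(len(matrix)):
        text = node_text.get(node)
        # Nodes without text (e.g. external nodes) become empty circles.
        lines.append(f"{node}({text})" if text else f"{node}(( ))")
    for src, row in enumerate(matrix):
        for dst, count in enumerate(row):
            labels = arrow_labels.get((src, dst), [])
            # Emit one arrow per counted connection, labelled when possible.
            for i in range(count):
                label = labels[i] if i < len(labels) else None
                lines.append(f"{src}-->|{label}|{dst}" if label else f"{src}-->{dst}")
    return "\n".join(lines)


matrix = [
    [0, 1, 0],
    [0, 0, 0],
    [1, 0, 0],
]
node_text = {0: "T = Y", 1: "Y = X"}
arrow_labels = {(0, 1): ["Then"]}
mermaid_source = to_mermaid(matrix, node_text, arrow_labels)
print(mermaid_source)
```

The resulting text could then be written to a file and handed to the compiler component.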






