nricciardi/diagram


D.I.A.G.R.A.M.: Development of Image Analysis for Graph Recognition And Modeling

This project develops a system that analyzes different types of handwritten diagrams and converts them into well-rendered images through a textual syntax. The goal is a tool that takes scanned or photographed sketches of diagrams and automatically generates code that renders the same diagrams digitally, so that they can later be integrated or modified.

Get Started

Install

  1. Install the D2 CLI from the official repository: https://github.com/terrastruct/d2

Verify the D2 installation:

d2 --version
  2. Install the dependencies using Python 3.12:
conda create --name diagram python=3.12

conda activate diagram

pip3 install torch opencv-python matplotlib requests pillow pandas torchvision numpy shapely transformers sentencepiece protobuf torchmetrics scikit-learn
  3. Check the DIAGRAM installation:
python src/main.py -h

Example

The CLI entry point is src/main.py. Use the -h flag to list its options:

python src/main.py -h

CLI parameters are:

  • --input path/to/image1.png path/to/image2.png ...: one or more input images
  • --classifier path/to/classifier_weights.pth: weights of the classifier network
  • --bbox-detector path/to/object_detector_weights.pth: weights of the object detection network
  • --outputs-dir-path path/to/output_dir: directory where outputs are written
  • --then-compile: flag that compiles the generated markup files into images
  • --markups d2-lang mermaid: markup languages to generate
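The options listed above could be declared with argparse roughly as follows. This is a minimal sketch, not the project's actual src/main.py: the `build_parser` name, the required/optional split, the help strings, and the default for --markups are all assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags documented in this README."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("--input", nargs="+", required=True,
                        help="one or more input images")
    parser.add_argument("--classifier", required=True,
                        help="weights of the classifier network")
    parser.add_argument("--bbox-detector", required=True,
                        help="weights of the object detection network")
    parser.add_argument("--outputs-dir-path", required=True,
                        help="directory where outputs are written")
    parser.add_argument("--then-compile", action="store_true",
                        help="compile generated markup files into images")
    # Assumed default: generate both markup languages named in the README.
    parser.add_argument("--markups", nargs="+", default=["d2-lang", "mermaid"],
                        help="markup languages to generate")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args([
    "--input", "a.png", "b.png",
    "--classifier", "classifier_weights.pth",
    "--bbox-detector", "object_detector_weights.pth",
    "--outputs-dir-path", "out",
    "--then-compile",
])
print(args.input, args.then_compile)
```

argparse converts hyphens in option names to underscores, so --bbox-detector is read back as `args.bbox_detector`.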

We advise tuning the thresholds to your drawing style. The default thresholds are tuned for common diagrams.

For example:

  • --element_arrow_distance_threshold 260
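This README does not describe how the threshold is applied. One plausible reading, sketched below purely as an assumption, is that an arrow endpoint is attached to the nearest element only when that element lies within the threshold distance in pixels. The function names and the point-to-box distance are hypothetical and may differ from the real extractor.

```python
import math

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def point_box_distance(point: tuple[float, float], box: Box) -> float:
    """Euclidean distance from a point to a box (0 if the point is inside)."""
    px, py = point
    x1, y1, x2, y2 = box
    dx = max(x1 - px, 0.0, px - x2)
    dy = max(y1 - py, 0.0, py - y2)
    return math.hypot(dx, dy)


def nearest_element(point, boxes, threshold):
    """Return the index of the closest box within `threshold`, else None."""
    best_index, best_distance = None, threshold
    for index, box in enumerate(boxes):
        distance = point_box_distance(point, box)
        if distance <= best_distance:
            best_index, best_distance = index, distance
    return best_index


elements = [(0, 0, 100, 100), (500, 0, 600, 100)]
arrow_tip = (250, 50)  # 150 px from the first box, 250 px from the second
print(nearest_element(arrow_tip, elements, threshold=260))
print(nearest_element(arrow_tip, elements, threshold=100))
```

Under this reading, a threshold that is too small leaves arrows unattached, while one that is too large may attach them to distant elements, which is why tuning to the drawing style matters.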

Demo

Graph

Easy Graph
python src/main.py --input demo/easy_graph.png --classifier demo/classifier_weights.pth --bbox-detector demo/object_detector_weights.pth --outputs-dir-path demo/outcome --then-compile --element_arrow_distance_threshold 150

Easy graph

Hard Graph
python src/main.py --input demo/hard_graph.png --classifier demo/classifier_weights.pth --bbox-detector demo/object_detector_weights.pth --outputs-dir-path demo/outcome --then-compile --element_arrow_distance_threshold 250

Warning

An extra node will be found, because a self-loop arrow is recognized as a node. You can remove it manually from the generated .d2 markup file.

Non-Maximum Suppression does not help here, because the "Node" label has a higher score than the "Arrow" label.
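To see why suppression does not fix this case, here is a minimal pure-Python sketch of class-agnostic NMS. It assumes boxes in (x1, y1, x2, y2) form, and the scores and coordinates are made up; it is not the detector's actual post-processing. NMS keeps the highest-scoring box among overlapping ones, so the "Node" detection survives and the "Arrow" detection is the one discarded.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=0.5):
    """Greedy class-agnostic NMS: visit by descending score, drop overlaps."""
    kept = []
    for det in sorted(detections, key=lambda d: d["score"], reverse=True):
        if all(iou(det["box"], k["box"]) < iou_threshold for k in kept):
            kept.append(det)
    return kept


# Illustrative detections for the same self-loop region (values are invented).
detections = [
    {"label": "Arrow", "score": 0.70, "box": (100, 100, 160, 160)},
    {"label": "Node", "score": 0.85, "box": (102, 98, 162, 158)},
]
print([d["label"] for d in nms(detections)])  # ['Node']
```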

Hard graph

Outcome

Flowchart

Easy Flowchart
python src/main.py --input demo/easy_flowchart.png --classifier demo/classifier_weights.pth --bbox-detector demo/object_detector_weights.pth --outputs-dir-path demo/outcome --then-compile --element_arrow_distance_threshold 350

Easy flowchart

Outcome D2

Outcome Mermaid

Hard Flowchart
python src/main.py --input demo/hard_flowchart.png --classifier demo/classifier_weights.pth --bbox-detector demo/object_detector_weights.pth --outputs-dir-path demo/outcome --then-compile --element_arrow_distance_threshold 350

Hard flowchart

Outcome D2

Outcome Mermaid

Project Overview

System's components:

  • Preprocessor: preprocesses images, e.g. straightens them
  • Classifier: classifies input images by diagram type (e.g. graph-diagram, flowchart)
  • Extractor: extracts and builds a markup-agnostic representation of the input diagram
  • Transducer: converts the agnostic representation of a diagram into a specific markup language (e.g. Mermaid)
  • Compiler: produces an image from a markup language file
  • Orchestrator: manages the other components

Overview

The classifier network is used to determine which extraction module to use.

Each extractor is specialized for a single type of diagram.
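The dispatch described here can be sketched as a lookup from the classifier's label to an extractor. Every name below (`EXTRACTORS`, `orchestrate`, the stub extractors) is hypothetical and only illustrates the control flow; the real components live in the repository.

```python
from typing import Callable


def extract_graph(image_path: str) -> dict:
    """Stub standing in for the graph diagram extractor."""
    return {"kind": "graph-diagram", "source": image_path}


def extract_flowchart(image_path: str) -> dict:
    """Stub standing in for the flowchart extractor."""
    return {"kind": "flowchart", "source": image_path}


# One specialized extractor per diagram type, keyed by classifier label.
EXTRACTORS: dict[str, Callable[[str], dict]] = {
    "graph-diagram": extract_graph,
    "flowchart": extract_flowchart,
}


def orchestrate(image_path: str, classify: Callable[[str], str]) -> dict:
    """Classify the image, then forward it to the matching extractor."""
    label = classify(image_path)
    try:
        extractor = EXTRACTORS[label]
    except KeyError:
        raise ValueError(f"no extractor registered for {label!r}") from None
    return extractor(image_path)


# A fake classifier that always answers "graph-diagram".
result = orchestrate("demo/easy_graph.png", classify=lambda path: "graph-diagram")
print(result["kind"])
```

Keeping the mapping in one table means that supporting a new diagram type only needs a new entry and a new extractor.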

For example, given an input image of a graph:

Input

The classifier outputs graph-diagram, so the orchestrator forwards the input image to the graph diagram extractor.

The graph diagram extractor produces the graph's adjacency matrix, whose rows and columns represent the nodes plus external nodes (placeholders that handle arrows starting from nowhere). Each entry is a non-negative integer giving the number of connections, and its position in the matrix identifies the source and destination. The extractor also generates lookup data structures for arrow annotations and node text.
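As a concrete illustration, the representation might look like the following for a small graph. This is a sketch under assumptions: the variable names, the nested-list matrix, the row-as-source convention, and the dictionary lookups are illustrative, not the extractor's actual data model.

```python
# Indices 0 and 1 are real nodes; index 2 is an "external" node standing in
# for the missing source of an arrow that starts from nowhere.
node_text = {0: "A", 1: "B"}  # the external node has no text
size = 3
adjacency = [[0] * size for _ in range(size)]


def add_arrow(source: int, destination: int) -> None:
    """Record one more connection; rows are assumed to be sources."""
    adjacency[source][destination] += 1


add_arrow(0, 1)
add_arrow(0, 1)  # a second, parallel arrow from A to B
add_arrow(2, 0)  # an arrow from nowhere into A
add_arrow(1, 1)  # a self-loop on B

# Assumed lookup: annotations keyed by (source, destination).
arrow_annotations = {(0, 1): ["yes"]}

for row in adjacency:
    print(row)
```

Storing counts rather than booleans lets the matrix represent parallel arrows between the same pair of nodes.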

The orchestrator then sends this representation to the corresponding transducers for that diagram type (those selected by the user, or all available ones), which translate it into the target markup language. For example, in Mermaid, the output might look like:

flowchart TD
	0(Y = X
	X =T)
	1{X > Y}
	2(T = Y)
	3(( ))
	4(( ))

	1-->|Else|4
	3-->1
	2-->0
	0-->4
	1-->|Then|2
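A transducer producing output of this shape could be sketched as below. It is an illustration under assumptions: the function name and input shapes are hypothetical, and every node is rendered with round brackets, whereas the example above also uses `{ }` for decisions and `(( ))` for external nodes.

```python
def to_mermaid(adjacency, node_text, arrow_text):
    """Render an adjacency matrix as Mermaid flowchart markup.

    adjacency[source][destination] holds the number of arrows (assumed
    convention); node_text maps node index to label; arrow_text maps
    (source, destination) to an optional annotation.
    """
    lines = ["flowchart TD"]
    for index in range(len(adjacency)):
        label = node_text.get(index, " ")  # blank label for unnamed nodes
        lines.append(f"\t{index}({label})")
    lines.append("")
    for source, row in enumerate(adjacency):
        for destination, count in enumerate(row):
            for _ in range(count):
                label = arrow_text.get((source, destination))
                arrow = f"-->|{label}|" if label else "-->"
                lines.append(f"\t{source}{arrow}{destination}")
    return "\n".join(lines)


markup = to_mermaid(
    adjacency=[[0, 1], [0, 0]],
    node_text={0: "A", 1: "B"},
    arrow_text={(0, 1): "Then"},
)
print(markup)
```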

Finally, the markup file is compiled using the compiler associated with its language.
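For D2, this step amounts to invoking the d2 CLI on the generated file. The sketch below assumes d2 is on PATH and uses the basic `d2 input.d2 output.svg` form; the helper names and the choice of SVG output are assumptions, not the project's compiler component.

```python
import shutil
import subprocess
from pathlib import Path


def d2_command(markup_path: Path, output_path: Path) -> list[str]:
    """Build the argument list for a basic d2 invocation."""
    return ["d2", str(markup_path), str(output_path)]


def compile_d2(markup_path: Path) -> Path:
    """Compile a .d2 file to SVG next to it and return the output path."""
    if shutil.which("d2") is None:
        raise RuntimeError("d2 CLI not found on PATH")
    output_path = markup_path.with_suffix(".svg")
    # check=True raises CalledProcessError if d2 exits with an error.
    subprocess.run(d2_command(markup_path, output_path), check=True)
    return output_path


print(d2_command(Path("diagram.d2"), Path("diagram.svg")))
```

Passing the command as a list rather than a shell string avoids quoting problems with paths that contain spaces.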
