Ad-Hoc Parser Truth

A tool to analyze ad-hoc string parsers from Python code snippets as part of the TYPES4STRINGS project.

It consists of a database population script and a web app. The population script takes a dataset of ad-hoc parsers, consisting of a CSV file and a directory structure of code snippets, and generates and populates a SQLite database from it. For a detailed description, refer to the Populate Database section.

The Streamlit web app visualizes the data and makes it searchable. The association of projects with their files and the extracted code snippets is shown in a tree-like structure, combined with the code and metadata information at a glance. Further details on starting the web app can be found in the Web App section.

This tool is a first iteration. With further development, it can be used to annotate these ad-hoc parsers with ground truth, resulting in a benchmark dataset for analyzing parsing programs.

Streamlit UI

Requirements

  • Python 3.10 or newer

Populate Database

Prerequisites

Initial User

An initial user must be set. This user is saved in the database's User table and must be configured in config.py, located in the root folder, before the database population script is executed. The relevant config variables are INITIAL_USER_NAME and INITIAL_USER_PW_HASH.
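
A minimal sketch of the relevant part of config.py, assuming the password hash is a SHA-256 hex digest (the hash format actually expected by the tool may differ; the user name and password below are placeholders):

# config.py (excerpt) -- illustrative values only
import hashlib

INITIAL_USER_NAME = "admin"  # hypothetical initial user name
# Assumed hash format: SHA-256 hex digest of the plain-text password.
INITIAL_USER_PW_HASH = hashlib.sha256("change-me".encode("utf-8")).hexdigest()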

Dataset

  • For the default configuration, the source CSV for the population script must be saved directly under the data folder and named analysis_results.csv. If the CSV has a different name or location, adjust the variable CSV_DATA_PATH in config.py accordingly.

  • What should the CSV look like?

    • An example CSV is to be stored in data/example ==TODO== ⚠️ (a hypothetical sketch is shown after this list)
  • The path to the code of each parser slice (the snippet extracted from the original method, which serves as the annotated code) must be specified in the CSV in a column called file.

  • The original code from the methods must be located under the data/ParserExamples/original_methods folder. ==TODO== ⚠️
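
Until the example CSV in data/example is in place, the following hypothetical layout illustrates the idea. Only the file column is required by the description above; the additional column and the concrete paths are invented for illustration:

# data/analysis_results.csv (hypothetical excerpt)
project,file
some-project,ParserExamples/slices/some_project/parse_version.py
another-project,ParserExamples/slices/another_project/split_header.py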

Dataset Population Script

==TODO== ⚠️

  • Tables are created and the initial user is added only if no adhocparser.db file exists in the data folder yet.

  • This initial user is allowed to add datasets to the database.
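
A minimal sketch of this create-only-if-missing behaviour, assuming the database file lives at data/adhocparser.db and plain sqlite3 is used (the actual script may structure this differently and define more tables and columns):

import sqlite3
from pathlib import Path

# Assumed import path; config.py sits in the repository root (see Prerequisites).
from config import INITIAL_USER_NAME, INITIAL_USER_PW_HASH

DB_PATH = Path("data/adhocparser.db")  # assumed default location

def ensure_database() -> None:
    """Create the tables and the initial user only if the database file does not exist yet."""
    if DB_PATH.exists():
        return  # reuse the existing database; nothing is created or overwritten
    with sqlite3.connect(DB_PATH) as conn:
        conn.execute("CREATE TABLE User (name TEXT PRIMARY KEY, pw_hash TEXT NOT NULL)")
        conn.execute("INSERT INTO User (name, pw_hash) VALUES (?, ?)",
                     (INITIAL_USER_NAME, INITIAL_USER_PW_HASH))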

Parameter

  • <importedBy> ==TODO== ⚠️
    • The name of the user that the populated dataset is linked to. This user must exist in the User table of the database.

Run Commands

cd dataset_population/
pip install -r requirements.txt
python3 ./populate_db.py <importedBy>
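
For example, if the initial user was configured as admin (the placeholder name from the config sketch above), the call would be:

python3 ./populate_db.py admin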

Database Schema

Database Diagram

Web App

Start Web App

cd web-app/
pip install -r requirements.txt
streamlit run app.py
# or
python3 -m streamlit run app.py
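
Streamlit prints the local URL on startup; by default, the app is served at http://localhost:8501.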

Open Topics

Open topics are tracked as GitHub Issues.
