
Add LLM-based tool for auditing Rocq spec–implementation consistency #10

Open
Valentin889 wants to merge 7 commits into main from vs/autoformalization/comment-code-audit

Conversation

@Valentin889
Collaborator


This PR introduces a tool to automatically audit the consistency between specification comments and their Rocq implementations in the mechanization.

The tool extracts Definition and Fixpoint blocks from .v files and sends them to an LLM to check whether the implementation syntactically follows the steps described in the specification comments. The goal is to help detect typos, missing steps, or mismatches introduced during the manual translation of the ECMAScript specification into Rocq.
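The extraction step described above can be sketched as a small Python helper. This is a hypothetical illustration, not the actual code from comment_code_audit.py; the regex-based heuristic (a block starts with `Definition` or `Fixpoint` and ends at the first `.` at end of line) is an assumption about how such blocks could be pulled out of a .v file.

```python
import re

# Heuristic sketch: a Rocq Definition/Fixpoint block starts at the
# keyword and runs to the first period at the end of a line. The real
# comment_code_audit.py may use a different extraction strategy.
BLOCK_RE = re.compile(
    r"^(Definition|Fixpoint)\s+\w+.*?\.\s*$",
    re.MULTILINE | re.DOTALL,
)

def extract_blocks(source: str) -> list[str]:
    """Return every Definition/Fixpoint block found in a .v source string."""
    return [m.group(0) for m in BLOCK_RE.finditer(source)]
```

Each extracted block (together with its specification comment) would then be embedded into an audit prompt and sent to the LLM.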

Added components

  • comment_code_audit.py
    Main script that:

    • Extracts definitions from Rocq files
    • Queries an LLM with configurable prompts
    • Stores results
    • Generates an HTML review report
  • Configuration files

    • config.json – model configuration and generation parameters
    • prompts.json – system prompt and audit prompts
  • Environment setup

    • environment.yml – Conda environment definition
    • requirements.txt – Python dependencies
  • Project utilities

    • .gitignore – ignores .env and generated results/
    • README.md – documentation and usage instructions
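For illustration, a config.json carrying "model configuration and generation parameters" might look like the following. All field names and values here are assumptions for the sketch, not the contents of the actual file:

```json
{
  "model": "MODEL_NAME",
  "temperature": 0.0,
  "max_tokens": 2048,
  "results_dir": "results"
}
```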

Output

Running the script produces:

  • JSON logs containing the full prompts and model responses
  • HTML reports for easier manual inspection

Results are stored under:

results/YYYY-MM-DD/MODEL_NAME/
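The per-run directory layout above can be built with a short helper like this one. It is a sketch of the stated layout (date, then model name), not the script's actual code; the function name is hypothetical.

```python
from datetime import date
from pathlib import Path

def results_dir(model_name: str, base: str = "results") -> Path:
    """Build the output directory results/YYYY-MM-DD/MODEL_NAME/
    described above (hypothetical helper, not from the PR itself)."""
    return Path(base) / date.today().isoformat() / model_name
```

Keying the path on the date and model name keeps audit runs from different days, or against different models, from overwriting each other.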

@Valentin889 Valentin889 self-assigned this Mar 16, 2026
