
JosueAfouda/ai-flooring-pdf-analyzer


Flooring & Epoxy Scope Analyzer (MVP)

1. Project Overview

This application is an internal decision-support web product for a flooring contractor who reviews construction bid documents. It centralizes project PDFs, extracts flooring and epoxy scope details, highlights risk signals, and produces a structured recommendation (BID, REVIEW, or PASS).

The business problem is practical: bid packages are large, inconsistent, and time-sensitive. Important scope details can be buried across multiple plans and specifications. This tool reduces manual scanning time and makes decisions more consistent by turning unstructured documents into a traceable, review-ready output.

Why it matters for this domain:

  • Flooring and epoxy scope is often scattered across multiple sections and documents.
  • Missed exclusions or ambiguous language can create margin risk.
  • Fast, evidence-backed summaries improve bid/no-bid judgment.

2. Product Vision & Functional Scope

What the MVP does

  • Authenticated single-admin access.
  • Project creation and editing with bid metadata.
  • Upload of 1–10 project PDFs.
  • Two-pass extraction focused on flooring/epoxy scope.
  • Risk flag detection and recommendation output.
  • Downloadable project summary PDF.
  • Searchable/filterable project library.
  • Admin deletion and basic usage metrics.

What the MVP deliberately does not do

  • Multi-user roles or team workflows.
  • External platform scraping/automation.
  • Quantity takeoff, geometry, or plan measurement.
  • Mobile application support.

Workflow fit

The product sits between document intake and final bid strategy: ingest documents, synthesize scope/risk, then support a go/no-go decision with references.

3. Core Concepts

  • Project: one bid opportunity with metadata (type, source, date, notes) and attached documents.
  • Construction documents (PDFs): plans/specs/addenda uploaded for analysis.
  • Scope extraction: conversion of document text into structured, decision-ready fields.
  • Flooring / epoxy scope: explicit requirements, inclusions, exclusions, materials, prep/coating expectations.
  • Risk flags: warnings about ambiguity, exclusions, assumptions, or potential commercial risk.
  • Recommendation (BID / REVIEW / PASS):
    • BID: generally clear and actionable scope.
    • REVIEW: potentially viable but requires manual clarification.
    • PASS: poor fit, or scope whose value cannot be established with confidence.
  • References: evidence objects attached to extracted items (file, page, excerpt) for verification.
  • Two-pass extraction:
    • Pass 1 narrows relevant pages.
    • Pass 2 performs detailed structured extraction only on shortlisted pages.
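These concepts map naturally onto a small data model. The sketch below is illustrative only: the class and field names (Reference, ScopeItem, ProjectAnalysis, category) are hypothetical and are not taken from the repository.

```python
from dataclasses import dataclass, field
from enum import Enum


class Recommendation(str, Enum):
    """The three decision outcomes described above."""
    BID = "BID"        # clear, actionable scope
    REVIEW = "REVIEW"  # potentially viable, needs manual clarification
    PASS = "PASS"      # poor fit or low-confidence scope


@dataclass(frozen=True)
class Reference:
    """Evidence pointer (hypothetical shape): file, page, supporting text."""
    file_name: str
    page: int  # 1-based page number
    excerpt: str


@dataclass
class ScopeItem:
    """One extracted flooring/epoxy element plus its evidence."""
    description: str
    category: str  # hypothetical values: "inclusion", "exclusion", "material", "prep"
    references: list[Reference] = field(default_factory=list)

    def is_traceable(self) -> bool:
        # An item without a reference cannot be verified by a reviewer.
        return len(self.references) > 0


@dataclass
class ProjectAnalysis:
    project_name: str
    scope_items: list[ScopeItem]
    risk_flags: list[str]
    recommendation: Recommendation
```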

4. Functional Walkthrough (End-to-End)

  1. Authentication: Admin signs in to access protected workflows.
  2. Project creation: A new bid project is created with context data.
  3. PDF upload: Relevant bid documents are attached to the project.
  4. AI extraction: System runs two-pass analysis and saves run status/results.
  5. Review output: User inspects extracted scope, risk flags, recommendation, and references.
  6. Decision support: User uses evidence-backed output to decide bid posture.
  7. Report generation: A structured PDF summary is generated and downloaded.
  8. Project library reuse: Past projects are searched/filtered for comparison and historical recall.

5. Feature-by-Feature Design Methodology

Project management

  • Why: bidding requires stable project context, not loose files.
  • Approach: model each bid as a durable entity with standardized fields.
  • Principle: metadata-first organization improves retrieval and reporting.

PDF ingestion

  • Why: source truth is document-based.
  • Approach: enforce file type/count/size constraints and project association.
  • Principle: controlled intake prevents invalid analysis and runtime drift.
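A minimal intake check, assuming uploads arrive as filename/bytes pairs. The 1–10 file count comes from the MVP scope above; the 25 MB per-file cap and the names Upload and validate_uploads are hypothetical.

```python
from dataclasses import dataclass

MIN_FILES = 1
MAX_FILES = 10                     # the 1-10 PDF limit from the MVP scope
MAX_FILE_BYTES = 25 * 1024 * 1024  # hypothetical per-file cap (assumption)
PDF_MAGIC = b"%PDF-"               # standard PDF files begin with this header


@dataclass
class Upload:
    filename: str
    data: bytes


def validate_uploads(uploads: list[Upload]) -> list[str]:
    """Return human-readable errors; an empty list means the batch is accepted."""
    errors: list[str] = []
    if not MIN_FILES <= len(uploads) <= MAX_FILES:
        errors.append(f"expected {MIN_FILES}-{MAX_FILES} files, got {len(uploads)}")
    for up in uploads:
        if not up.filename.lower().endswith(".pdf"):
            errors.append(f"{up.filename}: extension is not .pdf")
        # Checking the header catches non-PDF files that were merely renamed.
        if not up.data.startswith(PDF_MAGIC):
            errors.append(f"{up.filename}: missing %PDF- header")
        if len(up.data) > MAX_FILE_BYTES:
            errors.append(f"{up.filename}: exceeds {MAX_FILE_BYTES} bytes")
    return errors
```

Returning all errors at once, rather than failing on the first, lets the UI show every problem with a batch in a single round trip.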

Two-pass AI extraction

  • Why: reduce noise and computational cost while improving relevance.
  • Approach: identify likely pages first, then extract structured details.
  • Principle: scoped context yields more reliable outputs than whole-document prompting.
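The two-pass idea can be sketched with a cheap keyword score for pass 1 and a pluggable extractor for pass 2. The keyword list, thresholds, and function names are assumptions; the real pass 1 may use a model rather than keywords, and `extract` stands in for whatever structured AI call pass 2 makes.

```python
from typing import Callable

# Hypothetical pass-1 vocabulary; a real list would be tuned on actual bid documents.
SCOPE_TERMS = ("epoxy", "flooring", "resilient", "vct", "lvt", "sealed concrete",
               "moisture", "substrate", "coating", "division 09")


def score_page(text: str) -> int:
    """Pass 1: count scope-related term occurrences on a page (no model call)."""
    lowered = text.lower()
    return sum(lowered.count(term) for term in SCOPE_TERMS)


def shortlist_pages(pages: list[str], min_score: int = 1, max_pages: int = 20) -> list[int]:
    """Return 0-based indices of the highest-scoring pages, in document order."""
    scored = [(score_page(text), i) for i, text in enumerate(pages)]
    relevant = [(s, i) for s, i in scored if s >= min_score]
    # Keep the top pages by score, then restore document order for pass 2.
    top = sorted(relevant, key=lambda pair: (-pair[0], pair[1]))[:max_pages]
    return sorted(i for _, i in top)


def two_pass_extract(pages: list[str], extract: Callable[[dict[int, str]], dict]) -> dict:
    """Pass 2: send only shortlisted pages (keyed by 1-based page number) to the extractor."""
    selected = {i + 1: pages[i] for i in shortlist_pages(pages)}
    if not selected:
        return {"scope_items": [], "risk_flags": ["no flooring/epoxy pages found"]}
    return extract(selected)
```

Keying the selected pages by page number keeps page identity intact, which pass 2 needs in order to emit file/page references.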

Reference capture

  • Why: business users must trust and verify outputs quickly.
  • Approach: attach file/page/excerpt evidence to extracted elements.
  • Principle: traceability is mandatory for operational trust.
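One way to capture file/page/excerpt evidence is to locate a phrase in per-page text and keep a window of surrounding characters. This sketch assumes text has already been extracted page by page; `find_references` and the `context` width are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    file_name: str
    page: int  # 1-based
    excerpt: str


def find_references(file_name: str, pages: list[str], phrase: str,
                    context: int = 40) -> list[Reference]:
    """Return one Reference per case-insensitive occurrence of `phrase`."""
    refs: list[Reference] = []
    needle = phrase.lower()
    for page_no, text in enumerate(pages, start=1):
        # Assumes lower() preserves string length (true for ASCII text).
        lowered = text.lower()
        start = lowered.find(needle)
        while start != -1:
            lo = max(0, start - context)
            hi = min(len(text), start + len(needle) + context)
            # Collapse whitespace so excerpts read cleanly in the UI and report.
            excerpt = " ".join(text[lo:hi].split())
            refs.append(Reference(file_name, page_no, excerpt))
            start = lowered.find(needle, start + len(needle))
    return refs
```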

PDF report generation

  • Why: decisions and handoffs need a shareable artifact.
  • Approach: deterministic summary layout with scope, risk, recommendation, references.
  • Principle: consistency improves communication quality.
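A deterministic layout mostly comes down to a fixed section order and stable sorting, so the same analysis always produces the same document. The sketch below builds the report body as plain text lines; the section titles and `build_report_sections` are hypothetical, and rendering the lines to PDF with whatever library the project uses is left out.

```python
from typing import Iterable

# Fixed section order so identical inputs always render identically.
SECTION_ORDER = ("Recommendation", "Scope", "Risk Flags", "References")


def build_report_sections(recommendation: str, scope: Iterable[str],
                          risk_flags: Iterable[str],
                          references: Iterable[tuple[str, int, str]]) -> list[str]:
    """Produce the report body as ordered text lines for a PDF renderer."""
    body = {
        "Recommendation": [recommendation],
        "Scope": [f"- {s}" for s in scope] or ["- (none extracted)"],
        "Risk Flags": [f"- {r}" for r in risk_flags] or ["- (none)"],
        # Sort references so the output does not depend on extraction order.
        "References": [f'- {f}, p. {p}: "{e}"' for f, p, e in sorted(references)]
        or ["- (none)"],
    }
    lines: list[str] = []
    for title in SECTION_ORDER:
        lines.append(title.upper())
        lines.extend(body[title])
        lines.append("")
    return lines
```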

Project library and search

  • Why: historical bids are strategic assets.
  • Approach: keyword search and domain filters (status/type/source/date).
  • Principle: retrieval speed enables learning across projects.
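The keyword-plus-filter semantics can be sketched in memory; in the real backend they would more likely become a SQL WHERE clause against Postgres. `ProjectRow`, its fields, and the status values are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ProjectRow:
    name: str
    status: str        # hypothetical values, e.g. "BID", "REVIEW", "PASS"
    project_type: str
    source: str
    bid_date: date
    notes: str = ""


def search_projects(
    rows: list[ProjectRow],
    keyword: Optional[str] = None,
    status: Optional[str] = None,
    project_type: Optional[str] = None,
    source: Optional[str] = None,
    date_from: Optional[date] = None,
    date_to: Optional[date] = None,
) -> list[ProjectRow]:
    """Keyword search over name/notes plus exact-match filters; None means 'no filter'."""
    kw = keyword.lower() if keyword else None
    out: list[ProjectRow] = []
    for r in rows:
        if kw and kw not in f"{r.name} {r.notes}".lower():
            continue
        if status and r.status != status:
            continue
        if project_type and r.project_type != project_type:
            continue
        if source and r.source != source:
            continue
        if date_from and r.bid_date < date_from:
            continue
        if date_to and r.bid_date > date_to:
            continue
        out.append(r)
    # Most recent bids first.
    return sorted(out, key=lambda r: r.bid_date, reverse=True)
```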

Admin and usage tracking

  • Why: internal owner needs operational control and visibility.
  • Approach: restrict destructive actions to the authenticated admin and keep minimal usage counters.
  • Principle: lightweight governance without enterprise overhead.

6. Technical Architecture (High-Level)

The system uses a web frontend + API backend + Postgres persistence.

  • Frontend (React): handles user workflows, forms, filtering, and output presentation.
  • Backend (FastAPI): owns auth, business rules, extraction orchestration, report generation, and access control.
  • Database (Postgres): stores projects, files, extraction runs, reports, and counters.

The persistence strategy is intentional: uploaded PDFs and generated reports are stored in the database so they survive container restarts and the ephemeral filesystems typical of free-tier hosting.
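A minimal sketch of storing PDF bytes in the database, using Python's built-in sqlite3 as a stand-in for Postgres so it runs without a server. The table name project_files, its columns, and the checksum column are hypothetical; in Postgres the content column would typically be BYTEA.

```python
import hashlib
import sqlite3

# Hypothetical schema; sqlite3 stands in for Postgres (BLOB ~ BYTEA).
SCHEMA = """
CREATE TABLE IF NOT EXISTS project_files (
    id INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL,
    filename TEXT NOT NULL,
    sha256 TEXT NOT NULL,
    content BLOB NOT NULL
)
"""


def save_pdf(conn: sqlite3.Connection, project_id: int, filename: str, data: bytes) -> int:
    """Store raw PDF bytes in the database instead of on the container's disk."""
    digest = hashlib.sha256(data).hexdigest()
    cur = conn.execute(
        "INSERT INTO project_files (project_id, filename, sha256, content) VALUES (?, ?, ?, ?)",
        (project_id, filename, digest, data),
    )
    conn.commit()
    return cur.lastrowid


def load_pdf(conn: sqlite3.Connection, file_id: int) -> bytes:
    row = conn.execute(
        "SELECT content, sha256 FROM project_files WHERE id = ?", (file_id,)
    ).fetchone()
    if row is None:
        raise KeyError(file_id)
    content, digest = row
    # Verify integrity on read; a mismatch would indicate corruption.
    if hashlib.sha256(content).hexdigest() != digest:
        raise ValueError("checksum mismatch")
    return bytes(content)
```

The trade-off is a larger database and larger backups in exchange for not depending on a persistent disk.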

Containerization is used for reproducible environments:

  • Local: Docker Compose runs API + Postgres.
  • Production (Render): Docker image + managed Postgres, without Compose.

7. AI & Automation Philosophy

AI is used as an accelerator for document interpretation, not as an autonomous decision-maker. The system automates extraction and structuring, then presents evidence for human judgment.

Reliability and trust are handled by:

  • Structured output contracts (schema-driven payloads).
  • Explicit run statuses (PENDING, RUNNING, SUCCESS, FAILED).
  • Evidence references on extracted items.
  • Clear distinction between AI suggestion and business decision.
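The run-status lifecycle can be sketched as a small state machine. The transition table is an assumption (for example, that SUCCESS and FAILED are terminal and that a PENDING run may fail before starting), and the class and function names are hypothetical.

```python
from enum import Enum
from typing import Callable, Optional


class RunStatus(str, Enum):
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    SUCCESS = "SUCCESS"
    FAILED = "FAILED"


# Assumed transition table: SUCCESS and FAILED are terminal.
TRANSITIONS = {
    RunStatus.PENDING: {RunStatus.RUNNING, RunStatus.FAILED},
    RunStatus.RUNNING: {RunStatus.SUCCESS, RunStatus.FAILED},
    RunStatus.SUCCESS: set(),
    RunStatus.FAILED: set(),
}


class ExtractionRun:
    """Hypothetical in-memory stand-in for an extraction-run record."""

    def __init__(self) -> None:
        self.status = RunStatus.PENDING
        self.error: Optional[str] = None

    def advance(self, new: RunStatus, error: Optional[str] = None) -> None:
        if new not in TRANSITIONS[self.status]:
            raise ValueError(f"illegal transition {self.status.value} -> {new.value}")
        self.status = new
        # Keep an error message only for failed runs.
        self.error = error if new is RunStatus.FAILED else None


def run_extraction(run: ExtractionRun, work: Callable[[], None]) -> None:
    """Ensure a run always ends in SUCCESS or FAILED, never stuck in RUNNING."""
    run.advance(RunStatus.RUNNING)
    try:
        work()
    except Exception as exc:  # record the failure on the run instead of propagating it
        run.advance(RunStatus.FAILED, error=str(exc))
    else:
        run.advance(RunStatus.SUCCESS)
```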

References and structured JSON matter because they make outputs auditable, debuggable, and reusable across reporting and search.
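A structured output contract can be enforced by validating the model's JSON before it is saved. The key names (scope_items, risk_flags, recommendation, and references with file/page/excerpt) are assumptions modeled on the concepts in this README, not the project's actual schema.

```python
import json

# Hypothetical contract for the pass-2 payload.
REQUIRED_KEYS = {"scope_items", "risk_flags", "recommendation"}
ALLOWED_RECOMMENDATIONS = {"BID", "REVIEW", "PASS"}
REFERENCE_KEYS = {"file", "page", "excerpt"}


def parse_extraction(raw: str) -> dict:
    """Parse model output and reject anything that breaks the contract."""
    payload = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(payload, dict):
        raise ValueError("payload must be a JSON object")
    missing = REQUIRED_KEYS - set(payload)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if payload["recommendation"] not in ALLOWED_RECOMMENDATIONS:
        raise ValueError(f"bad recommendation: {payload['recommendation']!r}")
    for item in payload["scope_items"]:
        if not isinstance(item, dict):
            raise ValueError("scope item must be an object")
        # Every scope item must cite evidence so it can be audited.
        refs = item.get("references") or []
        if not refs:
            raise ValueError(f"scope item without references: {item.get('description')!r}")
        for ref in refs:
            if not isinstance(ref, dict) or not REFERENCE_KEYS <= set(ref):
                raise ValueError("reference must have file, page, and excerpt")
    return payload
```

Rejecting a malformed payload outright (and marking the run FAILED) is usually safer than saving partial output that a reviewer might mistake for a complete analysis.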

8. Limitations & Future Evolution

Current MVP limitations

  • Single-admin model only.
  • Synchronous extraction execution.
  • Basic usage metrics (not a full observability suite).
  • No external bidding platform integrations.

Likely next evolutions

  • Multi-user roles and permissions.
  • Background job queue for heavy extraction workloads.
  • Richer analytics and operational dashboards.
  • Enhanced model strategy and extraction confidence scoring.
  • Broader document intelligence workflows beyond flooring/epoxy.

The architecture is designed to evolve incrementally: clear service boundaries, explicit schemas, and migration-backed data contracts reduce rewrite risk.

9. Getting Started (Brief)

Run locally with Docker

cd ai-flooring-pdf-analyzer
docker compose up --build -d

Open:

  • http://localhost:8000

Default admin credentials:

  • Username: admin
  • Password: admin123

Optional quick validation:

python scripts/create_sample_pdfs.py
bash scripts/smoke.sh

Stop services:

docker compose down

Additional Notes

  • Local DB host port is 55432 (mapped to container 5432) to avoid conflicts.
  • Render deployment configuration is provided in render/render.yaml.
  • Environment variable baseline is documented in .env.example.

About

Internal web app for flooring contractors to manage projects, upload construction plan/spec PDFs, extract flooring and epoxy scopes using AI, and generate clean summary PDFs. Includes project storage, search, and review of past jobs.
