This project is a multimodal, voice-enabled assistant designed for the civil engineering domain. It uses Azure AI services and a Retrieval-Augmented Generation (RAG) pipeline, all integrated into a simple web interface.
- Azure Speech Service for voice-to-text: Ask questions via voice input, with Azure Speech Service handling speech-to-text conversion for a seamless experience.
- Multimodal Input Support: Upload and analyze multiple PDFs and images (e.g., structural drawings, charts). The system extracts relevant information using Azure Form Recognizer and Computer Vision.
- A custom RAG pipeline for domain-specific contextual answers: Retrieve contextual answers using a custom Retrieval-Augmented Generation (RAG) pipeline built on top of Azure OpenAI (GPT-4o mini).
This assistant is designed as a proof-of-concept to explore how generative AI can support various tasks in civil and structural engineering. While not fully tested for production use, it shows potential in aiding the following areas:
- Assists in creating reports and documentation
- Helps explain design concepts, constraints, and material properties
- Offers basic natural language interaction to discuss simulation inputs and interpret summary-level results.
- Suggests materials based on application context
- Retrieves material property data from uploaded docs or external sources
- Can assist in early-stage brainstorming for retrofitting strategies and simulate conversational walkthroughs for damage scenarios.
- Demonstrates how sensor readings could be interpreted conversationally and how trends or anomalies might be discussed for proactive maintenance.
Acts as the main conversational model, handling natural language understanding, generation, and document-based QA using RAG.
Documents used in the Retrieval-Augmented Generation (RAG) component include:
- PDF textbooks and manuals on structural engineering
- Seismic design standards
- Research papers and datasets on material properties
- Uploaded reports and technical documentation
Indexes uploaded documents (PDFs, images) and retrieves relevant text snippets using vector search during user queries
Extracts text from uploaded images using OCR and document layout understanding (used in RAG pipeline).
Parses complex documents like PDFs, design specs, and scanned engineering drawings to make them queryable.
Converts user speech input into text for voice-enabled chat, and optionally converts responses back to audio for accessibility.
- Rane, N., Choudhary, S., & Rane, J. (2024). Transforming the Civil Engineering Sector with Generative Artificial Intelligence, such as ChatGPT or Bard. SSRN. https://dx.doi.org/10.2139/ssrn.4681718
- Aluga, M. (2023). Application of CHATGPT in civil engineering. East African Journal of Engineering, 6(1), 104-112. https://doi.org/10.37284/eaje.6.1.1272


