The Indian Address Parser is a Natural Language Processing (NLP) tool that extracts structured address information from unstructured text and PDF documents. It combines spaCy, regex-based pattern matching, and custom entity recognition to identify and extract addresses.
- 📄 Extracts addresses from PDF files and raw text
- 🔍 Uses NLP & Named Entity Recognition (NER) for accurate parsing
- 🗺️ Identifies cities, states, PIN codes, and localities
- ⚡ Optimized for large-scale documents
- 📥 Download extracted addresses in a structured format
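To illustrate the regex-based side of the features above, here is a minimal sketch of PIN code detection. It is not taken from the project's source: `PIN_RE` and `find_pin_codes` are hypothetical names, and the pattern only encodes the general rule that an Indian PIN code has six digits and does not start with 0.

```python
import re

# Six digits, first digit 1-9, with an optional space after the third
# digit (e.g. "560 001"). The lookarounds reject longer digit runs such
# as phone numbers. This pattern is an assumption, not the project's own.
PIN_RE = re.compile(r"(?<!\d)([1-9]\d{2})\s?(\d{3})(?!\d)")


def find_pin_codes(text: str) -> list[str]:
    """Return normalized six-digit PIN codes found in text (hypothetical helper)."""
    return [first + second for first, second in PIN_RE.findall(text)]


sample = "Flat 12, MG Road, Bengaluru, Karnataka 560 001. Phone: 9876543210"
print(find_pin_codes(sample))  # ['560001']
```

A pattern like this only proposes candidates. Deciding whether a six-digit number really belongs to an address is where NER and the surrounding context would come in.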
To use this project locally, follow these steps:
- Clone this repository:
    git clone https://github.com/Adityagupta-dev/Indian-Address-Parser.git
    cd Indian-Address-Parser
- Create a virtual environment (optional but recommended):
    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install dependencies:
    pip install -r requirements.txt
- Run the Streamlit app:
    streamlit run app.py
- Click Upload a PDF to extract addresses automatically, or paste text containing addresses in the text box.
- In both cases, the extracted addresses are displayed along with confidence scores and structured components.
- The extracted addresses can be downloaded as a structured text file.
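As a rough picture of what "structured components" and a downloadable structured file could look like, the sketch below models one extracted address as a dataclass and serializes a list of them to JSON text. The field names, the `ParsedAddress` class, the `to_download_text` function, and the choice of JSON are all assumptions made for illustration. The app's actual output format may differ.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ParsedAddress:
    """One extracted address (hypothetical schema, not the project's own)."""
    raw: str
    locality: Optional[str]
    city: Optional[str]
    state: Optional[str]
    pin_code: Optional[str]
    confidence: float  # assumed to be a score between 0.0 and 1.0


def to_download_text(addresses: list[ParsedAddress]) -> str:
    """Serialize addresses to pretty-printed JSON suitable for a text download."""
    return json.dumps([asdict(a) for a in addresses], indent=2, ensure_ascii=False)


example = ParsedAddress(
    raw="Flat 12, MG Road, Bengaluru, Karnataka 560001",
    locality="MG Road",
    city="Bengaluru",
    state="Karnataka",
    pin_code="560001",
    confidence=0.92,  # illustrative value, not a measured result
)
print(to_download_text([example]))
```

In a Streamlit app, a string like this could be passed to `st.download_button` as the `data` argument.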
🚧 Version 2 is coming soon! 🚧
- Improved address extraction accuracy
- Support for additional document formats
- More robust NLP models
- Customization options for user-specific needs
Contributions are welcome! If you find any issues or have suggestions, feel free to open an issue or submit a pull request.
For any queries, feel free to connect with me on LinkedIn.
This project is licensed under the MIT License. You are free to use, modify, and distribute it, provided the copyright and license notice is retained in all copies. See the LICENSE file for more details.
⭐ If you find this project useful, don't forget to star the repo! ⭐