Indian Address Parser

📍 Overview

The Indian Address Parser is a Natural Language Processing (NLP) tool that extracts structured address information from unstructured text and complex PDF documents. It combines spaCy, regex-based pattern matching, and custom entity recognition to identify Indian addresses and break them into components.

🚀 Features

  • 📄 Extracts addresses from PDF files and raw text
  • 🔍 Uses NLP & Named Entity Recognition (NER) for accurate parsing
  • 🗺️ Identifies cities, states, PIN codes, and localities
  • ⚡ Optimized for large-scale documents
  • 📥 Download extracted addresses in a structured format
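As an illustration of the regex side of this pipeline, here is a minimal sketch of PIN code detection. It is not taken from the project's source: the names `PIN_RE` and `find_pin_codes` are hypothetical. It relies only on the fact that Indian PIN codes are six digits, never start with 0, and are sometimes written with a space after the third digit.

```python
import re

# Six digits, first digit 1-9, optional single whitespace after the third digit.
# The lookarounds stop the pattern from matching inside longer digit runs
# such as phone numbers.
PIN_RE = re.compile(r"(?<!\d)([1-9]\d{2})\s?(\d{3})(?!\d)")


def find_pin_codes(text):
    """Return every PIN code found in text, normalized to six digits."""
    return [first + last for first, last in PIN_RE.findall(text)]


sample = "Flat 12, MG Road, Pune, Maharashtra 411001. Office: Fort, Mumbai 400 001."
print(find_pin_codes(sample))
```

A pattern like this only finds candidates. Deciding which surrounding words belong to the same address is where the NER step would come in.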

🔧 Installation

To use this project locally, follow these steps:

  1. Clone this repository:

    git clone https://github.com/Adityagupta-dev/Indian-Address-Parser.git
    cd Indian-Address-Parser
  2. Create a virtual environment (optional but recommended):

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:

    pip install -r requirements.txt
  4. Run the Streamlit app:

    streamlit run app.py

📂 Usage

1️⃣ Upload a PDF File

  • Click Upload a PDF to extract addresses automatically.
  • The extracted addresses will be displayed along with confidence scores and structured components.

2️⃣ Enter Text Manually

  • Paste text containing addresses in the text box.
  • The extracted addresses will be displayed along with confidence scores and structured components.
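The README does not say how confidence scores are computed. One plausible approach, shown here purely as an assumption, is to weight each structured component and sum the weights of those that were found. The component names, the weights, and the `confidence` function are all hypothetical.

```python
# Hypothetical weights: a PIN code is treated as the strongest signal
# that a span of text is an address. The weights sum to 1.0.
WEIGHTS = {"pin_code": 0.40, "city": 0.25, "state": 0.20, "locality": 0.15}


def confidence(components):
    """Score an address from 0.0 to 1.0 by the components that are present."""
    score = sum(weight for key, weight in WEIGHTS.items() if components.get(key))
    return round(score, 2)


partial = {"pin_code": "411001", "city": "Pune"}
print(confidence(partial))
```

Missing or empty components contribute nothing, so an address with only a city and a PIN code scores lower than one with all four fields.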

3️⃣ Download Extracted Addresses

  • The extracted addresses can be downloaded as a structured text file.
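The exact layout of the downloaded file is not documented here. The sketch below shows one way such a structured text export could be built, assuming each address is a dictionary with the hypothetical keys `locality`, `city`, `state`, and `pin_code`.

```python
FIELDS = ("locality", "city", "state", "pin_code")  # assumed component names


def format_addresses(addresses):
    """Render a list of address dicts as numbered plain-text blocks."""
    blocks = []
    for index, address in enumerate(addresses, start=1):
        lines = [f"Address {index}"]
        for field in FIELDS:
            if address.get(field):  # skip components that were not found
                lines.append(f"  {field}: {address[field]}")
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


report = format_addresses([
    {"locality": "MG Road", "city": "Pune", "state": "Maharashtra", "pin_code": "411001"},
    {"city": "Mumbai", "pin_code": "400001"},
])
print(report)
```

In a Streamlit app, a string like `report` could be offered to the user through `st.download_button`, which accepts the data and a file name.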

🏗️ Work in Progress

🚧 Version 2 is coming soon! 🚧

  • Improved address extraction accuracy
  • Support for additional document formats
  • More robust NLP models
  • Customization options for user-specific needs

🤝 Contributing

Contributions are welcome! If you find any issues or have suggestions, feel free to open an issue or submit a pull request.

📞 Contact

For any queries, feel free to connect with me on LinkedIn.

📜 License

This project is licensed under the MIT License. You are free to use, modify, and distribute it, provided the copyright and license notice is retained. See the LICENSE file for more details.

If you find this project useful, don't forget to star the repo!
