Mobiko NLP

A biodiversity information extraction pipeline using NLP techniques.

Overview

This project provides tools for extracting and classifying biodiversity-related entities from text documents using:

BERT-based Named Entity Recognition (NER), currently a work in progress.
LLM-based extraction for biodiversity entity classification with structured schemas (demo version).
spaCy for text processing and noun phrase extraction.

Installation

Docker Compose Usage

# Start the development container
docker-compose up -d

# Run commands inside the container
docker-compose exec biodiv python src/ner/bert_ner_baseline.py --in_dir data --out_jsonl output/ner_results.jsonl
docker-compose exec biodiv python src/demo/demo.py --in_dir data --out_jsonl output/demo_results.jsonl

# Run one-off tasks without starting the persistent container
docker-compose run --rm biodiv python src/ner/bert_ner_baseline.py --help

# Stop the container
docker-compose down
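
Both commands above write their results to a JSON Lines file via --out_jsonl. The record schema is not documented here, so the following is only a minimal sketch for inspecting such a file without assuming any field names. The helper summarize_jsonl is hypothetical and not part of this repository; it counts the records and tallies whichever top-level keys appear.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_jsonl(path):
    """Return (record_count, Counter of top-level keys) for a JSON Lines file.

    Hypothetical helper for illustration; it makes no assumption about
    which fields the pipeline actually writes.
    """
    count = 0
    keys = Counter()
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            record = json.loads(line)  # one JSON value per line
            count += 1
            if isinstance(record, dict):
                keys.update(record.keys())
    return count, keys
```

Calling summarize_jsonl("output/ner_results.jsonl") after a run should give a quick view of how many records were produced and which fields they carry.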

Local Installation

For a local installation, refer to the Dockerfile for the exact dependencies:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
pip install -r requirements.txt

Download the spaCy model:

python -m spacy download en_core_web_trf

For OpenAI integration, set your API key:

export OPENAI_API_KEY="your-api-key-here"
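
A missing key typically surfaces only when the first API request fails, so it can help to check for it up front. The sketch below is an assumption about how such a check might look; the function get_openai_api_key is hypothetical and is not taken from this repository's code.

```python
import os


def get_openai_api_key(env=None):
    """Return the OPENAI_API_KEY value, or raise with a clear message.

    Hypothetical helper for illustration. `env` defaults to os.environ and
    can be replaced by a plain dict when testing.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running "
            "the LLM-based extraction."
        )
    return key
```

Calling get_openai_api_key() at the start of a script would fail fast with a readable error instead of partway through processing the input directory.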

