SwissDataScienceCenter/mobiko_nlp

Mobiko NLP

A biodiversity information extraction pipeline using NLP techniques.

Overview

This project provides tools for extracting and classifying biodiversity-related entities from text documents using:

  • BERT-based Named Entity Recognition (NER) (work in progress)
  • LLM-based extraction and classification of biodiversity entities with structured schemas (demo version)
  • spaCy for text processing and noun phrase extraction

Installation

Docker Compose Usage

# Start the development container
docker-compose up -d

# Run commands inside the container
docker-compose exec biodiv python src/ner/bert_ner_baseline.py --in_dir data --out_jsonl output/ner_results.jsonl
docker-compose exec biodiv python src/demo/demo.py --in_dir data --out_jsonl output/demo_results.jsonl

# Run one-off tasks without starting the persistent container
docker-compose run --rm biodiv python src/ner/bert_ner_baseline.py --help

# Stop the container
docker-compose down
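
Both commands above write their results to a JSONL file (one JSON object per line). The README does not document the record schema, so the following is only a minimal sketch for inspecting the output without assuming any field names. The helper name read_jsonl is hypothetical and not part of this repository.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any, Iterator


def read_jsonl(path: str | Path) -> Iterator[Any]:
    """Yield one parsed JSON value per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                # Report the file and line so a truncated run is easy to locate.
                raise ValueError(
                    f"{path}:{line_number}: invalid JSON ({exc.msg})"
                ) from exc


if __name__ == "__main__":
    # Path taken from the --out_jsonl argument used in the commands above.
    results_path = Path("output/ner_results.jsonl")
    if results_path.exists():
        for record in read_jsonl(results_path):
            # Print only the top-level keys, since the schema is not documented.
            print(sorted(record) if isinstance(record, dict) else record)
            break
```

Printing the keys of the first record is a quick way to discover what each script actually emits before writing any downstream processing.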

Local Installation

For a local installation, install the CPU build of PyTorch first, then the remaining requirements (see the Dockerfile for the exact dependencies):

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
pip install -r requirements.txt

Download spaCy model:

python -m spacy download en_core_web_trf
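
spaCy models are installed as regular Python packages, so one way to confirm the download succeeded is to check whether the package can be found. This is a small sketch using only the standard library. The function name is_model_installed is hypothetical and not part of this repository.

```python
from importlib.util import find_spec

# Model name taken from the download command above.
MODEL_NAME = "en_core_web_trf"


def is_model_installed(name: str = MODEL_NAME) -> bool:
    """Return True if a top-level package with this name is importable."""
    return find_spec(name) is not None


if __name__ == "__main__":
    if is_model_installed():
        print(f"{MODEL_NAME} is installed")
    else:
        print(f"{MODEL_NAME} is missing; run: python -m spacy download {MODEL_NAME}")
```

The check only confirms that the package is present in the active environment; it does not load the model or verify that its version matches the installed spaCy.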

For OpenAI integration, set your API key:

export OPENAI_API_KEY="your-api-key-here"
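
A missing key usually only surfaces once the first API request fails, so it can help to check for it up front. The sketch below is an assumption about how such a check might look, not code from this repository. The function name get_openai_api_key is hypothetical, and the README does not say how the project's scripts read the variable.

```python
from __future__ import annotations

import os
from typing import Mapping


def get_openai_api_key(env: Mapping[str, str] | None = None) -> str:
    """Return the OpenAI API key from the environment, or raise if it is unset."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running the LLM-based extraction."
        )
    return key


if __name__ == "__main__":
    try:
        get_openai_api_key()
        print("OPENAI_API_KEY found")
    except RuntimeError as exc:
        print(exc)
```

When using Docker Compose, note that a variable exported in the host shell is only visible inside the container if the Compose configuration passes it through.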
