Commits (34)
e763ce0
build(requirements.txt): add initial requirements.txt
YazanShannak Jul 30, 2023
5a50078
feat(Nuha): implement Nuha model inference
YazanShannak Jul 30, 2023
93ed42f
feat(main.py): integrate Nuha in the predict endpoint
YazanShannak Jul 30, 2023
bc5fd2a
fix(main.py): login to huggingface hub
YazanShannak Jul 30, 2023
63791cd
build(Dockerfile): write initial Dockerfile
YazanShannak Jul 30, 2023
c091c72
docs(README.md): Update README.md
YazanShannak Jul 30, 2023
32ce490
feat: added original post and comments to the response
mbaraa Aug 7, 2023
1f0de5d
chore: removed unnecessary async from the request handler
mbaraa Aug 7, 2023
353327f
Merge pull request #2 from jordanopensource/feat/map-post-and-comment…
thamudi Aug 7, 2023
f1a1197
perf(requirements.txt): remove CUDA dependencies
YazanShannak Aug 9, 2023
490b839
refactor(src/model.py): remove unnecessary print
YazanShannak Aug 9, 2023
49402fe
build(Dockerfile): Fix some issues in the Dockerfile
YazanShannak Aug 9, 2023
3d72b66
refactor(main.py-src/model.py): Refactor model output and response to…
YazanShannak Aug 9, 2023
957a882
Merge pull request #3 from jordanopensource/fix/reduce-deps
thamudi Aug 9, 2023
8ee9353
Merge pull request #4 from jordanopensource/fix/cleanup-print
thamudi Aug 9, 2023
66ef60c
Merge pull request #5 from jordanopensource/fix/Dockerfile
thamudi Aug 9, 2023
59eebd4
Merge branch 'development' into refactor/model-response
thamudi Aug 9, 2023
1968f9b
Merge pull request #6 from jordanopensource/refactor/model-response
thamudi Aug 9, 2023
5709268
build(requirements.txt): Add extra index for torch-cpu
YazanShannak Aug 9, 2023
ef6cbeb
refactor: optimize docker image from 3G to 1.4G
thamudi Aug 13, 2023
778a977
build: add drone file
thamudi Aug 13, 2023
19d67b1
feat: add healthcheck endpoint
thamudi Aug 13, 2023
2f0776c
update critical dependencies
mbaraa Nov 30, 2023
a62d9e2
update less critical dependencies
mbaraa Nov 30, 2023
4795379
Add Multiclass api capabilities
YazanShannak Jan 21, 2024
781af62
Merge pull request #9 from jordanopensource/feature/multi-class
thamudi Jan 21, 2024
5e872e8
ci(.drone.yml): use the container jsonnet template
itsmohmans Sep 2, 2024
0947688
Merge pull request #12 from jordanopensource/ci/update-drone-template
itsmohmans Sep 3, 2024
e251f65
Add GitHub Actions workflow to schedule milestones weekly (#14)
thamudi Mar 9, 2025
fcce9c9
Merge branch 'main' into development
evilmooncake May 11, 2025
c55790d
chore: remove old drone file
thamudi Jul 31, 2025
6b01771
builds: add new wp builds file
thamudi Jul 31, 2025
6cd971e
ci: update pipeline build args
thamudi Sep 30, 2025
4a0acc8
add missing CI_PIPELINE_NUMBER
thamudi Sep 30, 2025
5 changes: 5 additions & 0 deletions .dockerignore
@@ -0,0 +1,5 @@
venv
.*
*.md
LICENSE
__pycache__
9 changes: 9 additions & 0 deletions .drone.yml
@@ -0,0 +1,9 @@
# Drone CI File!

kind: template
load: container.jsonnet
data:
repositoryName: josaorg/nuha-api
releaseName: nuha-api
buildArgs:

24 changes: 24 additions & 0 deletions .github/workflows/schedule-milestones.yaml
@@ -0,0 +1,24 @@
name: schedule-milestones

on:
schedule:
- cron: 0 0 * * SUN # Run every Sunday at midnight

jobs:
generate:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2

- name: Schedule Milestones
uses: readmeio/[email protected]
id: scheduled
with:
token: ${{ secrets.GITHUB_TOKEN }}
title: 'S-'
days: Thursday
count: 4
format: YYYY-MM-DD

- name: Created Milestones
run: echo ${{ steps.scheduled.outputs.milestones }}
Comment on lines +9 to +24

Check warning (Code scanning / CodeQL): Workflow does not contain permissions (Medium)

Actions job or workflow does not limit the permissions of the GITHUB_TOKEN. Consider setting an explicit permissions block, using the following as a minimal starting point: {contents: read}

Copilot Autofix (6 months ago)

To fix the issue, add a permissions block to the workflow. Since the workflow uses the GITHUB_TOKEN to create milestones, it requires contents: read (to read repository contents) and issues: write (to create milestones). Defining these permissions explicitly at the job level ensures the workflow has only the necessary access.

Suggested changeset 1: .github/workflows/schedule-milestones.yaml. Run the following command in your local git repository to apply this patch:

cat << 'EOF' | git apply
diff --git a/.github/workflows/schedule-milestones.yaml b/.github/workflows/schedule-milestones.yaml
--- a/.github/workflows/schedule-milestones.yaml
+++ b/.github/workflows/schedule-milestones.yaml
@@ -8,2 +8,5 @@
   generate:
+    permissions:
+      contents: read
+      issues: write
     runs-on: ubuntu-latest
EOF
21 changes: 21 additions & 0 deletions Dockerfile
@@ -0,0 +1,21 @@
FROM python:3.10.6-slim AS builder

WORKDIR /app

RUN pip install --upgrade pip
ADD requirements.txt /tmp
RUN pip install -r /tmp/requirements.txt
COPY . /app


# Run stage
FROM python:3.10.6-slim

WORKDIR /app

COPY --from=builder /usr/local/lib/python3.10/site-packages /usr/local/lib/python3.10/site-packages
COPY --from=builder /usr/local/bin/ /usr/local/bin/
COPY --from=builder /app .

ENTRYPOINT [ "uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000" ]

50 changes: 47 additions & 3 deletions README.md
@@ -63,6 +63,11 @@ To get a local copy up and running follow these simple steps.

### Prerequisites

This project depends on a trained text classification model hosted on [Hugging Face](https://huggingface.co/). You can either train your own model or use the one provided by JOSA. The model is configured through environment variables, which are passed to the application at runtime (see the example after the list):
1. HUGGINGFACE_TOKEN: The Hugging Face API token.
2. MODEL_PATH: The model path on Hugging Face.
3. MODEL_VERSION: The model version on Hugging Face.
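
For example, you can export these in your shell before running the app. This is a minimal sketch; the values below are placeholders, not real credentials or model coordinates:

```sh
# Placeholder values, substitute your own token and model coordinates
export HUGGINGFACE_TOKEN="hf_your_token_here"
export MODEL_PATH="your-org/your-model"   # model repository on Hugging Face
export MODEL_VERSION="main"               # branch, tag, or commit of that repository
```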

### Installation

1. Clone the repo
@@ -71,21 +76,60 @@ To get a local copy up and running follow these simple steps.
git clone https://github.com/jordanopensource/nuha-api.git
```

2. Create a virtual environment

```sh
python3 -m venv venv
```

3. Activate the virtual environment

```sh
source venv/bin/activate
```

4. Install the dependencies

```sh
pip install -r requirements.txt
```



### Running

#### Development

To run the project locally for development purposes:

1. Activate the virtual environment

```sh
source venv/bin/activate
```

2. Run the project

```sh
HUGGINGFACE_TOKEN="" MODEL_PATH="" MODEL_VERSION="" uvicorn main:app --reload
```

#### Production

To build and run the project locally for production purposes:

1. Build the Docker image

```sh
docker build -t nuha-api .
```

2. Run the Docker container

```sh
docker run -d -p 8000:8000 -e HUGGINGFACE_TOKEN="" -e MODEL_PATH="" -e MODEL_VERSION="" nuha-api
```



___

49 changes: 49 additions & 0 deletions main.py
@@ -0,0 +1,49 @@
"""Nuha API main module."""
import os
from fastapi import FastAPI
from fastapi.requests import Request
import huggingface_hub

from src.interface import PredictionRequest, PredictionResponse
from src.model import Nuha, PredictionResult

app = FastAPI(
title="Nuha API",
description="API to serve ML model for hate-speech classification",
)


@app.on_event("startup")
def on_startup():
"""Load model on startup."""
model_path = os.environ.get("MODEL_PATH")
model_version = os.environ.get("MODEL_VERSION")
huggingface_token = os.environ.get("HUGGINGFACE_TOKEN")

huggingface_hub.login(token=huggingface_token)
app.state.model = Nuha(model_path=model_path, model_version=model_version)

@app.get("/healthcheck")
def healthcheck(request: Request):
    """Health check endpoint."""
    return "A healthy response"

@app.post("/predict")
def predict(
request: Request, comments: list[PredictionRequest]
) -> list[PredictionResponse]:
"""Classify comments into hatespeech or not."""
model = request.app.state.model

results: list[PredictionResult]
results = model.predict([c.comment for c in comments])

return [
{
"label": result.label,
"score": result.score,
"model_version": model.model_version,
"comment": comment.comment,
"post": comment.post,
}
for result, comment in zip(results, comments)
]
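
For context, here is a sketch of how the two endpoints above can be exercised once the server is running. The port follows the uvicorn defaults used elsewhere in this PR; the response shown is illustrative, not captured output:

```sh
# Liveness probe
curl http://localhost:8000/healthcheck

# Classify a batch of comments; the "post" field is optional
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '[{"comment": "some comment text", "post": "the original post"}]'

# Illustrative response shape:
# [{"label": "not-online-violence", "score": 0.97, "model_version": "main",
#   "comment": "some comment text", "post": "the original post"}]
```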
67 changes: 67 additions & 0 deletions requirements.txt
@@ -0,0 +1,67 @@
--extra-index-url https://download.pytorch.org/whl/cpu
aiohttp==3.8.6
aiosignal==1.3.1
annotated-types==0.5.0
anyio==3.7.1
async-timeout==4.0.2
attrs==23.1.0
certifi==2023.7.22
charset-normalizer==3.2.0
click==8.1.6
coloredlogs==15.0.1
datasets==2.14.4
dill==0.3.7
evaluate==0.4.0
exceptiongroup==1.1.2
fastapi==0.101.0
filelock==3.9.0
flatbuffers==23.5.26
frozenlist==1.4.0
fsspec==2023.6.0
h11==0.14.0
httptools==0.6.0
huggingface-hub==0.16.4
humanfriendly==10.0
idna==3.4
Jinja2==3.1.2
MarkupSafe==2.1.2
mpmath==1.3.0
multidict==6.0.4
multiprocess==0.70.15
networkx==3.0
numpy==1.25.2
onnx==1.14.0
onnxruntime==1.15.1
optimum==1.11.0
packaging==23.1
pandas==2.0.3
protobuf==4.24.0
pyarrow==14.0.1
pydantic==2.1.1
pydantic_core==2.4.0
python-dateutil==2.8.2
python-dotenv==1.0.0
pytz==2023.3
PyYAML==6.0.1
regex==2023.8.8
requests==2.31.0
responses==0.18.0
safetensors==0.3.2
sentencepiece==0.1.99
six==1.16.0
sniffio==1.3.0
starlette==0.27.0
sympy==1.11.1
tokenizers==0.13.3
torch==2.0.1+cpu
tqdm==4.65.2
transformers==4.31.0
typing_extensions==4.7.1
tzdata==2023.3
urllib3==2.0.7
uvicorn==0.23.2
uvloop==0.17.0
watchfiles==0.19.0
websockets==11.0.3
xxhash==3.3.0
yarl==1.9.2
Empty file added src/__init__.py
Empty file.
21 changes: 21 additions & 0 deletions src/interface.py
@@ -0,0 +1,21 @@
"""Interface type definitions."""

from typing import Literal, Optional
from pydantic import BaseModel # pylint:disable=E0611


class PredictionRequest(BaseModel):
"""Single instance of comment to predict."""

comment: str
    post: Optional[str] = None  # original post the comment belongs to, if any


class PredictionResponse(BaseModel):
"""Single instance of comment prediction"""

label: Literal["offensive-language", "not-online-violence", "gender-based-violence"]
score: float
model_version: str
comment: str
    post: Optional[str] = None  # echoed back from the request; None when not provided
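
A quick sketch of how these models behave under pydantic v2, the version pinned in requirements.txt (the payload values here are made up):

```python
from src.interface import PredictionRequest

# "post" may be omitted from the request body; it then defaults to None
req = PredictionRequest.model_validate({"comment": "some comment text"})
print(req.comment, req.post)  # -> some comment text None
```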
54 changes: 54 additions & 0 deletions src/model.py
@@ -0,0 +1,54 @@
"""Model for Nuha."""
from dataclasses import dataclass
from typing import Literal
from optimum.onnxruntime import ORTModelForSequenceClassification
from optimum.pipelines import pipeline
from transformers import AutoTokenizer


@dataclass
class PredictionResult:
"""Model prediction result."""

label: Literal["offensive-language", "not-online-violence", "gender-based-violence"]
score: float


class Nuha:
"""Encapsulator for Nuha."""

BATCH_SIZE = 32

def __init__(self, model_path: str, model_version: str) -> None:
self.model_path = model_path
self.model_version = model_version
self.device = "cpu"
self.tokenizer = AutoTokenizer.from_pretrained(
pretrained_model_name_or_path=model_path, revision=model_version
)
self.model = ORTModelForSequenceClassification.from_pretrained(
model_id=model_path, revision=model_version
)

self.classifier = pipeline(
task="text-classification",
model=self.model,
accelerator="ort",
tokenizer=self.tokenizer,
device=self.device,
)

def predict(self, batch: list[str]) -> list[PredictionResult]:
"""Run model inference on a batch of comments.

Returns:
list[PredictionResult]: list of labels and scores for each comment
"""
        output = self.classifier(batch, batch_size=self.BATCH_SIZE)
return [
PredictionResult(
label=o["label"].lower().replace(" ", "-"), score=o["score"]
)
for o in output
]
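
A minimal usage sketch for this class. The model path and version below are hypothetical placeholders; a private model repository would additionally require huggingface_hub.login(), as main.py does on startup:

```python
from src.model import Nuha

# Hypothetical model coordinates, substitute a real Hugging Face repository
model = Nuha(model_path="your-org/your-model", model_version="main")

for result in model.predict(["first comment", "second comment"]):
    print(result.label, result.score)
```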