Merge 'development' into 'main' #15
Open: evilmooncake wants to merge 34 commits into `main` from `development`.
Changes from 29 commits
- `e763ce0` build(requirements.txt): add initial requirements.txt (YazanShannak)
- `5a50078` feat(Nuha): implement Nuha model inference (YazanShannak)
- `93ed42f` feat(main.py): integrate Nuha in the predict endpoint (YazanShannak)
- `bc5fd2a` fix(main.py): login to huggingface hub (YazanShannak)
- `63791cd` build(Dockerfile): write initial Dockerfile (YazanShannak)
- `c091c72` docs(README.md): Update README.md (YazanShannak)
- `32ce490` feat: added original post and comments to the response (mbaraa)
- `1f0de5d` chore: removed unnecessary async from the request handler (mbaraa)
- `353327f` Merge pull request #2 from jordanopensource/feat/map-post-and-comment… (thamudi)
- `f1a1197` perf(requirements.txt): remove CUDA dependencies (YazanShannak)
- `490b839` refactor(src/model.py): remove unnecessary print (YazanShannak)
- `49402fe` build(Dockerfile): Fix some issues in the Dockerfile (YazanShannak)
- `3d72b66` refactor(main.py-src/model.py): Refactor model output and response to… (YazanShannak)
- `957a882` Merge pull request #3 from jordanopensource/fix/reduce-deps (thamudi)
- `8ee9353` Merge pull request #4 from jordanopensource/fix/cleanup-print (thamudi)
- `66ef60c` Merge pull request #5 from jordanopensource/fix/Dockerfile (thamudi)
- `59eebd4` Merge branch 'development' into refactor/model-response (thamudi)
- `1968f9b` Merge pull request #6 from jordanopensource/refactor/model-response (thamudi)
- `5709268` build(requirements.txt): Add extra index for torch-cpu (YazanShannak)
- `ef6cbeb` refactor: optimize docker image from 3G to 1.4G (thamudi)
- `778a977` build: add drone file (thamudi)
- `19d67b1` feat: add healthcheck endpoint (thamudi)
- `2f0776c` update critical dependencies (mbaraa)
- `a62d9e2` update less critical dependencies (mbaraa)
- `4795379` Add Multiclass api capabilities (YazanShannak)
- `781af62` Merge pull request #9 from jordanopensource/feature/multi-class (thamudi)
- `5e872e8` ci(.drone.yml): use the container jsonnet template (itsmohmans)
- `0947688` Merge pull request #12 from jordanopensource/ci/update-drone-template (itsmohmans)
- `e251f65` Add GitHub Actions workflow to schedule milestones weekly (#14) (thamudi)
- `fcce9c9` Merge branch 'main' into development (evilmooncake)
- `c55790d` chore: remove old drone file (thamudi)
- `6b01771` builds: add new wp builds file (thamudi)
- `6cd971e` ci: update pipeline build args (thamudi)
- `4a0acc8` add missing CI_PIPELINE_NUMBER (thamudi)
New file, 5 lines (`@@ -0,0 +1,5 @@`):

```
venv
.*
*.md
LICENSE
__pycache__
```
`.drone.yml` (new file, `@@ -0,0 +1,9 @@`):

```yaml
# Drone CI File!

kind: template
load: container.jsonnet
data:
  repositoryName: josaorg/nuha-api
  releaseName: nuha-api
  buildArgs:
```
GitHub Actions workflow `schedule-milestones` (new file, `@@ -0,0 +1,24 @@`):

```yaml
name: schedule-milestones

on:
  schedule:
    - cron: 0 0 * * SUN # Run every Sunday at midnight

jobs:
  generate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2

      - name: Schedule Milestones
        uses: readmeio/[email protected]
        id: scheduled
        with:
          token: ${{ secrets.GITHUB_TOKEN }}
          title: 'S-'
          days: Thursday
          count: 4
          format: YYYY-MM-DD

      - name: Created Milestones
        run: echo ${{ steps.scheduled.outputs.milestones }}
```
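The cron expression `0 0 * * SUN` fires at minute 0 of hour 0 on Sundays, and GitHub Actions evaluates scheduled workflows in UTC, so "midnight" in the comment means 00:00 UTC. The sketch below lists the next few matching instants after a given time. `next_runs` is a hypothetical helper that hard-codes this one expression; it is not a general cron parser.

```python
from datetime import datetime, timedelta, timezone

SUNDAY = 6  # datetime.weekday(): Monday is 0, Sunday is 6


def next_runs(after: datetime, count: int = 3) -> list[datetime]:
    """Return the next `count` instants matching `0 0 * * SUN`, in UTC.

    Only this one expression is handled; it is not a general cron parser.
    """
    after = after.astimezone(timezone.utc)
    # Candidate slots are midnights UTC, starting with the current day's.
    day = after.replace(hour=0, minute=0, second=0, microsecond=0)
    runs: list[datetime] = []
    while len(runs) < count:
        # A slot fires only on Sundays and only if strictly after `after`.
        if day.weekday() == SUNDAY and day > after:
            runs.append(day)
        day += timedelta(days=1)
    return runs


if __name__ == "__main__":
    start = datetime(2024, 1, 3, 12, 0, tzinfo=timezone.utc)  # a Wednesday
    for run in next_runs(start):
        print(run.isoformat())
```

For a start of Wednesday 2024-01-03 12:00 UTC this yields midnight UTC on 7, 14 and 21 January 2024.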
`Dockerfile` (new file, `@@ -0,0 +1,21 @@`):

```dockerfile
FROM python:3.10.6-slim AS builder

WORKDIR /app

RUN pip install --upgrade pip
ADD requirements.txt /tmp
RUN pip install -r /tmp/requirements.txt
COPY . /app


# Run stage
FROM python:3.10.6-slim

WORKDIR /app

COPY --from=builder /usr/local/lib/python3.10/site-packages /usr/local/lib/python3.10/site-packages
COPY --from=builder /usr/local/bin/ /usr/local/bin/
COPY --from=builder /app .

ENTRYPOINT [ "uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000" ]
```
`main.py` (new file, `@@ -0,0 +1,49 @@`):

```python
"""Nuha API main module."""
import os
from fastapi import FastAPI
from fastapi.requests import Request
import huggingface_hub

from src.interface import PredictionRequest, PredictionResponse
from src.model import Nuha, PredictionResult

app = FastAPI(
    title="Nuha API",
    description="API to serve ML model for hate-speech classification",
)


@app.on_event("startup")
def on_startup():
    """Load model on startup."""
    model_path = os.environ.get("MODEL_PATH")
    model_version = os.environ.get("MODEL_VERSION")
    huggingface_token = os.environ.get("HUGGINGFACE_TOKEN")

    huggingface_hub.login(token=huggingface_token)
    app.state.model = Nuha(model_path=model_path, model_version=model_version)

@app.get('/healthcheck')
def healthcheck(request: Request):
    return 'A healthy response'

@app.post("/predict")
def predict(
    request: Request, comments: list[PredictionRequest]
) -> list[PredictionResponse]:
    """Classify comments into hatespeech or not."""
    model = request.app.state.model

    results: list[PredictionResult]
    results = model.predict([c.comment for c in comments])

    return [
        {
            "label": result.label,
            "score": result.score,
            "model_version": model.model_version,
            "comment": comment.comment,
            "post": comment.post,
        }
        for result, comment in zip(results, comments)
    ]
```
`requirements.txt` (new file, `@@ -0,0 +1,67 @@`):

```text
--extra-index-url https://download.pytorch.org/whl/cpu
aiohttp==3.8.6
aiosignal==1.3.1
annotated-types==0.5.0
anyio==3.7.1
async-timeout==4.0.2
attrs==23.1.0
certifi==2023.7.22
charset-normalizer==3.2.0
click==8.1.6
coloredlogs==15.0.1
datasets==2.14.4
dill==0.3.7
evaluate==0.4.0
exceptiongroup==1.1.2
fastapi==0.101.0
filelock==3.9.0
flatbuffers==23.5.26
frozenlist==1.4.0
fsspec==2023.6.0
h11==0.14.0
httptools==0.6.0
huggingface-hub==0.16.4
humanfriendly==10.0
idna==3.4
Jinja2==3.1.2
MarkupSafe==2.1.2
mpmath==1.3.0
multidict==6.0.4
multiprocess==0.70.15
networkx==3.0
numpy==1.25.2
onnx==1.14.0
onnxruntime==1.15.1
optimum==1.11.0
packaging==23.1
pandas==2.0.3
protobuf==4.24.0
pyarrow==14.0.1
pydantic==2.1.1
pydantic_core==2.4.0
python-dateutil==2.8.2
python-dotenv==1.0.0
pytz==2023.3
PyYAML==6.0.1
regex==2023.8.8
requests==2.31.0
responses==0.18.0
safetensors==0.3.2
sentencepiece==0.1.99
six==1.16.0
sniffio==1.3.0
starlette==0.27.0
sympy==1.11.1
tokenizers==0.13.3
torch==2.0.1+cpu
tqdm==4.65.2
transformers==4.31.0
typing_extensions==4.7.1
tzdata==2023.3
urllib3==2.0.7
uvicorn==0.23.2
uvloop==0.17.0
watchfiles==0.19.0
websockets==11.0.3
xxhash==3.3.0
yarl==1.9.2
```
Empty file.
`src/interface.py` (new file, `@@ -0,0 +1,21 @@`):

```python
"""Interface type definitions."""

from typing import Literal, Optional
from pydantic import BaseModel # pylint:disable=E0611


class PredictionRequest(BaseModel):
    """Single instance of comment to predict."""

    comment: str
    post: Optional[str]


class PredictionResponse(BaseModel):
    """Single instance of comment prediction"""

    label: Literal["offensive-language", "not-online-violence", "gender-based-violence"]
    score: float
    model_version: str
    comment: str
    post: str
```
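The request model declares `post: Optional[str]`, while the response model declares `post: str`, and `predict` copies `comment.post` straight into the response. Since FastAPI validates the return value against the `list[PredictionResponse]` annotation, a request that sends `"post": null` would likely fail response validation. Note also that with the pinned pydantic 2.x, `Optional[str]` without a default is still a required field (it may be null but not omitted). The sketch below mirrors both models with standard-library dataclasses, swapped in for pydantic so it runs without dependencies, and aligns `post` as `Optional[str] = None` on both sides; that default is a suggestion, not what this PR contains. Dataclasses do not enforce `Literal`, so the label check is done by hand.

```python
from dataclasses import dataclass
from typing import Literal, Optional, get_args

# Same label set as the Literal in src/interface.py.
Label = Literal["offensive-language", "not-online-violence", "gender-based-violence"]


@dataclass
class PredictionRequest:
    """Dataclass stand-in for the pydantic request model."""

    comment: str
    post: Optional[str] = None  # may be omitted or null


@dataclass
class PredictionResponse:
    """Dataclass stand-in for the pydantic response model."""

    label: Label
    score: float
    model_version: str
    comment: str
    post: Optional[str] = None  # same type as the request, so null round-trips

    def __post_init__(self) -> None:
        # Dataclasses do not enforce Literal annotations; check by hand.
        if self.label not in get_args(Label):
            raise ValueError(f"unexpected label: {self.label!r}")


if __name__ == "__main__":
    request = PredictionRequest(comment="example comment")  # no post sent
    response = PredictionResponse(
        label="not-online-violence",
        score=0.95,
        model_version="v1",  # hypothetical version string
        comment=request.comment,
        post=request.post,
    )
    print(response)
```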
`src/model.py` (new file, `@@ -0,0 +1,54 @@`):

```python
"""Model for Nuha."""
from dataclasses import dataclass
from typing import Literal
from optimum.onnxruntime import ORTModelForSequenceClassification
from optimum.pipelines import pipeline
from transformers import AutoTokenizer


@dataclass
class PredictionResult:
    """Model prediction result."""

    label: Literal["offensive-language", "not-online-violence", "gender-based-violence"]
    score: float


class Nuha:
    """Encapsulator for Nuha."""

    BATCH_SIZE = 32

    def __init__(self, model_path: str, model_version: str) -> None:
        self.model_path = model_path
        self.model_version = model_version
        self.device = "cpu"
        self.tokenizer = AutoTokenizer.from_pretrained(
            pretrained_model_name_or_path=model_path, revision=model_version
        )
        self.model = ORTModelForSequenceClassification.from_pretrained(
            model_id=model_path, revision=model_version
        )

        self.classifier = pipeline(
            task="text-classification",
            model=self.model,
            accelerator="ort",
            tokenizer=self.tokenizer,
            device=self.device,
        )

    def predict(self, batch: list[str]) -> list[PredictionResult]:
        """Run model inference on a batch of comments.

        Returns:
            list[PredictionResult]: list of labels and scores for each comment
        """
        output = self.classifier(batch, batch_size=self.BATCH_SIZE)
        print(output)
        return [
            PredictionResult(
                label=o["label"].lower().replace(" ", "-"), score=o["score"]
            )
            for o in output
        ]
```
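`Nuha.predict` turns each raw pipeline label into the `Literal` form with `.lower().replace(" ", "-")`. The raw label strings come from the model configuration on the Hugging Face Hub and are not part of this diff, so the inputs below are assumed examples; `to_results` is a hypothetical, dependency-free mirror of the list comprehension at the end of `predict`. One consequence of the transform: only spaces become hyphens, so an underscore-separated raw label would pass through unchanged and would not match the `Literal`.

```python
from dataclasses import dataclass


@dataclass
class PredictionResult:
    """Mirror of src.model.PredictionResult (label kept as a plain str)."""

    label: str
    score: float


def normalize_label(raw: str) -> str:
    """Same transform as Nuha.predict: lowercase, then spaces -> hyphens."""
    return raw.lower().replace(" ", "-")


def to_results(output: list[dict]) -> list[PredictionResult]:
    """Convert text-classification pipeline output into PredictionResults.

    Each element of `output` is expected to be a dict with "label" and
    "score" keys, one per input comment.
    """
    return [
        PredictionResult(label=normalize_label(o["label"]), score=o["score"])
        for o in output
    ]


if __name__ == "__main__":
    # Hypothetical pipeline output; real raw labels come from the model config.
    fake_output = [
        {"label": "Offensive Language", "score": 0.91},
        {"label": "Not Online Violence", "score": 0.88},
    ]
    for result in to_results(fake_output):
        print(result)
```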
Check warning: Code scanning / CodeQL

Workflow does not contain permissions (Medium)

Copilot Autofix (AI, 6 months ago):

To fix the issue, we need to add a `permissions` block to the workflow. Since the workflow uses the `GITHUB_TOKEN` to create milestones, it requires `contents: read` (to read repository contents) and `issues: write` (to create milestones). These permissions should be explicitly defined at the job level to ensure the workflow has only the necessary access. The `permissions` block will be added under the `generate` job, specifying `contents: read` and `issues: write`.
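A rough approximation of the condition behind this warning, not CodeQL's actual query: a job is flagged when neither the workflow nor the job declares `permissions`. The sketch applies that check to the `schedule-milestones` workflow written out by hand as a plain dict (steps abbreviated), before and after adding the job-level block the autofix describes. `missing_permissions` is a hypothetical helper.

```python
from copy import deepcopy

# The schedule-milestones workflow as a plain dict (steps abbreviated).
workflow = {
    "name": "schedule-milestones",
    "on": {"schedule": [{"cron": "0 0 * * SUN"}]},
    "jobs": {
        "generate": {
            "runs-on": "ubuntu-latest",
            "steps": [{"uses": "actions/checkout@v2"}],
        },
    },
}


def missing_permissions(wf: dict) -> list[str]:
    """Return the names of jobs whose token scope is never declared.

    A top-level `permissions` key covers every job; otherwise each job
    needs its own. This approximates the warning, not CodeQL's real query.
    """
    if "permissions" in wf:
        return []
    return [
        name
        for name, job in wf.get("jobs", {}).items()
        if "permissions" not in job
    ]


# Job-level block with the scopes named in the autofix suggestion.
fixed = deepcopy(workflow)
fixed["jobs"]["generate"]["permissions"] = {"contents": "read", "issues": "write"}

if __name__ == "__main__":
    print("before:", missing_permissions(workflow))
    print("after:", missing_permissions(fixed))
```

Scoping the block to the `generate` job, as the suggestion does, keeps any future jobs from inheriting `issues: write` by default.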