|
1 | | -# 🌿 iNaturalist 2021 Image Classifier (FastAPI + EVA-CLIP) |
| 1 | +# 🌿 iNaturalist 2021 Image Classifier & LAION Aesthetic Scoring API |
2 | 2 |
|
3 | | -This project serves a high-accuracy image classification API using a **Vision Transformer** model fine-tuned on the **iNaturalist 2021** biodiversity dataset. It supports top-k prediction and an optional debug mode with detailed logits, scores, and resized input images. |
| 3 | +This project provides a FastAPI backend with two computer vision endpoints:
4 | 4 |
|
5 | | -## 🧠 Model Details |
| 5 | +- **Biodiversity Image Classification** using a fine-tuned Vision Transformer (ViT) on the iNaturalist 2021 dataset. |
| 6 | +- **Aesthetic Scoring** using the LAION regression head on OpenAI CLIP ViT-B/16 features. |
6 | 7 |
|
7 | | -- **Model Family**: [`timm`](https://github.com/rwightman/pytorch-image-models) |
8 | | -- **Model Name**: `eva02_large_patch14_clip_336.merged2b_ft_inat21` |
9 | | -- **Source**: Hugging Face Hub via `hf-hub:timm/eva02_large_patch14_clip_336.merged2b_ft_inat21` |
10 | | -- **Architecture**: Vision Transformer (EVA-CLIP backbone) |
11 | | -- **Pretraining**: Internally pre-trained CLIP-like architecture |
12 | | -- **Fine-tuned On**: iNaturalist 2021 (10,000+ species of plants, animals, fungi, and microbes) |
13 | | -- **Output Classes**: Mapped using `inat21_class_index.json` |
14 | | -- **Label URL**: Provided via `model.default_cfg['label_url']` |
| 8 | +Both endpoints accept raw image uploads and use state-of-the-art models for their respective tasks while keeping our data private.
15 | 9 |
|
16 | | -## 🖼️ Input Format |
| 10 | +--- |
17 | 11 |
|
18 | | -- Accepts raw image bytes (e.g., `image/jpeg`, `image/png`) |
19 | | -- Auto-converted to RGB using Pillow |
20 | | -- Resized to 384x384, then center cropped to 336x336 |
21 | | -- Normalized using CLIP-style mean and std values: |
22 | | - - `mean = [0.48145466, 0.4578275, 0.40821073]` |
23 | | - - `std = [0.26862954, 0.26130258, 0.27577711]` |
| 12 | +## 🧠 Model Details |
24 | 13 |
|
25 | | -## CLI commands |
26 | | -- `make build-ai-api` |
27 | | -- `make ai-api` |
| 14 | +### 1. Biodiversity Classifier |
28 | 15 |
|
| 16 | +- **Model Family:** [`timm`](https://github.com/rwightman/pytorch-image-models) |
| 17 | +- **Model Name:** `eva02_large_patch14_clip_336.merged2b_ft_inat21` |
| 18 | +- **Source:** Hugging Face Hub |
| 19 | +- **Architecture:** Vision Transformer (EVA-CLIP backbone) |
| 20 | +- **Fine-tuned On:** iNaturalist 2021 (10,000 species)
| 21 | +- **Output Classes:** Mapped using `inat21_class_index.json` |
29 | 22 |
|
30 | | -# Aesthetic Scoring |
| 23 | +### 2. Aesthetic Scoring |
31 | 24 |
|
32 | | -This project provides an API for **aesthetic scoring** of images using a regression head trained on top of OpenAI's CLIP ViT-B/16 backbone. |
| 25 | +- **Backbone:** OpenAI CLIP ViT-B/16 |
| 26 | +- **Regression Head:** Multilayer Perceptron (MLP) trained for aesthetic prediction ([LAION aesthetic predictor](https://github.com/LAION-AI/aesthetic-predictor)) |
| 27 | +- **Head Weights:** `models/aesthetic/sa_0_4_vit_b_16_linear.pth` |
| 28 | +- **Feature Dimension:** 512 |
33 | 29 |
|
34 | 30 | --- |
35 | 31 |
|
36 | | -## 🧠 Model Details |
| 32 | +## 🚀 API Endpoints |
37 | 33 |
|
38 | | -- **Backbone:** OpenAI CLIP ViT-B/16 |
39 | | -- **Regression Head:** Multilayer Perceptron (MLP) trained for aesthetic prediction |
40 | | -- **Head Weights:** `models/aesthetic/sa_0_4_vit_b_16_linear.pth` |
41 | | -- **Feature Dimension:** 512 |
| 34 | +### 1. `/classify` — Biodiversity Image Classification |
42 | 35 |
|
| 36 | +**Description:** |
| 37 | +Predicts the top-3 most likely species for a given image using a ViT model fine-tuned on iNaturalist 2021. |
| 38 | + |
| 39 | +**Request:** |
| 40 | +- **Method:** `POST` |
| 41 | +- **Content-Type:** `image/jpeg` or `image/png` |
| 42 | +- **Body:** Raw image bytes |
| 43 | + |
| 44 | +**Example (using curl):** |
| 45 | +```sh |
| 46 | +curl -X POST -H "Content-Type: image/jpeg" --data-binary @your_image.jpg http://localhost:8080/classify |
| 47 | +``` |
| 48 | + |
| 49 | +**Response:** |
| 50 | +- **Status:** 200 OK |
| 51 | +- **Content-Type:** `application/json` |
| 52 | +- **Body:** JSON object with top-3 species predictions, e.g., |
| 53 | +```json |
| 54 | +{ |
| 55 | + "predictions": [ |
| 56 | + {"species": "Cardinalis cardinalis", "score": 0.987}, |
| 57 | + {"species": "Pica pica", "score": 0.005}, |
| 58 | + {"species": "Corvus corax", "score": 0.003} |
| 59 | + ] |
| 60 | +} |
| 61 | +``` |
| 62 | + |
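The curl call above can also be made from Python. The following is a minimal sketch using only the standard library; it assumes the service is reachable at `http://localhost:8080` as in the curl example and that the response has the `predictions` shape shown above. The helper names `build_classify_request`, `top_prediction`, and `classify` are hypothetical and not part of this project.

```python
import json
import urllib.request


def build_classify_request(image_bytes, base_url="http://localhost:8080",
                           content_type="image/jpeg"):
    """Build a POST request that carries raw image bytes to /classify.

    base_url is an assumption taken from the curl example.
    """
    return urllib.request.Request(
        url=f"{base_url}/classify",
        data=image_bytes,
        headers={"Content-Type": content_type},
        method="POST",
    )


def top_prediction(response_body):
    """Return (species, score) for the highest-scoring prediction.

    Assumes the JSON layout of the example response:
    {"predictions": [{"species": ..., "score": ...}, ...]}
    """
    predictions = json.loads(response_body)["predictions"]
    best = max(predictions, key=lambda p: p["score"])
    return best["species"], best["score"]


def classify(image_path, base_url="http://localhost:8080"):
    """Send a JPEG file to /classify and return its top prediction."""
    with open(image_path, "rb") as f:
        request = build_classify_request(f.read(), base_url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return top_prediction(response.read())
```

With the server running, `classify("your_image.jpg")` would return a `(species, score)` tuple. Request construction and response parsing are kept separate so both can be checked without a live server.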
| 63 | +### 2. `/score` — Aesthetic Scoring |
| 64 | + |
| 65 | +**Description:** |
| 66 | +Predicts the aesthetic score of an image on a scale from 0 to 10 using the LAION regression head. |
| 67 | + |
| 68 | +**Request:** |
| 69 | +- **Method:** `POST` |
| 70 | +- **Content-Type:** `image/jpeg` or `image/png` |
| 71 | +- **Body:** Raw image bytes |
| 72 | + |
| 73 | +**Example (using curl):** |
| 74 | +```sh |
| 75 | +curl -X POST -H "Content-Type: image/jpeg" --data-binary @your_image.jpg http://localhost:8080/score |
| 76 | +``` |
| 77 | + |
| 78 | +**Response:** |
| 79 | +- **Status:** 200 OK |
| 80 | +- **Content-Type:** `application/json` |
| 81 | +- **Body:** JSON object with the aesthetic score, e.g., |
| 82 | +```json |
| 83 | +{ |
| 84 | + "score": 7.5 |
| 85 | +} |
| 86 | +``` |
| 87 | + |
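Because `/score` accepts either `image/jpeg` or `image/png`, a client has to send the matching `Content-Type`. The sketch below picks the header from the file extension using the standard library. It assumes the same `http://localhost:8080` base URL as the curl example and the `{"score": ...}` response shape shown above; `guess_content_type`, `parse_score`, and `score_image` are hypothetical helper names.

```python
import json
import mimetypes
import urllib.request

# Content types listed in the request description above.
SUPPORTED_TYPES = {"image/jpeg", "image/png"}


def guess_content_type(image_path):
    """Map a file name to one of the supported content types, or raise."""
    content_type, _ = mimetypes.guess_type(image_path)
    if content_type not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported image type for {image_path!r}: {content_type}")
    return content_type


def parse_score(response_body):
    """Extract the aesthetic score from a /score JSON response as a float."""
    return float(json.loads(response_body)["score"])


def score_image(image_path, base_url="http://localhost:8080"):
    """POST the raw bytes of an image file to /score and return the score.

    base_url is an assumption taken from the curl example.
    """
    with open(image_path, "rb") as f:
        image_bytes = f.read()
    request = urllib.request.Request(
        url=f"{base_url}/score",
        data=image_bytes,
        headers={"Content-Type": guess_content_type(image_path)},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_score(response.read())
```

Rejecting unsupported extensions on the client side avoids a round trip for files the endpoint is not documented to accept.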
| 88 | +--- |
43 | 89 |
|
44 | 90 | ## Local setup |
45 | 91 |
|
|