Commit 308509b

docs(README): prune duplicated content
1 parent ae54edb commit 308509b

1 file changed

+19
-66
lines changed

README.md

Lines changed: 19 additions & 66 deletions
@@ -5,18 +5,17 @@
55
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
66
[![Coverage](https://img.shields.io/badge/coverage-90%25-brightgreen.svg)](https://github.com/hspedro/babeltron/actions/workflows/test.yml)
77

8-
A Python-based REST API that leverages powerful multilingual translation models (M2M100 and NLLB) to
9-
provide efficient text translation services. Babeltron exposes a simple interface
10-
for translating text between multiple languages, making powerful neural machine
11-
translation accessible through straightforward API endpoints.
8+
A Python-based REST API that leverages multilingual translation models (M2M100, NLLB, and others)
9+
to provide text translation. Babeltron exposes a REST API for translating text between multiple languages
10+
with language detection when no source language is specified.
1211

1312
## Features
1413

15-
- Receives a text, source language and destination language, then returns the text
16-
translated
17-
- Supports two powerful translation models:
18-
- **M2M100**: Supports 100+ languages
19-
- **NLLB (No Language Left Behind)**: Supports 200+ languages, including many low-resource languages
14+
- Receives a text, source language and destination language, then returns the translated text
15+
- Receives a text and destination language, detects the source language, then returns the translated text
16+
- Supports two translation models:
17+
- **M2M100**: Supports 100+ languages, CC-BY-NC license
18+
- **NLLB (No Language Left Behind)**: Supports 200+ languages, CC-BY-NC license
2019
- **Language Detection**: Automatically detects the language of text using the Lingua language detector, which is highly accurate even for short text snippets
2120
- Lingua prioritizes quality over quantity, focusing on accurate detection rather than supporting every possible language
2221
- Supports 75 different languages with high precision, even though the translation models accept 200+ languages
@@ -43,39 +42,22 @@ curl -sSL https://install.python-poetry.org | python3 -
4342
make install
4443
```
4544

46-
## Development Commands
45+
## Running the Project
4746

4847
The project includes several helpful make commands:
4948

50-
- `make install` - Install project dependencies
49+
- `make serve` - Serve the API on port 8000
50+
- `make download-model` - Download a model from Hugging Face and store it locally in `./models`
51+
- `make docker-up` - Serve the API together with dockerized services such as OpenTelemetry (otel), Jaeger, Prometheus, and Valkey (cache)
52+
53+
### Development Commands
54+
5155
- `make test` - Run tests
5256
- `make lint` - Run linters (flake8, isort, black)
5357
- `make format` - Format code with isort and black
5458
- `make coverage` - Run tests with coverage report
5559
- `make coverage-html` - Generate HTML coverage report
5660

57-
## Testing and Code Coverage
58-
59-
### Running Tests with Coverage
60-
61-
To run tests with coverage reporting:
62-
63-
```bash
64-
make coverage
65-
```
66-
67-
For a detailed HTML coverage report:
68-
69-
```bash
70-
make coverage-html
71-
```
72-
73-
The HTML report will be generated in the `htmlcov` directory.
74-
75-
### Coverage Configuration
76-
77-
The project uses a `.coveragerc` file to configure coverage settings. This ensures consistent coverage reporting across different environments.
78-
7961
## Downloading Translation Models
8062

8163
Babeltron supports two types of translation models: M2M100 and NLLB (No Language Left Behind). You can download models of different sizes depending on your needs and hardware constraints:
@@ -318,6 +300,7 @@ The following environment variables can be used to configure the application:
318300
- `PORT`: Port to run the API server on (default: `8000`)
319301
- `WORKER_COUNT`: Number of worker processes to use (default: `1`)
320302
- `BABELTRON_MODEL_TYPE`: Type of model to use in the API (`m2m100` or `nllb`, default: `m2m100`)
303+
- `AUTH_USERNAME` and `AUTH_PASSWORD`: Enable basic auth on the translate and detect endpoints
321304

322305
### Docker Volume Mounts
323306

@@ -442,42 +425,12 @@ For a detailed technical overview of the system architecture, including diagrams
442425

443426
![Babeltron Overview](docs/images/overview.png)
444427

445-
## Downloading Models
446-
447-
Before using Babeltron, you need to download at least one model:
448-
449-
### Translation Models
450-
451-
```bash
452-
# Download the default M2M100 small model (418M parameters)
453-
make download-model-m2m-small
454-
455-
# Download the M2M100 medium model (1.2B parameters)
456-
make download-model-m2m-medium
457-
458-
# Download the M2M100 large model (12B parameters)
459-
make download-model-m2m-large
460-
461-
# Download the NLLB small model (600M parameters)
462-
make download-model-nllb-small
463-
464-
# Download the NLLB large model (3.3B parameters)
465-
make download-model-nllb-large
466-
```
467-
468-
### Language Detection Model
469-
470-
```bash
471-
# Download the XLM-RoBERTa model for language detection
472-
make download-model-xlm-roberta
473-
```
474-
475428
## API Usage
476429

477430
### Translation
478431

479432
```bash
480-
curl -X POST "http://localhost:8000/translate" \
433+
curl -X POST "http://localhost:8000/api/v1/translate" \
481434
-H "Content-Type: application/json" \
482435
-d '{"text": "Hello, how are you?", "source_lang": "en", "target_lang": "fr"}'
483436
```
@@ -489,10 +442,10 @@ Response:
489442
}
490443
```
491444

492-
### Language Detection
445+
### Detection
493446

494447
```bash
495-
curl -X POST "http://localhost:8000/detect" \
448+
curl -X POST "http://localhost:8000/api/v1/detect" \
496449
-H "Content-Type: application/json" \
497450
-d '{"text": "Hello, how are you?"}'
498451
```
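
For readers calling the API from Python rather than curl, the two requests above can be sketched with the standard library alone. This is a minimal sketch, not part of the commit: the helper names `build_request` and `call` are hypothetical, the base URL is taken from the curl examples, and the optional `Authorization` header assumes the server uses standard HTTP basic auth when `AUTH_USERNAME` and `AUTH_PASSWORD` are set.

```python
import base64
import json
import urllib.request

# Host, port, and path prefix copied from the curl examples in this diff.
BASE_URL = "http://localhost:8000/api/v1"


def build_request(endpoint, payload, username=None, password=None):
    """Build a POST request for an endpoint such as "translate" or "detect".

    Hypothetical helper: the Authorization header is only added when both
    credentials are given, assuming standard HTTP basic auth on the server.
    """
    headers = {"Content-Type": "application/json"}
    if username is not None and password is not None:
        raw = f"{username}:{password}".encode("utf-8")
        headers["Authorization"] = "Basic " + base64.b64encode(raw).decode("ascii")
    return urllib.request.Request(
        f"{BASE_URL}/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def call(endpoint, payload, username=None, password=None):
    """Send the request and decode the JSON response body."""
    request = build_request(endpoint, payload, username, password)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With a server running locally, `call("translate", {"text": "Hello, how are you?", "source_lang": "en", "target_lang": "fr"})` mirrors the first curl command, and `call("detect", {"text": "Hello, how are you?"})` mirrors the second. Building the request separately from sending it keeps the URL and header logic checkable without a running server.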
