excoffierleonard
.dockerignore: 2 additions & 0 deletions
CLAUDE.md: 0 additions & 28 deletions
Cargo.lock: 9 additions & 53 deletions
Cargo.toml: 36 additions & 21 deletions
dockerfile renamed to Dockerfile
README.md: 26 additions & 134 deletions
crates/core/assets/eng.traineddata renamed to assets/ocr/eng.traineddata
crates/core/assets/fra.traineddata renamed to assets/ocr/fra.traineddata
crates/web/assets/favicon.png renamed to assets/web/favicon.png
crates/web/assets/index.html renamed to assets/web/index.html
diff --git a/.dockerignore b/.dockerignore
@@ -1,3 +1,5 @@
+.git
+
 /target
 
 .env
diff --git a/Cargo.toml b/Cargo.toml
@@ -1,36 +1,51 @@
-[workspace]
-members = ["crates/core", "crates/web", "crates/cli", "crates/test-utils"]
-resolver = "3"
-
-[workspace.package]
+[package]
+name = "parser"
 version = "0.1.7"
 edition = "2024"
 authors = ["Leonard Excoffier"]
 license = "MIT"
 repository = "https://github.com/excoffierleonard/parser"
+description = "A library and web API for extracting text from various file formats including PDF, DOCX, XLSX, PPTX, images via OCR, and more"
+readme = "README.md"
+keywords = ["parser", "pdf", "docx", "text-extraction", "ocr"]
+categories = ["text-processing", "parsing", "web-programming::http-server"]
 
-[workspace.dependencies]
-parser-core = { path = "crates/core", version = "0.1.3" }
-parser-test-utils = { path = "crates/test-utils" }
-actix-multipart = "0.7.2"
-actix-web = "4.9.0"
+[lib]
+name = "parser"
+path = "src/lib.rs"
+
+[[bin]]
+name = "parser-web"
+path = "src/main.rs"
+
+[dependencies]
+# Core parsing dependencies
 calamine = "0.26.1"
-clap = { version = "4.5.1", features = ["derive"] }
-criterion = "0.5"
 docx-rs = "0.4.17"
-dotenvy = "0.15.7"
-env_logger = "0.11.6"
-futures-util = "0.3.31"
 infer = "0.16.0"
 lazy_static = "1.4.0"
 mime = "0.3.17"
-mime_guess = "2.0.5"
-num_cpus = "1.16.0"
 pdf-extract = "0.8.0"
-rayon = "1.10.0"
 regex = "1.11.1"
-rust-embed = { version = "8.5.0", features = ["interpolate-folder-path"] }
-serde = { version = "1.0.217", features = ["derive"] }
-tesseract = "0.15.1"
 tempfile = "3.9.0"
+tesseract = "0.15.1"
 zip = "2.3.0"
+
+# Web API dependencies
+actix-web = "4.9.0"
+actix-multipart = "0.7.2"
+futures-util = "0.3.31"
+rayon = "1.10.0"
+serde = { version = "1.0.217", features = ["derive"] }
+mime_guess = "2.0.5"
+rust-embed = { version = "8.5.0", features = ["interpolate-folder-path"] }
+env_logger = "0.11.6"
+dotenvy = "0.15.7"
+
+[dev-dependencies]
+criterion = "0.5"
+num_cpus = "1.16.0"
+
+[[bench]]
+name = "function_parse"
+harness = false
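The new manifest keeps the web dependencies (`actix-web`, `dotenvy`, `env_logger`) for the `parser-web` binary. The README section this commit deletes documented two environment variables for that service: `PARSER_APP_PORT` (default 8080) and `ENABLE_FILE_SERVING` (default false). Below is a minimal std-only sketch of reading those settings with their documented defaults. It assumes strict `true`/`false` parsing for the flag, and `WebConfig`, `config_from`, and `config_from_env` are hypothetical names, not code from this crate.

```rust
/// Settings for the `parser-web` binary (hypothetical struct, for illustration).
#[derive(Debug, PartialEq)]
struct WebConfig {
    port: u16,
    enable_file_serving: bool,
}

/// Builds a `WebConfig` from a key lookup so the logic can be tested
/// without mutating the real process environment.
fn config_from<F>(lookup: F) -> Result<WebConfig, String>
where
    F: Fn(&str) -> Option<String>,
{
    // Default port 8080, as documented in the previous README.
    let port = match lookup("PARSER_APP_PORT") {
        Some(raw) => raw
            .trim()
            .parse::<u16>()
            .map_err(|e| format!("invalid PARSER_APP_PORT {raw:?}: {e}"))?,
        None => 8080,
    };
    // File serving stays off unless explicitly enabled; this sketch only
    // accepts "true" or "false" (any letter case).
    let enable_file_serving = match lookup("ENABLE_FILE_SERVING") {
        Some(raw) => match raw.trim().to_ascii_lowercase().as_str() {
            "true" => true,
            "false" => false,
            other => return Err(format!("invalid ENABLE_FILE_SERVING {other:?}")),
        },
        None => false,
    };
    Ok(WebConfig { port, enable_file_serving })
}

/// Reads the settings from the actual process environment.
fn config_from_env() -> Result<WebConfig, String> {
    config_from(|key| std::env::var(key).ok())
}
```

Taking a lookup closure instead of calling `std::env::var` directly keeps the parsing testable. Values kept in a `.env` file would only be visible to `config_from_env` if something such as dotenvy's `dotenv()` function loads them into the environment first.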
diff --git a/README.md b/README.md
@@ -1,153 +1,45 @@
 # Parser
 
-A Rust-based document parsing system that extracts text content from various file formats.
+A Rust library for extracting text from various document formats.
 
-[Live Demo](https://parser.excoffierleonard.com) | [API Endpoint](https://parser.excoffierleonard.com/parse)
+[Website](https://parser.excoffierleonard.com)
 
 ![Website Preview](website_preview.png)
 
-## 📚 Overview
+## Features
 
-Parser is a modular Rust project that provides comprehensive document parsing capabilities through multiple interfaces:
+- PDF, DOCX, XLSX, PPTX documents
+- OCR for images (PNG, JPEG, WebP) with English and French support
+- Plain text formats (TXT, CSV, JSON)
 
-- **Core library**: The foundation providing parsing functionality for various file formats
-- **CLI tool**: Command-line interface for quick file parsing
-- **Web API**: REST service for parsing files via HTTP requests
-- **Web UI**: Simple interface for testing the parser functionality
+## Installation
 
-## 📦 Project Structure
-
-The project is organized as a Rust workspace with multiple crates:
-
-- **parser-core**: The core parsing engine
-- **parser-cli**: Command-line interface
-- **parser-web**: Web API and frontend
-- **test-utils**: Shared testing utilities
-
-## 📄 Supported File Types
-
-- **Documents**: PDF (`.pdf`), Word (`.docx`), PowerPoint (`.pptx`), Excel (`.xlsx`)
-- **Text**: Plain text (`.txt`), CSV, JSON, YAML, source code, and other text-based formats
-- **Images**: PNG, JPEG, WebP, and other image formats with OCR (Optical Character Recognition)
-
-The OCR functionality supports English and French languages.
-
-## 🛠️ Getting Started
-
-### Prerequisites
-
-- [Rust](https://www.rust-lang.org/learn/get-started) (latest stable)
-- OCR Dependencies:
-  - Tesseract development libraries
-  - Leptonica development libraries
-  - Clang development libraries
-
-#### Installing OCR Dependencies
-
-**Debian/Ubuntu:**
-
-```bash
-sudo apt install libtesseract-dev libleptonica-dev libclang-dev
-```
-
-**macOS:**
-
-```bash
-brew install tesseract
-```
-
-**Windows:**
-Follow the instructions at [Tesseract GitHub repository](https://github.com/tesseract-ocr/tesseract).
-
-### Building from Source
-
-```bash
-# Build all crates
-cargo build
-
-# Build in release mode
-cargo build --release
-```
-
-### Using the CLI
-
-```bash
-# Run directly with cargo
-cargo run -p parser-cli -- path/to/file1.pdf path/to/file2.docx
-
-# Or use the built binary
-./target/release/parser-cli path/to/file1.pdf path/to/file2.docx
-```
-
-### Running the Web Server
-
-```bash
-# Run the web server
-cargo run -p parser-web
-
-# With custom port
-PARSER_APP_PORT=9000 cargo run -p parser-web
-
-# With file serving enabled (for frontend)
-ENABLE_FILE_SERVING=true cargo run -p parser-web
-```
-
-## 🚀 Deployment
-
-The easiest way to deploy the service is using Docker:
-
-```bash
-curl -o compose.yaml https://raw.githubusercontent.com/excoffierleonard/parser/refs/heads/main/compose.yaml && \
-docker compose up -d
-```
-
-### Environment Variables
-
-- `PARSER_APP_PORT`: The port on which the web service listens (default: 8080)
-- `ENABLE_FILE_SERVING`: Enable serving frontend files (default: false)
-
-## 🧪 Development
-
-### Testing
-
-```bash
-# Run all tests
-cargo test --workspace
-
-# Run specific test
-cargo test test_name
+```toml
+[dependencies]
+parser = "0.1"
 ```
 
-### Benchmarking
+## Usage
 
-```bash
-# Run benchmarks
-cargo bench --workspace
+```rust
+use parser::parse;
 
-# Run benchmark script
-./scripts/benchmark.sh
+fn main() -> Result<(), Box<dyn std::error::Error>> {
+    let data = std::fs::read("document.pdf")?;
+    let text = parse(&data)?;
+    println!("{}", text);
+    Ok(())
+}
 ```
 
-### Code Quality
+## System Dependencies
 
-```bash
-# Run linter
-cargo clippy --workspace -- -D warnings
+Requires Tesseract OCR libraries:
 
-# Format code
-cargo fmt --all
-```
-
-### Building with Scripts
-
-```bash
-# Full build script
-./scripts/build.sh
-
-# Deployment tests
-./scripts/deploy-tests.sh
-```
+- **Debian/Ubuntu:** `sudo apt install libtesseract-dev libleptonica-dev libclang-dev`
+- **macOS:** `brew install tesseract`
+- **Windows:** Follow the instructions at the [Tesseract GitHub repository](https://github.com/tesseract-ocr/tesseract)
 
-## 📜 License
+## License
 
-This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
+MIT
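In the new README's usage example, `parse` receives only the raw file bytes and no filename, so the library has to recognize the format from content. The manifest's `infer` dependency presumably serves that purpose. Below is a std-only sketch of the idea that checks well-known magic numbers for the formats in the Features list. `Kind` and `sniff` are hypothetical names, and this is not the crate's actual implementation.

```rust
/// Coarse file kinds (hypothetical enum, for illustration only).
#[derive(Debug, PartialEq, Eq)]
enum Kind {
    Pdf,
    ZipContainer, // DOCX, XLSX and PPTX are all ZIP archives
    Png,
    Jpeg,
    WebP,
    Text,
    Unknown,
}

/// Guesses a file's kind from its leading "magic" bytes.
fn sniff(data: &[u8]) -> Kind {
    if data.starts_with(b"%PDF-") {
        Kind::Pdf
    } else if data.starts_with(b"PK\x03\x04") {
        // ZIP local file header signature.
        Kind::ZipContainer
    } else if data.starts_with(b"\x89PNG\r\n\x1a\n") {
        Kind::Png
    } else if data.starts_with(&[0xFF, 0xD8, 0xFF]) {
        Kind::Jpeg
    } else if data.len() >= 12 && data.starts_with(b"RIFF") && &data[8..12] == b"WEBP" {
        // WebP is a RIFF container whose form type (bytes 8..12) is "WEBP".
        Kind::WebP
    } else if std::str::from_utf8(data).is_ok() {
        // TXT, CSV and JSON all land here as valid UTF-8.
        Kind::Text
    } else {
        Kind::Unknown
    }
}
```

DOCX, XLSX and PPTX share the ZIP signature, so a real dispatcher would also have to look inside the archive (for example at its `word/`, `xl/` or `ppt/` entries) to tell them apart.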