Commit b2c494a

dev (#63)
* feat: add document parsing functionality for various formats
  - Implemented DOCX parser using docx_rs for extracting text from Microsoft Word documents.
  - Added image parser utilizing Tesseract OCR for text extraction from images (PNG, JPEG, WebP).
  - Created PDF parser using pdf_extract for extracting text from PDF documents.
  - Developed PPTX parser for extracting text from Microsoft PowerPoint presentations.
  - Introduced XLSX parser using calamine for extracting text from Excel spreadsheets.
  - Added plain text parser for handling UTF-8 encoded text files, including TXT, CSV, and JSON formats.
  - Established a web API using Actix for file parsing, supporting multipart file uploads.
  - Implemented error handling for API responses with appropriate status codes.
  - Added tests for all parsers and API endpoints to ensure functionality and correctness.
  - Included assets for testing various file formats in the tests directory.
* Update dependencies and refactor web server functionality
  - Updated dependencies in Cargo.toml for improved performance and security.
  - Changed description and categories in Cargo.toml for clarity.
  - Refactored main.rs to simplify server initialization and remove unnecessary conditionals.
  - Renamed web module documentation to reflect web server functionality.
  - Updated routes documentation to clarify purpose.
  - Simplified static file serving logic in static_files.rs, improving error handling and response structure.
* refactor: clean up Dockerfile and remove unnecessary comments; update entrypoint for parser
* refactor: remove common test utilities and replace with direct file path handling in tests
* refactor: remove obsolete benchmark, build, and deployment test scripts
* feat: add CI/CD workflow for Docker image build, publish, and deployment
1 parent 6c64883 commit b2c494a
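
The plain text parser named in the commit message is not shown in this excerpt. As a rough illustration only, UTF-8 text extraction can be sketched with the Rust standard library. The function name `parse_plain_text` and the `ParseError` type are hypothetical and do not come from the commit; the real parser may differ in both API and behavior.

```rust
use std::str;

/// Hypothetical error type for this sketch; the commit's own error type is not shown.
#[derive(Debug, PartialEq)]
pub enum ParseError {
    /// The input is not valid UTF-8; `valid_up_to` is the length of the valid prefix in bytes.
    InvalidUtf8 { valid_up_to: usize },
}

/// Decode a byte buffer (for example the contents of a TXT, CSV, or JSON upload)
/// as UTF-8 and return it as an owned string.
/// Hypothetical helper, not the commit's implementation.
pub fn parse_plain_text(bytes: &[u8]) -> Result<String, ParseError> {
    str::from_utf8(bytes)
        .map(|text| text.to_owned())
        .map_err(|err| ParseError::InvalidUtf8 {
            valid_up_to: err.valid_up_to(),
        })
}
```

Returning the byte offset of the first invalid sequence is one possible design; it lets an API layer report a specific client error rather than a generic failure.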

70 files changed: 1157 additions and 2028 deletions

.dockerignore

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
+.git
+
 /target
 
 .env

.github/workflows/ci-cd.yaml

Lines changed: 107 additions & 0 deletions
@@ -0,0 +1,107 @@
+name: CI/CD
+
+permissions:
+  contents: read
+  packages: write
+
+on:
+  pull_request:
+    branches: [main]
+  push:
+    branches: [main]
+
+jobs:
+  ci:
+    name: CI
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v6
+
+      - name: Cache system dependencies
+        uses: awalsh128/cache-apt-pkgs-action@v1
+        with:
+          packages: libtesseract-dev libleptonica-dev libclang-dev
+          version: 1.0
+
+      - name: Cache Rust dependencies
+        uses: Swatinem/rust-cache@v2
+
+      - name: Format
+        run: cargo fmt --all -- --check
+
+      - name: Lint
+        run: cargo clippy --workspace --all-features --all-targets -- -D warnings
+
+      - name: Build
+        run: cargo build --workspace --all-features --all-targets
+
+      - name: Test
+        run: cargo test --workspace --all-features --all-targets
+
+      - name: Verify Documentation
+        run: cargo doc --no-deps --all-features --document-private-items
+
+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@v3
+
+      - name: Build Docker image
+        uses: docker/build-push-action@v6
+        with:
+          context: .
+          push: false
+          cache-from: type=gha
+          cache-to: type=gha,mode=max
+
+  publish-docker:
+    name: Publish Docker image
+    needs: ci
+    if: github.ref_name == 'main' && github.event_name == 'push'
+    runs-on: ubuntu-latest
+    environment: production
+    steps:
+      - uses: actions/checkout@v6
+
+      - name: Login to Github Container Registry
+        uses: docker/login-action@v3
+        with:
+          registry: ghcr.io
+          username: ${{ github.actor }}
+          password: ${{ secrets.GITHUB_TOKEN }}
+
+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@v3
+
+      - name: Build and Push Docker image
+        uses: docker/build-push-action@v6
+        with:
+          context: .
+          push: true
+          tags: ghcr.io/${{ github.repository }}:latest
+          cache-from: type=gha
+          cache-to: type=gha,mode=max
+
+  deploy-docker:
+    name: Deploy Docker image to Production Server
+    needs: publish-docker
+    if: github.ref_name == 'main' && github.event_name == 'push'
+    runs-on: ubuntu-latest
+    environment: production
+    steps:
+      - uses: actions/checkout@v6
+
+      - name: Copy compose file to server
+        uses: appleboy/scp-action@v1
+        with:
+          host: ${{ secrets.PROD_HOST }}
+          username: ${{ secrets.PROD_USERNAME }}
+          key: ${{ secrets.PROD_SSH_KEY }}
+          source: compose.yaml
+          target: /opt/${{ github.event.repository.name }}/
+
+      - name: Deploy via SSH
+        uses: appleboy/ssh-action@v1
+        with:
+          host: ${{ secrets.PROD_HOST }}
+          username: ${{ secrets.PROD_USERNAME }}
+          key: ${{ secrets.PROD_SSH_KEY }}
+          script: docker compose -f /opt/${{ github.event.repository.name }}/compose.yaml up -d --pull always

.github/workflows/ci.yaml

Lines changed: 0 additions & 90 deletions
This file was deleted.

.github/workflows/deploy.yaml

Lines changed: 0 additions & 46 deletions
This file was deleted.

.github/workflows/publish.yaml

Lines changed: 0 additions & 35 deletions
This file was deleted.

CLAUDE.md

Lines changed: 0 additions & 28 deletions
This file was deleted.
