A REST API service that transcribes audio with OpenAI's Whisper model and summarizes text with Google's Gemini models.
- Audio Transcription: Convert audio files to text using OpenAI's Whisper model
- Text Summarization: Generate organized summaries and topic breakdowns using Google's Gemini AI
- Combined Workflow: Transcribe audio files and automatically summarize the content
- Multi-format Support: Supports MP3, WAV, and M4A audio formats
- RESTful API: Clean, well-documented REST endpoints
- Configurable: Flexible configuration through environment variables
- Production Ready: Comprehensive logging and error handling
- Go 1.24.1 or higher
- OpenAI API key (for audio transcription)
- Google Gemini API key (for text summarization)
- Clone the repository:
git clone https://github.com/8bury/sumup4me.git
cd sumup4me
- Install dependencies:
go mod tidy
- Build the application:
go build -o sumup4me ./cmd/api
Create a .env file in the root directory with the following variables:
# OpenAI Configuration
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_BASE_URL=https://api.openai.com/v1 # Optional, defaults to OpenAI's API
# Google Gemini Configuration
GEMINI_API_KEY=your_gemini_api_key_here
| Variable | Description | Required | Default |
|---|---|---|---|
| OPENAI_API_KEY | Your OpenAI API key for Whisper transcription | Yes | - |
| OPENAI_BASE_URL | OpenAI API base URL | No | https://api.openai.com/v1 |
| GEMINI_API_KEY | Your Google Gemini API key for text summarization | Yes | - |
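The table above can be mirrored in Go roughly as follows. This is a minimal sketch, not the project's actual `config.go`: the `Config` type and `loadConfig` function are hypothetical names, and the sketch assumes the variables are already present in the process environment (how the application itself reads the .env file is not shown here).

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// Config holds the three documented settings.
// Type and field names are illustrative, not taken from the repository.
type Config struct {
	OpenAIAPIKey  string
	OpenAIBaseURL string
	GeminiAPIKey  string
}

// Default documented for OPENAI_BASE_URL.
const defaultOpenAIBaseURL = "https://api.openai.com/v1"

// loadConfig reads settings through getenv so a fake lookup can be used in tests.
func loadConfig(getenv func(string) string) (Config, error) {
	cfg := Config{
		OpenAIAPIKey:  getenv("OPENAI_API_KEY"),
		OpenAIBaseURL: getenv("OPENAI_BASE_URL"),
		GeminiAPIKey:  getenv("GEMINI_API_KEY"),
	}
	// Optional variable: fall back to the documented default.
	if cfg.OpenAIBaseURL == "" {
		cfg.OpenAIBaseURL = defaultOpenAIBaseURL
	}
	// Required variables: fail fast with a clear message.
	if cfg.OpenAIAPIKey == "" {
		return Config{}, errors.New("OPENAI_API_KEY is required")
	}
	if cfg.GeminiAPIKey == "" {
		return Config{}, errors.New("GEMINI_API_KEY is required")
	}
	return cfg, nil
}

func main() {
	cfg, err := loadConfig(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, "configuration error:", err)
		os.Exit(1)
	}
	fmt.Println("OpenAI base URL:", cfg.OpenAIBaseURL)
}
```

Passing the lookup function as a parameter keeps the validation logic testable without touching the real environment.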
- Start the server:
./sumup4me
The server will start on port 8080 by default.
- The API will be available at http://localhost:8080
Base URL for all endpoints:
http://localhost:8080/v1
Convert audio files to text.
Endpoint: POST /v1/transcribe
Request:
- Method: POST
- Content-Type: multipart/form-data
- Body: Audio file (key: "audio")
Supported Formats: MP3, WAV, M4A
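The request described above can be built from Go along these lines. This is a client-side sketch under stated assumptions, not code from the repository: `newTranscribeRequest`, `transcribe`, and `transcribeResponse` are hypothetical names, the server is assumed to run at the default http://localhost:8080/v1, and any 2xx status is treated as success.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"mime/multipart"
	"net/http"
	"os"
	"path/filepath"
)

// transcribeResponse matches the documented success body.
type transcribeResponse struct {
	Transcription string `json:"transcription"`
}

// newTranscribeRequest builds a multipart/form-data POST with the file under the key "audio".
func newTranscribeRequest(baseURL, filename string, audio []byte) (*http.Request, error) {
	var body bytes.Buffer
	w := multipart.NewWriter(&body)
	part, err := w.CreateFormFile("audio", filepath.Base(filename))
	if err != nil {
		return nil, err
	}
	if _, err := part.Write(audio); err != nil {
		return nil, err
	}
	// Close writes the final boundary; the body is incomplete without it.
	if err := w.Close(); err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/transcribe", &body)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", w.FormDataContentType())
	return req, nil
}

// transcribe uploads the file at path and returns the transcribed text.
func transcribe(client *http.Client, baseURL, path string) (string, error) {
	audio, err := os.ReadFile(path)
	if err != nil {
		return "", err
	}
	req, err := newTranscribeRequest(baseURL, path, audio)
	if err != nil {
		return "", err
	}
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode < 200 || resp.StatusCode > 299 {
		msg, _ := io.ReadAll(resp.Body)
		return "", fmt.Errorf("transcribe: HTTP %d: %s", resp.StatusCode, msg)
	}
	var out transcribeResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return "", err
	}
	return out.Transcription, nil
}

func main() {
	text, err := transcribe(http.DefaultClient, "http://localhost:8080/v1", "your-audio-file.mp3")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(text)
}
```

Separating request construction from sending makes it possible to inspect the multipart body without a running server.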
Response:
{
  "transcription": "Transcribed text content here..."
}
Example using curl:
curl -X POST http://localhost:8080/v1/transcribe \
  -F "audio=@your-audio-file.mp3"
Generate an organized summary of text content.
Endpoint: POST /v1/sumup/text
Request:
- Method: POST
- Query Parameter: text (the text to summarize)
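Because the text travels in the query string, it has to be URL-encoded. The following Go sketch shows one way to do that and to decode the reply; `newSumupTextRequest` and `decodeSummary` are hypothetical helper names, and the sketch assumes the response body is a single JSON-encoded string, as in the response documented for this endpoint.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
)

// newSumupTextRequest puts the text into the "text" query parameter, URL-encoded.
func newSumupTextRequest(baseURL, text string) (*http.Request, error) {
	q := url.Values{}
	q.Set("text", text)
	return http.NewRequest(http.MethodPost, baseURL+"/sumup/text?"+q.Encode(), nil)
}

// decodeSummary unwraps a JSON string body into plain text.
func decodeSummary(body []byte) (string, error) {
	var summary string
	if err := json.Unmarshal(body, &summary); err != nil {
		return "", err
	}
	return summary, nil
}

// fail prints the error and exits; it keeps main short.
func fail(err error) {
	fmt.Fprintln(os.Stderr, err)
	os.Exit(1)
}

func main() {
	req, err := newSumupTextRequest("http://localhost:8080/v1", "Your long text content here...")
	if err != nil {
		fail(err)
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		fail(err)
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fail(err)
	}
	if resp.StatusCode < 200 || resp.StatusCode > 299 {
		fail(fmt.Errorf("sumup/text: HTTP %d: %s", resp.StatusCode, body))
	}
	summary, err := decodeSummary(body)
	if err != nil {
		fail(err)
	}
	fmt.Println(summary)
}
```

Very long inputs may run into URL length limits imposed by servers or proxies, so it is worth testing with realistic text sizes.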
Response:
"• Topic 1: Summary point about first topic\n• Topic 2: Summary point about second topic\n..."
Example using curl (-G with --data-urlencode keeps the text in the query string and URL-encodes it):
curl -X POST -G "http://localhost:8080/v1/sumup/text" \
  --data-urlencode "text=Your long text content here..."
Transcribe audio and automatically generate a summary.
Endpoint: POST /v1/sumup/audio
Request:
- Method: POST
- Content-Type: multipart/form-data
- Body: Audio file (key: "audio")
Response:
"• Topic 1: Summary of transcribed content\n• Topic 2: Another key point\n..."
Example using curl:
curl -X POST http://localhost:8080/v1/sumup/audio \
  -F "audio=@your-audio-file.mp3"
All endpoints return appropriate HTTP status codes and error messages:
400 Bad Request:
{
  "error": "Error description"
}
500 Internal Server Error:
{
  "error": "Internal server error description"
}
sumup4me/
├── cmd/
│   └── api/
│       └── main.go                     # Application entry point
├── internal/
│   ├── audio/
│   │   └── audio.go                    # Audio file validation utilities
│   ├── config/
│   │   └── config.go                   # Application configuration and DI setup
│   ├── controller/
│   │   ├── sumup.controller.go         # Summary and combined workflow endpoints
│   │   └── transcribing.controller.go  # Transcription endpoints
│   ├── dao/
│   │   ├── sumup.dao.go                # Gemini AI integration
│   │   └── transcribing.dao.go         # OpenAI Whisper integration
│   ├── model/
│   │   ├── transcription.model.go      # Data models
│   │   └── errors.go                   # Error models
│   └── service/
│       ├── sumup.service.go            # Business logic for summarization
│       └── transcribing.service.go     # Business logic for transcription
├── go.mod                              # Go module definition
├── go.sum                              # Go module checksums
└── README.md                           # This file
Run from source:
go run ./cmd/api
Build a smaller binary with symbol and debug information stripped:
go build -ldflags="-s -w" -o sumup4me ./cmd/api
You can test the API using the provided curl examples or any HTTP client like Postman, Insomnia, or HTTPie.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
- The transcription service is configured for Portuguese language by default
- Audio files are processed in memory - ensure adequate server resources for large files
- Request handling is logged comprehensively for debugging and monitoring
- The summarization uses Gemini 2.0 Flash model for optimal performance and quality
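Since audio is processed in memory, one way to protect the server is to cap the request body size. The sketch below uses the standard library's `http.MaxBytesReader`; it is an illustration only. `limitBody`, `readAll`, and `uploadStatus` are hypothetical names, the 100-byte limit is chosen purely for the demo, and the project's own router and handlers may be wired differently.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// limitBody caps how many body bytes downstream handlers may read.
func limitBody(max int64, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		r.Body = http.MaxBytesReader(w, r.Body, max)
		next.ServeHTTP(w, r)
	})
}

// readAll stands in for a handler that buffers the whole upload in memory.
func readAll(w http.ResponseWriter, r *http.Request) {
	data, err := io.ReadAll(r.Body)
	var tooBig *http.MaxBytesError
	if errors.As(err, &tooBig) {
		http.Error(w, "audio file too large", http.StatusRequestEntityTooLarge)
		return
	}
	if err != nil {
		http.Error(w, "could not read body", http.StatusBadRequest)
		return
	}
	fmt.Fprintf(w, "read %d bytes", len(data))
}

// uploadStatus sends size bytes through the limited handler in-process
// and returns the resulting HTTP status code.
func uploadStatus(size int, max int64) int {
	req := httptest.NewRequest(http.MethodPost, "/v1/transcribe", bytes.NewReader(make([]byte, size)))
	rec := httptest.NewRecorder()
	limitBody(max, http.HandlerFunc(readAll)).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(uploadStatus(10, 100))  // 200: within the limit
	fmt.Println(uploadStatus(200, 100)) // 413: body exceeds the limit
}
```

In a real deployment the limit would be sized to the largest audio file the server is expected to handle.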
- Store API keys securely and never commit them to version control
- Use environment variables or secure secret management systems in production
- Consider implementing rate limiting and authentication for production deployments
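As one possible starting point for the authentication suggestion above, the following sketch wraps a handler in a shared-secret check using only the standard library. It is not part of the project: the `X-API-Key` header name, the `requireAPIKey` and `statusFor` helpers, and the demo secret are all assumptions made for illustration.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// requireAPIKey rejects requests whose X-API-Key header does not match expected.
// The header name is an assumption made for this sketch.
func requireAPIKey(expected string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		got := r.Header.Get("X-API-Key")
		// An empty expected key would let keyless requests through, so refuse it.
		// ConstantTimeCompare avoids leaking how many leading bytes matched.
		if expected == "" || subtle.ConstantTimeCompare([]byte(got), []byte(expected)) != 1 {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// statusFor sends one in-process request with the given key and returns the status code.
func statusFor(key string) int {
	ok := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	})
	req := httptest.NewRequest(http.MethodPost, "/v1/sumup/text", nil)
	if key != "" {
		req.Header.Set("X-API-Key", key)
	}
	rec := httptest.NewRecorder()
	requireAPIKey("secret-for-demo", ok).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor("secret-for-demo")) // 200
	fmt.Println(statusFor("wrong"))           // 401
}
```

In production the expected key would come from an environment variable or a secret manager rather than a literal in the source.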
Built with ❤️ using Go, OpenAI Whisper, and Google Gemini