Multi-Source Media Tool Server – Requirements Document
Multi-Source Media MCP Server (M3S): An MCP Tool Implementation for Multi-Source Image and Video Access & Generation
This project is an MCP (Model Context Protocol) tool implementation that gives large models and agent frameworks (e.g., Gemini, LangChain, ChatGPT agents) a unified interface for accessing and generating images and videos from multiple sources.
⚠️ Note: This is not a traditional backend service, but an MCP tool server that exposes its capabilities via the MCP SDK.
Goals
- Provide unified image/video access and generation capabilities for large models
- Offer transparent access to diverse content sources (3rd-party APIs, web crawlers, user uploads, AI generation)
- Ensure security, scalability, and maintainability
- Maintain comprehensive logging and request traceability
Tech Stack
- Language: Go 1.25.1
- MCP SDK: go-sdk (github.com/modelcontextprotocol/go-sdk)
- Additional Dependencies: HTTP client, environment variable management, and a shared utility package for generic MCP tool response handling.
- Project Structure: Modular design; for example (adjust and optimize as needed):

```text
gemini-mcp/
├── cmd/
│   └── server/
│       └── main.go                # Program entry point; initializes the MCP server
│
├── mcp/
│   ├── server.go                  # Creates the MCP server instance and registers all tools
│   ├── middleware/
│   │   └── logging.go             # Logging middleware for MCP tools
│   └── tools/
│       ├── pexels/
│       │   ├── client.go          # Pexels API client implementation
│       │   ├── handler.go         # MCP tool handlers for the Pexels API
│       │   └── schema.go          # Pexels tool input/output JSON schemas
│       │
│       ├── test_greet/
│       │   ├── handler.go         # Implements the "greet" tool logic
│       │   └── schema.go          # Greet tool input/output JSON schemas
│       │
│       └── tool_xxx/              # Placeholder for other tools
│           ├── handler.go
│           └── schema.go
│
├── test/
│   ├── tool_greet_test.go         # Unit tests for the greet tool
│   ├── tool_summary_test.go       # Unit tests for the summary tool
│   └── integration_test.go        # Integration tests (run after feature development)
│
├── config/
│   └── config.go                  # (Optional) Configuration loading: API keys, environment variables
│
├── utils/
│   ├── env.go                     # Environment variable expansion helpers
│   ├── logger.go                  # General utilities (logging, error handling, etc.)
│   └── mcp_tool_response.go       # Generic helpers for MCP tool responses (e.g., TextContent)
│
├── go.mod
├── go.sum
└── README.md
```
Third-Party Platform Integration
Objective: Provide a unified interface to retrieve images and videos from third-party platforms.
Supported Platforms:
- Unsplash – Image search and retrieval
- Pexels – Image and video search and retrieval
Features:
- Keyword-based search for photos and videos
- Retrieve the details of a single photo or video by ID
- Access curated collections of photos and videos
- Pagination support for search results and collections
- Optional filters for orientation, size, color (for photos), and locale
Extensibility:
- Architecture should easily accommodate new platforms
- Platform API keys configurable via environment variables
AI Image Generation and Editing
Objective: Enable AI-based image generation and editing capabilities accessible via MCP tools.
Features:
- Text-to-Image: Generate images from prompts
- Image-to-Image: Modify or regenerate images using a source image and a prompt
- Post-processing: Upscaling, style transfer, etc.
Configurable Backends: OpenAI, Stability AI, Hugging Face, local Stable Diffusion, etc.
Configurable Parameters: Prompt, size, style, random seed, etc.
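The configurable backends and parameters could be modeled with a request type and a backend interface. OpenAI, Stability AI, or a local Stable Diffusion server would then become interchangeable implementations. This is a sketch under assumptions:

- `GenerateRequest`, `ImageBackend`, `parseSize`, and `pickBackend` are hypothetical names.
- The `WIDTHxHEIGHT` size format is an assumption.
- `echoBackend` is a stand-in that returns the prompt bytes instead of calling any real service.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// GenerateRequest collects the configurable parameters (hypothetical field names).
type GenerateRequest struct {
	Prompt      string
	Size        string // assumed "WIDTHxHEIGHT" format, e.g. "1024x1024"
	Style       string
	Seed        *int64 // nil lets the backend choose a random seed
	SourceImage []byte // set for image-to-image; nil for text-to-image
}

// ImageBackend is a hypothetical interface implemented once per AI provider.
type ImageBackend interface {
	Generate(ctx context.Context, req GenerateRequest) ([]byte, error)
}

// parseSize validates a "WIDTHxHEIGHT" string and returns its dimensions.
func parseSize(s string) (w, h int, err error) {
	ws, hs, ok := strings.Cut(s, "x")
	if !ok {
		return 0, 0, fmt.Errorf("size %q: want WIDTHxHEIGHT", s)
	}
	if w, err = strconv.Atoi(ws); err != nil || w <= 0 {
		return 0, 0, fmt.Errorf("size %q: invalid width", s)
	}
	if h, err = strconv.Atoi(hs); err != nil || h <= 0 {
		return 0, 0, fmt.Errorf("size %q: invalid height", s)
	}
	return w, h, nil
}

// Validate checks the request before it is forwarded to any backend.
func (r GenerateRequest) Validate() error {
	if strings.TrimSpace(r.Prompt) == "" {
		return errors.New("prompt is required")
	}
	if r.Size != "" {
		if _, _, err := parseSize(r.Size); err != nil {
			return err
		}
	}
	return nil
}

// pickBackend selects a backend by its configured name.
func pickBackend(backends map[string]ImageBackend, name string) (ImageBackend, error) {
	b, ok := backends[name]
	if !ok {
		return nil, fmt.Errorf("unknown image backend %q", name)
	}
	return b, nil
}

// echoBackend is a stand-in that returns the prompt bytes instead of an image.
type echoBackend struct{}

func (echoBackend) Generate(_ context.Context, r GenerateRequest) ([]byte, error) {
	return []byte(r.Prompt), nil
}

func main() {
	backends := map[string]ImageBackend{"echo": echoBackend{}}
	req := GenerateRequest{Prompt: "a lighthouse at dusk", Size: "1024x768"}
	if err := req.Validate(); err != nil {
		panic(err)
	}
	b, err := pickBackend(backends, "echo")
	if err != nil {
		panic(err)
	}
	out, _ := b.Generate(context.Background(), req)
	fmt.Println(len(out))
}
```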
Web Image Crawling
Objective: Retrieve image content from search engines or web pages.
Features:
- Asynchronous web crawling for images
- Automatic filtering of duplicates or invalid links
- Pagination and crawl-depth control
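Duplicate and invalid-link filtering can happen before any image is downloaded. The hypothetical `FilterImageLinks` below does three things:

- resolves relative links against the page URL
- keeps only `http`/`https` links whose path ends in an assumed image extension
- treats links that differ only in host case or `#fragment` as duplicates

Asynchronous fetching and crawl-depth control are not shown.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"strings"
)

// imageExts is an assumed allow-list of image file extensions.
var imageExts = map[string]bool{".jpg": true, ".jpeg": true, ".png": true, ".gif": true, ".webp": true}

// FilterImageLinks resolves links against pageURL, drops unparsable, non-HTTP(S)
// and non-image links, and removes duplicates while preserving first-seen order.
func FilterImageLinks(pageURL string, links []string) ([]string, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return nil, err
	}
	seen := make(map[string]bool)
	var out []string
	for _, raw := range links {
		ref, err := url.Parse(strings.TrimSpace(raw))
		if err != nil {
			continue // invalid link, e.g. a bad %-escape
		}
		u := base.ResolveReference(ref)
		if u.Scheme != "http" && u.Scheme != "https" {
			continue // skip javascript:, data:, mailto:, etc.
		}
		if !imageExts[strings.ToLower(path.Ext(u.Path))] {
			continue
		}
		// Normalize so trivially different spellings dedupe to one key.
		u.Fragment, u.RawFragment = "", ""
		u.Host = strings.ToLower(u.Host)
		key := u.String()
		if seen[key] {
			continue
		}
		seen[key] = true
		out = append(out, key)
	}
	return out, nil
}

func main() {
	links := []string{"/img/a.png", "https://EXAMPLE.com/img/a.png#top", "b.JPG", "javascript:void(0)", "notes.txt"}
	imgs, err := FilterImageLinks("https://example.com/gallery/", links)
	if err != nil {
		panic(err)
	}
	fmt.Println(imgs)
}
```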
Compliance:
- Respect robots.txt
- Avoid copyright violations
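A simplified robots.txt check is sketched below. It assumes the crawler identifies itself only under the `*` user agent. It follows the longest-match rule from RFC 9309, with `Allow` winning ties. It deliberately ignores `*` and `$` wildcards and agent-specific groups, so a production crawler should use a complete parser. `parseRobots` and `Allowed` are hypothetical names, and fetching `/robots.txt` itself is omitted.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// robotsRules holds Allow/Disallow path prefixes that apply to "User-agent: *".
type robotsRules struct {
	allow, disallow []string
}

// parseRobots is a simplified parser: it only keeps rules from groups that
// name the "*" agent and treats every rule as a plain path prefix.
func parseRobots(body string) robotsRules {
	var r robotsRules
	inStar := false       // the current group applies to "*"
	lastWasAgent := false // consecutive User-agent lines form one group
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		line, _, _ := strings.Cut(sc.Text(), "#") // strip comments
		key, val, ok := strings.Cut(line, ":")
		if !ok {
			continue
		}
		key = strings.ToLower(strings.TrimSpace(key))
		val = strings.TrimSpace(val)
		switch key {
		case "user-agent":
			if !lastWasAgent {
				inStar = false // a new group starts
			}
			if val == "*" {
				inStar = true
			}
			lastWasAgent = true
		case "allow", "disallow":
			lastWasAgent = false
			if !inStar || val == "" {
				continue // an empty Disallow allows everything
			}
			if key == "allow" {
				r.allow = append(r.allow, val)
			} else {
				r.disallow = append(r.disallow, val)
			}
		default:
			lastWasAgent = false
		}
	}
	return r
}

// Allowed reports whether path may be crawled: the longest matching rule
// wins, and Allow wins a tie (RFC 9309 precedence, without wildcards).
func (r robotsRules) Allowed(path string) bool {
	best, allowed := -1, true
	for _, p := range r.disallow {
		if strings.HasPrefix(path, p) && len(p) > best {
			best, allowed = len(p), false
		}
	}
	for _, p := range r.allow {
		if strings.HasPrefix(path, p) && len(p) >= best {
			best, allowed = len(p), true
		}
	}
	return allowed
}

func main() {
	rules := parseRobots("User-agent: *\nDisallow: /private/\nAllow: /private/public/\n")
	fmt.Println(rules.Allowed("/private/x.png"), rules.Allowed("/private/public/y.png"), rules.Allowed("/img/z.png"))
}
```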
User Image Uploads
Objective: Allow users to upload and manage their own images.
Features:
- Image upload
- List and manage uploaded images
- Regenerate or edit uploaded images using AI tools
- Local or cloud storage integration (e.g., AWS S3, Google Cloud Storage)
MCP Tool Requirements
- Each functional module must be exposed as an MCP tool
- Maintain a consistent tool interface for LLM invocation
- Support both the stdio and Streamable HTTP transports
- Input, output, and error structures must be strongly typed and schema-annotated
- Each tool must have clear documentation, including parameters, return types, and examples
- The design should emphasize extensibility, enabling the addition of new data sources or AI features
Simple Example:

```go
package main

import (
	"context"
	"log"
	"net/http"

	"github.com/modelcontextprotocol/go-sdk/mcp"
)

// greetArgs is the strongly typed input of the "greet" tool; the jsonschema
// tag supplies the field description in the generated input schema.
type greetArgs struct {
	Name string `json:"name" jsonschema:"the person to greet"`
}

func main() {
	server := mcp.NewServer(&mcp.Implementation{Name: "greeter", Version: "v1.0.0"}, nil)

	// Register the "greet" tool; the SDK decodes the call arguments into greetArgs.
	mcp.AddTool(server, &mcp.Tool{
		Name:        "greet",
		Description: "say hi",
	}, func(ctx context.Context, req *mcp.CallToolRequest, args greetArgs) (*mcp.CallToolResult, any, error) {
		return &mcp.CallToolResult{
			Content: []mcp.Content{
				&mcp.TextContent{Text: "Hi " + args.Name},
			},
		}, nil, nil
	})

	// Option 1: serve over stdio (e.g., when launched by a desktop client).
	// if err := server.Run(context.Background(), &mcp.StdioTransport{}); err != nil {
	// 	log.Fatalf("Server failed: %v", err)
	// }

	// Option 2: serve over Streamable HTTP at /mcp.
	handler := mcp.NewStreamableHTTPHandler(func(r *http.Request) *mcp.Server { return server }, nil)
	http.Handle("/mcp", handler)
	addr := "127.0.0.1:18061"
	log.Printf("✅ MCP server running on http://%s/mcp", addr)
	if err := http.ListenAndServe(addr, nil); err != nil {
		log.Fatalf("Server failed: %v", err)
	}
}
```

Non-Functional Requirements

| Category | Requirement |
|---|---|
| Performance | Support concurrent access; async handling for multi-source queries and generation |
| Error Handling | Implement retry mechanisms; standardized error responses |
| Scalability | New data sources or AI backends can be added with minimal intrusion |
| Security | Validate uploaded files; prevent malicious input or unsafe content |
| Configuration | API keys, AI backends, and other parameters configurable via environment or config files |
Acceptance Criteria
- Upon startup, the MCP server should register the following tools:
  - Image Search
  - Video Search
  - AI Image Generation
  - AI Image Editing
  - User Image Management
- The upper-level LLM should be able to call these tools seamlessly through MCP.
- All tools should return standardized response structures containing metadata (e.g., source, size, URL).
Future Extensions
- Support for Text-to-Video generation
- Embedding-based similarity search
- User access control and permissions
- Caching and performance optimization
- AI-based tagging, categorization, and caption generation
Summary
- This project implements an MCP tool server in Go
- It provides multi-source image and video access & generation capabilities
- Emphasizes standardized MCP interfaces, modularity, extensibility, and security