
Multi-Source Media Tool Server – Requirements Document

1. Project Name

Multi-Source Media MCP Server (M3S): An MCP Tool Implementation for Multi-Source Image and Video Access & Generation


2. Project Overview

This project is an MCP (Model Context Protocol) tool implementation. It provides a unified interface through which large language models and agent frameworks (e.g., Gemini, LangChain, ChatGPT agents) can access and generate images and videos from multiple sources.

⚠️ Note: This is not a traditional backend service, but an MCP tool server that exposes its capabilities via the MCP SDK.

Project Goals

  • Provide unified image/video access and generation capabilities for large models
  • Offer transparent access to diverse content sources (3rd-party APIs, web crawlers, user uploads, AI generation)
  • Ensure security, scalability, and maintainability
  • Maintain comprehensive logging and request traceability

3. Development Environment

  • Language: Go 1.25.1
  • MCP SDK: go-sdk (github.com/modelcontextprotocol/go-sdk)
  • Additional Dependencies: HTTP client, environment variable management, and a shared utility package for generic MCP tool response handling.
  • Project Structure: Modular design. Example layout (can be adjusted and optimized as needed):
gemini-mcp/
├── cmd/
│   └── server/
│       └── main.go               # Program entry point, initializes MCP Server
│
├── mcp/
│   ├── server.go                 # Creates MCP server instance, registers all tools
│   ├── middleware/
│   │   └── logging.go            # Logging middleware for MCP tools
│   └── tools/
│       ├── pexels/
│       │   ├── client.go         # Pexels API client implementation
│       │   ├── handler.go        # MCP tool handlers for Pexels API
│       │   └── schema.go         # Defines Pexels tool input/output JSON schemas
│       │
│       ├── test_greet/
│       │   ├── handler.go        # Implements "greet" tool logic
│       │   └── schema.go         # Defines greet tool input/output JSON schemas
│       │
│       └── tool_xxx/             # Placeholder for other tools
│           ├── handler.go
│           └── schema.go
│
├── test/
│   ├── tool_greet_test.go        # Unit tests for greet tool
│   ├── tool_summary_test.go      # Unit tests for summary tool
│   └── integration_test.go       # Integration tests (run after feature development)
│
├── config/
│   └── config.go                 # (Optional) Configuration loading, API Key, environment variables
│
├── utils/
│   ├── env.go                    # Utility functions for environment variable expansion
│   ├── logger.go                 # General utility functions (logging, error handling, etc.)
│   └── mcp_tool_response.go      # Generic handler for MCP tool responses (e.g., TextContent)
│
├── go.mod
├── go.sum
└── README.md

4. Functional Requirements

4.1 Multi-Source Image and Video Access

Objective: Provide a unified interface to retrieve images and videos from third-party platforms.

Supported Platforms:

  • Unsplash – Image search and retrieval
  • Pexels – Image and video search and retrieval

Features:

  • Keyword-based search for photos and videos
  • Retrieve single photo or video details by ID
  • Access curated collections of photos and videos
  • Pagination support for search results and collections
  • Optional filters for orientation, size, color (for photos), and locale

Extensibility:

  • Architecture should easily accommodate new platforms
  • Platform API keys configurable via environment variables

4.2 AI Image Generation and Editing

Objective: Enable AI-based image generation and editing capabilities accessible via MCP tools.

Features:

  • Text-to-Image: Generate images from prompts
  • Image-to-Image: Modify or regenerate images using a source image and a prompt
  • Post-processing: Upscaling, style transfer, etc.

Configurable Backends: OpenAI, Stability AI, Hugging Face, local Stable Diffusion, etc.

Configurable Parameters: Prompt, size, style, random seed, etc.
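Here is a sketch of how these parameters could be represented and validated before any backend is called. The field names, backend identifiers, default size (1024x1024), and allowed range are assumptions for illustration only. A zero seed is replaced with a random one so that the seed actually used can be reported back for reproducibility.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
	"strings"
)

// GenerationParams collects the configurable parameters listed above.
// Field names, defaults, and limits are hypothetical.
type GenerationParams struct {
	Backend string // e.g. "openai", "stability", "huggingface", "local-sd"
	Prompt  string
	Width   int
	Height  int
	Style   string
	Seed    int64 // 0 means "pick a random seed"
}

// allowedBackends is a placeholder allow-list; the real set would come from configuration.
var allowedBackends = map[string]bool{"openai": true, "stability": true, "huggingface": true, "local-sd": true}

// Normalize fills in defaults and rejects invalid input before any backend is called.
func (p *GenerationParams) Normalize() error {
	p.Prompt = strings.TrimSpace(p.Prompt)
	if p.Prompt == "" {
		return errors.New("prompt must not be empty")
	}
	if p.Backend == "" {
		p.Backend = "local-sd"
	}
	if !allowedBackends[p.Backend] {
		return fmt.Errorf("unsupported backend %q", p.Backend)
	}
	if p.Width == 0 {
		p.Width = 1024
	}
	if p.Height == 0 {
		p.Height = 1024
	}
	if p.Width < 64 || p.Width > 2048 || p.Height < 64 || p.Height > 2048 {
		return fmt.Errorf("size %dx%d out of range [64, 2048]", p.Width, p.Height)
	}
	// Draw a non-zero seed so the chosen value can be returned and reused later.
	for p.Seed == 0 {
		p.Seed = rand.Int63()
	}
	return nil
}

func main() {
	p := GenerationParams{Prompt: "  a lighthouse at dusk  ", Style: "watercolor"}
	if err := p.Normalize(); err != nil {
		fmt.Println("invalid:", err)
		return
	}
	fmt.Printf("%s %dx%d seed=%d\n", p.Backend, p.Width, p.Height, p.Seed)
}
```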


4.3 Search Engine Crawling

Objective: Retrieve image content from search engines or web pages.

Features:

  • Asynchronous web crawling for images
  • Automatic filtering of duplicates or invalid links
  • Pagination and crawl-depth control

Compliance:

  • Respect robots.txt
  • Avoid copyright violations
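For the robots.txt requirement, below is a deliberately simplified checker. It reads only `User-agent: *` groups and ignores wildcards, `Crawl-delay`, and `Sitemap`. Conflicts are resolved by the longest matching path, with `Allow` winning ties, as RFC 9309 specifies. `ParseRobots` and `Rules` are hypothetical names, and a maintained parser is preferable in production.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// Rules holds the Allow/Disallow path prefixes that apply to our crawler.
type Rules struct{ allow, disallow []string }

// ParseRobots is a simplified parser that only honours "User-agent: *" groups.
func ParseRobots(body string) Rules {
	var r Rules
	applies := false  // does the current group apply to us?
	inAgents := false // are we inside a run of consecutive User-agent lines?
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		line := sc.Text()
		if i := strings.IndexByte(line, '#'); i >= 0 {
			line = line[:i] // strip comments
		}
		key, val, ok := strings.Cut(line, ":")
		if !ok {
			continue
		}
		key, val = strings.ToLower(strings.TrimSpace(key)), strings.TrimSpace(val)
		switch key {
		case "user-agent":
			if !inAgents {
				applies = false // a new group starts
			}
			inAgents = true
			if val == "*" {
				applies = true
			}
		case "allow", "disallow":
			inAgents = false
			if !applies || val == "" {
				continue // rule for another agent, or an empty rule that matches nothing
			}
			if key == "allow" {
				r.allow = append(r.allow, val)
			} else {
				r.disallow = append(r.disallow, val)
			}
		}
	}
	return r
}

// Allowed applies the longest matching prefix; Allow wins a tie with Disallow.
func (r Rules) Allowed(path string) bool {
	best, allowed := -1, true
	for _, p := range r.disallow {
		if strings.HasPrefix(path, p) && len(p) > best {
			best, allowed = len(p), false
		}
	}
	for _, p := range r.allow {
		if strings.HasPrefix(path, p) && len(p) >= best {
			best, allowed = len(p), true
		}
	}
	return allowed
}

func main() {
	r := ParseRobots("User-agent: *\nDisallow: /private/\nAllow: /private/public/\n")
	fmt.Println(r.Allowed("/private/a.jpg"), r.Allowed("/private/public/a.jpg"))
}
```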

4.4 User-Owned Image Management

Objective: Allow users to upload and manage their own images.

Features:

  • Image upload
  • List and manage uploaded images
  • Regenerate or edit uploaded images using AI tools
  • Local or cloud storage integration (e.g., AWS S3, Google Cloud Storage)

5. MCP Tool Design Requirements

  • Each functional module must be exposed as an MCP Tool
  • Maintain a consistent tool interface for LLM invocation
  • Support both the STDIO and Streamable HTTP transports
  • Input/output/error structures must be strongly typed and schema-annotated
  • Each tool must have clear documentation including parameters, return types, and examples
  • Design should emphasize extensibility, enabling addition of new data sources or AI features

Simple Example:

package main

import (
	"context"
	"log"
	"net/http"

	"github.com/modelcontextprotocol/go-sdk/mcp"
)

func main() {
	// Create the MCP server; Implementation identifies it to clients during initialization.
	server := mcp.NewServer(&mcp.Implementation{Name: "greeter", Version: "v1.0.0"}, nil)

	// Typed tool input: the SDK derives the input JSON schema from this struct,
	// and the jsonschema tag becomes the field description.
	type args struct {
		Name string `json:"name" jsonschema:"the person to greet"`
	}

	mcp.AddTool(server, &mcp.Tool{
		Name:        "greet",
		Description: "say hi",
	}, func(ctx context.Context, req *mcp.CallToolRequest, args args) (*mcp.CallToolResult, any, error) {
		return &mcp.CallToolResult{
			Content: []mcp.Content{
				&mcp.TextContent{Text: "Hi " + args.Name},
			},
		}, nil, nil
	})

	// Option 1: STDIO transport (uncomment to use it instead of HTTP).
	// if err := server.Run(context.Background(), &mcp.StdioTransport{}); err != nil {
	// 	log.Fatalf("Server failed: %v", err)
	// }

	// Option 2: Streamable HTTP transport; every request is served by the same server instance.
	handler := mcp.NewStreamableHTTPHandler(func(r *http.Request) *mcp.Server { return server }, nil)
	http.Handle("/mcp", handler)

	addr := "127.0.0.1:18061"
	log.Printf("✅ MCP server running on http://%s/mcp", addr)

	if err := http.ListenAndServe(addr, nil); err != nil {
		log.Fatalf("Server failed: %v", err)
	}
}

6. Non-Functional Requirements

| Category | Requirement |
|---|---|
| Performance | Support concurrent access; handle multi-source queries and generation asynchronously |
| Error Handling | Implement retry mechanisms; return standardized error responses |
| Scalability | New data sources or AI backends can be added with minimal intrusion |
| Security | Validate uploaded files; prevent malicious input or unsafe content |
| Configuration | API keys, AI backends, and other parameters configurable via environment variables or config files |

8. Expected Outputs and Behaviors

  • Upon startup, the MCP server should register the following tools:

    • Image Search
    • Video Search
    • AI Image Generation
    • AI Image Editing
    • User Image Management
  • The calling LLM should be able to invoke these tools seamlessly through MCP

  • All tools should return standardized response structures containing metadata (e.g., source, size, URL)
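Here is one possible shape for the standardized response, sketched with hypothetical type and field names that cover the metadata mentioned above (source, size, URL). The JSON text could be returned in an `mcp.TextContent`, as in the greet example, or as structured tool output.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// MediaItem is a hypothetical standardized result entry; every tool would
// return items in this shape regardless of the underlying source.
type MediaItem struct {
	Source string `json:"source"` // e.g. "pexels", "unsplash", "upload", "generated"
	Type   string `json:"type"`   // "image" or "video"
	URL    string `json:"url"`
	Width  int    `json:"width,omitempty"`
	Height int    `json:"height,omitempty"`
	Author string `json:"author,omitempty"`
}

// MediaResponse wraps the items with pagination info.
type MediaResponse struct {
	Items      []MediaItem `json:"items"`
	Page       int         `json:"page"`
	TotalCount int         `json:"total_count,omitempty"`
}

// ToText serializes the response so it can be returned as the text of a tool result.
func (r MediaResponse) ToText() (string, error) {
	b, err := json.Marshal(r)
	return string(b), err
}

func main() {
	resp := MediaResponse{
		Items: []MediaItem{{Source: "pexels", Type: "image", URL: "https://example.com/1.jpg", Width: 1920, Height: 1080}},
		Page:  1,
	}
	text, err := resp.ToText()
	if err != nil {
		panic(err)
	}
	fmt.Println(text)
}
```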


9. Future Extensions

  • Support for Text-to-Video generation
  • Embedding-based similarity search
  • User access control and permissions
  • Caching and performance optimization
  • AI-based tagging, categorization, and caption generation

10. Summary

  • This project implements an MCP Tool Server in Go
  • It provides multi-source image and video access & generation capabilities
  • Emphasizes standardized MCP interfaces, modularity, extensibility, and security

11. Detailed Model Context Protocol Documentation Links