Gemini Memories

Multi-Source Media Tool Server – Requirements Document

1. Project Name

Multi-Source Media MCP Server (M3S): An MCP Tool Implementation for Multi-Source Image and Video Access & Generation

2. Project Overview

This project is an MCP (Model Context Protocol) tool implementation. It provides a unified interface through which large language models and agent frameworks (e.g., Gemini, LangChain, ChatGPT Agents) can access and generate images and videos from multiple sources.

⚠️ Note: This is not a traditional backend service, but an MCP tool server that exposes its capabilities via the MCP SDK.

Project Goals

Provide unified image/video access and generation capabilities for large models
Offer transparent access to diverse content sources (3rd-party APIs, web crawlers, user uploads, AI generation)
Ensure security, scalability, and maintainability
Maintain comprehensive logging and request traceability

3. Development Environment

Language: Go 1.25.1
MCP SDK: go-sdk (github.com/modelcontextprotocol/go-sdk, the official Go SDK for MCP)
Additional Dependencies: HTTP client, environment variable management, and a shared utility package for generic MCP tool response handling.
Project Structure: A modular layout such as the following (adjust and refine as needed):

gemini-mcp/
├── cmd/
│   └── server/
│       └── main.go               # Program entry point, initializes MCP Server
│
├── mcp/
│   ├── server.go                 # Creates MCP server instance, registers all tools
│   ├── middleware/
│   │   └── logging.go            # Logging middleware for MCP tools
│   └── tools/
│       ├── pexels/
│       │   ├── client.go         # Pexels API client implementation
│       │   ├── handler.go        # MCP tool handlers for Pexels API
│       │   └── schema.go         # Defines Pexels tool input/output JSON schemas
│       │
│       ├── test_greet/
│       │   ├── handler.go        # Implements "greet" tool logic
│       │   └── schema.go         # Defines greet tool input/output JSON schemas
│       │
│       └── tool_xxx/             # Placeholder for other tools
│           ├── handler.go
│           └── schema.go
│
├── test/
│   ├── tool_greet_test.go        # Unit tests for greet tool
│   ├── tool_summary_test.go      # Unit tests for summary tool
│   └── integration_test.go       # Integration tests (run after feature development)
│
├── config/
│   └── config.go                 # (Optional) Configuration loading, API Key, environment variables
│
├── utils/
│   ├── env.go                    # Utility functions for environment variable expansion
│   ├── logger.go                 # General utility functions (logging, error handling, etc.)
│   └── mcp_tool_response.go      # Generic handler for MCP tool responses (e.g., TextContent)
│
├── go.mod
├── go.sum
└── README.md

4. Functional Requirements

4.1 Multi-Source Image and Video Access

Objective: Provide a unified interface to retrieve images and videos from third-party platforms.

Supported Platforms:

Unsplash – Image search and retrieval
Pexels – Image and video search and retrieval

Features:

Keyword-based search for photos and videos
Retrieve single photo or video details by ID
Access curated collections of photos and videos
Pagination support for search results and collections
Optional filters for orientation, size, color (for photos), and locale

Extensibility:

Architecture should easily accommodate new platforms
Platform API keys configurable via environment variables

4.2 AI Image Generation and Editing

Objective: Enable AI-based image generation and editing capabilities accessible via MCP tools.

Features:

Text-to-Image: Generate images from prompts
Image-to-Image: Modify or regenerate images using a source image and a prompt
Post-processing: Upscaling, style transfer, etc.

Configurable Backends: OpenAI, Stability AI, Hugging Face, local Stable Diffusion, etc.

Configurable Parameters: Prompt, size, style, random seed, etc.

4.3 Search Engine Crawling

Objective: Retrieve image content from search engines or web pages.

Features:

Asynchronous web crawling for images
Automatic filtering of duplicates or invalid links
Pagination and crawl-depth control

Compliance:

Respect robots.txt
Avoid copyright violations
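The following is a deliberately simplified robots.txt checker that illustrates the compliance requirement. It reads only `Disallow` lines in `User-agent: *` groups and matches them as path prefixes. It ignores `Allow` precedence, wildcards, and the agent-specific group rules that RFC 9309 defines. A production crawler would more likely use a maintained parser. `RobotsRules` and `ParseRobots` are hypothetical names.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// RobotsRules holds the Disallow prefixes that apply to all user agents.
type RobotsRules struct {
	disallow []string
}

// ParseRobots keeps Disallow rules from "User-agent: *" groups (simplified).
func ParseRobots(body string) RobotsRules {
	var rules RobotsRules
	inStarGroup := false
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		line := sc.Text()
		if i := strings.IndexByte(line, '#'); i >= 0 {
			line = line[:i] // strip comments
		}
		key, val, ok := strings.Cut(line, ":")
		if !ok {
			continue
		}
		key = strings.ToLower(strings.TrimSpace(key))
		val = strings.TrimSpace(val)
		switch key {
		case "user-agent":
			inStarGroup = val == "*"
		case "disallow":
			// An empty Disallow value means "allow everything", so skip it.
			if inStarGroup && val != "" {
				rules.disallow = append(rules.disallow, val)
			}
		}
	}
	return rules
}

// Allowed reports whether path may be fetched under the parsed rules.
func (r RobotsRules) Allowed(path string) bool {
	for _, prefix := range r.disallow {
		if strings.HasPrefix(path, prefix) {
			return false
		}
	}
	return true
}

func main() {
	rules := ParseRobots("User-agent: *\nDisallow: /private/\n\nUser-agent: BadBot\nDisallow: /\n")
	fmt.Println(rules.Allowed("/images/cat.jpg"), rules.Allowed("/private/cat.jpg"))
}
```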

4.4 User-Owned Image Management

Objective: Allow users to upload and manage their own images.

Features:

Image upload
List and manage uploaded images
Regenerate or edit uploaded images using AI tools
Local or cloud storage integration (e.g., AWS S3, Google Cloud Storage)

5. MCP Tool Design Requirements

Each functional module must be exposed as an MCP Tool
Maintain a consistent tool interface for LLM invocation
Support both the stdio and Streamable HTTP transports
Input/output/error structures must be strongly typed and schema-annotated
Each tool must have clear documentation including parameters, return types, and examples
Design should emphasize extensibility, enabling addition of new data sources or AI features

Simple Example:

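package main

import (
	"context"
	"flag"
	"log"
	"net/http"

	"github.com/modelcontextprotocol/go-sdk/mcp"
)

// greetArgs is the typed input of the "greet" tool; the SDK infers the tool's
// JSON input schema from it, using the jsonschema tag as the field description.
type greetArgs struct {
	Name string `json:"name" jsonschema:"the person to greet"`
}

func main() {
	// Select the transport at startup instead of editing commented-out code.
	useStdio := flag.Bool("stdio", false, "serve over stdio instead of Streamable HTTP")
	addr := flag.String("addr", "127.0.0.1:18061", "listen address for Streamable HTTP")
	flag.Parse()

	server := mcp.NewServer(&mcp.Implementation{Name: "greeter", Version: "v0.1.0"}, nil)

	mcp.AddTool(server, &mcp.Tool{
		Name:        "greet",
		Description: "say hi",
	}, func(ctx context.Context, req *mcp.CallToolRequest, args greetArgs) (*mcp.CallToolResult, any, error) {
		return &mcp.CallToolResult{
			Content: []mcp.Content{
				&mcp.TextContent{Text: "Hi " + args.Name},
			},
		}, nil, nil
	})

	// 1. stdio transport: the client launches this binary and exchanges messages
	// over stdin/stdout, so logs must go to stderr (the log package default).
	if *useStdio {
		if err := server.Run(context.Background(), &mcp.StdioTransport{}); err != nil {
			log.Fatalf("Server failed: %v", err)
		}
		return
	}

	// 2. Streamable HTTP transport: every HTTP request is served by the same server.
	handler := mcp.NewStreamableHTTPHandler(func(r *http.Request) *mcp.Server { return server }, nil)
	http.Handle("/mcp", handler)

	log.Printf("✅ MCP server running on http://%s/mcp", *addr)
	if err := http.ListenAndServe(*addr, nil); err != nil {
		log.Fatalf("Server failed: %v", err)
	}
}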

6. Non-Functional Requirements

Category	Requirement
Performance	Support concurrent access; async handling for multi-source queries and generation
Error Handling	Implement retry mechanisms; standardized error responses
Scalability	New data sources or AI backends can be added with minimal intrusion
Security	Validate uploaded files; prevent malicious input or unsafe content
Configuration	API keys, AI backends, and other parameters configurable via environment or config files

8. Expected Outputs and Behaviors

Upon startup, the MCP server should register the following tools:
- Image Search
- Video Search
- AI Image Generation
- AI Image Editing
- User Image Management
The upstream LLM should be able to call these tools seamlessly through the MCP protocol
All tools should return standardized response structures containing metadata (e.g., source, size, URL)

9. Future Extensions

Support for Text-to-Video generation
Embedding-based similarity search
User access control and permissions
Caching and performance optimization
AI-based tagging, categorization, and caption generation

10. Summary

This project implements an MCP Tool Server in Go
It provides multi-source image and video access & generation capabilities
Emphasizes standardized MCP interfaces, modularity, extensibility, and security

11. Detailed Model Context Protocol Documentation Links

Model Context Protocol: The open protocol that connects AI applications to the systems where context lives
Example Clients: A list of applications that support MCP integrations
Antitrust Policy: MCP Project Antitrust Policy for participants and contributors
Contributor Communication: Communication strategy and framework for the Model Context Protocol community
Governance and Stewardship: Learn about the Model Context Protocol's governance structure and how to participate in the community
SEP Guidelines: Specification Enhancement Proposal (SEP) guidelines for proposing changes to the Model Context Protocol
Working and Interest Groups: Learn about the two forms of collaborative groups within the Model Context Protocol's governance structure - Working Groups and Interest Groups.
Roadmap: Our plans for evolving Model Context Protocol
Build an MCP client: Get started building your own client that can integrate with all MCP servers.
Build an MCP server: Get started building your own server to use in Claude for Desktop and other clients.
Connect to local MCP servers: Learn how to extend Claude Desktop with local MCP servers to enable file system access and other powerful integrations
Connect to remote MCP Servers: Learn how to connect Claude to remote MCP servers and extend its capabilities with internet-hosted tools and data sources
What is the Model Context Protocol (MCP)?
Architecture overview
Understanding MCP clients
Understanding MCP servers
SDKs: Official SDKs for building with Model Context Protocol
MCP Inspector: In-depth guide to using the MCP Inspector for testing and debugging Model Context Protocol servers
Understanding Authorization in MCP: Learn how to implement secure authorization for MCP servers using OAuth 2.1 to protect sensitive resources and operations
Example Servers: A list of example servers and implementations
Architecture
Authorization
Overview
Lifecycle
Security Best Practices
Transports
Cancellation
Ping
Progress
Key Changes
Elicitation
Roots
Sampling
Specification
Schema Reference
Overview
Prompts
Resources
Tools
Completion
Logging
Pagination
Versioning
