Commit fc7e639

docs: update README with actual implementation details and fix mcpServers config
1 parent 8a6dea1 commit fc7e639

1 file changed (+88 -68 lines)

README.md

Lines changed: 88 additions & 68 deletions
@@ -10,64 +10,56 @@ An MCP (Model Context Protocol) server for interacting with the Internet Archive
 
 This MCP server provides tools to:
 - Save web pages to the Wayback Machine
-- Retrieve archived versions of web pages
-- Check archive status and availability
-- Search the Wayback Machine CDX API
+- Retrieve archived versions of web pages
+- Check archive status and statistics
+- Search the Wayback Machine CDX API for available snapshots
 
 ## Features
 
 - **No API keys required** - Uses public Wayback Machine endpoints
 - **Save pages** - Archive any publicly accessible URL
-- **Retrieve archives** - Get archived versions with timestamps
-- **Verify archives** - Check if saves were successful
-- **Search archives** - Query available snapshots for a URL
-
-## Architecture Plan
-
-### Core Tools
-
-1. **save_url**
-   - Triggers archiving of a URL
-   - Returns the archive timestamp and URL
-   - Handles rate limiting and retries
-
-2. **get_archived_url**
-   - Retrieves the most recent archived version
-   - Option to specify a specific timestamp
-   - Returns the wayback URL
-
-3. **check_archive_status**
-   - Verifies if an archive request completed
-   - Returns status and final archive URL
-
-4. **search_archives**
-   - Query CDX API for available snapshots
-   - Filter by date range, status code, mimetype
-   - Support different match types (exact, prefix, host, domain)
-   - Return list of available versions with metadata
-
-5. **get_archive_availability**
-   - Check if a URL has been archived
-   - Return closest snapshot to a given timestamp
-   - Return summary of archive coverage
-
-6. **get_timemap**
-   - Retrieve TimeMap for a URL (all available timestamps)
-   - Returns list of all archived versions
-   - Implements Memento Protocol
-
-7. **search_internet_archive**
-   - Search across Internet Archive collections
-   - Not limited to Wayback Machine
-   - Find related archived content
-
-### Technical Implementation
+- **Retrieve archives** - Get archived versions with optional timestamps
+- **Archive statistics** - Get capture counts and yearly statistics
+- **Search archives** - Query available snapshots with date filtering
+- **Rate limiting** - Built-in rate limiting to respect service limits
+
+## Tools
+
+### 1. **save_url**
+Archive a URL to the Wayback Machine.
+- **Input**: `url` (required) - The URL to save
+- **Output**: Success status, archived URL, and timestamp
+- Handles rate limiting automatically
+
+### 2. **get_archived_url**
+Retrieve an archived version of a URL.
+- **Input**:
+  - `url` (required) - The URL to retrieve
+  - `timestamp` (optional) - Specific timestamp (YYYYMMDDhhmmss) or "latest"
+- **Output**: Archived URL, timestamp, and availability status
+
+### 3. **search_archives**
+Search for all archived versions of a URL.
+- **Input**:
+  - `url` (required) - The URL to search for
+  - `from` (optional) - Start date (YYYY-MM-DD)
+  - `to` (optional) - End date (YYYY-MM-DD)
+  - `limit` (optional) - Maximum results (default: 10)
+- **Output**: List of snapshots with dates, URLs, status codes, and mime types
+
+### 4. **check_archive_status**
+Check archival statistics for a URL.
+- **Input**: `url` (required) - The URL to check
+- **Output**: Archive status, first/last capture dates, total captures, yearly statistics
+
+### Technical Details
 
 - **Transport**: Stdio (for Claude Desktop integration)
-- **HTTP Client**: Built-in fetch for API calls
-- **Rate Limiting**: Respect Wayback Machine limits
-- **Error Handling**: Graceful handling of failed saves
-- **Validation**: URL validation before operations
+- **HTTP Client**: Built-in fetch with timeout support
+- **Rate Limiting**: 15 requests per minute (conservative limit)
+- **Error Handling**: Graceful handling with detailed error messages
+- **Validation**: URL and timestamp validation
+- **TypeScript**: Full type safety with Zod schema validation
 
 ### API Endpoints (No Keys Required)
 
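Note on the rate limit introduced in this hunk: the README now states 15 requests per minute, but the diff does not show how `rate-limit.ts` enforces it. The following is only a minimal sketch, assuming a sliding-window strategy; the class name `SlidingWindowRateLimiter`, its methods, and the injectable clock are hypothetical and not taken from the repository.

```typescript
// Hypothetical sketch of a sliding-window limiter. This is an illustration of
// one way to enforce "15 requests per minute", not the project's actual code.
class SlidingWindowRateLimiter {
  private timestamps: number[] = [];
  private readonly maxRequests: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(
    maxRequests: number = 15, // limit stated in the README
    windowMs: number = 60_000, // one minute
    now: () => number = () => Date.now(), // injectable clock, handy for tests
  ) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
    this.now = now;
  }

  /** Milliseconds until a slot frees up; 0 means a request may be sent now. */
  msUntilAvailable(): number {
    const current = this.now();
    const cutoff = current - this.windowMs;
    // Forget requests that have fallen out of the window.
    this.timestamps = this.timestamps.filter((t) => t > cutoff);
    if (this.timestamps.length < this.maxRequests) return 0;
    const oldest = this.timestamps[0] ?? current;
    return oldest + this.windowMs - current;
  }

  /** Records a request if the window has room; returns false otherwise. */
  tryAcquire(): boolean {
    if (this.msUntilAvailable() > 0) return false;
    this.timestamps.push(this.now());
    return true;
  }
}
```

A caller could check `tryAcquire()` before each Wayback Machine request and, when it returns `false`, wait for `msUntilAvailable()` milliseconds before retrying.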
@@ -89,40 +81,63 @@ This MCP server provides tools to:
 ```
 mcp-wayback-machine/
 ├── src/
-│   ├── index.ts          # Main server entry point
+│   ├── index.ts          # MCP server entry point
 │   ├── tools/            # Tool implementations
-│   │   ├── save.ts
-│   │   ├── retrieve.ts
-│   │   ├── search.ts
-│   │   └── status.ts
+│   │   ├── save.ts       # save_url tool
+│   │   ├── retrieve.ts   # get_archived_url tool
+│   │   ├── search.ts     # search_archives tool
+│   │   └── status.ts     # check_archive_status tool
 │   ├── utils/            # Utilities
-│   │   ├── http.ts       # HTTP client wrapper
-│   │   ├── validation.ts # URL validation
-│   │   └── rate-limit.ts # Rate limiting
-│   └── types.ts          # TypeScript types
-├── tests/                # Test files
+│   │   ├── http.ts       # HTTP client with timeout
+│   │   ├── validation.ts # URL/timestamp validation
+│   │   └── rate-limit.ts # Rate limiting implementation
+│   └── *.test.ts         # Test files (alongside source)
+├── dist/                 # Built JavaScript files
 ├── package.json
 ├── tsconfig.json
 └── README.md
 ```
 
 ## Installation
 
+### From npm
+```bash
+npm install -g mcp-wayback-machine
+```
+
+### From source
 ```bash
-npm install
-npm run build
+git clone https://github.com/Mearman/mcp-wayback-machine.git
+cd mcp-wayback-machine
+yarn install
+yarn build
 ```
 
 ## Usage
 
-Configure in Claude Desktop settings:
+### Claude Desktop Configuration
+
+Add to your Claude Desktop settings:
+
+#### Using npm installation
+```json
+{
+  "mcpServers": {
+    "wayback-machine": {
+      "command": "npx",
+      "args": ["mcp-wayback-machine"]
+    }
+  }
+}
+```
 
+#### Using local installation
 ```json
 {
   "mcpServers": {
     "wayback-machine": {
       "command": "node",
-      "args": ["/path/to/mcp-wayback-machine/dist/index.js"]
+      "args": ["/absolute/path/to/mcp-wayback-machine/dist/index.js"]
     }
   }
 }
@@ -131,11 +146,16 @@ Configure in Claude Desktop settings:
 ## Development
 
 ```bash
-npm run dev    # Run in development mode
-npm test       # Run tests
-npm run build  # Build for production
+yarn dev         # Run in development mode with hot reload
+yarn test        # Run tests
+yarn test:watch  # Run tests in watch mode
+yarn build       # Build for production
+yarn start       # Run production build
 ```
 
+### Testing
+The project uses Vitest for testing. Tests are located alongside source files with `.test.ts` extensions.
+
 ## Resources
 
 ### Official Documentation
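
Note on the tool inputs documented in this commit: `get_archived_url` takes a `YYYYMMDDhhmmss` timestamp or `"latest"`, while `search_archives` takes `YYYY-MM-DD` dates and a default `limit` of 10. The diff does not show how these inputs become requests, so the following is a rough sketch under stated assumptions. The helper names (`buildArchivedUrl`, `buildCdxSearchUrl`, `toCdxDate`) are hypothetical, and the sketch assumes the public `web.archive.org/web/` playback path and the `cdx/search/cdx` endpoint with its `url`, `from`, `to`, `limit`, and `output` parameters.

```typescript
// Hypothetical helpers for illustration only; names and structure are not
// taken from the repository.
const WAYBACK_BASE = "https://web.archive.org";

interface SearchInput {
  url: string;
  from?: string; // YYYY-MM-DD, as documented in the README
  to?: string; // YYYY-MM-DD
  limit?: number; // README default: 10
}

/** Builds a playback URL from get_archived_url-style input. */
function buildArchivedUrl(url: string, timestamp: string = "latest"): string {
  // Assumption: omitting the timestamp lets the Wayback Machine pick the
  // most recent capture.
  if (timestamp === "latest") return `${WAYBACK_BASE}/web/${url}`;
  if (!/^\d{14}$/.test(timestamp)) {
    throw new Error(`Expected YYYYMMDDhhmmss or "latest", got "${timestamp}"`);
  }
  return `${WAYBACK_BASE}/web/${timestamp}/${url}`;
}

/** Converts the README's YYYY-MM-DD date format to the compact YYYYMMDD form. */
function toCdxDate(date: string): string {
  if (!/^\d{4}-\d{2}-\d{2}$/.test(date)) {
    throw new Error(`Expected YYYY-MM-DD, got "${date}"`);
  }
  return date.replace(/-/g, "");
}

/** Builds a CDX query URL from search_archives-style input. */
function buildCdxSearchUrl(input: SearchInput): string {
  const params = new URLSearchParams({
    url: input.url,
    output: "json",
    limit: String(input.limit ?? 10),
  });
  if (input.from) params.set("from", toCdxDate(input.from));
  if (input.to) params.set("to", toCdxDate(input.to));
  return `${WAYBACK_BASE}/cdx/search/cdx?${params.toString()}`;
}
```

Rejecting malformed dates before any request is sent keeps invalid input from consuming one of the rate-limited calls.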
