The Program Integrity Alliance (PIA) aims to make working with U.S. Government datasets easier and AI-friendly. We have ingested hundreds of thousands of documents and articles across a range of sources, and this list is growing. This MCP server enables AIs to search this data at a more detailed level than on most source websites, for example, searching within PDF reports to find the exact pages where text and images appear.
Full attribution is given to the amazing open federal data sources, and all links in the data provided by PIA will always direct back to the original source.
Currently, the list of datasets includes:
- U.S. Government Accountability Office (GAO) - 10k Federal Reports since 2010 and 5.5k Open Oversight Recommendations
- Oversight.gov - 28k OIG Federal Reports since 2010, and 29k Open Oversight Recommendations
- U.S. Congress - Bill texts for sessions 118 and 119
- Department of Justice (DOJ) - 195k Press Releases since 2000
- Federal Agency annual reports - Congressional Justification, Financial Report, Performance Report - 139 reports across 10 priority agencies, with best coverage in 2024.
- All Congressional Research Service (CRS) reports - 22k reports provided by EveryCRSReport.com
- U.S. Presidential Executive Orders - Orders for the last 7 presidencies, as provided by the Federal Register
This data is updated weekly, and we will be adding more datasets and tools soon.
If you have any questions, or requests for other datasets, we look forward to hearing from you by raising an issue here.
Contribute • Report Bugs or Questions
- Document Search: Query the PIA database with comprehensive OData filtering options
- Faceted Search: Discover available filter fields and values
- AI Instruction Prompts: Prompts that instruct LLMs on how to summarize search results and use search tools
- Go to https://mcp.programintegrity.org/get-api-key
- If you don't have a free PIA account, click the 'No account? Create one' link, otherwise log in
- Once logged in, you should automatically receive your key
- Download and run the latest version of Docker Desktop
- Navigate to 'MCP Toolkit'
- Search for 'Program Integrity Alliance'
- Add as a server by clicking '+'
- Under 'Configuration' enter your key
- In 'MCP Toolkit' navigate to 'Clients'
- Choose one, e.g. 'Claude Desktop'
- Start your Client
- You should now see 'pia_search_content' and other tools
To install PIA Server for Claude Desktop automatically via Smithery:
npx -y @smithery/cli install pia-mcp-server --client claude
Install using uv:
uv tool install pia-mcp-server
For development:
# Clone and set up development environment
git clone https://github.com/Program-Integrity-Alliance/pia-mcp-local.git
cd pia-mcp-local
# Create and activate virtual environment
uv venv
source .venv/bin/activate
# Install with test dependencies
uv pip install -e ".[test]"
For Docker:
# Build the Docker image if you want to use a local image
git clone https://github.com/Program-Integrity-Alliance/pia-mcp-local.git
cd pia-mcp-local
docker build -t pia-mcp-server:latest .
Add this configuration to your MCP client config file:
{
  "mcpServers": {
    "pia-mcp-server": {
      "command": "uv",
      "args": [
        "tool",
        "run",
        "pia-mcp-server",
        "--api-key", "YOUR_API_KEY"
      ],
      "cwd": "/path/to/your/pia-mcp-local"
    }
  }
}
For Docker:
You must build the Docker image ...
docker build -t pia-mcp-server:latest .
Then add this to your Client, eg Claude ...
{
  "mcpServers": {
    "pia-mcp-server": {
      "command": "docker",
      "args": [
        "run",
        "--rm",
        "-i",
        "pia-mcp-server:latest",
        "--api-key", "YOUR_API_KEY"
      ]
    }
  }
}
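If you maintain client configs for several machines, the two JSON snippets above can be generated rather than hand-edited. The following is a minimal sketch, not part of the PIA package: `build_client_config` is a hypothetical helper that only reproduces the `command` and `args` values shown above (it leaves out the optional `cwd` entry).

```python
import json


def build_client_config(api_key: str, runner: str = "uv") -> dict:
    """Return an MCP client config entry mirroring the README snippets.

    `runner` is "uv" or "docker"; both argument lists are copied from the
    configuration examples above. This helper itself is hypothetical.
    """
    if not api_key:
        raise ValueError("api_key must not be empty")
    if runner == "uv":
        command = "uv"
        args = ["tool", "run", "pia-mcp-server", "--api-key", api_key]
    elif runner == "docker":
        command = "docker"
        args = ["run", "--rm", "-i", "pia-mcp-server:latest", "--api-key", api_key]
    else:
        raise ValueError(f"unsupported runner: {runner!r}")
    return {"mcpServers": {"pia-mcp-server": {"command": command, "args": args}}}


if __name__ == "__main__":
    # Print a config block that can be pasted into the client config file.
    print(json.dumps(build_client_config("YOUR_API_KEY", "docker"), indent=2))
```

Keeping the key as a function argument makes it easier to read it from a secret store instead of committing it to a config file.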
The server provides 11 tools for searching the Program Integrity Alliance (PIA) database:
Purpose: Comprehensive search tool for querying document content and recommendations in the PIA database.
Description: Returns comprehensive results with full citation information and clickable links for proper attribution. Each result includes corresponding citations with data source attribution. Major data sources include: Department of Justice (198k+ docs), Congress.gov (29k+ docs), Oversight.gov (22k+ docs), CRS (22k+ docs), GAO (10k+ docs). Supports complex OData filtering with boolean logic, operators, and grouping.
Parameters:
- query (required): Search query text
- filter (optional): OData filter expression supporting complex boolean logic
- page (optional): Page number (default: 1)
- page_size (optional): Results per page (default: 10)
- search_mode (optional): Search mode (default: content)
- limit (optional): Maximum results limit
- include_facets (optional): Include facets in results (default: false)
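MCP clients normally build tool calls for you, but it can help to see how the parameters above map onto a request. The sketch below assembles a JSON-RPC `tools/call` message as defined by the MCP specification. The function name `build_tool_call` and its argument `odata_filter` are hypothetical, and the tool name `pia_search_content` is taken from the setup steps earlier in this README.

```python
from typing import Optional


def build_tool_call(
    query: str,
    odata_filter: Optional[str] = None,
    page: int = 1,
    page_size: int = 10,
    include_facets: bool = False,
    request_id: int = 1,
) -> dict:
    """Build an MCP `tools/call` request for the content search tool."""
    if not query.strip():
        raise ValueError("query is required")
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")

    arguments = {
        "query": query,
        "page": page,
        "page_size": page_size,
        "include_facets": include_facets,
    }
    # Only send a filter when one was given, so the default is an unfiltered search.
    if odata_filter:
        arguments["filter"] = odata_filter

    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "pia_search_content", "arguments": arguments},
    }


request = build_tool_call(
    "Medicare fraud",
    odata_filter="SourceDocumentDataSource eq 'GAO'",
)
```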
Purpose: Get available facets (filter values) for the PIA database content search.
Description: This can help understand what filter values are available before performing content searches. Major data sources include: Department of Justice (198k+ docs), Congress.gov (29k+ docs), Oversight.gov (22k+ docs), CRS (22k+ docs), GAO (10k+ docs).
Parameters:
- query (optional): Optional query to get facets for (default: "")
- filter (optional): Optional OData filter expression
Purpose: Search the Program Integrity Alliance (PIA) database for document titles only.
Description: Returns document titles and metadata without searching the full content. Useful for finding specific documents by title or discovering available documents. Major data sources include: Department of Justice (198k+ docs), Congress.gov (29k+ docs), Oversight.gov (22k+ docs), CRS (22k+ docs), GAO (10k+ docs).
Parameters:
- query (required): Search query text (searches document titles only)
- filter (optional): OData filter expression supporting complex boolean logic
- page (optional): Page number (default: 1)
- page_size (optional): Results per page (default: 10)
- limit (optional): Maximum results limit
- include_facets (optional): Include facets in results (default: false)
Purpose: Get available facets (filter values) for the PIA database title search.
Description: This can help understand what filter values are available before performing title searches. Major data sources include: Department of Justice (198k+ docs), Congress.gov (29k+ docs), Oversight.gov (22k+ docs), CRS (22k+ docs), GAO (10k+ docs).
Parameters:
- query (optional): Optional query to get facets for (default: "")
- filter (optional): Optional OData filter expression
Purpose: Search for GAO document content and recommendations.
Description: This tool automatically filters results to only include documents from the Government Accountability Office (GAO). Returns comprehensive results with full citation information and clickable links for proper attribution.
Parameters:
- query (required): Search query text
- filter (optional): OData filter expression (SourceDocumentDataSource is automatically set to 'GAO')
- page (optional): Page number (default: 1)
- page_size (optional): Results per page (default: 10)
- search_mode (optional): Search mode (default: content)
- limit (optional): Maximum results limit
- include_facets (optional): Include facets in results (default: false)
Purpose: Search for OIG document content and recommendations.
Description: This tool automatically filters results to only include documents from Office of Inspector General (OIG) sources. Returns comprehensive results with full citation information and clickable links for proper attribution.
Parameters:
- query (required): Search query text
- filter (optional): OData filter expression (SourceDocumentDataSource is automatically set to 'OIG')
- page (optional): Page number (default: 1)
- page_size (optional): Results per page (default: 10)
- search_mode (optional): Search mode (default: content)
- limit (optional): Maximum results limit
- include_facets (optional): Include facets in results (default: false)
Purpose: Search for CRS document content and recommendations.
Description: This tool automatically filters results to only include documents from Congressional Research Service (CRS). Returns comprehensive results with full citation information and clickable links for proper attribution.
Parameters:
- query (required): Search query text
- filter (optional): OData filter expression (SourceDocumentDataSource is automatically set to 'CRS')
- page (optional): Page number (default: 1)
- page_size (optional): Results per page (default: 10)
- search_mode (optional): Search mode (default: content)
- limit (optional): Maximum results limit
- include_facets (optional): Include facets in results (default: false)
Purpose: Search for Department of Justice document content and recommendations.
Description: This tool automatically filters results to only include documents from the Department of Justice. Returns comprehensive results with full citation information and clickable links for proper attribution.
Parameters:
- query (required): Search query text
- filter (optional): OData filter expression (SourceDocumentDataSource is automatically set to 'Department of Justice')
- page (optional): Page number (default: 1)
- page_size (optional): Results per page (default: 10)
- search_mode (optional): Search mode (default: content)
- limit (optional): Maximum results limit
- include_facets (optional): Include facets in results (default: false)
Purpose: Search for Congress.gov document content and recommendations.
Description: This tool automatically filters results to only include documents from Congress.gov. Returns comprehensive results with full citation information and clickable links for proper attribution.
Parameters:
- query (required): Search query text
- filter (optional): OData filter expression (SourceDocumentDataSource is automatically set to 'Congress.gov')
- page (optional): Page number (default: 1)
- page_size (optional): Results per page (default: 10)
- search_mode (optional): Search mode (default: content)
- limit (optional): Maximum results limit
- include_facets (optional): Include facets in results (default: false)
Purpose: Simple search interface for ChatGPT Connectors.
Description: Search the Program Integrity Alliance (PIA) database and return a list of potentially relevant search results with titles, snippets, and URLs for citation. This is one of the endpoints that OpenAI's MCP spec supports for integrating ChatGPT Connectors.
Parameters:
- query (required): A search query string to find relevant documents in the PIA database
Purpose: Document retrieval by ID for ChatGPT Connectors.
Description: Retrieve the full contents of a specific document from the PIA database using its unique identifier. This is one of the endpoints that OpenAI's MCP spec supports for integrating ChatGPT Connectors.
Parameters:
- id (required): A unique identifier for the document to retrieve
Comprehensive search with OData filtering and faceting. The filter parameter uses standard OData query syntax.
- Content Search (pia_search_content): Searches within document content and recommendations for comprehensive results
- Title Search (pia_search_titles): Searches document titles only; faster and useful for document discovery
Example Filter Expressions:
- Basic filter: "SourceDocumentDataSource eq 'GAO'"
- Multiple conditions (or): "SourceDocumentDataSource eq 'GAO' or SourceDocumentDataSource eq 'OIG'"
- Combined conditions (and): "SourceDocumentDataSource eq 'GAO' and RecStatus ne 'Closed'"
- Negation: "SourceDocumentDataSource ne 'Department of Justice' and not (RecStatus eq 'Closed')"
- List membership: "IsIntegrityRelated eq 'Yes' and RecPriorityFlag in ('High', 'Critical')"
- Date ranges: "SourceDocumentPublishDate ge '2020-01-01' and SourceDocumentPublishDate le '2024-12-31'"
- Boolean grouping: "(SourceDocumentDataSource eq 'GAO' or SourceDocumentDataSource eq 'OIG') and RecStatus eq 'Open'"
OData Filter Operators:
- eq - equals: field eq 'value'
- ne - not equals: field ne 'value'
- gt - greater than: amount gt 1000
- ge - greater than or equal: date ge '2023-01-01'
- lt - less than: amount lt 5000
- le - less than or equal: date le '2023-12-31'
- in - value in list: status in ('Active', 'Pending')
OData Logical Operators:
- and - logical AND: field1 eq 'value' and field2 gt 100
- or - logical OR: status eq 'Active' or status eq 'Pending'
- not - logical NOT: not (status eq 'Inactive')
- () - grouping: (field1 eq 'A' or field1 eq 'B') and field2 gt 0
OData String Functions:
- contains(field, 'text') - field contains text
- startswith(field, 'prefix') - field starts with prefix
- endswith(field, 'suffix') - field ends with suffix
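Filter strings are easy to get wrong when they are concatenated by hand, especially around quoting and parentheses. As a minimal sketch (these helper names are hypothetical and not part of the PIA server), the operators above can be wrapped in a few small functions. String values are single-quoted, with embedded single quotes doubled as OData requires.

```python
from typing import Iterable, Union

Value = Union[str, int, float, bool]

_COMPARISON_OPS = {"eq", "ne", "gt", "ge", "lt", "le"}


def literal(value: Value) -> str:
    """Render a Python value as an OData literal."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def compare(field: str, op: str, value: Value) -> str:
    """Build `field op value` for one of the comparison operators."""
    if op not in _COMPARISON_OPS:
        raise ValueError(f"unsupported operator: {op!r}")
    return f"{field} {op} {literal(value)}"


def is_in(field: str, values: Iterable[Value]) -> str:
    """Build `field in (v1, v2, ...)`."""
    return f"{field} in ({', '.join(literal(v) for v in values)})"


def all_of(*parts: str) -> str:
    """Join sub-expressions with `and`."""
    return " and ".join(parts)


def any_of(*parts: str) -> str:
    """Join sub-expressions with `or`, parenthesized so it nests safely."""
    return "(" + " or ".join(parts) + ")"


# Rebuilds the "Boolean grouping" example from the list above.
grouped = all_of(
    any_of(
        compare("SourceDocumentDataSource", "eq", "GAO"),
        compare("SourceDocumentDataSource", "eq", "OIG"),
    ),
    compare("RecStatus", "eq", "Open"),
)
```

Always wrapping `or` groups in parentheses avoids relying on operator precedence when the result is combined with `and`.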
Discover available field names and values for filtering.
Tool Names: pia_search_content_facets, pia_search_titles_facets
Parameters:
- query (optional): Optional query to get facets for (default: "")
- filter (optional): Optional OData filter expression
Purpose:
- Discover available field names (e.g., data_source, document_type, agency)
- Find possible field values (e.g., "OIG", "GAO", "audit_report")
- Understand data types for each field (string, date, number)
This information helps you construct proper filter expressions for the search tools.
To effectively use OData filters, follow this workflow:
Use the pia_search_content_facets or pia_search_titles_facets tool to explore what fields are available for filtering. You can provide a query to get facets relevant to your search topic, or omit the query to see all available fields.
The facets response will show available fields and their possible values:
{
  "SourceDocumentDataSource": ["OIG", "GAO", "CMS", "FBI"],
  "RecStatus": ["Open", "Closed", "In Progress"],
  "RecPriorityFlag": ["High", "Medium", "Low", "Critical"],
  "IsIntegrityRelated": ["Yes", "No"],
  "SourceDocumentPublishDate": "2020-01-01 to 2024-12-31"
}
Use the pia_search_content or pia_search_titles tool with the discovered fields to create precise OData filters:
Basic Example:
Query: "Medicare fraud"
Filter: "SourceDocumentDataSource eq 'GAO' and SourceDocumentPublishDate ge '2023-01-01' and IsIntegrityRelated eq 'Yes'"
Complex Example:
Query: "healthcare violations"
Filter: "(SourceDocumentDataSource eq 'OIG' or SourceDocumentDataSource eq 'CMS') and RecPriorityFlag in ('High', 'Critical') and SourceDocumentPublishDate ge '2023-01-01'"
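Before sending a filter, it can be worth checking that every field it mentions actually appears in the facets response. The sketch below is one rough way to do that with a regular expression; it assumes the facets response is a dictionary keyed by field name (as in the sample above) and only recognizes fields that appear directly before a comparison operator or `in`, not fields used inside `contains(...)` and similar functions. The function names are hypothetical.

```python
import re
from typing import Mapping, Set

# A field name followed by whitespace and one of the operators.
_FIELD_BEFORE_OPERATOR = re.compile(r"\b([A-Za-z_]\w*)\s+(?:eq|ne|gt|ge|lt|le|in)\b")
# A single-quoted OData string, where '' is an escaped quote.
_QUOTED_STRING = re.compile(r"'(?:[^']|'')*'")
_KEYWORDS = {"and", "or", "not"}


def fields_in_filter(filter_expr: str) -> Set[str]:
    """Return the field names used on the left of an operator."""
    # Blank out string values first so words inside them are not mistaken for fields.
    without_strings = _QUOTED_STRING.sub("''", filter_expr)
    found = _FIELD_BEFORE_OPERATOR.findall(without_strings)
    return {name for name in found if name not in _KEYWORDS}


def unknown_fields(filter_expr: str, facets: Mapping[str, object]) -> Set[str]:
    """Return fields in the filter that are missing from the facets response."""
    return fields_in_filter(filter_expr) - set(facets)


sample_facets = {
    "SourceDocumentDataSource": ["OIG", "GAO", "CMS", "FBI"],
    "RecStatus": ["Open", "Closed", "In Progress"],
    "RecPriorityFlag": ["High", "Medium", "Low", "Critical"],
    "IsIntegrityRelated": ["Yes", "No"],
    "SourceDocumentPublishDate": "2020-01-01 to 2024-12-31",
}

missing = unknown_fields("Agency eq 'HHS' and RecStatus eq 'Open'", sample_facets)
```

Here `missing` contains `Agency`, which signals that the filter should be corrected before the search is run.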
The server provides prompts that instruct the calling LLM on how to effectively use PIA tools and format responses:
Provides guidance on how to summarize information from PIA search results with proper citations.
Prompt Name: summarization_guidance
Purpose: Ensures LLM creates fact-based summaries with inline citations and proper reference formatting
Arguments: None (reusable guidance)
Returns: Comprehensive instructions that guide the LLM to:
- Only include facts that appear in the provided search results (no prior knowledge)
- Use proper inline citation format [n] for every factual statement
- Create a References section with format: [n] Document Title – Page X – Source Name – URL
- Follow objective, factual style guidelines without speculation or filler
- Include all necessary attribution elements exactly as provided in search results
- Organize information logically and ensure every fact has supporting citations
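To make the reference format concrete, here is a small sketch that renders a numbered References section from a list of results. The dictionary keys (`title`, `page`, `source`, `url`) are assumptions chosen for illustration and may not match the fields the PIA tools actually return; the separator is a parameter because the exact dash character is also an assumption.

```python
from typing import List, Mapping, Sequence


def format_references(results: Sequence[Mapping], separator: str = " – ") -> List[str]:
    """Render `[n] Document Title – Page X – Source Name – URL` lines.

    The keys used here are hypothetical placeholders for whatever the
    search results provide.
    """
    lines = []
    for n, result in enumerate(results, start=1):
        parts = [result["title"]]
        # Page is optional: not every source is paginated.
        if result.get("page") is not None:
            parts.append(f"Page {result['page']}")
        parts.extend([result["source"], result["url"]])
        lines.append(f"[{n}] " + separator.join(parts))
    return lines


# Placeholder data, not real search output.
example = [
    {"title": "Example Report", "page": 12, "source": "GAO", "url": "https://example.org/report"},
    {"title": "Example Release", "source": "Department of Justice", "url": "https://example.org/release"},
]
references = format_references(example)
```

The index `n` in each line is the same number used for the inline `[n]` citations in the summary text.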
Provides guidance on how to perform PIA searches with or without filters.
Prompt Name: search_guidance
Purpose: Guides LLM through proper search workflow including filter discovery and OData syntax for all four search tools
Arguments: None (reusable guidance)
Returns: Comprehensive instructions that guide the LLM to:
- Run unfiltered searches by default unless filter criteria are mentioned
- Choose between content search (comprehensive) and title search (fast discovery)
- Use pia_search_content_facets or pia_search_titles_facets to discover available filter fields and values
- Build valid OData filter expressions with correct syntax and actual field names
- Apply proper OData operators: eq, ne, gt, ge, lt, le, and, or
- Fall back to unfiltered search when filtered search returns no results
- Validate all filter fields against available facets before use
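The fallback rule in this guidance can be sketched in client code as well. In the example below, `search` is a stand-in for whatever function your client uses to call a PIA search tool; it is hypothetical and is passed in as a parameter so the logic can be shown without any network access.

```python
from typing import Callable, List, Optional, Tuple

# Any callable taking (query, odata_filter) and returning a list of results.
SearchFn = Callable[[str, Optional[str]], List]


def search_with_fallback(
    search: SearchFn, query: str, odata_filter: Optional[str] = None
) -> Tuple[List, bool]:
    """Run a filtered search, retrying unfiltered if it returns nothing.

    Returns (results, filter_applied) so the caller can tell the user
    when the filter had to be dropped.
    """
    if odata_filter:
        results = search(query, odata_filter)
        if results:
            return list(results), True
    # No filter given, or the filtered search came back empty.
    return list(search(query, None)), False


def fake_search(query: str, odata_filter: Optional[str]) -> List[str]:
    """Stub used only for this example: filtered searches find nothing."""
    return [] if odata_filter else [f"result for {query}"]


results, filter_applied = search_with_fallback(
    fake_search, "Medicare fraud", "RecStatus eq 'Open'"
)
```

Returning the `filter_applied` flag alongside the results keeps the fallback visible, so a summary can state that the results are unfiltered.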
The API key is always provided via the MCP server configuration. Additional settings can be configured through environment variables:
Variable | Purpose | Default |
---|---|---|
PIA_API_URL | PIA API endpoint | https://mcp.programintegrity.org/ |
REQUEST_TIMEOUT | API request timeout (seconds) | 60 |
MAX_RESULTS | Maximum results per query | 50 |
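For reference, a typical way to read settings like these in Python is shown below. This is a sketch of the general pattern, not the server's actual implementation: the `Settings` class and `load_settings` function are hypothetical, and only the variable names and defaults come from the table above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

DEFAULT_API_URL = "https://mcp.programintegrity.org/"
DEFAULT_TIMEOUT_SECONDS = 60
DEFAULT_MAX_RESULTS = 50


@dataclass(frozen=True)
class Settings:
    api_url: str = DEFAULT_API_URL
    request_timeout: int = DEFAULT_TIMEOUT_SECONDS
    max_results: int = DEFAULT_MAX_RESULTS


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from `env` (defaults to os.environ), applying defaults."""
    env = os.environ if env is None else env
    return Settings(
        api_url=env.get("PIA_API_URL", DEFAULT_API_URL),
        # Environment values are strings, so numeric settings are converted.
        request_timeout=int(env.get("REQUEST_TIMEOUT", DEFAULT_TIMEOUT_SECONDS)),
        max_results=int(env.get("MAX_RESULTS", DEFAULT_MAX_RESULTS)),
    )


settings = load_settings({"REQUEST_TIMEOUT": "5"})
```

Passing the environment in as a mapping makes the function easy to test without modifying the real process environment.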
The API key must be provided in your MCP client configuration using the --api-key argument. You can obtain a key at https://mcp.programintegrity.org/get-api-key.
{
  "mcpServers": {
    "pia-mcp-server": {
      "command": "pia-mcp-server",
      "args": ["--api-key", "YOUR_API_KEY"]
    }
  }
}
Replace YOUR_API_KEY with your actual PIA API key.
Run the test suite:
python -m pytest
Run with coverage:
python -m pytest --cov=pia_mcp_server
Released under the MIT License. See the LICENSE file for details.
Made with ❤️ for Government Transparency and Accountability