Harrier EMR MCP gives your AI agent structured, bounded access to EMR operational evidence. By the end of this module you will have installed Harrier, connected it to an MCP client, and run your first diagnosis against a real or demo EMR job.
What you need
- Python 3.11+
- An AWS account with EMR access (or use the Harrier Demo Lab)
- An MCP-compatible client: Claude Desktop, AWS DevOps Agent, or any client that supports MCP stdio transport
Installation
Install from PyPI:
pip install harrier-emr-mcp
Verify the install:
harrier --version
# harrier-emr-mcp 1.2.0
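If you would rather confirm the install from Python (for example inside a setup script), the check can be sketched with the standard library. This is a minimal sketch, not part of Harrier: the helper name installed_version is hypothetical, and it only reads package metadata for the distribution name used in the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "harrier-emr-mcp" is the distribution name from the pip install step.
    found = installed_version("harrier-emr-mcp")
    print(found or "harrier-emr-mcp is not installed in this environment")
```

Run it with the same interpreter your MCP client will launch, since a package installed into a different environment will not be visible.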
Configure your MCP client
Claude Desktop
Add Harrier to your claude_desktop_config.json:
{
  "mcpServers": {
    "harrier": {
      "command": "python",
      "args": ["-m", "harrier_emr_mcp"],
      "env": {
        "AWS_PROFILE": "your-ops-profile",
        "AWS_DEFAULT_REGION": "us-east-1"
      }
    }
  }
}
Restart Claude Desktop. You should see Harrier appear in the tools panel.
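If your claude_desktop_config.json already lists other MCP servers, editing it by hand risks overwriting them. The sketch below shows one way to merge the Harrier entry into an existing config. The function names add_harrier_server and update_config_file are hypothetical helpers, not part of Harrier, and the path to the config file is left for you to supply.

```python
import json
from pathlib import Path


def add_harrier_server(config: dict, aws_profile: str, region: str) -> dict:
    """Return a copy of config with a 'harrier' entry under mcpServers."""
    merged = dict(config)
    # Copy the existing server map so other servers are preserved.
    servers = dict(merged.get("mcpServers", {}))
    servers["harrier"] = {
        "command": "python",
        "args": ["-m", "harrier_emr_mcp"],
        "env": {"AWS_PROFILE": aws_profile, "AWS_DEFAULT_REGION": region},
    }
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path, aws_profile: str, region: str) -> None:
    """Read the config at path (if present), add Harrier, and write it back."""
    existing = json.loads(path.read_text()) if path.exists() else {}
    updated = add_harrier_server(existing, aws_profile, region)
    path.write_text(json.dumps(updated, indent=2) + "\n")
```

Keeping the merge in a pure function makes it easy to check the result before anything is written to disk.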
AWS DevOps Agent
Refer to the DevOps Agent integration guide for the full configuration. The key settings are the same as for Claude Desktop: Harrier runs as a subprocess with your AWS credentials available in the environment.
Running your first diagnosis
With Harrier connected, ask your agent:
“My EMR Serverless job run jr-abc123 failed. Can you diagnose what happened?”
The agent will call harrier_diagnose_emr_serverless with the job run ID. Harrier collects evidence from the EMR API, CloudWatch, and S3 logs, classifies the failure pattern, and returns a structured triage report.
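Under the hood, an MCP client invokes a tool by sending a JSON-RPC 2.0 request with the method tools/call. The sketch below builds such a request for the tool named above. The argument key job_run_id is an assumption made for illustration; the real input schema is defined by Harrier's tool contract, so check it before relying on this shape.

```python
import json


def build_tool_call(request_id: int, job_run_id: str) -> str:
    """Serialize an MCP tools/call request as a single line of JSON."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "harrier_diagnose_emr_serverless",
            # Assumption: "job_run_id" is a hypothetical argument name.
            "arguments": {"job_run_id": job_run_id},
        },
    }
    return json.dumps(message)


if __name__ == "__main__":
    print(build_tool_call(1, "jr-abc123"))
```

You normally never write this message yourself: the client generates it from the agent's tool call and sends it to the Harrier subprocess over stdio.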
A typical response looks like this:
Initial Diagnosis Report
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Likely path        SHUFFLE_SPILL
Confidence         Medium-High
Runtime            EMR Serverless
Checks
  ✓ Observability    Log evidence available in S3 and CloudWatch
  ✗ Spark Runtime    Shuffle spill detected in executor logs
  — Infrastructure   Not checked
  — Configuration    Not checked
Evidence
  Signal             Task reported 4.2GB bytes spilled to disk.
  Interpretation     Spark shuffle spill is degrading job performance or
                     causing task failure on low-disk workers.
  Next check         Inspect failed stages, increase spark.sql.shuffle.partitions,
                     or upgrade the worker size.
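If you decide to act on the "Next check" advice, Spark settings for an EMR Serverless job are usually passed as --conf flags in the job's sparkSubmitParameters string. The sketch below only builds that string. The helper name spark_submit_parameters is hypothetical, and the values 400 and 16g are placeholders rather than recommendations from Harrier; the right numbers depend on your data volume and worker configuration.

```python
def spark_submit_parameters(overrides: dict[str, str]) -> str:
    """Render Spark property overrides as a string of --conf flags."""
    return " ".join(f"--conf {key}={value}" for key, value in overrides.items())


# Illustrative values only: Spark's default for spark.sql.shuffle.partitions
# is 200, so 400 doubles the number of shuffle partitions.
params = spark_submit_parameters(
    {
        "spark.sql.shuffle.partitions": "400",
        "spark.executor.memory": "16g",
    }
)
print(params)
# --conf spark.sql.shuffle.partitions=400 --conf spark.executor.memory=16g
```

Harrier itself does not apply these settings; you would add the resulting string to your own job submission when you rerun the job.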
Video walkthrough
This video will be updated with a live demo against an EMR Serverless job once the demo lab is live.
What Harrier does not do
Harrier is an initial diagnosis tool. It reads evidence and classifies failure patterns — it does not mutate your cluster, cancel jobs, restart applications, or apply fixes.
Recommendations from Harrier are advisory. You decide what to act on.
Next steps
- Run the Demo Lab to see Harrier against a realistic EMR failure
- Read MCP Tool Contracts to understand what each tool does and what permissions it needs
- Continue to MCP Server Patterns to learn how Harrier is built