Harrier EMR MCP gives your AI agent structured, bounded access to EMR operational evidence. By the end of this module you will have installed Harrier, connected it to an MCP client, and run your first diagnosis against a real or demo EMR job.

What you need

  • Python 3.11+
  • An AWS account with EMR access (or use the Harrier Demo Lab)
  • An MCP-compatible client: Claude Desktop, AWS DevOps Agent, or any other client that supports the MCP stdio transport

Installation

Install from PyPI:

pip install harrier-emr-mcp

Verify the install:

harrier --version
# harrier-emr-mcp 1.2.0
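
The client configuration later in this module launches Harrier with the python command, so it can help to confirm that the interpreter you installed into meets the Python 3.11+ requirement and can actually see the package. The following is a minimal sketch using only the standard library; the function name check_prerequisites is illustrative and is not part of Harrier.

```python
import sys
from importlib import metadata

MIN_PYTHON = (3, 11)          # minimum version listed under "What you need"
PACKAGE = "harrier-emr-mcp"   # distribution name used in the pip command


def check_prerequisites(package: str = PACKAGE) -> dict:
    """Report whether this interpreter and the installed package look usable."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        installed = None  # the package is not visible to this interpreter
    return {
        "python_ok": sys.version_info >= MIN_PYTHON,
        "python_version": ".".join(str(part) for part in sys.version_info[:3]),
        "package_version": installed,
    }


print(check_prerequisites())
```

Run it with the same interpreter your MCP client will launch. If package_version comes back as None, the package was probably installed into a different environment.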

Configure your MCP client

Claude Desktop

Add Harrier to your claude_desktop_config.json:

{
  "mcpServers": {
    "harrier": {
      "command": "python",
      "args": ["-m", "harrier_emr_mcp"],
      "env": {
        "AWS_PROFILE": "your-ops-profile",
        "AWS_DEFAULT_REGION": "us-east-1"
      }
    }
  }
}

Restart Claude Desktop. You should see Harrier appear in the tools panel.
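
If your claude_desktop_config.json already lists other MCP servers, the harrier entry has to be merged into the existing mcpServers object rather than pasted over it. The sketch below shows one way to do that with the standard library. The helper name add_harrier_server is hypothetical, and the path to the config file is left as a parameter because it varies by platform.

```python
import json
from pathlib import Path


def add_harrier_server(config_path: Path, profile: str, region: str) -> dict:
    """Merge a 'harrier' entry into an MCP client config, keeping other servers."""
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        config = json.loads(text) if text.strip() else {}

    # Create the mcpServers object if it is missing, otherwise reuse it.
    servers = config.setdefault("mcpServers", {})
    servers["harrier"] = {
        "command": "python",
        "args": ["-m", "harrier_emr_mcp"],
        "env": {"AWS_PROFILE": profile, "AWS_DEFAULT_REGION": region},
    }

    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Calling add_harrier_server(Path("claude_desktop_config.json"), "your-ops-profile", "us-east-1") rewrites the file in place, so keep a backup of the original if it contains settings you care about.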

AWS DevOps Agent

Refer to the DevOps Agent integration guide for the full configuration. The key settings are the same as for Claude Desktop: Harrier runs as a subprocess, with your AWS credentials available in its environment.
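
To make the subprocess model concrete, here is a rough sketch of what any stdio-based MCP client does when it starts Harrier: it copies the current environment, adds the AWS settings, and launches the module with pipes attached. The function names are illustrative, and the launch step assumes harrier_emr_mcp is installed for the interpreter in use.

```python
import os
import subprocess
import sys
from typing import Optional


def build_server_env(profile: str, region: str, base: Optional[dict] = None) -> dict:
    """Return a copy of the environment with the AWS settings Harrier reads."""
    env = dict(os.environ if base is None else base)
    env["AWS_PROFILE"] = profile
    env["AWS_DEFAULT_REGION"] = region
    return env


def launch_harrier(env: dict) -> subprocess.Popen:
    """Start the server as a child process that talks over stdin/stdout."""
    return subprocess.Popen(
        [sys.executable, "-m", "harrier_emr_mcp"],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        env=env,
    )
```

Because the credentials are resolved from the child's environment, the profile you name must be usable from the machine where the client runs.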

Running your first diagnosis

With Harrier connected, ask your agent:

“My EMR Serverless job run jr-abc123 failed. Can you diagnose what happened?”

The agent will call harrier_diagnose_emr_serverless with the job run ID. Harrier collects evidence from the EMR API, CloudWatch, and S3 logs, classifies the failure pattern, and returns a structured triage report.
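
Under the hood, an MCP tool call is a JSON-RPC 2.0 request with the method tools/call, written to the server as a single line over stdio. The sketch below builds such a request for the tool named above. The argument key job_run_id is an assumption made for illustration; the real key is defined by the input schema the server advertises.

```python
import json


def build_tool_call(request_id: int, job_run_id: str) -> str:
    """Serialize a tools/call request as one newline-free JSON line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "harrier_diagnose_emr_serverless",
            # Hypothetical argument name; check the tool's input schema.
            "arguments": {"job_run_id": job_run_id},
        },
    }
    # Compact separators keep the message on a single line for stdio framing.
    return json.dumps(message, separators=(",", ":"))


print(build_tool_call(1, "jr-abc123"))
```

In a real session the client first completes the MCP initialize handshake and lists the available tools; your agent handles all of this for you, so the sketch is only meant to show what travels over the wire.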

A typical response looks like this:

Initial Diagnosis Report
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Likely path     SHUFFLE_SPILL
Confidence      Medium-High
Runtime         EMR Serverless

Checks
  ✓ Observability     Log evidence available in S3 and CloudWatch
  ✗ Spark Runtime     Shuffle spill detected in executor logs
  — Infrastructure    Not checked
  — Configuration     Not checked

Evidence
  Signal          Task reported 4.2GB bytes spilled to disk.
  Interpretation  Spark shuffle spill is degrading job performance or
                  causing task failure on low-disk workers.
  Next check      Inspect failed stages, increase spark.sql.shuffle.partitions,
                  or upgrade the worker size.
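
If you decide to act on the "Next check" suggestion, one option is to resubmit the job with a higher spark.sql.shuffle.partitions value (Spark's default is 200). The sketch below builds the jobDriver structure that the EMR Serverless StartJobRun API accepts. The helper name and the values used in the example call are placeholders, not recommendations from Harrier.

```python
def build_spark_job_driver(entry_point: str, shuffle_partitions: int) -> dict:
    """Build a jobDriver payload that overrides spark.sql.shuffle.partitions."""
    if shuffle_partitions <= 0:
        raise ValueError("shuffle_partitions must be a positive integer")
    return {
        "sparkSubmit": {
            "entryPoint": entry_point,
            "sparkSubmitParameters": (
                f"--conf spark.sql.shuffle.partitions={shuffle_partitions}"
            ),
        }
    }


# Placeholder script location and partition count, for illustration only.
print(build_spark_job_driver("s3://your-bucket/jobs/etl.py", 400))
```

The resulting dictionary can be passed as the jobDriver argument of start_job_run on a boto3 emr-serverless client, along with your own applicationId and executionRoleArn. Harrier itself does not submit or modify jobs.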

Video walkthrough

Harrier EMR MCP quickstart demo (placeholder)

This video will be updated with a live demo against an EMR Serverless job once the Harrier Demo Lab is live.

What Harrier does not do

Harrier is an initial diagnosis tool. It reads evidence and classifies failure patterns — it does not mutate your cluster, cancel jobs, restart applications, or apply fixes.

Recommendations from Harrier are advisory. You decide what to act on.
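
Since Harrier only reads evidence, the AWS profile it runs under can be restricted to read-only actions. The sketch below builds an example IAM policy document covering the three evidence sources mentioned earlier (the EMR API, CloudWatch, and S3 logs). The action list is an assumption for illustration, not Harrier's documented permission set, and the wildcard resource should be narrowed for real use.

```python
import json

# Assumed read-only actions; adjust to what your deployment actually needs.
READ_ONLY_ACTIONS = [
    "emr-serverless:GetApplication",
    "emr-serverless:GetJobRun",
    "emr-serverless:ListJobRuns",
    "logs:DescribeLogGroups",
    "logs:DescribeLogStreams",
    "logs:GetLogEvents",
    "logs:FilterLogEvents",
    "s3:GetObject",
    "s3:ListBucket",
]


def build_read_only_policy(actions=READ_ONLY_ACTIONS) -> dict:
    """Return an IAM policy document that allows only the given actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "HarrierReadOnlyEvidence",
                "Effect": "Allow",
                "Action": list(actions),
                "Resource": "*",  # narrow to specific ARNs in practice
            }
        ],
    }


print(json.dumps(build_read_only_policy(), indent=2))
```

A policy like this keeps the advisory boundary enforceable at the credential level: even if an agent asked for a mutating operation, the role would not permit it.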

Next steps