Navvy
AI-Powered Browser Automation • Chrome Extension • MCP Tools
Navvy connects Claude to your browser so it can see, navigate, and interact with web pages autonomously. It chains a Chrome extension, WebSocket server, Claude CLI, and MCP tool server into a seamless AI browsing agent.
Problem
Browser automation tools tend to be either too rigid (Selenium, Playwright), requiring exact selectors that break when page markup changes, or too limited (simple extensions that can only read text). Neither gives an AI model autonomous control of a browser through natural language commands: seeing screenshots, reading DOM structure, clicking elements, typing, and navigating. Navvy bridges that gap by connecting Claude to Chrome via the Chrome DevTools Protocol (CDP) and native OS input.
Architecture
┌───────────────┐   WebSocket    ┌───────────────┐   stdio    ┌────────────┐   CDP    ┌──────────┐
│   Extension   │ ◄────────────► │    Server     │ ◄────────► │ Claude CLI │ ◄──────► │   MCP    │
│ (Side Panel)  │  (ws://3300)   │ (Express+WS)  │            │            │          │  Server  │
└───────────────┘                └───────────────┘            └────────────┘          └────┬─────┘
                                                                                           │
                                                                                    CDP + OS Input
                                                                                           │
                                                                                     ┌─────▼──────┐
                                                                                     │   Chrome   │
                                                                                     │(port 9222) │
                                                                                     └────────────┘

Four components chained together: the Chrome extension provides a side panel UI, the WebSocket server relays prompts to Claude CLI and streams responses back, Claude runs with MCP browser tools, and the MCP server exposes 13 browser automation tools using Chrome DevTools Protocol and native OS-level input.
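The relay step (streaming Claude CLI output back to the extension) depends on turning a chunked text stream into discrete JSON events. The following is a minimal sketch of that idea, not Navvy's actual code. It assumes the stream carries one JSON object per line; the class name `JsonLineBuffer` and the event fields used in the usage lines are hypothetical.

```typescript
// Hypothetical helper: turns arbitrarily chunked text into parsed JSON events.
// Assumption: the stream is newline-delimited JSON (one object per line).
class JsonLineBuffer {
  private pending = "";

  // Feed one chunk; returns every event completed by this chunk.
  push(chunk: string): unknown[] {
    this.pending += chunk;
    const lines = this.pending.split("\n");
    // The last piece is "" if the chunk ended on a newline, otherwise a
    // partial line that must wait for the next chunk.
    this.pending = lines.pop() ?? "";

    const events: unknown[] = [];
    for (const line of lines) {
      const trimmed = line.trim();
      if (trimmed === "") continue;
      try {
        events.push(JSON.parse(trimmed));
      } catch {
        // Skip a malformed line instead of crashing the relay.
      }
    }
    return events;
  }
}

// Usage: chunks may split a JSON object anywhere.
const lineBuffer = new JsonLineBuffer();
const first = lineBuffer.push('{"type":"text","te');
const second = lineBuffer.push('xt":"hi"}\n{"type":"done"}\n');
console.log(first.length, second.length);
```

Each parsed event could then be forwarded to the side panel over the WebSocket connection, so the UI updates as the response arrives rather than after it completes.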
Technical Stack
Extension
Chrome Manifest V3, side panel UI, WebSocket client, conversation persistence
Server
Express + WebSocket relay, Claude CLI process management, stream-JSON parsing
MCP Server
13 browser tools via Model Context Protocol, CDP client, coordinate mapping
Input System
OS-detecting dispatcher — cliclick on macOS, PowerShell + user32.dll on Windows
Languages
TypeScript throughout — extension, server, and MCP server
Cross-Platform
macOS + Windows support with automated setup scripts (bash + PowerShell)
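The OS-detecting dispatcher described above can be pictured as a function that picks a native command per platform. This is a sketch under stated assumptions, not the project's implementation: the function name `buildClickCommand`, the `InputCommand` shape, and the exact PowerShell script are illustrative. It relies on cliclick's `c:x,y` click command and on the `SetCursorPos` and `mouse_event` functions exported by user32.dll.

```typescript
// Hypothetical description of a native command to run.
interface InputCommand {
  program: string;
  args: string[];
}

// Builds (but does not run) the command that clicks at screen position (x, y).
// `platform` uses Node's process.platform values: "darwin" or "win32".
function buildClickCommand(platform: string, x: number, y: number): InputCommand {
  const px = Math.round(x);
  const py = Math.round(y);

  switch (platform) {
    case "darwin":
      // cliclick: "c:x,y" clicks at the given screen coordinates.
      return { program: "cliclick", args: [`c:${px},${py}`] };

    case "win32": {
      // Assumed approach: load two user32.dll functions via Add-Type,
      // move the cursor, then send left-button down (0x0002) and up (0x0004).
      const signature =
        '[DllImport("user32.dll")] public static extern bool SetCursorPos(int X, int Y); ' +
        '[DllImport("user32.dll")] public static extern void mouse_event(uint f, uint dx, uint dy, uint d, UIntPtr e);';
      const script = [
        `$m = Add-Type -MemberDefinition '${signature}' -Name Mouse -Namespace Native -PassThru`,
        `[void]$m::SetCursorPos(${px}, ${py})`,
        `$m::mouse_event(0x0002, 0, 0, 0, [UIntPtr]::Zero)`,
        `$m::mouse_event(0x0004, 0, 0, 0, [UIntPtr]::Zero)`,
      ].join("; ");
      return { program: "powershell", args: ["-NoProfile", "-Command", script] };
    }

    default:
      throw new Error(`Unsupported platform: ${platform}`);
  }
}

// Usage: the result could be passed to child_process.spawn(program, args).
const macClick = buildClickCommand("darwin", 120.4, 340.6);
console.log(macClick.program, macClick.args.join(" "));
```

Keeping command construction separate from execution makes the platform branch easy to test without moving the real mouse.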
Browser Tools (13 MCP Tools)
browser_screenshot    Capture page as base64 PNG
browser_get_dom       Simplified DOM tree with ids, classes, roles
browser_click         Click element by CSS selector
browser_click_at      Click at viewport coordinates
browser_type          Type text at focused element
browser_key_press     Press keys (return, tab, escape)
browser_navigate      Navigate to a URL
browser_scroll        Scroll page up or down
browser_evaluate      Execute JavaScript on page
browser_get_url       Get current page URL and title
browser_wait          Wait for selector or duration
browser_tabs          List all open browser tabs
browser_switch_tab    Switch to a tab by ID
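Tools such as browser_screenshot and browser_click_at only work together if positions are expressed in the same coordinate space. The sketch below illustrates one plausible mapping; it is an assumption, not a description of Navvy's internals. It assumes the screenshot covers exactly the visible viewport and was captured at `devicePixelRatio` device pixels per CSS pixel. The function names are hypothetical.

```typescript
interface Point {
  x: number;
  y: number;
}

// Subset of what Element.getBoundingClientRect() returns (CSS pixels).
interface Rect {
  left: number;
  top: number;
  width: number;
  height: number;
}

// Converts a pixel position on a screenshot to viewport (CSS pixel) coordinates.
// Assumption: the screenshot shows only the visible viewport, scaled by
// devicePixelRatio (for example 2 on many high-density displays).
function screenshotToViewport(p: Point, devicePixelRatio: number): Point {
  if (!(devicePixelRatio > 0)) {
    throw new RangeError("devicePixelRatio must be positive");
  }
  return { x: p.x / devicePixelRatio, y: p.y / devicePixelRatio };
}

// Returns the center of an element's bounding box, a common click target
// when an element is located by CSS selector.
function elementCenter(rect: Rect): Point {
  return { x: rect.left + rect.width / 2, y: rect.top + rect.height / 2 };
}

// Usage: a point picked at (800, 600) on a 2x screenshot.
console.log(screenshotToViewport({ x: 800, y: 600 }, 2));
console.log(elementCenter({ left: 10, top: 20, width: 100, height: 40 }));
```

Native OS-level input needs one further step, from viewport coordinates to screen coordinates, which depends on the window position and browser UI and is left out here.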
How It Works
Quick Start
Load the extension in Chrome, open the side panel, and start chatting. Tell it to "Search for headphones under $50" and watch it browse autonomously.
Project Stats
MCP Tools: 13
Components: 4
Platforms: 2
License: MIT
Impact
- Enables autonomous web browsing through natural language; users write no selectors or scripts
- Cross-platform support for macOS and Windows with native OS-level input
- Built on the Model Context Protocol (MCP) standard, so the tools can be reused by any MCP-compatible AI agent
- Open source under the MIT license, designed for the community to extend and build upon
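To make the reusability point concrete: an MCP client invokes a tool by sending a JSON-RPC 2.0 request with the method `tools/call`, a tool name, and an arguments object. The sketch below builds such a request for browser_navigate. The helper name `buildToolCall` and the `url` argument name are assumptions for illustration; the real argument schema is whatever the MCP server declares.

```typescript
// Shape of an MCP tool invocation (JSON-RPC 2.0, method "tools/call").
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: {
    name: string;
    arguments: Record<string, unknown>;
  };
}

// Hypothetical helper that assembles the request object.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown> = {}
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Usage: the "url" argument name is assumed, not taken from Navvy's schema.
const navigateRequest = buildToolCall(1, "browser_navigate", {
  url: "https://example.com",
});
console.log(JSON.stringify(navigateRequest));
```

Because the request format is defined by the protocol rather than by Navvy, any agent that speaks MCP could issue the same call.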