Navvy

AI-Powered Browser Automation • Chrome Extension • MCP Tools

Navvy connects Claude to your browser so it can see, navigate, and interact with web pages autonomously. It chains a Chrome extension, WebSocket server, Claude CLI, and MCP tool server into a seamless AI browsing agent.

Problem

Browser automation tools tend to be either too rigid or too limited. Selenium and Playwright require exact selectors that break whenever a page's markup changes, while simple extensions can only read text. Neither gives an AI model full autonomous control of a browser (seeing screenshots, reading DOM structure, clicking elements, typing, navigating) through natural language commands. Navvy bridges that gap by connecting Claude to Chrome via the DevTools Protocol and native OS input.

Architecture

┌───────────────┐   WebSocket    ┌────────────────┐   stdio    ┌───────────┐   MCP    ┌─────────┐
│   Extension   │ ◄────────────► │     Server     │ ◄────────► │ Claude CLI│ ◄──────► │   MCP   │
│  (Side Panel) │   (port 3300)  │  (Express+WS)  │            │           │          │  Server │
└───────────────┘                └────────────────┘            └───────────┘          └────┬────┘
                                                                                           │
                                                                                    CDP + OS Input
                                                                                           │
                                                                                    ┌──────▼──────┐
                                                                                    │   Chrome    │
                                                                                    │ (port 9222) │
                                                                                    └─────────────┘

Four components are chained together. The Chrome extension provides a side panel UI; the WebSocket server relays prompts to Claude CLI and streams responses back; Claude runs with the MCP browser tools; and the MCP server exposes 13 browser automation tools built on the Chrome DevTools Protocol and native OS-level input.
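
The relay step, where the server streams Claude CLI output back to the extension, can be sketched as a line-buffered JSON parser. This is a minimal illustration that assumes the CLI writes one JSON object per line to stdout. `JsonLineParser` and the `StreamEvent` shape are hypothetical names, not taken from the Navvy source.

```typescript
// Hypothetical event shape: the real stream format is not shown in the source.
type StreamEvent = { type: string; [key: string]: unknown };

// Buffers raw stdout chunks and returns one parsed object per complete line.
class JsonLineParser {
  private buffer = "";

  // Feed a chunk; returns every event completed by this chunk.
  push(chunk: string): StreamEvent[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    // The last piece is an unfinished line (or "" if the chunk ended with "\n").
    this.buffer = lines.pop() ?? "";

    const events: StreamEvent[] = [];
    for (const line of lines) {
      const trimmed = line.trim();
      if (trimmed === "") continue;
      try {
        const parsed: unknown = JSON.parse(trimmed);
        if (
          typeof parsed === "object" &&
          parsed !== null &&
          typeof (parsed as StreamEvent).type === "string"
        ) {
          events.push(parsed as StreamEvent);
        }
      } catch {
        // Skip lines that are not valid JSON (for example, stray log output).
      }
    }
    return events;
  }
}

// A JSON object split across two chunks is only emitted once its line ends.
const parser = new JsonLineParser();
const firstBatch = parser.push('{"type":"text","text":"Hel'); // no complete line yet
const secondBatch = parser.push('lo"}\n{"type":"done"}\n'); // two complete lines
```

Buffering on newlines matters because a child process's stdout can deliver data in arbitrary chunks, so a single JSON object may arrive in pieces. Each returned event could then be forwarded to the extension over the WebSocket connection.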

Technical Stack

Extension

Chrome Manifest V3, side panel UI, WebSocket client, conversation persistence

Server

Express + WebSocket relay, Claude CLI process management, stream-JSON parsing

MCP Server

13 browser tools via Model Context Protocol, CDP client, coordinate mapping

Input System

OS-detecting dispatcher — cliclick on macOS, PowerShell + user32.dll on Windows

Languages

TypeScript throughout — extension, server, and MCP server

Cross-Platform

macOS + Windows support with automated setup scripts (bash + PowerShell)
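
The OS-detecting dispatcher and the coordinate mapping listed above might look roughly like the sketch below. It only builds the command to run and does not execute anything. The `cliclick` click syntax and the `user32.dll` functions `SetCursorPos` and `mouse_event` are real, but the exact invocations Navvy uses are not shown in the source, so the command strings, the `WindowMetrics` fields, and both function names are assumptions.

```typescript
interface Command {
  file: string;
  args: string[];
}

// Hypothetical window geometry: where the browser window sits on screen and
// how tall the tab strip and toolbar are above the page viewport.
interface WindowMetrics {
  screenX: number;
  screenY: number;
  chromeHeight: number;
}

// Maps a point in the page viewport to absolute screen coordinates.
// Simplified: ignores display scaling and side borders.
function viewportToScreen(x: number, y: number, m: WindowMetrics): { x: number; y: number } {
  return {
    x: Math.round(m.screenX + x),
    y: Math.round(m.screenY + m.chromeHeight + y),
  };
}

// Builds the OS-specific command for a left click at screen coordinates.
// `platform` is expected to be a Node-style value such as process.platform.
function buildClickCommand(platform: string, x: number, y: number): Command {
  if (!Number.isFinite(x) || !Number.isFinite(y)) {
    throw new Error("coordinates must be finite numbers");
  }
  const px = Math.round(x);
  const py = Math.round(y);

  if (platform === "darwin") {
    // cliclick: "c:x,y" clicks at the given screen position.
    return { file: "cliclick", args: [`c:${px},${py}`] };
  }

  if (platform === "win32") {
    // PowerShell: declare the user32.dll entry points, move the cursor, click.
    const script = [
      "Add-Type -Namespace Native -Name Mouse -MemberDefinition '" +
        '[DllImport("user32.dll")] public static extern bool SetCursorPos(int X, int Y); ' +
        '[DllImport("user32.dll")] public static extern void mouse_event(' +
        "uint dwFlags, uint dx, uint dy, uint dwData, System.UIntPtr dwExtraInfo);" +
        "'",
      `[Native.Mouse]::SetCursorPos(${px}, ${py}) | Out-Null`,
      "[Native.Mouse]::mouse_event(0x0002, 0, 0, 0, [System.UIntPtr]::Zero)", // left button down
      "[Native.Mouse]::mouse_event(0x0004, 0, 0, 0, [System.UIntPtr]::Zero)", // left button up
    ].join("; ");
    return { file: "powershell", args: ["-NoProfile", "-Command", script] };
  }

  throw new Error(`unsupported platform: ${platform}`);
}
```

Returning a plain `{ file, args }` pair keeps the platform logic testable without touching the real mouse. A caller could pass the result to a process-spawning API of its choice.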

Browser Tools (13 MCP Tools)

browser_screenshot

Capture page as base64 PNG

browser_get_dom

Simplified DOM tree with ids, classes, roles

browser_click

Click element by CSS selector

browser_click_at

Click at viewport coordinates

browser_type

Type text at focused element

browser_key_press

Press keys (return, tab, escape)

browser_navigate

Navigate to a URL

browser_scroll

Scroll page up or down

browser_evaluate

Execute JavaScript on page

browser_get_url

Get current page URL and title

browser_wait

Wait for selector or duration

browser_tabs

List all open browser tabs

browser_switch_tab

Switch to a tab by ID
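
Several of the tools above correspond naturally to Chrome DevTools Protocol commands. The sketch below shows one possible mapping for four of them. `Page.captureScreenshot`, `Page.navigate`, and `Runtime.evaluate` are real CDP methods, but the argument names (`url`, `expression`, `direction`, `amount`), the default scroll distance, and the `toCdpCommand` helper are assumptions for illustration, not Navvy's actual implementation.

```typescript
// Shape of a CDP request as sent over the DevTools WebSocket.
interface CdpCommand {
  id: number;
  method: string;
  params: Record<string, unknown>;
}

// Hypothetical argument shapes for a subset of the tools.
type ToolCall =
  | { tool: "browser_screenshot" }
  | { tool: "browser_navigate"; url: string }
  | { tool: "browser_evaluate"; expression: string }
  | { tool: "browser_scroll"; direction: "up" | "down"; amount?: number };

// CDP matches responses to requests by id, so each command gets a fresh one.
let nextCommandId = 1;

function toCdpCommand(call: ToolCall): CdpCommand {
  const id = nextCommandId++;
  switch (call.tool) {
    case "browser_screenshot":
      // The response carries the image as a base64 string in `data`.
      return { id, method: "Page.captureScreenshot", params: { format: "png" } };
    case "browser_navigate":
      return { id, method: "Page.navigate", params: { url: call.url } };
    case "browser_evaluate":
      return {
        id,
        method: "Runtime.evaluate",
        params: { expression: call.expression, returnByValue: true },
      };
    case "browser_scroll": {
      // Assumed default of 600 pixels; negative values scroll up.
      const pixels = (call.amount ?? 600) * (call.direction === "down" ? 1 : -1);
      return {
        id,
        method: "Runtime.evaluate",
        params: { expression: `window.scrollBy(0, ${pixels})` },
      };
    }
  }
}

// Serialized form that would be written to the page's DevTools WebSocket.
const navigateMessage = JSON.stringify(
  toCdpCommand({ tool: "browser_navigate", url: "https://example.com" }),
);
```

Tools such as browser_click_at and browser_type are left out here because the document describes them as using native OS-level input rather than CDP alone.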

How It Works

Quick Start

$ git clone https://github.com/ordas21/navvy.git
$ cd navvy
$ bash setup.sh
$ chrome-debug # launches Chrome with remote debugging
$ npm run dev # starts Navvy server on :3300

Load the extension in Chrome, open the side panel, and start chatting. Tell it to "Search for headphones under $50" and watch it browse autonomously.
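
Before loading the extension, it can help to confirm that Chrome is actually listening on the debugging port. When started with remote debugging, Chrome serves a JSON list of debuggable targets at `http://localhost:9222/json/list`. The sketch below filters that list down to regular tabs. `selectPageTargets` is a hypothetical helper and the sample data is illustrative only, not output captured from Navvy.

```typescript
// Subset of the fields Chrome reports for each debuggable target.
interface DevToolsTarget {
  id: string;
  type: string;
  title: string;
  url: string;
  webSocketDebuggerUrl?: string;
}

// Keeps only regular tabs ("page" targets) that expose a WebSocket debugger URL.
// Only `type`, `id`, and `webSocketDebuggerUrl` are checked at runtime.
function selectPageTargets(payload: unknown): DevToolsTarget[] {
  if (!Array.isArray(payload)) return [];
  return payload.filter(
    (t): t is DevToolsTarget =>
      typeof t === "object" &&
      t !== null &&
      (t as DevToolsTarget).type === "page" &&
      typeof (t as DevToolsTarget).id === "string" &&
      typeof (t as DevToolsTarget).webSocketDebuggerUrl === "string",
  );
}

// Illustrative payload shaped like a /json/list response.
const sampleTargets: unknown = [
  {
    id: "A1",
    type: "page",
    title: "Example Domain",
    url: "https://example.com/",
    webSocketDebuggerUrl: "ws://localhost:9222/devtools/page/A1",
  },
  {
    id: "B2",
    type: "service_worker",
    title: "Service Worker",
    url: "https://example.com/sw.js",
    webSocketDebuggerUrl: "ws://localhost:9222/devtools/page/B2",
  },
];

const openTabs = selectPageTargets(sampleTargets);
```

In a runtime with a global `fetch` (for example Node 18 or later), the live list could be read with `selectPageTargets(await (await fetch("http://localhost:9222/json/list")).json())`. An empty result or a connection error suggests Chrome was not started with remote debugging enabled.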

Project Stats

13 MCP Tools

4 Components

2 Platforms

MIT License

Tech Breakdown

TypeScript

Core language

HTML / CSS / JS

Extension UI

Impact

  • Enables fully autonomous web browsing through natural language — no selectors or scripts needed
  • Cross-platform support for macOS and Windows with native OS-level input
  • Built on the Model Context Protocol (MCP) standard, making tools reusable across any MCP-compatible AI agent
  • Open-source under MIT license — designed for the community to extend and build upon