
How to Build an Autonomous OSINT Agent in Python Using Claude's Tool Use API

When I started studying OSINT, I always felt I was just putting random values into software without deeply understanding what I was doing. After months in the field, I realized I wasn't really investigating — I was just executing steps that follow a predictable pattern. That's exactly what an AI agent is good at. So I built one.

In this tutorial you'll learn how to set up OpenOSINT, an open-source Python OSINT framework with an AI agent at its core. You'll learn how Claude's native tool use API works, how to run autonomous investigations from the terminal using the interactive AI REPL, how to use the direct CLI for scripting, and how to expose all the tools to Claude Code or Claude Desktop via an MCP server.

What Is OSINT and Why Manual Workflows Break Down
What You'll Build
Prerequisites
How Claude's Tool Use API Works
How to Install OpenOSINT
How to Use the Interactive AI REPL
How to Run Individual Tools from the CLI
How to Set Up the MCP Server
How the Agent Loop Works Under the Hood
Project Architecture
Conclusion

What Is OSINT and Why Manual Workflows Break Down

Open Source Intelligence (OSINT) is the practice of collecting and analyzing information from publicly available sources. Security researchers use it during penetration tests. Journalists use it to verify identities and trace connections. Threat analysts use it to profile infrastructure.

A typical OSINT workflow looks like this:

You have a target email address
You run holehe to find which platforms that email is registered on
You notice a username in the output
You manually copy that username and run sherlock to search 300+ platforms
You switch to a browser to check HaveIBeenPwned
You open another tab for a WHOIS lookup
You take notes and repeat

Every tool is a silo. Every pivot is manual. The investigation logic — what to run next, what to chain, what the findings mean — lives entirely in your head.

When you close the terminal, it's gone.

This tutorial walks you through OpenOSINT, an open-source Python framework that replaces that fragmented workflow with an AI agent that chains tools autonomously, executes them against real binaries, and saves a structured Markdown report.

More importantly, you'll learn the core design principle that makes it trustworthy for security research: hallucination in tool results is structurally impossible.

What You'll Build

By the end of this tutorial, you'll have a working OSINT agent that you can use in three ways:

Interactive AI REPL— type a target in natural language and the agent decides what to run
Direct CLI— run individual tools without AI, useful for scripting
MCP Server— expose all tools to Claude Code or Claude Desktop

Here's what a real session looks like:

```
$ openosint
openosint ❯ investigate [email protected]

  → generate_dorks('[email protected]')
  → search_email('[email protected]')
  ✓ Found: Spotify, WordPress, Gravatar, Office365
  → search_breach('[email protected]')
  ✓ Found in 2 breaches: LinkedIn (2016), Adobe (2013)
  → search_username('target_handle')
  ✓ Found on: GitHub, Reddit, HackerNews, Twitter

╭──────────────── Report ────────────────╮
│ ## Online Presence                     │
│ Spotify · WordPress · Gravatar         │
│                                        │
│ ## Data Breaches                       │
│ LinkedIn (2016) · Adobe (2013)         │
╰────────────────────────────────────────╯
✓ Report saved → reports/2026-05-11_report.md
```

The agent went from email → linked accounts → username pivot → cross-platform search with no human orchestration at any step.

Prerequisites

To follow this tutorial, you'll need:

Python 3.10 or later installed on your machine
Basic familiarity with the command line
An Anthropic API key — only required for the AI REPL, not for the CLI or MCP server
Git installed

You don't need prior experience with OSINT tools or the Anthropic SDK.

How Claude's Tool Use API Works

Before you dive into installation, it's worth understanding the mechanism that makes this framework trustworthy for security research.

A naive way to wrap external tools is to let the model generate text describing what a tool would return. That's a problem when accuracy matters: the model can hallucinate plausible-looking usernames, fake subdomains, or data breaches that never happened.

Claude's tool use API works differently. When the model decides it needs to call a tool, it does not generate the output. It stops and emits a structured tool_use block containing the tool name and the arguments it wants to pass.

Your code then runs the actual binary — holehe, sherlock, or whatever else — and sends the real output back as a tool_result. The model reads that real output and decides its next step.

Here's the flow:

```
User prompt
    ↓
Model decides to call search_email()
    ↓
Hard stop: model emits tool_use block
    ↓
Your code runs holehe against the real target
    ↓
Real output sent back as tool_result
    ↓
Model reads actual results, decides next step
    ↓
Repeat until investigation is complete
```

The model never generates tool output. It only ever reads it. If sherlock finds 12 profiles, those 12 URLs go back into the context verbatim, and the model cannot add a 13th to that tool result.

This is not a prompting trick or a system prompt instruction. It is how the API is architected. Keep this in mind as you read through the agent loop code later in this tutorial.
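To make that contract concrete, here's a minimal sketch using plain dicts shaped like the Messages API's `tool_use` and `tool_result` content blocks. The `fake_response_content` value, the `toolu_example` id, and the `run_tool` callback are hypothetical stand-ins; the point is that the `content` of each `tool_result` comes only from your code, never from the model.

```python
from typing import Callable


def build_tool_results(
    assistant_content: list[dict],
    run_tool: Callable[[str, dict], str],
) -> dict:
    """Turn the model's tool_use blocks into the next user message.

    Each tool_result carries the id of the tool_use block it answers, and
    its content is whatever run_tool() returned, i.e. output from your code.
    """
    results = []
    for block in assistant_content:
        if block.get("type") == "tool_use":
            output = run_tool(block["name"], block["input"])
            results.append({
                "type": "tool_result",
                "tool_use_id": block["id"],
                "content": output,
            })
    return {"role": "user", "content": results}


# Hypothetical assistant turn: some text plus one tool call (id is made up).
fake_response_content = [
    {"type": "text", "text": "I'll enumerate accounts for this email."},
    {
        "type": "tool_use",
        "id": "toolu_example",
        "name": "search_email",
        "input": {"email": "someone@example.com"},
    },
]

# Stand-in runner; in OpenOSINT this is where holehe would actually execute.
message = build_tool_results(
    fake_response_content,
    run_tool=lambda name, args: f"{name} ran with {args['email']}",
)
print(message)
```

The model's own text block is ignored here: only `tool_use` blocks produce results, and each result is tied back to its request by `tool_use_id`.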

How to Install OpenOSINT

Start by cloning the repository and installing the package:

```bash
git clone https://github.com/OpenOSINT/OpenOSINT.git
cd OpenOSINT
pip install -e .
```

Alternatively, if you just want to use the tool without modifying the source, install it directly from PyPI:

pip install openosint

Next, set your Anthropic API key. This is only required for the interactive AI REPL — the direct CLI and MCP server work without it:

export ANTHROPIC_API_KEY=sk-ant-...

How to Install the External Tool Dependencies

OpenOSINT wraps several standalone OSINT tools. Install the ones you plan to use:

```bash
pip install holehe            # email account enumeration
pip install sherlock-project  # username search across 300+ platforms
pip install sublist3r         # subdomain enumeration
```

For phone intelligence, phoneinfoga is a standalone binary. Download the release for your platform from its GitHub releases page and place it somewhere in your PATH.
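Before running anything, you can check which of these binaries your shell can actually find. This sketch uses only the standard library's `shutil.which()`. The executable names in `EXPECTED_BINARIES` are assumptions based on how these packages are usually installed, so adjust them if your setup exposes different names.

```python
import shutil
from typing import Optional

# Assumed executable names; adjust if your install exposes different ones.
EXPECTED_BINARIES = ["holehe", "sherlock", "sublist3r", "phoneinfoga"]


def find_binaries(names: list[str]) -> dict[str, Optional[str]]:
    """Map each executable name to its full path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_binaries(EXPECTED_BINARIES).items():
        status = path if path else "NOT FOUND"
        print(f"{name:12} {status}")
```

Any binary reported as missing only affects its own tool; the rest keep working.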

How to Configure Optional API Keys

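Two tools read API keys from the environment. HaveIBeenPwned requires a key for breach checks, while the ipinfo.io token is optional and only raises rate limits: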

```bash
export HIBP_API_KEY=your_key    # required for breach checks via HaveIBeenPwned v3
export IPINFO_TOKEN=your_token  # optional; raises ipinfo.io rate limits
```

If a binary is missing or an API key is not configured, that specific tool returns a descriptive error string. All other tools continue to work normally.
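Here's a sketch of that fail-soft pattern applied to a breach check. It's an illustration, not OpenOSINT's actual `search_breach`: it targets HaveIBeenPwned's v3 `breachedaccount` endpoint, which expects an `hibp-api-key` header and a user agent, and it returns an error string instead of raising when the key is missing. The `openosint-sketch` user-agent value is made up.

```python
import json
import os
import urllib.error
import urllib.parse
import urllib.request

HIBP_URL = "https://haveibeenpwned.com/api/v3/breachedaccount/{account}"


def build_hibp_request(email: str, api_key: str) -> urllib.request.Request:
    """Build (but don't send) a HIBP v3 breachedaccount request."""
    account = urllib.parse.quote(email, safe="")  # "@" becomes "%40"
    # truncateResponse=false asks for full breach records, not just names
    url = HIBP_URL.format(account=account) + "?truncateResponse=false"
    return urllib.request.Request(
        url,
        headers={
            "hibp-api-key": api_key,
            "user-agent": "openosint-sketch",  # HIBP requires a User-Agent
        },
    )


def search_breach(email: str, timeout: float = 30.0) -> str:
    """Return one "Name (date)" line per breach, or a descriptive error string."""
    api_key = os.environ.get("HIBP_API_KEY")
    if not api_key:
        return "search_breach unavailable: HIBP_API_KEY is not set."

    request = build_hibp_request(email, api_key)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            breaches = json.load(resp)
    except urllib.error.HTTPError as err:  # must come before URLError
        if err.code == 404:  # HIBP answers 404 when the account isn't in any breach
            return f"No breaches found for {email}."
        return f"search_breach failed: HTTP {err.code}"
    except urllib.error.URLError as err:
        return f"search_breach failed: {err.reason}"

    return "\n".join(f"{b['Name']} ({b['BreachDate']})" for b in breaches)
```

Because every failure path returns a string, the agent can read "HIBP_API_KEY is not set" as an ordinary tool result and move on to other tools.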

How to Use the Interactive AI REPL

Run openosint with no arguments to start the AI-powered REPL. You can also use openosint shell, which is equivalent:

```
$ openosint
# or
$ openosint shell
```

If you prefer to pass the API key inline rather than via an environment variable, use the --api-key flag:

$ openosint --api-key sk-ant-...
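The precedence between the flag and the environment variable isn't spelled out here. A common pattern is to let an explicit flag win, and this `argparse` sketch assumes exactly that; `resolve_api_key` is a hypothetical helper, not part of OpenOSINT.

```python
import argparse
import os
from typing import Optional


def resolve_api_key(argv: list[str], env: Optional[dict] = None) -> Optional[str]:
    """Return --api-key if given, else ANTHROPIC_API_KEY, else None (assumed order)."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(prog="openosint")
    parser.add_argument("--api-key", default=None, help="Anthropic API key")
    # parse_known_args ignores other arguments such as the "shell" subcommand
    args, _unknown = parser.parse_known_args(argv)
    return args.api_key or env.get("ANTHROPIC_API_KEY")
```

If neither source yields a key, only the REPL is affected; the direct CLI and MCP server don't need one.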

You'll get a prompt where you can type targets or questions in natural language:

```
openosint ❯ investigate [email protected]
openosint ❯ find all accounts for johndoe99
openosint ❯ what subdomains does example.com have?
openosint ❯ check if +14155552671 is a mobile number
```

The agent decides which tools to run based on your input. You don't need to specify which tools to use or in what order. If you type an email address, the agent will run email enumeration. If it finds a linked username, it may pivot and search that username across platforms.

Reports are saved automatically to the reports/ directory after every investigation that produces structured findings.
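The sample session earlier saved to `reports/2026-05-11_report.md`, which suggests a date-stamped filename. Here's a minimal sketch of that behavior. The naming scheme is inferred from that single line of output, `save_report` is a hypothetical helper, and the `_2`, `_3` suffix for same-day collisions is an assumption of this sketch.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def save_report(markdown: str, reports_dir: str = "reports",
                today: Optional[date] = None) -> Path:
    """Write a Markdown report to <reports_dir>/YYYY-MM-DD_report.md.

    If that file already exists, append _2, _3, ... instead of overwriting
    (an assumption; the real tool may handle collisions differently).
    """
    today = today or date.today()
    folder = Path(reports_dir)
    folder.mkdir(parents=True, exist_ok=True)

    path = folder / f"{today.isoformat()}_report.md"
    counter = 2
    while path.exists():
        path = folder / f"{today.isoformat()}_report_{counter}.md"
        counter += 1

    path.write_text(markdown, encoding="utf-8")
    return path
```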

Here are the commands available inside the REPL:

Command	Description
`clear`	Reset the conversation memory
`save`	Manually save the last report
`tools`	Show available tools and their status
`config`	Show current configuration
`help`	List all commands
`exit` or Ctrl-D	Quit
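A REPL like this has to tell its built-in commands apart from natural-language requests. One simple approach is a lookup table keyed on the exact command word, with everything else falling through to the agent. This is a sketch under that assumption: `ReplState`, `handle_line`, and the `ask_agent` callback are hypothetical names, and only `clear`, `exit`, and `help` are wired up.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ReplState:
    history: list = field(default_factory=list)  # conversation memory
    running: bool = True


def cmd_clear(state: ReplState) -> str:
    state.history.clear()
    return "Conversation memory cleared."


def cmd_exit(state: ReplState) -> str:
    state.running = False
    return "Bye."


def cmd_help(state: ReplState) -> str:
    return "Commands: " + ", ".join(sorted(COMMANDS))


COMMANDS: dict[str, Callable[[ReplState], str]] = {
    "clear": cmd_clear,
    "exit": cmd_exit,
    "help": cmd_help,
}


def handle_line(line: str, state: ReplState, ask_agent: Callable[[str], str]) -> str:
    """Run a built-in command if the line is one; otherwise ask the agent."""
    word = line.strip().lower()
    if word in COMMANDS:
        return COMMANDS[word](state)
    # Record the request; a real agent would also append its own turns.
    state.history.append({"role": "user", "content": line})
    return ask_agent(line)
```

With prompt_toolkit, Ctrl-D surfaces as an `EOFError` from `PromptSession.prompt()`, so an outer loop would typically catch it and treat it like `exit`.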

How to Run Individual Tools from the CLI

If you want to run a single tool without the AI layer — for scripting, automation, or quick lookups — use the direct CLI:

```bash
# Email account enumeration (default timeout: 120s)
openosint email [email protected]

# With a custom timeout in seconds
openosint email [email protected] -t 60

# Username search across 300+ platforms (default timeout: 180s)
openosint username johndoe99

# Enable verbose output for debugging
openosint -v email [email protected]
```

The direct CLI doesn't require an Anthropic API key. It runs the underlying binary and prints the output to the terminal.

This mode is useful when you need predictable, scriptable behavior — for example, piping output into another tool or running automated checks.
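For instance, here's a sketch of driving the direct CLI from a Python script. It only uses the flags shown above (`-v` before the subcommand, `-t` after the target, which the examples only show for `email`); `build_command` and `run_openosint` are hypothetical helper names, and the sketch assumes `openosint` is on your PATH.

```python
import subprocess
from typing import Optional


def build_command(kind: str, target: str, timeout: Optional[int] = None,
                  verbose: bool = False) -> list[str]:
    """Build argv for `openosint [-v] <kind> <target> [-t SECONDS]`."""
    cmd = ["openosint"]
    if verbose:
        cmd.append("-v")
    cmd += [kind, target]
    if timeout is not None:
        cmd += ["-t", str(timeout)]
    return cmd


def run_openosint(kind: str, target: str, **kwargs) -> str:
    """Run the CLI and return stdout; raises CalledProcessError on a non-zero exit."""
    result = subprocess.run(
        build_command(kind, target, **kwargs),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Passing argv as a list (no `shell=True`) means a target string is never interpreted by a shell, which matters when targets come from untrusted input.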

How to Set Up the MCP Server

OpenOSINT also ships as a Model Context Protocol (MCP) server. This exposes all 9 tools to any MCP-compatible AI client.

How to Register with Claude Code

claude mcp add openosint python /absolute/path/to/OpenOSINT/openosint/mcp_server.py

Verify the registration worked:

claude mcp list

Once registered, you can drive investigations from the Claude Code prompt:

> Investigate [email protected]. If you find a linked username, trace it across other platforms and compile a full report.

How to Configure Claude Desktop

Add the following to your Claude Desktop config file. On macOS it lives at ~/Library/Application Support/Claude/claude_desktop_config.json:

```json
{
  "mcpServers": {
    "openosint": {
      "command": "python",
      "args": ["/absolute/path/to/OpenOSINT/openosint/mcp_server.py"]
    }
  }
}
```

Restart Claude Desktop after saving the file. The tools will appear in Claude's tool list.
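If you'd rather not hand-edit the file, a short script can merge the entry into an existing config without clobbering other servers. This is a sketch using only the standard-library `json` module; `add_mcp_server` is a hypothetical helper, and you'd pass it your own config path and absolute script path.

```python
import json
from pathlib import Path


def add_mcp_server(config_path: Path, name: str, script_path: str,
                   command: str = "python") -> dict:
    """Add or replace one entry under "mcpServers", keeping everything else."""
    config: dict = {}
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))

    servers = config.setdefault("mcpServers", {})
    servers[name] = {"command": command, "args": [script_path]}

    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

If Claude Desktop can't find the right interpreter (for example, when OpenOSINT lives in a virtualenv), passing `command=sys.executable` from inside that environment writes an absolute interpreter path instead of `python`.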

The MCP server uses stdio transport and does not need a persistent background process. Claude Code or Claude Desktop starts it on demand.

How the Agent Loop Works Under the Hood

Here is a simplified version of the agent loop from openosint/agent.py:

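```python
import anthropic

# AsyncAnthropic keeps the API call from blocking the event loop that
# execute_tool() also runs on. The key is read from ANTHROPIC_API_KEY.
client = anthropic.AsyncAnthropic()

# TOOL_SCHEMAS (JSON schemas for all 9 tools) and execute_tool() are
# defined elsewhere in agent.py and omitted from this simplified version.


def extract_text(response) -> str:
    """Concatenate the text blocks of the model's final response."""
    return "".join(b.text for b in response.content if b.type == "text")


async def run_investigation(user_prompt: str) -> str:
    messages = [{"role": "user", "content": user_prompt}]

    while True:
        response = await client.messages.create(
            model="claude-...",   # exact model ID elided here
            max_tokens=4096,
            tools=TOOL_SCHEMAS,   # JSON schemas for all 9 tools
            messages=messages,
        )

        # Agent is done: extract and return the final report
        if response.stop_reason == "end_turn":
            return extract_text(response)

        # Any other non-tool stop (e.g. "max_tokens") would otherwise resend
        # the same request forever, so fail loudly instead.
        if response.stop_reason != "tool_use":
            raise RuntimeError(f"Unexpected stop_reason: {response.stop_reason}")

        # Agent needs a tool: run the real binary
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                # Runs holehe, sherlock, etc. as real subprocesses
                real_output = await execute_tool(block.name, block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": real_output,  # real output, never generated
                })

        # Append the assistant turn and the real tool results to the conversation
        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": tool_results})
```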

There are a few important things to understand in this code.

The loop runs until stop_reason == "end_turn": The agent decides when it has gathered enough information to write the final report. It may call one tool or ten, depending on what it finds.
execute_tool() runs real subprocesses: It's a thin async wrapper around Python's asyncio.create_subprocess_exec() with a configurable timeout. There's no simulation and no mocked data at any point.
Conversation history is maintained across the entire loop: Each tool result goes back into messages, so the model always has full context of what it found when deciding what to run next.
Tool schemas follow the JSON Schema format: Each tool has a name, description, and parameter schema. The model uses these to know what tools exist and what arguments they accept. Here's a simplified example for search_email, written as a Python dict:

```python
{
    "name": "search_email",
    "description": (
        "Enumerates online services and social accounts "
        "associated with an email address using holehe."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "email": {
                "type": "string",
                "description": "Target email address",
            }
        },
        "required": ["email"],
    },
}
```

The same pattern applies to all 9 tools. The model reads these schemas at the start of every request and uses them to decide what's available and how to call it.
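`execute_tool()` is described above as a thin async wrapper around `asyncio.create_subprocess_exec()` with a configurable timeout. Here's a minimal sketch of that kind of wrapper, not the project's actual code: `run_binary` is a hypothetical name, and it returns error strings rather than raising, matching the behavior described for missing binaries.

```python
import asyncio


async def run_binary(argv: list[str], timeout: float = 120.0) -> str:
    """Run argv as a subprocess and return its combined stdout/stderr text.

    Errors (missing binary, timeout) come back as descriptive strings so the
    agent can read them like any other tool result.
    """
    try:
        proc = await asyncio.create_subprocess_exec(
            *argv,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.STDOUT,  # merge stderr into stdout
        )
    except FileNotFoundError:
        return f"Error: '{argv[0]}' is not installed or not on PATH."

    try:
        out, _ = await asyncio.wait_for(proc.communicate(), timeout=timeout)
    except asyncio.TimeoutError:
        proc.kill()
        await proc.wait()  # reap the killed process
        return f"Error: '{argv[0]}' timed out after {timeout:g}s."

    return out.decode(errors="replace")
```

Under this sketch, the email tool would boil down to something like `await run_binary(["holehe", email], timeout=120)`, using the 120-second default mentioned for the CLI.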

Project Architecture

The codebase is organized in five layers. The hard rule across the codebase is that no layer imports from a layer above it:

```
openosint/tools/         Core tools
                         Async wrappers around external binaries and APIs.
                         Stateless. No AI. No CLI. Pure functions.
openosint/agent.py       AI agent
                         Anthropic tool use loop.
                         Per-session conversation history.
                         Imports from tools/. Nothing imports from agent.py.
openosint/repl.py        Interactive REPL (prompt_toolkit + Rich)
openosint/mcp_server.py  MCP server (stdio transport)
openosint/cli.py         CLI entry point
```

This separation makes each layer independently testable. The core tools are pure async functions that take a string and return a string — you can unit test them without touching the agent or the CLI.
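Because each core tool is a plain `async` function from string to string, one way to wire them up is a registry that pairs each function with its schema, which a dispatcher like the agent's `execute_tool()` can look up by name. The sketch below assumes that design; `TOOLS`, `register`, `tool_schemas`, and the toy `echo_target` tool are hypothetical, not OpenOSINT's code.

```python
import asyncio
from typing import Awaitable, Callable

ToolFn = Callable[[str], Awaitable[str]]

# name -> (schema sent to the model, async implementation)
TOOLS: dict[str, tuple[dict, ToolFn]] = {}


def register(name: str, description: str, param: str):
    """Decorator that records a single-string-parameter tool and its schema."""
    def decorator(fn: ToolFn) -> ToolFn:
        schema = {
            "name": name,
            "description": description,
            "input_schema": {
                "type": "object",
                "properties": {param: {"type": "string"}},
                "required": [param],
            },
        }
        TOOLS[name] = (schema, fn)
        return fn
    return decorator


@register("echo_target", "Toy tool that echoes its input.", "target")
async def echo_target(target: str) -> str:
    return f"echo: {target}"


def tool_schemas() -> list[dict]:
    """The list you would pass as tools= to the Messages API."""
    return [schema for schema, _ in TOOLS.values()]


async def execute_tool(name: str, tool_input: dict) -> str:
    """Look up a tool by name and call it with its single string argument."""
    if name not in TOOLS:
        return f"Error: unknown tool '{name}'."
    schema, fn = TOOLS[name]
    param = schema["input_schema"]["required"][0]
    if param not in tool_input:
        return f"Error: missing required argument '{param}' for '{name}'."
    return await fn(tool_input[param])
```

Testing a tool then needs nothing but `asyncio.run(echo_target("example.com"))`: no agent, no CLI, no API key.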

It also means the AI layer is entirely optional. If you don't have an Anthropic API key, you use the CLI and bypass the agent. The MCP server also operates independently of the agent.

The 9 Available Tools

Tool	Backend	What it returns
`search_email`	holehe	Social accounts linked to an email
`search_username`	sherlock	Accounts across 300+ platforms
`search_breach`	HaveIBeenPwned v3	Breach names, dates, leaked data types
`search_whois`	python-whois	Registrant, registrar, creation/expiry
`search_ip`	ipinfo.io	Geolocation, ASN, hostname, org
`search_domain`	sublist3r	Subdomain enumeration
`generate_dorks`	built-in	12 targeted Google dork URLs, no network calls
`search_paste`	psbdmp.ws	Pastebin dump mentions
`search_phone`	phoneinfoga	Carrier, country, line type
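`generate_dorks` is the only tool that needs no binary or network access. The 12 dorks it produces aren't listed here, so the templates below are illustrative guesses, not OpenOSINT's actual set; the sketch only shows how a target string can become Google search URLs without sending any request.

```python
from urllib.parse import quote_plus

# Illustrative dork templates; "{t}" is replaced with the target string.
DORK_TEMPLATES = [
    '"{t}"',
    '"{t}" site:pastebin.com',
    '"{t}" site:github.com',
    '"{t}" filetype:pdf',
    '"{t}" (intext:password OR intext:leak)',
]


def generate_dorks(target: str) -> str:
    """Return one Google search URL per line. Makes no network calls."""
    return "\n".join(
        "https://www.google.com/search?q=" + quote_plus(tpl.format(t=target))
        for tpl in DORK_TEMPLATES
    )
```

Returning a newline-joined string keeps it consistent with the string-in, string-out shape of the other core tools.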

Conclusion

In this tutorial, you learned how to set up and use OpenOSINT — a Python OSINT framework built on Claude's tool use API.

The key takeaway is the design principle: by using native tool use, the agent never generates tool output. It only reads real output from real binaries. This makes it suitable for security research where accuracy matters and hallucination isn't an acceptable failure mode.

To recap the three interfaces:

Run openosint for the interactive AI REPL, best for full investigations with automatic chaining
Run openosint email or openosint username for direct CLI access, best for scripting and automation
Register the MCP server in Claude Code or Claude Desktop to run investigations inside your existing AI environment

The full source code is available on GitHub under the MIT license. Contributions and issues are welcome.

Legal note: OpenOSINT is for authorized security research, penetration testing, and investigative journalism only. Users are solely responsible for compliance with applicable law, including GDPR, CCPA, and the CFAA. See the DISCLAIMER.md for the full notice.
