feat: add Hudson Rock infostealer-corpus integration (v2.20.0) by abdullahbilal64 · Pull Request #9 · OpenOSINT/OpenOSINT

abdullahbilal64 · Jun 5, 2026

Summary

Adds search_hudsonrock, a new OSINT tool that queries Hudson Rock's Cavalier API for credentials exposed via infostealer malware (RedLine, Lumma, Raccoon, Vidar, StealC, …). Auto-routes by input shape — emails → /search-by-email, domains → /search-by-domain, usernames and E.164 phone numbers → /search-by-username — and works without an API key against the free public endpoint (50 req / 10 s rate limit). HUDSONROCK_API_KEY, if set, is sent as Authorization: Bearer … for commercial-tier access. Closes #4.
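The shape-based routing described above can be sketched as follows. This is a minimal illustration, not the PR's `_classify()`: the two regexes and the function name `classify` are assumptions, and only the three endpoint paths come from the summary.

```python
import re

# Hypothetical patterns; the PR's actual regexes are not shown in full.
_EMAIL_RE = re.compile(r"^[\w.+-]+@[\w.-]+\.[a-z]{2,}$", re.IGNORECASE)
_DOMAIN_RE = re.compile(r"^(?:[a-z0-9-]+\.)+[a-z]{2,}$", re.IGNORECASE)


def classify(query: str) -> str:
    """Return the endpoint path suggested by the shape of the query."""
    q = query.strip()
    if _EMAIL_RE.match(q):
        return "/search-by-email"
    if _DOMAIN_RE.match(q):
        return "/search-by-domain"
    # Everything else, including E.164 phone numbers, is treated as a username.
    return "/search-by-username"


for sample in ("user@example.com", "example.com", "johndoe", "+14155552671"):
    print(sample, "->", classify(sample))
```

One limitation of this sketch: a dotted username such as `john.doe` has the same shape as a domain and would be routed to `/search-by-domain`.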

Why

Infostealer-corpus checks fill a coverage gap that search_breach (HaveIBeenPwned) doesn't address: HIBP indexes credentials that have been published as breaches, but a substantial fraction of compromised credentials surface only in malware botnet logs that are sold privately. For email and domain investigations this materially increases recall, and the domain mode returns useful aggregate signals (compromised-employee count, top stealer families, victim-AV breakdown) for assessing organisational exposure.

What's in the diff

New tool module

openosint/tools/search_hudsonrock.py — async run_hudsonrock_osint(query, timeout_seconds); _classify() selects the endpoint from input shape; separate formatters for the domain-aggregate response and per-record (email/username) responses; output redacts top-logins and masks victim IPs as returned by the API. Follows the project's tool-contract convention: never raises across the API boundary, returns descriptive error strings on failure.
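The "never raises, returns descriptive error strings" contract could look roughly like the sketch below. It is an illustration under assumptions: `run_lookup`, the injected `fetch` callable, the message wording, and the `stealers` response key are all hypothetical stand-ins, not the code in the diff.

```python
import asyncio
from typing import Any, Awaitable, Callable

# Hypothetical fetcher type: takes a query, returns the decoded JSON body.
Fetcher = Callable[[str], Awaitable[dict[str, Any]]]


async def run_lookup(query: str, fetch: Fetcher, timeout_seconds: float = 10.0) -> str:
    """Run a lookup and always return a string, even on failure."""
    q = query.strip()
    if not q:
        return "Error: empty query."
    try:
        data = await asyncio.wait_for(fetch(q), timeout=timeout_seconds)
    except asyncio.TimeoutError:
        return f"Error: lookup timed out after {timeout_seconds} seconds."
    except Exception as exc:  # broad on purpose: nothing may cross the boundary
        return f"Error: lookup failed ({type(exc).__name__}: {exc})."
    if not data:
        return f"No infostealer records found for {q}."
    # "stealers" is an assumed key name for the per-record list.
    return f"Found {len(data.get('stealers', []))} record(s) for {q}."
```

Injecting the fetcher keeps the sketch free of network code and makes the error paths easy to exercise in unit tests.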

Interface registration (per the integration checklist in CONTRIBUTING.md)

openosint/agent.py — Anthropic tool definition + dispatch entry in _TOOL_MAP; SYSTEM_PROMPT now suggests search_hudsonrock alongside search_breach for credential-exposure investigations.
openosint/mcp_server.py — Tool(...) entry in list_tools() and dispatch branch; module docstring updated to reflect 17 tools.
openosint/cli.py — openosint hudsonrock QUERY [-t SECONDS] subcommand.
openosint/repl.py — display row in _TOOL_INFO_ROWS.
openosint/web_server.py — _TOOL_CATALOG entry (Identity category) + _RUNNERS mapping; tool surfaces in the web UI sidebar and the AI chat tool-use path.

Version + docs

Version bumped to 2.20.0 across openosint/__init__.py, pyproject.toml, and .mcp/server.json (both the top-level version and the package entry).
README footer was at 2.19.0 and is now in sync with the rest. _VERSION in web_server.py was already at 2.20.0.
.env.example: new HUDSONROCK_API_KEY entry with a comment explaining the public-vs-commercial-tier behavior.
README.md: feature line (16 tools → 17), env-var table row, Integrations table row pointing at hudsonrock.com.
CHANGELOG.md: [2.20.0] entry under Added / Changed.
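The public-vs-commercial-tier behavior documented for `HUDSONROCK_API_KEY` amounts to adding an `Authorization` header only when the variable is set. A minimal sketch, assuming a hypothetical helper named `build_headers` and an `Accept` header that the PR does not mention:

```python
import os
from typing import Mapping, Optional


def build_headers(env: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Build request headers, adding Bearer auth only if a key is configured."""
    env = os.environ if env is None else env
    headers = {"Accept": "application/json"}  # assumed default header
    api_key = env.get("HUDSONROCK_API_KEY", "").strip()
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return headers


print(build_headers({}))                               # public endpoint: no auth header
print(build_headers({"HUDSONROCK_API_KEY": "abc123"}))  # commercial tier: Bearer auth
```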

Drive-by cleanups inside the new file (flagging transparently — happy to split into a separate commit/PR if maintainers prefer)

_EMAIL_RE had [\w-]+ in the domain class, rejecting any multi-level domain (user@mail.example.com failed _is_valid_email()). Changed to [\w.-]+; the TLD [a-z]{2,} anchor and re.IGNORECASE flag are preserved. Note: the same flaw exists at openosint/web_server.py:757 inside _demo_chat_stream — left untouched here to keep this PR scoped to the integration; can be a follow-up.
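The effect of the character-class change can be reproduced in isolation. Only the domain class, the TLD anchor, and the `re.IGNORECASE` flag are quoted in the PR, so the local-part pattern below is an assumption made to get two complete, comparable regexes.

```python
import re

# Hypothetical reconstructions: identical except for the domain character class.
OLD_EMAIL_RE = re.compile(r"^[\w.+-]+@[\w-]+\.[a-z]{2,}$", re.IGNORECASE)
NEW_EMAIL_RE = re.compile(r"^[\w.+-]+@[\w.-]+\.[a-z]{2,}$", re.IGNORECASE)

for address in ("user@example.com", "user@mail.example.com"):
    print(address, bool(OLD_EMAIL_RE.match(address)), bool(NEW_EMAIL_RE.match(address)))
# user@example.com True True
# user@mail.example.com False True
```

With `[\w-]+` the domain part cannot contain a dot, so after `mail` and one literal `.` the remaining `example.com` has to satisfy `[a-z]{2,}$` and fails. Allowing `.` inside the class lets the engine consume the intermediate labels.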
_fetch_hudsonrock previously called _raise_for_status(resp.status) and then re-checked resp.status == 404 to return {}. The two checks were redundant: _raise_for_status silently returned on 404, then the caller detected 404 again. Collapsed into a single explicit 404 short-circuit in the caller; _raise_for_status now only raises, matching its name.

Test plan

# unit tests for the new tool — 25 tests
pytest tests/test_hudsonrock.py -v

# full suite, confirming no regression
pytest
# → 233 passed, 2 skipped (the 2 skipped depend on optional binaries)

# lint + format
ruff check openosint/      # → All checks passed!
ruff format --check openosint/   # → 31 files already formatted

# manual smoke against each interface
openosint hudsonrock user@example.com         # CLI, email shape
openosint hudsonrock example.com              # CLI, domain shape
openosint hudsonrock johndoe                  # CLI, username shape
openosint hudsonrock +14155552671             # CLI, phone-as-username shape
openosint                                     # then in REPL: "check infostealer exposure for example.com"
openosint web                                 # exercise via the web UI tool card and AI chat
python openosint/mcp_server.py                # invoke search_hudsonrock from an MCP client (Claude Desktop / Claude Code)

Adds search_hudsonrock, a new OSINT tool that queries Hudson Rock's Cavalier API for credentials exposed via infostealer malware (RedLine, Lumma, Raccoon, Vidar, StealC, ...). Auto-routes by input shape: emails → /search-by-email, domains → /search-by-domain, usernames and E.164 phone numbers → /search-by-username. No API key required; the optional HUDSONROCK_API_KEY is sent as Bearer auth for commercial-tier access. Registers the tool across all six interface layers (agent loop, MCP server, CLI subcommand, REPL display, web UI catalog) per the integration checklist in CONTRIBUTING.md. Bumps version to 2.20.0 across openosint/__init__.py, pyproject.toml, and .mcp/server.json. Adds .env.example entry, README env + Integrations table rows, and a CHANGELOG entry. Closes OpenOSINT#4.

SonoTommy · Jun 6, 2026

First off — this is a genuinely clean PR. Full integration checklist across all six layers, consistent version bump, 25 tests, and you flagged the _EMAIL_RE duplicate transparently instead of silently fixing it. Appreciated.
Two small changes before I merge:

1. The (redacted) label on top_logins: in _format_stealers the records print under Top logins (redacted):, but the formatter passes through whatever the API returns, and the test fixture has full addresses (user@example.com, admin@example.com). The label asserts a redaction the code doesn't actually perform. Either mask them in the formatter (e.g. first char of local-part + domain) or relabel to something accurate like Top logins (as returned by API):. For a tool we ship "for authorized security research only," I'd rather the wording match the behavior exactly.
2. CHANGELOG date: the [2.20.0] entry is dated 2026-06-06, but the commit and the README footer say June 5. Just align them.
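The masking option mentioned in the first item (first character of the local part plus the domain) might look like the sketch below. The function name `mask_login` and the `***` filler are hypothetical, and this is one possible reading of the suggestion rather than code from the PR.

```python
def mask_login(login: str) -> str:
    """Keep the first character of the local part and the full domain."""
    local, sep, domain = login.partition("@")
    if not sep:
        # Not email-shaped (e.g. a bare username): keep only the first character.
        return f"{login[:1]}***" if login else ""
    return f"{local[:1]}***@{domain}"


for login in ("user@example.com", "admin@example.com", "johndoe"):
    print(mask_login(login))
# u***@example.com
# a***@example.com
# j***
```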

On the web_server.py:757 duplicate of the same regex flaw — good call keeping it out of scope. I'll open a follow-up issue to track it (or happy to take a second PR if you want to grab it).
Everything else looks good to merge once those two land. Nice work.

- Rename "Top logins (redacted)" → "Top logins (as returned by API)". The formatter does no redaction itself; Hudson Rock's free tier already partial-masks server-side, but the test fixture and any future paid-tier response could carry unredacted logins under the old label. Honest label matches what the code does.
- Align [2.20.0] CHANGELOG date to 2026-06-05 to match the commit log and README footer.

Addresses review feedback on the PR.

abdullahbilal64 · Jun 7, 2026

Thanks for the quick and careful review. I've pushed both fixes in the latest commit. The redacted label was a sharp catch: it looked right end-to-end because the live API kept returning pre-masked strings during my testing, so the label happened to coincide with the behavior. I went with the relabeling option, so the output now tells the user that we pass along whatever the API returns. I've also fixed the CHANGELOG date to match the actual commit date.

By the way, I'm happy to take the lead on the follow-up PR.

abdullahbilal64 wants to merge 2 commits into OpenOSINT:main from abdullahbilal64:feat/hudsonrock-integration
