feat: add Hudson Rock infostealer-corpus integration (v2.20.0) #9

Open
abdullahbilal64 wants to merge 2 commits into OpenOSINT:main from abdullahbilal64:feat/hudsonrock-integration

@abdullahbilal64

Summary

Adds search_hudsonrock, a new OSINT tool that queries Hudson Rock's Cavalier API for credentials exposed via infostealer malware (RedLine, Lumma, Raccoon, Vidar, StealC, …). Auto-routes by input shape — emails → /search-by-email, domains → /search-by-domain, usernames and E.164 phone numbers → /search-by-username — and works without an API key against the free public endpoint (50 req / 10 s rate limit). HUDSONROCK_API_KEY, if set, is sent as Authorization: Bearer … for commercial-tier access. Closes #4.
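The input-shape routing described in the summary could be sketched roughly as follows. This is an illustration only, not the PR's actual `_classify()`: the three regexes, the `classify` helper, and the `ENDPOINTS` mapping are assumptions made for this sketch. Only the three endpoint paths are taken from the summary.

```python
import re

# Hypothetical patterns; the PR's real _classify() may use different rules.
_EMAIL_RE = re.compile(r"^[\w.+-]+@[\w.-]+\.[a-z]{2,}$", re.IGNORECASE)
_DOMAIN_RE = re.compile(r"^(?:[a-z0-9-]+\.)+[a-z]{2,}$", re.IGNORECASE)
_E164_RE = re.compile(r"^\+[1-9]\d{6,14}$")

# Endpoint paths as named in the PR summary.
ENDPOINTS = {
    "email": "/search-by-email",
    "domain": "/search-by-domain",
    "username": "/search-by-username",
}


def classify(query: str) -> str:
    """Return 'email', 'domain', or 'username' based on the input shape."""
    q = query.strip()
    if _EMAIL_RE.match(q):
        return "email"
    if _E164_RE.match(q):
        # E.164 phone numbers are looked up through the username endpoint.
        return "username"
    if _DOMAIN_RE.match(q):
        return "domain"
    # Anything else is treated as a plain username.
    return "username"
```

With these assumed patterns, a username that contains a dot (for example `john.doe`) would be classified as a domain, so a real implementation would need a rule for that ambiguity.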

Why

Infostealer-corpus checks fill a coverage gap that search_breach (HaveIBeenPwned) doesn't address: HIBP indexes credentials that have been published as breaches, but a substantial fraction of compromised credentials surface only in malware botnet logs that are sold privately. For email and domain investigations this materially increases recall, and the domain mode returns useful aggregate signals (compromised-employee count, top stealer families, victim-AV breakdown) for assessing organisational exposure.

What's in the diff

New tool module

  • openosint/tools/search_hudsonrock.py — async run_hudsonrock_osint(query, timeout_seconds); _classify() selects the endpoint from input shape; separate formatters for the domain-aggregate response and per-record (email/username) responses; output redacts top-logins and masks victim IPs as returned by the API. Follows the project's tool-contract convention: never raises across the API boundary, returns descriptive error strings on failure.
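The "never raises across the API boundary" convention can be illustrated with a minimal sketch. The wrapper name `run_tool_safely`, the injected `fetch` coroutine, and the error-string wording are all hypothetical; the PR's `run_hudsonrock_osint` may be structured differently.

```python
import asyncio


async def run_tool_safely(fetch, query: str, timeout_seconds: float = 10.0) -> str:
    """Await fetch(query) and always return a string instead of raising."""
    try:
        result = await asyncio.wait_for(fetch(query), timeout=timeout_seconds)
    except asyncio.TimeoutError:
        return f"Error: request for {query!r} timed out after {timeout_seconds}s"
    except Exception as exc:  # deliberate catch-all at the tool boundary
        return f"Error: lookup for {query!r} failed: {exc}"
    return str(result)
```

Callers such as an agent loop or an MCP dispatcher can then treat every tool result as displayable text, at the cost of having to inspect the string to tell a failure from a finding.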

Interface registration (per the integration checklist in CONTRIBUTING.md)

  • openosint/agent.py — Anthropic tool definition + dispatch entry in _TOOL_MAP; SYSTEM_PROMPT now suggests search_hudsonrock alongside search_breach for credential-exposure investigations.
  • openosint/mcp_server.py — Tool(...) entry in list_tools() and dispatch branch; module docstring updated to reflect 17 tools.
  • openosint/cli.py — openosint hudsonrock QUERY [-t SECONDS] subcommand.
  • openosint/repl.py — display row in _TOOL_INFO_ROWS.
  • openosint/web_server.py — _TOOL_CATALOG entry (Identity category) + _RUNNERS mapping; tool surfaces in the web UI sidebar and the AI chat tool-use path.

Version + docs

  • Version bumped to 2.20.0 across openosint/__init__.py, pyproject.toml, and .mcp/server.json (both the top-level version and the package entry).
  • README footer was at 2.19.0 and is now in sync with the rest. _VERSION in web_server.py was already at 2.20.0.
  • .env.example: new HUDSONROCK_API_KEY entry with a comment explaining the public-vs-commercial-tier behavior.
  • README.md: feature line (16 tools → 17), env-var table row, Integrations table row pointing at hudsonrock.com.
  • CHANGELOG.md: [2.20.0] entry under Added / Changed.
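The public-vs-commercial-tier behavior tied to HUDSONROCK_API_KEY might look like the sketch below. The `build_headers` helper and the `Accept` header are assumptions for illustration; only the environment variable name and the `Authorization: Bearer …` form come from the PR description.

```python
import os
from typing import Mapping, Optional


def build_headers(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build request headers, adding Bearer auth only when a key is configured."""
    env = os.environ if env is None else env
    headers = {"Accept": "application/json"}  # assumed default, not from the PR
    api_key = env.get("HUDSONROCK_API_KEY", "").strip()
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return headers
```

Leaving the variable unset or empty produces no `Authorization` header, which corresponds to the free public endpoint described in the summary.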

Drive-by cleanups inside the new file (flagging transparently — happy to split into a separate commit/PR if maintainers prefer)

  • _EMAIL_RE had [\w-]+ in the domain class, rejecting any multi-level domain (user@mail.example.com failed _is_valid_email()). Changed to [\w.-]+; the TLD [a-z]{2,} anchor and re.IGNORECASE flag are preserved. Note: the same flaw exists at openosint/web_server.py:757 inside _demo_chat_stream — left untouched here to keep this PR scoped to the integration; can be a follow-up.
  • _fetch_hudsonrock previously called _raise_for_status(resp.status) and then re-checked resp.status == 404 to return {}. The two checks were redundant: _raise_for_status silently returned on 404, then the caller detected 404 again. Collapsed into a single explicit 404 short-circuit in the caller; _raise_for_status now only raises, matching its name.
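Both cleanups can be reproduced in isolation. In the sketch below, only the domain character class (`[\w-]+` versus `[\w.-]+`), the `[a-z]{2,}` TLD anchor, and `re.IGNORECASE` come from the PR text. The local-part pattern, the `HTTPStatusError` class, and the `handle_response` helper are hypothetical stand-ins for the project's own code.

```python
import re

# Old vs. new domain class; the local-part pattern is an assumption.
OLD_EMAIL_RE = re.compile(r"^[\w.+-]+@[\w-]+\.[a-z]{2,}$", re.IGNORECASE)
NEW_EMAIL_RE = re.compile(r"^[\w.+-]+@[\w.-]+\.[a-z]{2,}$", re.IGNORECASE)


class HTTPStatusError(Exception):
    """Hypothetical error type raised for HTTP error statuses."""


def raise_for_status(status: int) -> None:
    # Only raises; it no longer silently returns on 404.
    if status >= 400:
        raise HTTPStatusError(f"HTTP {status}")


def handle_response(status: int, payload: dict) -> dict:
    # Single explicit short-circuit: 404 is returned as an empty result.
    if status == 404:
        return {}
    raise_for_status(status)
    return payload


multi_level = "user@mail.example.com"
print(bool(OLD_EMAIL_RE.match(multi_level)))  # old pattern rejects it
print(bool(NEW_EMAIL_RE.match(multi_level)))  # new pattern accepts it
```

The old pattern fails because `[\w-]+` cannot consume the dot inside `mail.example`, so the match stops before `.com` and the end anchor is never reached.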

Test plan

# unit tests for the new tool — 25 tests
pytest tests/test_hudsonrock.py -v

# full suite, confirming no regression
pytest
# → 233 passed, 2 skipped (the 2 skipped depend on optional binaries)

# lint + format
ruff check openosint/      # → All checks passed!
ruff format --check openosint/   # → 31 files already formatted

# manual smoke against each interface
openosint hudsonrock user@example.com         # CLI, email shape
openosint hudsonrock example.com              # CLI, domain shape
openosint hudsonrock johndoe                  # CLI, username shape
openosint hudsonrock +14155552671             # CLI, phone-as-username shape
openosint                                     # then in REPL: "check infostealer exposure for example.com"
openosint web                                 # exercise via the web UI tool card and AI chat
python openosint/mcp_server.py                # invoke search_hudsonrock from an MCP client (Claude Desktop / Claude Code)

Adds search_hudsonrock, a new OSINT tool that queries Hudson Rock's
Cavalier API for credentials exposed via infostealer malware (RedLine,
Lumma, Raccoon, Vidar, StealC, ...). Auto-routes by input shape:
emails → /search-by-email, domains → /search-by-domain, usernames and
E.164 phone numbers → /search-by-username. No API key required; the
optional HUDSONROCK_API_KEY is sent as Bearer auth for commercial-tier
access.

Registers the tool across all six interface layers (agent loop, MCP
server, CLI subcommand, REPL display, web UI catalog) per the
integration checklist in CONTRIBUTING.md. Bumps version to 2.20.0
across openosint/__init__.py, pyproject.toml, and .mcp/server.json.
Adds .env.example entry, README env + Integrations table rows, and a
CHANGELOG entry. Closes OpenOSINT#4.
@SonoTommy
Member

First off — this is a genuinely clean PR. Full integration checklist across all six layers, consistent version bump, 25 tests, and you flagged the _EMAIL_RE duplicate transparently instead of silently fixing it. Appreciated.
Two small changes before I merge:

  • The (redacted) label on top_logins: in _format_stealers the records print under Top logins (redacted):, but the formatter passes through whatever the API returns, and the test fixture has full addresses (user@example.com, admin@example.com). The label asserts a redaction the code doesn't actually perform. Either mask them in the formatter (e.g. first char of local-part + domain) or relabel to something accurate like Top logins (as returned by API):. For a tool we ship "for authorized security research only," I'd rather the wording match the behavior exactly.
  • CHANGELOG date: the [2.20.0] entry is dated 2026-06-06, but the commit and the README footer say June 5. Just align them.

On the web_server.py:757 duplicate of the same regex flaw — good call keeping it out of scope. I'll open a follow-up issue to track it (or happy to take a second PR if you want to grab it).
Everything else looks good to merge once those two land. Nice work.
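For reference, the masking option mentioned in the review (first character of the local part plus the domain) could be sketched as below. The `mask_login` helper is hypothetical and is not part of this PR, which relabels the output rather than masking it.

```python
def mask_login(login: str) -> str:
    """Mask a login, keeping the first local-part character and the domain."""
    local, sep, domain = login.partition("@")
    if not sep or not local:
        # Not email-shaped: keep only the first character, if there is one.
        return login[:1] + "***" if login else login
    return f"{local[0]}***@{domain}"


print(mask_login("user@example.com"))   # u***@example.com
print(mask_login("admin@example.com"))  # a***@example.com
```

Masking in the formatter would make a "(redacted)" label true regardless of which API tier produced the response, whereas relabeling keeps the formatter a pure pass-through.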

- Rename "Top logins (redacted)" → "Top logins (as returned by API)".
  The formatter does no redaction itself; Hudson Rock's free tier already
  partial-masks server-side, but the test fixture and any future paid-tier
  response could carry unredacted logins under the old label. Honest label
  matches what the code does.
- Align [2.20.0] CHANGELOG date to 2026-06-05 to match the commit log and
  README footer.

Addresses review feedback on the PR.
@abdullahbilal64
Author

Thanks for the quick and careful review. I've pushed both fixes in the latest commit. The (redacted) label was a genuinely sharp catch: it looked right end-to-end because the live API kept returning pre-masked strings during my testing, so the label happened to coincide with the behavior. I went with the relabeling option, so the output now states that we pass along whatever the API returns. I've also fixed the CHANGELOG date to match the actual commit date.

By the way, I'm happy to take the lead on the follow-up PR.

Development

Successfully merging this pull request may close these issues.

Hudson Rock Infostealer Intelligence Integration

2 participants