Scanning Pipeline

Deep dive into each phase of the Surfbot scanning pipeline.

Pipeline Overview

Every Surfbot scan executes a sequential pipeline of five phases. Each phase builds on the results of the previous one, creating a comprehensive picture of your external attack surface.

Scans are on-demand: you trigger them when you need them. Scheduled and continuous scanning are on the roadmap.

Phase 1: Discovery

Tool: subfinder → dnsx

Goal: Map the full extent of your external footprint.

The discovery phase uses multiple data sources and techniques in parallel:

Technique | Description | Typical Yield
Passive DNS | Historical records from SecurityTrails, VirusTotal, etc. | High
Certificate Transparency | Subdomains from CT log entries | High
DNS Brute-force | Dictionary of ~100k common subdomain names | Medium
Permutation | Generates variations of discovered names | Medium
Web crawling | Extracts linked domains/subdomains from responses | Low–Medium

Output: A deduplicated list of subdomains and their resolved IP addresses.

{
  "asset": "api.yourcompany.com",
  "type": "subdomain",
  "ips": ["93.184.216.34"],
  "source": ["ct_logs", "dns_bruteforce"],
  "first_seen": "2025-01-15T08:30:00Z"
}
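
The deduplication step can be sketched as follows. This is a minimal illustration, not Surfbot's actual implementation: the `merge_discoveries` helper and the per-source input shape (one hit per source, with a single `source` string) are assumptions made for the example. Only the output field names come from the record above.

```python
from collections import defaultdict

def merge_discoveries(records):
    """Merge per-source discovery hits into one record per hostname."""
    merged = defaultdict(lambda: {"ips": set(), "source": set()})
    for rec in records:
        # Hostnames are case-insensitive; a trailing dot is the DNS root label.
        name = rec["asset"].strip().lower().rstrip(".")
        merged[name]["ips"].update(rec.get("ips", []))
        merged[name]["source"].add(rec["source"])
    return [
        {
            "asset": name,
            "type": "subdomain",
            "ips": sorted(data["ips"]),
            "source": sorted(data["source"]),
        }
        for name, data in sorted(merged.items())
    ]

# Hypothetical input: the same host reported by two techniques.
hits = [
    {"asset": "api.yourcompany.com", "ips": ["93.184.216.34"], "source": "ct_logs"},
    {"asset": "API.yourcompany.com.", "ips": ["93.184.216.34"], "source": "dns_bruteforce"},
]
print(merge_discoveries(hits))
```

Normalizing case and the trailing dot before merging keeps the same host from appearing twice when different sources format it differently.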

Phase 2: Port Scanning

Tool: naabu

Goal: Identify open ports and running services on every discovered host.

We use a SYN-based scanner for speed, followed by service fingerprinting on open ports.

For each open port, we capture:

  • Port number and protocol
  • Service name and version (via banner grabbing)
  • TLS certificate details (if applicable)

{
  "host": "api.yourcompany.com",
  "port": 443,
  "protocol": "tcp",
  "service": "nginx/1.24.0",
  "tls": {
    "subject": "*.yourcompany.com",
    "issuer": "Let's Encrypt",
    "expires": "2025-04-15"
  }
}
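
If you consume these records downstream, the `service` string usually needs splitting into a product name and a version. The sketch below assumes the `name/version` shape shown in the record above; `split_service` is a hypothetical helper, and real banners may not always follow this shape.

```python
def split_service(service):
    """Split a 'name/version' service string; version is None when absent."""
    name, _, version = service.partition("/")
    return name.strip(), (version.strip() or None)

record = {
    "host": "api.yourcompany.com",
    "port": 443,
    "protocol": "tcp",
    "service": "nginx/1.24.0",
}

name, version = split_service(record["service"])
print(f"{record['host']}:{record['port']}/{record['protocol']} runs {name} {version}")
```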

Phase 3: HTTP Probing

Tool: httpx

Goal: Fingerprint web applications running on HTTP/HTTPS ports.

Every open port serving HTTP(S) is probed for:

  • Response metadata: status code, headers, redirect chains
  • Technology stack: framework, CMS, CDN, WAF detection (using Wappalyzer signatures)
  • Content analysis: page title, favicon hash, body content hash

This phase often reveals forgotten staging environments, admin panels, and shadow IT that organizations didn't know existed.
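
To make the content-analysis step concrete, here is a rough sketch of extracting a page title and a body hash from a response. The `summarize_response` function is hypothetical, and SHA-256 is an assumption: this page does not say which hash algorithm Surfbot or httpx uses.

```python
import hashlib
import re

TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)

def summarize_response(status, headers, body):
    """Reduce one HTTP response (body as bytes) to a few comparable fields."""
    # Header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    text = body.decode("utf-8", errors="replace")
    match = TITLE_RE.search(text)
    # Collapse internal whitespace so titles compare cleanly.
    title = " ".join(match.group(1).split()) if match else None
    return {
        "status": status,
        "server": lowered.get("server"),
        "title": title,
        "body_sha256": hashlib.sha256(body).hexdigest(),
    }

body = b"<html><head><title> Staging\n Admin </title></head></html>"
print(summarize_response(200, {"Server": "nginx/1.24.0"}, body))
```

Hashing the body gives a cheap way to spot identical pages served from different hosts, for example a forgotten staging copy of a production site.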

Phase 4: Vulnerability Assessment

Tool: Nuclei (8,000+ official templates + 19 custom Surfbot templates)

Goal: Identify known vulnerabilities, misconfigurations, and exposures.

The templates executed depend on the scan profile you selected:

Scan Profiles

Profile | Tags/Severity Included | Use Case
Passive | Tech detection, SSL/TLS, DNS, info-severity | Safe recon, no intrusive checks
Standard | Misconfigs, exposures, CVEs, secrets (excludes DoS, intrusive) | Balanced assessment for most domains
Deep | Everything except denial-of-service templates | Comprehensive scan for domains you fully control
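
One way to read the profile table is as a filter over template tags and severity. The sketch below is an interpretation, not Surfbot's real selection logic: the tag names (`tech`, `ssl`, `dns`, `misconfig`, `exposure`, `cve`, `secrets`, `dos`, `intrusive`) and the `template_allowed` function are assumptions chosen to mirror the table.

```python
PASSIVE_TAGS = {"tech", "ssl", "dns"}
STANDARD_TAGS = {"misconfig", "exposure", "cve", "secrets"}
UNSAFE_TAGS = {"dos", "intrusive"}

def template_allowed(profile, tags, severity):
    """Return True if a template with these tags/severity runs under the profile."""
    tags = set(tags)
    if profile == "passive":
        # Safe recon only: never run anything tagged as intrusive or DoS.
        if tags & UNSAFE_TAGS:
            return False
        return severity == "info" or bool(tags & PASSIVE_TAGS)
    if profile == "standard":
        if tags & UNSAFE_TAGS:
            return False
        return bool(tags & STANDARD_TAGS)
    if profile == "deep":
        # Everything except denial-of-service templates.
        return "dos" not in tags
    raise ValueError(f"unknown profile: {profile}")

print(template_allowed("standard", ["cve"], "critical"))
print(template_allowed("standard", ["cve", "intrusive"], "critical"))
```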

What Gets Checked

CVE Detection: Matches service versions against known CVE databases. For example, if Phase 2 identified Apache/2.4.49, the scanner flags CVE-2021-41773 (path traversal).
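
The version-matching idea can be illustrated with a toy lookup table. This is a simplified sketch: the `KNOWN_VULNERABLE` table and `cves_for_service` function are hypothetical, the table holds only the example from the paragraph above, and real templates typically confirm a finding by probing the target rather than trusting the version string alone.

```python
# Toy table keyed by (product, version); only the example from the text above.
KNOWN_VULNERABLE = {
    ("apache", "2.4.49"): ["CVE-2021-41773"],
}

def cves_for_service(service):
    """Look up CVE IDs for a 'name/version' service string from Phase 2."""
    name, _, version = service.partition("/")
    return KNOWN_VULNERABLE.get((name.strip().lower(), version.strip()), [])

print(cves_for_service("Apache/2.4.49"))
print(cves_for_service("Apache/2.4.58"))
```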

Misconfiguration Checks:

  • Open directory listings
  • Default credentials on admin panels
  • Exposed .env, .git/config, phpinfo() files
  • Missing security headers (HSTS, CSP, X-Frame-Options)
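
As an illustration of the last check in the list, the sketch below reports which of the three named headers are absent from a response. The `missing_security_headers` helper is hypothetical, and it only tests for presence, not for whether the header's value is a sensible policy.

```python
REQUIRED_HEADERS = {
    "strict-transport-security": "HSTS",
    "content-security-policy": "CSP",
    "x-frame-options": "X-Frame-Options",
}

def missing_security_headers(headers):
    """Return the labels of required security headers absent from `headers`."""
    # Header names are case-insensitive, so compare in lower case.
    present = {name.lower() for name in headers}
    return [label for name, label in REQUIRED_HEADERS.items() if name not in present]

response_headers = {
    "Content-Type": "text/html",
    "Strict-Transport-Security": "max-age=31536000",
}
print(missing_security_headers(response_headers))
```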

Secret Exposure: Scans response bodies and JavaScript files for:

  • API keys (AWS, GCP, Stripe, etc.)
  • Hardcoded tokens and passwords
  • Private keys and certificates
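
Secret detection of this kind is commonly pattern-based. The sketch below shows two widely used patterns, one for AWS access key IDs and one for PEM private key headers. The pattern set and the `find_secrets` function are illustrative assumptions, not the rules Surfbot ships, and the sample key is the placeholder value from AWS documentation.

```python
import re

SECRET_PATTERNS = {
    # AWS access key IDs: "AKIA" followed by 16 upper-case letters or digits.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # PEM header for an unencrypted private key.
    "private_key": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
}

def find_secrets(text):
    """Return (kind, offset) pairs for every pattern match in `text`."""
    findings = []
    for kind, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append((kind, match.start()))
    return findings

js = 'const key = "AKIAIOSFODNN7EXAMPLE";'
print(find_secrets(js))
```

Reporting the offset rather than the matched value avoids copying the secret itself into scan results.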

SSL/TLS Analysis:

  • Expired or self-signed certificates
  • Weak cipher suites
  • Legacy protocol support and downgrade vulnerabilities (POODLE, DROWN)
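
The certificate checks can be approximated from the `tls` fields captured in Phase 2. This is a rough sketch under stated assumptions: `certificate_issues` is a hypothetical helper, and comparing subject with issuer is only a heuristic for self-signed certificates; a real check verifies the signature chain.

```python
from datetime import date

def certificate_issues(tls, today):
    """List basic problems with a certificate summary as of `today`."""
    issues = []
    if date.fromisoformat(tls["expires"]) < today:
        issues.append("expired")
    # Heuristic only: a self-signed certificate names itself as issuer.
    if tls["subject"] == tls["issuer"]:
        issues.append("self-signed")
    return issues

tls = {
    "subject": "*.yourcompany.com",
    "issuer": "Let's Encrypt",
    "expires": "2025-04-15",
}
print(certificate_issues(tls, date(2025, 5, 1)))
```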

Phase 5: Differential Analysis

Goal: Identify what changed since the last scan.

This is covered in detail in Differential Scanning. In short, every finding is compared against the previous scan to produce a clear delta:

  • New assets, ports, or vulnerabilities
  • Changed service versions, certificates, or configurations
  • Resolved findings that are no longer present
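
The three categories above map naturally onto set operations. The sketch below assumes each scan is a dictionary from a stable finding key to its details; the key format and the `diff_findings` function are hypothetical choices for illustration.

```python
def diff_findings(previous, current):
    """Compare two scans keyed by a stable finding ID."""
    prev_keys, curr_keys = set(previous), set(current)
    return {
        "new": sorted(curr_keys - prev_keys),
        "resolved": sorted(prev_keys - curr_keys),
        # Present in both scans but with different details.
        "changed": sorted(
            key for key in prev_keys & curr_keys if previous[key] != current[key]
        ),
    }

last_scan = {
    "api.yourcompany.com:443/tcp": "nginx/1.22.1",
    "old.yourcompany.com:80/tcp": "Apache/2.4.49",
}
this_scan = {
    "api.yourcompany.com:443/tcp": "nginx/1.24.0",
    "staging.yourcompany.com:8080/tcp": "nginx/1.24.0",
}
print(diff_findings(last_scan, this_scan))
```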

Timing

Domain Size | Typical Duration
Small (< 50 subdomains) | 5–10 minutes
Medium (50–500 subdomains) | 10–30 minutes
Large (500+ subdomains) | 30–90 minutes

Phases run in order, but work within a scan runs in parallel where possible. Results stream in as each phase completes, so you don't have to wait for the full pipeline to finish.
