docs/database.md

Database Documentation

The database package handles all persistent storage for the honeypot system. It uses DuckDB for efficient analytical queries and storage.

Database Engine: DuckDB

DuckDB is used as the primary database engine. It is an in-process SQL OLAP database management system, which makes it ideal for the analytical workloads required by the honeypot dashboard (e.g., aggregating millions of events by port, country, or time).

Schema Structure

The database consists of three main tables: honeypot_events, ips, and blocklist.

Table: honeypot_events

This is the primary table where all honeypot interaction events are recorded.

Column Type Description
id INTEGER Primary key, incremented via id_sequence.
time TIMESTAMPTZ Timestamp of the event.
type TEXT The type of honeypot (e.g., ssh, http, telnet).
event TEXT The specific event name (e.g., login_attempt, command_execution).
remote_addr TEXT The source IP address of the attacker.
remote_ip_int UBIGINT The source IP converted to an integer for high-performance range searches.
remote_port USMALLINT The source port of the connection.
dst_port USMALLINT The destination port (honeypot port) of the connection.
fields JSON A JSON blob containing event-specific metadata (e.g., credentials, headers).

Table: ips

This table stores enriched metadata for IP addresses encountered by the honeypot. It is populated by various background tasks (GeoIP lookup, ASN lookup, reverse DNS).

Column Type Description
ip TEXT The IP address (Primary Key).
ip_int UBIGINT The IP address as an integer.
country TEXT Two-letter ISO country code.
asn INTEGER Autonomous System Number.
asn_org TEXT Organization name associated with the ASN.
city TEXT City name.
latitude DOUBLE Geographical latitude.
longitude DOUBLE Geographical longitude.
fqdn TEXT Fully Qualified Domain Name (Reverse DNS).
domain TEXT Registered domain extracted from FQDN.
last_updated TIMESTAMPTZ When this record was last refreshed.

Data Gathering for IPs Table

The ips table is not populated directly during event insertion. Instead, a background worker process manages its data to avoid slowing down the honeypot's response time and to respect GeoIP rate limits (if any were used):

  1. Background Worker: A ticker runs every minute to identify IP addresses that either:
    • Do not yet exist in the ips table.
    • Exist but have metadata older than 24 hours.
  2. GeoIP and ASN: The system performs lookups using MaxMind GeoLite2 databases (ASN and City) stored locally.
  3. Reverse DNS (FQDN): A standard DNS PTR lookup is performed for each IP to find its Fully Qualified Domain Name.
  4. Domain Extraction: The registered domain (e.g., google.com from mail.google.com) is extracted from the FQDN using the Public Suffix List.
  5. Upsert: The gathered metadata is UPSERTed into the ips table, ensuring that each IP has only one highly-enriched record.

Table: blocklist

Records of IP addresses that have been dynamically blocked by the system.

Column Type Description
id INTEGER Primary key, incremented via blocklist_id_sequence.
address TEXT The blocked IP address or subnet.
timestamp TIMESTAMPTZ When the block was created.
expires TIMESTAMPTZ When the block expires.
reason TEXT Human-readable reason for the block.

For more details on how these entries are generated, see the Scoring and Blocking Rules.

Key Implementation Details

Performance Optimizations

  • IP to Integer: The remote_ip_int column in honeypot_events allows for very fast CIDR/Subnet matching using simple integer range queries (>= and <=).
  • Indexing: Indexes are maintained on type, event, remote_ip_int, dst_port, and time to ensure fast filtering on the dashboard.
  • Analytical Queries: The dashboard.go and stats.go files leverage DuckDB's aggregation performance to generate "Top N" lists across millions of records.

Dynamic JSON Queries

The database.go file implements a flexible querying system that can filter based on keys inside the fields JSON column using DuckDB's JSON extraction functions (e.g., json_extract_string(fields, '$.username')).

Geo Enrichement Joins

Most analytical queries join honeypot_events with the ips table to provide insights like "Top Countries for SSH attacks" or "Events from a specific ASN".

Usage in Code

To interact with the database, use the Database struct defined in internal/database/database.go.

db := database.NewDatabase("path/to/database.db")
defer db.Close()

// Inserting an event
db.InsertEvent(&types.LogEvent{...})

// Querying stats
stats, err := db.GetDashboardStats()