docs/database.md
Database Documentation
The database package handles all persistent storage for the honeypot system. It uses DuckDB for efficient analytical queries and storage.
Database Engine: DuckDB
DuckDB is used as the primary database engine. It is an in-process SQL OLAP database management system, which makes it ideal for the analytical workloads required by the honeypot dashboard (e.g., aggregating millions of events by port, country, or time).
Schema Structure
The database consists of three main tables: honeypot_events, ips, and blocklist.
Table: honeypot_events
This is the primary table where all honeypot interaction events are recorded.
| Column | Type | Description |
|---|---|---|
id |
INTEGER | Primary key, incremented via id_sequence. |
time |
TIMESTAMPTZ | Timestamp of the event. |
type |
TEXT | The type of honeypot (e.g., ssh, http, telnet). |
event |
TEXT | The specific event name (e.g., login_attempt, command_execution). |
remote_addr |
TEXT | The source IP address of the attacker. |
remote_ip_int |
UBIGINT | The source IP converted to an integer for high-performance range searches. |
remote_port |
USMALLINT | The source port of the connection. |
dst_port |
USMALLINT | The destination port (honeypot port) of the connection. |
fields |
JSON | A JSON blob containing event-specific metadata (e.g., credentials, headers). |
Table: ips
This table stores enriched metadata for IP addresses encountered by the honeypot. It is populated by various background tasks (GeoIP lookup, ASN lookup, reverse DNS).
| Column | Type | Description |
|---|---|---|
ip |
TEXT | The IP address (Primary Key). |
ip_int |
UBIGINT | The IP address as an integer. |
country |
TEXT | Two-letter ISO country code. |
asn |
INTEGER | Autonomous System Number. |
asn_org |
TEXT | Organization name associated with the ASN. |
city |
TEXT | City name. |
latitude |
DOUBLE | Geographical latitude. |
longitude |
DOUBLE | Geographical longitude. |
fqdn |
TEXT | Fully Qualified Domain Name (Reverse DNS). |
domain |
TEXT | Registered domain extracted from FQDN. |
last_updated |
TIMESTAMPTZ | When this record was last refreshed. |
Data Gathering for IPs Table
The ips table is not populated directly during event insertion. Instead, a background worker process manages its data to avoid slowing down the honeypot's response time and to respect GeoIP rate limits (if any were used):
- Background Worker: A ticker runs every minute to identify IP addresses that either:
- Do not yet exist in the
ipstable. - Exist but have metadata older than 24 hours.
- Do not yet exist in the
- GeoIP and ASN: The system performs lookups using MaxMind GeoLite2 databases (ASN and City) stored locally.
- Reverse DNS (FQDN): A standard DNS PTR lookup is performed for each IP to find its Fully Qualified Domain Name.
- Domain Extraction: The registered domain (e.g.,
google.comfrommail.google.com) is extracted from the FQDN using the Public Suffix List. - Upsert: The gathered metadata is UPSERTed into the
ipstable, ensuring that each IP has only one highly-enriched record.
Table: blocklist
Records of IP addresses that have been dynamically blocked by the system.
| Column | Type | Description |
|---|---|---|
id |
INTEGER | Primary key, incremented via blocklist_id_sequence. |
address |
TEXT | The blocked IP address or subnet. |
timestamp |
TIMESTAMPTZ | When the block was created. |
expires |
TIMESTAMPTZ | When the block expires. |
reason |
TEXT | Human-readable reason for the block. |
For more details on how these entries are generated, see the Scoring and Blocking Rules.
Key Implementation Details
Performance Optimizations
- IP to Integer: The
remote_ip_intcolumn inhoneypot_eventsallows for very fast CIDR/Subnet matching using simple integer range queries (>=and<=). - Indexing: Indexes are maintained on
type,event,remote_ip_int,dst_port, andtimeto ensure fast filtering on the dashboard. - Analytical Queries: The
dashboard.goandstats.gofiles leverage DuckDB's aggregation performance to generate "Top N" lists across millions of records.
Dynamic JSON Queries
The database.go file implements a flexible querying system that can filter based on keys inside the fields JSON column using DuckDB's JSON extraction functions (e.g., json_extract_string(fields, '$.username')).
Geo Enrichement Joins
Most analytical queries join honeypot_events with the ips table to provide insights like "Top Countries for SSH attacks" or "Events from a specific ASN".
Usage in Code
To interact with the database, use the Database struct defined in internal/database/database.go.
db := database.NewDatabase("path/to/database.db")
defer db.Close()
// Inserting an event
db.InsertEvent(&types.LogEvent{...})
// Querying stats
stats, err := db.GetDashboardStats()