The pod ran for eleven minutes. It spun up, executed a reverse shell, exfiltrated a handful of Kubernetes secrets, and was gone before the on-call engineer even acknowledged the PagerDuty alert. By the time the incident response team opened a terminal, kubectl get pods returned nothing. The node looked clean. The SIEM had a few incomplete events. And someone on the bridge was asking whether they actually had evidence of anything at all.


In brief

  • Pod deletion does not erase all evidence: critical traces remain on node, control plane, and network layers.
  • Kubernetes forensics requires fast action before short retention windows remove logs and metadata.
  • Kubelet, containerd, audit logs, etcd, and CNI telemetry together rebuild attacker activity paths.
  • A reliable workflow depends on preservation order, scope control, and timeline correlation.

What actually disappears when a pod dies

Before cataloguing what survives, it helps to be precise about what is genuinely lost. A container is fundamentally a Linux process running in a set of isolated namespaces (mount, pid, net, ipc, uts, user) with cgroups limiting its resource usage. The container runtime, usually containerd or CRI-O in modern clusters, layers a read-write filesystem on top of the image using overlayfs (exposed as the overlayfs snapshotter in containerd and the overlay2 storage driver in Docker). When the container stops and is removed, the overlay filesystem is unmounted and the upper layer, which contains all writes made during the container’s lifetime, is deleted. The container runtime details are documented in the containerd architecture docs and the CRI-O project docs.

So what is genuinely gone:

  • The writable overlay layer: any files created, modified, or deleted inside the container during its run
  • The in-memory process state (obviously)
  • Any ephemeral volumes that were not backed by persistent storage
  • emptyDir volumes if the pod is deleted (not just restarted)

What this means in practice: if an attacker downloaded a tool, compiled something, or wrote a payload to /tmp, that filesystem evidence is likely gone once the pod is deleted. This is the scenario that panics most IR teams. But the forensic surface extends well beyond the container’s own filesystem.

The node is the crime scene

Every Kubernetes worker node is a Linux machine, and Linux machines are thorough in their logging. The runtime, the kernel, and the node-level daemons all generate artifacts that persist independently of the pod lifecycle.

containerd’s shim logs and state directory are the first stop. The containerd-shim process that manages each container writes state information to /run/containerd/. Even after a container exits, the containerd state may retain metadata about the container ID, its image, its creation timestamp, and its exit code. The state directory structure looks like this:

# On the node, inspect containerd state
ls /run/containerd/io.containerd.runtime.v2.task/k8s.io/

# Each subdirectory corresponds to a container ID
# Even exited containers may retain their directory briefly
cat /run/containerd/io.containerd.runtime.v2.task/k8s.io/<container-id>/log.json

The kubelet log is arguably the most durable artifact you have. The kubelet running on each node records pod creation, container starts, restarts, and terminations including exit codes and timestamps. On a systemd-based node:

journalctl -u kubelet --since "2026-05-28 10:00:00" --until "2026-05-28 12:00:00" \
  | grep -E "(Started|Created|Stopped|Killing)" \
  | grep <pod-name-or-partial-id>

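Container stdout and stderr are written under /var/log/pods/, in a directory structure keyed on namespace, pod name, and a UID that is immutable for the lifetime of the pod object. These files outlive individual container restarts, but do not count on them for long after the pod is deleted: the kubelet garbage-collects a deleted pod’s log directory, and log rotation can truncate the files even sooner. Copy them off the node immediately, or rely on a log shipper that has already forwarded them. The broader behavior is described in the Kubernetes logging architecture: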

/var/log/pods/<namespace>_<pod-name>_<pod-uid>/
└── <container-name>/
    ├── 0.log
    ├── 1.log # restart #1
    └── 2.log # restart #2

Each log file contains the container’s stdout/stderr with RFC3339Nano timestamps. If the attacker’s reverse shell printed anything, if the exploit produced output, if an injected process wrote to stderr, it is in these files. The pod UID in the directory name is also linkable back to the Kubernetes API audit log.
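As a rough illustration of how these files can be read programmatically, the sketch below parses the CRI log line layout (timestamp, stream, a P or F tag marking a partial or full line, then the message) and pulls out the stderr entries. The sample lines and the helper names parse_cri_line and stderr_messages are invented for this example; real input would come from the files in the directory shown above.

```python
from typing import Iterable, Iterator, NamedTuple, Optional, Tuple


class CriLogLine(NamedTuple):
    timestamp: str  # RFC3339Nano string, kept verbatim to preserve nanoseconds
    stream: str     # "stdout" or "stderr"
    partial: bool   # True when the runtime split a long line (tag "P")
    message: str


def parse_cri_line(line: str) -> Optional[CriLogLine]:
    """Parse one CRI-format log line, or return None if it is malformed."""
    parts = line.rstrip("\n").split(" ", 3)
    if len(parts) < 3:
        return None
    if parts[1] not in ("stdout", "stderr") or parts[2] not in ("P", "F"):
        return None
    message = parts[3] if len(parts) == 4 else ""
    return CriLogLine(parts[0], parts[1], parts[2] == "P", message)


def stderr_messages(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (timestamp, message) for every well-formed stderr entry."""
    for raw in lines:
        entry = parse_cri_line(raw)
        if entry is not None and entry.stream == "stderr":
            yield entry.timestamp, entry.message


# Hypothetical sample lines, not taken from a real incident
sample = [
    "2026-05-28T10:12:01.123456789Z stdout F listening on :8080\n",
    "2026-05-28T10:13:44.000000001Z stderr F sh: 1: cannot create /etc/x: Permission denied\n",
]

for ts, msg in stderr_messages(sample):
    print(ts, msg)
```

Keeping the timestamp as a string avoids losing the nanosecond digits, which matters when ordering events that land within the same microsecond.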

The snapshot directories on the node deserve attention even after container removal. containerd stores unpacked image layers under /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/ (on Docker-based nodes the equivalent is /var/lib/docker/overlay2/). The lower layers (the image itself) stay cached on the node until kubelet image garbage collection or an explicit prune removes them. If an attacker modified a legitimate image or if the malicious workload was built from a known base, the image layers are still there and can be mounted and examined:

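# Confirm the suspect image is still cached on the node
ctr -n k8s.io images ls | grep <image-name>

# List snapshots; keys are layer chain IDs (sha256:...), not image names
ctr -n k8s.io snapshots ls

# Print the mount command for a snapshot, review it, then run it to inspect the layers
ctr -n k8s.io snapshots mounts /mnt/snapshot-exam <snapshot-key>
ls /mnt/snapshot-exam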

Kernel-level telemetry from auditd, or from eBPF-based tools like Falco or Tetragon, is the most forensically rich source if it was enabled before the incident. The kernel does not care about container boundaries: a ptrace syscall, a connect() to an external IP, or an execve() of a suspicious binary inside a container all appear in the host’s event stream with the process’s host PID. Mapping those events back to a pod UID means cross-referencing the cgroup hierarchy, which you can read from /proc while the process is still alive:

# Identify which cgroup a suspicious PID belongs to
cat /proc/<pid>/cgroup

# Output example:
# 0::/kubepods/burstable/pod<pod-uid>/<container-id>
# This gives you the pod UID directly from the kernel's view
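The exact cgroup path depends on the node’s cgroup driver. A minimal sketch, assuming the two common layouts (the cgroupfs form shown above, and the systemd form where the UID’s dashes become underscores and the container ID is embedded in a .scope unit name), could extract the pod UID and container ID like this. The function name pod_from_cgroup and both sample paths are hypothetical.

```python
import re
from typing import Optional, Tuple

# A pod UID with either "-" (cgroupfs driver) or "_" (systemd driver) separators
_UID = r"[0-9a-f]{8}[-_][0-9a-f]{4}[-_][0-9a-f]{4}[-_][0-9a-f]{4}[-_][0-9a-f]{12}"
_POD_RE = re.compile(r"pod(" + _UID + r")")
_CONTAINER_RE = re.compile(r"[0-9a-f]{64}")


def pod_from_cgroup(path: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (pod_uid, container_id or None) from a cgroup path, or None if no pod UID is present."""
    pod = _POD_RE.search(path)
    if pod is None:
        return None
    uid = pod.group(1).replace("_", "-")
    # Look for a 64-hex container ID only after the pod segment
    container = _CONTAINER_RE.search(path, pod.end())
    return uid, (container.group(0) if container else None)


# Hypothetical paths illustrating both driver layouts
cid = "ab12" * 16
cgroupfs_path = "0::/kubepods/burstable/pod3f2a9c1e-7b4d-4e8a-9c2f-1a2b3c4d5e6f/" + cid
systemd_path = (
    "0::/kubepods.slice/kubepods-burstable.slice/"
    "kubepods-burstable-pod3f2a9c1e_7b4d_4e8a_9c2f_1a2b3c4d5e6f.slice/"
    "cri-containerd-" + cid + ".scope"
)

print(pod_from_cgroup(cgroupfs_path))
print(pod_from_cgroup(systemd_path))
```

The same function can be applied to cgroup paths recorded in Falco, Tetragon, or auditd events after the process itself is gone, provided the tool captured the path at event time.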

The Kubernetes control plane as an evidence store

The cluster itself, independent of any individual node, generates substantial forensic evidence through the Kubernetes Audit Log. This is not enabled by default in many managed Kubernetes services, which is a significant gap, but when it is configured correctly it becomes a high-value evidence source. Kubernetes documents both audit logging and audit policies.

Depending on the audit policy, the audit log can capture every request to the Kubernetes API server: pod creation, secret access, RBAC changes, exec into pods, port-forwards. Each event includes the user or service account, the source IP, timestamps, the resource accessed, and the request verb. A typical event for a compromised service account accessing secrets looks like:

{
  "apiVersion": "audit.k8s.io/v1",
  "kind": "Event",
  "level": "Metadata",
  "requestReceivedTimestamp": "2026-05-28T10:14:32.441000Z",
  "auditID": "a7c3f1e2-...",
  "stage": "ResponseComplete",
  "requestURI": "/api/v1/namespaces/production/secrets/db-credentials",
  "verb": "get",
  "user": {
    "username": "system:serviceaccount:production:web-app",
    "groups": ["system:serviceaccounts", "system:authenticated"]
  },
  "sourceIPs": ["10.0.3.47"],
  "userAgent": "python-requests/2.31.0",
  "objectRef": {
    "resource": "secrets",
    "namespace": "production",
    "name": "db-credentials"
  },
  "responseStatus": {"code": 200}
}

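A service account reading a secret with a python-requests user agent should stand out immediately if that workload normally talks to the API through a Kubernetes client library, or not at all. The audit log also records exec events (the pods/exec subresource, typically with verb create), which are among the most forensically significant actions an attacker can take: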

# Extract exec events from the audit log
# Assuming logs are stored as JSON lines; add a time filter or use a SIEM query for a specific window
jq -c 'select(.objectRef.resource == "pods" and .objectRef.subresource == "exec")
  | {time: .requestReceivedTimestamp, user: .user.username, pod: .objectRef.name, ns: .objectRef.namespace}' audit.log
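When the audit log is only available as exported JSON lines, the same triage can be sketched in a few lines of Python. The example below flags two patterns discussed above: any request against the pods/exec subresource, and secret reads performed by a service account. The function name, the set of verbs treated as reads, and the sample events are assumptions made for illustration, not a complete detection rule.

```python
import json
from typing import Dict, Iterable, Iterator

SECRET_READ_VERBS = {"get", "list", "watch"}
SA_PREFIX = "system:serviceaccount:"


def triage_audit_lines(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield a compact record for exec requests and service-account secret reads."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated lines, e.g. from a file caught mid-rotation
        if not isinstance(event, dict):
            continue
        ref = event.get("objectRef") or {}
        user = (event.get("user") or {}).get("username", "")
        if ref.get("resource") == "pods" and ref.get("subresource") == "exec":
            reason = "exec"
        elif (ref.get("resource") == "secrets"
              and event.get("verb") in SECRET_READ_VERBS
              and user.startswith(SA_PREFIX)):
            reason = "secret-read"
        else:
            continue
        yield {
            # Fall back to a plain "timestamp" key for simplified or re-shaped exports
            "time": event.get("requestReceivedTimestamp") or event.get("timestamp", ""),
            "reason": reason,
            "user": user,
            "namespace": ref.get("namespace", ""),
            "name": ref.get("name", ""),
        }


# Hypothetical events for demonstration
sample_lines = [
    json.dumps({"verb": "get", "requestReceivedTimestamp": "2026-05-28T10:14:32.441000Z",
                "user": {"username": "system:serviceaccount:production:web-app"},
                "objectRef": {"resource": "secrets", "namespace": "production", "name": "db-credentials"}}),
    json.dumps({"verb": "create", "requestReceivedTimestamp": "2026-05-28T10:15:02.000000Z",
                "user": {"username": "jane@example.com"},
                "objectRef": {"resource": "pods", "subresource": "exec",
                              "namespace": "production", "name": "web-app-7d9f"}}),
    json.dumps({"verb": "list", "user": {"username": "jane@example.com"},
                "objectRef": {"resource": "pods", "namespace": "production"}}),
    "{not valid json",
]

for hit in triage_audit_lines(sample_lines):
    print(hit)
```

The exec check deliberately ignores the verb, so it keeps matching if the recorded verb differs between client and server versions.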

Beyond the audit log, the etcd database is the authoritative store for all cluster state. Even after a pod is deleted, etcd keeps earlier revisions of its key until the next compaction, which the API server requests every five minutes by default, so that window is short. More usefully, if you have etcd snapshots (which any well-operated cluster should be taking regularly), you can restore one into an isolated etcd instance and reconstruct the cluster state as of the snapshot time. The authoritative operational steps are in the etcd disaster recovery guide:

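# Restore an etcd snapshot into a scratch directory to inspect historical state
# (etcd 3.5+ deprecates this subcommand in favor of: etcdutl snapshot restore)
ETCDCTL_API=3 etcdctl snapshot restore /backup/etcd-snapshot-20260528-1000.db \
  --data-dir /tmp/etcd-forensics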

# Start a local etcd and query it
# Then use kubectl against the restored state
kubectl --kubeconfig=/tmp/forensic-kubeconfig get pods --all-namespaces

Kubernetes Events (the kubectl get events kind) are also worth extracting. They are stored in etcd with a default TTL of one hour, but if you move quickly or if events are being aggregated to a log platform, they record OOMKilled events, failed image pulls, and scheduling decisions that help reconstruct the timeline.

Network forensics in a container environment

Container networking introduces layers of indirection that complicate traffic analysis. In a Kubernetes cluster, traffic between pods typically flows through a CNI plugin (Calico, Cilium, Flannel), which may use VXLAN tunnels, eBPF programs, or iptables rules. However, the node’s network interfaces still carry all traffic, and several forensic approaches remain effective.

If the cluster runs Cilium, you may have access to Hubble, its observability layer, which captures flow-level metadata between pods (source pod, destination pod, destination IP, protocol, verdict) with minimal performance overhead. Hubble data does not require tcpdump and is not tied to the pod lifecycle, because it is collected by the Cilium agent on each node. By default, though, flows sit in a fixed-size in-memory ring buffer, so on a busy node they can be overwritten quickly unless they are exported. See the Cilium Hubble documentation for flow schema and retention options:

# Query Hubble for flows involving a specific pod IP in the past hour
hubble observe --from-ip 10.0.3.47 --since 1h --output json | \
  jq 'select(.flow.verdict == "FORWARDED") | {time: .time, dst: .flow.destination, proto: .flow.l4}'

For clusters without eBPF-based CNI observability, netfilter connection tracking on the node provides a more primitive but still useful picture. The conntrack table records tracked connections by source/destination IP and port. It is ephemeral, but entries outlive the container until they time out or are flushed:

# Dump established conntrack entries on the node
conntrack -L 2>/dev/null | grep "ESTABLISHED"

# Keep entries whose original destination (the first dst= field) is outside private and loopback ranges
conntrack -L 2>/dev/null | awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^dst=/) {
  if ($i !~ /^dst=(10\.|172\.(1[6-9]|2[0-9]|3[01])\.|192\.168\.|127\.)/) print; break } }'
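Substring filters on conntrack output are easy to get wrong, because every entry carries two address tuples (original and reply direction) and pod source addresses are usually private themselves. A hedged alternative is to parse a saved dump and test only the original-direction destination against explicit networks. The sketch below assumes the usual key=value layout of conntrack -L output; the function names, the network list, and the sample lines are illustrative choices.

```python
import ipaddress
from typing import Dict, Iterable, Iterator, Optional

# Ranges treated as internal for this sketch; adjust to the cluster's real CIDRs
INTERNAL_NETS = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "127.0.0.0/8")
]


def original_tuple(line: str) -> Optional[Dict[str, str]]:
    """Return the first src/dst/sport/dport values on the line (the original direction)."""
    fields: Dict[str, str] = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep and key in ("src", "dst", "sport", "dport") and key not in fields:
            fields[key] = value
    return fields if "src" in fields and "dst" in fields else None


def external_connections(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield original-direction tuples whose destination is outside INTERNAL_NETS."""
    for line in lines:
        conn = original_tuple(line)
        if conn is None:
            continue
        try:
            dst = ipaddress.ip_address(conn["dst"])
        except ValueError:
            continue
        if not any(dst in net for net in INTERNAL_NETS):
            yield conn


# Hypothetical conntrack lines
sample_dump = [
    "tcp 6 431999 ESTABLISHED src=10.0.3.47 dst=203.0.113.10 sport=44321 dport=4444 "
    "src=203.0.113.10 dst=10.0.1.5 sport=4444 dport=44321 [ASSURED] mark=0 use=1",
    "tcp 6 86400 ESTABLISHED src=10.0.3.47 dst=10.0.5.9 sport=51000 dport=5432 "
    "src=10.0.5.9 dst=10.0.3.47 sport=5432 dport=51000 [ASSURED] mark=0 use=1",
    "udp 17 25 src=10.0.3.47 dst=172.32.0.1 sport=40000 dport=53 "
    "src=172.32.0.1 dst=10.0.1.5 sport=53 dport=40000 mark=0 use=1",
]

for conn in external_connections(sample_dump):
    print(conn["src"], "->", conn["dst"], conn.get("dport", "-"))
```

Matching on parsed networks also avoids misclassifying addresses such as 172.32.0.1, which a plain "172." filter would wrongly treat as private.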

DNS queries are particularly valuable. In most Kubernetes setups, all pod DNS traffic flows through CoreDNS, and CoreDNS can be configured to log queries via the log plugin. If DNS logging was active, querying CoreDNS logs for the source IP of the compromised pod reveals C2 domain lookups and data exfiltration patterns that would be invisible from the application-level audit trail.
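If query logging was on, a short script can summarize what a given pod IP looked up. The sketch below assumes the default layout produced by the CoreDNS log plugin (client address and port, query ID, then a quoted section starting with type, class, and name); a custom log format would need a different pattern. The function name and the sample lines are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Matches: <client>:<port> - <id> "<type> <class> <name> ...
_QUERY_RE = re.compile(
    r'(?P<client>\S+):(?P<port>\d+) - \d+ "(?P<qtype>\S+) \S+ (?P<name>\S+) '
)


def queries_from(lines: Iterable[str], client_ip: str) -> Counter:
    """Count (query type, name) pairs requested by one client IP."""
    counts: Counter = Counter()
    for line in lines:
        match = _QUERY_RE.search(line)
        if match and match.group("client") == client_ip:
            name = match.group("name").rstrip(".").lower()
            counts[(match.group("qtype"), name)] += 1
    return counts


# Hypothetical CoreDNS log lines
sample_dns = [
    '[INFO] 10.0.3.47:48211 - 31337 "A IN c2.example.net. udp 41 false 512" NOERROR qr,rd,ra 80 0.012s',
    '[INFO] 10.0.3.47:48212 - 31338 "A IN c2.example.net. udp 41 false 512" NOERROR qr,rd,ra 80 0.003s',
    '[INFO] 10.0.3.47:48213 - 31339 "TXT IN chunk1.c2.example.net. udp 48 false 512" NXDOMAIN qr,rd,ra 120 0.020s',
    '[INFO] 10.0.4.12:50100 - 2001 "A IN kubernetes.default.svc.cluster.local. udp 54 false 512" NOERROR qr,aa,rd 106 0.000s',
]

for (qtype, name), n in queries_from(sample_dns, "10.0.3.47").most_common():
    print(n, qtype, name)
```

Repeated lookups of the same name and bursts of TXT queries against one parent domain are the kind of pattern worth pulling into the timeline.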

Live response with ephemeral containers

One of the most powerful tools in the modern Kubernetes forensics toolkit is something that did not exist in older cluster versions: ephemeral containers, which became stable in Kubernetes 1.25. An ephemeral container can be added to a running pod without restarting it or touching its existing containers, making it the container equivalent of ssh + memory acquisition. The behavior and constraints are documented in the Kubernetes ephemeral containers docs.

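If a suspicious pod is still running (not yet deleted), you can attach a forensic container that shares its network namespace and, with --target, the process namespace of a specific container. The target’s filesystem is not mounted into the debug container, but it is reachable through /proc/<pid>/root: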

# Inject a forensic container into a running pod
kubectl debug -it <pod-name> \
  --image=nicolaka/netshoot \
  --target=<container-name> \
  -- /bin/bash

# Inside the forensic container, you are sharing namespaces:
# List all processes in the target container
ps auxf

# Examine open network connections
ss -tulpn

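# Dump process memory for a specific PID (the debug container needs ptrace-level access)
# With bs=4096, skip and count are page numbers derived from a region's address range in maps
cat /proc/<pid>/maps
dd if=/proc/<pid>/mem of=/tmp/memdump bs=4096 skip=... count=... 2>/dev/null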

# Check for deleted files still held open by processes
ls -la /proc/<pid>/fd | grep "(deleted)"

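The last technique is particularly important. An attacker who downloads a malicious binary, executes it, and then deletes the file on disk to cover tracks has not actually freed it: the kernel keeps the inode alive while the process runs. Deleted files the process still holds open show up under /proc/<pid>/fd, and the executable itself stays reachable through /proc/<pid>/exe. You can recover the binary directly: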

# Recover a deleted-but-still-open executable
cp /proc/<suspicious-pid>/exe /tmp/recovered-binary
file /tmp/recovered-binary
sha256sum /tmp/recovered-binary
# Submit hash to VirusTotal, or run through YARA rules

This is essentially the container analog of carving a process’s executable from memory, and it works as long as the process is still running.
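For the memory-dump step earlier in this section, the skip and count arguments to dd follow directly from the address range of a region in /proc/<pid>/maps. A small sketch of that arithmetic, assuming a 4096-byte block size and page-aligned regions, is shown below. The helper name dd_args_for_region and the sample maps lines are hypothetical.

```python
from typing import NamedTuple, Optional


class DdArgs(NamedTuple):
    skip: int   # number of blocks to skip before reading
    count: int  # number of blocks to read


def dd_args_for_region(maps_line: str, page_size: int = 4096) -> Optional[DdArgs]:
    """Convert one /proc/<pid>/maps line into dd skip/count values.

    Returns None for unreadable, malformed, or non-page-aligned regions.
    """
    fields = maps_line.split()
    if len(fields) < 2:
        return None
    addr_range, perms = fields[0], fields[1]
    if not perms.startswith("r"):
        return None  # region is not readable, dd would fail on it
    start_hex, sep, end_hex = addr_range.partition("-")
    if not sep:
        return None
    try:
        start, end = int(start_hex, 16), int(end_hex, 16)
    except ValueError:
        return None
    if end <= start or start % page_size or end % page_size:
        return None
    return DdArgs(skip=start // page_size, count=(end - start) // page_size)


# Hypothetical maps lines
heap_line = "55d0c8a00000-55d0c8a21000 rw-p 00000000 00:00 0                          [heap]"
guard_line = "7f3c1a200000-7f3c1a201000 ---p 00000000 00:00 0"

args = dd_args_for_region(heap_line)
if args is not None:
    print(f"dd if=/proc/<pid>/mem of=/tmp/heap.bin bs=4096 skip={args.skip} count={args.count}")
```

Dumping region by region, rather than attempting to read the whole address space, avoids the read errors that unmapped gaps would otherwise cause.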

First 24 hours response playbook

If you detect suspicious behavior and the pod has already disappeared, a strict time-boxed workflow is often the difference between a complete case and a speculative report.

  1. In the first 15 minutes, isolate affected nodes, snapshot node disks if possible, and export kubelet plus runtime logs.
  2. Within 60 minutes, pull audit logs, cluster events, and CNI flow metadata into immutable storage.
  3. Within 4 hours, correlate pod UID, container ID, service account activity, and external destinations into a single timeline.
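The correlation in step 3 mostly comes down to normalizing timestamps from different sources and sorting. A minimal sketch, assuming every source has already been reduced to (timestamp, source, detail) tuples with RFC 3339 timestamps that carry an explicit offset or Z suffix, might look like this. The function names and the sample events are illustrative, and sources that log in local time without an offset (journalctl output, for instance) would need converting first.

```python
import re
from datetime import datetime
from typing import Iterable, List, NamedTuple, Tuple

_TS_RE = re.compile(r"^(?P<base>.*?)(?P<frac>\.\d+)?(?P<offset>[+-]\d{2}:\d{2})$")


def parse_rfc3339(ts: str) -> datetime:
    """Parse an RFC 3339 timestamp, truncating fractions beyond microseconds."""
    ts = ts.strip()
    if ts.endswith(("Z", "z")):
        ts = ts[:-1] + "+00:00"
    match = _TS_RE.match(ts)
    if match is None:
        raise ValueError(f"timestamp lacks a UTC offset: {ts!r}")
    frac = match.group("frac") or ""
    if frac:
        frac = (frac + "000000")[:7]  # "." plus exactly six digits
    return datetime.fromisoformat(match.group("base") + frac + match.group("offset"))


class TimelineEvent(NamedTuple):
    when: datetime
    source: str
    detail: str


def build_timeline(raw: Iterable[Tuple[str, str, str]]) -> List[TimelineEvent]:
    """Normalize timestamps and return events in chronological order."""
    events = [TimelineEvent(parse_rfc3339(ts), source, detail) for ts, source, detail in raw]
    return sorted(events, key=lambda event: event.when)


# Hypothetical events from three evidence sources
raw_events = [
    ("2026-05-28T10:14:32.441Z", "audit", "get secrets/db-credentials by production:web-app"),
    ("2026-05-28T10:12:01.123456789Z", "pod-log", "stderr: sh: 1: cannot create /etc/x"),
    ("2026-05-28T12:13:05+02:00", "flow", "10.0.3.47 -> 203.0.113.10:4444 FORWARDED"),
]

for event in build_timeline(raw_events):
    print(event.when.isoformat(), event.source, event.detail)
```

Comparing timezone-aware datetimes means events logged with different offsets still sort correctly, which is where hand-built timelines most often go wrong.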

Use this quick artifact matrix to prioritize collection:

Artifact                          Typical retention          Volatility      Collection priority
/var/log/pods/* container logs    Minutes to days            High            Immediate
Kubelet journal                   Hours to days              Medium          Immediate
API audit log                     Platform dependent         Low to medium   Immediate
CNI or flow telemetry             Minutes to days            High            Immediate
etcd snapshots                    Backup policy dependent    Low             High

For timeline-driven investigations, you can also reuse techniques from Forensic Timeliner 2.2 to normalize evidence from mixed sources.

What a hardened cluster looks like from a forensic readiness perspective

The practical conclusion of any container forensics investigation is that the quality of the evidence you recover is almost entirely determined by decisions made before the incident. An investigator arriving at a cluster where audit logging was disabled, Falco was not running, pod logs were not centralized, and no CNI-level observability existed will have an extraordinarily difficult time proving anything beyond “yes, a container ran.”

A cluster configured with forensic readiness in mind looks different. API server audit logging is enabled with a policy that records at least Metadata level for all resources and RequestResponse level for secrets, exec, and portforward. Falco or Tetragon runs as a DaemonSet on every node, capturing syscall-level events that survive pod deletion, and both projects provide implementation guidance in the Falco docs and Tetragon docs. Pod logs are forwarded to an immutable log store (not just stored on the node). etcd snapshots are taken every few minutes and retained for at least 30 days. And network flow metadata, whether via Hubble, VPC flow logs, or a service mesh like Istio, is stored externally.

None of this is exotic. Most of it ships with mature Kubernetes distributions or is available as open-source DaemonSets. The gap is almost never technical; it is organizational. Container environments are built for speed and scale, and forensic readiness tends to be treated as someone else’s problem until the pod runs for eleven minutes and disappears. This is one of the same systemic tensions discussed in cloud forensics and jurisdictional constraints and in practical examples of Kubernetes compromise campaigns.

The eleven minutes leave traces. Whether those traces are enough to reconstruct what happened, identify the attacker, and respond effectively depends entirely on whether you decided to keep them.

FAQ

Is investigation still possible after pod deletion?

Yes, if you collect surviving node and control-plane artifacts quickly and in the right order.

Which source usually provides the strongest sequence evidence?

Kubernetes audit logs plus node runtime artifacts typically offer the clearest action chronology.

What is the biggest operational mistake?

Waiting too long before preservation, which causes irreversible loss due to retention and rotation policies.