Copy.Fail: When the Kernel Trusts Too Much
Sometimes you hit a vulnerability that isn’t just “bad”.
It’s clean.
Not elegant. Not pretty.
But clean in the way it slices straight through assumptions we’ve quietly depended on for years.
CVE-2026-31431 is one of those.
The shape of the problem
At a high level:
A logic flaw in the Linux kernel enables arbitrary page cache writes from an unprivileged context.
That sentence alone should make you pause.
Because page cache writes mean:
- modifying files without write permissions
- altering binaries owned by root
- bypassing normal filesystem integrity expectations
Now layer in:
- no race condition
- no offset dependency
- works inside containers
- tiny, reliable primitive
…and this stops being “just another LPE” (local privilege escalation).
This becomes a primitive.
The chain
This vulnerability isn’t a single mistake.
It’s a chain of perfectly valid components:
- authenc (crypto subsystem)
- AF_ALG (crypto via sockets)
- splice() (zero-copy memory transfer)
Individually: fine.
Together:
user-controlled data ends up in the page cache of arbitrary files.
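To make the chain concrete, here is a minimal sketch of the benign API surface those three components expose, assuming a Linux kernel built with the userspace crypto API (CONFIG_CRYPTO_USER_API_AEAD). It only names the links; nothing here triggers the flaw.

import socket

# Links 1 and 2: authenc, reached from userspace through an AF_ALG socket.
alg = socket.socket(socket.AF_ALG, socket.SOCK_SEQPACKET, 0)
alg.bind(("aead", "authenc(hmac(sha256),cbc(aes))"))  # instantiates the template

# In real use, key setup (ALG_SET_KEY) and an accept() for a per-request
# socket would follow; they are deliberately omitted here.
# Link 3: splice() (os.splice on Python 3.10+) is the zero-copy transfer
# that moves pages between sockets, pipes, and file-backed caches.
alg.close()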
The trust boundary failure
We think in layers, and we assume:
- kernel enforces isolation
- filesystem enforces permissions
Reality here:
The kernel becomes the confused deputy: its own crypto and I/O paths end up doing privileged work (writing file-backed pages) on behalf of an unprivileged caller.
Containers don’t save you
Typical container:
- non-root
- restricted capabilities
- shared kernel
Observed:
uid/euid: 1000 1000
AF_ALG: allowed
modules: authenc, algif_aead
That’s default behaviour.
Which means:
An unprivileged container can reach a vulnerable kernel path affecting host files.
Testing safely
Python check
import os, socket, platform

print("uid/euid:", os.getuid(), os.geteuid())
print("kernel:", platform.release())
print("machine:", platform.machine())

try:
    s = socket.socket(socket.AF_ALG, socket.SOCK_SEQPACKET, 0)
    print("AF_ALG: allowed")
    s.close()
except Exception as e:
    print("AF_ALG: blocked/unavailable:", repr(e))

print("\nmodules:")
os.system("grep -E 'algif|authenc|aead' /proc/modules || true")
Docker harness
sudo docker run -it --rm --user 1000:1000 -v "$PWD":/app -w /app python:3 python3 safe-check.py
Fast mitigation: seccomp
Create profile
cat > seccomp-block-af_alg.json <<'JSON'
{
  "defaultAction": "SCMP_ACT_ALLOW",
  "architectures": [
    "SCMP_ARCH_AARCH64",
    "SCMP_ARCH_ARM",
    "SCMP_ARCH_X86_64",
    "SCMP_ARCH_X86"
  ],
  "syscalls": [
    {
      "names": ["socket"],
      "action": "SCMP_ACT_ERRNO",
      "args": [
        {
          "index": 0,
          "value": 38,
          "op": "SCMP_CMP_EQ"
        }
      ]
    }
  ]
}
JSON
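The 38 in the args filter is not arbitrary: it is the AF_ALG address-family constant on Linux, and the rule compares socket()'s first argument against it. A quick Python check confirms the value on a target host:

import socket

# The seccomp rule above matches socket(domain, ...) where domain == AF_ALG.
print(int(socket.AF_ALG))  # prints 38 on Linux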
Run with it
sudo docker run -it --rm --user 1000:1000 -v "$PWD":/app -w /app --security-opt seccomp="$PWD/seccomp-block-af_alg.json" python:3 python3 safe-check.py
Expected:
AF_ALG: blocked/unavailable
Why this matters
This is a perfect example of:
Remove the entry point → kill the exploit path
No reboot. No patch yet.
Just removing reachability.
But don’t stop there
Seccomp is containment, not remediation.
You still need:
- kernel patching
- module review (algif_aead, authenc); a quick audit sketch follows this list
- runtime hardening
- removal of privileged containers
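For the module review item, the sketch below is a small, hypothetical audit helper (conventional paths assumed; adjust for your distro). It reports whether the relevant modules are currently loaded and whether any modprobe blacklist or install override already covers them:

import glob, re

MODULES = ("algif_aead", "authenc")

# Currently loaded as modules? Either may instead be built into the
# kernel, in which case /proc/modules will not list it.
with open("/proc/modules") as f:
    loaded = {line.split()[0] for line in f}
for mod in MODULES:
    print(f"{mod}: {'loaded' if mod in loaded else 'not loaded as a module'}")

# Existing blacklist/install overrides in modprobe config?
pattern = re.compile(r"\s*(blacklist|install)\s+(algif_aead|authenc)\b")
for conf in sorted(glob.glob("/etc/modprobe.d/*.conf")):
    with open(conf) as f:
        for line in f:
            if pattern.match(line):
                print(f"{conf}: {line.strip()}")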
The pattern (this is the real lesson)
This isn’t about crypto.
It’s about trusted component chaining.
Same pattern shows up everywhere:
- identity systems trusting client signals
- APIs exposing control planes
- SaaS platforms enabling unintended flows
- kernels assuming “safe paths”
Different layer.
Same failure.
TL;DR
- Arbitrary page cache write primitive
- Works from unprivileged containers
- Default environments exposed
- Seccomp provides fast mitigation
- Kernel patch required
Final thought
We talk about “container escape”.
But more often than not:
The container never contained anything,
because the boundary wasn’t as strong as we thought.
Kubernetes seccomp example
For Kubernetes, the same mitigation can be applied by shipping the seccomp profile to each node and referencing it from the pod security context.
Example profile path on the node:
/var/lib/kubelet/seccomp/profiles/seccomp-block-af_alg.json
Example pod:
apiVersion: v1
kind: Pod
metadata:
  name: copyfail-check
spec:
  securityContext:
    seccompProfile:
      type: Localhost
      localhostProfile: profiles/seccomp-block-af_alg.json
  containers:
  - name: check
    image: python:3
    command: ["python3", "/app/safe-check.py"]
    securityContext:
      runAsUser: 1000
      runAsGroup: 1000
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
    volumeMounts:
    - name: app
      mountPath: /app
  volumes:
  - name: app
    configMap:
      name: copyfail-check-script
  restartPolicy: Never
And the script as a ConfigMap:
apiVersion: v1
kind: ConfigMap
metadata:
  name: copyfail-check-script
data:
  safe-check.py: |
    import os, socket, platform

    print("uid/euid:", os.getuid(), os.geteuid())
    print("kernel:", platform.release())
    print("machine:", platform.machine())

    try:
        s = socket.socket(socket.AF_ALG, socket.SOCK_SEQPACKET, 0)
        print("AF_ALG: allowed")
        s.close()
    except Exception as e:
        print("AF_ALG: blocked/unavailable:", repr(e))

    print("\nmodules:")
    os.system("grep -E 'algif|authenc|aead' /proc/modules || true")
Apply and check logs:
kubectl apply -f copyfail-check-configmap.yaml
kubectl apply -f copyfail-check-pod.yaml
kubectl logs pod/copyfail-check
Expected mitigated result:
AF_ALG: blocked/unavailable
Important operational note: Localhost seccomp profiles are node-local. That means the JSON profile must exist on every node that may schedule the workload. In production, you would normally distribute it with your node image, bootstrap process, DaemonSet, or node management tooling.
Also, this only protects workloads that actually use the profile. It will not protect privileged pods, host processes, or pods scheduled without the seccomp profile unless you enforce it through admission control or policy.
Enforcing seccomp with policy (Pod Security Admission / Kyverno)
You can move from “best effort” to enforced by requiring a seccomp profile at admission time.
Option 1 – Pod Security Admission (PSA)
Use the restricted profile (Kubernetes v1.25+), which requires seccomp:
apiVersion: v1
kind: Namespace
metadata:
  name: workloads-secure
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
This doesn’t force your specific profile, but it forces the use of seccomp (no more “unset”).
Option 2 – Kyverno (enforce your exact profile)
Require your AF_ALG-blocking profile on all pods:
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-seccomp-af-alg-block
spec:
  validationFailureAction: enforce
  rules:
  - name: require-seccomp-profile
    match:
      resources:
        kinds:
        - Pod
    validate:
      message: "Pods must use the AF_ALG-blocking seccomp profile"
      pattern:
        spec:
          securityContext:
            seccompProfile:
              type: "Localhost"
              localhostProfile: "profiles/seccomp-block-af_alg.json"
You can extend this to also block privileged: true and require allowPrivilegeEscalation: false.
Distributing the seccomp profile (DaemonSet)
Because Localhost profiles are node-local, you need to ensure the JSON exists on every node.
A simple approach is a DaemonSet that writes the profile file to the kubelet seccomp directory.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: seccomp-profile-distributor
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: seccomp-profile-distributor
  template:
    metadata:
      labels:
        app: seccomp-profile-distributor
    spec:
      hostPID: true
      containers:
      - name: installer
        image: busybox
        securityContext:
          privileged: true
        command:
        - /bin/sh
        - -c
        - |
          mkdir -p /host/seccomp/profiles
          cat <<'EOF' > /host/seccomp/profiles/seccomp-block-af_alg.json
          {
            "defaultAction": "SCMP_ACT_ALLOW",
            "architectures": [
              "SCMP_ARCH_AARCH64",
              "SCMP_ARCH_ARM",
              "SCMP_ARCH_X86_64",
              "SCMP_ARCH_X86"
            ],
            "syscalls": [
              {
                "names": ["socket"],
                "action": "SCMP_ACT_ERRNO",
                "args": [
                  {
                    "index": 0,
                    "value": 38,
                    "op": "SCMP_CMP_EQ"
                  }
                ]
              }
            ]
          }
          EOF
          sleep infinity
        volumeMounts:
        - name: seccomp
          mountPath: /host/seccomp
      volumes:
      - name: seccomp
        hostPath:
          path: /var/lib/kubelet/seccomp
          type: DirectoryOrCreate
This ensures every node has:
/var/lib/kubelet/seccomp/profiles/seccomp-block-af_alg.json
Putting it together
- DaemonSet → ensures profile exists on every node
- Kyverno / PSA → ensures pods must use seccomp
- Pod spec → references your AF_ALG-blocking profile
Result:
The vulnerable kernel path becomes unreachable from workloads, even before patching.
Final note
This is the kind of control that scales:
- Fast to deploy
- Low disruption
- High impact
But still:
Patch the kernel.
Always patch the kernel.