Sockmap socket interface
Attention
The sockmap socket interface extension is experimental and is currently under active development.
Note
This feature is only supported on Linux and requires kernel 4.18 or later.
The sockmap socket interface accelerates same-host TCP hops by loading eBPF sock_ops and
sk_msg programs that redirect payloads between local sockets through a
BPF_MAP_TYPE_SOCKHASH, bypassing the kernel TCP/IP stack. Connections whose peer is not on the
same host are absent from the map and transparently fall back to TCP/IP, so behavior is unchanged
for traffic that cannot be accelerated.
Loading and attaching the eBPF programs requires CAP_SYS_ADMIN, or CAP_BPF together with
CAP_NET_ADMIN on kernels that provide CAP_BPF (Linux 5.8 and later). When the programs cannot be loaded or attached, the interface
logs the failure and every socket falls back to the standard datapath, so traffic is never
interrupted.
Building Envoy with sockmap support
The eBPF datapath is only compiled when Envoy is built with --define=sockmap=enabled, which
links libbpf:
bazel build --define=sockmap=enabled //source/exe:envoy-static
Default builds compile a no-op stub instead. The extension is still registered, so configuring
bpf_program_path in a default build logs a warning and leaves every socket on the standard
datapath.
Compiling the eBPF object
Envoy does not ship a compiled eBPF object. The sock_ops and sk_msg programs and the
user-space registration must share the same map name and key layout. The program source is part of
this extension, in sockmap_kern.c, and Envoy provides a build rule that compiles it into
sockmap_kern.o with clang:
bazel build //source/extensions/network/socket_interface/sockmap:sockmap_bpf
Compiling the object requires clang and the libbpf development headers on the host. Point
bpf_program_path at the resulting object, or at a custom build that exports the envoy_sockops
and envoy_sk_msg programs and the envoy_sockhash map under those names with a matching key
layout.
Example configuration
Register the socket interface as a bootstrap extension and select it as the default socket interface:
bootstrap_extensions:
- name: envoy.extensions.network.socket_interface.sockmap
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.network.socket_interface.sockmap.v3.Sockmap
    bpf_program_path: /etc/envoy/sockmap_kern.o
default_socket_interface: "envoy.extensions.network.socket_interface.sockmap"
This example accelerates only proxy-to-proxy hops, which are enabled by default. Accelerating
application-to-proxy hops additionally requires setting cgroup_path.
How it works
When bpf_program_path points at an object Envoy can load, the interface accelerates two kinds
of same-host hops, which can be enabled independently.
Application-to-proxy hops
When cgroup_path is set, Envoy attaches the sock_ops program to that cgroup v2 directory. Every
socket that reaches the
established state inside the cgroup is added to the sockhash, which accelerates hops between
applications and Envoy that run in the same cgroup. Prefer a narrowly scoped cgroup over a broad
one such as the root. When the cgroup must be broad, set accelerated_ports to the
proxy listener port ranges so only connections to or from those ports are registered, leaving
unrelated same-host connections in the cgroup on the standard datapath. If cgroup_path is not
set, the sock_ops program is not attached and application sockets are not tracked.
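As a minimal sketch of enabling application-to-proxy acceleration, the configuration below adds cgroup_path to the bootstrap extension. The directory /sys/fs/cgroup/mesh.slice is a hypothetical cgroup v2 path chosen for illustration; substitute the narrowest cgroup that contains both Envoy and the application. accelerated_ports is left out because its exact shape is defined in the extension's API reference, not here.

```yaml
bootstrap_extensions:
- name: envoy.extensions.network.socket_interface.sockmap
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.network.socket_interface.sockmap.v3.Sockmap
    bpf_program_path: /etc/envoy/sockmap_kern.o
    # Hypothetical cgroup v2 directory shared by Envoy and the application.
    # The sock_ops program is attached here.
    cgroup_path: /sys/fs/cgroup/mesh.slice
default_socket_interface: "envoy.extensions.network.socket_interface.sockmap"
```

With this configuration, user-space registration stays at its default, so both kinds of hops are eligible for acceleration.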
Proxy-to-proxy hops
When register_user_space_sockets is true, which is the default, Envoy registers its accepted,
connected, and duplicated sockets
into the sockhash from user space. This is independent of cgroup_path and accelerates
proxy-to-proxy hops on the same host without attaching the sock_ops program. The matching entry
is removed when the socket closes, so a later connection that reuses the tuple is never redirected
into a stale entry.
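Because the two paths are independent, a deployment that only wants application-to-proxy acceleration could turn user-space registration off. The fragment below is a sketch under that assumption; the cgroup path is hypothetical.

```yaml
bootstrap_extensions:
- name: envoy.extensions.network.socket_interface.sockmap
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.network.socket_interface.sockmap.v3.Sockmap
    bpf_program_path: /etc/envoy/sockmap_kern.o
    # Hypothetical cgroup v2 directory; sockets are tracked by the sock_ops program only.
    cgroup_path: /sys/fs/cgroup/mesh.slice
    # Envoy does not register its own sockets from user space.
    register_user_space_sockets: false
default_socket_interface: "envoy.extensions.network.socket_interface.sockmap"
```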
For either path, the sk_msg verdict program looks up the peer of each send in the sockhash
and, when the peer is present, redirects the payload straight to its ingress queue with
bpf_msg_redirect_hash. Only IPv4 stream sockets are accelerated. Other sockets, including IPv6
and Unix domain sockets, use the standard datapath unchanged. The sockhash holds one entry per
accelerated socket, up to sockhash_max_entries.
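Since the sockhash holds one entry per accelerated socket, sockhash_max_entries should be sized for the peak number of sockets expected to be accelerated at once. The value 65536 below is purely illustrative, not a recommendation or a documented default.

```yaml
bootstrap_extensions:
- name: envoy.extensions.network.socket_interface.sockmap
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.network.socket_interface.sockmap.v3.Sockmap
    bpf_program_path: /etc/envoy/sockmap_kern.o
    # Illustrative capacity: an upper bound on concurrently accelerated sockets.
    sockhash_max_entries: 65536
default_socket_interface: "envoy.extensions.network.socket_interface.sockmap"
```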
Note
When the programs load successfully, the interface logs "sockmap acceleration enabled using
<path>" at the info level.