Systemd provides many configuration settings to reduce privileges and restrict
access of a service and thus harden the service against potential
vulnerabilities. However, these settings are scattered throughout the
documentation making them more difficult to find than necessary. In addition,
the commonly suggested settings are not enough to restrict privileged
processes (running as root or with special capabilities) because they can
still access sensitive files like private keys (e.g. in /etc/) or sockets.
Note: One remaining limitation of this setup is that privileged processes can
still send signals to any other process. If possible, don't run services as
root but instead as a separate user (User, Group). Required capabilities
can be granted even for non-root processes with AmbientCapabilities.
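As a minimal sketch of running a service as an unprivileged user that still
holds a single capability, a unit fragment could look like the following. The
account name myservice and the choice of CAP_NET_BIND_SERVICE are assumptions
for illustration only.

```ini
[Service]
# Hypothetical dedicated account; it must already exist on the system.
User=myservice
Group=myservice
# Allow binding to ports below 1024 without running as root.
AmbientCapabilities=CAP_NET_BIND_SERVICE
# Limit the bounding set to the same capability so no other can be acquired.
CapabilityBoundingSet=CAP_NET_BIND_SERVICE
```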
The following configuration snippet is a collection of all relevant hardening
options I could find, followed by a short explanation of what they do and how
they are useful; see the systemd man pages (man systemd.directives) for
details. These settings have worked since at least Debian Buster (systemd
241), except where otherwise noted.
systemd provides the systemd-analyze security command to check if a service
is restricted. It does not take all possible hardening settings into account
but gives a good overview of which services require further hardening.
# Permit AF_UNIX for syslog(3) to help debugging. (Empty setting permits all
# families! A possible workaround would be to blacklist AF_UNIX afterwards.)
SystemCallFilter=~@aio @chown @clock @cpu-emulation @debug @keyring @memlock @module @mount @obsolete @privileged @raw-io @reboot @resources @setuid @swap userfaultfd mincore
# Only available in Debian bullseye or later
# Restrict access to potentially sensitive data (kernels, config, mount
# points, private keys). The paths will be created if they don't exist and
# they must not be files.
TemporaryFileSystem=/boot:ro /etc/luks:ro /etc/ssh:ro /etc/ssl/private:ro /media:ro /mnt:ro /run:ro /srv:ro /var:ro
# Permit syslog(3) messages to journald
Note: All settings marking mounts as read-only (e.g. ReadOnlyPaths) cannot
protect mount points created after the service was started (see the systemd
man page of ReadOnlyPaths for details).
All path-based restrictions (e.g. from the previous paragraph or
TemporaryFileSystem) can be undone by a privileged process with the ability
to perform mount syscalls. The settings above prevent this, but one should be
aware of this potential issue.
When restricting existing services I use
systemctl edit $service to create
an override file with these settings (or I put the file manually at the
appropriate place). This way my settings override the default restrictions of
the service and are kept during system updates.
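For illustration, an override file created this way might look like the sketch
below. The unit name example.service is a placeholder, and the three settings
are only a small, arbitrary selection from the options discussed in this
article.

```ini
# /etc/systemd/system/example.service.d/override.conf
# ("example.service" is a placeholder unit name.)
[Service]
NoNewPrivileges=yes
PrivateTmp=yes
ProtectSystem=strict
```

After placing the file manually, systemctl daemon-reload makes systemd pick it
up; systemctl edit does this automatically.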
After this block of default options, specific settings can be changed or
extended. For example, PrivateUsers is often too strict; adding
PrivateUsers=no after this block will restore the default. Or, to permit
access to keyring syscalls, one can add SystemCallFilter=@keyring. Having the
default options first, followed by service-specific modifications, makes it
easy to update the default settings of multiple service files.
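The layout described here can be sketched as follows. The default block is
heavily abbreviated and its exact contents are an assumption; only the two
service-specific lines at the end are taken from the text.

```ini
[Service]
# --- shared default block (abbreviated) ---
PrivateUsers=yes
SystemCallFilter=
SystemCallFilter=@system-service
SystemCallFilter=~@keyring @privileged

# --- service-specific modifications ---
# A later assignment replaces the earlier value.
PrivateUsers=no
# Adds the keyring syscalls back to the allow-list built above.
SystemCallFilter=@keyring
```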
CapabilityBoundingSet restricts the capabilities (man 7 capabilities) of
this service; setting it to empty removes all capabilities. Capabilities
permit more fine-grained permissions, for example creating raw network
sockets without being root.
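A short sketch of both cases: the empty assignment drops everything, while the
commented-out alternative keeps a single capability. Using CAP_NET_RAW here is
an assumption chosen to match the raw-socket example.

```ini
[Service]
# Empty value: remove every capability from the bounding set.
CapabilityBoundingSet=
# Alternative for a service that needs raw network sockets:
# CapabilityBoundingSet=CAP_NET_RAW
```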
LockPersonality prevents changing the “process execution domain”
(personality), a rarely used feature with potential bugs.
MemoryDenyWriteExecute prevents memory mappings which are both writable and
executable to hinder (simple) exploits.
NoNewPrivileges prevents the process from gaining any additional privileges
during exec (man 2 execve), for example when running setuid or setcap
binaries.
Private* provides a separate instance of the named feature to the process.
This way, devices (PrivateDevices), mounts (PrivateMounts), temporary
directories (PrivateTmp) and users (PrivateUsers) are isolated from the
regular system and cannot be modified by the process. In the case of
temporary directories this also protects the process against other users of
the system, as for example TOCTOU (time-of-check to time-of-use) races in
/tmp/ can no longer attack the process. These settings use Linux’s
namespaces (man 7 namespaces) to provide isolation.
Protect* restricts access to the named features. This prevents the process
from modifying cgroups (man 7 cgroups, ProtectControlGroups), sysctls and
other kernel tunables in /proc/ and /sys/ (ProtectKernelTunables),
kernel modules (ProtectKernelModules) and the hostname (ProtectHostname).
ProtectHome=yes (other values are possible) makes /home/, /root/ and
/run/user/ inaccessible. ProtectSystem=strict (other values are possible)
mounts the whole file system hierarchy read-only (except for /dev/, /proc/
and /sys/; those are protected by PrivateDevices, ProtectKernelTunables and
ProtectControlGroups). ReadWritePaths can be used to give write access
selectively. Most of these settings are also implemented using namespaces.
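As a small sketch of selective write access on top of a read-only file system
hierarchy, consider the fragment below. The directory /var/lib/myservice is a
hypothetical state directory and must exist before the service starts.

```ini
[Service]
# Mount the file system hierarchy read-only for this service.
ProtectSystem=strict
# Hide home directories from the service.
ProtectHome=yes
# Hypothetical state directory that stays writable.
ReadWritePaths=/var/lib/myservice
```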
Restrict* also restricts access to the named feature. This controls the
available support for address families (RestrictAddressFamilies), namespaces
(RestrictNamespaces), real-time scheduling (RestrictRealtime) and setting
SUID/SGID bits on files/directories (RestrictSUIDSGID). Note that setting
RestrictAddressFamilies to the empty value permits all address families!
This is unlike other options where the empty value is the most restrictive.
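To make the difference concrete, here is a sketch for a typical network
service. The selection of families is an assumption; adjust it to what the
service actually uses.

```ini
[Service]
# Allow-list: only local sockets plus IPv4 and IPv6.
RestrictAddressFamilies=AF_UNIX AF_INET AF_INET6
# Caution: an empty assignment resets the list and lifts the restriction.
# RestrictAddressFamilies=
```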
SystemCallFilter restricts access to syscalls via seccomp (man 2 seccomp).
First the setting is reset to the default (first line), then the systemd
defaults for services are permitted (second line), followed by the removal of
additional syscalls which should not be necessary for most services (third
line). Two extra syscalls are blacklisted: userfaultfd, which can be used to
help with timing-sensitive exploits, and mincore, which can leak kernel
page cache information.
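The three-line pattern described here can be sketched as follows. Using
@system-service for the second line is an assumption based on the phrase
"systemd defaults for services", and the deny-list in the third line is
shortened for readability.

```ini
[Service]
# First line: reset any filter inherited from the packaged unit.
SystemCallFilter=
# Second line: start from systemd's allow-list for typical services.
SystemCallFilter=@system-service
# Third line: remove groups and single syscalls again (abbreviated).
SystemCallFilter=~@privileged @resources userfaultfd mincore
```

Because the first non-empty line is an allow-list, every syscall not listed is
denied, and the line starting with ~ only removes entries from that list.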
TemporaryFileSystem mounts tmpfs (read-only with
:ro suffix, other
settings possible) over the specified directories. This is similar to
InaccessiblePaths which also prevents access to the directory contents but
TemporaryFileSystem permits nesting to give access to sub-directories. In
the example this is used with
BindReadOnlyPaths to permit logging to syslog.
To give write access to sub-directories use
BindPaths in combination with
This restrictive use of TemporaryFileSystem is especially important for
privileged processes, which still have access to all root-owned files even
with all the other restrictions from above. As this often includes private
keys, restricting access via TemporaryFileSystem is very useful.
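A minimal sketch of the nesting mentioned above: /run is hidden behind an
empty read-only tmpfs and only the syslog socket is made visible again. The
socket path /run/systemd/journal/dev-log is an assumption; it is where
journald usually provides the socket that /dev/log points to.

```ini
[Service]
# Replace the real contents of /run with an empty, read-only tmpfs.
TemporaryFileSystem=/run:ro
# Re-expose only the journald syslog socket (assumed path).
BindReadOnlyPaths=/run/systemd/journal/dev-log
```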