Do you guys expose the docker socket to any of your containers or is that a strict no-no? What are your thoughts behind it if you don’t? How do you justify this decision from a security standpoint if you do?
I am still fairly new to Docker, but I like the idea of something like Watchtower. Even though I am not a fan of auto-updates and probably wouldn't use that feature, I still find it useful to get a notification when a container needs an update. However, Watchtower needs access to the Docker socket to do its work. I have read a lot about why that is a bad idea: it can give a process inside the container root access to the host filesystem.
There are probably other containers as well, especially in the whole monitoring and maintenance category, that need this privilege, so I wanted to ask how other people handle the situation.
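For reference, here is a minimal sketch of the notification-only setup described above. It assumes the `containrrr/watchtower` image and its `WATCHTOWER_MONITOR_ONLY` option, so check the current Watchtower docs before relying on it. Monitor-only mode changes what Watchtower *chooses* to do, not what it is *able* to do. The socket mount still grants full control of the Docker daemon.

```yaml
services:
  watchtower:
    image: containrrr/watchtower
    environment:
      # Only check for and report new images; do not restart containers.
      - WATCHTOWER_MONITOR_ONLY=true
      # Hypothetical placeholder: set a notification URL here to get alerts.
      # - WATCHTOWER_NOTIFICATION_URL=<your notification URL>
    volumes:
      # This mount is what gives the container root-equivalent control of the host.
      - /var/run/docker.sock:/var/run/docker.sock
    restart: unless-stopped
```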
Cheers!
That sounds interesting, but I think I'll try an approach where I don't have to expose the socket at all and see how far I can get with that. If I ever have to expose it, this will definitely be something to come back to. Thanks for the suggestion!
If you mean updating the images themselves, I just use Kubernetes and rolling updates. Works like a charm.
As for monitoring, Kubernetes also handles that well. Liveness probes are pretty much standard, and then Prometheus for more in-depth monitoring.
If you don’t mind the extra overhead it would probably address these issues for you.
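Here is a minimal sketch of the rolling update and liveness probe setup. It assumes a hypothetical image `example/app` that serves a health check at `/healthz` on port 8080. If you change the image tag and re-apply the manifest (e.g. `kubectl apply -f deployment.yaml`), Kubernetes replaces the pods one at a time. The liveness probe restarts any container that stops answering.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app                 # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: example-app
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0             # never drop below the desired replica count
      maxSurge: 1                   # bring up one new pod at a time
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
        - name: example-app
          image: example/app:1.2.3  # hypothetical image; bumping the tag triggers a rollout
          ports:
            - containerPort: 8080
          livenessProbe:
            httpGet:
              path: /healthz        # hypothetical health endpoint
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 15
```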
I have heard the name Kubernetes and know that it is also some kind of container thing, but I never went much deeper than that. It was more a general question about how people handle exposing the Docker socket to a container. I came across it with Watchtower and considered installing that, so I used it as an example. I always thought that Kubernetes, Docker Swarm, and things like that are something for the future, when I have more experience with Docker and containers in general. Thank you for the idea, though.
Sorry, this doesn't really answer your question, but I've had issues when I used to auto-update containers, so I stopped doing that. Some things have breaking changes. Others just had bugs in that release that stopped me from accessing stuff when I wasn't at home. I update every so often, when I have ten minutes to do updates, check the release notes, and deal with any issues that arise or roll back to the previous version. I spin up What's Up Docker to see what's changed. When I'm finished, I stop the container so it doesn't keep polling Docker Hub and using up my free pull allowance.
In short, it could be an option to spin it up, let it run, and then stop the container, so there's less risk it could be used for an attack.
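Here is a sketch of the "spin it up, check, stop it" approach. It assumes What's Up Docker is published as `fmartinou/whats-up-docker` and serves its web UI on port 3000. Verify both against the project's docs, since newer releases may use a different image name. `restart: "no"` keeps the container from coming back on its own after a reboot, so it only runs when you start it by hand. Start it with `docker compose up -d wud` and run `docker compose stop wud` when you're done.

```yaml
services:
  wud:
    image: fmartinou/whats-up-docker   # assumed image name; check the WUD docs
    container_name: wud
    # No automatic restarts: it only runs while you are checking for updates.
    restart: "no"
    ports:
      - "3000:3000"                    # assumed web UI port
    volumes:
      # Still root-equivalent while it runs, even if only for ten minutes.
      - /var/run/docker.sock:/var/run/docker.sock
```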
Mounting the docker socket into Watchtower is fine from a security perspective, but automatic updates can definitely cause problems. I used to use Renovate, and it would open a pull request to update the version.
There are lots of articles out there that say the opposite. Not about Watchtower per se, but giving a container access to the socket is generally considered to be a bad idea from a security point of view.
Giving a container access to the docker socket allows container escapes, but if you’re doing it on purpose with a service designed for that purpose there is no problem. Either you trust Watchtower to manage the other containers on your system or you don’t. Whether it’s managing the containers through a mounted docker socket or with direct socket access doesn’t make a difference in security.
I don’t know if anybody seriously uses Watchtower, but I wouldn’t be surprised. I know that companies use tools like Argo CD, which has a larger attack surface and a similar level of system access via its Kubernetes service user.
Per this guide https://cheatsheetseries.owasp.org/cheatsheets/Docker_Security_Cheat_Sheet.html, I do not. I have a cron/service script that updates containers automatically (`docker compose pull`, I think). It covers things I don't mind failing for a bit (pdf converter, RSS reader, etc.) and things that are exposed to the internet directly (Authentik, Caddy).
Note that smart peeps say that the docker socket is not safe as read-only. Sadly, Watchtower is inherently untenable, and so is Traefik. Trusting a docker-socket-proxy container with giga root permissions only made sense to me if I could audit the whole thing and keep auditing it with every update, and I cannot. https://stackoverflow.com/a/52333163 https://blog.quarkslab.com/why-is-exposing-the-docker-socket-a-really-bad-idea.html
I then just have scripts to do the ‘docker compose pull’ for things with oodles of breaking changes (Immich) or things I’d care if they did break suddenly (paperless).
Overall, I've only had a few break over a few years, and that's because I also run all services (per the link above) as a non-root user, read-only, and with no capabilities except required ones (as far as I know, none need any). Some containers are well coded, but many are not. If an update suddenly wants to write to '/npm/staging', the read-only setting torches that until I can figure it out and put in a tmpfs fix. The few failures are worth the peace of mind that it's locked the fuck down.
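Here is a sketch of the per-service hardening described above, using the same Compose options as the Jellyfin file later in this thread. The service name, image, and UID/GID are hypothetical. The tmpfs path is just the example path from the comment above.

```yaml
services:
  example-app:                      # hypothetical service
    image: example/app:1.2.3        # hypothetical image
    user: "1500:1500"               # run as an unprivileged UID:GID instead of root
    read_only: true                 # root filesystem is mounted read-only
    cap_drop:
      - ALL                         # drop every Linux capability
    security_opt:
      - no-new-privileges:true      # stop setuid binaries from regaining privileges
    tmpfs:
      # The "tmpfs fix": give a path the app insists on writing to a writable,
      # in-memory mount owned by the service's UID/GID.
      - /npm/staging:uid=1500,gid=1500
```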
I hope to move to podman sometime to eliminate the last security risk - the docker daemon running the containers, which runs as root. Rootless docker seems to be a significant hassle to do at any scale, so I haven’t bothered with that.
Edit: this effort is to prevent the attack vector of “someone hacks or buys access to a well-used project known to have docker socket access (e.g., Watchtower, last updated 2 years ago, or a commonly used docker socket proxy) and then pushes a malicious update that uses root access escalated from the docker socket to encrypt and ransom your server”. As long as no container has root (and the container doesn't breach the docker daemon…), the fallout from a good container turned bad is limited to that newly bad container.
All true, wanted to add on to this:
> Note that smart peeps say that the docker socket is not safe as read-only.
That's true, and it's not just something mildly imperfect: read-only straight up does nothing. When connecting to a socket, Linux ignores the read-only mount state and only checks write permission on the socket itself. Read-only would only make it impossible to create a new socket there. Once you have a connection, you can write anything you want to it. Traefik and other “read-only” uses still have to send GET requests for the data they need, so even legitimate use cases write to the socket.
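To make this concrete, consider the common Compose line below. The `:ro` only stops the container from replacing the socket file at that path. Once connected, the container can send any Docker API request, including one that starts a privileged container. The Traefik version tag is only an example.

```yaml
services:
  traefik:
    image: traefik:v3.0                # version tag is only an example
    volumes:
      # Looks restrictive, but :ro does not limit which API calls can be made
      # over this socket; anything that can connect has full daemon access.
      - /var/run/docker.sock:/var/run/docker.sock:ro
```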
If you really need a “GET-only” Docker socket, it has to be done with some other kind of mechanism, and frankly the options aren’t very good. Docker has authorization plugins that seem like too much of a headache to set up, and proxies don’t seem very good to me either.
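If you accept a proxy despite those reservations, a commonly used one is `tecnativa/docker-socket-proxy`. The sketch below assumes its environment-variable switches (`CONTAINERS`, `POST`) and its default port 2375, as described in its README. It lets Traefik list containers over an internal network and rejects write requests. Keep the caveat in mind: the proxy itself holds the real socket, so you are trusting the proxy instead.

```yaml
services:
  socket-proxy:
    image: tecnativa/docker-socket-proxy
    environment:
      - CONTAINERS=1                   # allow the /containers endpoints
      - POST=0                         # reject anything other than GET/HEAD
    volumes:
      # :ro does not restrict API calls here either; the proxy does the filtering.
      - /var/run/docker.sock:/var/run/docker.sock:ro
    networks:
      - socket
  traefik:
    image: traefik:v3.0                # version tag is only an example
    command:
      - --providers.docker.endpoint=tcp://socket-proxy:2375
    networks:
      - socket
      - default                        # the network Traefik serves traffic on
networks:
  socket:
    internal: true                     # the proxy is only reachable from this network
```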
Or TLDR: `:ro` or stripping off permission bits doesn't do anything aside from potentially breaking all uses of the socket. If it can connect at all, it's root-equivalent (or has all the privileges of your rootless user) unless you took other steps. That might or might not be a massive problem for your setup, but it is something you should know when doing it.

Thank you for your comment and the resources you provided. I'll definitely look into these. I like your approach of minimizing the attack surface. As I said, I am still new to all of this, and I came across the `user` option of docker compose just recently when I installed Jellyfin. However, I thought the container image itself has to be configured to make this possible. Otherwise, you can run into permission errors and such. Do you just specify a non-root user and see if it still works?
And while we're at it, how would you set up something like Jellyfin with regard to read-write permissions? I currently haven't restricted it to read-only, and my current setup certainly needs write permissions, because I store the artwork in the respective directories inside my media folder. Would you just save these files to the non-persisted storage inside the container, since you can re-download them anyway, and keep the media volume read-only?
So I've found that if you use the `user:` option with a user name (`user: UserName`), the container also needs to have that user defined inside it. If you use a UID/GID (`user: 1500:1500`), the container's process runs as that UID/GID instead of the image's default user (likely root, UID 0). The IDs don't need to exist inside the container. For many containers it just works. For linuxserver containers (a group that produces containers for stuff), I think it biffs it; those are way jacked up. I put the containers that won't play ball in an LXC container (via the Incus GUI). For simple permission fixes, I just make a permissions-fixing version of the container (runs as root, but only executes commands I provide). It fills a volume with the data with the right permissions, and then I load that volume into the container. Luckily Jellyfin doesn't need that.

I give Jellyfin read-only access (via `:ro` in the `volumes:` section) to my media because it doesn't need to write to it. I think it's fine if your use case needs `:rw`; just keep a backup (even if you use `:ro`!).

Here's my docker-compose.yml. I gave Jellyfin its own IP with macvlan. It's pretty janky and I'm still working on it. You can have Jellyfin use your server's IP instead: delete everything after `jellyfin-nw:` (but keep `jellyfin-nw:`!) in both the `networks:` section and the `services:` section. Delete the `mac_address:` line in the `services:` section too. In the `ports:` part, `10.0.1.69` would be the IP of your server (or in this case, what I declare the Jellyfin container's IP to be). This makes Docker publish the ports only on the IP you provide; otherwise they are published on every address the server has (as far as I understand).

And of course, I have GPU acceleration working here with an embedded Intel iGPU. Hope this helps!
```yaml
# --- NETWORKS ---
networks:
  jellyfin-nw:
    # In docker, `macvlan` gets similar stuff to
    driver: macvlan
    driver_opts:
      parent: 'br0'
      # mode: 'l2'
    name: 'doc0'
    ipam:
      config:
        - subnet: "10.0.1.0/24"
          gateway: "10.0.1.1"

# --- SERVICES ---
services:
  jellyfin:
    container_name: jellyfin
    image: ghcr.io/jellyfin/jellyfin:latest
    environment:
      - TZ=America/Los_Angeles
      - JELLYFIN_PublishedServerUrl=https://jellyfin.guzzlezone.local/
    ports:
      - '10.0.1.69:8096:8096/tcp'
      - '10.0.1.69:7359:7359/udp'
      - '10.0.1.69:1900:1900/udp'
    devices:
      - '/dev/dri/renderD128:/dev/dri/renderD128'
      # - '/dev/dri/card0:/dev/dri/card0'
    volumes:
      - '/mnt/ssd/jellyfin/config:/config:rw,noexec,nosuid,nodev,Z'
      - '/mnt/cache/jellyfin/log:/config/log:rw,noexec,nosuid,nodev,Z'
      - '/mnt/cache/jellyfin/cache:/cache:rw,noexec,nosuid,nodev,Z'
      - '/mnt/cache/jellyfin/config-cache:/config/cache:rw,noexec,nosuid,nodev,Z'
      # Media links below
      - '/mnt/spinner/movies:/data/movies:ro,noexec,nosuid,nodev,z'
      - '/mnt/spinner/shows:/data/shows:ro,noexec,nosuid,nodev,z'
      - '/mnt/spinner/music:/data/music:ro,noexec,nosuid,nodev,z'
    restart: unless-stopped
    # Security stuff
    read_only: true
    tmpfs:
      - /tmp:uid=2200,gid=2200,rw,noexec,nosuid,nodev
    # mac address is 02:42 then 10.0.1.69 in hex for each number
    # between the .s mapped to the :s in the mac address
    # it's how docker assigns so there will never be a mac address collision
    mac_address: 02:42:0A:00:01:45
    networks:
      jellyfin-nw:
        # Docker is pretty jacked up and can't get an IP via DHCP so manually specify it
        ipv4_address: 10.0.1.69
    user: 2200:2200
    # gpu capability needs render capability, see the # for your server with `getent group render | cut -d: -f3`
    group_add:
      - "109"
    security_opt:
      - no-new-privileges:true
    cap_drop:
      - ALL
```
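As a worked example of the MAC scheme described in the compose comments (`02:42` followed by each octet of the IPv4 address in hex), here is what a hypothetical second service at `10.0.1.70` on the same macvlan network would use. This is only a fragment to merge into a service definition, not a complete service.

```yaml
services:
  other-app:                           # hypothetical second service on the same macvlan
    # 10.0.1.70 -> 10 = 0x0A, 0 = 0x00, 1 = 0x01, 70 = 0x46
    mac_address: 02:42:0A:00:01:46
    networks:
      jellyfin-nw:
        ipv4_address: 10.0.1.70
```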
Lastly, I thought I should add the host-side setup needed to get hardware acceleration working and to create the user:
```shell
# For jellyfin low power (LP) intel QSV stuff
# if trouble see https://jellyfin.org/docs/general/administration/hardware-acceleration/intel/#configure-and-verify-lp-mode-on-linux
sudo apt install -y firmware-linux-nonfree #intel-opencl-icd
sudo mkdir -p /etc/modprobe.d
sudo sh -c "echo 'options i915 enable_guc=2' >> /etc/modprobe.d/i915.conf"
sudo update-initramfs -u
sudo update-grub

APP_NAME="jellyfin"
APP_PID=2200
sudo useradd -u "$APP_PID" "$APP_NAME"
```
The jellyfin user isn't added to the render group on the host. Instead, the group is added to the container via `group_add` in the docker-compose.yml file.
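The "permissions-fixing version of the container" mentioned earlier in this reply can be sketched as a one-shot service. It runs as root, only `chown`s a named volume, and exits before the real service starts. This sketch assumes Docker Compose's `depends_on` with `condition: service_completed_successfully`. The service names, image, and UID/GID are hypothetical, and a plain `chown` is a simplification of "filling a volume with the right permissions".

```yaml
services:
  fix-perms:                           # hypothetical one-shot helper
    image: busybox
    user: "0:0"                        # root, but it only runs the command below
    command: ["chown", "-R", "1500:1500", "/data"]
    volumes:
      - app-data:/data
    restart: "no"
  example-app:                         # hypothetical unprivileged service
    image: example/app:1.2.3
    user: "1500:1500"
    depends_on:
      fix-perms:
        condition: service_completed_successfully
    volumes:
      - app-data:/data
volumes:
  app-data:
```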
I have set all this up on my Asustor NAS, so things like `apt install` don't apply in my case. Nevertheless, thank you very much for your time and expertise regarding users and volumes. What is your strategy for networks in general? Do you set up a separate network for each and every container unless the services have to communicate with each other? I am not sure I understand the network setup of your Jellyfin container.

> In the `ports:` part, `10.0.1.69` would be the IP of your server (or in this case, what I declare the jellyfin container's IP to be) - it makes it so the container can only bind to the IP you provide, otherwise it can bind to anything the server has access to (as far as I understand).

With the macvlan driver, the container's virtual network interface behaves like its own physical network interface, which you can assign a separate IP to, right? What advantage does this have exactly, or what problems does it solve?
I wanted Jellyfin on its own IP so I could think about implementing VLANs. I haven't yet, and I'm not sure what I did is even needed. But I did do it! You very likely don't need to do it.
There are likely guides on enabling Jellyfin hardware acceleration on your Asustor NAS - so just follow them!
I do try to set up separate networks for each service.
On one server I have a monolithic docker compose file with a ton of networks defined to keep services from talking to the internet or each other if it’s not useful (pdf converter is prevented from talking to the internet or the Authentik database, for example). Makes the most sense here, has the most power.
On this server, each service has its own docker compose file. The network bit makes more sense for services that have an external database and other parts. It lets me set things up so only the service can talk to its database, and the database cannot reach the internet at large (by adding `internal: true` to the `networks:` section). In this setup, yes, the pdf converter can talk to other services, and I'd need to block its internet access at the router somehow.
The monolithic method gets more annoying as services pile up, because of the gigantic docker compose file and the up/down time (especially for services that don't acknowledge shutdown commands). But it lets me use fine-grained networking within the docker compose file.
With each service on its own, each one exposes a port and other things talk to it through that. So instead of an internal docker network letting Authentik talk to a service, Authentik just looks up the address of the service. I don't notice any perceptible difference in lag.
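Here is a minimal sketch of the service-plus-database layout with `internal: true`. The app service, its image, and the database password are hypothetical placeholders. The app sits on both networks, while the database only sits on the internal one, so the database has no route to the internet.

```yaml
services:
  app:                                 # hypothetical service with an external database
    image: example/app:1.2.3
    networks:
      - frontend                       # reachable from outside and can reach the internet
      - backend                        # can talk to its database
  db:
    image: postgres:16
    environment:
      - POSTGRES_PASSWORD=change-me    # placeholder; use a real secret
    networks:
      - backend                        # only on the internal network
networks:
  frontend: {}
  backend:
    internal: true                     # no external connectivity from this network
```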
I am a strong believer in separate docker compose files to keep it more organized and hopefully have more control over everything. But in the end most of it comes down to personal preference.
I actually have a network issue with one of my containers at the moment (AdGuard in this case), and your ideas already came in handy there. Unfortunately, I couldn't solve it yet, but that's probably something for a new topic.
I use Watchtower just to notify me of the updates. So the docker socket is read-only.
Interesting. I just skimmed through the documentation again and couldn't find anything about read-only. How did you set it up exactly? Just because it isn't auto-updating, i.e., writing something, doesn't necessarily mean it doesn't have write privileges.