Sunday Aug 16, 2020 15:37, last edited on Sunday Aug 16, 2020 15:42
For my Linux containers, I use LXD. The recommended way to install LXD is Snap, which updates packages automatically. A few days ago, a new release of LXD came out and the update started automatically. For the following two days, I experienced timeouts on my virtual servers. I also noticed a lot of input/output (I/O) on the hard drives, so I started to investigate.
Using Htop, among other tools like Netdata, I identified that the snapd process, from Snap, was keeping the hard drives at 100% utilization and that all the I/O was on the /var/ partition. Using the command snap watch --last auto-refresh, I saw that the update had started more than two days earlier and was stuck at the step Copy snap "lxd" data.
I aborted the ongoing procedure, but I ended up with a completely broken LXD. Fortunately, on the Linux Containers forum, I found someone with the same, or at least a similar enough, issue, along with this advice:
You can try to automate snap in offline mode. Make sure your host is not connected to the internet. This is definitely recommended for hypervisors and therefore also the LXD hosts. (my opinion for LXD hosts)
Download snap somewhere else with:
snap download lxd
Copy the files to your LXD node and install the snap:
snap ack <package.assert>
snap install <package.snap>
It saved my day! At least, what was left of it. This is at least the second time this issue has happened to me, and it appears to affect other users too. Unfortunately, it is a sign that LXD is not ready for production environments, which require stability. It is also a reminder that I need to find a way to back up LXD.
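For reference, the offline procedure quoted above can be wrapped in a small shell function. This is only a sketch under a few assumptions: the function name install_offline_snap and the SNAP_CMD override are hypothetical names introduced here for illustration, and the files are assumed to keep the names that snap download gives them (<name>_<revision>.snap and <name>_<revision>.assert). If several revisions sit in the same directory, it picks the highest one using sort -V.

```shell
#!/bin/sh
# Sketch: install a snap from files fetched elsewhere with `snap download`.
# install_offline_snap and SNAP_CMD are hypothetical names, not part of snapd.
# Setting SNAP_CMD="echo snap" gives a dry run that only prints the commands.

install_offline_snap() {
    dir=$1    # directory holding the copied .snap and .assert files
    name=$2   # snap name, e.g. lxd

    # Pick the highest revision among <name>_<revision>.snap files.
    snap_file=$(ls "$dir/${name}"_*.snap 2>/dev/null | sort -V | tail -n 1)
    if [ -z "$snap_file" ]; then
        echo "no ${name}_*.snap file found in $dir" >&2
        return 1
    fi

    # The assertion file shares the base name of the snap file.
    assert_file="${snap_file%.snap}.assert"
    if [ ! -f "$assert_file" ]; then
        echo "missing assertion file: $assert_file" >&2
        return 1
    fi

    # Word splitting of SNAP_CMD is intentional (it may contain "echo snap").
    ${SNAP_CMD:-snap} ack "$assert_file" || return 1
    ${SNAP_CMD:-snap} install "$snap_file"
}
```

On the LXD node, after copying the files into /root/snaps for example, a dry run would be SNAP_CMD="echo snap" install_offline_snap /root/snaps lxd, and the real installation is install_offline_snap /root/snaps lxd, run as root.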