
I have a daemon I've created which fires up a bunch of subcomponent threads. If I use the "systemctl stop" command I catch the SIGTERM and usually have everything closed down cleanly in 2-6 seconds (my threads mostly need to be closed sequentially).

However, if the daemon is shut down as a result of a "shutdown" or "reboot" command, my log shows that it appears to be getting killed (SIGKILL?) within a second or two of the SIGTERM. I tried adding "TimeoutStopSec=20" to my ".service" file as well as uncommenting "DefaultTimeoutStopSec=90s" in the "system.conf" file, with no change in behavior (which makes sense, since 90s seems to be the default anyway).

Here is the current ".service" file -

[Unit]
Description=XXXXXX XXXXXXX
After=network-online.target
Requires=network-online.target
After=bluetooth.service
Requires=bluetooth.service

[Service]
Type=forking
KillSignal=SIGTERM
KillMode=process
PIDFile=/var/run/XXXXXXXXXX.pid
ExecStart=/usr/local/bin/XXXXXXXXXX

The app itself does the normal daemon stuff, i.e. fork, set up sigterm handling, close inherited file descriptors, and write a pid file.

I'm running Stretch 9.9 Lite from a month or so ago. So is there some other setting that controls the shutdown/reboot process? I'm just not finding anything else that clearly states how the shutdown process would be different from the systemctl stop process (if it is).

Update - I've tried adding "SendSIGKILL=no" to the "Service" section of my .service file as well as "RequiresMountsFor=/var/log" to the "Unit" section. No difference, unfortunately. The log my daemon creates gets halfway through its normal shutdown process and stops writing.

Update - I changed the "After=" and "Requires=" to point to multi-user.target and it now stops cleanly. I assume that implies that something my application needed was being closed/shut down before my daemon had fully stopped. Not sure if that's an okay thing to do (I can't find many examples of people doing it via Google), but my daemon runs in the background with no user intervention, so I assume it's okay? Obviously I'd prefer to know exactly what the issue was, but I can't really think of anything other than network/sockets, bluetooth, and the mount I mention above...

Ingo
Chrisby

2 Answers


Your unit file does not have an [Install] section, which makes your service a static unit. A static unit cannot be enabled with systemctl enable, so it does not start on its own at boot; it runs only when it is started manually or pulled in as a dependency of another unit. Its start/stop ordering is therefore tied directly to network-online.target and bluetooth.service, which is evidently not the right order for your daemon.

You changed the After= and Requires= to point to multi-user.target and it now stops cleanly. This is only a workaround: it makes the static unit start later and, because shutdown reverses the start-up order, stop earlier. It is not the correct way to manage an independent service like your application, and it may cause other problems. You should add an [Install] section so the service can be enabled and managed in the normal way with systemctl. It could look like this:

[Unit]
Description=XXXXXX XXXXXXX
After=network-online.target
Requires=network-online.target
After=bluetooth.service
Requires=bluetooth.service

[Service]
Type=forking
KillSignal=SIGTERM
KillMode=process
PIDFile=/var/run/XXXXXXXXXX.pid
ExecStart=/usr/local/bin/XXXXXXXXXX

[Install]
WantedBy=multi-user.target
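
To make the ordering argument above more concrete, here is a commented sketch of what the [Unit] directives do, as I understand them from systemd.unit(5). The directives are the ones from your file; only the comments are added, and they describe general systemd behavior, not a confirmed diagnosis of your shutdown problem.

```ini
[Unit]
# After= only orders units: this service is started after the listed units.
# On shutdown the order is reversed, so this service is stopped BEFORE
# network-online.target and bluetooth.service are stopped.
After=network-online.target
After=bluetooth.service

# Requires= is a dependency, not an ordering: the listed units are started
# along with this one, and if bluetooth.service is explicitly stopped or
# restarted, this service is stopped or restarted as well.
Requires=network-online.target
Requires=bluetooth.service
```

The same reversal applies to After=multi-user.target: the service starts late in boot and is among the first to be stopped at shutdown.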
Ingo

I see in my log that it appears that it's getting killed (SIGKILL?) within a second or two of the SIGTERM

Yes.

From man systemd.kill:

Processes will first be terminated via SIGTERM (unless the signal to send is changed via KillSignal=). Optionally, this is immediately followed by a SIGHUP (if enabled with SendSIGHUP=). If then, after a delay (configured via the TimeoutStopSec= option), processes still remain, the termination request is repeated with the SIGKILL signal (unless this is disabled via the SendSIGKILL= option).

Which makes it a bit mysterious why setting the timeout doesn't work, but you could still try the "unless" clause:

SendSIGKILL=

Specifies whether to send SIGKILL to remaining processes after a timeout, if the normal shutdown procedure left processes of the service around. Takes a boolean value. Defaults to "yes".
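
A minimal sketch of how the two options would sit in your [Service] section, assuming the rest of the section stays as in your question and using the 20-second timeout you mentioned. This is not a guaranteed fix; it only shows where the settings go.

```ini
[Service]
Type=forking
KillSignal=SIGTERM
KillMode=process
# Wait up to 20 seconds after SIGTERM before escalating
# (value taken from the question; adjust to your shutdown time).
TimeoutStopSec=20
# Do not follow up with SIGKILL once the timeout expires.
SendSIGKILL=no
```

After editing the unit file, run systemctl daemon-reload so that systemd picks up the change.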

goldilocks