Best practices for "hardening" an RPi for embedded applications (unattended operation)?

Question

This is a very general question, but I'm sure others will want to know the answer...

I'm using headless RPis for a remote monitoring application. These RPis will be buried in utility closets and other generally inaccessible places. I have already written Python code that POSTs readings to our server, and I'm trapping (almost) all of the errors that would cause my app to crash.

What tools and techniques would you recommend to maximize the app's uptime "no matter what" (e.g. restart the app if it gets hung, reboot the machine if that doesn't work), and diagnose problems when the system crashes?

Note: I have a vague sense that init.d lets me register the application as a service that can be started and stopped, and update-rc.d will launch the application at startup. And I can use syslog and syslog-ng to log errors remotely. But what's the best way to create an app-level watchdog and a system level watchdog?

Anything else I should be thinking about?

score 6 · Answer 1 · answered Sep 06 '14 at 21:51

For your system-level watchdog, I was going to suggest building a hardware watchdog. There are connections on the RPi board (the P6 header) which you can short to reset the device. You could connect this to a countdown timer chip, and have your app regularly use the GPIO pins to reset the timer. If your app stopped working for too long, the countdown would finish and reset the system.

While that might be a fun electronics project, it looks like there is an easier way: the RPi's system-on-a-chip already has a built-in hardware watchdog timer, with Linux drivers available to let you use it. There's a kernel module that makes the system reboot after one minute without hearing a heartbeat, and a watchdog daemon that sends the heartbeat every 10 seconds. The daemon can run various checks to make sure that the operating system and your app are still alive. If the checks fail, it stops sending the heartbeat, the countdown expires, and the Pi reboots.
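To make the heartbeat idea concrete, here is a minimal Python sketch of what a watchdog client does. It assumes the watchdog kernel module is loaded and exposes /dev/watchdog, and that the script runs as root. The names run_heartbeat and is_healthy are hypothetical, and the 10 second interval is only taken from the description above. The device can be opened by one process at a time, so this would be used instead of the watchdog daemon, not alongside it.

```python
import os
import time

# Assumed device path; it only exists once the watchdog kernel module is loaded.
WATCHDOG_DEVICE = "/dev/watchdog"


def run_heartbeat(is_healthy, device=WATCHDOG_DEVICE, interval=10.0,
                  max_beats=None, sleep=time.sleep):
    """Send heartbeats while is_healthy() returns True.

    Returns the number of heartbeats written. is_healthy is a
    hypothetical callable supplied by your app (for example, "did the
    last POST succeed recently?").
    """
    beats = 0
    # Opening the device starts the hardware countdown.
    fd = os.open(device, os.O_WRONLY)
    try:
        while max_beats is None or beats < max_beats:
            if not is_healthy():
                # Stop sending heartbeats: the countdown keeps running
                # and the hardware resets the board when it expires.
                break
            os.write(fd, b"\0")  # any byte resets the countdown
            beats += 1
            sleep(interval)
        else:
            # Deliberate shutdown: the "magic close" character asks the
            # driver to disarm the timer (unless nowayout is set).
            os.write(fd, b"V")
    finally:
        os.close(fd)
    return beats
```

The max_beats and sleep parameters exist only so the loop can be exercised against an ordinary file; in real use you would call run_heartbeat(my_check) and let it run for the life of the app.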

Instructions are on Ricardo's Workbench and Gadgetoid.com. There's also the watchdog(8) man page.

score 3 · Accepted Answer · answered Sep 06 '14 at 13:23

The monit tool is good for monitoring and restarting services.

If you are using a Debian-based distribution then it should be in the package repository that your Raspberry Pi is using. See the main Debian monit package pages for links to some useful information about monit, but you will need to install it from your Raspberry Pi's repository.

If you have created init.d scripts for starting, stopping and checking the status of your application, then it is easy to configure monit to manage and restart that service. You just need to drop an appropriate configuration fragment for your app into a file in the /etc/monit/monitrc.d/ directory.

Monit ships with some samples that show how to configure the service. For example the ssh daemon can be monitored with something along the following lines.

check process sshd with pidfile /var/run/sshd.pid
   start program  "/etc/init.d/ssh start"
   stop program  "/etc/init.d/ssh stop"
   if failed port 22 protocol ssh then restart
   if 5 restarts within 5 cycles then timeout
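A "check process ... with pidfile" rule like the one above relies on the monitored program writing its own PID to a file. A Python app started from init.d could do that along these lines. This is only a sketch: write_pidfile is a hypothetical helper and the path is whatever your init.d script and monit fragment agree on.

```python
import atexit
import os


def write_pidfile(path):
    """Record this process's PID in path and remove the file at exit."""
    # Write to a temporary name first, then rename, so a monitor never
    # reads a half-written file.
    tmp_path = f"{path}.{os.getpid()}.tmp"
    with open(tmp_path, "w") as handle:
        handle.write(f"{os.getpid()}\n")
    os.replace(tmp_path, path)  # atomic on POSIX filesystems

    def _cleanup():
        try:
            os.remove(path)
        except FileNotFoundError:
            pass  # already removed, nothing to do

    # Runs on normal interpreter exit; it does not run if the process is
    # killed outright, which leaves a stale pidfile behind.
    atexit.register(_cleanup)
    return path
```

Calling write_pidfile("/var/run/myapp.pid") early in startup (myapp is a placeholder name) gives monit a pidfile to watch, in the same way that sshd does in the sample.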

The Debian wiki LSBInitScripts page gives information on how to structure/write init.d support for an application.

Note that the monit website prominently mentions a commercial version of monit, but the open source version in Debian is definitely freely available.
