3

We have produced a few hundred devices based on the compute module, running Raspbian wheezy. We are seeing random filesystem corruption relatively often. We've seen it on at least 10 devices out of about 200.

The symptoms are:

  • A device appears to work fine
  • After powering it down and powering up again it suddenly fails to boot because the boot partition or the Linux partition can't be mounted. There doesn't seem to be a particular pattern to the corruption across multiple devices.
  • Flashing the compute module (or in some cases running fsck) makes it operational again.

These are the details of the partitions (these are the defaults from the official Raspbian wheezy image):

$ mount
/dev/root on / type ext4 (rw,noatime,data=ordered)
devtmpfs on /dev type devtmpfs (rw,relatime,size=185772k,nr_inodes=46443,mode=755)
tmpfs on /run type tmpfs (rw,nosuid,noexec,relatime,size=38012k,mode=755)
tmpfs on /run/lock type tmpfs (rw,nosuid,nodev,noexec,relatime,size=5120k)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /run/shm type tmpfs (rw,nosuid,nodev,noexec,relatime,size=76000k)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
/dev/mmcblk0p1 on /boot type vfat (rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=ascii,shortname=mixed,errors=remount-ro)

We are mounting the Linux partition read/write, and in theory that can lead to filesystem corruption if the power is cut without a clean shutdown. But my understanding is that with ext4 and data=ordered this should be very rare, certainly not as common as what we're experiencing.
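
To see at a glance which SD-backed partitions are currently mounted read/write, a small sketch like the following could help. It assumes the mount table has the /proc/mounts layout (device, mount point, type, options) and that SD-backed devices show up as /dev/root or /dev/mmcblk*, as in the output above. The function name rw_sd_mounts is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the mount points of SD-backed filesystems
# that are mounted read/write. Reads /proc/mounts by default; another
# file in the same format can be passed as the first argument.
rw_sd_mounts() {
    mount_table="${1:-/proc/mounts}"
    awk '
        # Assumption: SD-backed devices appear as /dev/root or /dev/mmcblk*
        $1 ~ /^\/dev\/(root|mmcblk)/ {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "rw") { print $2; break }
        }
    ' "$mount_table"
}
```

Calling rw_sd_mounts with no argument on a device lists the writable SD partitions. Any mount point it prints can be left inconsistent by a write that is interrupted by power loss.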

Has anybody experienced similar issues or has any idea what we should look for to address this?

nhaldimann

3 Answers

3

In a nutshell, the RPi family of devices does not provide a method to keep power alive while finishing a write operation on the SD card.

If during a power disconnection the memory card is in a write operation, there is a high chance that one or more sectors will be unexpectedly damaged.

My personal opinion is that the Raspberry Pi Foundation should take a look at this situation.

fcm
2

I found that using a read-only file system was a good way of dealing with file corruption errors on classic RPis.

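In your case, you could at least try putting the /boot partition in read-only mode. Edit your /boot/cmdline.txt file and add ro at the end of its single line. Note that this kernel parameter makes the root filesystem mount read-only at boot; the /boot partition itself is governed by /etc/fstab.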

Then edit the line defining /boot in your /etc/fstab so that its options include ro. Note that fstab uses a different syntax from the output of mount; the line would look something like this:

/dev/mmcblk0p1  /boot  vfat  defaults,ro  0  2

If you need to write data to the partition, you can remount it in read-write mode with this command:

sudo mount -o remount,rw /boot
# write some data, then re-enable protection of the SD card
sudo mount -o remount,ro /boot
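
To make sure the partition always goes back to read-only even when the write step fails, the two remount commands above could be wrapped in a small helper. This is only a sketch: the names with_rw and MOUNT_CMD are made up for illustration, and MOUNT_CMD exists only so the helper can be dry-run without root.

```shell
#!/bin/sh
# Hypothetical helper: run a command with a mount point temporarily
# remounted read-write, then remount it read-only again.
# MOUNT_CMD is an illustrative override hook, not a standard variable.
MOUNT_CMD="${MOUNT_CMD:-sudo mount}"

with_rw() {
    with_rw_target="$1"
    shift
    # Make the partition writable; give up if that fails
    $MOUNT_CMD -o remount,rw "$with_rw_target" || return 1
    # Run the caller's command and remember its exit status
    "$@"
    with_rw_status=$?
    # Flush pending writes before re-enabling protection
    sync
    $MOUNT_CMD -o remount,ro "$with_rw_target" || return 1
    return "$with_rw_status"
}
```

For example, with_rw /boot sudo cp config.txt /boot/config.txt would copy the file and then put /boot back into read-only mode, whether or not the copy succeeded.
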
Technico.top
0

I had the same corruption issue with my SD card. I am using a Raspberry Pi 2. The problem appeared when I was writing frequently to an SQLite database on the file system. When I switched from SQLite to memory for saving my temporary data, the problem disappeared. I still see services such as NTP fail to work in a very few instances, but the system looks more stable now. I am powering my Pi over USB.
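
One way to keep temporary data in memory without changing the application much is a tmpfs mount, so that frequent writes never reach the SD card. This is an assumption about a possible setup, not necessarily what I did. The sketch below only prints a candidate /etc/fstab line; the directory, the default size, and the function name tmpfs_fstab_line are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print an /etc/fstab line that mounts a RAM-backed
# tmpfs at the given directory. Data stored there is lost on power-off.
tmpfs_fstab_line() {
    tmpfs_dir="$1"
    tmpfs_size="${2:-16m}"   # illustrative default size
    printf 'tmpfs %s tmpfs rw,nosuid,nodev,noatime,size=%s 0 0\n' \
        "$tmpfs_dir" "$tmpfs_size"
}
```

For example, tmpfs_fstab_line /var/tmp/myapp 8m prints a line that could be reviewed and then appended to /etc/fstab by hand.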