
I have a Raspberry Pi in a remote location, powered by a battery that is charged by a solar panel. A Sleepy Pi starts it every hour to run for a few minutes, snap some pictures, take some measurements and upload them.

The problem is that fairly frequently (after about 2-7 days of use) the SD card gets damaged and needs to be replaced. At first I thought the issue was data being written to the SD card when the power went out, so I mounted all partitions read-only and all writing now goes to RAM drives only, but the SD card corruption keeps happening.

The question is: how can an entirely read-only SD card keep getting corrupted?

I'm actually swapping between two cards and it happens with both, so it is probably not a card issue. The cards are of the same type but were bought at different times, so they likely come from different production batches (G.Skill 32GB Class 10 MicroSDHC Flash Card with SD Adapter (FF-TSDG32GA-C10), http://www.amazon.com/gp/product/B007MO0YAI/ref=oh_aui_detailpage_o03_s00?ie=UTF8&psc=1).

Below is my fstab file:

proc            /proc     proc    defaults   0   0
/dev/mmcblk0p5  /boot     vfat    ro         0   0
/dev/mmcblk0p6  /         ext4    ro         0   0
/dev/mmcblk0p7  /home     ext4    ro         0   0
none            /var/run  ramfs   size=5M    0   0
none            /var/log  ramfs   size=50M   0   0

EDIT: To clarify some points pointed out by goldilocks:

  1. There are two SD cards (same type but purchased at different times, so a common production issue is unlikely).

  2. The SD cards get rewritten with dd from the same image after every corruption, and when the next corruption happens they just get swapped, so it is always the same two cards being rotated.

  3. I don't know why the Raspberry Pi doesn't boot, as this is a headless system and only the maintenance crew has occasional access to it. I have asked them to take an image (dd) of a damaged card before they reload it from the backup image, and to upload that image to me. I will look at it when I receive it; maybe it will help me identify at what point the boot fails.

  4. No, I'm not running fsck on the cards; they get reloaded completely from the backup image using dd.

  5. Both cards were bought new for this purpose, so they are unlikely to be worn out.

  6. While I can't say for sure that this wasn't corruption due to low voltage, the last time it happened the battery was at 98% and the sun was up (so the solar panel was supplying power too), so a low-voltage scenario is unlikely, at least on that occasion.
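
Once the image of a damaged card arrives (point 3), one way to see where it differs from the backup image is to compare the two byte by byte and map the differences to 512-byte sectors. The following is only a sketch: the function name changed_sectors and the file names are made up for illustration, and it assumes both images were taken from the whole card with dd.

```shell
# Print the number of every 512-byte sector that differs between two images.
# $1: known-good backup image, $2: image of the damaged card (both hypothetical).
changed_sectors() {
    # cmp -l prints one line per differing byte: 1-based offset, then the two values.
    # cmp exits non-zero when the files differ, which is expected here.
    { cmp -l "$1" "$2" || true; } |
        awk '{ s = int(($1 - 1) / 512); if (!(s in seen)) { seen[s] = 1; print s } }'
}
```

For example, `changed_sectors backup.img damaged.img | head` lists the first few differing sectors. Comparing those numbers against the partition table (e.g. from `fdisk -l backup.img`) would show which partition was hit.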

Sparhawk
Zoltan Fedor

3 Answers


The contacts on the SD connector will bend an SD card, causing it to fail. This is especially true if you exchange cards often, and some cards are less stiff than others and bend more easily. We are doing development that requires swapping cards often, and this caused us a lot of trouble. The amount of bending is barely visible but can cause some pins to lose contact. We assumed that our application was causing corruption, which turned out not to be true. The B+ boards use microSD and do not have this problem.

The SD cards tend to straighten after they are removed and allowed to sit in a warm room. You can test the card by pressing it down during bootup. If it boots when you are pressing it down, the card is bending.

The only reliable workaround that we have found is to use a low-profile microSD card adapter like this: http://www.adafruit.com/product/966. We suspect that many cases of 'corruption' are actually due to this problem.

Patrick Cook
alan baker

You could try adding this to the end of /etc/rc.local:

/bin/echo "-y" > /forcefsck

This will run fsck -y (see man fsck) on the root filesystem early in the boot process. It will add 10-15 seconds to the boot time. Obviously, you won't be able to do it this way on a read-only filesystem. You could try just permanently putting the file there, but I suspect this won't work because the check happens with the fs unmounted, and the file is then removed (which is why it must be rewritten later during boot via rc.local).

Of course, that's no help if the data on the card is so corrupt it cannot boot at all. I wonder what could be wrong?

  1. Both SD cards are defunct

    This would be a crazy coincidence, but not completely impossible, of course. Presuming you bought them new for this purpose, based on what you are saying about the purpose, they can't be worn out at this point, regardless of whether you used them ro or rw. Unless we consider the elephant in the room, possibility #2...

  2. Corruption due to low voltage

    Setting the card RO will prevent the chance of minor corruption due to sudden power loss -- this happens because the filesystem is left in an inconsistent state by the OS. You can also prevent it by running sync intermittently, or by using the sync mount option. In any case, this kind of corruption is:

    1. Unlikely to happen at all in the first place, unless the system is extremely busy constantly -- think, enterprise internet server. That's not the case here.

    2. Incredibly unlikely to result in a problem that leaves the system unbootable; I've never actually seen nor heard of such a case (although there are plenty of people who seem to think this happens to them, a meme which is very pernicious online WRT the pi and SD cards). Beyond that, using the /forcefsck mentioned earlier will deal with this possibility.

    Whatever's gone wrong here is not caused simply by the power suddenly dying. What it might well be caused by, though, is the slow drop in voltage that occurs when the power runs out. This presumably could cause problems on a hardware level, so setting the card RO won't make any difference.

    However, I could not find anything conclusive online regarding this possibility; some people claim that SD cards are not prone to this issue because they are built for battery powered devices.

I think you need to implement something that shuts the pi down when the voltage starts to drop. The new pi + versions have a brown-out detector that may help; while it won't cleanly shut down the OS, it will cut the power quickly rather than letting it slowly fade. As already described, sudden power loss is very very unlikely to cause any significant damage that can't be corrected with fsck. Note, however, that it's probably not a good practice in the long run since you may still occasionally lose some data (fsck does the best it can and will leave the filesystem consistent and usable) [1]. You need to attach a voltmeter with a chip that can message the OS via GPIO; there are various kinds of things like this for the pi available online.


1. "Consistent and usable" here means it can be mounted without error. Since this is the root filesystem, however, there's always the possibility that "data loss" includes something crucial. Again, though, events like this will be few and far between (guesstimate < 0.1% probability).
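
As a rough sketch of the GPIO approach: assuming a hypothetical voltage monitor that pulls GPIO 17 low when the battery sags (the pin number, the active-low polarity and the function names are all assumptions, not properties of any particular product), a small script could poll the pin through the sysfs GPIO interface and halt the system. The sysfs interface is deprecated on recent kernels, so treat this as an illustration only.

```shell
#!/bin/sh
# Assumed wiring: the monitor drives GPIO 17 low on low voltage. The pin must
# already be exported and configured as an input, e.g.
#   echo 17 > /sys/class/gpio/export
#   echo in > /sys/class/gpio/gpio17/direction
GPIO_VALUE_FILE="${GPIO_VALUE_FILE:-/sys/class/gpio/gpio17/value}"

# Succeeds (exit status 0) when the given value file reads "0",
# i.e. when the hypothetical monitor signals low voltage.
low_voltage() {
    [ "$(cat "$1" 2>/dev/null)" = "0" ]
}

# Poll every 5 seconds and halt cleanly once low voltage is signalled.
# Needs root privileges for shutdown.
watch_voltage() {
    while ! low_voltage "$GPIO_VALUE_FILE"; do
        sleep 5
    done
    shutdown -h now
}
```

Calling `watch_voltage &` from /etc/rc.local would start the watcher at boot. The polling interval is a trade-off between reaction time and wake-ups, and 5 seconds is an arbitrary choice.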

goldilocks

Mounting a filesystem read-only only prevents writes as long as the system is stable. You're telling the kernel not to write to a particular device, but in the case of a kernel crash or a brown-out (a drop in supply voltage) anything can happen: the code to write to the SD card is still there, and if it gets executed the contents of your card will likely be damaged.

If you want to make sure your SD card is read-only, you should write-protect it, e.g. using sdtool:

sudo sdtool /dev/mmcblk0 lock

Of course, you still need to keep the read-only settings in /etc/fstab, otherwise Linux will keep trying to write to the SD card, fail to do so and report all sorts of filesystem errors. Current Linux drivers seem to only understand the mechanical lock switch present on full-size SD cards, and fail to understand the locked status when no switch is present.

sdtool for the Raspberry Pi can be downloaded here.
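
Whether the read-only settings from /etc/fstab actually took effect can be checked at runtime by looking at /proc/mounts. A minimal sketch, where the function name rw_sd_mounts is made up for illustration and /dev/root is included on the assumption that the root partition may be listed under that name:

```shell
# Print every SD-card partition that is currently mounted read-write.
# $1: mounts table to inspect (defaults to /proc/mounts).
rw_sd_mounts() {
    # Field 1 is the device, field 2 the mount point, field 4 the mount options.
    awk '$1 ~ /^\/dev\/(mmcblk|root)/ && $4 ~ /(^|,)rw(,|$)/ { print $1, $2 }' \
        "${1:-/proc/mounts}"
}
```

If `rw_sd_mounts` prints nothing, no partition of the card is mounted read-write at that moment. Any line it does print names a device and mount point that the kernel still considers writable.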

Dmitry Grigoryev