My question is a little complex, so I will describe what I have, what I want to achieve, and what I am already able to do.

What I have

I have a group of RPi 3B+ boards, with one RPi serving as the Master of the cluster and the others serving as Workers. The goal is to create a cluster of RPis, manageable from the Master RPi, to do distributed computations. They are all connected via Ethernet through a switch. I have set up a network with a DHCP server on the Master RPi (with dnsmasq) and I am able to SSH to any Worker RPi without any problem.

I have also set up a TFTP server on the Master RPi, for reasons I will explain in the next section.

What I want to achieve

I want to be able to flash the SD cards of the Workers remotely, without any physical interaction with the Worker RPis, all from the Master. I can manually flash a special system to the SD cards of the Workers once, but after that I'd like to be able to update the OS of the Workers without having to move anything.

The "plan" that I have now is the following:

  • Flash the Workers' SD cards with a special system composed of 2 systems:
    • A piCore (TinyCore) system that, on boot, would launch a script that checks if a new image is available on the Master RPi and, if so, downloads it with TFTP, flashes it and reboots.
    • A second partition that would contain a "normal" system, set up by the piCore system. Ideally, piCore could set up any kind of OS (Raspbian, another piCore system or whatever).

What I am able to do

I am able to flash an SD card with piCore and use fdisk to create a new partition intended for the "normal" system. I can set up a script in piCore that will check if a new image is present on the Master RPi, and download it.

What I don't know how to do

How can I, once I have the image on the piCore partition, flash it to the second partition? I only know the unix command dd, which allows me to flash an entire SD card from an .img file, and I obviously don't want to flash the entire SD card of the Worker but only the partition dedicated to the "normal" system.

How can I configure the bootloader of the SD card to make sure it always boots the piCore partition, and how can I, from the special script in the piCore partition, make sure that when I reboot, the newly flashed "normal" system is booted and not the piCore system again?

I think I have a correct plan, at least "conceptually", to solve the problem, but I lack the technical knowledge of which tools I should use and how to configure them. Unfortunately, I don't have a lot of experience with this kind of low-level work.

I have also heard of "netbooting" the Workers from a filesystem located on the Master, but I'm not sure it would fit the requirements of my problem. Mainly, I don't know if it is possible for multiple Workers to netboot from the same Master (and, more specifically, to use the same remote filesystem at the same time), and if it is possible to set up a Worker such that it reboots into the local filesystem on the SD card after it has checked whether a new image is available on the Master RPi.

Thanks in advance!

Longwelwind

1 Answer

Here is a solution with netbooting using systemd-networkd.

Network booting works only for the wired adapter. Booting over wireless LAN is not supported [1].

It is also important that there is already a working DHCP server on the local network.

We use the RPi 3B+. It comes with "Improved PXE network and USB mass-storage booting" [2], so PXE booting will work out of the box. Please forget all the quirks, hints and workarounds for netbooting older models that you may find on the web. There is no need to prepare the worker for netbooting. It will simply try it if there is no SD Card inserted.

So let's look at what I've tested. I mostly followed the official tutorial [3] for older models but adapted it to the needs of this question and to the RPi 3B+.

For reference, I flashed Raspbian Stretch Lite 2018-06-27, enabled ssh and did a full-upgrade. This setup can be done headless. After the first boot, ssh into the RPi and update Raspbian:

raspberrypi ~$ sudo -Es
raspberrypi ~# apt update
raspberrypi ~# apt full-upgrade


Setup systemd-networkd

For detailed information look at [4]. Here is only the short version. Execute these commands:

# Install helpers
raspberrypi ~# apt --yes install rng-tools systemd-container

raspberrypi ~# systemctl mask networking.service
raspberrypi ~# systemctl mask dhcpcd.service
raspberrypi ~# mv /etc/network/interfaces /etc/network/interfaces~
raspberrypi ~# sed -i '1i resolvconf=NO' /etc/resolvconf.conf

raspberrypi ~# systemctl enable systemd-networkd.service
raspberrypi ~# systemctl enable systemd-resolved.service
raspberrypi ~# ln -sf /run/systemd/resolve/resolv.conf /etc/resolv.conf

We will give our master a static ip address because it works as a server. For example, my master uses:

  • subnet 192.168.10.0/24
  • static ip address 192.168.10.60
  • broadcast address 192.168.10.255
  • gateway/router 192.168.10.1
  • dns server 192.168.10.10

Of course you have to use the ip addresses from your own network, so check what they are. You may find your dns server with cat /etc/resolv.conf. If in doubt, you may use Google's dns server 8.8.8.8. To set the static ip address write this file:

raspberrypi ~# cat > /etc/systemd/network/04-eth.network <<EOF
[Match]
Name=e*
[Network]
Address=192.168.10.60/24
Gateway=192.168.10.1
DNS=192.168.10.10
EOF
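If you are unsure about the broadcast address of your network (it is needed again later for dnsmasq), it can be derived from the address and prefix length. The following is only a minimal sketch in POSIX shell; broadcast_of is a hypothetical helper name I made up for illustration, and it assumes a shell with 64-bit arithmetic (dash and bash both qualify).

```shell
#!/bin/sh
# Hypothetical helper: print the IPv4 broadcast address for ADDRESS/PREFIX,
# e.g. 192.168.10.60/24.
broadcast_of() {
    addr=${1%/*}      # part before the slash
    prefix=${1#*/}    # part after the slash

    # Split the dotted quad into four positional parameters.
    old_ifs=$IFS
    IFS=.
    set -- $addr
    IFS=$old_ifs

    ip=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
    # The host part is the low (32 - prefix) bits; the broadcast address
    # has all of them set to 1.
    hostmask=$(( (1 << (32 - prefix)) - 1 ))
    bc=$(( ip | hostmask ))

    echo "$(( (bc >> 24) & 255 )).$(( (bc >> 16) & 255 )).$(( (bc >> 8) & 255 )).$(( bc & 255 ))"
}
```

Calling broadcast_of 192.168.10.60/24 gives the broadcast address listed above for my example network.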

Rename hostname from raspberrypi to master:

raspberrypi ~# sed -i 's/raspberrypi/master/' /etc/hostname
raspberrypi ~# sed -i 's/raspberrypi/master/g' /etc/hosts

Reboot.


Master configuration

ssh into your master. Remember that it now has a new static ip address.

This setup will also be used for the worker, so we copy it to a directory we will later mount as root partition for the worker.

master ~$ sudo -Es
master ~# mkdir -p /nfs/worker1
master ~# rsync -xa --exclude /nfs / /nfs/worker1

Don't worry if this takes a while. Depending on your SD Card, copying about 1.1 GByte will take 15 minutes or longer. Watch the green LED on your RasPi.

When finished prepare the network and the name of the worker:

master ~# rm /nfs/worker1/etc/systemd/network/04-eth.network
master ~# sed -i 's/master/worker1/' /nfs/worker1/etc/hostname
master ~# sed -i 's/master/worker1/g' /nfs/worker1/etc/hosts

Now we start the worker in a container. This is similar to chroot but more powerful. We regenerate the SSH host keys so ssh will not complain about spoofing (otherwise it would see the same host keys it already knows, but under another ip address):

master ~# systemd-nspawn -D /nfs/worker1 /sbin/init

Log in and execute the following commands. This will create new SSH2 server keys and then try to start ssh.service, which will fail because the ethernet interface is already used by the master. Starting ssh.service (here with an error) is essential because we are headless on the worker. When the worker is running on its own hardware, this should work without error.

worker1 ~$ sudo rm /etc/ssh/ssh_host_*
worker1 ~$ sudo dpkg-reconfigure openssh-server
worker1 ~$ logout

Exit from the container by pressing CTRL+] three times in quick succession.


Setup tftp server

Now we will install a tftp server that is needed to send boot files to the worker. The program dnsmasq will provide this. Also we install the network sniffer tcpdump to look if the worker requests its boot files the right way:

master ~# apt --yes install dnsmasq tcpdump
# Stop dnsmasq breaking DNS resolving:
master ~# rm /etc/resolvconf/update.d/dnsmasq

Now start tcpdump so you can search for DHCP packets from the worker:

master ~# tcpdump -i eth0 port bootpc

Now power on the worker RPi without an SD Card. You should then see packets from it marked "BOOTP/DHCP, Request from ...":

IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from b8:27:eb:d3:85:78

Here we have to note the mac address b8:27:eb:d3:85:78 of the worker RPi. You should also see that it gets a reply with an ip address from the DHCP server on your local network, here 192.168.10.1:

IP 192.168.10.1.bootps > 192.168.10.101.bootpc: BOOTP/DHCP, Reply, length 300

Exit with CTRL+C. Then we have to configure dnsmasq to serve boot files via tftp. Write this file:

master ~# cat > /etc/dnsmasq.conf <<EOF
port=0
dhcp-range=192.168.10.255,proxy
log-dhcp
enable-tftp
tftp-root=/tftpboot
tftp-unique-root=mac
pxe-service=0,"Raspberry Pi Boot"
EOF

The first address of the dhcp-range is the broadcast address of your network. Now create a /tftpboot directory. The subdirectory for the specific worker is named after its mac address (which we noted with tcpdump) and must contain only lower case characters and dashes:

master ~# mkdir -p /tftpboot/b8-27-eb-d3-85-78
master ~# chmod -R 777 /tftpboot
master ~# systemctl enable dnsmasq.service
master ~# systemctl restart dnsmasq.service
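If you have several workers, typing the directory names by hand is error prone. As a rough sketch, the mac address can be pulled out of a tcpdump line and converted to the lower-case, dash-separated form. Both function names are hypothetical helpers of mine, and grep -o is assumed to be available (it is with GNU grep on Raspbian).

```shell
#!/bin/sh
# Hypothetical helper: read tcpdump output on stdin and print the first
# mac address found in it.
mac_from_tcpdump() {
    grep -oiE '([0-9a-f]{2}:){5}[0-9a-f]{2}' | head -n 1
}

# Hypothetical helper: convert a mac address such as B8:27:EB:D3:85:78
# into the directory name form: lower case, dashes instead of colons.
mac_to_tftp_dir() {
    printf '%s\n' "$1" | tr 'A-Z:' 'a-z-'
}
```

For example, mkdir -p "/tftpboot/$(mac_to_tftp_dir b8:27:eb:d3:85:78)" creates the same directory as the command above.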

Monitor dnsmasq:

master ~# journalctl --unit dnsmasq.service --follow

Now power cycle the worker RPi. You should see something like this:

master dnsmasq-tftp[756]: file /tftpboot/b8-27-eb-d3-85-78/bootcode.bin not found

Next, you will need to copy bootcode.bin and start.elf into the /tftpboot/b8-27-eb-d3-85-78 directory. You should be able to do this by copying the files from /boot, since these are the right ones. We need a kernel, so we might as well copy the entire boot directory. First, use Ctrl+C to exit the monitoring state. Then type the following:

master ~# cp -r /boot/* /tftpboot/b8-27-eb-d3-85-78

Restart dnsmasq for good measure:

master ~# systemctl restart dnsmasq

Edit /tftpboot/b8-27-eb-d3-85-78/cmdline.txt and replace everything from root= onwards with:

root=/dev/nfs nfsroot=192.168.10.60:/nfs/worker1,vers=3 rw ip=dhcp rootwait elevator=deadline

You should substitute the IP address here with the static ip address of your master.
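If you prefer not to edit the file by hand, the same change can be scripted. This is only a sketch: set_nfs_root is a hypothetical function name, and it assumes that root= appears at the start of the line or after a space, as it does in the stock Raspbian cmdline.txt.

```shell
#!/bin/sh
# Hypothetical helper: replace everything from root= onwards in a
# cmdline.txt with the NFS root options used in this answer.
set_nfs_root() {
    file=$1        # path to cmdline.txt
    master_ip=$2   # static ip address of the master
    nfs_path=$3    # exported root directory of the worker

    new="root=/dev/nfs nfsroot=${master_ip}:${nfs_path},vers=3 rw ip=dhcp rootwait elevator=deadline"

    # "(^| )root=" only matches a stand-alone root= option, so it does not
    # match inside other options such as rootfstype= or nfsroot=.
    sed -E "s#(^| )root=.*#\1${new}#" "$file" > "$file.new" &&
        mv "$file.new" "$file"
}
```

It would be called as set_nfs_root /tftpboot/b8-27-eb-d3-85-78/cmdline.txt 192.168.10.60 /nfs/worker1.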

Set up NFS root

This should now allow your Raspberry Pi to boot through until it tries to load a root filesystem that is normally located at the second partition of the SD Card (which it doesn't have). All we have to do to get this working is to export the /nfs/worker1 filesystem we created earlier.

master ~# apt install nfs-kernel-server
master ~# echo "/nfs *(rw,sync,no_subtree_check,no_root_squash)" | tee -a /etc/exports
master ~# systemctl enable rpcbind
master ~# systemctl restart rpcbind
master ~# systemctl enable nfs-kernel-server
master ~# systemctl restart nfs-kernel-server
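Note that tee -a appends the export line every time the command is run. If you script this setup and may run it more than once, a small guard avoids duplicate entries. A minimal sketch, with add_export being a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: append LINE to EXPORTS_FILE only if exactly that
# line is not already present.
add_export() {
    exports_file=$1
    line=$2
    # -x matches whole lines, -F treats the pattern as a fixed string,
    # -q suppresses output. A missing file counts as "not present".
    grep -qxF -- "$line" "$exports_file" 2>/dev/null ||
        printf '%s\n' "$line" >> "$exports_file"
}
```

Usage as root would be add_export /etc/exports "/nfs *(rw,sync,no_subtree_check,no_root_squash)", followed by the restart of nfs-kernel-server shown above.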

Finally, edit /nfs/worker1/etc/fstab and remove or comment out the PARTUUID=efe16111-01 and PARTUUID=efe16111-02 lines (only proc should be left).
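The fstab change can also be done non-interactively. The sketch below comments out every line that starts with PARTUUID=, whatever the partition id is on your card; comment_out_partuuid is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# Hypothetical helper: comment out all fstab entries that refer to SD card
# partitions by PARTUUID, leaving other lines (such as proc) untouched.
comment_out_partuuid() {
    fstab=$1
    sed -E 's/^([[:space:]]*PARTUUID=)/#\1/' "$fstab" > "$fstab.new" &&
        mv "$fstab.new" "$fstab"
}
```

For the first worker this would be comment_out_partuuid /nfs/worker1/etc/fstab.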

Now power cycle the worker RPi and it should boot. You can monitor again. You will also see what ip address your worker has:

master ~# exit
master ~$ journalctl --unit dnsmasq.service --follow

Now you should be able to ssh into the worker e.g. with:

master ~$ ssh pi@192.168.10.101


What to do next?

You now have a working base for one worker. It should be no problem to add the next worker, worker2, with e.g. mac address b8:27:eb:0e:3c:6f. Create the directories with mkdir /tftpboot/b8-27-eb-0e-3c-6f and mkdir /nfs/worker2, copy the boot and root data to them and modify /tftpboot/b8-27-eb-0e-3c-6f/cmdline.txt and /nfs/worker2/etc/fstab. Then worker2 should boot.
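One way to do this, which I have not described above and which is only an assumption on my side, is to clone the already prepared worker1 instead of starting from the master again. Then cmdline.txt and fstab are already adapted and only the names have to change. A rough sketch follows; clone_worker, TFTP_ROOT and NFS_ROOT are hypothetical names, and the source worker should be shut down while copying.

```shell
#!/bin/sh
# Defaults are the paths used in this answer; override them if yours differ.
TFTP_ROOT="${TFTP_ROOT:-/tftpboot}"
NFS_ROOT="${NFS_ROOT:-/nfs}"

# Hypothetical helper: clone the boot and root trees of an existing worker.
# Usage: clone_worker SRC_NAME SRC_MAC_DIR NEW_NAME NEW_MAC_DIR
clone_worker() {
    src_name=$1; src_mac=$2; new_name=$3; new_mac=$4

    # Refuse to overwrite an existing worker.
    if [ -e "$TFTP_ROOT/$new_mac" ] || [ -e "$NFS_ROOT/$new_name" ]; then
        echo "target for $new_name already exists" >&2
        return 1
    fi

    # Copy boot files and root file system, preserving owners and modes.
    cp -a "$TFTP_ROOT/$src_mac" "$TFTP_ROOT/$new_mac" || return 1
    cp -a "$NFS_ROOT/$src_name" "$NFS_ROOT/$new_name" || return 1

    # Point nfsroot= in the new cmdline.txt at the new root directory.
    sed "s|/$src_name,|/$new_name,|" "$TFTP_ROOT/$src_mac/cmdline.txt" \
        > "$TFTP_ROOT/$new_mac/cmdline.txt" || return 1

    # Give the clone its own host name.
    sed "s/$src_name/$new_name/" "$NFS_ROOT/$src_name/etc/hostname" \
        > "$NFS_ROOT/$new_name/etc/hostname" || return 1
    sed "s/$src_name/$new_name/g" "$NFS_ROOT/$src_name/etc/hosts" \
        > "$NFS_ROOT/$new_name/etc/hosts"
}
```

It would be called as root with clone_worker worker1 b8-27-eb-d3-85-78 worker2 b8-27-eb-0e-3c-6f. The clone still carries the SSH host keys of worker1, so regenerate them in a container as shown for worker1.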

You can manage your workers from the master by running them in a container as shown above with sudo systemd-nspawn -D /nfs/worker1 /sbin/init, e.g. for maintenance. But this can only be done if the worker is shut down.

Yes, there is much to optimize. But this is out of scope here and can be asked as separate questions.

You need a bit of storage, so you may want to attach external USB storage (stick or disk) to the master. Most files are identical between the workers, so it may be possible to work with hard links. There are backup strategies that use this. I don't know if it is workable for this purpose.

You can strip down the operating system of the workers to just what they need. A first step could be to clean up the old networking (ifupdown), dhcpcd and openresolv [4].

If the worker does not need to be persistent across reboots, meaning that all changes made at runtime may be forgotten, then you can use a read-only root directory. This has the big advantage that you only need one boot and root directory for all workers. The problem is that the workers need different names on the network (worker1, worker2, ...), but this can be solved with DHCP. To achieve this you can use special transient directories [5] or overlay file systems [6].


references:
[1] Network booting
[2] Raspberry Pi 3 Model B+
[3] Network Boot Your Raspberry Pi
[4] Howto migrate from networking to systemd-networkd with dynamic failover
[5] Can a Raspberry Pi be used to create a backup of itself?
[6] How do I make the OS reset itself every time it boots up?

Ingo