#_ITandME blog

March 9, 2015: Raspi 'cluster' without the cluster

After a rather lengthy period of time, I've finally achieved the redundancy I've been looking for...at least for now. The basic idea is to have two Raspberry Pis running as an active-passive failover 'cluster'. I say 'cluster' because it's not REALLY an enterprise cluster with shared NAS/SAN.

Here are some of the details:

Using a bash shell script, I created a simple ping test that runs on the secondary Pi server. See code and comments below:


#!/bin/bash
#---------------------------------------------------------------------------
# GLOBAL VAR            # Global variables, set here to make future edits easier.
PRIMARY=192.x.x.x       # PRIMARY = IP address of the primary server
FAIL="100%"             # FAIL = text ping prints on total loss ("100% packet loss")
NOW=$(date)             # NOW = timestamp used in the log entries
#---------------------------------------------------------------------------
# HERE COME THE FUNCTIONS...
function failOVER
{
        rm -f ping.test

        ping $PRIMARY -c 1 >> ping.test
        .
        .
        .

        # The ping above is repeated 7 times. I learned that with '-c 7' my count failed
        # (only one ICMP instance registered), so I stacked 7 separate ping instances.
        # failOVER is designed to run a more intense ping scan to make sure things are
        # really broken.

        TEST=$(grep -c "$FAIL" ping.test)
        if [ "$TEST" -eq 7 ]
                then resetSERVER
        fi
}

function resetSERVER
{
# reset network
# I have two interfaces files in /etc/network, one for current and one for failover...
        rm -f /etc/network/interfaces
        mv /etc/network/interfaces.failover /etc/network/interfaces
# reset hostname (delete /etc/hostname and recreate)
        rm -f /etc/hostname
        echo xiphos-tech.info >> /etc/hostname
# reset hosts file
        rm -f /etc/hosts
        touch /etc/hosts
        echo 127.0.0.1          xiphos-tech.info >> /etc/hosts
        echo ::1                localhost ip6-localhost ip6-loopback >> /etc/hosts
        echo fe00::0            ip6-localnet >> /etc/hosts
        echo ff00::0            ip6-mcastprefix >> /etc/hosts
        echo ff02::1            ip6-allnodes >> /etc/hosts
        echo ff02::2            ip6-allrouters >> /etc/hosts
        echo 127.0.1.1          xiphos-tech.info >> /etc/hosts
        echo $NOW FAILOVER: Server assuming XIPHOS-TECH.INFO Primary >> /var/log/messages

        /sbin/shutdown -r now

# Had trouble with 'restart' or 'shutdown', but full path + actually stating the time 'now' worked
}
#----------------------------------------------------------------------------------------------------
# MAIN where we really start...if you've made it down this far...

ping $PRIMARY -c 1 >> ping.test
TEST=$(grep -c "$FAIL" ping.test)
if [ "$TEST" -eq 1 ]
        then
                failOVER
                echo $NOW Heartbeat to XIPHOS-TECH.INFO - P failed >> /var/log/messages
        else
                echo $NOW Heartbeat to XIPHOS-TECH.INFO - P was successful >> /var/log/messages
fi

rm -f ping.test

#------------------------------------------------------------------------------------------------------
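
The seven stacked ping lines could probably be collapsed into a loop. Below is a minimal sketch, not the script I actually run: `count_failures` is a hypothetical helper name, and it assumes the Linux (iputils) `ping`, which exits non-zero when no reply comes back. It counts failed attempts by exit status instead of grepping for "100%" in a temp file.

```shell
#!/bin/bash
# Sketch only: count failed probes by exit status rather than by parsing output.
# count_failures is a hypothetical helper, not part of the original script.

count_failures() {
    local attempts=$1
    shift                       # remaining arguments form the probe command
    local failures=0
    local i=0
    while [ "$i" -lt "$attempts" ]; do
        # Any non-zero exit status counts as one failed probe.
        if ! "$@" > /dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures"
}

# Example use inside failOVER (assumes PRIMARY and resetSERVER from the script above):
# if [ "$(count_failures 7 ping -c 1 -W 2 "$PRIMARY")" -eq 7 ]; then
#     resetSERVER
# fi
```

Because the probe command is passed in as arguments, the same loop would work for checks other than ping, and no ping.test file needs to be created or cleaned up.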


While my ping script is really exciting...*not*, what really makes this work is the cron job that runs it every 5 minutes (cron can't schedule anything more often than once a minute). It's not very complicated, just

        
        */5 * * * * /opt/my-heartbeat
        
        

That's pretty much all there is to it. There's no magic here, just some simple testing. This kind of heartbeat doesn't have to stop at a simple ICMP ping; it could test other types of server connections, like SSH for example. And instead of jumping everything to another server, it would be possible to shift just one particular service.
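
As a rough sketch of the SSH idea, the heartbeat could check whether a TCP port on the primary accepts connections. This is untested on my Pis and rests on a few assumptions: `port_open` is a hypothetical helper name, bash is built with `/dev/tcp` redirection support (the Debian builds are), the coreutils `timeout` command is installed, and SSH listens on the default port 22.

```shell
#!/bin/bash
# Sketch only: treat a service as alive if its TCP port accepts a connection.
# port_open is a hypothetical helper, not part of the original script.

port_open() {
    local host=$1
    local port=$2
    # The child bash opens the socket on fd 3; the socket closes when it exits.
    # timeout stops the attempt after 3 seconds if the host never answers.
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2> /dev/null
}

# Example use (assumes PRIMARY and NOW from the heartbeat script):
# if port_open "$PRIMARY" 22; then
#     echo "$NOW SSH heartbeat to primary was successful" >> /var/log/messages
# else
#     echo "$NOW SSH heartbeat to primary failed" >> /var/log/messages
# fi
```

A check like this only proves that something is listening on the port, not that logins actually work, so it is a step up from ping rather than a full service test.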

Anyway, I'm still learning and perfecting, but I've tested this process several times with good success, going as far as configuring the sSMTP client to text message me when an outage occurs, but more on that later...

Happy administration, everybody.