Back up your data before the ransomware strikes

Ramblings about saving your data from destruction and loss

I appreciate that nowadays most developers have gone serverless and are happy with a bunch of stateless machines that just implode on themselves when they're done with their job.

But still, I have a handful of machines that I care about and definitely can't consider cattle. While that might be considered a futile hobby, I still see a lot of use cases for maintaining pieces of infrastructure, and, consequently, given the impending risks that everything online faces, you want to have a disaster recovery plan in place.

Ransomware, botnets, SSH login attempts, reply-to spam, you name it... the risks are always there. If you have any type of digital asset online, and from time to time you look into your logs, you know well what I am talking about.

The mindset should be that, if you have online assets of some sort, something bad can always happen: someone could steal your data, sneak a peek at your traffic, or destroy your data outright. It can happen, and given enough time, it probably will.

Therefore the following things are really imperative:

  • all the data in transit should be encrypted
  • always have a reasonably recent copy of your data offsite
  • everything at rest that represents sensitive data should be encrypted

I appreciate there are better dedicated solutions, and probably what I am going to describe next cuts some corners, but everything should be put in perspective, and the perspective is this: if tomorrow a datacenter catches fire, or ransomware hits your temporary machine, the one with some temporary data that is temporarily not backed up until... you lose everything.

You must have an evacuation plan: a script that backs up most of what you care about, encrypts the backups and sends them somewhere where you'll have the care and the attention to make sure everything important is not lost.

For the encryption, we'll be using this script x509crypt (I briefly discussed the implementation here) with one clarification. This script uses a public key (it is actually a certificate that you'll generate on your local machine) to encrypt the files, making sure the cleartext archive never surpasses 512 MB, therefore chopping every big archive into similarly sized chunks. You'll not be sharing the private key that you used to generate the public one, so that only you can decrypt the backups. I know this sounds a lot like what actual ransomware does... encrypt everything, send it somewhere and keep it. And, above all, keep the secret key, back it up offline, because if you lose it, you lose the ability to decrypt your own files... you don't want that!
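For the record, you don't strictly need x509crypt to grasp the scheme: the following is a minimal sketch of the same chunk-then-encrypt idea with plain openssl, where the file names, the 512 MB limit and the aes256 choice are mine, not something the script mandates:

# Hedged sketch: split a cleartext archive into 512 MB chunks, then
# encrypt each chunk against the public certificate, so only whoever
# holds the matching private key can ever read them back.
split -b 512m -d backup.tar backup.tar.part.
for chunk in backup.tar.part.*; do
  openssl smime -encrypt -binary -aes256 -outform DER \
          -in "$chunk" -out "${chunk}.enc" backup-cert.pem
  rm -f "$chunk"            # never leave cleartext chunks lying around
done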

You want to keep track of the following:

  • The top directories you want to back up as one block, let us call them: DIR_TOPLIST
  • The dir whose subdirectories should each become a separate archive, e.g.: /home if you want each user's dir to be in its own archive, let us call this: DIR_SUBLIST
  • A work directory for the backup process, preferably outside of the above directory trees, if you want to avoid obvious loops

Furthermore, if we are going to use an S3-compatible API to push the backups out, we want to be able to specify a bucket name, say: BUCKET. If we are going to use x509crypt, you'll need to create a keypair first, and have a PROFILE associated with it.
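I won't reproduce x509crypt's own profile handling here, but to give an idea of what the keypair boils down to, a self-signed certificate plus private key could be generated on your local machine with plain openssl along these lines (file names are placeholders, matching the earlier sketch):

# Hedged sketch: generate, on your LOCAL machine, the private key you
# will guard offline and the certificate you will copy to the servers.
openssl req -x509 -newkey rsa:4096 -days 3650 -nodes \
        -subj "/CN=backup-keypair" \
        -keyout backup-key.pem -out backup-cert.pem
# backup-cert.pem goes to the machines being backed up;
# backup-key.pem never leaves your side (and gets backed up offline).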

This makes for the configuration of our "backup system" of sorts:

#-----------------------------------------------------------
DIR_TOPLIST="/etc /var/www /root /opt/someproduct"
DIR_SUBLIST="/home"
WORK_DIR="/opt/backup"
PROFILE=backup-keypair
BUCKET=backup-bucket
#-----------------------------------------------------------

All the backup files will carry the current date in their name, so that we'll eventually be able to do some sort of after-market magic like: "keep one backup a month, apart from the latest month, where I keep all the Mondays". That is a luxury you can afford once you start having backups saved; here we're just packing up everything and sending it somewhere, over a channel that we shouldn't even trust 100%, as an attacker could, at some point in time, start disrupting the backups themselves. The important part is that the old ones are kept, and that with a reasonable time interval (depending on the importance of the data) the backups are checked.
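To make the naming concrete, the archiving step could look roughly like the sketch below; the DATE format, the PROCESSING_AREA staging directory and the per-subdirectory split of DIR_SUBLIST are my assumptions about how to wire the configuration variables together, not an excerpt of the final script:

DATE=$(date +%Y%m%d)
PROCESSING_AREA="${WORK_DIR}/staging"
mkdir -p "${PROCESSING_AREA}"

# one big date-stamped archive for the top-level directories...
tar czf "${PROCESSING_AREA}/${DATE}.$(hostname)-toplist.tar.gz" ${DIR_TOPLIST}

# ...and one archive per subdirectory of the "sublist" dirs (e.g. per user)
for parent in ${DIR_SUBLIST}; do
  for dir in "${parent}"/*/; do
    name=$(basename "${dir}")
    tar czf "${PROCESSING_AREA}/${DATE}.$(hostname)-$(basename ${parent})-${name}.tar.gz" "${dir}"
  done
done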

If, other than the filesystem, you have something like a database on the machine, for example MySQL or PostgreSQL, you will probably want to have a dump of the data somewhere on disk where you will be archiving the hierarchy.

Since a picture is worth a thousand words, in fewer than a thousand words here is the picture of what I have in mind for MySQL:

function mysql_process()
{
  # list all databases, skipping the "Database" header line
  DATABASES=$(echo "show databases" | mysql -u root -p"$MYSQL_ROOT_PWD" | tail -n +2)
  for db in $DATABASES; do
    echo "database: [$db]"
    DOC_TARGET="${PROCESSING_AREA}/${DATE}.$(hostname)-mysql-${db}"
    # dump each database into a date-stamped file, then compress it
    mysqldump --skip-lock-tables -u root --password="$MYSQL_ROOT_PWD" "$db" > "${DOC_TARGET}"
    gzip -9 "${DOC_TARGET}"
  done
}

and for Postgres:

function postgres_process(){
  # list all non-template databases; -At gives unaligned, tuples-only
  # output, so there are no header/footer lines to strip afterwards
  DATABASES=$(export PGPASSWORD=$POSTGRES_PWD; psql -h localhost -U postgres -At -c \
         "SELECT datname FROM pg_database WHERE datistemplate = false" postgres)

  for db in $DATABASES; do
    echo "database: [$db]"
    DOC_TARGET="${PROCESSING_AREA}/${DATE}.$(hostname)-postgres-${db}"
    # dump each database into a date-stamped file, then compress it
    export PGPASSWORD=$POSTGRES_PWD; pg_dump -h localhost -U postgres "$db" > "${DOC_TARGET}"
    gzip -9 "${DOC_TARGET}"
  done
}

And of course you will have noticed the two uncomfortable elephants in the room: the variables MYSQL_ROOT_PWD and POSTGRES_PWD, which you will need to provide somehow. The most simplistic way would be to source them at the top of the script, but we could do a bit better there and avoid having plaintext passwords lying around.
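One small step in that direction, and an assumption on my part rather than what the final script does, is to lean on the clients' own credential files, so the passwords at least stay out of the script and out of the process list (they are still files on disk, but root-only ones):

# Hedged sketch: root-only credential files read natively by the clients.
cat > /root/.my.cnf <<'EOF'
[client]
user=root
password=the-mysql-root-password
EOF
chmod 600 /root/.my.cnf     # mysql/mysqldump then work without -p on the command line

cat > /root/.pgpass <<'EOF'
localhost:5432:*:postgres:the-postgres-password
EOF
chmod 600 /root/.pgpass     # psql/pg_dump pick this up instead of PGPASSWORD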

What I put together, following the descriptions above, is a script that runs on Linux and depends on x509crypt and s3cmd.
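The final push off the machine is the part I'll leave to the script itself, but as an idea of the s3cmd side (the bucket name comes from the configuration above; the .enc suffix matches my earlier openssl sketch, x509crypt will have its own naming):

# Hedged sketch: ship every encrypted chunk to the offsite bucket,
# keeping the date-stamped names so old backups are never overwritten.
for file in "${PROCESSING_AREA}"/*.enc; do
  s3cmd put "${file}" "s3://${BUCKET}/$(hostname)/$(basename "${file}")"
done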

You might want to reuse it, or take it as inspiration for your own disaster recovery plan:

backup-node



[linux] [git] [api] [certificate] [x509]