Pointed Development

A Robust Personal Backup Strategy

Everyone needs a reliable backup solution. Here’s what I came up with. As with any solution I develop, though, I’m always looking to improve it. Suggestions welcome!

I’ve lived 99% of my computing life recklessly: Without a reliable backup solution. Back in the 90s, I had scattered bits and bytes of digital necessity copied to a dozen or so 3.5″ floppies. At one point, I even had an Iomega 100 MB Zip Drive I used to keep essential data archived.

Later, probably sometime around the mid 2000s, I started relying on separate partitions on separate HDDs, some CDRWs and even a 250 GB NAS which I thought was pretty slick. But I never really had a reliable or permanent backup solution.

I had a backup mess.

I had data everywhere: I had no idea what it was, what was where or (even worse) how to recover anything. I didn’t even know what was or wasn’t important, essential or worthless.

And I had no way to tell.

When I built my new development machine, I took the time to do the research and came up with a solid backup strategy that I then implemented and tested.

So far it’s working flawlessly and I’m really happy with the design. Here it is!

My family has a lot of data across different spectrums of importance: Some of it’s not important at all, like game save files (though my kids might argue this point). Some of it’s relatively important, like HD video of our children or important receipts. Some of the data is critically important, like HD video of my son’s first moments or my own in-progress projects.

I wanted to ensure that I could back up everything according to some imaginary Severity Factor™. As such, my backup solution supports three degrees of backup criticality:

  1. Critical: This is data I MUST have in the event of a catastrophe, where ‘catastrophe’ is defined as a complete loss of local computer hardware. I may not need this data right away, either. Project files and once-in-a-lifetime video files fall under this category. It’s critical, but I don’t mind if it takes a week to recover.
  2. Important: This is data that I would really like to keep if possible. And I don’t mind if it takes a few days to get, either, as long as I can get it. A video of a school play, scanned receipts, etc. belong here.
  3. Moderate: I would like to get to this data, but it’s not essential. Save games, maybe iTunes media fall under this severity level. If I could never get this data back, it’s not the end of the world.

The Internal Solution

I consider my “internal solution” to encompass everything inside of my house, not just internal to my machine. This includes any external USB drives, the Xbox 360, Apple TV, etc.

Let’s look at my machine, first.

Internally, the computer has three drives: The operating system runs on a 256 GB SSD, which also stores a majority of my project files. I also have 2 x 2 TB HDDs for large media files: Photos, HD video, etc. The HDDs are currently configured as RAID 0, where I accept the risk of a lost drive (and therefore, lost data) for the performance gain and increased capacity.

My machine also has a network share which stores critical data from my wife’s machine: She has a shared folder she can drop bits into. This gets backed up locally, locally off-site and into The Cloud.

My long-term vision is to upgrade my current RAID scheme with two more 2 TB drives and migrate the RAID 0 configuration to RAID 1+0. I suspect this will be my Christmas present to myself, so my current RAID 0 setup only needs to “survive” another month or two.
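
To make the trade-off concrete, here’s a minimal Python sketch of the arithmetic behind that migration. The function name `raid_summary` is hypothetical, and it assumes identical drives and, for RAID 1+0, mirrored pairs of two drives. It only models usable capacity and the worst-case number of drive failures the array survives, not performance.

```python
def raid_summary(level: str, drive_count: int, drive_tb: float) -> dict:
    """Usable capacity and guaranteed failure tolerance for two simple RAID levels.

    Assumes identical drives; RAID 1+0 is modelled as striped two-drive mirrors.
    """
    if level == "0":
        if drive_count < 2:
            raise ValueError("RAID 0 needs at least two drives")
        # Striping only: all capacity is usable, any single failure loses the array.
        return {"usable_tb": drive_count * drive_tb, "failures_tolerated": 0}
    if level == "1+0":
        if drive_count < 4 or drive_count % 2:
            raise ValueError("RAID 1+0 needs an even number of drives, at least four")
        # Every drive is mirrored, so half the raw capacity is usable.
        # Worst case, a second failure could hit the same mirror pair,
        # so only one failure is guaranteed to be survivable.
        return {"usable_tb": drive_count * drive_tb / 2, "failures_tolerated": 1}
    raise ValueError(f"unsupported RAID level: {level}")


current = raid_summary("0", 2, 2.0)    # today: 2 x 2 TB striped
planned = raid_summary("1+0", 4, 2.0)  # after adding two more 2 TB drives
print("current:", current)
print("planned:", planned)
```

Under those assumptions both layouts come out at 4 TB usable: the two extra drives buy redundancy rather than space.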

To complement the machine, I also have a Buffalo DriveStation 4 TB RAID external drive connected via USB 3.0.

The machine is configured to back up everything to the DriveStation continuously and automatically. The purpose is to have a constantly maintained mirror of everything across all of my drives within easy reach, in the event that my internal HDDs or SSD die.

Finally, between our Apple TV and Xbox 360, we have local copies of most of our digital media, including music and movies.

Everything within the Moderate spectrum gets backed up to the DriveStation. Some items get backed up to the Xbox or Apple TV. Eventually, everything will be mirrored within the internal RAID 1+0.

Recovery of these items is the fastest of all of my current backup options: I should be able to completely restore my whole computer from the DriveStation in a couple of hours at most. Critical items can be restored almost instantly. Items restored from the Xbox or Apple TV will take a little longer over the Wi-Fi network, but it won’t be terrible.

Side note: Yes, I perfectly understand that RAID is not backup, and in my current configuration, I’m actually taking quite a risk by using RAID 0. For now, this is an acceptable risk. Later, when I migrate to RAID 1+0, it’s still not a backup solution: Absolutely not. But it’s part of my overall strategy to protect against hardware failure.

The External Solution

All data that falls under the Important category gets backed up to the DriveStation AND to an off-site location down the street. I gave a friend a 1 TB HDD that he placed inside his machine: We use CrashPlan to run backups between our systems.

The purpose of the local off-site solution is to be able to pull data immediately and relatively quickly (we’re both on 25 Mbps fiber connections) in the event of a catastrophe, such as my computer exploding. If needed, I could also drive down to his house and pick up the hardware.

This solution is a little slower to recover: The 25 Mbps connection is fast but not that fast. It could realistically take a day or two to recover this data over the Internet, or half a day to pick up the drive depending on our mutual availability.
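
For a rough sense of where an estimate like that comes from, here’s a small Python sketch of the transfer-time arithmetic. The function name `transfer_hours` and the sample sizes are illustrative assumptions, not measurements from my setup. It assumes decimal units (1 GB = 10^9 bytes) and a link that sustains its full rated speed unless you lower the hypothetical `efficiency` parameter.

```python
def transfer_hours(size_gb: float, link_mbps: float, efficiency: float = 1.0) -> float:
    """Hours needed to move size_gb over a link_mbps connection.

    efficiency is the fraction of the rated speed actually sustained
    (1.0 is the ideal case; real transfers are usually slower).
    """
    bits = size_gb * 1e9 * 8                       # gigabytes -> bits
    seconds = bits / (link_mbps * 1e6 * efficiency)
    return seconds / 3600


# Illustrative sizes only: a small restore, a half-full drive, the whole 1 TB drive.
for size_gb in (100, 500, 1000):
    hours = transfer_hours(size_gb, 25)
    print(f"{size_gb:>5} GB at 25 Mbps: {hours:6.1f} h ({hours / 24:.1f} days)")
```

In the ideal case, 100 GB takes roughly 9 hours and a full 1 TB drive takes a bit under four days, which is why driving over to pick up the drive can win for large restores.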

If I lose every piece of hardware in my home, however, it’s nice knowing that most of my data is just down the street.

I can’t fully trust the local off-site: What if my friend accidentally paves the machine? Spills coffee on it? Finally, can I trust the security of that data? Absolutely not: I can’t ensure his machine isn’t hacked and the data stolen. Therefore, I do not back up anything that we (as a family) consider confidential or secure, such as documents with our social security numbers, SSH keys or anything like that.

The Cloud Solution

Finally, Critical data. In addition to being backed up to the DriveStation and to the local off-site location, critical data also gets copied to The Cloud. This solution costs the most and is the slowest to recover.

There are plenty of blogs and companies that discuss cloud-based backup solutions. That’s not the point of this post, so I’ll be brief about it.

In this case, The Cloud represents many different services, including Google/Gmail, GitHub, a virtual private server hosted by Linode, Flickr, etc. In addition, I also use a cloud-based backup solution, SpiderOak, which maintains a copy of everything backed up to the local off-site (and a little more).

This data absolutely cannot be lost, ever! In the event that Portland gets hit by a meteor, I should still be able to recover this critical data.

Hyperbole aside, I can’t completely trust my DriveStation (what if I flood my office with coffee?) or my local off-site (like I said, what if my friend accidentally paves my drive?)… For that matter, I can’t completely trust The Cloud… But I have a high degree of confidence that in replicating critical data to all three, I won’t lose anything.
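
The tiering described so far can be summarized in a few lines of Python. This is only a sketch of the policy, not how any of the backup tools are actually configured: the `Tier` enum, the `destinations` function, the `confidential` flag and the destination labels are all hypothetical names made up for illustration.

```python
from enum import Enum


class Tier(Enum):
    MODERATE = 1
    IMPORTANT = 2
    CRITICAL = 3


def destinations(tier: Tier, confidential: bool = False) -> list[str]:
    """Return the backup targets for a piece of data (labels are illustrative)."""
    # Everything lands on the external DriveStation first.
    targets = ["drivestation"]
    # Important and Critical data also goes to the friend's machine down the
    # street, unless it is confidential (SSNs, SSH keys and the like).
    if tier in (Tier.IMPORTANT, Tier.CRITICAL) and not confidential:
        targets.append("local-offsite")
    # Only Critical data is worth the cost of cloud storage.
    if tier is Tier.CRITICAL:
        targets.append("cloud")
    return targets


for tier in Tier:
    print(f"{tier.name:<9} -> {', '.join(destinations(tier))}")
```

Each step up in criticality adds one more independent copy, and the `confidential` flag models the rule that sensitive documents never go to the local off-site machine.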

The Cloud Solution is the slowest option of the three to recover: I’m bound by my download connection speed and by the connection of the service provider and everything in between. This solution is also somewhat pricey: SpiderOak gives me 100 GB for $10/month. I could probably find a cheaper provider, but I really like the security provided by SpiderOak. I can’t copy every single HD video I have of my son to The Cloud, but I can copy a couple of our most precious memories without eating up all of my available space.

As stated, I really like the security provided by SpiderOak. Counterintuitively, I “trust” the data stored at SpiderOak slightly more than the data stored at my local off-site.

Sorting and Searching

The final issue I had with my previous “backup” mess was a general lack of organization. I’m still going through this process, but I’m now organizing every single video, picture, project, document and piece of data I can find into a manageable structure. With files going back almost a decade, it’s a struggle. However, with my current backup solution in place, I don’t worry too much about losing any of my data.

And there’s peace of mind in knowing that it’s all safe.

Final Thoughts

There are a couple of additional thoughts I’d like to incorporate with my full backup strategy, but I haven’t fully merged the ideas with my current setup.

  • Encrypted, rotating external HDDs stored in a bank
    This is part of Scott Hanselman’s strategy (see below) and I really like this idea: You encrypt an external drive, back up your data to it daily, weekly, monthly or whatever, and – on a schedule – rotate the drive with a secondary drive that you’ve been keeping in a safe deposit box. I really appreciate the charm of having completely secure data with this idea.
  • A wireless backup solution
    For this, I’m thinking along the lines of the Apple Time Capsule. Today, I’m taking a small risk by having my local on-site backup right next to my computer: A leaky roof, a spilled drink or whatever could be disastrous. I like the idea of having a local on-site that sits in a different room. A NAS would work well, too.

These other fine people have great opinions about personal backup. Take a peek!