Data Resiliency in 2022: Protecting Personal Data in Times of War

2022 hasn't failed to continue the legacy of the roaring 20s. Not even three months into the year, potentially the largest war in Europe since 1945 has broken out, and I find myself questioning what this means in a modern context and, more specifically, what it means for personal data storage.

Traditionally, wartime has been a period of great technological innovation, giving us the microwave oven, commercial radar usage and, most notably, the first computers. These breakthroughs have changed how we live today, but how do we protect the product of such innovations on a personal level?

Pretty much everything we do today, we're storing in the cloud. Our photos are in the cloud, our messages are in the cloud, our code is in the cloud. Our memories, our interactions, our work. I don't think it's unfair to say that most of us today are storing a large proportion of our lives in the cloud. Unfortunately, "the cloud" isn't some ethereal being that transcends borders or physical existence; it is simply a collection of physical servers sitting in warehouses spread across the globe.

That's not to say that data centres are a hodgepodge operation thrown together by anyone that has a few million pounds to spend on servers to throw in a rickety old warehouse on an industrial estate. On the contrary, data centres are some of the most secure non-military structures on the planet, built to the highest safety specifications. But despite the incredible minds that design, maintain and work within them, these facilities aren't perfect and are still prone to major accidents that can cause irrecoverable data loss.

In March 2021, French cloud computing company OVH suffered a major fire at its site in Strasbourg, which destroyed one data centre and took two others offline. The cause? Most probably just a faulty UPS. The fire led to the total loss of EU server data for the video game Rust, and impacted multiple other organisations such as telecom company AFR-IX, encryption utility VeraCrypt, news outlet eeNews Europe, and many more.

OVH advised its customers to enact their disaster recovery plans, but for many, there simply isn't a plan for an incident of this scale and nature. The onus for localised data loss is in most cases on the provider. If a hard drive fails on a cloud storage system, this shouldn't incur any data loss. Any half-decent provider will have their storage set up in a redundant RAID configuration, meaning that any given file isn't stored on a single hard drive but spread across many different drives. If a hard drive fails, it gets replaced, and any data that was stored on the failed drive is automatically rebuilt from what remains on the others.
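
As a rough illustration of how that rebuild works, here is a minimal sketch of single-parity recovery in the style of RAID 5. It is heavily simplified: real arrays work on disk blocks rather than strings and rotate parity across drives, and the drive contents and the `xor_blocks` helper are made up for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical "drives", each holding one equal-sized block of a file.
data_drives = [b"phot", b"os-2", b"022!"]

# A fourth drive holds the parity block: the XOR of all data blocks.
parity_drive = xor_blocks(data_drives)

# Simulate the second drive failing. XOR-ing the surviving data blocks
# with the parity block reproduces the missing block.
surviving = [data_drives[0], data_drives[2], parity_drive]
rebuilt = xor_blocks(surviving)

print(rebuilt == data_drives[1])
```

With a single parity block, only one drive can be lost at a time. Surviving several simultaneous failures needs more parity, which is where the sharding schemes discussed next come in.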

Hard drive failures are to be expected and come with the territory: in 2021, storage and personal backup provider Backblaze saw 1,820 drive failures out of a total of 202,759 deployed drives. Unlike a lot of consumer-level storage solutions, Backblaze have always maintained a level of transparency as to their storage architecture. In Backblaze's case, each file is split into 20 shards: 17 data shards and 3 parity shards. Each shard is stored in a separate "pod" (server), with each of these 20 pods being located in separate cabinets. This level of resiliency means that they can lose 2 entire cabinets (due to power or networking issues) and still maintain read and write access to files, and lose 3 cabinets and still maintain read access to files.
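
The numbers in that paragraph can be sketched in a few lines. The thresholds below are taken straight from the description above; the `access_level` function is a hypothetical helper for illustration, not anything Backblaze publishes. The ratio at the end is a plain failures-over-drives figure, which is not the same thing as the annualised failure rate Backblaze reports (that is calculated from drive days).

```python
DATA_SHARDS = 17
PARITY_SHARDS = 3
TOTAL_SHARDS = DATA_SHARDS + PARITY_SHARDS

# Taken from the text: writes continue with up to 2 cabinets down.
MAX_LOST_FOR_WRITES = 2


def access_level(lost_shards: int) -> str:
    """Describe what is still possible after losing some shards of a file."""
    if not 0 <= lost_shards <= TOTAL_SHARDS:
        raise ValueError("lost_shards must be between 0 and 20")
    if lost_shards <= MAX_LOST_FOR_WRITES:
        return "read/write"
    if lost_shards <= PARITY_SHARDS:
        return "read-only"
    # Fewer than 17 shards remain, so the file cannot be reconstructed
    # until some of the missing shards come back.
    return "unavailable"


for lost in range(5):
    print(lost, access_level(lost))

# Simple ratio of failed drives to deployed drives in 2021 (roughly 0.9%).
failure_ratio = 1_820 / 202_759
print(f"{failure_ratio:.2%}")
```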

The level of resiliency employed by Backblaze with their Vault and Pod architecture means that each file has 99.999999999% annual durability. That's seriously durable. But while each file is distributed across 20 different servers, these servers still reside within the same physical data centre, leaving them liable to an on-premises disaster as happened with OVH.
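
To put eleven nines into perspective, here is a back-of-the-envelope calculation. It assumes the durability figure can be read as an independent 1-in-100-billion chance of losing any given file in a year, which is a simplification on my part rather than Backblaze's own model; the function names are illustrative.

```python
from decimal import Decimal

ANNUAL_DURABILITY = Decimal("0.99999999999")  # eleven nines, as quoted above
ANNUAL_LOSS_PROBABILITY = Decimal(1) - ANNUAL_DURABILITY  # exactly 1E-11


def expected_losses_per_year(files_stored: int) -> Decimal:
    """Average number of files lost per year, assuming independent losses."""
    return Decimal(files_stored) * ANNUAL_LOSS_PROBABILITY


def years_per_expected_loss(files_stored: int) -> Decimal:
    """Average number of years between single file losses (files_stored > 0)."""
    return Decimal(1) / expected_losses_per_year(files_stored)


print(expected_losses_per_year(1_000_000))
print(years_per_expected_loss(1_000_000))
```

Under that assumption, someone storing a million files would expect to lose one of them roughly every 100,000 years. The figure only covers the failures the sharding scheme protects against; it says nothing about the whole building burning down.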

Bringing this into the context of 2022 with an ongoing war in Europe and the ever-growing threat of escalation outside of Ukraine, a data centre disaster as a result of war is still extremely unlikely, but it's not totally out of the question. Surely enterprise-level organisations are working on updating their disaster recovery plans in light of current events, but should consumer-level customers be concerned about the resiliency of their data?

Rather than fear-mongering and living in a perpetual state of anxiety, when it comes to my personal data I like to live by the idioms "Hope for the best, plan for the worst" and "Don't put all of your eggs in one basket"; the basket being the data centre, and the worst being a data centre experiencing an event that causes total data loss. Chances are that if you're living within Europe, your data is also cohabiting in Europe, so while there's currently not much cause for concern, it may be time to consider what viable alternatives exist to storing personal data in a "traditional" data centre.

In an ideal world, the best way to protect data would be to write it to tape and store it underground in a demilitarised area of the Arctic Circle. Paranoid? Yes. Stupid idea? Not really. This is more or less what GitHub did. As part of the GitHub Archive Program, in 2020 GitHub took a snapshot of every single public repository that they host and wrote it to hardened film, which is currently being stored "in a steel-walled container inside a sealed chamber within a decommissioned coal mine on the remote archipelago of Svalbard". In my opinion, most organisations should be taking measures of these proportions to protect their customers' data, but unfortunately, the nature of personal data such as photos, videos and documents versus version-controlled code is somewhat different. In the long term, code can be viewed as somewhat immutable: even if you have a very old version of an open-source project, you'll still be able to rebuild from that point. You can't, however, rebuild memories from a snapshot taken before they happened.

Oh, and we also don't all have access to disused mines in the Arctic.

Sure, it's absolutely possible to replicate your storage across multiple providers, but depending on how much data you're carrying around with you, you'll probably end up with a rather hefty bill at the end of each month. Luckily, we're living in the era of Web3. No, I'm not about to suggest that you mint all of your data into NFTs, nor that you should store your data on the blockchain. The path I've taken to protect my data draws from one of the broader, overarching themes of Web3 — decentralisation.

Storj popped up on my radar quite a few years ago, before Web3 was called Web3. The idea was pretty simple: rent out your spare hard drive space to other people. Storj describes it more eloquently in the precursor to their whitepaper:

"We have designed a general framework of eight components that provides an optimal implementation of decentralized cloud storage that can massively scale and still operate within the limits of any design constraints. Our V3 network is S3 compatible, has a pathway from object storage to CDN, uses Reed Solomon erasure coding instead of replication, and you save 80% on your cloud storage costs compared to the big cloud storage providers."

Naturally, there's an amount of marketing spiel baked in there, but the outline of the idea remains: a decentralised cloud storage system that's S3-compatible, with resiliency on a level that I've never seen before. Anyone with a decent internet connection that's able to communicate with the outside world can run a Storj node, allocating an amount of their hard drive to be used as part of the Storj network, which then receives encrypted segments of files uploaded to Storj. In turn, node operators are rewarded with the $STORJ token.

The level of resiliency involved here is put into perspective when you draw a comparison to Backblaze. As previously mentioned, each file uploaded to Backblaze is split into 20 shards across separate servers, and up to 3 of those shards can be lost with the file still recoverable. Each file uploaded to Storj is split into 80 segments (Storj's own term is "pieces"), and up to 51 of them can be lost with the file still recoverable.
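
Putting the two schemes side by side makes the difference clearer. The sketch below uses only the figures quoted in this post (20 shards with 3 losable, 80 segments with 51 losable); the `ErasureScheme` class is just an illustrative container, not part of either provider's tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErasureScheme:
    name: str
    total: int     # pieces stored per file
    required: int  # pieces needed to rebuild the file

    @property
    def tolerable_losses(self) -> int:
        return self.total - self.required

    @property
    def loss_fraction(self) -> float:
        """Share of a file's pieces that can vanish without losing the file."""
        return self.tolerable_losses / self.total

    @property
    def expansion(self) -> float:
        """Raw storage used per byte of data stored."""
        return self.total / self.required


backblaze = ErasureScheme("Backblaze", total=20, required=17)
storj = ErasureScheme("Storj", total=80, required=29)  # 80 - 51 = 29

for scheme in (backblaze, storj):
    print(
        f"{scheme.name}: can lose {scheme.tolerable_losses} of {scheme.total} "
        f"({scheme.loss_fraction:.2%}), expansion {scheme.expansion:.2f}x"
    )
```

Backblaze can lose 15% of a file's shards while Storj can lose 63.75% of its segments. The trade-off is raw capacity: on these numbers Storj stores roughly 2.76 bytes for every byte uploaded, against about 1.18 for Backblaze.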

Not only is the level of recoverable segment loss far higher with Storj, but each segment is also stored in an entirely different location. With no single point of failure, the chance of a single file being lost from the network is close to zero. In a recent post from Storj titled "Our Approach Regarding Russian Network Connectivity Uncertainty", post author Ben Golub explores the idea of the entirety of Russia being cut off from the Storj network simultaneously. Commenting on the resiliency of the Storj network, he writes:

"Because the initial 80 pieces are distributed to storage nodes operated by different people, on different equipment, in different geographies, on different networks, on different power supplies, the system is highly resilient against things like drive failures, power outages, equipment wide bugs, and viruses, fires, floods, earthquakes, data centre failures, etc. Power outages or storms that have impacted broad swaths of Europe or North America, for example, haven’t impacted our durability."

However, the Storj organisation have undertaken modelling to predict the effects of Russia being cut off from the open internet:

"The good news is that the chances that this impacts any individual customer or file are minuscule. However, given the large number of files on our system, the chances that some file somewhere would be impacted is greater than we would like."

So it seems that no system is perfect, but there are proactive steps that can be taken to reduce the chances of data loss in these concerning times.
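
The gap between "minuscule for any individual file" and "greater than we would like across the network" comes down to simple probability. The sketch below is my own illustration, not Storj's model: it assumes each of a file's 80 segments independently sits on a cut-off node with probability `cut_off_fraction`, and every number passed in is hypothetical.

```python
import math


def p_file_lost(pieces: int, max_losable: int, cut_off_fraction: float) -> float:
    """Chance that more than max_losable of a file's pieces vanish at once,
    if each piece is lost independently with probability cut_off_fraction."""
    f = cut_off_fraction
    return sum(
        math.comb(pieces, k) * f**k * (1 - f) ** (pieces - k)
        for k in range(max_losable + 1, pieces + 1)
    )


def p_any_file_lost(p_file: float, n_files: int) -> float:
    """Chance that at least one of n_files independent files is lost."""
    if p_file >= 1:
        return 1.0
    # 1 - (1 - p)^n, computed in a way that stays accurate for tiny p.
    return -math.expm1(n_files * math.log1p(-p_file))


# Hypothetical: 20% of all nodes disappear at the same moment.
per_file = p_file_lost(pieces=80, max_losable=51, cut_off_fraction=0.2)
print(per_file)

# Hypothetical: a one-in-a-billion per-file risk across a billion files.
print(p_any_file_lost(1e-9, 1_000_000_000))
```

The second figure comes out at around 63%: a risk that is negligible for any one file becomes more likely than not once there are enough files for it to apply to.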

As a result of Storj's modelling, they've increased the repair threshold of segments across the network and are prioritising at-risk segments on nodes in and around Russia to ensure that total node loss in Russia wouldn't result in any file loss.
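
In spirit, a repair threshold works something like the sketch below. This is a hypothetical model rather than Storj's implementation: the `Segment` fields, the idea of discounting pieces held in an at-risk region, and the threshold values are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    segment_id: str
    healthy_pieces: int            # pieces currently retrievable
    pieces_in_at_risk_region: int  # of those, how many sit on at-risk nodes

    @property
    def safe_pieces(self) -> int:
        """Pieces that would remain if the at-risk region went dark."""
        return self.healthy_pieces - self.pieces_in_at_risk_region


def repair_queue(segments, repair_threshold):
    """Segments at or below the threshold, most endangered first."""
    due = [s for s in segments if s.safe_pieces <= repair_threshold]
    return sorted(due, key=lambda s: s.safe_pieces)


segments = [
    Segment("a", healthy_pieces=80, pieces_in_at_risk_region=5),
    Segment("b", healthy_pieces=60, pieces_in_at_risk_region=30),
    Segment("c", healthy_pieces=50, pieces_in_at_risk_region=0),
]

# Raising the (hypothetical) threshold pulls more segments into repair sooner.
print([s.segment_id for s in repair_queue(segments, repair_threshold=35)])
print([s.segment_id for s in repair_queue(segments, repair_threshold=52)])
```

With the lower threshold only segment "b" is queued; with the higher one "c" joins it, so segments get topped back up long before they approach the point where the file can no longer be rebuilt.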

As a consumer, you can make an informed decision about where and how your data is stored. It doesn't take WWIII for your data to be lost; all it takes is a faulty UPS.

Please donate to support those affected by the ongoing war in Ukraine if you can.