78

This question was inspired by https://superuser.com/questions/374386/how-to-store-and-preserve-lots-of-data. There have been other similar questions, but none with the same criteria.

This is two questions in one.

  1. How do you store financial/critical records that should survive anything but a fire and should be available for decades?
  2. Let's say I want to store family photos/videos and want people to be able to find them in storage 100 years from now and still be able to use them. How would this be done?

Criteria

  1. Long term means 30+ years guaranteed. 100+ years average. [If this is not practical, use the closest solution]
  2. High volume means a couple of terabytes.
  3. Answers can be 'no-compromise/industrial' solutions or practical solutions for the home office/small business user.
  4. Media will not be active during the timespan. (i.e., if you suggest hard drives, they will not be spinning).
  5. Further, there is no expectation of needing to read these archives. They are there for emergency or "for future generations" purposes.
  6. Should not require maintenance (if at all possible).

My thoughts:

  1. CD-R/DVD-Rs have proven to me, even in the short term, to be a terrible medium for backups. They seem very fragile and seem to lose their data within a very short time, even when kept in pristine condition.
  2. I can't help but think that storing data on a couple of 1 TB HDDs and then expecting them to spin up correctly a decade or two later is a terrible idea. Am I wrong?
  3. Industrial tape drives seem like a viable option?
user606723
  • 1,528
  • I'm no expert, but I'd say tape. This question might be better on Server Fault, but I honestly don't think it fits perfectly on either, so I'll decline to vote. It is a good question and should live somewhere. – Shinrai Jan 04 '12 at 17:57
  • I agree @Shinrai. I am open to moving this somewhere else if someone can comment on where it should live. – user606723 Jan 04 '12 at 18:01
  • 5
    If you want no compromise, there is existing technology that is designed to last at least 40,000 years with no intervention: http://voyager.jpl.nasa.gov/spacecraft/goldenrec.html – fixer1234 Feb 05 '15 at 23:20
  • 3
    The future is in crystals; they can potentially store 360 TB and last a million years. See: 5D 'Superman memory crystal' heralds unlimited lifetime data storage – kenorb May 21 '15 at 11:03
  • I had 40 drives over the years that I have recently had to DOD erase. While this is painful, I was surprised that out of the 40, only 8 did not spin up (some are 20 years old). However, I'd never want to depend on magnetic storage: what is state of the art today is hard to find tomorrow. Even DVDs are becoming obsolete. I'm looking at using SD flash. Tape drives have the same issue as HDDs: you cannot assume you will find one that works in 20 years. USB isn't going away soon; SD unlikely. – Lloyd Sargent May 24 '22 at 20:42
  • @LloydSargent While tape drives may certainly fail, the tape cartridges are independent; they can outlive the drives. With open standards such as LTO, it is theoretically possible for anybody to manufacture such drives in the future. The only issue is that in practice not everyone will be able to manufacture them, but governmental bodies can certainly rely on such standards, as they have enough resources to make their own drives if necessary. We mere mortals have to make new copies on newer LTO revisions every decade or so. We can still find LTO-1 drives, and they are 23 years old. – gaborous Apr 24 '23 at 22:06

16 Answers

99

Short answer

It's impossible to guarantee a long timeframe because of entropy (also called death!). Digital data decays and dies, just like anything else in the universe. But the process can be slowed down.

There's currently no fail-proof, scientifically proven way to guarantee 30+ years of cold data archival. Some projects aim to do just that, like the Rosetta Disk project of the Long Now Foundation, although such media are still very costly and have a low data density (about 50 MB).

In the meantime, you can use optical media with scientifically demonstrated resilience for cold storage, such as HTL-type Blu-ray discs (like Panasonic's) or archival-grade DVDs (like Verbatim Gold Archival), and keep them in air-tight boxes in a cool spot (avoid high temperatures) and out of the light.

Also, be REDUNDANT: make multiple copies of your data (at least 4), compute hashes so you can regularly check that everything is alright, and rewrite your data onto new discs every few years. Finally, use plenty of error correcting codes: they will let you repair your corrupted data!

Long answer

Why does data get corrupted with time? The answer lies in one word: entropy. This is one of the primary and unavoidable forces of the universe, which makes systems less and less ordered over time. Data corruption is exactly that: disorder creeping into the order of your bits. In other words, the Universe hates your data.

Fighting entropy is exactly like fighting death: you're not likely to succeed, ever. But you can find ways to slow death, just like you can slow entropy. You can also trick entropy by repairing corruption (in other words: you cannot stop corruption, but you can repair it after it happens, if you took measures beforehand!). Just like anything about life and death, there's no magic bullet and no one-size-fits-all solution, and the best approaches require you to directly engage in the digital curation of your data. And even if you do everything correctly, you're not guaranteed to keep your data safe; you only maximize your chances.

Now for the good news: there are now quite efficient ways to keep your data if you combine good-quality storage media with good archival/curation strategies. In short: design for failure.

What are good curation strategies? Let's get one thing straight: most of the info you will find is about backups, not archival. The issue is that most folks transfer their knowledge of backup strategies to archival, and thus a lot of myths are now commonly repeated. Indeed, storing data for a few years (backup) and storing data for as long as possible, spanning decades at least (archival), are totally different goals and thus require different tools and strategies.

Luckily, there is quite a lot of research with published scientific results, so I advise relying on those scientific papers rather than on forums or magazines. Here I will summarize some of my readings.

Also, be wary of marketing claims and non-independent studies declaring that this or that storage medium is perfect. Remember the famous BBC Domesday Project: "Digital Domesday Book lasts 15 years not 1000". Always double-check such studies against truly independent papers, and if there are none, assume the storage medium is not good for archival.

Let's clarify what you are looking for (from your question):

  • Long-term archival: you want to keep copies of your sensitive, irreproducible "personal" data. Archiving is fundamentally different from backup, as is well explained here: backups are for dynamic technical data that regularly gets updated and thus needs to be refreshed into backups (ie, OS, work folder layouts, etc.), whereas archives are static data that you will likely write only once and just read from time to time. Archives are for timeless data, usually personal.

  • Cold storage: you want to avoid maintenance of your archived data as much as possible. This is a BIG constraint, as it means the medium must use components and a writing methodology that stay stable for a very long time, without any manipulation on your part, and without requiring any connection to a computer or electrical supply.

To ease our analysis, let's first study cold storage solutions, and then long-term archival strategies.

Cold storage media

We defined above what a good cold storage medium should be: it should retain data for a long time without any manipulation required (that's why it's called "cold": you can just store it in a closet; you do not need to plug it into a computer to maintain the data).

Paper may seem like the most resilient storage medium on earth, because we often find very old manuscripts from ancient times. However, paper suffers from major drawbacks: first, the data density is very low (you cannot store more than ~100 KB on a sheet of paper, even with tiny characters and computer tools), and it degrades over time without any way to monitor it: paper, just like hard drives, suffers from silent corruption. But whereas you can monitor silent corruption on digital data, you cannot on paper. For example, you cannot guarantee that a picture will retain the same colors over even a decade: the colors will degrade, and you have no way to find out what the original colors were. Of course, you can curate your pictures if you are a pro at image restoration, but this is highly time-consuming, whereas with digital data you can automate this curation and restoration process.

Hard drives (HDDs) are known to have an average life span of 3 to 8 years: they do not just degrade over time, they are guaranteed to eventually die (ie, become inaccessible). The following curves show this tendency of all HDDs to die at a staggering rate:

Bathtub curve showing the evolution of HDD failure rate by error type (also applicable to any engineered device): [image: bathtub curve of HDD failure rates]

Curve showing HDD failure rate, all error types merged: [image: overall HDD failure rate over time]

Source: Backblaze

You can see that there are 3 types of HDDs with respect to failure: the rapidly dying ones (eg, manufacturing defects, bad-quality HDDs, head failures, etc.), the ones with a constant death rate (well manufactured, dying for various "normal" reasons; this is the case for most HDDs), and finally the robust ones that live a bit longer than most and eventually die soon after the "normal" ones (eg, lucky HDDs, lightly used, ideal environmental conditions, etc.). Thus, you are guaranteed that your HDD will eventually die.

Why do HDDs die so often? After all, the data is written on magnetic platters, and the magnetic field can last decades before fading away. The reason they die is that the storage medium (magnetic platters) and the reading hardware (electronic board + moving head) are coupled and cannot be dissociated: you can't just extract the platters and read them with another head, because first the electronic board (which converts the physical signal into digital data) differs between almost every HDD (even of the same brand and reference; it depends on the originating factory), and the internal mechanism with the moving head is so intricate that nowadays it's impossible for a human to perfectly place a head on the platters without killing them.

In addition, HDDs are known to demagnetize over time if left unused (and SSDs similarly lose the electrical charge that holds their data). Thus, you cannot just store data on a drive, put it in a closet and expect it to retain the data without any electrical connection: you need to power up your HDD at least once every year or couple of years. Thus, HDDs are clearly not a good fit for cold storage.

Magnetic tapes: they are often described as the go-to medium for backup needs, and by extension for archival. The problem with magnetic tapes is that they are VERY sensitive: the magnetic oxide particles can easily be deteriorated by sun, water, air or scratches, can be demagnetized by time or by any electromagnetic device, or can simply fall off with time or suffer print-through. That's why they are usually used only in datacenters by professionals. Also, it has never been proven that they can retain data for more than a decade. So why are they so often advised for backups? Because they used to be cheap: back in the day, magnetic tape cost 10x to 100x less than HDDs, and HDDs tended to be a lot less stable than they are now. So magnetic tapes are primarily advised for backups because of cost-effectiveness, not because of resiliency, which is what interests us most when it comes to archiving data. 2023 update: LTO, an open standard for magnetic tapes, is now widespread, and LTO5+ drives supporting the standardized LTFS filesystem are available to consumers, especially refurbished, so I would now recommend LTO drives over optical discs; see my other answer below.

CompactFlash and Secure Digital (SD) cards are known to be quite sturdy and robust, able to survive catastrophic conditions.

The memory cards in most cameras are virtually indestructible, found Digital Camera Shopper magazine. Five memory card formats survived being boiled, trampled, washed and dunked in coffee or cola.

However, flash memory relies on trapped electrical charge to retain data, and thus if the cells run out of juice, the data may be totally lost. Thus, it is not a perfect fit for cold storage (you need to occasionally rewrite the whole card to refresh the charge), but it can be a good medium for backups and short- or medium-term archival.

Optical media: Optical media are a class of storage relying on a laser to read the data, such as CD, DVD or Blu-ray (BD). They can be seen as an evolution of paper: we write the data at such a tiny scale that we needed a more precise and resilient material than paper, and optical discs are just that. The two biggest advantages of optical media are that the storage medium is decoupled from the reading hardware (ie, if your DVD reader fails, you can always buy another one to read your disc), and that they are based on lasers, which makes them universal and future-proof (ie, as long as you know how to make a laser, you can always tweak it to read the bits of an optical disc by emulation, just like CAMILEON did for the BBC Domesday Project).

Like any technology, new iterations offer not only higher density (storage room), but also better error correction and better resilience against environmental decay (not always, but generally true). The first debate about DVD reliability was between DVD-R and DVD+R, and even if DVD-R is still more common nowadays, DVD+R is recognized as more reliable and precise. There are now archival-grade DVDs, specifically made for cold storage, claiming that they can withstand a minimum of ~20 years without any maintenance:

Verbatim Gold Archival DVD-R [...] has been rated as the most reliable DVD-R in a thorough long-term stress test by the well regarded German c't magazine (c't 16/2008, pages 116-123) [...] achieving a minimum durability of 18 years and an average durability of 32 to 127 years (at 25C, 50% humidity). No other disc came anywhere close to these values, the second best DVD-R had a minimum durability of only 5 years.

From LinuxTech.net.

Furthermore, some companies specialize in very-long-term DVD archival and market it extensively, like the M-Disc from Millenniata or the DataTresorDisc, claiming that their discs can retain data for over 1000 years, claims verified by a few (non-independent) studies (from 2009) among other, less scientific ones.

This all seems very promising! Unluckily, there are not enough independent scientific studies to confirm these claims, and the few that are available are not so enthusiastic:

Humidity (80% RH) and temperature (80°C) accelerated ageing of several DVD brands over 2000 hours (about 83 days) of testing, with regular checks of data readability: [image: humidity and temperature accelerated-ageing results for several DVD brands]

Translated from the French institution for digital data archival (Archives de France), study from 2012.

The first graph shows DVDs with slow degradation, the second shows DVDs with rapid degradation, and the third covers the special "very long-term" DVDs like M-Disc and DataTresorDisc. As we can see, their performance does not quite match the claims, being lower than or on par with standard, non-archival-grade DVDs!

However, inorganic optical discs such as M-Disc and DataTresorDisc do get one advantage: they are quite insensitive to light degradation:

Accelerated ageing using light (750 W/m²) over 240 hours: [image: light accelerated-ageing results for several DVD brands]

These are great results, but an archival-grade DVD such as the Verbatim Gold Archival achieves the same performance, and furthermore, light is the most controllable parameter for an object: it's quite easy to put DVDs in a closed box or closet, removing any possible impact of light whatsoever. It would be much more useful to have a DVD that is very resilient to temperature and humidity rather than to light.

The same research team also studied the Blu-ray market to see if any brand offered a good medium for long-term cold storage. Here are their findings:

Humidity and temperature accelerated ageing of several Blu-ray brands, under the same parameters as for the DVDs: [image: humidity and temperature results for several Blu-ray brands]

Light accelerated ageing of several Blu-ray brands, same parameters: [image: light-ageing results for several Blu-ray brands]

Translated from this study of Archives de France, 2012.

Two summaries of all the findings (in French) here and here.

In the end, the best Blu-ray disc (from Panasonic) performed similarly to the best archival-grade DVD in the humidity+temperature test, while being virtually insensitive to light! And this Blu-ray disc isn't even archival grade. Furthermore, Blu-ray discs use a stronger error correcting code than DVDs (which themselves improve on CDs), further minimizing the risk of losing data. Thus, it seems that some Blu-ray discs may be a very good choice for cold storage.

And indeed, some companies, like Panasonic and Sony, have started working on archival-grade, high-density Blu-ray discs, announcing that they will be able to offer 300 GB to 1 TB of storage with an average life span of 50 years. Big companies are also turning to optical media for cold storage (because it consumes far fewer resources, since discs can be cold-stored without any electrical supply), such as Facebook, which developed a robotic system using Blu-ray discs as "cold storage" for data their systems rarely access.

Long Now archival initiative: There are other interesting leads, such as the Rosetta Disk project of the Long Now Foundation, a project to write microscopically scaled pages in every language on earth into which the Genesis has been translated. This is a great project: it is the first to offer a medium that can store 50 MB for really very long-term cold storage (since the data is physically etched into the disk), with future-proof access, since you only need a magnifier to read the data (no weird format specifications nor technological hassle such as the violet beam of a Blu-ray reader; just a magnifier!). However, these disks are still made by hand and thus estimated to cost about $20K, which is a bit too much for a personal archival scheme, I guess.

Internet-based solutions: Yet another way to cold-store your data is over the net. However, cloud backup solutions are not a good fit, the primary concern being that cloud hosting companies may not live as long as you would like to keep your data. Other reasons include the fact that backing up is horribly slow (since everything transfers via the internet), and most providers require that the files also exist on your system to keep them online. For example, both CrashPlan and Backblaze will permanently delete files that have not been seen on your computer within the last 30 days, so if you want to upload backup data that you store only on external hard drives, you will have to plug in your USB HDD at least once per month and sync with your cloud to reset the countdown. However, some cloud services, such as SpiderOak, offer to keep your files indefinitely (as long as you pay, of course) without a countdown. So be very careful about the conditions and usage of the cloud-based backup solution you choose.

An alternative to cloud backup providers is to rent your own private server online, if possible one with automatic mirroring/backup of your data in case of hardware failure on their side (a few even guarantee against data loss in their contracts, but of course that's more expensive). This is a great solution, first because you still own your data, and second because you won't have to manage hardware failures; that is your host's responsibility. And if one day your host goes out of business, you can still get your data back (choose a serious host that won't shut down overnight but will notify you beforehand; maybe you can ask to put that in the contract) and rehost elsewhere.

If you don't want the hassle of setting up your own private online server, and if you can afford it, Amazon offers a data archiving service called Glacier. Its purpose is exactly to cold-store your data for the long term. It offers 11 nines of durability per year per archive, the same as their other S3 offers, but at a much lower price. The catch is that retrieval isn't free and can take anywhere from a few minutes (Standard retrieval from Glacier Archive) to 48 hours (Bulk retrieval from Glacier Deep Archive).
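For illustration, here is a minimal sketch of such an upload using the boto3 library; it assumes AWS credentials are already configured, and the vault name and file name are hypothetical:

    # Minimal sketch: upload one archive to an existing Glacier vault.
    # Assumes AWS credentials are configured (e.g., in ~/.aws/credentials)
    # and that the vault "family-archive" was created beforehand.
    import boto3

    glacier = boto3.client("glacier")

    with open("photos-2012.tar", "rb") as f:
        response = glacier.upload_archive(
            accountId="-",                      # "-" means the current account
            vaultName="family-archive",         # hypothetical vault name
            archiveDescription="Family photos, 2012",
            body=f,
        )

    # Keep this ID somewhere safe: Glacier has no browsable file listing,
    # so you need the archive ID to request a retrieval job later.
    print("Archive ID:", response["archiveId"])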

Shortcomings of cold storage: However, there is a big flaw in any cold storage medium: there is no integrity checking, because cold storage media CANNOT automatically check the integrity of the data (they can merely carry error correcting schemes to "heal" a bit of the damage after corruption happens, but corruption cannot be prevented nor automatically managed!), since, unlike a computer, there is no processing unit to compute/journalize/check and correct the filesystem. With a computer and multiple storage units, on the other hand, you can automatically check the integrity of your archives and automatically mirror onto another unit if some corruption happens in a data archive (as long as you have multiple copies of the same archive).
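To make that concrete, here is a minimal sketch of such a computer-side integrity check in Python: it hashes every file in an archive folder into a manifest on the first run, and on later runs reports any file whose content no longer matches (the folder and file names are illustrative):

    # Minimal sketch: build, then verify, a SHA-256 manifest of an archive folder.
    import hashlib, json, os

    ARCHIVE_DIR = "archive"       # folder holding the archived files (illustrative)
    MANIFEST = "manifest.json"    # the hash database; keep copies of it too

    def sha256(path):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB chunks
                h.update(chunk)
        return h.hexdigest()

    def scan(root):
        hashes = {}
        for dirpath, _, names in os.walk(root):
            for name in names:
                full = os.path.join(dirpath, name)
                hashes[os.path.relpath(full, root)] = sha256(full)
        return hashes

    if not os.path.exists(MANIFEST):              # first run: record the hashes
        with open(MANIFEST, "w") as f:
            json.dump(scan(ARCHIVE_DIR), f, indent=2)
        print("Manifest created.")
    else:                                         # later runs: verify against it
        with open(MANIFEST) as f:
            old = json.load(f)
        new = scan(ARCHIVE_DIR)
        for path in sorted(set(old) | set(new)):
            if old.get(path) != new.get(path):
                print("CORRUPTED, added or removed:", path)  # repair from another copy!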

Long-Term Archival

Even with the best currently available technologies, digital data can only be cold-stored for a few decades (about 20 years). Thus, in the long run, you cannot rely on cold storage alone: you need to set up a methodology for your data archiving process to ensure that your data can be retrieved in the future (even through technological changes) and that you minimize the risk of losing it. In other words, you need to become the digital curator of your data, repairing corruption when it happens and creating new copies when needed.

There are no foolproof rules, but here are a few established curating strategies, and in particular a magical tool that will make your job easier:

  • Redundancy/replication principle: Redundancy is the only tool that can revert the effects of entropy; this principle comes straight from information theory. To keep data, you need to duplicate it. Error correcting codes are exactly an automatic application of the redundancy principle. However, you also need to ensure that your data is redundant across media: multiple copies of the same data on different discs, multiple copies on different kinds of media (so that if one medium fails because of an intrinsic problem, there is little chance that the copies on different media will fail at the same time), etc. In particular, you should always have at least 3 copies of your data (called 3-modular redundancy in engineering), so that if your copies become corrupted, you can cast a simple majority vote to repair your files from the 3 copies (a minimal sketch of such a vote appears after this list). Always remember the sailor's compass advice:

It is useless to bring two compasses, because if one goes wrong, you can never know which one is correct, or whether both are wrong. Always take one compass, or at least three.

  • Error correcting codes: this is the magical tool that will make your life easier and your data safer. Error correcting codes (ECCs) are mathematical constructs that generate extra data which can be used to repair your files. This is more efficient than simple replication (ie, making multiple copies of your files), because ECCs can repair much more of your data using much less storage space, and they can even be used to check whether a file has any corruption, and even locate where the corruptions are. In fact, this is exactly an application of the redundancy principle, but in a cleverer way than replication. This technique is extensively used in all long-range communications nowadays, such as 4G, WiMAX, and even NASA's space communications. Unluckily, although ECCs are omnipresent in telecommunications, they are not in file repair, maybe because it's a bit complex. However, some software is available, such as the well-known (but now old) PAR2, DVD Disaster (which adds error correction codes to optical discs) and pyFileFixity (which I develop, in part to overcome PAR2's limitations and issues); a short Reed-Solomon sketch appears after this list. There are also file systems that optionally implement Reed-Solomon, such as ZFS for Linux or ReFS for Windows, which are technically a generalization of RAID 5.

  • Check file integrity regularly: hash your files and check them regularly (eg, once per year, but it depends on the storage medium and environmental conditions). When you see that your files have suffered corruption, it's time to repair them using the ECCs you generated (if you did), and/or to make a fresh copy of your data on a new storage medium. Checking data, repairing corruption and making fresh new copies is a very good curation cycle that will ensure your data stays safe. Checking is very important because your copies can get silently corrupted, and if you then copy the tampered copies, you will end up with totally corrupted files. This is even more important with cold storage media, such as optical discs, which CANNOT automatically check the integrity of the data (they already carry ECCs to heal a bit, but they cannot verify themselves nor create fresh copies automatically; that's your job!). To monitor file changes, you can use the rfigc.py script of pyFileFixity or other UNIX tools such as md5deep (much like the manifest sketch shown earlier). You can also check the health status of some storage media, like hard drives, using tools such as Hard Disk Sentinel or the open-source smartmontools.

  • Store your archive media in different locations (with at least one copy outside of your house!) to protect against real-life catastrophic events like flood or fire. For example, one optical disc at your workplace, or a cloud-based backup, can be a good way to meet this requirement (even if cloud providers can be shut down at any moment, as long as you have other copies you will be safe; the cloud provider then only serves as an offsite archive in case of emergency).

  • Store in specific containers with controlled environmental parameters: for optical media, store away from light and in a water-tight box to avoid humidity. For hard drives and SD cards, store in anti-static sleeves to prevent residual electricity from damaging the drive. You can also store media in an air-tight, water-tight bag/box in a freezer: low temperatures slow entropy, and you can extend the life of most storage media quite a lot that way (just make sure that water cannot get inside, or your medium will die quickly).

  • Use good-quality hardware and check it beforehand (eg, when you buy an SD card, test the whole card with software such as HDD Scan to check that everything is alright before writing your data). This is particularly important for optical burners, because their quality can drastically change the longevity of your burnt discs, as demonstrated by the Archives de France study (a bad DVD burner will produce DVDs that last far less).

  • Choose your file formats carefully: not all file formats are resilient against corruption; some are downright fragile. For example, a .jpg image can be rendered totally broken and unreadable by tampering with only one or two bytes. The same goes for 7zip archives. This is ridiculous, so be careful about the format of the files you archive. As a rule of thumb, simple clear text is best, but if you need to compress, use non-solid zip, and for images use JPEG 2000 (not open source yet...). More info and reviews from professional digital curators here, here and here.

  • Store alongside your data archives every piece of software and every specification needed to read the data. Remember that specifications change rapidly, and thus in the future your data may not be readable anymore even if you can access the file. You should therefore prefer open-source formats and software, and store the program's source code along with your data so that you can always adapt it from source to run on a new OS or computer.

  • Lots of other methods and approaches are available here, here and in various parts of the Internet.
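As promised in the redundancy bullet above, here is a minimal sketch of a 3-copy majority-vote repair in Python; it assumes the three copies have equal length and were corrupted independently, and the file names are hypothetical:

    # Minimal sketch: byte-wise majority vote across 3 copies of the same file.
    # Assumes equal-length copies with independent corruption, so any single
    # corrupted byte is outvoted by the two intact copies (with a 3-way
    # disagreement there is no majority, and the first value wins arbitrarily).
    def majority_repair(paths, out_path):
        datas = [open(p, "rb").read() for p in paths]
        assert len(datas) == 3 and len({len(d) for d in datas}) == 1
        repaired = bytes(
            max(column, key=column.count)   # most frequent byte in each position
            for column in zip(*datas)
        )
        with open(out_path, "wb") as f:
            f.write(repaired)

    # Hypothetical file names for illustration:
    majority_repair(["copy1.img", "copy2.img", "copy3.img"], "repaired.img")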
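And here is the Reed-Solomon sketch promised in the ECC bullet, using the third-party reedsolo package (pip install reedsolo); the parity size is an arbitrary example, not a recommendation:

    # Minimal sketch: protect data with Reed-Solomon parity bytes using the
    # third-party "reedsolo" package. 32 parity bytes per codeword can repair
    # up to 16 corrupted bytes at unknown positions (reedsolo splits long
    # inputs into 255-byte codewords automatically).
    from reedsolo import RSCodec

    rsc = RSCodec(32)                      # 32 ecc bytes per codeword

    data = b"Irreplaceable family memories. " * 10
    protected = rsc.encode(data)           # data + parity: store this on disc

    damaged = bytearray(protected)         # simulate storage corruption
    damaged[3] ^= 0xFF
    damaged[50] ^= 0xFF

    # In recent reedsolo versions, decode() returns a tuple whose first
    # element is the repaired message.
    recovered = rsc.decode(bytes(damaged))[0]
    assert bytes(recovered) == data
    print("Repaired", len(data), "bytes successfully")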

Conclusion

I advise using whatever you have at hand, as long as you always respect the redundancy principle (make 4 copies!), always check integrity regularly (so you need to pre-generate a database of MD5/SHA-1 hashes beforehand), and create fresh new copies in case of corruption. If you do that, you can technically keep your data for as long as you want, whatever your storage medium is. The time between checks depends on the reliability of your storage media: if it's a floppy disk, check every 2 months; if it's an HTL Blu-ray, check every 2 to 3 years.

As for the optimal setup, for cold storage I advise using HTL Blu-ray discs or archival-grade DVDs, kept in water-tight opaque boxes in a cool place. In addition, you can use SD cards and cloud-based providers such as SpiderOak to store redundant copies of your data, or even hard drives if they are more accessible to you.

Use lots of error correcting codes: they will save your day. You can also make multiple copies of the ECC files themselves (though multiple copies of your data matter more than multiple copies of the ECCs, because ECC files can repair themselves!).

These strategies can all be implemented using the set of open-source tools I am developing: pyFileFixity. This project was in fact started by this very discussion, after I found that there was no free tool to completely manage file fixity. Please refer to the project's readme and wiki for more info on file fixity and digital curation.

On a final note, I really hope that more R&D will be put into this problem. This is a major issue for our society, which digitizes more and more data without any guarantee that this mass of information will survive more than a few years. That's quite depressing, and I really think this issue should be pushed much further to the front, so that it becomes a selling point for manufacturers to build storage devices that can last for future generations.

/EDIT: read below for a practical curation routine.

gaborous
  • 2,013
  • 10
    Outstanding answer! This needs far more upvotes. – bwDraco Feb 04 '15 at 01:24
  • 2
    You plan to add MORE information? Consider publishing it as a textbook. :-) – fixer1234 Mar 21 '15 at 18:56
  • 1
    @fixer1234 yes I plan to add more information and, more importantly, more pertinent and reliable information. There are a lot of misconceptions and falsely perceived secure solutions in the field of file fixity, so there's quite a lot to say. I have found so much info after publishing this post that an update is clearly needed, and I already compiled everything in my notes along with references. I'm not sure SuperUser is the best place where to publish all this data but I have no blog of my own :-/ I will try to be as concise as possible. – gaborous Apr 03 '15 at 20:27
  • Great answer, but part of your answer (at the very beginning) talks about storing data for a short term and you have identified this as "backup" and then the very long term storage as "archive". I was under the impression that backup means to store copies of data in the event of catastrophic data loss so as to be able to restore the original data. Storing data - even for a few weeks - and not having a duplicate could be considered "archiving". The lack of a duplicate means you haven't created a backup... – Kinnectus Oct 20 '15 at 09:35
  • @BigChris You're right, except that in both cases, you need to use redundancy, and good curation (aka maintenance) strategies. Backing up and archiving are indeed very similar, and often use the same tools, which is why they are often used interchangeably, but what differs is the goal: backup should contain dynamic non personal data, while archives should contain static personal data. Here I specifically address the issue of long-term archival, not just archival, but indeed archival is just storing static data. But in both cases, you need redundancy to ensure your data will be recoverable. – gaborous Oct 20 '15 at 14:49
  • I cannot say everything nor link to all the references I have because of the size limit in answers, but I'd like to add 3 things: be careful with HDD health indicators (SMART, temperature, activity level), they cannot be relied on; also, with a good curation scheme (check regularly, use redundancy), you can even reliably use floppy disks, which are notably known for being highly unreliable. And finally, a good blog. – gaborous Oct 20 '15 at 16:25
  • I started a chat about this answer if you would like to check in occasionally. I'm reading and correcting typos. – user193661 Nov 06 '15 at 02:28
  • 3
    DVD+Rs are quite reliable if you don't get fakes. CD-Rs were affected by any light from infrared to violet (and infrared is everywhere, sometimes a lot of it), DVD+Rs are affected only by red or shorter, already more difficult. DVDs also have the sensitive layer in between two layers of plastic, CDs had the layer just below the pencil-writable surface!! BD-R disks are the best: you need violet or ultraviolet light in order to ruin them, and their surface is the strongest one. I would say go with BD-R for practical archival with high probability of success after 30 years. But you need a player. – FarO Nov 19 '15 at 17:28
  • 1
    @OlafM yes that's true, each new generation of optical disks bring more reliable technologies with them, not only in their material, but also in their technological setup (eg, the way pits/grooves are written and managed, the error correcting code, etc.), but also you should pay attention to the material the layers were made in, not all optical disks are equal, and usually (but not always), archival grade disks are made with more resilient materials. – gaborous Dec 21 '15 at 12:12
  • 1
    Humidity may be controlled cheaply by putting in one of those moisture-absorber packets with the discs. So now we need data for low-humidity, high-temperature conditions. Also, temperature cycling over each 24h may be a factor, because of the repeated expansion/contraction that accompanies it. So now we need accelerated ageing data with temperature cycling... – Evgeni Sergeev May 05 '16 at 00:39
  • New optical storage medium: the "Superman memory crystal" disc by the team of Peter Kazansky (Optoelectronics Research Centre at the University of Southampton) is said to be highly resilient to temperature changes (up to 1,000 degrees Celsius) and to retain data for, theoretically, billions of years. We will have to wait for other labs to reproduce the results, but this could be very promising. – gaborous Jul 03 '16 at 18:21
  • It looks like you've hit the 30,000-character answer length limit. If you need to add more information, you can split off some of the content into another answer. Include a link to the other answer in each post to make navigation easier for readers. – bwDraco Aug 06 '16 at 20:00
  • @bwDraco Yes, good idea, I could gather some more refs and put them in an extended answer. Also, I could describe the new scheme I use: 3 hard drive copies regularly checked + SpiderOak with an infinite storage plan + Blu-ray discs for really, really sensitive data that is not too big (I limit to 50 GB the data I store on these discs) + pyFileFixity and DVDisaster for folders I really want to keep in the long run. The most important thing for me was to prioritize the data: I sorted it into four folders (garbage, personal, important, critical) and each gets an additional degree of backup. – gaborous Aug 06 '16 at 21:58
  • 1
    @InancGumus 1- Duplicate your data on several different mediums (hard drives, optical discs, flash memories, printing), including off-site (eg, cloud storage like SpiderOak) 2- burn your most crucial data on blu-ray discs and store them in a hermetic box and placed in a dark and mid-temperature place 3- if you're tech-savvy, use Reed-Solomon error correction codes to protect your data. – gaborous May 04 '20 at 15:59
  • Technically, the universe doesn’t hate your data, it just hates order and for your data to be read, it must be, well, in a certain order! :) – Lloyd Sargent May 24 '22 at 20:46
  • @LloydSargent Data is ordered information, otherwise it just would be noise ;-) – gaborous Aug 05 '22 at 11:11
  • What is your opinion on multi-layered blurays (i.e. 50GB or 100GB discs) for backups? – 9a3eedi Jan 29 '24 at 10:23
  • @9a3eedi Same as what I wrote above about optical discs including DVD and single layered BluRays (disregard the non resilience of CD, this is a technological outlier because it was an early tech and they didn't master error correction at the time). Just note that like single layer BluRay some brands will manufacture higher quality and more robust items than others, I cannot say which without someone doing an artificial aging test. And as usual the same redundancy rule applies, one backup medium is never enough. – gaborous Jan 31 '24 at 08:12
20

Paper

Other than archival ink on archival paper in sealed storage, no current medium is proven to last an average of 100 years without any sort of maintenance.

Archival Paper

Older papers were made from materials such as linen and hemp, and so are naturally alkaline or acid-free, therefore lasting hundreds of years. Twentieth-century and most modern paper is usually made from wood pulp, which is often acidic and does not keep for long periods.

Archival Inks

These permanent, non-fading inks are resistant to light, heat and water, and contain no impurities that can affect the permanence of paper or photographic materials. Black Actinic Inks are chemically stable and feature an inorganic pigment that has no tendency to absorb impurities like other ink pigments can.

Redundant storage

Torvalds once said

Only wimps use tape backup: _real_ men just upload their important stuff on ftp, and let the rest of the world mirror it

Which suggests you should not rely on a single copy on a single medium.

Not magnetic media?

http://www.zdnet.com/blog/perlow/the-bell-tolls-for-your-magnetic-media/9364?tag=content;siu-container

  • Typical example of irretrievable degradation of magnetic media.
  • Issues of hardware and software (and data formats)

Not specialized systems

In 2002, there were great fears that the discs would become unreadable as computers capable of reading the format had become rare and drives capable of accessing the discs even rarer. Aside from the difficulty of emulating the original code, a major issue was that the still images had been stored on the laserdisc as single-frame analogue video. [...]

http://en.wikipedia.org/wiki/BBC_Domesday_Project#Preservation

Long Term Personal storage

http://www.zdnet.com/blog/storage/long-term-personal-data-storage/376

  • both the media AND the format can become unreadable.
  • print on acid-free paper with pigment inks and store in a cool, dry and dark place.
  • The first problem is picking data formats for maximum longevity.
  • Avoid using proprietary formats
  • UCSF is transferring all their original tapes (many in now-obsolete formats like BetaSP and VHS) to the 75 Mbit motion JPEG 2000 format
bwDraco
  • 46,155
  • 1) Can you provide details about this? Will normal hard copies not last that long? (Photos from 100 years ago seem to be fine, AFAIK.) 2) If no current data medium will last this long, I suggest that we use the closest solution possible. It's depressing that decades from now we won't be able to look through old boxes and expect to be able to look at any of our old, forgotten photos, etc. – user606723 Jan 04 '12 at 18:05
  • @user606723: see updated answer – RedGrittyBrick Jan 04 '12 at 19:11
  • I've figured that laser printing on acid-free paper would be a good way to store data (a few megabytes per page) that has a high probability of being readable in 100-200 years. The software to read it would be relatively simple, and one presumes that scanners will always be available, so the format (so long as not too convoluted) would never really "go away" beyond the ability of a competent amateur to recover. – Daniel R Hicks Jan 29 '12 at 16:10
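A minimal sketch of that print-on-paper idea, using the third-party qrcode package (pip install qrcode[pil]); the input file name and the chunk size are illustrative assumptions:

    # Minimal sketch: encode a small file as a series of QR codes suitable
    # for laser printing on acid-free paper, one image per chunk.
    # The chunk size is an assumption: a version-40 QR code holds at most
    # ~2953 bytes of binary data, so stay comfortably below that.
    import base64
    import qrcode

    CHUNK = 1000  # conservative payload per code, in bytes

    with open("document.txt", "rb") as f:   # hypothetical input file
        data = f.read()

    for i in range(0, len(data), CHUNK):
        # base64 keeps the payload printable and scanner-friendly
        payload = base64.b64encode(data[i:i + CHUNK]).decode("ascii")
        img = qrcode.make(payload)
        img.save(f"page_{i // CHUNK:04d}.png")  # print these on archival paper

    # Recovery: scan each printed code with any QR reader, concatenate the
    # base64 payloads in page order, and base64-decode to get the file back.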