The Future of our Present Past

It can be wise for the family historian to think a little about the future of the past they are reconstructing in the present. You can read the letters that were once in a box in grandma’s attic. You can look at the photographs found in your uncle’s desk drawer. Those things could be a century old, or even older, and we would still know what they are and be able to understand them.

A century or more from now, a curious descendant might be shown a box of things once found in your office drawer. What will they see? Diskettes, homemade CDs, a couple of flash memory cards, a USB hard drive and a sticky note that reads “genealogy and family photos.” How much would be understood? Probably just the sticky note.

The Medium Holds the Message

In earlier times, one of the few important things about how information was stored was how long the physical medium would last. Clay tablets, carved stone, papyrus scrolls and books printed on rag paper can all last for many centuries. Where do we have our data? CDs and DVDs that we burn ourselves might last a decade if we take good care of them. Magnetic disks can fail mechanically or simply slowly demagnetize and become unreadable. Flash memory can also slowly lose the electrical charges that hold its information. Though firm numbers are hard to come by, our normal day-to-day ways of storing digital data all seem to become questionable for storage times of more than a decade.

Hello, Is there Data in There?

Not so many years ago, at least on a genealogical timescale, I had a Zip drive and Zip disks. Remember those? Remember when 100 MB on a removable disk was huge? How many people in a century will know what they are? How many today know? Aging no longer happens just to individual examples of a physical medium; it is something that happens to whole types of physical media. Types of physical media don’t stay in use as long as they once did.

The aging of types of media is different from what it once was. Even today, it does not take much to realize that a clay tablet with funny marks on it was used to store data. You can see the data. Though disks have been used to store data for more than a century, if one counts sound recordings, it isn’t obvious that a disk holds data.

In many ways putting machines between ourselves and our data is a good thing. I’d have to stop and think about how many terabytes of data capacity I have in my office. It is certainly much more than the data capacity of my bookshelves. In other ways, it is problematic. Writing has to accommodate normal human vision. It is stable. The format might make significant changes over centuries or millennia but one can still see it. Once we need a machine to read the data we have two problems. We might not realize that the object holds data and we might not have access to any machine that can read it. I still have machines that can play analog sound recordings on disks. If I found a collection of cylinder recordings, I would have to look for a machine. We know what a USB connector is but will our descendants? When it gets to the point that USB connectors disappear from computers, what will our descendants do with a disk drive with a USB connector that might have lost its data decades earlier? Will they struggle to find a machine that can attempt to read it?

Format

This is the factor in the future survival of our data that got me thinking. Formats once lasted a good long time. It took many, many centuries before the go-to format for information in the Western world stopped being Latin. Even today, the high school down the road teaches Latin. I won’t ask for a quick show of hands for how many people think that one hundred years from now our schools will be teaching four years of classes in docx. Introduction to PostScript, anyone?

Even if our physical media survive and are of a type that can still be read, what if the data itself is formatted in a way that makes it unintelligible? Already many people have had the experience of suddenly no longer being able to read an old file. It may have moved to the new computer, but there is no longer a program that can read it. If the problem is caught early, there is often a list of hoops to jump through that, though painful, is at least possible. A century from now those hoops may not be realistic or even known. I don’t expect the database files I have today to even make sense ten years down the road. Instead, when new software is released, I will need to update all those files to keep them usable.

What to Do?

These days data requires a custodian. Many of us have taken on or been given the role of “family archivist.” We are the ones with the old photographs, documents and personal items. More and more, a family needs a data custodian as well. If left to itself, data will become unusable. It might just be practically unusable or it might actually disappear beyond recovery.

First, physical media need both preservation and maintenance. Having multiple copies of files and keeping the storage medium up to date by copying the data to new types of media is important. If the data has no custodian for a while, it is best if it was on up-to-date media when it was “abandoned.” If it had been neglected for years before that, there is much less hope.
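One way to check that a copy on newer media really matches the original is to compare checksums of every file. The following is only a minimal sketch in Python, not a full backup tool; the function names are made up for this illustration, and it assumes both copies are ordinary folders that your computer can read.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_copies(original: Path, copy: Path) -> dict[str, list[str]]:
    """Report files that are missing from the copy or differ from the original."""
    source = build_manifest(original)
    target = build_manifest(copy)
    missing = [name for name in source if name not in target]
    changed = [
        name for name in source
        if name in target and source[name] != target[name]
    ]
    return {"missing": missing, "changed": changed}
```

Calling compare_copies on the old folder and the new one returns two lists. If both are empty, every file in the original has an identical twin in the copy.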

Finally, I’m getting to the inspiration for this post: thinking about the data’s format. I just read an article in the October 2014 edition of Mac Life that discusses which formats are likely to be better for long-term storage than others. The article is well worth the few minutes it takes to read. The basic recommendation is to store files either in formats that are open, meaning the structure of the format is published and freely available, or in formats that are so ubiquitous and stable that it would be hard to imagine something else outliving them. Open on the one hand and ubiquitous and stable on the other aren’t always the same thing. JPEG is likely to be around for a long, long time, though that has more to do with how widely it is used than with how open it is. The same is true of PDF.

Some formats are highly specific. If all you care about is preserving the maximum amount of image data for working with the images in the near future, there is no better way to store a digital image than RAW. If what you care about is being able to see those images in coming decades, I can’t think of a worse way. RAW is what it sounds like: the raw data produced by a digital camera. Every make of camera is different, so RAW isn’t so much a format as a class of formats, none of them particularly widespread and all of them proprietary.

Word processing formats come and go, and are generally proprietary. Any modern word processing program should be able to save to plain text or RTF (Rich Text Format), both of which are open and not going to suddenly change or disappear.
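Before converting anything, it can help to see which formats are actually sitting in your folders. Here is a rough Python sketch that counts files by extension and lists the ones that might deserve a second look. The LIKELY_DURABLE set is only my illustration, built from the formats named in this post, and it is not an authoritative list; adjust it to your own judgment.

```python
from collections import Counter
from pathlib import Path

# Illustrative grouping based on formats mentioned in this post (an assumption,
# not an official list). Anything not listed here is flagged for review.
LIKELY_DURABLE = {".txt", ".rtf", ".html", ".htm", ".jpg", ".jpeg", ".pdf", ".ged"}


def inventory_formats(root: Path) -> Counter:
    """Count files under root by lowercase extension."""
    return Counter(
        p.suffix.lower() or "(none)"
        for p in root.rglob("*")
        if p.is_file()
    )


def needs_review(counts: Counter) -> dict[str, int]:
    """Return the extensions that are not on the likely-durable list."""
    return {
        ext: n
        for ext, n in sorted(counts.items())
        if ext not in LIKELY_DURABLE
    }
```

Running needs_review(inventory_formats(Path("some folder"))) gives a short to-do list: the file types you may want to export to something plainer while the original program still runs.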

Genealogy database programs can generally export to GEDCOM, which is not fun to read, but because it is plain text a human can make sense of it. Reports can be generated and saved as RTF. You might also be able to export your database to something like a series of interlinked HTML files that preserve the structure of the database. HTML is likely to last far longer than any genealogy database format.
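To show what "plain text" buys you, here is a tiny invented GEDCOM fragment and a Python sketch that pulls the names out of it. Each GEDCOM line is a level number, an optional identifier wrapped in @ signs, a tag, and an optional value. The people in the sample are made up, and the sketch ignores continuation lines, character encodings and everything else a real GEDCOM reader has to handle.

```python
# A made-up sample; real exports are much longer but follow the same line shape.
SAMPLE = """\
0 HEAD
0 @I1@ INDI
1 NAME Mary /Smith/
1 BIRT
2 DATE 12 MAR 1885
0 @I2@ INDI
1 NAME John /Smith/
0 TRLR
"""


def parse_line(line: str) -> tuple[int, str, str, str]:
    """Split one GEDCOM line into (level, xref, tag, value)."""
    parts = line.strip().split(" ", 2)
    level = int(parts[0])
    if len(parts) > 1 and parts[1].startswith("@"):
        # Record line such as "0 @I1@ INDI": the identifier comes before the tag.
        xref = parts[1]
        rest = parts[2].split(" ", 1) if len(parts) > 2 else [""]
        tag = rest[0]
        value = rest[1] if len(rest) > 1 else ""
    else:
        xref = ""
        tag = parts[1] if len(parts) > 1 else ""
        value = parts[2] if len(parts) > 2 else ""
    return level, xref, tag, value


def list_people(text: str) -> dict[str, str]:
    """Map each individual's identifier to the first NAME in its record."""
    people: dict[str, str] = {}
    current = ""
    for line in text.splitlines():
        if not line.strip():
            continue
        level, xref, tag, value = parse_line(line)
        if level == 0:
            # A new top-level record starts; remember it only if it is a person.
            current = xref if tag == "INDI" else ""
        elif level == 1 and tag == "NAME" and current and current not in people:
            # GEDCOM marks the surname with slashes; drop them for display.
            people[current] = value.replace("/", "").strip()
    return people
```

That a few dozen lines can recover the names is the point: a descendant with nothing but a text editor could do the same by eye.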

So take care of how you physically preserve your data and think about the format in which you are preserving it. Last but not least, there might be things that you consider printing on acid-free paper or photo stock. Those things will obviously be data as long as they last.

 
