10 things you should know about long-term data archiving
By Brien Posey July 15, 2010
Today, almost every organization archives at least some of its data. Some do so to comply with federal
regulations, while others use archiving to facilitate their internal business requirements. Regardless of an
organization's reason for archiving data, the process can be tricky. Unlike a typical backup, archives must be able
to stand the test of time. Given the rapid pace at which IT evolves, longevity can be a tall order. The following
list of considerations will help you improve the long-term usefulness of your archives.
1: Storage medium
The first thing to take into account is the storage medium you use for your archives. Since they will be stored for
long periods of time, you must choose a type of media that will last as long as your retention policy dictates.
Tapes tend to become demagnetized over time, which can lead to data loss. As a result, tapes are rated
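according to their durability. A good-quality tape should last for 10 years or more. Optical storage media are
not subject to demagnetization, but recordable discs also degrade over time, so check the rated lifespan of
whichever medium you choose.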
2: Storage devices
Another major consideration is whether the storage device you are using for your archives will be accessible in a
few years. For example, 15 years ago, I stored my archives on Zip disks. They were a good choice at the time
because they were relatively inexpensive and you could fit a whopping 100 MB of data on a single disk. Today,
though, Zip disks are pretty much extinct. I still have my old Zip drive, but it connects to a PC via a parallel port.
Like Zip drives themselves, parallel ports are all but extinct, so I can't read the data from the Zip disks.
Unfortunately, there's no way to predict which types of storage devices will stand the test of time. Even so, it is
important to try to pick those that have the best chance of being supported over the long term.
3: Revisiting old archives
On a similar note, your archive policies as well as the storage mechanisms you use for archiving data will
undoubtedly change over time. So be sure you review your archives at least once a year to see if anything needs
to be migrated to a different storage medium.
For example, about 10 years ago, I realized that Zip drives were becoming extinct, so I transferred all of my
archives to CD. Today, I store most of my archives on DVD, but because modern DVD drives will also read CDs, I
haven't needed to move my extremely old archives off CD and onto DVD.
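A yearly review like this is easier if you keep an inventory of your archive media. The following is a minimal sketch, not a definitive tool: the ArchiveVolume record, the lifespan figures, and the list of obsolete media are all assumptions made for illustration, so substitute the ratings and formats that apply to your own archives.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ArchiveVolume:
    """One piece of archive media (hypothetical record layout)."""
    label: str
    medium: str      # e.g. "tape", "cd", "dvd", "zip"
    written: date    # date the volume was written


# Assumed planning figures in years, NOT manufacturer ratings.
ASSUMED_LIFESPAN_YEARS = {"tape": 10, "cd": 10, "dvd": 10}

# Media you can no longer count on being able to read (assumed list).
OBSOLETE_MEDIA = {"zip"}


def needs_migration(volume, today, margin_years=2):
    """Return True if the volume should be copied to newer media."""
    if volume.medium in OBSOLETE_MEDIA:
        return True
    lifespan = ASSUMED_LIFESPAN_YEARS.get(volume.medium)
    if lifespan is None:
        # Unknown medium: flag it so a person takes a look.
        return True
    age_years = (today - volume.written).days / 365.25
    # Migrate before the assumed lifespan runs out, not after.
    return age_years >= lifespan - margin_years


if __name__ == "__main__":
    inventory = [
        ArchiveVolume("ARCH-001", "zip", date(1996, 3, 1)),
        ArchiveVolume("ARCH-014", "cd", date(2001, 6, 1)),
        ArchiveVolume("ARCH-040", "dvd", date(2009, 9, 1)),
    ]
    for volume in inventory:
        if needs_migration(volume, date(2010, 7, 15)):
            print(f"{volume.label} ({volume.medium}): schedule migration")
```

The margin_years parameter is there so that a volume gets flagged a couple of years before its assumed lifespan expires, which leaves time to do the copy during a routine review.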
4: Data usability
One major problem I have seen in the real world is archived data that's in an obsolete format. For example, a
few years ago I helped someone restore some document files that had been archived in the early 1990s.
Although I was able to recover the data relatively easily, the documents were created by an application called
PFS Write. The PFS Write file format was widely supported in the late 80s and early 90s, but today, there aren't
any applications around that can read the files.
To avoid situations like this, you might find it helpful to archive not only data, but also copies of the installation
media for the applications that created the data. If you use this approach, don't forget to also archive copies of
any necessary license keys.
Copyright © 2010 CNET Networks, Inc., a CBS Company. All rights reserved. TechRepublic is a registered trademark of CNET Networks, Inc
For more downloads and a free TechRepublic membership, please visit http://techrepublic.com.com/2001-6240-0.html
5: Redundancy
When data is ready to be moved to the archives, many organizations simply write the data to tape and then
store the tape someplace safe. The problem is that the tape is often the only copy of the archived data.
I once did some work for an organization whose standard practice was to write its archives to tape and store the
tapes in a fireproof vault. The vault was of good quality, and the tapes actually survived a flood even though the
vault was submerged for a few days. A couple of years later, the organization needed to restore something off
one of the archive tapes, only to find that the tape was bad. My point is that even the most elaborate systems
for protecting tapes will do nothing to guard against something as simple as a defective tape. Your only defense
against this type of situation is data redundancy.
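Redundancy helps only if you know that each copy is still good. One common way to check is to record a checksum when the archive is written and compare every copy against it later. The sketch below assumes the copies are readable as ordinary files (for example, after being restored to a scratch location); the function names and file paths are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copies(expected_digest, copies):
    """Return the paths of copies that are missing or do not match."""
    failed = []
    for path in copies:
        candidate = Path(path)
        if not candidate.is_file() or sha256_of(candidate) != expected_digest:
            failed.append(str(path))
    return failed


if __name__ == "__main__":
    # Hypothetical paths: two restored copies of the same archive file.
    copies = ["/restore/site_a/ledger_2007.tar", "/restore/site_b/ledger_2007.tar"]
    first = Path(copies[0])
    if first.is_file():
        # For illustration only. In practice the digest would have been
        # recorded when the archive was first written.
        recorded = sha256_of(first)
        print("Bad or missing copies:", verify_copies(recorded, copies))
```

Running a check like this on a schedule would have caught the defective tape long before anyone needed to restore from it.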
6: Selective archiving
Consider what should be archived. Sure, you want to archive your data -- but not all data is equally important.
For example, you will probably want to archive your financial records indefinitely, but is it really necessary to
preserve your telephone call logs for all eternity? Determine what types of data are present in your organization
and the useful lifespan for each data type. Then design your archival policy around those lifespans.
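A retention policy of this kind can be written down as a simple lookup table. The sketch below is only an illustration: the data type names and the two-year figure for call logs are assumptions, not recommendations, and your own retention periods may be dictated by regulation.

```python
from datetime import date

# Hypothetical policy: years to retain each data type.
# None means "keep indefinitely".
RETENTION_YEARS = {
    "financial_records": None,
    "telephone_call_logs": 2,
}


def is_expired(data_type, archived_on, today):
    """Return True if the item has outlived its retention period."""
    if data_type not in RETENTION_YEARS:
        # Refuse to guess: an unclassified data type needs a policy first.
        raise KeyError(f"no retention policy defined for {data_type!r}")
    years = RETENTION_YEARS[data_type]
    if years is None:
        return False
    try:
        expiry = archived_on.replace(year=archived_on.year + years)
    except ValueError:
        # Archived on Feb 29 and the expiry year is not a leap year.
        expiry = archived_on.replace(year=archived_on.year + years, day=28)
    return today >= expiry


if __name__ == "__main__":
    today = date(2010, 7, 15)
    print(is_expired("telephone_call_logs", date(2007, 1, 1), today))
    print(is_expired("financial_records", date(1990, 1, 1), today))
```

Raising an error for an unknown data type is a deliberate choice here: it forces someone to classify the data rather than letting it default to either permanent retention or deletion.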
7: Retrieval method
As you design your archival system, remember that over time, the archives will probably grow to an enormous
size. So you need an efficient way of retrieving data from the archives should the need arise. It might be simple
to dump your archive data to tape, for example, but how well are your tapes indexed? If you aren't sure, ask
yourself how much work would be involved in locating and retrieving a file that was archived three years ago. If
you don't even know where to begin, it's time to consider a different method for archiving your data. Many
commercial archival products provide a Web interface that simplifies the task of searching the archives for data.
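If you aren't using a commercial product, even a small index that records which volume holds each file makes the three-years-ago question answerable. The sketch below uses SQLite from the Python standard library; the table layout, function names, and volume labels are assumptions for illustration, not the design of any particular archival product.

```python
import sqlite3


def open_index(db_path=":memory:"):
    """Open (or create) the archive index database."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS archived_files (
               file_path    TEXT NOT NULL,
               archived_on  TEXT NOT NULL,  -- ISO date, e.g. 2007-06-30
               volume_label TEXT NOT NULL   -- label written on the tape or disc
           )"""
    )
    return conn


def record_file(conn, file_path, archived_on, volume_label):
    """Add one archived file to the index."""
    conn.execute(
        "INSERT INTO archived_files VALUES (?, ?, ?)",
        (file_path, archived_on, volume_label),
    )
    conn.commit()


def find_volumes(conn, name_fragment):
    """Return (path, date, volume) rows whose path contains the fragment.

    Note: % and _ inside name_fragment act as SQL LIKE wildcards.
    """
    cursor = conn.execute(
        "SELECT file_path, archived_on, volume_label FROM archived_files "
        "WHERE file_path LIKE ? ORDER BY archived_on",
        (f"%{name_fragment}%",),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    conn = open_index()
    record_file(conn, "/finance/2007/ledger.xls", "2007-06-30", "TAPE-0042")
    record_file(conn, "/hr/2007/handbook.doc", "2007-07-15", "TAPE-0043")
    for row in find_volumes(conn, "ledger"):
        print(row)
```

Keep a copy of the index somewhere other than the archive media themselves; an index that can only be read by restoring a tape doesn't save you any searching.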
8: Space considerations
Because your archives can become huge, you must plan for the long-term retention of all of that data. If you are
archiving your data to removable media, capacity planning might be as simple as making sure there is enough
free space in the vault to hold all of those tapes, and making sure that there is room in your IT budget to
continue purchasing tapes. If you archive data to a network server, the capacity planning process will likely be
much more demanding because only a limited amount of data can be stored online.
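For removable media, the planning arithmetic is straightforward enough to sketch. The function below assumes linear growth and a fixed tape capacity, and every number in the example is made up; treat it as a starting point for your own estimate rather than a sizing tool.

```python
import math


def tapes_needed(current_tb, yearly_growth_tb, years, tape_capacity_tb, copies=2):
    """Estimate how many tapes the archive will occupy after some years.

    Assumes the archive grows by a constant amount each year and that
    every tape is filled to its stated capacity.
    """
    total_tb = (current_tb + yearly_growth_tb * years) * copies
    return math.ceil(total_tb / tape_capacity_tb)


if __name__ == "__main__":
    # Hypothetical figures: 10 TB today, growing 4 TB a year, planned
    # five years out, on 1.5 TB tapes, with two copies of everything.
    print(tapes_needed(10, 4, 5, 1.5, copies=2))
```

Multiplying the result by your cost per tape and the shelf space each one takes gives rough figures for both the budget and the vault.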
9: Restoring to an isolated environment
As you develop your archive policy, you should stipulate how the data should be restored. My advice is to
restore the data to an isolated environment whenever possible. I once saw a Fortune 500 company accidentally
introduce a virus onto its file servers by restoring some infected archive files.
10: Online vs. offline storage
Finally, decide whether to store your archives online (on a dedicated archive server) or offline (on removable
media). Storing data online keeps it accessible, but the sheer volume of the archived data may make online
retention impractical. Furthermore, data that is stored online may be vulnerable to theft, tampering, or
corruption. Offline storage lets you store a practically unlimited amount of data. However, the data is not
readily accessible, and it may prove difficult to restore should the need arise years from now.