Data Management
CC Image courtesy of Erica Marshall of
muddyboots.org on Flickr
Protecting Your Data: Backups, Archives, and Data
Preservation
Backups, Archives, and Data Preservation
Lesson Topics
Protection vs. Backups vs. Archiving vs. Preservation
Why is this important?
Backups: Things to Consider
Data Preservation
Best Practices
CC Image courtesy of TonyHall on Flickr
Learning Objectives
After completing this lesson, the participant will
be able to:
o Define the differences between backups and archiving
data
o Identify the issues related to data backups
o Identify the reasons why backup plans are important
and how they can fit into larger backup procedures
o Discuss what data preservation covers
o List several best practices
CC Image courtesy of paul.klintworth on Flickr
Data Protection, Backups, Archiving, Preservation
Are They the Same Thing? Not Quite
Data Protection
o Includes topics such as backups, archives, and
preservation; also includes physical security,
encryption, and others not addressed here
More information about these topics can be found in the
References section
The terms "backups" and "archives" are often
used interchangeably, but they have different
meanings
o Backups: a copy (or copies) of the original file is made
before the original is overwritten
o Archives: preservation of the file
Data Preservation
o Includes archiving in addition to processes such as data
rescue, data reformatting, data conversion, metadata
Backups vs. Archiving
Backups
o Used to take periodic snapshots of data in case the
current version is destroyed or lost
o Backups are copies of files stored for the short or near-long term
o Often performed on a somewhat frequent schedule
Archiving
o Used to preserve data for historical reference or
potentially during disasters
o Archives are usually the final version, stored for the long term, and generally not copied over
o Often performed at the end of a project or during major
milestones
It is a good idea to have multiple copies of your
backups and archives, in case one copy fails.
Why Perform Backups?
Limit or negate loss of data, some of which may
not be reproducible
Save time, money, productivity
Help prepare for disasters
o Accidental deletions
o Fires, natural disasters
o Software bugs, hardware failures
Reproduce results of past
procedures (if they were based
on older files)
Respond to data requests
Limit liability
CC Image courtesy of Brian J Matis on Flickr
Backups: Things to Consider
Are there existing policies that might affect how
and when you do data backups?
o May be separate project, office, department, funding
source, or organizational policies
May differ between groups; which has precedence?
o Are backups already part of a larger data management
or contingency plan for your group?
Who is responsible for performing backups?
o Users? System administrators? Both?
Do these various policies fit your needs?
Considerations
How often should you do backups?
o Continually? Daily? Weekly? Monthly?
o Cost vs. benefit
What kind of backups should you perform?
o Partial: backing up only those files that have changed
since the last backup
o Full: backing up all files
o How often and what kind will depend upon what kind of
data you have and how important it is
What about non-digital files (such as papers)?
o Consider digitizing files
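The difference between full and partial backups described above can be sketched in Python. This is only an illustration, not a substitute for real backup software. The function names `full_backup` and `partial_backup` are made up for this example, and it assumes that a newer modification time is a good enough sign that a file has changed and that the destination folder lies outside the source folder.

```python
import shutil
from pathlib import Path


def full_backup(source: Path, dest: Path) -> list:
    """Copy every file under `source` into `dest`, keeping the folder layout."""
    copied = []
    for src_file in sorted(source.rglob("*")):
        if src_file.is_file():
            target = dest / src_file.relative_to(source)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src_file, target)  # copy2 also preserves timestamps
            copied.append(target)
    return copied


def partial_backup(source: Path, dest: Path, last_backup_time: float) -> list:
    """Copy only files modified after `last_backup_time` (seconds since the epoch).

    Assumption: a newer modification time means "changed since the last backup".
    """
    copied = []
    for src_file in sorted(source.rglob("*")):
        if src_file.is_file() and src_file.stat().st_mtime > last_backup_time:
            target = dest / src_file.relative_to(source)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src_file, target)
            copied.append(target)
    return copied
```

A partial backup is faster and smaller, but restoring from it generally also requires the most recent full backup, which is part of the cost vs. benefit question raised above.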
Considerations
Where will you back up your files?
o May depend upon project requirements, etc.
o Personal external disk, centralized computer storage
(Dropbox), data repositories (GEON, NEON, GCMD, KNB,
etc.), cloud storage (Amazon, Google)
CDs and DVDs, while cheap and convenient, are not good
media for backups
o What metadata is needed when using these systems?
o Good practice to keep backups in different location than
source data
If a disaster strikes, it can destroy both versions of data
Considerations
How are backups carried out?
o Manual backups may work for single files, but they require
that the user remember to perform regular backups and can be
time-consuming
o Automatic backups can be set to run on a set schedule
that doesn't require the user to remember
What do I do if I need to get a file off of
backups?
o You should know how to obtain files from backups,
where they are located, and who to contact
o Are the files backed up individually or as one large file?
o You need to know this information beforehand, as often
you need a file off of a backup in an emergency!
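As a rough sketch of what an automatic backup job might look like, the Python function below packs a folder into one timestamped zip file. The name `run_backup`, the naming pattern, and the choice of a single zip archive are all assumptions for illustration; the destination is assumed to lie outside the folder being backed up.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional


def run_backup(source: Path, backup_root: Path, now: Optional[datetime] = None) -> Path:
    """Pack `source` into one timestamped zip file under `backup_root`."""
    now = now or datetime.now()
    backup_root.mkdir(parents=True, exist_ok=True)
    stamp = now.strftime("%Y%m%d-%H%M%S")
    base_name = backup_root / f"{source.name}-{stamp}"
    # make_archive appends ".zip" and returns the path of the file it wrote
    archive = shutil.make_archive(str(base_name), "zip", root_dir=str(source))
    return Path(archive)
```

A scheduler such as cron or Windows Task Scheduler could call a script like this so that nobody has to remember to run it. Because everything ends up in one large file, retrieving a single document means opening the archive first, which is worth knowing before an emergency.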
Considerations
How do you verify a backup has been
successfully performed?
o Most backup software will have a log file that contains
details of the backup (which files, when the backup was
created)
Good starting place
o However, don't rely solely on the log file
Even if a log file states the backup was successful, you still
need to check the backup to make sure the files are there
and accessible
Can you pull a file off of a backup and restore it to another
location?
Hardware and software failures can happen after backups
and log files are made
Your system might be backing up the wrong files
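The restore question above ("Can you pull a file off of a backup and restore it to another location?") can be tested with a few lines of Python. This is a minimal sketch: `restore_check` is a hypothetical helper, and it assumes the backup is a plain file copy that can be read directly from disk.

```python
import filecmp
import shutil
from pathlib import Path


def restore_check(backup_file: Path, original_file: Path, restore_dir: Path) -> bool:
    """Restore one file from a backup to a separate location and compare it.

    Returns True only if the restored copy is byte-for-byte identical to the
    original. All three paths are supplied by the caller (hypothetical layout).
    """
    try:
        restore_dir.mkdir(parents=True, exist_ok=True)
        restored = restore_dir / backup_file.name
        shutil.copy2(backup_file, restored)  # the "restore" step
        # shallow=False forces a comparison of contents, not just size and date
        return filecmp.cmp(original_file, restored, shallow=False)
    except OSError:
        # Missing or unreadable backup: the restore test has failed
        return False
```

Running a check like this on a few sample files after each backup goes one step beyond reading the log file, since it exercises the same path you would use in an emergency.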
Considerations
How do you verify a backup has been
successfully performed? (Cont.)
o Since manually checking every file in your backup is
probably not feasible, you should use other methods,
such as checking file sizes, date stamps, and checksum
values.
A checksum is a value calculated from the contents of a
specific file. If the calculated checksums match between
the backup copy and the original file, chances are the file is
the same and was not modified when copied or stored.
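A checksum comparison of this kind might look like the following in Python, using the standard `hashlib` module. The helper names `file_checksum` and `checksums_match` and the choice of SHA-256 are assumptions made for this example; other algorithms work the same way.

```python
import hashlib
from pathlib import Path


def file_checksum(path: Path, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Return the hex checksum of a file, reading it in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksums_match(original: Path, backup: Path) -> bool:
    """True if the original and the backup copy have the same checksum."""
    return file_checksum(original) == file_checksum(backup)
```

Storing the checksum alongside the backup when it is made lets you re-check the copy later, even if the original has since changed or been lost.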
Considerations
Are there backups of the backups?
o Necessary for high-value data
o Usually different copies of backups are kept in different
locations
How long do you keep your backups?
o Depends upon the specific situation, but should be at least
weeks or months.
What happens to the backups after the project
ends or is no longer funded, or staff
depart?
o Long-term storage solutions? Will data be archived
elsewhere?
Data in Real Life
A design firm was handling their own backups.
The system was working fine and the backup
software was reporting that the data was
successfully backed up.
Images courtesy of Heather Henkel
Data in Real Life
The administrator checked the
backups immediately after
they were done and
confirmed they were good.
CC Image courtesy of angielauw on Flickr
Data in Real Life
After a computer virus erased most of their
files, they went back to their backups.
Unfortunately they found that the backups were
all blank and all of the data was gone. Only
after some investigation did they discover that
the computer tapes (which contained the
backups) were placed against a wall that had an
elevator on the other side of it. When the
elevator went past, the magnets inside erased
all of the tapes.
Had they checked their backups properly, they
probably would have noticed this before there
was an emergency.
Final Considerations
Can you read data off of older backups?
o Media change, and you may no longer be able to read
older versions and formats such as floppy disks, Jaz
and Zip drives, WordPerfect files, etc.
o Media can degrade quickly, unexpectedly, and
inconsistently
Even if you can open a file today, that doesn't mean you
can a month from now
How will outdated data be disposed of?
o Will it be copied over, deleted, archived?
o What if it contains sensitive information?
Remember: only back up the data you
can't afford to lose!
Data Preservation
Includes backups and archiving in addition to
processes such as data conversion, data
reformatting, and data rescue
o Older files may no longer be in a usable format and may
require rescuing before the data can be used.
o Data rescue becomes even more important as projects
finish up and/or are no longer funded. Data may have
been preserved at the end of the project, but if no one is
managing the data, data may be left in formats that are
no longer usable or in locations that are no longer
accessible.
Considerations
Data Conversions and Formats
o Use non-proprietary, standard formats
o Convert text files from .doc or .xls to .txt, image files
to .tiff or .pdf
o Be sure to check files after converting them, as data,
metadata, and formatting loss can occur
Versioning
o Use consecutive numbers and letters to help keep track
of changes to a file throughout various edits and
revisions. This will help you quickly differentiate
between files with similar names.
File Naming
o Use file names that are consistent, descriptive, and
concise so that you can find and quickly identify
the file at a later time.
o Rename files that have a default file name
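One way to apply the versioning and file naming advice above is to generate names with a small helper rather than typing them by hand. The pattern used here (`name_v01.ext`) and the function names are hypothetical choices for illustration; any consistent scheme would serve.

```python
import re
from pathlib import Path


def versioned_name(stem: str, version: int, suffix: str) -> str:
    """Build a name such as 'stream_temps_v03.csv' (hypothetical convention)."""
    return f"{stem}_v{version:02d}{suffix}"


def next_version_name(folder: Path, stem: str, suffix: str) -> str:
    """Return the name for the next version of a file already kept in `folder`."""
    pattern = re.compile(rf"^{re.escape(stem)}_v(\d+){re.escape(suffix)}$")
    highest = 0
    for path in folder.iterdir():
        match = pattern.match(path.name)
        if match:
            highest = max(highest, int(match.group(1)))
    return versioned_name(stem, highest + 1, suffix)
```

Zero-padded version numbers keep files in the right order when a folder is sorted by name, which makes it easier to tell similar files apart at a glance.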
Data Preservation
By managing and preserving your data well, data
rescue may not be necessary. Why?
o With proper file naming (helps keep the file from getting
lost in the system), use of proper file formats (lets
you open the file without having to convert it),
backups (limits loss of files), and media types (limits
degradation of files), you may limit or prevent the need
for data rescue.
A good data management plan is another tool to
help limit the need for data rescue.
Best Practices
Create a backup policy that clearly identifies:
o roles,
o responsibilities,
o where the data is backed up,
o how often the files are backed up,
o how to access the files,
o recommended file formats to be used, and
o policies for migrating data to ensure data are not lost
due to media degradation or changing formats or
programs
Review your backup policy and plan periodically
to ensure it is still valid and applicable
o Update contacts, if appropriate
Best Practices
Minimize or remove reliance on users to perform
their own manual backups (if possible)
o Implement standardized and automatic backups
o If possible, put experts in charge of this task (computer
staff) as they are more likely to keep up-to-date
regarding software updates, hardware issues, best
practices, etc.
Don't assume backups are being performed for
you
o You don't want to find out after the fact that no backups
have been performed
o If you are using third-party software (like Yahoo or
Google Mail), what happens if they lose your files?
Use non-proprietary, standard formats
o Convert text files from .doc or .xls to .txt, image files
to .tiff or .pdf
Best Practices
Check your backups manually
o Start with log files, as they may tell you the backup was
unsuccessful
o Do not rely solely on the log files; they may be
incorrect, or the data may have become corrupted after
the file was transferred
o Look at file dates and file sizes to see if they match;
calculate a checksum on the original and archived file
and make sure they match
o Ensure you can read files off of older backups and
archives.
Have multiple versions of backups on multiple
formats in multiple places
Good data management will limit the amount of
data rescue that needs to be done to older data
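The file date and file size check listed above can be automated for a whole folder. The sketch below is one possible approach, not a standard tool: `quick_backup_report` is a hypothetical name, the backup is assumed to mirror the source folder layout, and the two-second tolerance is an assumed allowance for storage that records timestamps coarsely.

```python
from pathlib import Path


def quick_backup_report(source: Path, backup: Path, tolerance_s: float = 2.0) -> dict:
    """List files whose backup copy is missing or differs in size or date.

    `tolerance_s` is an assumed allowance for file systems that store
    timestamps with coarse resolution.
    """
    report = {"missing": [], "size_mismatch": [], "date_mismatch": []}
    for src_file in sorted(source.rglob("*")):
        if not src_file.is_file():
            continue
        rel = src_file.relative_to(source)
        copy = backup / rel
        if not copy.is_file():
            report["missing"].append(rel.as_posix())
            continue
        src_stat, copy_stat = src_file.stat(), copy.stat()
        if src_stat.st_size != copy_stat.st_size:
            report["size_mismatch"].append(rel.as_posix())
        if abs(src_stat.st_mtime - copy_stat.st_mtime) > tolerance_s:
            report["date_mismatch"].append(rel.as_posix())
    return report
```

Matching sizes and dates are only a quick screen. Files that pass can still differ in content, so a checksum comparison remains the stronger test for anything important.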
Data In Real Life
In 2011, a software bug caused
some Gmail users to lose access
to their email. Fortunately,
Google had backups!
CC Image courtesy of Sybren A. Stüvel
Summary
Backups refer to creating copies of original files
while archives involve the preservation of files
There are many reasons to perform
backups, but the primary one is to prevent data loss
Consider how often to perform
backups, where to back up, how to access
backups when you need them, and how long to
keep the files
Check for backups on outdated media and test
backups often!
Data preservation is more than just backing up
and archiving your files
References
1. Backup, wikipedia.org, http://en.wikipedia.org/wiki/Backup
(accessed 3/16/2011)
2. Georgia Tech Library, NSF Data Management Plans -
Research Data Management (Georgia Tech Library and
Information Center),
http://libguides.gatech.edu/content.php?pid=123776&sid=1514980
(accessed 3/16/2011)
3. Albanesius, Chloe, Google: Storage software update led to
e-mail bug,
http://www.pcmag.com/article2/0,2817,2381168,00.asp
(accessed 11/18/2011)
4. Van den Eynden, Veerle, Corti, Louise, Woollard, Matthew,
Bishop, Libby and Horton, Laurence, Managing and
Sharing Data,
http://www.data-archive.ac.uk/media/2894/managingsharing.pdf
(accessed 4/25/12)
The full slide deck may be downloaded from:
http://www.dataone.org/education-modules
Suggested citation:
DataONE Education Module: Data Protection
Backups. DataONE. Retrieved Nov 12, 2012. From
http://www.dataone.org/sites/all/documents/L06_DataProtectionBackups.pptx
Copyright license information:
No rights reserved; you may enhance
and reuse for your own purposes. We do
ask that you provide appropriate citation
and attribution to DataONE.