Chapter 1 - Introduction
This chapter provides a quick view of why digital preservation is important and why it is difficult to do. There is a need to be able to preserve the understandability and usability of the information encoded in digital objects. Because of this focus on information, we shall often refer to "digitally encoded information" where we wish to stress the information aspects. However, the basic techniques of digital preservation, discussed in many books on this subject, for example [7–12], focus, by analogy with traditional paper-based libraries and archives, on preserving the media or bit sequences and on preserving the ability to render documents and images. This book addresses what might be termed the more advanced issues of digital preservation, beyond keeping the bits and the ability to render, bringing into play concepts of understandability, usability, knowledge and interoperability.
In addition, it is recognised that there are rights associated with digital objects; there is concern about how one can judge the authenticity of digital objects; and there is uncertainty about how digital objects may be identified and located in the future. In responding to each of these concerns, additional digital artefacts are likely to be created (such as the specification of the digital rights), and these themselves need to be preserved so that they can be used in the future when they are needed! Thus we argue that one must be able to preserve many additional, different digital objects if one wishes to really preserve any particular single digital object.
Part I of this book provides the theoretical basis for preservation. In Part II we provide evidence from a variety of sources, using many types of data from many disciplines, and show many tools which provide reasonable implementations of the techniques described here. These examples, and much of the work described in this book, are derived from the CASPAR project [2].
Part III addresses the important questions of how to keep costs under control, how to make sure money is not wasted, and how to prepare one's archive for independent evaluation.
D. Giaretta, Advanced Digital Preservation, DOI 10.1007/978-3-642-16809-3_1, © Springer-Verlag Berlin Heidelberg 2011
1.1
Many will already have had the experience of finding a digital object (let's say for simplicity that this is a file) for which one no longer remembers the details, or for which one no longer has the software one used to use. In the case of images or documents there is, at the moment, a reasonable chance of finding some way of viewing them, and that may be perfectly adequate, although one might, for example, want to know who the people in a photograph are, or what language a book is written in and what the words mean. This would be equivalent to storing a book or photograph on a shelf and picking it up again after many years: one can still view the symbols or images on the page as before, although one may not be able to understand the meaning of those symbols.
On the other hand, many may also have had the experience of finding a spreadsheet, still being able to view all the numbers, text and formulae, and yet being unable to remember what the various formulae, cells and columns mean. Thus, despite knowing the format of the file and having the appropriate software, the information is essentially lost!
Looking yet further afield, consider the digital record of a cultural heritage site such as the Taj Mahal, measured 10 years ago. In order to know whether or not visitors have damaged this heritage site, one would need to compare those measurements with present-day measurements, which may have been captured with different instruments or stored in a different way. Thus one needs to be able to combine data of various types in order to get an answer. Based on the comparison, one may decide that urgent remedial work is needed and that visitor numbers should be restricted. However, before expending valuable resources on this, there must be confidence that the old data has not been altered and that it is indeed what it is claimed to be.
Other complications may arise. For example, many digital objects cannot, or at least should not, be freely distributed.
Even photographs taken for some purpose which happen to include passers-by in the scene perhaps should not be used without the permission of those passers-by, but that may depend upon the legal systems of the country in which the photograph was taken, the country where the photograph is held and the country in which it is being distributed. As time passes, legal systems change. Is it possible to determine the legal position easily?
Turning to another everyday problem, many Web links no longer work. This will probably get worse over time, yet Web links are often used as an intrinsic part of virtual collections of things. How will we cope with being unable to locate what we need after even quite a short time?
Of course we may deposit our valuable digital objects in what we consider a safe place. But how do we know that it is indeed safe and can counter the threats noted above? Indeed, what happens when the organisation which provides the safe place loses its funding, or is taken over and changes its name and function, or simply goes out of business? As a case in point, the domain name casparpreserves.eu, to which the CASPAR web site belongs, is owned by the editor of this book; what will happen to that domain name in 50 years' time, when the DNS registration charge is no longer being paid?
Increasingly one finds research papers, for example in on-line journals, which have links to the data on which the research is based. In such cases some or all of the above issues may threaten the survival of that linked data.
Another peculiarity of digital data is that it is easy to copy and to change. Therefore, how can one know whether any digital object we have is what it is claimed to be: how can we trust it? A related question is: how was a particular digital object made? A digital object is produced by some process, usually some computer application with certain inputs. In fact it could have been the product of a multitude of processes, each with a multitude of input data. How can we tell what these processes and inputs were, and whether they were what we believe them to have been? Or perhaps we want to produce something similar using a slightly changed process, for example because the calibration of an instrument has changed: how can this be done?
To answer these kinds of questions for a physical object, such as a vellum parchment or a painting, one can do physical tests to give information about age, chemical composition or surface contaminants. None of these provides conclusive answers, because one also needs documentation about, for example, the chain of ownership of a painting, but at least such physical measurements provide a reality check. None of these techniques is available for digital objects. Of course they can be applied to the physical carriers of the bits, but those bits can usually be changed without detection. One can think of technologies, for example carvings on stone, where changes to the bits might easily be detected through changes in the physical medium; but even if no changes are detected, one is still not certain whether the object is what it is claimed to be.
One often hears or reads that the solution to all these issues is "metadata", i.e. data about data.
There is some truth in that, but one needs to ask some pertinent questions, not the least of which are "what types of metadata?" and "how much metadata?". For example, it is clear that by "metadata" many people simply mean ways of classifying or finding something, which is not enough for preservation. Without being able to answer these questions one might as well simply say "we need extra stuff". We will put the word in italics and quotes when we use it, "metadata", to remind the reader to think carefully about what the word means in its particular context. Much of the rest of this book is about the multitude of types of metadata that are needed, and aims at answering those two questions, namely: what types of metadata are needed? and how much of each of those types of metadata is needed?
The answers will be based on the approach provided by the Open Archival Information System (OAIS) Reference Model, also known as ISO 14721 [1], an international standard which is used as the basis of much, perhaps most, of the work in this area. Indeed, it has been said [14] that it has now been adopted as the de facto standard for building digital archives. This book aims to lead the reader into the more advanced topics which need to be addressed to find solutions to the threats to our digital belongings, and more.
1.2 Terminology
Throughout this book we use the term "archive" to mean, following OAIS [1], the organization, consisting of people and systems, responsible for digital preservation (this is not the full OAIS definition; more on this later). Occasionally the term "repository" or the phrase "digital repository" is used to convey the same concept where it fits with other usage.
1.3 Summary
Our society increasingly depends upon our continuing ability to access, understand and use digitally encoded information. This chapter has provided a 10,000 ft view of the issues for which the rest of the book aims to map out solutions in detail.