Chapter 1 - Introduction

This chapter introduces some of the key challenges of digital preservation including: 1) Digital objects can become unusable over time if the file format or software needed to access them is no longer supported. Additional context is often needed to understand things like spreadsheets or scientific data. 2) Digital objects face threats like file corruption, obsolete file formats, broken web links, and unclear intellectual property restrictions over long periods of time. 3) It can be difficult to verify the authenticity, provenance, and trustworthiness of digital objects due to the ease of copying and modifying digital files without leaving traces. Ensuring the preservation of related documentation is also important for understanding and using digital objects in the future.
Copyright
© Attribution Non-Commercial (BY-NC)

Chapter 1

Introduction

This chapter provides a quick view of why digital preservation is important and why it is difficult to do. There is a need to be able to preserve the understandability and usability of the information encoded in digital objects. Because of this focus on information we shall often refer to "digitally encoded information" where we wish to stress the information aspects. However the basic techniques of digital preservation, discussed in many books on this subject, for example [7-12], focus, by analogy with traditional paper-based libraries and archives, on preserving the media or bit sequences and preserving the ability to render documents and images. This book addresses what might be termed the more advanced issues of digital preservation, beyond keeping the bits and the ability to render, bringing into play concepts of understandability, usability, knowledge and interoperability.

In addition it is recognised that there are rights associated with digital objects; there is concern about how one can judge the authenticity of digital objects; and there is uncertainty about how digital objects may be identified and located in the future. In responding to each of these concerns the likelihood is that additional digital artefacts will be created (such as the specification of the digital rights), which themselves need to be preserved so that they can be used in future when they are needed. Thus we argue that one must be able to preserve many additional digital objects if one wishes to really preserve any particular single digital object.

Part I of this book provides the theoretical basis for preservation. In Part II we provide evidence from a variety of sources, using many types of data from many disciplines, and show many tools which provide reasonable implementations of the techniques described here. These examples and much of the work described in this book are derived from the CASPAR project [2]. Part III addresses the important questions of how to keep costs under control and how to make sure money is not wasted, by preparing one's archive for independent evaluation.

D. Giaretta, Advanced Digital Preservation, DOI 10.1007/978-3-642-16809-3_1, © Springer-Verlag Berlin Heidelberg 2011

1.1 What's So Special About Digital Things?


One might say that digital objects give rise to special concerns because the 1s and 0s which make up binary things are difficult to see. "Hold on!" you might say, "in the case of CD-ROMs one can, with a microscope, see the pits in the surface." Well that may be true, but those little pits on the disk are not the bits. To get to the bits one needs to unravel the various levels of bit-stuffing, the error correction codes and logical addressing. These things are handled by the electronics of the CD-ROM reader or of the computer hard disk (where one would be looking at magnetic domains rather than pits), and they expose a relatively simple electronic interface that talks to the rest of the computer system in terms of bits. Such electronic interfaces illustrate a type of virtualisation which is widely used to allow equipment from many manufacturers to be used in computers. However the underlying technology of such disks changes relatively quickly and so do the interfaces; as a result one cannot usually use an old type of disk in a new computer. This applies to the well-known examples of floppy disks and CD-ROMs and also to internal spinning hard disks.

"Alright," you may say, "I know a better, simpler way, which has been proven to hold information for hundreds of years. How about simply writing my 1s and 0s on paper? Of course we could use the right acid-free paper! Or if one wanted something for thousands of years we could take a leaf from the Ancient Egyptians and carve the 1s and 0s on stone." Or, to bring that up to date, I know that people in the nuclear industry are trying out writing very tiny characters on silicon carbide sheets [13]. Those techniques would get around some problems, although one might only want to use them for really, really, really important digital objects since they sound as if they could be very expensive. Therefore they are not solutions for the family photographs, although they may be very good for simple text documents (although in that case one might as well simply print the characters out rather than the 1s and 0s).

However there are some more fundamental problems with these approaches. For example they are not even the solution for things like spreadsheets, where one needs to know what the columns and cells mean. Similarly scientific data, as we will see, needs a great deal of additional information in order to be usable.
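
The point about spreadsheets can be made concrete with a minimal sketch in Python. Everything in it is an assumption made for illustration: the sample values, the column names and units, and the helper `describe` are hypothetical and do not come from any real dataset or tool. It only shows that the same preserved values become usable once a description of each column travels with them.

```python
import csv
import io

# Hypothetical spreadsheet content: the values survive, their meaning does not.
RAW = "12.5,3,2009-04-01\n14.1,5,2009-04-02\n"

# Hypothetical description of each column. This is the additional
# information that would have to be preserved alongside the values.
COLUMNS = [
    {"name": "temperature", "unit": "degrees Celsius"},
    {"name": "visitor_count", "unit": "people"},
    {"name": "date", "unit": "ISO 8601 calendar date"},
]


def describe(raw, columns):
    """Pair every value in every row with the name and unit of its column."""
    rows = []
    for record in csv.reader(io.StringIO(raw)):
        rows.append(
            {col["name"]: f"{value} ({col['unit']})"
             for col, value in zip(columns, record)}
        )
    return rows


# Without COLUMNS a reader sees only bare strings such as "12.5" and "3".
bare_rows = list(csv.reader(io.StringIO(RAW)))

# With COLUMNS each value can be interpreted.
described = describe(RAW, COLUMNS)
print(bare_rows[0])
print(described[0])
```

In this sketch the column description is itself another digital object, which is why it too would need to be kept.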

1.1.1 Threats to Digital Objects of Importance to You


Take a moment to think about the digital objects which affect your life. These days at home we have family photographs and videos, letters, emails, bank records, software licences, identity certificates, spreadsheets of budgets and plans, encrypted private data and also zip files containing some or all of these things. One might have more complex things such as Word documents with linked-in spreadsheets or databases. Widening the picture now to one's work and leisure, the list might include games, architectural plans, home finances, engineering designs, and scientific data from many sources, models and analysis results.

Many will already have had the experience of finding a digital object (let's say for simplicity that this is a file) for which one no longer remembers the details or for which one no longer has the software one used to use. In the case of images or documents there is, at the moment, a reasonable chance of finding some way of viewing them, and that may be perfectly adequate, although one might for example want to know who the people in a photograph are, or what language the book is written in and what the words mean. This would be equivalent to storing a book or photograph on a shelf and then picking it up after many years and still being able to view the symbols or images on the page as before, although the reader may not be able to understand the meaning of those symbols.

On the other hand many may also have had the experience of finding a spreadsheet, still being able to view all the numbers, text and formulae, and yet being unable to remember what the various formulae, cells and columns mean. Thus despite knowing the format of the file and having the appropriate software, the information is essentially lost!

Looking yet further afield, consider the digital record of a cultural heritage site such as the Taj Mahal measured 10 years ago. In order to know whether or not visitors have damaged this heritage site one would need to compare those measurements with current day measurements, which may have been captured with different instruments or stored in a different way. Thus one needs to be able to combine data of various types in order to get an answer. Based on the comparison one may decide that urgent remedial work is needed and that site visitor numbers should be restricted. However before expending valuable resources on this there must be confidence that the old data has not been altered, and that it is indeed what it is claimed to be.

Other complications may arise. For example many digital objects cannot, or at least should not, be freely distributed. Even photographs taken for some purpose which have some passers-by in the scene perhaps should not be used without the permission of those passers-by, but that may depend upon the different legal systems of the country in which the photograph was taken, the country where the photograph is held and the country in which it is being distributed. As time passes, legal systems change. Is it possible to determine the legal position easily?

Thinking about another everyday problem, many Web links no longer work. This will probably get worse over time, yet Web links are often used as an intrinsic part of virtual collections of things. How will we cope with being unable to locate what we need, after even a quite short time?

Of course we may deposit our valuable digital objects in what we consider a safe place. But how do we know that it is indeed safe and can counter the threats noted above? Indeed what happens when the organisation which provides the safe place loses its funding, or is taken over and changes its name and function, or simply goes out of business? As a case in point the domain name casparpreserves.eu, within which the CASPAR web site belongs, is owned by the editor of this book; what will happen to that domain name in 50 years' time when the DNS registration charge is no longer being paid?

Increasingly one finds research papers, for example in on-line journals, which have links to the data on which the research is based. In such a case some or all of the above issues may threaten their survival.

Another peculiarity of digital data is that it is easy to copy and to change. Therefore how can one know whether any digital object we have is what it is claimed to be; how can we trust it? A related question is: how was a particular digital object made? A digital object is produced by some process, usually some computer application with certain inputs. In fact it could have been the product of a multitude of processes, each with a multitude of input data. How can we tell what these processes and inputs were, and whether these processes and inputs were what we believe them to have been? Or alternatively perhaps we want to produce something similar, but using a slightly changed process, for example because the calibration of an instrument has changed; how can this be done?

To answer these kinds of questions for a physical object, such as a vellum parchment or a painting, one can do physical tests to give information about age, chemical composition or surface contaminants. None of these provides conclusive answers to the questions, because one also needs documentation about, for example, the chain of ownership of paintings, but at least such physical measurements provide a reality check. None of these techniques is available for digital objects. Of course these techniques can be applied to the physical carriers of the bits, but those bits can usually be changed without detection. One can think of technologies, for example the carvings on stone, where it might be easy to detect changes in the bits by changes in the physical medium, but even if no changes are detected one is still not certain whether the object is what it is claimed to be.

One often hears or reads that the solution to all these issues is "metadata", i.e. "data about data". There is some truth in that, but one needs to ask some pertinent questions, not the least of which are "what types of metadata?" and "how much metadata?" For example it is clear that by "metadata" many people simply refer to ways of classifying or finding something, which is not enough for preservation. Without being able to answer these questions one might as well simply say "we need extra stuff". Much of the rest of this book is about the many types of "metadata" that are needed. We will put the word in italics and quotes when we use it, as "metadata", to remind the reader to be careful to think about what the word means in its particular context. Much of the rest of this book aims at answering those two questions, namely: what types of "metadata" are needed, and how much of each of those types of "metadata" is needed?

The answers will be based on the approach provided by the Open Archival Information System (OAIS) Reference Model, also known as ISO 14721 [1], an international standard which is used as the basis of much, perhaps most, of the work in this area. Indeed it has been said [14] that it "is now adopted as the de facto standard for building digital archives". This book aims to lead the reader into the more advanced topics which need to be addressed to find solutions to the threats to our digital belongings, and more.
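
To illustrate the earlier observation that bits can usually be changed without detection, here is a minimal sketch of one widely used partial countermeasure: recording a cryptographic digest of an object and recomputing it later. This technique is not taken from the chapter. The byte strings and the helper names `digest` and `is_unchanged` are hypothetical, and SHA-256 from Python's standard `hashlib` module is just one reasonable choice of algorithm.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def is_unchanged(data: bytes, recorded_digest: str) -> bool:
    """Check whether the data still matches a digest recorded earlier."""
    return digest(data) == recorded_digest


# Hypothetical object contents, standing in for a real file read as bytes.
original = b"site survey measurements, version 1"
recorded = digest(original)  # stored separately when the object is deposited

# Later: a copy that has been altered, and one that has not.
altered = b"site survey measurements, version 2"

print(is_unchanged(original, recorded))
print(is_unchanged(altered, recorded))
```

A matching digest only indicates that the bits are the same as when the digest was recorded. It says nothing about whether the object was what it was claimed to be at that time, and the recorded digest is itself a further digital object that has to be preserved and trusted.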

1.2 Terminology
Throughout this book we use the term "archive" to mean, following OAIS [1], the organization, consisting of people and systems, responsible for digital preservation (this is not the full OAIS definition; more on this later). Occasionally the term "repository" or the phrase "digital repository" is used to convey the same concept where it fits with other usage.

1.3 Summary
Our society increasingly depends upon our continuing ability to access, understand and use digitally encoded information. This chapter should have provided the 10,000 ft view of the issues for which the rest of the book aims to map out solutions in detail.
