Chapter 1

Introduction

This chapter provides a quick view of why digital preservation is important and why it is difficult to do. There is a need to be able to preserve the understandability and usability of the information encoded in digital objects. Because of this focus on information we shall often refer to digitally encoded information where we wish to stress the information aspects. However, the basic techniques of digital preservation, discussed in many books on this subject (for example [7-12]), focus, by analogy with traditional paper-based libraries and archives, on preserving the media or bit sequences and preserving the ability to render documents and images. This book addresses what might be termed the more advanced issues of digital preservation, beyond keeping the bits and the ability to render, bringing into play concepts of understandability, usability, knowledge and interoperability.

In addition, it is recognised that there are rights associated with digital objects, that there is concern about how one can judge the authenticity of digital objects, and that there is uncertainty about how digital objects may be identified and located in the future. In responding to each of these concerns, additional digital artefacts are likely to be created (such as the specification of the digital rights), which themselves need to be preserved so that they can be used in future when they are needed! Thus we argue that one must be able to preserve many additional, different digital objects if one really wishes to preserve any particular single digital object.

Part I of this book provides the theoretical basis for preservation. In Part II we provide evidence from a variety of sources, using many types of data from many disciplines, and show many tools which provide reasonable implementations of the techniques described here. These examples, and much of the work described in this book, are derived from the CASPAR project [2]. Part III addresses the important questions of how to keep costs under control and how to make sure that money is not wasted, by preparing one's archive for independent evaluation.

D. Giaretta, Advanced Digital Preservation, DOI 10.1007/978-3-642-16809-3_1, © Springer-Verlag Berlin Heidelberg 2011

1.1 What's So Special About Digital Things?

One might say that digital objects give rise to special concerns because the 1s and 0s which make up binary things are difficult to see. "Hold on!" you might say, "in the case of CD-ROMs one can, with a microscope, see the pits in the surface." Well, that may be true, but those little pits on the disk are not the bits. To get to the bits one needs to unravel the various levels of bit-stuffing, error correction codes and logical addressing. These things are handled by the electronics of the CD-ROM reader or of the computer hard disk (where one would be looking at magnetic domains rather than pits), and they expose a relatively simple electronic interface that talks to the rest of the computer system in terms of bits. Such electronic interfaces illustrate a type of virtualisation which is widely used to allow equipment from many manufacturers to be used in computers. However, the underlying technology of such disks changes relatively quickly, and so do the interfaces; as a result one cannot usually use an old type of disk in a new computer. This applies to the well-known examples of floppy disks and CD-ROMs, and also to internal spinning hard disks.

"Alright," you may say, "I know a better, simpler way, which has been proven to hold information for hundreds of years. How about simply writing my 1s and 0s on paper? Of course we could use the right acid-free paper!" Or, if one wanted something to last for thousands of years, we could take a leaf from the Ancient Egyptians and carve the 1s and 0s on stone. Or, to bring that up to date, I know that people in the nuclear industry are trying out writing very tiny characters on silicon carbide sheets [13]. Those techniques would get around some problems, although one might only want to use them for really, really, really important digital objects, since they sound as if they could be very expensive. Therefore they are not solutions for the family photographs, although they may be very good for simple text documents (although in that case one might as well simply print out the characters rather than the 1s and 0s). However, there are some more fundamental problems with these approaches. For example, they are not even the solution for things like spreadsheets, where one needs to know what the columns and cells mean. Similarly, scientific data, as we will see, needs a great deal of additional information in order to be usable.
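
The paper-and-stone thought experiment keeps the bits but not their meaning. The minimal Python sketch below makes this concrete: it reads one 8-byte sequence (an arbitrary, illustrative choice of bytes) under several equally valid interpretations. Nothing in the bytes says which reading was intended; that has to come from additional information kept alongside them. Even the "right" reading (here a floating-point number close to pi) says nothing about what the value measures, just as with the spreadsheet columns.

```python
import struct


def interpretations(raw: bytes) -> dict:
    """Read one 8-byte sequence in several ways; the bits alone cannot say which is right."""
    if len(raw) != 8:
        raise ValueError("this sketch expects exactly 8 bytes")
    return {
        "big-endian float64": struct.unpack(">d", raw)[0],
        "little-endian float64": struct.unpack("<d", raw)[0],
        "two big-endian uint32": struct.unpack(">2I", raw),
        "Latin-1 text": raw.decode("latin-1"),
    }


if __name__ == "__main__":
    # Eight bytes as they might be copied back from paper or stone, with no other context.
    raw = bytes.fromhex("400921fb54442d18")
    for name, value in interpretations(raw).items():
        print(f"{name:>22}: {value!r}")
```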

1.1.1 Threats to Digital Objects of Importance to You

Take a moment to think about the digital objects which affect your life. These days at home we have family photographs and videos, letters, emails, bank records, software licences, identity certificates, spreadsheets of budgets and plans, encrypted private data, and also zip files containing some or all of these things. One might have more complex things, such as Word documents with linked-in spreadsheets or databases. Widening the picture now to one's work and leisure, the list might include games, architectural plans, home finances, engineering designs, and scientific data from many sources, models and analysis results.

Many will already have had the experience of finding a digital object (let's say for simplicity that this is a file) for which one no longer remembers the details, or for which one no longer has the software one used to use. In the case of images or documents there is, at the moment, a reasonable chance of finding some way of viewing them, and that may be perfectly adequate, although one might, for example, want to know who the people in a photograph are, or what language a book is written in and what the words mean. This would be equivalent to storing a book or photograph on a shelf and then picking it up after many years: one can still view the symbols or images on the page as before, although one may not be able to understand the meaning of those symbols. On the other hand, many may also have had the experience of finding a spreadsheet, still being able to view all the numbers, text and formulae, and yet being unable to remember what the various formulae, cells and columns mean. Thus, despite knowing the format of the file and having the appropriate software, the information is essentially lost!

Looking yet further afield, consider the digital record of a cultural heritage site such as the Taj Mahal, measured 10 years ago. In order to know whether or not visitors have damaged this heritage site, one would need to compare those measurements with present-day measurements, which may have been captured with different instruments or stored in a different way. Thus one needs to be able to combine data of various types in order to get an answer. Based on the comparison one may decide that urgent remedial work is needed and that visitor numbers should be restricted. However, before expending valuable resources on this, there must be confidence that the old data has not been altered, and that it is indeed what it is claimed to be.

Other complications may arise. For example, many digital objects cannot, or at least should not, be freely distributed. Even photographs taken for some purpose which happen to include passers-by perhaps should not be used without the permission of those passers-by, but that may depend upon the legal systems of the country in which the photograph was taken, the country where the photograph is held and the country in which it is being distributed. As time passes, legal systems change. Is it possible to determine the legal position easily?

Thinking about another everyday problem, many Web links no longer work. This will probably get worse over time, yet Web links are often used as an intrinsic part of virtual collections of things. How will we cope with being unable to locate what we need after even quite a short time?

Of course we may deposit our valuable digital objects in what we consider a safe place. But how do we know that it is indeed safe and can counter the threats noted above? Indeed, what happens when the organisation which provides the safe place loses its funding, or is taken over and changes its name and function, or simply goes out of business? As a case in point, the domain name casparpreserves.eu, within which the CASPAR web site belongs, is owned by the editor of this book; what will happen to that domain name in 50 years' time when the DNS registration charge is no longer being paid?
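
One common response to broken links is an extra level of indirection: citations carry a stable identifier, and a separately maintained table maps that identifier to wherever the object currently lives. The sketch below is a toy, in-memory illustration of that idea only; the `Resolver` class, its method names and the identifier scheme are hypothetical and are not the API of any real identifier system.

```python
from dataclasses import dataclass, field


@dataclass
class Resolver:
    """Toy persistent-identifier resolver (hypothetical; not a real identifier system's API).

    Citations store the stable identifier; only this table records where the
    object currently lives, so a move needs one update here instead of fixing
    every link that was ever published.
    """
    locations: dict[str, str] = field(default_factory=dict)
    history: dict[str, list[str]] = field(default_factory=dict)

    def register(self, pid: str, url: str) -> None:
        if pid in self.locations:
            raise KeyError(f"{pid} is already registered")
        self.locations[pid] = url
        self.history[pid] = [url]

    def move(self, pid: str, new_url: str) -> None:
        # The object has moved (new host, new domain name, new archive).
        if pid not in self.locations:
            raise KeyError(f"unknown identifier {pid}")
        self.locations[pid] = new_url
        self.history[pid].append(new_url)

    def resolve(self, pid: str) -> str:
        if pid not in self.locations:
            raise KeyError(f"unknown identifier {pid}")
        return self.locations[pid]


if __name__ == "__main__":
    resolver = Resolver()
    # Identifier and URLs are made up for illustration.
    resolver.register("example:photo-0001", "https://old-host.example.org/photos/0001.jpg")
    resolver.move("example:photo-0001", "https://new-archive.example.net/items/0001")
    print(resolver.resolve("example:photo-0001"))  # prints https://new-archive.example.net/items/0001
```

Note that indirection only moves the problem: the resolver itself, and the domain name it runs under, must be kept alive, which is exactly the casparpreserves.eu concern raised above.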

Increasingly one finds research papers, for example in online journals, which have links to the data on which the research is based. In such a case some or all of the above issues may threaten the survival of those links and data.

Another peculiarity of digital data is that it is easy to copy and to change. How, then, can one know whether any digital object we have is what it is claimed to be; how can we trust it? A related question is: how was a particular digital object made? A digital object is produced by some process, usually some computer application, with certain inputs. In fact it could have been the product of a multitude of processes, each with a multitude of input data. How can we tell what these processes and inputs were, and whether they were what we believe them to have been? Or perhaps we want to produce something similar but using a slightly changed process, for example because the calibration of an instrument has changed; how can this be done?

To answer these kinds of questions for a physical object, such as a vellum parchment or a painting, one can do physical tests to give information about age, chemical composition or surface contaminants. None of these provides conclusive answers, because one also needs documentation about, for example, the chain of ownership of a painting, but at least such physical measurements provide a reality check. None of these techniques is available for digital objects. Of course they can be applied to the physical carriers of the bits, but the bits themselves can usually be changed without detection. One can think of technologies, for example carving on stone, where changes to the bits might be easy to detect through changes in the physical medium, but even if no changes are detected one is still not certain that the object is what it is claimed to be.

One often hears or reads that the solution to all these issues is "metadata", i.e. data about data. There is some truth in that, but one needs to ask some pertinent questions, not the least of which are "what types of metadata?" and "how much metadata?". For example, it is clear that by "metadata" many people simply mean ways of classifying or finding something, which is not enough for preservation. Without being able to answer these questions one might as well simply say "we need extra stuff". We will put the word in italics and quotes when we use it ("metadata"), to remind the reader to think carefully about what the word means in its particular context. Much of the rest of this book aims at answering those two questions, namely: what types of metadata are needed, and how much of each of those types is needed?
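
The text notes that bits can usually be changed without detection. One widely used, partial safeguard is to record a cryptographic digest of an object when it is deposited and to recompute it later. The minimal Python sketch below uses the standard `hashlib` module; the function names `sha256_of` and `verify` are illustrative. It shows the limits as well as the mechanism: a matching digest shows only that the bytes are unchanged since the digest was recorded. It says nothing about whether the object is what it is claimed to be, which is why the digest, and a record of who made it and when, must itself be preserved and protected.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, recorded_digest: str) -> bool:
    """True if the file's bytes still match the digest recorded earlier."""
    return sha256_of(path) == recorded_digest.lower()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        p = Path(tmp) / "measurements.csv"      # hypothetical file name
        p.write_bytes(b"site,year,value\n")
        recorded = sha256_of(p)                 # stored alongside the object at deposit
        print(verify(p, recorded))              # True: bytes unchanged
        p.write_bytes(b"site,year,value\nedited\n")
        print(verify(p, recorded))              # False: the change is detected
```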

The answers will be based on the approach provided by the Open Archival Information System (OAIS) Reference Model, also known as ISO 14721 [1], an international standard which is used as the basis of much, perhaps most, of the work in this area. Indeed, it has been said [14] that it has now been adopted as the de facto standard for building digital archives. This book aims to lead the reader into the more advanced topics which need to be addressed in order to find solutions to the threats to our digital belongings, and more.
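
As a rough orientation for the "what types of metadata?" question, the sketch below groups the information around a single data object using category names from the OAIS Reference Model: Representation Information (what is needed to interpret the bits) and Preservation Description Information (PDI), which OAIS divides into Reference, Provenance, Context and Fixity Information. The class names, the free-text field types and the `missing` check are illustrative assumptions, not the standard's data model; OAIS is a conceptual model and prescribes no code. Each entry is itself a digital object that needs preserving, which is the point made at the start of this chapter.

```python
from dataclasses import dataclass, field


@dataclass
class PreservationDescription:
    """PDI categories as named in OAIS; plain strings here are a simplification."""
    reference: str = ""    # identifier(s) for the object, e.g. a persistent ID
    provenance: str = ""   # history: how, by whom and from what it was produced
    context: str = ""      # relationships to other objects and why it exists
    fixity: str = ""       # e.g. a recorded SHA-256 digest


@dataclass
class InformationPackage:
    """Hypothetical container: a data object plus the extra objects needed to use it."""
    content: bytes
    representation_info: list[str] = field(default_factory=list)
    pdi: PreservationDescription = field(default_factory=PreservationDescription)

    def missing(self) -> list[str]:
        """Name the (illustrative) categories that are still empty."""
        gaps = [] if self.representation_info else ["representation_info"]
        gaps += [name for name, value in vars(self.pdi).items() if not value]
        return gaps


if __name__ == "__main__":
    pkg = InformationPackage(
        content=b"site,year,value\n",
        # Hypothetical description of how to read the bytes and what the column means.
        representation_info=["CSV, UTF-8; column 'value' is a distance in metres"],
    )
    pkg.pdi.fixity = "sha256:<digest recorded at deposit>"
    print(pkg.missing())  # prints ['reference', 'provenance', 'context']
```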

1.2 Terminology
Throughout this book we use the term "archive" to mean, following OAIS [1], the organisation, consisting of people and systems, responsible for digital preservation (this is not the full OAIS definition; more on this later). Occasionally the term "repository" or the phrase "digital repository" is used to convey the same concept where it fits better with other usage.

1.3 Summary
Our society increasingly depends upon our continuing ability to access, understand and use digitally encoded information. This chapter should have provided the 10,000 ft view of the issues; the rest of the book aims to map out their solutions in detail.
