Doing digital history: A beginner’s guide to working with text as data
Ebook, 303 pages, 6 hours, IHR Research Guides

About this ebook

This book is a practical introduction to digital history. It offers advice on the scoping of a project, evaluation of existing digital history resources, a detailed introduction to how to work with large text resources, how to manage digital data and how to approach data visualisation.

Doing digital history covers the entire life-cycle of a digital project, from conception to digital outputs. It assumes no prior knowledge of digital techniques and shows you how much you can do without writing any code. It will give you the skills to use common formats such as XML. A key message of the book is that data preparation is a central part of most digital history projects, but that work becomes much easier and faster with a few essential tools.
Language: English
Publisher: Manchester University Press
Release date: May 18, 2021
ISBN: 9781526132697

    Book preview

    Doing digital history - Jonathan Blaney

    INTRODUCTION

    This book is a practical guide for using digital tools and techniques for historical research, but it should be useful for anyone from across the humanities who is interested in working with collections of texts at scale. We do focus on text and so, although our examples will be historical, their applicability to the study of literature, for example, should be evident.

    This book is aimed at non-programmers and it does not teach or require the use of any programming language. The practical examples we work through substantiate our belief that you can do a great deal without programming, by leveraging the work of others. Learning to program is interesting and useful but it is not essential to doing digital history and many digital historians choose not to. There are certainly things that can only be done with coding but we have purposely omitted all such things. We hope you will be surprised by the power and flexibility of the approaches we show you.

    We have deliberately chosen to focus almost entirely on tools which have been used for decades and which we expect to continue to be used for many more years. They are mature and there is an abundance of advice available online to supplement what we show in the book.

    We do insist that much of the work in digital history projects is likely to consist in preparing the data for the interesting part of the project: the part that produces interesting results (whether that be an idea, a graph, a map or a website). This is often known as ‘data cleaning’. Actually, that term is slightly misleading because it implies that the problems with the underlying data are always errors of some kind. Sometimes the data needed for a project arrives in a clean and well-organised condition, but simply happens to be in the wrong format for its new use and so requires preparatory work. Data preparation is not much discussed when digital projects are written up, but this book is a practical guide and so we have tried to give it due prominence.

    Our book follows the general structure of other volumes in the IHR Research Guides series. Chapter 1, ‘The context of digital history’, describes the history of our subject and some of its milestones. What is available as a digital source, and how, is a product of the history of the subject and, more importantly, early drivers of digitisation, such as the commercial value of genealogical sources. In writing this chapter we found it hard to disentangle digital history from the digital humanities more broadly considered and we suggest that trying to impose a clear demarcation is unhelpful.

    Chapter 2, ‘Formulating your research questions’, will help you to think through your research ideas in the context of digital history. What techniques will you need? What is already available in terms of data and tools? A critical and judicious approach to early decisions here (both in terms of the research project you pursue and any resources you employ) can save you valuable time and energy and we give lots of advice on how to make those decisions. The last section may seem to skip ahead to some thoughts on where and how you might publish your research. We think that it is best to have a rough idea of this from the very beginning and we also suggest that you should not think of publishing solely in terms of final outputs.

    Chapter 3, ‘How a digital project begins’, is a nuts-and-bolts discussion of how a research project might go from books on a shelf to digital output. In our experience not many people are confident with the process of digitising material because they have no experience of doing so. This chapter describes that process. We digitised part of a book specifically for Doing digital history and have made our files freely available online (see Appendix 1). The book is The Post Office London Directory for 1879, and we have tried to use it for our historical questions and for our practical examples wherever we could throughout the book.

    Chapters 4 and 5 go into detail on how to work with digitised text automatically and at scale. We show how you can use the command line, which gives you access to hundreds of small programs written by other people, and with which you can accomplish an enormous amount without writing a line of code. We make no apology for talking at length about the command line: it is the Swiss army knife of computing, beloved of most programmers. Learning even a little bit about how to use the command line can transform the way you work. Plain text is covered in Chapter 4 and structured text in Chapter 5. We will show that plain text is harder to deal with, although perhaps easier to get hold of, and that structured text is preferable when available, even if at first glance its appearance may be more forbidding. For structured text we concentrate on XML (Extensible Markup Language), but the approaches we take should transfer reasonably easily to other formats.

    Chapter 6, ‘Caring for your digital history project’, covers the practicalities of managing your data and sharing it effectively. Our section on research data management spends a fair amount of time on using the Git tool to manage your data, because we think this is simply the best option available. Further, a great deal of reusable data can now be found in the form of a Git repository, so a basic understanding of what that means and how it works is becoming essential. We also look at documentation and metadata.

    Chapter 7, ‘Visualising your data’, gives an overview of visualising historical data with some advice on practical aspects, such as the use of colour. Here we use the Post Office data to create some visualisations of our own, in the form of charts and a map, with detailed information on how we went from dataset to visualisation and why we made the choices that we did.

    Chapter 8, ‘What next for digital history?’, is our attempt to predict what new technologies are around the corner for historians and how they might affect your work. This chapter ignores George Eliot’s advice, in Middlemarch, that ‘of all forms of mistake, prophecy is the most gratuitous’. We hope that, even if this chapter eventually proves to be laughably wrong in some of the details, the generalities will affect historical practice one way or another. We put the finishing touches to this book while much of the world was in lockdown because of the COVID-19 pandemic. We have not revised our predictions, even though our expectation of ‘gradual evolution and embedding rather than of revolution and disruption’ already looks dated. We think it is too early to say what long-term changes the pandemic will bring to the practice of history.

    We have also included three appendices: Appendix 1 describes the data repository we have created for this book – what is in it, how to get a copy of it and how you might want to use it to practise your skills. Appendix 2 is a table of command line tools and how to use them. We give a human-language description of what each command does and some more extended examples of how you can use them for common tasks. We hope this will be a useful ready reference for day-to-day command line work. Appendix 3 is a summary of the syntax of regular expressions. We think getting to grips with regular expressions, or regex, is essential for working digitally with text. We introduce regular expressions bit by bit in Chapter 4, but Appendix 3 provides a convenient summary in one place. Regular expressions are not easy to learn but we encourage you to keep practising and referring back to this appendix any time you need to.

    There are many things we have not included in this book. Equally, our readers do not need to master everything we do cover. Digital history has many facets, some more appropriate to a particular field or congenial to a particular researcher than others. If this book encourages some historians to try something new or go further with a digital approach than they had previously then it will have been worth writing.

      

    1  

    THE CONTEXT OF DIGITAL HISTORY

    INTRODUCTION

    It is a difficult task, doomed in advance, to say in a few words what has really changed in our area of study, and especially how and why that change took place.¹

    There are a number of strands we have to try to weave together in describing the context and development of digital history. We will start by discussing the place of digital history within the broader context of digital humanities, and then within the context of the development of technology in the post-war period. We will move on to discussing the effect of the digital on three areas of the historian’s craft: finding, writing and citing.

    A sceptical view of the impact of digital history might suggest that it only really involves traditional historical methods speeded up. It would be possible, for example, for a team of researchers over many years to read the whole of Hansard and create an index of the appearance of a particular term or set of terms; now one researcher can do the same thing in a leisurely morning’s work. But, as we will see in Chapter 2, the digitisation of Hansard has led to much deeper change in historical work. Speed, moreover, changes things by itself. The digital has changed life fundamentally for historians and, as there are trade-offs with all change, these changes are not uniformly positive. We will look at the effects of technological developments, but this will not be simply a celebration of digital approaches. Digital offers extra tools for the historian’s toolbox, not replacements for the old ones.

    A broader point, which we will touch on only here, is that the fundamental changes engendered by the digital have affected not just historians but all of us. They have changed how plumbers, surgeons, schoolchildren, till operators and parents carry out their tasks and so have changed them as people.² Historians, then, are just one more group carried along by changes in society. Some commentators, like Nicholas Carr and Matthew Crawford, are concerned that the effect on society is overall a negative one, particularly for skills such as reading and reasoning, which are central to historical research.³

    Just as the analogue world has not gone away, and smartphones can be switched off when walking in the countryside, so the working world of the historian is not wholly digital and we hope and expect that it never will be. The catalogue terminal still very frequently leads us to books and to manuscripts; the web leads to new opportunities to meet other historians, to visit archives, libraries and museums. At its best, the digital world can be a finding aid to what we really value.

    DIGITAL HUMANITIES AND DIGITAL HISTORY

    Much has been written and debated around the definition of digital humanities.⁴ This book is a practical guide to doing digital history and so not the place to revisit those debates. But we do want to insist on one thing: doing digital humanities need not involve writing programs (also known as coding).⁵ We would be committed to this proposition even if this book, which contains no coding, was not an attempt to instantiate it. Digital humanities, in our view, is a question of approach: if you are actively and critically using digital tools to aid your work in researching, teaching or learning, you are probably doing digital humanities. We would encourage anyone to learn to program if they are interested in doing so, but we do not see it as a defining characteristic of work in digital humanities.⁶

    The overlap between digital humanities and digital history is large. Many techniques are common to both and few historical resources are unexploited by researchers from other disciplines. For example, Early English Books Online (EEBO) is a commercial project which provides page scans of books published in England or in English between 1472 and 1700.⁷ The books digitised by EEBO are of great interest to historians, but also to scholars of literature, theology, art history, linguistics, law and many other subjects.⁸ With twenty-five thousand transcriptions now available from the academic consortium EEBO-TCP, many researchers from different disciplines are able to use this corpus to do the aggregate work which is one feature of digital humanities. Digital resources many times smaller than EEBO are also of use to researchers across the humanities, and any bright line between digital history and digital humanities is of questionable value.

    The Spanish historian Anaclet Pons divides the history of what we now call digital humanities into three eras:

    1. The ‘heroic’ era of work by pioneers

    2. Pre-web work when computers were becoming widespread and the field was known as ‘humanities computing’

    3. The web-enabled digital humanities era, characterised by abundance

    Before we even get to the pioneers of the computing age, a literary example of work that would now be done by computing is T. C. Mendenhall’s ‘The characteristic curves of composition’ (1887). As described by Geoffrey Rockwell and Stéfan Sinclair, Mendenhall produced graphs of word length frequencies for authors in an attempt to prove that Francis Bacon wrote the works of Shakespeare.¹⁰ This was not the first visualisation by any means – William Playfair, working in the eighteenth century, is credited by the Oxford Dictionary of National Biography as the inventor of ‘three fundamental forms of statistical graph – the time-series line graph, the bar chart, and the pie chart’¹¹ – but it is a striking echo of work done today. To prove the point, Rockwell and Sinclair have recreated Mendenhall’s work in a downloadable Jupyter notebook.¹²

    However, the first of the early digital humanists proper was probably Josephine Miles. A scholar of English poetry, particularly of the seventeenth century, Miles pioneered computational approaches to literature. Most notably, she worked on a concordance (an alphabetical index to all the words used in a work, used as a finding aid) to the poetry of John Dryden using punched cards and early computers.¹³ Miles commented pithily on this process and its advantages:

    Three problems were primary: the bulk of the work, the cost of publication, the difficulty of accurate checking by assistants unfamiliar with the material. The decision to use IBM machines as an aid in checking helped solve the other problems in turn.¹⁴

    Better known, but slightly later, is the work of Roberto Busa. He used an IBM computer to make possible the indexing of the voluminous works of Thomas Aquinas. Appropriately, the output of this project, the Index Thomisticus, followed a sequence found with many long-lived digital humanities projects: it was published first as a multi-volume print edition, then as a CD-ROM, followed ultimately by a web version.¹⁵

    The beginnings of digital history itself are strongly associated with two historical movements: the quantitatively focused and largely American cliometrics movement and the French Annales school. Cliometrics lends itself particularly to questions of economic history and population history, placing great value on statistical methods and approaches. The article ‘The economics of slavery in the antebellum South’ by Alfred Conrad and John Meyer, published in 1958, is a foundational document for cliometrics’ mathematical approach to historical questions. Here Conrad and Meyer argue that, absent the US Civil War, slavery would have continued in the South, because it was profitable. The novelty of the paper, however, lay in their methodology rather than their conclusion. The authors tested their ‘hypothesis’ by examining data including prices for slaves and cotton, output per slave, life expectancy and reproduction rates among slaves (the language of the article is studiedly neutral on the horrors that lay behind the figures). Interestingly, given that a theme of our chapter is the way that wider technological change influenced historical practice, a criticism Conrad and Meyer make of previous approaches to this question is that:

    the debate over the value of the different constituent pieces of information reconstructs in embryo much of the historical development of American accounting practices.¹⁶

    Annales historians, although they differ somewhat in approach and interests, have focused on the longue durée of economic and social history and so, as with advocates of cliometrics, naturally took an early interest in how computing could make this work less labour intensive. Perhaps the most enthusiastic of the group was Emmanuel Le Roy Ladurie, who was clear that computers are not simply labour-saving devices, but that they altered the direction of historical study: where previously ‘the massive extent of the documents seems to have paralysed researchers’, now (he was writing in 1970), ‘modern techniques … permit a genuine historiographical revolution’.¹⁷

    An early example of the use of computing in British historical research was the work of Roderick Floud. While doing doctoral research in 1965, Floud obtained extensive historical records from an engineering firm and used a computer to analyse the vast quantity of material he had acquired. It is probably relevant that, as well as being an economic historian, Floud was a proponent of quantitative and econometric history: early historical work using computers often had to focus tightly on number crunching because computers were primarily designed for this function at that time. Floud recalls that, while there was hostility to cliometrics itself among some historians, this did not extend to the use of computers: ‘computing … was really seen as being a natural extension of other forms of historical scholarship’.¹⁸

    This distinction of Floud’s makes it difficult to disentangle much of the criticism of computational approaches to history which was published in the next couple of decades. The actual target seems often to be the
