What are the digital humanities?

By David M. Berry

Digital humanities are at the leading edge of applying computer-based technology in the humanities. Initially called ‘humanities computing’, the field has grown tremendously over the past 40 or so years. It originally focused on developing digital tools and creating archives and databases for texts, artworks, and other materials. From these initial uses, and as computation developed, computers offered increasingly sophisticated ways of handling and searching digitised culture. For example, with recent advances in digital imaging, it is now possible to produce very high-quality reproductions of books and artworks that can transform our ability to study them.

Pianist Shin Suzuma uses Syncphonia, a digital score app for ensembles, powered by University of Sussex research funded by the AHRC.

The key to understanding the digital humanities is to reject the idea that digital technology is invading the academy. Computers were used for humanist ends from very early on in their history, and not only, as one might expect, as mere storage for large libraries of text. Computer networks, particularly the internet, have also enabled digital files to be used from almost anywhere on the globe. This access to information has had a tremendous effect on the ability to undertake research in the arts and humanities.

Digital humanities incorporate key insights from languages and literature, history, music, media and communications, computer science, and information studies, and combine these different approaches into new frameworks. More recently, the disciplinary focus has widened to include critical digital studies, as well as fields more commonly associated with engineering, such as machine learning, data science and artificial intelligence. Indeed, as early adopters of technology, digital humanists were prescient in seeing that computation would become increasingly central to research in the humanities.

As part of their work, digital humanists have developed new methods, such as computer-based statistical analysis, search and retrieval, topic modelling, and data visualisation. They apply these techniques to archives and collections that are vastly larger than any human researcher or research group can comfortably handle. These methods enable ambitious projects in which large interdisciplinary teams are brought together to work on difficult or complex questions. Digital humanists are transforming the idea of what a humanities research project can be, giving us new ways of seeing past and present cultures.

These new collections of historical or literary artefacts are often publicly available on the web or in digital databases, and the material they contain is more openly available than was previously possible with print. They make it easier for humanists to combine data sets, social media, sound, web and image archives, and to move between them with ease. Equally crucial has been the creation of software for analysing, understanding and transforming these digital materials. Digital tools can also be freely accessed over the internet, so they can be easily incorporated into other projects, enabling the rapid diffusion of new methods, tools and ideas across disciplinary boundaries. These digital technologies open up exciting opportunities for connecting the humanities to a wider public culture.

The social network Facebook has authorised giants like Amazon, Netflix, Spotify and Microsoft to access the personal data of its 2.2 billion users, according to the ‘New York Times’. Photo by Chesnot/Getty Images.

However, with the greater diffusion of digital technologies into our lives, new concerns have arisen about the capacity these technologies have to spy on their users, about digital bias and discrimination, and about the emergence of ‘fake news’. Companies such as Facebook, Apple, Amazon, Netflix and Google use our data in very intrusive ways, making the collection of both public and private data a matter of public concern. Here too the digital humanities, with their expertise across many knowledge areas, can help us understand these problems and provide critical interventions and policy insights.

The academy is now much more comfortable with the use of computation across disciplines. It has brought new powers of analysis, comparison and understanding to a range of research areas. The digital humanities have been exemplary in transferring digital techniques and methods into the humanities and by doing so have laid the ground for a golden age of humanities research in the 21st century. In a digital age, the humanities need to communicate humanistic values and their own contribution to public culture more than ever. The humanities continue to ask the important question: what is a life worth living? The digital humanities are part of this tradition, helping us to reflect on this question and expanding our understanding of human culture in a digital world.


David M. Berry is Professor of Digital Humanities at the University of Sussex. He writes widely on the theory and philosophy of computation and algorithms. His most recent book is Digital Humanities: Knowledge and Critique in a Digital Age (with Anders Fagerjord). His forthcoming British Academy-supported research is concerned with the idea of a university in a digital age.


This post originally appeared on the British Academy blog.

Data Cleaning and Preparation

Earlier today Ben Jackson gave the first in this semester’s series of digital methods open workshops. Here are a few rough notes on what we covered. If you missed the workshop and want to try out some of it on your own, you can find the tasks here. For details of forthcoming workshops, go here. All workshops are free and open to everyone (but it helps if you register).

Ben started us off with a rapid hurtle through some of his recent and ongoing projects (slides), including his collaboration with Caroline Bassett exploring ways of analysing and visualising Philip K. Dick’s writing (counting electric sheep, baa charts), and work bringing to life the text data of the Old Bailey Online. (It’s a tremendously rich archive of nearly 200,000 trials heard at the Old Bailey between 1674 and 1913). Bringing to life, and also bringing to unlife: Ben uses a kind of estrangement effect to remind the observer of what the data isn’t telling us, populating his legal drama puppet show with a cast of spoopy skellingtons.

Ben Jackson puppet show

Most of the workshop was a free exploration of prompts and tools Ben pulled together. People basically tried out whatever they liked, while he glided from table to table rendering assistance.

Calibre is a free ebook manager that is also a bit of a Swiss army knife, and it just so happens one of its fold-out doohickeys is a very good ebook-to-plain-text converter. Ebook files (.AZW, .EPUB, .MOBI etc.) are stuffed with all kinds of metadata that usually needs to be cleared away before you can do any analysis on the raw text itself. We also did something similar with another free tool, AntFileConverter, turning PDFs into plain text. The lesson was that documents can be ornery and eccentric, and different converter tools will work differently and give rise to different glitches: there is "no single converter that will just magically work on every document."
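(Side note: if you ever have a whole folder of ebooks to get through, Calibre also ships a command-line version of its converter, ebook-convert, which you can script. Here's a minimal Python sketch, assuming Calibre is installed and ebook-convert is on your PATH; the folder names are just placeholders.)

```python
# Batch-convert a folder of ebooks to plain text with Calibre's ebook-convert.
# Assumes Calibre is installed and "ebook-convert" is on your PATH;
# "ebooks/" and "plaintext/" are placeholder folder names.
import subprocess
from pathlib import Path

source = Path("ebooks")
target = Path("plaintext")
target.mkdir(exist_ok=True)

for book in source.glob("*"):
    if book.suffix.lower() in {".epub", ".mobi", ".azw", ".azw3"}:
        out = target / (book.stem + ".txt")
        # ebook-convert works out the input and output formats from the file extensions
        subprocess.run(["ebook-convert", str(book), str(out)], check=True)
```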

AntFileConverter is part of a family of tools. We also checked out TagAnt and AntConc. I feel like I only scratched the surface of these. TagAnt creates a copy of a text file with all the grammatical parts of speech tagged. So if you input something like “We waited for ages at Clapham Junction, with the guard complaining about people blocking the doors” you get something like “We_PP waited_VVD for_IN ages_NNS at_IN Clapham_NP Junction_NP ,_, with_IN the_DT guard_NN complaining_VVG about_IN people_NNS blocking_VVG the_DT doors_NNS ._SENT” as output. PP is a personal pronoun, VVD is a past tense verb, IN is a preposition or subordinating conjunction, and so on. By itself this just seems to be an extremely pedantic form of vandalism. It does let you fairly easily find out if, for example, an author just loves adverbs. And tagging parts of speech could be the first step toward more interesting manipulations, for creative purposes (shuffle all the adverbs) and/or analytic purposes (analysis of genre or authorship attribution).
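If you want to see the same idea in a script rather than a GUI, here's a rough Python sketch using NLTK instead of TagAnt. The output above looks like the TreeTagger tagset; NLTK's default tagger uses Penn Treebank tags instead (VBD rather than VVD, and so on), so the labels differ slightly, but the principle is the same.

```python
# Roughly the same idea as TagAnt, sketched with NLTK rather than TreeTagger.
# NLTK's default tagger uses Penn Treebank tags (e.g. VBD rather than VVD).
import nltk

# These resources are needed the first time round (exact names can vary
# a little between NLTK versions).
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

sentence = ("We waited for ages at Clapham Junction, "
            "with the guard complaining about people blocking the doors")
tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
print(tagged)   # [('We', 'PRP'), ('waited', 'VBD'), ('for', 'IN'), ...]

# Does this author just love adverbs? With tagged text, that's easy to ask.
adverbs = [word for word, tag in tagged if tag.startswith("RB")]
print(len(adverbs), "adverbs")
```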

AntConc allows you to create concordances. A concordance is (more or less) an alphabetical list of key terms in a text, each one nestled in a fragment of its original context. So it’s a useful way to browse an unreadably large corpus based on some particular word (and so to some extent some particular theme) that interests you. Sure Augustine had stuff to say about sin and grace, but what did he think about, I don’t know, fingers?

Fingers

So a concordance helps you to find sections you might want to read more thoroughly. But I guess it doesn't just have to be used like that — like a kind of map, or a very comprehensive index — but could also be read in its own right, and that reading could constitute a legitimate way of encountering and gaining knowledge of the underlying text.

How might, for instance, reading every appearance of the word “light” constitute its own way of knowing how the term “light” is working within a text? Are such readings reliably productive of knowledge? Or is it more like you might get lucky and stumble on something intelligible, like how a particular word is being tugged in distinct, divergent directions by two different discourses it’s implicated in?

How do these tools actually work? Well, going by the name and a logo, a really fast clever ant just does it for you. Thanks ant!
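More honestly: the core of a concordancer is surprisingly simple. Here's a toy keyword-in-context sketch in Python. It isn't how AntConc actually does it, just the general shape of the thing (the filename is a placeholder).

```python
# A toy keyword-in-context (KWIC) concordance: find every occurrence of a
# word and print it nestled in a window of surrounding words.
import re

def concordance(text, keyword, window=5):
    tokens = re.findall(r"\w+", text.lower())
    keyword = keyword.lower()
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left:>40} | {keyword} | {right}")
    return lines

# "confessions.txt" is a placeholder: any plain-text file will do.
with open("confessions.txt", encoding="utf-8") as f:
    for line in concordance(f.read(), "fingers"):
        print(line)
```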

AntConc screenshot

 

Voyant Tools is a web-based reading and analysis environment for digital texts. What does that mean in practice? When you feed it your text file, a bright little dashboard pops up with five resizable areas. Each one of these contains a tool, and you can swap different tools in and out. I’d guess there are about fifty or so tools, although I’m not sure how distinct they all are really.

Voyant Tools screenshot 1

At least one tool was very familiar: “Cirrus” in the top left corner makes a word cloud of the text you’ve inputted, with the most frequent words appearing the largest. Very common words like “a” and “the” are filtered out (in the lingo, they are “stopwords”). The bottom right tool, “Contexts,” was also pretty familiar, since it seems to be a concordance, like we’d just been doing in AntConc. “Summary” and “Trends” were pretty self-explanatory. “TermsBerry” required a bit more poking and prodding. It clusters the more frequent words near the middle, the rarer words round the edges. When you hover your mouse pointer over a word, some of the other drupelets light up to show you what other words tend to appear nearby. You can mess with the thresholds and decide exactly how close counts as “nearby.”
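Under the hood, something like Cirrus is mostly just word counting with a stopword list. Here's a rough sketch of that idea in Python; the stopword list is a tiny placeholder (real tools ship much longer ones) and the filename is made up.

```python
# Roughly what sits underneath a word-cloud tool like Cirrus: count word
# frequencies, throwing away stopwords first.
import re
from collections import Counter

# A tiny placeholder stopword list; Voyant's own is much longer.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "it", "is", "was",
             "i", "for", "with", "that", "on", "at", "but", "we", "you"}

def top_terms(text, n=20):
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(n)

with open("mytext.txt", encoding="utf-8") as f:  # placeholder filename
    for word, count in top_terms(f.read()):
        print(f"{word:15} {count}")
```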

The “Topics” tool looks interesting. It starts with random seeds, then builds up a distinct word cluster around these seeds based on co-occurrence, and then tries to work out how these word clusters are distributed throughout the text. Each word cluster (or “topic”) technically contains all the words in the text, but each one is named after the top ten terms in the cluster. A few of these seem knitted together by some strong affect (“bed i’ve past lay depression writing chore couple suffering usually”) or a kind of prosody or soundscape (“it’s daily hope rope dropped round drain okay bucket bowls”). Others feel tantalisingly not-quite-arbitrary, resonant with linkages in the same way a surrealist painting is (“bike asda hard ago tried open bag surprisingly guy beard”). But I’m not sure how far I trust my instincts about these artefacts, and I definitely don’t yet know how they might be used to deepen my knowledge of a text, or how they relate to various notions you might invoke in a close reading (theme, conceit, discourse, semantic field, layer, thread, note, tone, mood, preoccupation, etc.).
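For the curious, the usual technique behind tools like this is some flavour of topic modelling such as LDA (latent Dirichlet allocation). I don't know exactly what Voyant runs under the hood, but here's a rough sketch of the same idea using scikit-learn, treating each paragraph of a (placeholder) text file as a document and printing the top ten terms per topic.

```python
# The general idea behind a "Topics" tool, sketched with scikit-learn's LDA
# rather than Voyant's own implementation. Documents here are the paragraphs
# of a single text file; "mytext.txt" is a placeholder filename.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

with open("mytext.txt", encoding="utf-8") as f:
    docs = [p for p in f.read().split("\n\n") if p.strip()]

# Bag-of-words counts, dropping stopwords and very rare/very common terms
# (min_df=2 assumes a reasonably long text; lower it for small files).
vectorizer = CountVectorizer(stop_words="english", max_df=0.9, min_df=2)
counts = vectorizer.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=10, random_state=0)
lda.fit(counts)

# Name each topic after its ten most heavily weighted terms.
terms = vectorizer.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top = [terms[j] for j in topic.argsort()[-10:][::-1]]
    print(f"topic {i}: {' '.join(top)}")
```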

The various tools on your Voyant dashboard also seemed to be linked, although I didn’t get round to fully figuring that out. Definitely whenever I clicked on a word in the “Reader” tool the other displays would change. Oh: and Voyant Tools seems to be pretty fussy, and didn’t want to run on some people’s laptops. I didn’t have any trouble though.

I got a bit sucked into trying to work out what the “Knot” tool does — it’s this strange rainbow claw waving at me — and didn’t spend much time on the last exercise, which was about regular expressions (or regex). Basically, these are conventions which let you do very fancy and complicated find-and-replace routines. You can search for something like ‘a[a-z]’, which will match aa, ab, ac, ad, etc. Or (one of Ben’s examples) by replacing <[^>]+> with nothing, you can clear out all the XML tags in a text document. Plain old Word supports a limited version of this (check the ‘Use wildcards’ box in the find-and-replace dialogue), but proper regular expressions work better in a text editor like Atom or Sublime Text.
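For the record, here are those two patterns scripted in Python, for anyone who'd rather run them than click through a find-and-replace dialogue.

```python
# The two regex examples from the workshop, run in Python's re module
# rather than an editor's find-and-replace dialogue.
import re

text = "<p>We waited for <em>ages</em> at Clapham Junction.</p>"

# Replace <[^>]+> with nothing to strip out all the XML/HTML tags.
stripped = re.sub(r"<[^>]+>", "", text)
print(stripped)  # We waited for ages at Clapham Junction.

# 'a[a-z]' matches an 'a' followed by any lowercase letter: aa, ab, ac...
print(re.findall(r"a[a-z]", stripped))  # ['ai', 'ag', 'at', 'ap', 'am']
```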

“The purpose of this part of the task is to teach you how to use them, not to teach you how to write them.” Phew! For me, regular expressions never seem to stick around very long in my memory, but it’s very useful to know in broad terms what they’re capable of. Every now and then a task pops up in the form of, “Oh my God, I have to go through the whole thing and change every …” and that’s my cue to start puzzling and Googling and figuring out whether it can be done with regular expressions. If it can, it will probably be quicker and more accurate, and it will definitely be more satisfying.

So: plenty explored, plenty more to explore. And I’m looking forward to the next workshop, Archival Historical Research with Tropy, on 19 February.

JLW