Sussex Humanities Lab Open Thursdays

SHL Open Lab Thursdays are an informal opportunity for co-working, experimentation, collaboration, and dialogue (also in the garden when the weather allows) facilitated by SHL Research Technician Alex Peverett.

We invite people to come and use the space and meet others engaged with: Technology, Creative Practice, Hacking, Making, Experimental Technology, Critical Making, Techno Feminism, Gaming, Media Archaeology, Music, Digital Art, Practice as Research, and more.

These informal sessions follow on from last term's ECT maker meetups for experimental & creative technology, where Sussex students and researchers met, co-worked, and swapped skills.

Drop in, no booking required. All welcome!

Sussex Humanities Lab, Silverstone, SB211

Climate Crisis and the Digital Humanities

To coincide with COP26 in Glasgow this year, SHL is jointly running two special events with the Edinburgh Centre for Data, Culture & Society, the University of Southampton DH, and the Humanities & Data Science Turing interest group. This is just a “save the dates” announcement: more info and registration will be coming soon.

November 3rd, Digital Materialities, Digital Imaginaries

16:00-17:30 GMT Online

More information and registration here. Our speakers will explore the materiality of the digital, and ask what can be done at all levels to make our digital world more sustainable.


  • Heba Amin
  • Nathan Ensmenger
  • Wilko Hardenberg
  • Helen Pritchard

Which emerging digital technologies might also play a role in mitigating and adapting to climate change, and where do the perils and pitfalls lie? How might digital technologies even change the way we think about ‘the human’ and our place within the planetary ecology? And what are the biggest questions we should be asking ourselves about digital technologies today?

This event forms part of the Sussex Humanities Lab’s Open Workshop Series and CDCS’s Autumn Seminar Series. It is open to all.

Greening the Digital Humanities

November 10th, 16:00 – 17:30 GMT  Online

While the first joint event (Nov 3rd) focuses on theories, perspectives, principles, and inspirations, the second event shifts to thinking about practice, collaboration, and community-building. In it we will discuss the issues we have encountered, the problems our community can solve, and assemble actions we will take together.

The core participants will be DH researchers, along with other key stakeholders. Together we will explore: How do we, as individuals and as organisations, address the environmental dimension of the digital technologies we use? Where should we turn to for best practice? Where is research and innovation most urgently needed, and who should be doing it? What are the politics and ethics of greening the digital, and where do the biggest controversies lie? What challenges, risks and trade-offs might we face?  As Margaret Atwood has suggested, “climate change” might be better termed “everything change.” How might the transition to a zero carbon economy transform what we do, and how we define our own expertise and responsibilities? In a world where so much is digitally entwined, and ecologically interconnected, how can knowledge and responsibility be justly distributed?

The 10th November workshop is invite-only. If you are interested in participating, please contact us and briefly outline your interest.

A Practical Zoom Session on Digitisation

By James Baker.

The Background

At Sussex we (and it is very much we, with Sharon Webb my main co-conspirator) run a program of weekly sessions across Year 1 of our History BA on ‘digital skills’. These start off very un-digital, with lectures on ‘what is history?’ and how historians write, and sessions on why and how to reference (with a focus, of course, on Zotero). But as the year goes on they gradually transform into sessions on what it means to do History in the digital age, why we need to be ‘critical digital’ historians, and how to do Digital History in a critically informed manner.

We’ve been running this for 5 years. I wrote a short paper on it in the early days of the programme. And in 2019 we won the Royal Historical Society Innovation in Teaching Award for it. In practical terms, the sessions are a one-hour timetabled lecture slot embedded in two Year 1 core (compulsory) modules: the Early Modern World in the Autumn term and the Making of the Modern World in the Spring term. This means we get the whole History BA and joint-honours cohort (so 100+ students) every week for a year in a foundational period for their historical practice.

The Challenge

Feedback over the years has pushed us more and more towards practical sessions paired with lectures: so, one lecture on the history and theory of visualisation followed by two practical sessions in which the students do visualisation and are given questions to consider, usually some variant on ‘what is the relationship between X (a search result, a catalogue record, a visualisation) and historical reality?’. In the beforetimes running these practical sessions meant students gathering into small groups, distributing handouts, gearing tasks to mobile (mindful of inequalities of access to laptops), and lots of running up and down stairs to support learning and answer questions. Remote learning has changed that, and has seen us move towards pre-session tasks followed by live Zooms with breakouts, Q&As, and the like, and lots of opportunities for formative assessment – “put in the chat what you think was the main theme of the last lecture” – to check progress and comprehension (inspired by the Carpentries pedagogical model). This has mostly gone well, but one session was always going to be tricky: the practical session on digitisation.

This session follows a lecture on the history/process of digitisation (with a focus on the politics of why some things get digitised and not others), and is part of a strand of sessions on how primary sources are ‘made’ (that is, that they don’t just fall into the hands of galleries, libraries, archives and museums, rather they make their way from then to now via a series of historical choices and processes). The session is intended to encourage students through practice to see the gap between photographs of historical objects and historical objects, to consider the labour that mediates their access to the past, and to recognise that digital images are always faulty representations of primary sources. In the beforetimes this involved volumes of the London Illustrated News (kindly donated by Tim) being handed out and photographed, handouts, and lots of running around.

The Plan

We were keen to keep the session this year but I must admit to being initially flummoxed about how to do it remotely. Eventually I landed on the following.

First, I designed a task for them to do before the live Zoom session and put some instructions on the Canvas site (our VLE) for the module.

In this practical session we will consider the digitisation of primary sources and what that means for us historians. In advance of the session please complete the following task:

  • Find the oldest thing you own or have to hand (don’t worry too much about what it is or how old it is, just pick something with which you can go through the analysis process)
  • Find a way of lighting your ‘oldest thing’ and take one or more pictures of it with your camera
  • Do a sensorial analysis of your ‘oldest thing’: what does it smell like, what does it sound like, what does it feel like (though please don’t lick it to find out what it tastes like!). Make a list.
  • Consider the differences between the photograph you took and the object. Specifically, imagine you’d only ever seen the photograph of the object, then put each of the things from your list into one of three categories and make a note of which category has the most things in it:
    1) I would have known this just by looking at a photograph of this object.
    2) I could have probably guessed this just by looking at a photograph of this object.
    3) I wouldn’t have known this just by looking at a photograph of this object.
  • If you need some inspiration, watch the video – below – of me analysing the title page from Rowlandson’s Caricature Magazine, Vol. 5 (1808), and consider the difference between the object, this ‘raw’ photograph of it, this edited version of the same photograph, and this version of it catalogued by the Met Museum.

The class

We then had the live session. The moment of truth was a poll on the results of their analysis, to which I added an Option 4: ‘I did not complete the task’. Thankfully only a handful went for Option 4 (I told them it was fine to do so, so I have no reason to think they were being dishonest), and nearly half went for Option 2: in most cases they think they could have probably guessed a feature of their ‘oldest thing’ just by looking at a photograph of it.

I then asked for volunteers to talk a little about their ‘oldest thing’, what they found out about it by undertaking a sensorial analysis, and any interesting features that could not be known from the photograph alone. And as usual for Zoom sessions with open questions to a large group, I gave them two options: ‘x’ in the chat if they wanted to speak or type an answer in the chat if they’d prefer not to. I was overwhelmed by responses. The students had found, analysed and digitised a huge variety of objects: old books, posters, a chest of drawers, a board game, a bracelet they were given as a baby. And they had studiously and carefully considered the task at hand. One student told me that they might have guessed from a picture that their item of jewellery felt cold to touch, but not its weight (which they realised was surprisingly heavy). Another noted that from an image they might have assumed their boardgame was the size of a monopoly or scrabble board, but it was actually closer to A5 in size, which might say something about its use. We had a discussion about coins: how we might need to forget what we think we know about coins based on modern coins when looking at images of historic coins (e.g. coins might be softer in certain periods and places, hence coin clipping). We talked about signs of wear and use and how a well lit photograph might render those difficult to see. We talked about how smell might change over time and form during poor storage rather than when an item was in use. We talked about the fragile sound of thin paper and what that might mean for the cost or anticipated longevity of an object. We even digressed into a discussion of “the cloud”, where data is stored, and the material infrastructures of the internet.

The Future

In short we had a fantastic session and – as with the best teaching experiences – my lesson plan was largely discarded, replaced after the poll with an organic discussion driven by the enthusiasm and curiosity of the students who attended. As someone whose research traverses the digital and the material – the loss when datafied of the circumstances of production encoded in the architecture of a printed catalogue; the value of medium to understanding the message held in a digitised “Golden Age” satire – I care deeply about our students being able to navigate what it means to be a historian now. Sessions like these can be a hard sell to first year students who came to university thinking they would study the past rather than how to study the past. Sometimes when reading the feedback they give there are moments of doubt. But for every one of those there is the second year who pops by my office asking for some help with Zotero, or the final year student who runs a Twitter harvest to analyse commemorative practices, or the graduate who acknowledges going back to their Year 1 ‘digital’ lectures when analysing digitised primary sources for their dissertation. And then there are weeks like this, when a plan comes together and I’m left feeling inspired. By this time next year I sincerely hope that it will be safe to return to the classroom, to run lively and interactive practical sessions once again. But there are some aspects of this year of remote learning that I want to keep, and this session – in some form – is one of them.

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Exceptions: quotations and embeds to and from external sources.

Some Upcoming Events

Just a glimpse of some upcoming events from SHL:

Text Analysis with Antconc, with Andrew Salway. Wednesday, 24 February at 15:00 GMT. “This workshop is for researchers who would like to use automated techniques to analyse the content of one or more text data sets (corpora), and to identify their distinctive linguistic characteristics and reveal new potential lines of inquiry. The text data could comprise thousands to millions of words of e.g. news stories, novels, survey responses, social media posts, etc.” More info here. Part of the SHL Open Workshops Series.

Dataset Publishing and Compliance, with Sharon Webb and Adam Harwood. Wednesday, 3 March at 15:00 GMT. “Funding bodies are placing increasing emphasis on data archiving in humanities research. The workshop will have a practical emphasis, aimed at helping you prepare data for deposit into a data archive or repository, to comply with grant applications requirements.” More info here. Part of the SHL Open Workshops Series.

Reality is Radical: Queer, Avant-Garde, Utopian Gaming, with Bo Ruberg, Amanda Phillips, and Jo Lindsay Walton. Monday 8 March at 17:00 GMT. “The Sussex Humanities Lab and the Sussex Centre for Sexual Dissidence are pleased to welcome leading critical game studies scholars Amanda Phillips and Bo Ruberg to explore the politics of contemporary games. Games themselves are a major cultural form, and the ‘ludic turn’ in recent years has also seen game design thinking and critical play practices spill out into many areas of social and economic life.” More info here. Part of the SHL Seminar Series.

Coming to Terms with Data Visualization and the Digital Humanities, with Marian Dörk. “How can visualization research and design be inspired by concepts from cultural studies, sociology, and critical theory? In contrast to the epistemological hegemony that engineering and science has held over data visualization, humanistic engagements with data and interfaces suggest different kinds of concerns and commitments for the study and design of data visualizations. From collaborative research in the arts and humanities arises a need to support critical and creative engagements with data and visualization.” More info here. Part of the SHL Seminar Series.

For more events, see the SHL website.

Open Workshop: Text Data Preparation

UPDATE: As you may have guessed, this event has been postponed for the foreseeable future.
Wednesday 19 February 15:00 until 17:00
Digital Humanities Lab, Silverstone
Speaker: Jack Pay
Part of the series: SHL Open Workshops

The workshop is most suitable for staff and postgraduate students, but all are welcome.

An eventbrite to sign up is available here.

The purpose of this workshop is to introduce researchers and interested parties to two key aspects of data preparation. A common problem when starting large-scale processing of text is that it can be noisy, hard to analyse, or not structured in a machine-readable manner.

In this workshop we will cover two common examples of problematic texts: crawled or downloaded web documents composed in html and (poorly) OCR’d texts taken from some historical corpus. The purpose of using these examples is to introduce participants to the tools and methods used in web-scraping and data wrangling.
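The first kind of problem text, HTML documents, usually needs its markup stripped away before analysis. This is not the workshop's own material, just a minimal sketch of the idea using only the Python standard library (the sample `page` string is invented):

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect only the visible text, skipping script and style blocks."""
    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())

def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.parts)

page = "<html><head><style>p{color:red}</style></head><body><p>Old Bailey, <b>1674</b>.</p></body></html>"
print(html_to_text(page))  # Old Bailey, 1674 .
```

Real crawled pages are far messier (navigation, adverts, boilerplate), which is exactly why the workshop treats them as a worked example.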

The workshop will comprise a presentation and a semi-practical session: the presentation will introduce the key problems and approaches to solving them, and the practical session will walk through an illustrative example solution.

This workshop is not intended as a complete tutorial on how to prepare data, but serves as an introduction to provide participants with the information and knowledge of the potential tools to begin working on these problems themselves.

If you have any questions about this workshop, please contact the convenor, Ben Roberts.

SHL Workshops 2018-2019

The Sussex Humanities Lab offered a series of free skills-building workshops in the Digital Humanities. These took place on selected Wednesdays between 15:00 and 17:00, in the Lab.

15:00, 6 Feb – Clean Your Data. When datasets arrive, they are often vast and messy … and machines can be very fussy readers! In our first 2019 workshop, Ben Jackson explores a range of tools and techniques to tidy and transform text data, readying it for all kinds of analysis and adventure. Find out more or book a place. Note: Ben’s February workshop is a repeat of the workshop he ran in September 2017 to universal acclaim. (Update: notes from the workshop).

15:00, 19 Feb – Archival Research with Tropy. Collecting and managing photographs from archives can be time-consuming. In our second 2019 workshop, Sean Takats will introduce you to the free software package Tropy, and help you to manage the process of taking, saving, and describing photos of archival materials. Find out more or book a place. Note: this one’s on a Tuesday. (Update: notes from the workshop).

15:00, 13 March – Web Scraping with Wget. Wget is a very handy program for retrieving or ‘scraping’ material from the web. In this workshop, James Baker introduces your computer’s command line interface, and shows you how to write simple scripts to automate bulk-downloading from the web. At the end of the workshop, James will support you in scraping a website of your choice. You’ll also learn about The Programming Historian, a fantastic resource to gain more digital skills in the future. Find out more or book a place.

15:00, 3 April – Dataset Publishing and Compliance. Data repositories are already an important part of humanities research, and increasingly a requirement of humanities funding. In our fourth workshop, Sharon Webb and Adam Harwood will dive into the practicalities and best practice of research data management. We’ll explore how best to use metadata to describe and organise digital objects, touch on issues within digital preservation, and learn how to use current university infrastructure to deposit datasets. Find out more or book a place. (Update: notes from the workshop).

15:00, 10 April – Network Visualization. In an increasingly interconnected world, the network has emerged as a major category of analysis for humanities and social sciences research. In our fifth 2019 workshop, Andrew Salway introduces the popular dataviz tool Gephi, exploring how dry, cryptic datasets can effloresce into colourful significance … and exposing some of the hidden choices that underlie the dataviz we encounter in everyday life. Find out more or book a place.

15:00, 1 May – Conserving Software-Based Artworks. Software-based artworks pose many challenges to conservators. Such art may involve complex systems of interconnected components and actors, aging technologies with a variable degree of significance, and boundaries which extend into their surrounding environment. In this workshop Tom Ensom and Chris King, who work closely with Tate’s collection of software-based artworks, will explore how disk imaging and emulation techniques can be used to address these challenges. Find out more or book a place.

15:00, 15 May – Data Modelling. Workshop led by Sharon Webb and Anna Maria Sichani. Data modelling considers how we interpret “things” and how we expect them to be interpreted by a computer program. This process ensures that we can manipulate, interrogate, and preserve information. This workshop will introduce participants to the theory and practice of data modelling: how do we translate “objects”, or real-world entities, into “data”? Find out more or book a place.

About the series

The Digital Methods Open Workshops series is free and open to everybody. We aim to cater to all levels of ability and experience, both across the series as a whole and (as much as possible!) within individual workshops. The first two events are especially suitable for researchers who are just starting to dip their digits into the digital humanities.

Workshops normally take place in the Lab on Wednesdays 3-5; anyone who wants to can come along from 2 for some extra support in getting set up.

Prior to the 2018-2019 series, past series covered skills and tools related to digital forensics, machine vision, corpus linguistics, web-scraping, augmented reality, the integrated visualization of qualitative and quantitative knowledge, and much much more.

Publishing Your Research Data

Everyone is talking about the season finale. Tomorrow at 3pm in the lab, Sharon Webb and Anna Maria Sichani will be giving the last in this run of Digital Methods Open Workshops, on the topic of data modelling. All are welcome, and there are still places available.

Meanwhile, maybe it’s worth a quick catch-up? So here are a few no-nonsense notes from Sharon Webb and Adam Harwood’s absolutely brimming SHL Open Workshop in April, on the topic of creating and publishing research data.

What is research data?

Sharon started us off. Who should think about publishing research data? Everybody! Yes, you humanities researcher. Data is not just numbers. Of course, what counts as research data varies considerably across different subjects, methodologies, topics, backgrounds, and habits. Some classic examples of research data-sets might be a set of measurements, or interview recordings and transcripts … but it’s also worth thinking more imaginatively and speculatively about what constitutes your research data.

One working definition of data might be all the relatively raw information you generate as a researcher in your processes of abstraction and categorization. Formally, that might include text documents (PDF, Word, RTF), spreadsheets, databases, posters, slide decks, sound recordings of field interviews, online lectures, recordings of engagement and knowledge exchange events, podcasts. That might include software, art, music. That might include metadata — data about data.

It got me thinking … do I have research data I don’t even know about?

Why publish it?

What do you encounter, and what processes do you follow, that might be useful to preserve and document for future research? Where might there be opportunities for citation, for citizen research, for collaboration, for audit and validation? What new research might it make possible? What new research might it inspire? Even, perhaps, what creative and artistic interventions? There is a slightly subversive and democratic aspect to all this: making the data public benefits independent researchers. This was one of the real revelations of the workshop for me: just how much fascinating information is already publicly available.

There is of course a slightly more straightforward and pragmatic aspect to all this: the UKRI funding bodies now ask for a data management plan. For example, an AHRC standard route grant will require a data management plan “for grants planning to generate data (3 A4 pages maximum).” The AHRC have recently done away with their technical plan requirement. Other funders (e.g. Marie Curie) ask for a technical plan, and there may be an assumption that any data management considerations will be included there. 

Funders don’t generally accept the sentence, “My data is available on request.” Of course, there may be legitimate reasons for not making data available. Researchers should be aware of GDPR and the Data Protection Act. “Personal data” means any information relating to an identified or identifiable living individual. DMP Online structures the process of writing a data management plan, drawing on the specific guidance of the chosen funding bodies.

And, fwiw, Sussex also has a policy — “research data should be made freely and openly available with as few restrictions as possible in a timely and responsible manner […] regardless of whether or not the research is externally funded.” That said, there isn’t an actual Research Data Management Police Force roaming campus, as far as we’re aware.

How should I publish it?

One thing to consider is when you will deposit your data. Around the grant-proposal-writing stage, it’s good to build in some time to actually prepare and deposit it. It can be a big chunk of work to get research data ready to be ingested by repositories. It’s not all mindless/mindful gruntwork either: there can be thorny questions around how to curate your data to make it useful for others and for long-term preservation. I can imagine there might be some interesting cross-disciplinary issues arising, and questions about how the framing of data blurs into its analysis and interpretation.

And where to deposit data? “Figshare Sussex probably,” seems to be the short answer. More broadly, it depends who you are affiliated with, and what their policies are. The Research Data Management service (a work-in-progress) may also answer some questions. There are institutional repositories, big generalist repositories, and domain-specific repositories, and there are different governance and funding models (i.e. public vs. commercial). Here are some handy links:

  • UKDataArchive, funded by the ESRC, is “the UK’s largest collection of digital research data in the social sciences and humanities.”
  • is a database of repositories (incomplete, but filled with good starting points).
  • FAIRsharing is another database of repositories, with more of a sciences and medicine emphasis.
  • The Journal of Open Humanities Data features peer-reviewed articles describing data and methodologies with high re-usability.
  • Zenodo is an open-access repository from CERN and the OpenAIRE program, with some similarities to Figshare. It runs on open source software (also called Zenodo).
  • Then there’s Figshare, of which Figshare Sussex is a part.

All Figshare content will be assigned a DOI; CC BY 4.0 is the default license. Figshare also allows you to create and share ‘Collections,’ bringing together relevant datasets (whether or not they’re yours). What you upload to Figshare Sussex will get sucked up to Figshare mothership, which is indexed by Google. 

You can also put on an embargo, a fact that for some reason gave me a lovely frisson of melodrama, “My data shall not be available … for ONE HUNDRED YEARS,” etc., and you can generate private links to share VIP access to embargoed data.

Your data will be backed up to Arkivum, which meets another common funding requirement, that the data will be preserved beyond the lifetime of the project. Arkivum keeps your data in three separate geographical locations. It doesn’t do file format shifts yet, but as part of The Perpetua Project (ominous energy), it eventually will do file format shifts as well.

Further background

The RCUK Concordat on Open Research Data explains precisely what open research data is, and what researchers can do to make their data open and freely accessible. It’s a long document, and Adam picked out a few key bits. The Concordat asserts the right of the creators of research data to reasonable first use. Support for development of appropriate data skills is recognised as a responsibility of all stakeholders — the university has a responsibility to provide useful services (which in our case is Figshare Sussex, as well as the emerging Research Data Management service).

Adam also touched on the FAIR Data Principles, originally intended for the sciences, but now with much wider adoption. Data should be Findable, Accessible, Interoperable, and Re-usable. These are criteria you can measure your data against toward the end of a research project.

We ended with a whirlwind tour of metadata from Sharon. ‘Metadata is a (love) letter to the future — it makes explicit how “things” can be used.’ Anne Gilliland: ‘In general, all information objects, regardless of physical or intellectual form they take, have three features — content, context, and structure — all of which should be reflected through metadata.’ We had a little look at the Dublin Core standard, on which Figshare is based (if you’re going to add fields of your own, it would be best practice to align them to Dublin Core) and a case study, the Re-animating Data Project (70+ interviews carried out in 1989).
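As a toy illustration of aligning your own fields to Dublin Core, here is a sketch that checks a (wholly invented) record against the fifteen elements of the simple Dublin Core standard:

```python
# The 15 elements of the (simple) Dublin Core metadata standard.
DUBLIN_CORE = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

# A hypothetical record for a deposited dataset (all values invented).
record = {
    "title": "Interview transcripts (example dataset)",
    "creator": "A. Researcher",
    "date": "2019-05-15",
    "type": "Dataset",
    "format": "text/plain",
    "rights": "CC BY 4.0",
}

# Which standard elements has this record not yet filled in?
missing = DUBLIN_CORE - record.keys()
print(sorted(missing))
```

Listing the unfilled elements is a cheap way to audit a deposit before it goes into a repository.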

I left with a grateful heart and a brimming brain, forgetting to sign in again. And also a vague unease. I got to thinking about all the data exhaust I leave behind, in the course of my research and my “research,” all the behavioural surplus I scarcely control and could never deposit, and yet which is deposited somewhere in the marketplace of personal data.

And I was thinking about how data at scale tends to disclose more than ever intended. Data analytics discover patterns that can be used as knowledge, and whose status as knowledge is often undecidable in the contexts in which they are used. I was thinking of those robots and researchers who can gaze hungrily at the “About Me” section of your social media profile, seeing not your attempt at self-expression, but only trait correlation with lemma term-frequency–inverse document-frequency (or whatever). Tech giants have learned the art of growing tall and fat on data crumbs; what will they do with data feasts?



Data Cleaning and Preparation

Earlier today Ben Jackson gave the first in this semester’s series of digital methods open workshops. Here are a few rough notes on what we covered. If you missed the workshop and want to try out some of it on your own, you can find the tasks here. For details of forthcoming workshops, go here. All workshops are free and open to everyone (but it helps if you register).

Ben started us off with a rapid hurtle through some of his recent and ongoing projects (slides), including his collaboration with Caroline Bassett exploring ways of analysing and visualising Philip K. Dick’s writing (counting electric sheep, baa charts), and work bringing to life the text data of the Old Bailey Online. (It’s a tremendously rich archive of nearly 200,000 trials heard at the Old Bailey between 1674 and 1913). Bringing to life, and also bringing to unlife: Ben uses a kind of estrangement effect to remind the observer of what the data isn’t telling us, populating his legal drama puppet show with a cast of spoopy skellingtons.

Ben Jackson puppet show

Most of the workshop was a free exploration of prompts and tools Ben pulled together. People basically tried out whatever they liked, while he glided from table to table rendering assistance.

Calibre is a free ebook manager that is also a bit of a Swiss army knife, and it just so happens one of its fold-out doohickeys is a very good ebook-to-plain-text converter. Ebook files (.AZW, .EPUB, .MOBI etc.) are stuffed with all kind of metadata that usually needs to be cleared away before you can do any analysis on the raw text itself. We also did something similar with another free tool, AntFileConverter, turning PDF into plain text. The lesson was that documents can be ornery and eccentric, and different converter tools will work differently and give rise to different glitches: “no single converter that will just magically work on every document.”

AntFileConverter is part of a family of tools. We also checked out TagAnt and AntConc. I feel like I only scratched the surface of these. TagAnt creates a copy of a text file with all the grammatical parts-of-speech tagged. So if you input something like “We waited for ages at Clapham Junction, with the guard complaining about people blocking the doors” you get something like “We_PP waited_VVD for_IN ages_NNS at_IN Clapham_NP Junction_NP ,_, with_IN the_DT guard_NN complaining_VVG about_IN people_NNS blocking_VVG the_DT doors_NNS ._SENT” as output. PP is a personal pronoun, VVD is a past-tense verb, IN is a preposition or subordinating conjunction, and so on. By itself this just seems to be an extremely pedantic form of vandalism. It does let you fairly easily find out if, for example, an author just loves adverbs. And tagging parts of speech could be the first step toward more interesting manipulations, for creative purposes (shuffle all the adverbs) and/or analytic purposes (analysis of genre or authorship attribution).

AntConc allows you to create concordances. A concordance is (more or less) an alphabetical list of key terms in a text, each one nestled in a fragment of its original context. So it’s a useful way to browse an unreadably large corpus based on some particular word (and so to some extent some particular theme) that interests you. Sure, Augustine had stuff to say about sin and grace, but what did he think about, I don’t know, fingers?
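The core of what AntConc does here is a keyword-in-context (KWIC) display, which is simple enough to sketch in stdlib Python. A minimal version (the function name and sample text are mine, purely for illustration):

```python
import re

def concordance(text, keyword, width=30):
    """Return each occurrence of keyword bracketed in `width` chars of context."""
    lines = []
    for m in re.finditer(re.escape(keyword), text, re.IGNORECASE):
        start = max(m.start() - width, 0)
        end = min(m.end() + width, len(text))
        lines.append(text[start:m.start()] + "[" + m.group() + "]" + text[m.end():end])
    return lines

sample = ("In the beginning God created light. And God saw the light, "
          "that it was good: and God divided the light from the darkness.")
for line in concordance(sample, "light"):
    print(line)
```

Real concordancers add sorting on the words to the left or right of the keyword, which is where the interesting browsing happens.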


So a concordance helps you to find sections you might want to read more thoroughly. But I guess it doesn’t have to be used only like that — like a kind of map, or a very comprehensive index — it could also be read in its own right, and that reading could comprise a legitimate way of encountering and gaining knowledge of the underlying text.

How might, for instance, reading every appearance of the word “light” constitute its own way of knowing how the term “light” is working within a text? Are such readings reliably productive of knowledge? Or is it more like you might get lucky and stumble on something intelligible, like how a particular word is being tugged in distinct, divergent directions by two different discourses it’s implicated in?

How do these tools actually work? Well, going by the name and a logo, a really fast clever ant just does it for you. Thanks ant!

AntConc screenshot


Voyant Tools is a web-based reading and analysis environment for digital texts. What does that mean in practice? When you feed it your text file, a bright little dashboard pops up with five resizable areas. Each one of these contains a tool, and you can swap different tools in and out. I’d guess there are about fifty or so tools, although I’m not sure how distinct they all are really.

Voyant Tools screenshot 1

At least one tool was very familiar: “Cirrus” in the top left corner makes a word cloud of the text you’ve inputted, with the most frequent words appearing the largest. Very common words like “a” and “the” are filtered out (in the lingo, they are “stopwords”). The bottom right tool, “Contexts,” was also pretty familiar, since it seems to be a concordance, like we’d just been doing in AntConc. “Summary” and “Trends” were pretty self-explanatory. “TermsBerry” required a bit more poking and prodding. It clusters the more frequent words near the middle, the rarer words round the edges. When you hover your mouse pointer over a word, some of the other drupelets light up to show you what other words tend to appear nearby. You can mess with the thresholds and decide exactly how close counts as “nearby.”
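Under the hood, something like Cirrus boils down to a word count with stopwords filtered out. A minimal stdlib Python sketch (the stopword list and sample sentence are my own, purely for illustration; Voyant's actual stopword lists are much longer):

```python
from collections import Counter
import re

# A tiny illustrative stopword list; real tools ship lists of hundreds of words
STOPWORDS = {"a", "an", "the", "and", "of", "to", "in", "it", "was", "that", "not"}

def top_terms(text, n=5):
    """Return the n most frequent non-stopword terms in text."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS).most_common(n)

sample = ("The light shone in the darkness, and the darkness "
          "comprehended it not; the light was the light of men.")
print(top_terms(sample))  # 'light' and 'darkness' lead; 'the' is filtered out
```

A word cloud is then just this frequency table with font sizes scaled to the counts.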

The “Topics” tool looks interesting. It starts with random seeds, builds up a distinct word cluster around each seed based on co-occurrence, and then tries to work out how these word clusters are distributed throughout the text. Each word cluster (or “topic”) technically contains all the words in the text, but each one is named after the top ten terms in the cluster. A few of these seem knitted together by some strong affect (“bed i’ve past lay depression writing chore couple suffering usually”) or a kind of prosody or soundscape (“it’s daily hope rope dropped round drain okay bucket bowls”). Others feel tantalisingly not-quite-arbitrary, resonant with linkages in the same way a surrealist painting is (“bike asda hard ago tried open bag surprisingly guy beard”). But I’m not sure how far I trust my instincts about these artefacts, and I definitely don’t yet know how they might be used to deepen my knowledge of a text, or how they relate to various notions you might invoke in a close reading (theme, conceit, discourse, semantic field, layer, thread, note, tone, mood, preoccupation, etc.).

The various tools on your Voyant dashboard also seemed to be linked, although I didn’t get round to fully figuring that out. Definitely whenever I clicked on a word in the “Reader” tool the other displays would change. Oh: and Voyant Tools seems to be pretty fussy, and didn’t want to run on some people’s laptops. I didn’t have any trouble though.

I got a bit sucked into trying to work out what the “Knot” tool does — it’s this strange rainbow claw waving at me — and didn’t spend much time on the last exercise, which was about regular expressions (or regex). Basically, these are conventions which let you do very fancy and complicated find-replace routines. You can search for something like ‘a[a-z]’, which will match aa, ab, ac, ad, etc. Or (one of Ben’s examples) by replacing <[^>]+> with nothing, you can clear out all the XML tags in a text document. You can do something similar in plain old Word (just tick “Use wildcards” in the find-replace dialogue), although Word’s wildcard syntax is only a rough cousin of regex proper; regular expressions work better in a text editor like Atom or Sublime Text.
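Both of those patterns can be tried out in Python's re module (the sample strings here are my own, for illustration):

```python
import re

# 'a[a-z]' matches an 'a' followed by any single lowercase letter
print(re.findall(r"a[a-z]", "aardvark at Asda"))  # ['aa', 'ar', 'at']

# Ben's example: replacing <[^>]+> with nothing strips out XML tags,
# i.e. anything between a '<' and the next '>'
xml = "<p>The <hi rend='italic'>guard</hi> complained.</p>"
print(re.sub(r"<[^>]+>", "", xml))  # The guard complained.
```

The same patterns work in the find-replace dialogues of Atom, Sublime Text, and most other programmers' editors.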

“The purpose of this part of the task is to teach you how to use them, not to teach you how to write them.” Phew! For me, regular expressions never seem to stick around very long in my memory, but it’s very useful to know in broad terms what they’re capable of. Every now and then a task pops up in the form of, “Oh my God, I have to go through the whole thing and change every …” and that’s my cue to start puzzling and Googling and figuring out whether it can be done with regular expressions. If it can, it will probably be quicker and more accurate, and it will definitely be more satisfying.

So: plenty explored, plenty more to explore. And I’m looking forward to the next workshop, Archival Historical Research with Tropy, on 19 February.