Bridging the gap between computational linguistics and concept analysis

by Eddie O’Hara Brown

“Bridging The Gap” by MSVG is licensed under CC BY 2.0.

One of our priorities at the Concept Analytics Lab is to utilise computational approaches in innovative explorations of linguistics. In this post I explore the disciplinary origins of computer science and linguistics and present areas in which computational methodologies can make meaningful contributions to linguistic studies.

The documented origins of computer science and linguistics place the two fields in different spheres. Computer science developed out of mathematics and mechanics. Many of its pioneers, such as Charles Babbage and Ada Lovelace, were first and foremost mathematicians, and the major figures in the field’s twentieth-century development, such as John von Neumann and Claude Shannon, were often mathematicians and engineers. Linguistics, on the other hand, developed out of philology and traditionally took a comparative and historical outlook. It was not until the early twentieth century, and the work of linguists such as Ferdinand de Saussure, that major explorations into synchronic linguistics began.

The distinct origins of computer science and linguistics are still visible in academia today. In the UK and at other Western universities, for example, computer science is classed among the STEM subjects, while linguistics often finds a home in the humanities and social sciences. These different academic homes often pose a structural barrier to interdisciplinary study and to the creation of synergies between the two disciplines.

Recent research shows that combining linguistic knowledge with computer science has clear applications within computer science itself. For example, Google Search has used the language model BERT (Devlin et al., 2018) to process almost every English-language query since late 2020. In linguistic research, however, we are only just beginning to take advantage of computational techniques. Natural language processing (NLP) harnesses the power of computers and neural networks to process and analyse large amounts of text quickly. This kind of analysis complements traditional linguistic approaches that involve close reading of texts, such as narrative analysis, discourse analysis, and lexical semantic analysis.

One particularly impressive application of computational linguistics to the analysis of semantic relations is the word2vec model (Mikolov et al., 2013). word2vec converts words into numerical vectors and positions them in a vector space, placing semantically and syntactically similar words close together and dissimilar words further apart. In this way, corpora consisting of millions of words can be analysed to identify semantic relations within hours. However, this information, rich as it is, still needs to be meaningfully interpreted, and this is where the expertise of a linguist comes in. For instance, word2vec may identify pairs of vectors whose distance from each other increased between different time periods. As linguists, we can infer that the words these vectors represent are likely to have changed semantically or syntactically over time. We can then draw on knowledge from historians and historical linguists to explain why that change occurred. We may further notice that similar changes occurred in only one part of speech, or that the change first appeared in the language of a particular author or group of writers. In this way, computer science and linguistics necessarily rely on each other for efficient, robust, and insightful research.
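To make the comparison described above more concrete, here is a minimal Python sketch; it is an illustration only, not the Lab’s own pipeline. It assumes that a separate word2vec model has already been trained on each period’s corpus, and plain dictionaries of toy three-dimensional vectors stand in for those models. The words `term_a` and `term_b`, the vector values, and the function names are all hypothetical. The sketch measures the cosine distance between the same pair of words within each period and reports how much that distance changed.

```python
import math


def cosine_distance(u, v):
    """Return 1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def distance_shift(model_early, model_late, word_a, word_b):
    """Change in cosine distance between two words from the early to the late model.

    A positive value means the pair has drifted apart over time.
    """
    early = cosine_distance(model_early[word_a], model_early[word_b])
    late = cosine_distance(model_late[word_a], model_late[word_b])
    return late - early


# Hypothetical toy vectors standing in for two separately trained models.
# Real word2vec vectors typically have 100 or more dimensions.
model_period_1 = {"term_a": [0.9, 0.1, 0.0], "term_b": [0.8, 0.2, 0.1]}
model_period_2 = {"term_a": [0.1, 0.9, 0.2], "term_b": [0.8, 0.2, 0.1]}

shift = distance_shift(model_period_1, model_period_2, "term_a", "term_b")
if shift > 0:
    print(f"term_a and term_b drifted apart (distance increased by {shift:.2f})")
else:
    print(f"term_a and term_b did not drift apart (change: {shift:.2f})")
```

Because only distances within each model are compared, the two vector spaces do not need to be aligned with each other first; comparing a single word’s vector directly across separately trained models would require such an alignment step. A positive shift simply flags a word pair for the kind of close linguistic and historical interpretation described above.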

At the Concept Analytics Lab, we promote the use of computational and NLP methods in linguistic research, exploring benefits brought by the convergence of scientific and philological approaches to linguistics. 

References

Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova (2018) ‘BERT: Pre-training of deep bidirectional transformers for language understanding.’ Available at https://arxiv.org/abs/1810.04805

Mikolov, Tomas, Kai Chen, Greg Corrado, and Jeffrey Dean (2013) ‘Efficient estimation of word representations in vector space.’ Available at https://arxiv.org/abs/1301.3781

What is Concept Analytics and who are we?

Concept Analytics Lab (CAL) brings together linguists, AI engineers, and historians and is aligned with the Sussex Humanities Lab within the Critical Digital Humanities and Archives research cluster. The principal mission behind Concept Analytics is to understand human thinking by analysing conceptual layering in texts. We overcome the divides between the humanities, AI, and data science by harnessing the power of computational linguistics without losing sight of close linguistic analysis.

Although CAL was formally set up in 2021, it is the culmination of research energies over the previous few years and of our desire for a stable space to explore concept-related ideas with like-minded scholars. Establishing the Lab has given us a platform from which to showcase our research expertise to researchers and other external partners. CAL grew and changed through 2022, during which time we counted on a team of six researchers at a range of career stages, from undergraduate to postdoctoral level. The team is led by Dr Justyna Robinson.

CAL has so far partnered with research groups within Sussex, e.g. SSRP, as well as ones further afield, e.g. the Westminster Centre for Research on Ageing. We have worked closely with archives such as the Mass Observation Archive and the Proceedings of the Old Bailey, as well as with non-academic organisations.

What were the highlights of the past year? 

Our activities in the past year centred on exploring the content of the Mass Observation Project (MOP) and its archive of 12 May Diaries, with the aim of identifying conceptual changes that happened during Covid-19. We completed two main research projects. CAL was awarded funding through the UKRI/HEIF/SSRP ‘Covid-19 to Net Zero’ call, in collaboration with industry partner Africa New Energies, to identify the impact of Covid-19 on people’s perceptions and habits in the context of household recycling and energy usage. CAL was also commissioned by the PETRA project (Preventing Disease using Trade Agreements, funded by UKPRP/MRC) to identify the key themes in, and perceptions of, post-Brexit UK trade agreements held by the public. Keep reading for summaries of the findings of these research projects, as well as our other achievements this year.

Household recycling with Africa New Energy (ANE) 

Through this project we found that respondents to the MOP Household Recycling 2021 directive were deeply committed to recycling, but that this commitment was coupled with doubt and cynicism about the effectiveness of the current system. MOP writers pointed to a perceived lack of transparency and standardisation in recycling processes and systems. Lack of transparency and standardisation have also been identified as obstacles to recycling adherence and efficacy in more policy-oriented analytical surveys (Burgess et al., 2021; Zaharudin et al., 2022). Changes in recycling habits among the UK population were identified as resulting from external factors, such as Covid-19 and reduced services, as well as from a lack of knowledge about how, and what, to recycle. This research has significantly changed the way our grant partner ANE approaches its operations in terms of gaining energy from organic waste. The results also led ANE to start work on gamifying the waste classification process, an initiative that aims to encourage recycling compliance by replacing the current sanction-based system with a more rewards-based one. This shows that CAL already has a track record of establishing commercial routes to impact for its research, and we see extending the scope of this impact as a critical next step in CAL’s research programme. Further details on the collaboration with ANE can be found in this blog post.

We are seeking further HEIF funding to build on our work with the Household Recycling directive and maximise its policy impact, by processing the handwritten responses and by processing the 2022 12 May Diaries for insight into how the current energy crisis is affecting respondents’ behaviour and attitudes to energy. As part of this project, we would hold an exhibition to showcase our work to a range of stakeholders, including policymakers.

MOP UK Trade Deals 

We were commissioned by the PETRA project’s lead, Prof. Paul Kingston of the University of Chester, to perform a conceptual linguistic analysis of the MOP UK Trade Deals directive. We used our approach to identify hidden patterns and trends in the responses to the directive questions. Conceptual analysis allows us to combine quantitative and qualitative methods and to identify patterns that would otherwise go unnoticed. The main themes that emerged related to the perceived quality of trade deals and to concerns about animal and ethical standards. We also performed an analysis of people’s knowledge, beliefs, and desires. The results of the analysis will inform policymakers’ decisions regarding trade deals. Additionally, this work has attracted interest from public health bodies, with whom we are preparing a potential grant application for future research.

Papers and presentations 

In 2022, Justyna Robinson and Julie Weeds presented their work on the Old Bailey archives, and their paper on that work was published in the Transactions of the Philological Society. In this paper they describe a novel approach to analysing texts, in which computational tools turn traditional texts into a corpus of syntactically related concepts. Justyna Robinson and Rhys Sandow have also authored a paper forthcoming in 2023, ‘Diaries of regulation: Mass Observing the first Covid-19 lockdown’. This research will be presented at Mass Observation’s 85th Anniversary Festival, Mass Observation Archive, The Keep, 23rd April 2023.

Website 

As part of the SSRP/HEIF funding we received earlier this year, we have also developed a website, conceptanalytics.org.uk, where we post blogs with news pieces and short research insights.

SHL Priority Areas: Intersectionality, Community and Computational Technology research journey

By Sharon Webb 

In my last blog post I discussed the Sussex Humanities Lab’s priority research areas and the thinking behind their implementation. This time around, I’d like to focus on the priority area that I lead: Intersectionality, Community and Computational Technology (ICCT). ICCT was conceptualised to bring together, and create coherence across, a cluster of activities, and to highlight some of the values that inform the Lab’s work and how we operate as a research community. ICCT also reflects the way in which the Lab, its culture, and its people have offered me a focus that has helped develop and broaden my research profile and the collaborations I am part of.

Since joining Sussex in 2015, I have integrated my work on digital preservation and digital archives with community archives and heritage work. From 2018, motivated by reflection on the explicit feminist values of our original leadership team (particularly Caroline Bassett, Tim Hitchcock, and Rachel Thomson), Cécile Chevalier and I have developed research and teaching that incorporates techno-feminism and intersectional/queer/feminist Digital Humanities, investigating these histories alongside practical and creative interventions such as coding workshops and creative coding initiatives.

More recently, Irene Fubara-Manuel and Sandra Nelson have joined us in these efforts. Both contribute to our ‘Techno-Feminism: History and Practice’ MA module and have developed a programme of work for our ‘Feminist Approaches to Computational Technology’ network called Reflexive Re-Tooling: Alternative Workflows for the Feminist Researcher. Irene is also Co-I on ‘Full Stack Feminism in Digital Humanities’, alongside Cécile and me. (You can follow each project on Twitter: @FACT_ntwrk, @FullStackFem.) Kate O’Riordan, Dean of the School of Media, Arts and Humanities, has also informed the way in which this research and community have developed. It’s also important to acknowledge the inclusive intersectional feminist frameworks, networks, and research that already existed at Sussex: my work and the Lab have benefited from these.

In many respects the ICCT priority area reflects a research journey. It echoes the myriad ways in which we build capacity around clusters of research and build community, connections, and networks that are valuable not only in terms of research output but also in terms of research environments and cultures. The ways in which we manifest our research as individuals become part of a larger collective conversation, and that is the point!

Highlighting and centring community in this area was important. “Community” in this context is not, I hope, empty virtue signalling; instead, it echoes a long tradition of working with community groups at Sussex and at the Lab. It acknowledges that perspectives from outside our academic circles should be included. These perspectives have a place within academic work and are as important as, and sometimes more important than, the perspectives of professional academics. It also encourages us to think more about non-traditional research methods, outputs, and ways of disseminating, about the collective benefit of our research, and about new ways of listening and responding.

In this regard, I was particularly inspired by the artists who took part in our Brighton Digital Festival event in Nov. 2021. ‘Subverting Digital Spaces’ was co-organised by the Lab (under the umbrella of the ICCT priority area), the Full Stack Feminism in Digital Humanities project, and Laurence Hill, visiting fellow at the Lab and Digital Curator of the Full Stack Feminism project. Artists and activists Teresa Braun and Jake Elwes both spoke to subverting

traditional digital platforms … [of] … queering datasets and developing digital tools for social intervention. Collectively, they … [draw] … from Intersectional, Black, Feminist, Queer and Trans activisms to create online spaces that challenge normative social constructs and their omissions.

Subverting Digital Spaces

Both artists represent aspects of ICCT and of the ways in which performance within and across digital spaces can subvert dominant narratives around gender, sexuality, and race. Their interactions with technologies such as machine learning and AI (particularly deepfake technologies, in Jake’s case) highlighted not only how these technologies can be “queered,” but also the role that queer and intersectional feminism can play in questioning, disrupting, and challenging digital spaces and technologies: spaces and technologies that often promote or amplify far-right sentiment and ideals of “normality” (or heteronormativity). Both artists use drag performance to subvert technical spaces and to disrupt datasets based on normative bodies and normative, abstracted models of the world (as represented and propagated through the training datasets of AI, machine learning, and neural networks, for example).

These experiments and performances produce powerful interventions that highlight social, cultural, and techno-social inequities and imbalances, along with methods for subverting them. Jake’s work especially resonated. In Jake’s words, what happens when you introduce 1,000 images representing queer expressions, bodies, drag queens, drag kings

into a standard homogenised data set of 70000 images of human faces which is used as a standard to train facial recognition systems … which contain very little of this otherness? … [The resulting output] shifts all of the weight in this neural network from a space of heteronormativity into this space of queerness and queer celebration.

Jake Elwes

Jake questions whether we want to be included in these systems, or whether we want to break them, to queer them. In this sense, autonomy and agency within and over representation merge with questions of technological surveillance and acquiescence (or consent and unconsent). The big questions here are: what models of the world are we building, which models do we have control over, and which models are shaping our engagement (or disengagement) with our world? How do technologies (AI, machine learning, neural networks) reduce our world to classifications and binaries, and indeed how do they perpetuate old systems of classification and categorisation? Both artists’ presentations offered useful and unique moments of reflection on the digital world we live in, or the digital world that is imposed upon us.

As a research cluster, ICCT (Intersectionality, Community and Computational Technology) brings together and highlights the manifold ways in which the digital world (imposed on us) has the capacity and potential to be as systemically unjust, biased, and disenfranchising as our “analogue” world has historically proven to be. Yet, on a more positive note, it also highlights the potential of individual, project, and community interventions, often collaborative, to mitigate this harm and transform our digital environments and spaces. In this regard, two events highlight some of these interventions and ways of working: the Lab’s open workshop Building a Feminist Chat Bot, organised in collaboration with FACT under the ICCT umbrella to celebrate Ada Lovelace Day (Oct. 2021), and our seminar with Professor Patricia Murrieta-Flores, ‘The future of the past. The development of Artificial Intelligence and other computational methods for the study of Early Colonial Mexican documents’. Both consider the ethics of building tools using AI and machine learning algorithms.

In particular, Building a Feminist Chat Bot, which stems from an ongoing collaboration between FACT, the Reanimating Data Project, Suze Shardlow, and the Lab, centres a feminist ethics of care in building tools and interfaces. It “builds a chat bot,” but this is probably the least important aspect of the work; the process of building, the collective coding, and the skills sharing matter more than the end product. Centring work around a feminist ethics of care is not always easy. It requires additional resources and can become emotionally challenging, but it is worth it. Values of care are not maternalistic but are centred on listening, on making space, on empathy (for those in the group as well as those you are building “for”), and on ethics. It is a way of working that ideally should be embedded in how we do research anyway, but adopting it as a method makes these approaches explicit.

SHL’s ICCT priority area includes intersectionality not as a diversity-waving add-on (see Sara Ahmed, On Being Included: Racism and Diversity in Institutional Life, 2012), but as a way of working informed by ethical, feminist, community, and queer perspectives. Full Stack Feminism in Digital Humanities (a two-year AHRC-IRC funded project) explicitly draws upon intersectional feminism and is investigating how we can embed these values within and across Digital Humanities practice and research. It explores what feminist DH methodologies look like and how we can develop a framework to encourage their inclusion throughout the life cycle of digital projects and creations. Intersectionality has become, alongside equalities, diversity, and inclusion, a bit of a catch-all term, but we use it with intention. It outlines our positionality as a lab and as a research community. In many ways, it also recognises our aspirations and the need to constantly think and rethink who we are and who we, as the Lab, want to be.

You can find details, and recordings of some of the events mentioned here, in the Lab’s past events listing. More information on our research projects and on our other priority areas is also available.

SHL Priority Areas — what are they and why?

A short reflection one year on 

By Sharon Webb 

In 2021 the Sussex Humanities Lab, one of the University’s four flagship research programmes, reviewed and re-evaluated its research structure. In an effort to amplify voices within the Lab, and to attract new voices and contributors from outside of it, we devised eight so-called priority areas that reflect current research and the expertise of our members. These priority areas allow us to highlight our research and provide a structure for our seminar and open workshop series, as well as a way to support strategic research development and grant capture. A year in, we are reflecting on how this structure has or hasn’t worked. Either way, through this structure we have managed, despite Covid challenges, to develop a programme of work which has provided crucial points of discussion, dialogue, debate, and growth.  

Our priority areas aim to further build research capacity across the University and to provide entry points to new Lab associates and to the wider community. We recognise that for some it can be difficult to know exactly what the Lab “does,” and we hoped our priority areas would help demystify that. The fact is, we do a lot: we are diverse, and we work in such an agile manner that it can be difficult to pin us down – this has its advantages and disadvantages!

We define ourselves as a Lab because we are a space of doing, of experimenting, of making (watch this space for a co-authored chapter on this very topic soon). Our collaborations cut across boundaries, and as a group we work in an explicitly transdisciplinary and interdisciplinary fashion. Our work is also value-driven, shaped by a concern for ethics, equalities, and diversity, and by social justice and sustainability issues. In that regard, we are driven by a set of values explicitly written into the fabric of the University of Sussex, and indeed embedded in our home school, the School of Media, Arts and Humanities. It is probably no surprise, then, that many of our priority areas reflect these values and concerns, cutting across disciplines and subject areas, such as ‘Philosophy of AI’ and ‘Uncertainty and Interpretability of AI’, led by Beatrice Fazi (MAH) and Ivor Simpson (EngInf) respectively. ‘Experimental Ecologies’, led by Alice Eldridge (Music), is concerned with developing a wider disciplinary understanding of our (human and other organisms’) environmental relations in the Anthropocene, where the biosphere and technosphere are irrevocably linked. In this way, ‘Experimental Ecologies’ aims to foster:

post-disciplinary research where arts and humanities, natural and computational sciences, traditional indigenous knowledge, and everyday local experiences have an equal footing in addressing key environmental issues at human-environment interfaces.

In this area, “an equal footing” is key, and this perspective informs much of the work in other priority areas developed by Lab members. My own area, for example, ‘Intersectionality, Community and Computational Technology’ (ICCT), highlights, challenges, and disrupts the ways in which computational technology reproduces and reinforces various inequalities in society. It is concerned with, reflective of, and feeds into the value system of the Lab, but it is also concerned with research driven by perspectives of equity and inclusion. Above all, it is community driven: its foundations lie in collaborative work with queer and intersectional feminist communities and research praxis, and community perspectives are on a par (on an equal footing) with academic ones. This priority area reflects existing work within the Lab, specifically through the ‘Feminist Approaches to Computational Technology’ (FACT) Network, the AHRC-IRC funded network grant ‘Intersectionality, Feminism, Technology and Digital Humanities’ (IFTe), whose overarching objective is to:

‘un-code’ gendered assumptions, question our digital environments and systems, and embed intersectional feminist methods and theory within DH with a view to the creation of new DH futures

And, more recently, through ‘Full Stack Feminism in Digital Humanities’, a two-year project jointly funded by the Arts and Humanities Research Council (UK) and the Irish Research Council as part of their ‘UK-Ireland Collaboration in Digital Humanities Research Grants’ call. This project aims to develop feminist praxis, methodologies, and ethics within and across Digital Humanities projects and research. “Full stack” means that we are concerned with inequalities in DH that span from the infrastructure layer to the representation layer, reaching, and cutting across, all types of environments. In this sense, the Lab’s priority areas represent a critical mass of research that grows through engagement within and across the Lab.

You can read about all our priority areas and ways that you might get involved here: 

Our priority areas represent things that we care about, things that we want to grow, areas we want to foster and nurture. They are not static or fixed; rather, they are a means for us to articulate our priorities, and, as we know, priorities change as we develop as individuals, as members of society, and as colleagues in a School and University. We nurture these areas not for the Lab’s own benefit but for the benefit of those who engage with us.

So, reflecting a year on, does the structure work? Maybe it doesn’t matter what structure we have if the right conversations are happening, if the right collaborations are developing, and if, ultimately, our members and our community feel involved. Our research structure can only be judged by the collaborations and research it fosters, and in this regard, I think we’re not doing too badly!

Prepping Robo_Op (2021)