Experimental Methods in Cultural Analytics

The readings in this section discuss how the available computational tools shape research in the digital humanities, and lay out possible best practices for using those tools in historical analysis while still producing meaningful, scientifically rigorous research.

If you read one paper from this section, I’d suggest Theorizing Research Practices We Forgot to Theorize Twenty Years Ago. In this paper, the authors discuss how tools and research activities as commonplace as full-text search affect the results of humanist research. Before search, a humanist researcher might have formed the hypothesis that ‘blush’ and ‘shame’ are often found together in poetry, laboriously gone through enough poems to support or refute it, and then written a paper about the results. With full-text search, however, it’s possible to just keep coming up with word associations to test (for example blush/rosy, or blush/furious) until you find a result that seems meaningful. In this way, the search is “not just a finding aid; it’s analogous to experiment,” and there’s certainly something “dubious about experiments that get repeated until they produce a desired result” (65). The increased access that search offers has several other important implications. With access to exponentially more sources than one could easily find by hand, how many sources do you need to validate a hypothesis? When we sort by relevance to our query, do we filter out valuable information that we otherwise might have encountered in archive-based research? Is it possible that search “only shows you what you already know to expect” (66)?
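To make the worry about repeated searching a bit more concrete, here is a minimal sketch in Python. Everything in it is made up for illustration and none of it comes from the paper: the five one-line "poems", the candidate word list, and the helper name cooccurrence_count are all my own assumptions. It compares testing one hypothesis chosen in advance with trying every word pair and keeping whichever looks strongest.

```python
from itertools import combinations

# Hypothetical toy corpus: each string stands in for one poem.
POEMS = [
    "her blush was rosy in the morning light",
    "a furious storm and a rosy dawn",
    "shame and blush upon her cheek",
    "the furious sea knew no shame",
    "rosy cheek and quiet blush",
]


def cooccurrence_count(poems, word_a, word_b):
    """Count the poems that contain both words (whole-word match)."""
    count = 0
    for poem in poems:
        tokens = set(poem.lower().split())
        if word_a in tokens and word_b in tokens:
            count += 1
    return count


# One hypothesis, stated before looking at the corpus.
planned = cooccurrence_count(POEMS, "blush", "shame")

# Versus: try every pair from a candidate list and see what turns up.
candidates = ["blush", "shame", "rosy", "furious", "cheek"]
all_pairs = {
    (a, b): cooccurrence_count(POEMS, a, b)
    for a, b in combinations(candidates, 2)
}

# Some pair will always have the highest count, whether or not it means anything.
strongest = max(all_pairs.values())

print(f"planned blush/shame search: {planned} poem(s)")
print(f"pairs tried: {len(all_pairs)}, strongest count: {strongest}")
```

Five candidate words already produce ten searches, and one of them is guaranteed to come out on top. If only that one is written up, the reader never sees the nine that were tried and dropped, which is the sense in which search behaves like an experiment repeated until it gives the desired result.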

The authors discuss other tools that might be used in place of search and that could reduce the impact of prior assumptions. For example, topic modeling (algorithmic clustering of terms that occur in the same contexts) can reveal patterns of association that are not easily guessed, or that change across time. All of these algorithmic approaches to digital humanities research have pros and cons, and it’s important to understand how they work, and how the initial assumptions and input of human researchers can change the results they generate.

Additional readings in this section dive deeper into how to use computational tools in the study of history, and how to add meaning to the numbers that computational tools generate. The pamphlet A Quantitative Literary History of 2,958 Nineteenth-Century British Novels: The Semantic Cohort Method discusses shifts in language and in the description of social spaces (houses, etc.) between 1800 and 1900, as well as the methodologies the authors used to elucidate these shifts and the lessons they learned from the research project. The findings are really interesting, but I want to focus on the researchers’ takeaways about how to “go from numbers to meaning” (46) while doing computational historical research, and on their suggested best practices.

The authors suggest that within most data-based historical research there is both a signal, “the behavior of the feature actually being tracked and analyzed” (48), and a concept, “the phenomenon that we take a signal to stand for, or the phenomenon we take the signal to reveal” (49). Successfully bridging and explaining the gap between signals and the concepts that they’re thought to represent is an important part of ensuring that research has both numbers and meaning. The authors also elaborate on some lessons learned from their research that overlap with many of the insights we’ve discussed in class, including: the importance of collaboration, “especially with those who have had extensive training in working with data” (50); how helpful it is to “translate and visualize the data in forms that are more immediately interpretable to us as scholars of literature and culture” (50); and the “tendency toward validation” (48) when doing data analysis, for example believing that the “is” vs. “are” chart reflects the unification of the US after the Civil War because it fits the historical narrative as we understand it, instead of digging deeper and finding that the picture is not that clear.

In Argument Clinic, Professor Weingart discusses the importance of providing not just data about past trends, but also context about the data and results, so that readers and other researchers can understand the meaning and impact of the analysis. For example, a trend in which the number of people who eat apples increased by 10% over five years is much less meaningful if the population also increased by 10% during that time. The article also discusses the importance of triangulating any argument you make using historical data: that is, approaching the problem from many different angles and with many different tools to strengthen your results.
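The apple example is easy to work through with numbers. In the sketch below, the starting figures (1,000 apple eaters in a population of 10,000) are invented purely for illustration; only the two 10% increases come from the example above.

```python
# Hypothetical starting figures, chosen only to make the arithmetic easy.
eaters_start, eaters_end = 1000, 1100   # apple eaters, up 10% over five years
pop_start, pop_end = 10000, 11000       # total population, also up 10%

# Raw growth in the number of apple eaters.
raw_growth = (eaters_end - eaters_start) / eaters_start

# Share of the population eating apples, before and after.
share_start = eaters_start / pop_start
share_end = eaters_end / pop_end

print(f"raw growth in apple eaters: {raw_growth:.0%}")
print(f"share of population: {share_start:.0%} -> {share_end:.0%}")
```

The raw count rises by 10%, but the share of people eating apples is the same at both points, so the "trend" may say more about population growth than about apples. Reporting the denominator alongside the count is one simple way of giving readers the context the article asks for.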

It’s also interesting to see how different tools, approaches, and hypotheses can generate seemingly opposing results. In The Transformation of Gender in English-Language Fiction, the authors discuss two trends they observed in their research – 1) and 2) , along with the different methodologies, hypotheses, and approaches they used to validate them. This was also a super interesting read, and I’d highly recommend at least skimming it, but I’m at 1k words so I’m not going to go into detail.