Google Books and Culturomics
- Date of Class: 1/24
- First due: 1/31
- Comments due: 2/7
- Revisions due: 2/14
In this session, we discussed what services like Google Books and the BYU corpora can offer to our understanding and knowledge of the world that science simply cannot. We talked about “culturomics”, the idea of analyzing large bodies of writing to find patterns that shed light on human behavior, and how it can be a very powerful tool for understanding history, pop culture, and politics. To better understand this, we looked at ngram searches that could illuminate these areas, like “coffee” vs. “tea”, to see how culture has shifted from events like the Boston Tea Party to the establishment of Starbucks. We also explored things beyond culture: we looked at stereotypes like “smelly Indian” and could see which group (Native Americans versus people from India) the authors were most likely referring to. We looked at an array of ngrams, and the rest of the significant ones are listed in the leader notes of the schedule. Another significant ngram was “woman’s consent”, which produced a really telling graph of how rape culture has changed over the years and added to our discussion of the value of culturomics.
Some of the conclusions we came to were as follows:
1) We saw how “women’s consent” shifted over time from the context of property and children to the context of sexual misconduct.
2) We saw how powerful ngrams are for tracking linguistic changes (e.g. Carnegie-Mellon vs. Carnegie Mellon, color vs. colour, grey vs. gray).
3) We discussed how history plays a role in the use of words over time. For example, “speakeasy” and “prohibition” existed before Prohibition started, but these terms now have different meanings.
4) We saw how important context is. We looked at the chart of suicide, depression, cancer, etc., but we could not see what context these words were used in. Was it depression as a health condition or an economic depression? Was it a literal suicide or a social suicide?
The sheer volume of hits for words like these also makes it nearly impossible to check each use individually, unlike “women’s consent”, where we were able to look at the context. Then we started talking about the limitations of Google Books, specifically the sample size, OCR errors, limitations in the data after the 2000s, and problems in searching for what we want (human errors in analysis). This discussion led us to think about how far we can base our conclusions on ngram culturomics in particular, and which sources digital humanists might find more reliable.