Digital Labor

Date of Class: 2/7
First due: 2/14
Comments due: 2/21
Revisions due: 2/28

This class discussion focused on Digital Labor, specifically the labor associated with the massive yet cryptic Google Books project that we’ve analyzed in previous classes.

Our discussion started by asking whether any participants had thought about how the Google Books scanning operations were conducted and, if so, what they imagined the process to be like. A few of us realized we had assumed the process would be more automated (because it’s Google!), and we were therefore surprised to find out that the books were scanned by hand, as documented in this Tumblr, with people turning each page before the scanner flashed. There was also some surprise about the general lack of publicized information from Google about the scanning process. Was it because Google’s project managers were self-conscious about the public learning that workers were paid $10 an hour for long shifts of page turning? (Several of us were surprised to hear about this.) Were there other reasons for keeping the process under wraps? Several folks agree that details about the scanning process are “unglamorous” and probably won’t make Google look good - India compares it to Amazon’s Mechanical Turk, and Sabrina to revealing the Wizard behind the Wizard of Oz.

Another question was, “Shouldn’t Google be able to do this [scanning process] better?” We learned there is a faster way to scan books: cutting off their spines and feeding the loose pages through an automated scanner (this is “destructive” book scanning, versus the “non-destructive” scanning that Google was doing). But of course the libraries lending the books to Google expected them to be returned in a condition similar to the one in which they were lent, and many of these books are rare physical artifacts. However, some folks pointed out that even if Google had access to books that it had paid for and could slice for scanning, presumably there would still be contracted labor involved for tasks like locating and purchasing books, setting books up for the slicing and automated scanning process, etc.

While there has been strong critique of Google Books and its scanning operation, such as that presented in the articles in this class’s reading, there are clear benefits to having such a large corpus of scanned books. For instance, many books and textbooks required for college classes are expensive, and this may prevent some students from purchasing needed materials. If students know that they can access a particular book for free on Google Books, that access can support their studies. Additionally, Shambhavi points out that this can allow researchers, students, and scholars to view texts that are rare and may not be available to borrow or to consult in a particular archive.

What did we make of Google’s emphasis on searchability and text discovery over traditional ways of “reading” a book? A book’s index seems to be the older equivalent of today’s search bar or search function. But a disadvantage of only searching digitally is not seeing the books and references shelved around a particular source, or in the vicinity of a book one may check out in a library - one may not be able to experience the kind of serendipitous in-person discovery of a new source. There are still benefits to analog books and physical libraries and archives.

Returning to observations of the Google Books scanning operation, Google doesn’t seem to operate with much cultural sensitivity to ways of reading books other than the mostly western left-to-right style, which means that many of its scanned books in other languages were scanned backwards. Similar problems apply to art that was scanned, and to pages thin enough to be somewhat transparent, where text or characters from other pages may show through in the scan. Often Google would simply remove texts from the scanning queue if they were deemed “too difficult” to scan, including very large or very tiny books. Many of these small books are “pulp fiction” and novels that were cheaper and printed on material that wouldn’t age as well, and India points out that their exclusion leaves a large gap in our understanding of the public imagination. This raises serious questions for scholars who are interested in the literary canon and in debates about which books are included in the canon and which are excluded. Historically, books predominantly from western, male, mostly white authors compose the canons that are part of reading curriculums. Sabrina wonders: if curriculums changed, would that influence the books that Google includes in its corpus? If the Google Books project excludes books in languages that are read in a different way (for example, Arabic is read from right to left), then this process of book elimination is reinforcing preexisting issues of homogeneous western literary canons. Connected to the mistreatment of texts, Jaclyn notes that in this Wired article we read, Google was described as an unofficial librarian that bibliophiles didn’t want, which is ironic considering that more recently Google appears to have lost interest in the Google Books project.

We also learned about some invisible labor that we’ve all (probably) partaken in. One example is CAPTCHA, where human users respond to a test by deciphering a sequence of distorted letters and numbers. Google used a version of CAPTCHA to digitize archives of books, which contributed to the development of OCR (optical character recognition). See the Wiki on CAPTCHA.

While our own invisible, albeit incrementally small, labor in transcribing CAPTCHAs has helped large companies like Google improve their digital books and OCR, we ended our discussion with three questions:

Is Google Books conscionable?
Is this work that had to get done?
Did it have to get done this quickly?

Gayathri believes that the digitization of books was certainly inevitable, and thinks that if Google hadn’t decided to create its books corpus, some other large private company might have taken on the task and possibly charged the public in ways that Google currently does not.