Text Mining

A Word Cloud of all the words I have used in all of my blogs.

I will shout from the heavens that I love Voyant Tools! Before Voyant I didn’t have much of a concept of, or appreciation for, text mining. I was first introduced to Voyant by Laura McGrath when I was a TA for the Mellon Scholars Program at Hope College. She was an English major and much of her work involved distant reading and extensive text analysis, so she built a lot of the Mellon seminar around working with data sets, using text and data analysis tools, and producing visualizations. I was blown away by the speed, user-friendliness, and insightfulness of Voyant. I love the colorful visuals, the interactivity, the multilingual capacity, and the embedding functions. Such a brilliant tool! Okay, rant over.

I was inspired and chose to incorporate this as a unit in the Mellon seminar when I taught it the following year as a post-bacc. (See Blog Tutorial here) I had my students do a simple activity to introduce them to Voyant and similar methodologies: they read an Edgar Allan Poe murder mystery, “The Murders in the Rue Morgue.” When they came to class having read the story, we ran it through Voyant to see whether the tool would reveal anything unexpected about the text, and whether its visualizations accurately reflected what we had read.

For example, looking at terms like “murder” and “weapon” in the Terms Berry visualization showed that they appear close to “monster” and “animal” in the text, which corresponds with the murderer actually being an orangutan (spoiler). Additionally, since I intentionally chose a murder mystery, I wanted to know whether Voyant could reveal any crucial plot points, or who the main character is based on name recurrence, and so on. The Word Cloud matched the close reading: the window is key to unraveling the mystery, and Dupin is the main character. It was a fun, light exercise, and students’ feedback was great; it got their feet wet with what text analysis and mining can do.

In general I have not conducted extensive research using text mining, but through projects like Ben Schmidt and Mitch Fraas’s “Mapping the State of the Union,” I have come to appreciate its potential, as well as the nuanced, intentional decisions that go into selecting key terms. Lisa Rhody’s “The Story of Stop Words” did an exceptional job of bringing to the forefront many of those decisions and their implications. I think that is one of my main takeaways from this seminar thus far: every facet of a project involves intentional decision-making, which means that bias, conscious or not, is possible at every stage of research, and one should actively interrogate one’s decisions and document them. In fact, looking back, my first exposure came before Voyant: Brandon Walsh ran a workshop at the 2017 annual Undergraduate Network for Research in the Humanities conference, where he introduced the concept of text mining, walking us through “bags of words” and what it means to make selective choices about which words to include and the impact such choices can have.

Because the Digital Humanities incorporates technology, there seems to be a false perception that the field’s scientific tools make research more objective. I think this is wrong on two counts. One: scientific research is value-laden and not as objective as one may want to think. My Philosophy of Science course would attest to this (see Robert Merton’s Social Theory and Social Structure), and Safiya Umoja Noble’s Algorithms of Oppression highlights the biased, prejudiced nature of human-made technologies that many take for granted as objective, like Google searches. Two: the number of perhaps small, but still significant, deliberate choices that shape the scope, nature, results, and effects of a Digital Humanities project using such technologies indicates, to me, a higher degree of subjectivity, and this, I’d argue, is extremely important to remember.
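To make the point about “bags of words” and stop words concrete, here is a minimal Python sketch of how a single choice changes what counts as a top term. Everything in it is an assumption for illustration: the sample sentence is made up, the tiny STOP_WORDS set is hypothetical, and this is not how Voyant itself is implemented.

```python
import re
from collections import Counter

# Hypothetical stop list for illustration only; real tools ship much longer ones.
STOP_WORDS = {"the", "a", "of", "and", "was", "in", "to"}

def bag_of_words(text, stop_words=frozenset()):
    """Count lowercase word tokens, skipping any word found in stop_words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in stop_words)

# Made-up sample sentence, not a quotation from any text discussed here.
sample = "The window was the key to the mystery, and the window was open."

with_all = bag_of_words(sample)              # no words removed
filtered = bag_of_words(sample, STOP_WORDS)  # stop words removed

print(with_all.most_common(3))
print(filtered.most_common(3))
```

Without the stop list, “the” dominates the counts; with it, “window” rises to the top. Which words belong on that list is exactly the kind of small, deliberate decision worth documenting.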

This week I played around with an idea I mentioned in class: putting my own papers into Voyant to see if I can identify a particular style. Not the most relevant to my research, but text mining is not exactly in my area anyway, so I went for something different and relevant for me as an emerging scholar. I made the following decisions:

  • I chose the final papers I wrote for my first-semester graduate seminars
  • I did not include the bibliographies
  • I did include the titles of the papers
  • My citation format is MLA, so the in-text parenthetical citations are included

This Word Cloud is fascinating to me. It definitely fits the content of the papers, but it also helps me track recurring themes in my work and possible topics for my dissertation.

I think the most interesting result of the Terms Berry is the relationships it shows: church occurs with Maracle, indigenous, and structure, while spiritual occurs with music, physical, and knowledge. This has reignited my passion for these subjects and given me an interesting lens on my writing style.
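For anyone wondering what “occurs with” can mean computationally, here is a rough Python sketch of one common approach: counting the words that fall within a few tokens of a keyword. This is an assumption about the general idea of collocation, not a description of how Voyant computes its Terms Berry; the collocates function, the window size, and the sample text are all hypothetical.

```python
import re
from collections import Counter

def collocates(text, keyword, window=5):
    """Count words within `window` tokens on either side of each keyword hit."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != keyword:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # skip the keyword occurrence itself
                counts[tokens[j]] += 1
    return counts

# Made-up sample text, not taken from the papers described above.
sample = (
    "Spiritual knowledge lives in music. The church imposed structure. "
    "Music carries spiritual and physical memory."
)

near = collocates(sample, "spiritual", window=2)
print(near.most_common())
```

The window size is another judgment call: a wider window finds more neighbors but also more noise, so the relationships a visualization shows depend on it.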

I also noticed that the Bubblelines tracked the main terms across all of the papers, showing that language and body were the most consistently used terms throughout my writing. Perhaps Philosophy of Language is where I am headed…
