Text analysis in History Harvests
Research Question
- How can the analysis of diction and syntax help researchers connect seemingly different individuals or groups?
By dissecting the transcribed interviews of our history harvest participants, we started to build a broader understanding of descriptive speech patterns, specifically in how participants talked about their unique objects. Whatever their socio-economic, political, geographic, or cultural backgrounds, the individuals consistently attached sentimental meaning to objects of all shapes and sizes. Understanding the relationship between owner and object, particularly the way owners speak about their objects, is key to understanding how the pair interact daily. With this in mind, we aimed to connect those interactions with descriptive speech patterns. Essentially, we are slowly building a model that could predict how unlike individuals might come together over similar feelings, actions, and object use. Using the city of Bloomington as our example, we set out to identify and explain any trends that emerged.
Analysis Process
- Collect interview transcripts
The collection of the transcripts went smoothly. Thanks to the class, and to additional help transcribing the interviews fairly early in the process, it was as simple as having a folder full of transcripts added to our Text Analysis Box folder.
- Separate objects into three different groups: Handcrafted, Manufactured and The Outlier
Creating the object groups and sorting objects into them was not demanding. It was, however, time-consuming to think carefully about which groups best represented the body of objects and then to sort the objects accordingly.
- Group and upload transcripts onto Voyant
Uploading our transcripts into Voyant was simple… but not at first. The group struggled with this step initially because we were unfamiliar with the tool, and we weren’t sure how to approach uploading multiple transcripts. Originally, we decided to sort all transcripts according to item type and then copy and paste them into one single document per group. This sounded good at the time, but it presented challenges later in the process. Transcripts kept arriving as they came out of the transcription process, so we needed to add to our documents. But what had we already added to this mega-document, and what had we not? How would we get the new documents into our existing Voyant pages? While it was possible to sort through what was already in the documents and add to our existing Voyant pages, it was a very tedious task. Learning from our mistakes, we kept each transcript as its own document and placed the documents into a folder for each category. Finally, we downloaded these folders from Box and uploaded entire folders at a time to Voyant.
- Make a list of stop words for each of the different groups using Voyant
In digital text analysis, stop words are the most commonly used words in a given language. It makes sense, then, that stop words are filtered out before the natural language data is processed. Removing them before running the analysis makes for a cleaner and more efficient use of a program like Voyant. However, stop word lists turned out to be less simple than they sound. Our group expected them to be relatively straightforward and to provide definite results, yet they presented some challenges initially. While the difference between curly quote characters such as ‘ and ’ usually goes unnoticed by human readers, a computer treats them as different characters. Consequently, we learned that we needed to double-check our stop word lists for both sets of curly quotes to avoid skipping over any common contractions that used an odd curly quote. Additionally, we realized that there is no such thing as a complete stop word list. When should you stop? What is too specific, and is that even a thing? To combat that uncertainty, we elected to create basic stop word lists for our three categories of household, apparel, and miscellaneous. From there, we expanded each list with words specific to its category.
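The curly quote problem described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Voyant handles stop words internally: the function names, the stop list, and the sample sentence are all made up for the example. The sketch converts curly quotes to straight ones in both the text and the stop word list before comparing, so a contraction matches no matter which apostrophe was typed.

```python
import re

# Map curly quotes to their straight ASCII equivalents.
QUOTE_MAP = str.maketrans({
    "\u2018": "'",  # left single curly quote
    "\u2019": "'",  # right single curly quote (also the typographic apostrophe)
    "\u201c": '"',  # left double curly quote
    "\u201d": '"',  # right double curly quote
})


def normalize_quotes(text):
    """Replace curly quotes with straight ones so variants of a word match."""
    return text.translate(QUOTE_MAP)


def tokenize(text):
    """Lowercase the text and split it into words, keeping internal apostrophes."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", normalize_quotes(text).lower())


def remove_stop_words(text, stop_words):
    """Return the tokens of `text` that are not in `stop_words`."""
    stops = {normalize_quotes(word).lower() for word in stop_words}
    return [token for token in tokenize(text) if token not in stops]


# Hypothetical stop list and sentence, for illustration only.
BASE_STOP_WORDS = ["the", "and", "it", "i", "don't", "it's", "my", "a"]
sample = "It\u2019s my dog and I don\u2019t leave the house without him"

print(remove_stop_words(sample, BASE_STOP_WORDS))
# ['dog', 'leave', 'house', 'without', 'him']
```

Without the normalization step, "don’t" typed with a curly apostrophe would not match the straight-apostrophe "don't" in the stop list and would remain in the results.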
Object Groups: Handcrafted, Manufactured and The Outlier
Handcrafted
Manufactured
The Outlier
Malachi, we decided, needed his own category. He did not fit in either of the above categories, since a living animal cannot be manufactured or handcrafted. In her interview, his owner talked about how she rescued him from a shelter, and how taking him on walks and to the dog park pushes her out of her comfort zone and allows her to leave the house and socialize with others. Given her interview, and having seen Malachi, there was no way we could fit him into either Handcrafted or Manufactured.
Other Groups for the Objects
When we started to look at the objects more closely, we started to see that they could be grouped into smaller categories, too. These categories are: Tradition/Ritual/Habit, Accomplishments, International Culture, Family, and Art/Artwork.
These groups were chosen because of the way contributors talked about their individual objects and because of the main focus words that appeared when we ran a meta-corpus analysis. In the word cloud for the Handcrafted category above, some of the largest words are family, culture, home, and tattoo, and similar words appear in Manufactured. From these, we decided that the categories above could yield another level of depth.
Note that each of these word clouds contains the 55 most common words for its group of transcripts after the stop word list has been applied. We also decided to keep the word people out of the stop word list, because we felt it was important to show how our contributors viewed themselves in relation to other people and cultures.
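As a rough sketch of what sits behind such a word cloud, the most common words can be counted with Python's standard library. This is not Voyant's implementation. The function name, stop list, and sample text are invented for illustration, and the example asks for the top 2 words rather than 55 to keep the output short. Note that people is deliberately absent from the stop list, so it is still counted.

```python
import re
from collections import Counter


def top_words(text, stop_words, n=55):
    """Return the n most frequent words in `text` that are not stop words."""
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    counts = Counter(word for word in words if word not in stop_words)
    return counts.most_common(n)


# Hypothetical stop list; "people" is intentionally left out of it.
stop_words = {"the", "and", "my", "to", "of", "with", "is", "a"}

# Invented sample text, not taken from any transcript.
sample = ("my family and the people of my family share coffee "
          "with people and family")

print(top_words(sample, stop_words, n=2))
# [('family', 3), ('people', 2)]
```

A word cloud then scales each word's size by its count, which is why the most frequent words appear largest.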
Tradition/Ritual/Habit
When we look at these items, we can see words like wearing, started, hunting, Christmas, and birthday start to appear. Alone, these words may not seem like much beyond holidays and activities. However, when words like phone, lipstick, coffee, friends, and family appear just as large, it tells us that all of these objects have to do with some sort of ritual, tradition, or habit. Traditions, rituals, and habits play a daily role in our lives. Take coffee, for instance: it is no secret that many people drink coffee on a daily basis, be it before work, at work, at school, or in a coffee shop. We can classify this as both a habit and a ritual: a habit because it is “everyday,” and a ritual because it is part of a morning routine. With birthday or Christmas, tradition comes into the fold. Many people have their own traditions for special occasions; going to church and hanging lights for Christmas, and lighting and blowing out candles for a birthday celebration, are just two examples. Both of these also cross into ritual, through singing songs (“Happy Birthday”) or a candle lighting or tree lighting during a Christmas celebration at home or at church.
Accomplishments
When we look at this word cloud, we can see words like undergraduate, school, special, degree, graduation, and celebrate appear. Even though the sample size for this category is small, these words show that all of the objects pertain to accomplishments in some manner. We have a Ph.D. pin, sashes from graduation, a candle from a graduation ceremony, and a published book, each marking a major moment in its contributor’s life.
International Culture
When we look at this word cloud, the first words that stand out are brazil, samba, culture, dance, and different. From this, we notice that international culture played a big role in the ways that people talked about their objects. Brazil was not the only place where these objects had cultural significance; items connected to Japan, Mexico, Africa, and Puerto Rico were significant as well. This allowed us to see a multitude of things, the most profound being that people, even though they reside in and feel connected to the Bloomington/Indiana University communities, also feel connected to the cultures where their items originated or where those items traditionally play a part in people’s lives and festivities.
Family
As anticipated, family is the largest word in this word cloud. Other words that appear large are mom, cousins, kids, auntie, and dog. What this cloud tells us is that people view their objects as having a connection to their family, be it the classic definition of family (mom, dad, siblings, etc.) or whomever they consider their family. It also shows us how something like a painting, dog, or blanket can remind someone that they are part of a larger, more personal community.
Artwork
Lastly, in this word cloud, we can see a variety of words appear: objects such as tattoo, plate, cleats, and alebrija, along with words for color and for how art is expressed, such as colors, black, show, and blue. What this category represents is the different ways that people express their identity. People brought us objects that represented culture, connections, and community in the form of tattoos, a plate that doubles as a musical instrument, a beach wrap that represents Oshun, a god in Brazilian culture, and a picture of the Gorn from Star Trek that expresses a continuing love for inside jokes between a man and his mother.
In Conclusion
Text analysis allowed us to take the information that people gave us in their interviews and use it to make connections about what, or in one case, who, they contributed. The way they talked about their contributions allowed us to separate them into three main categories and, from there, five additional subcategories, with some contributions spanning more than one subcategory. These categories let us visualize how seemingly different people can be connected through their objects and their words.