Building Word Clouds to Generate Search Terms
© 2014 Jeff Boruszak. Word cloud generated by Voyant Tools.
Help your students get an overview of their topic and a leg up on their research by creating word cloud visualizations of their topics.
This assignment is meant to help students get an overview of their chosen research topic early in the semester by using Voyant, a suite of word visualization tools available for free online. To use Voyant, users first create a "corpus," or a body of text documents, and upload it to the Voyant website. From there, users can use a host of visualization tools to perform "distant reading" on their texts. The most commonly used tool is Cirrus, a word cloud generator that creates an image from the most frequent words in the corpus. With Cirrus, users can see at a glance what topics are discussed in a given body of text. Voyant can also group distinctive words together, so that users see not just how often words are used but which words are commonly used in conjunction with one another; display words in context, so that users can easily see the words that immediately precede and follow a given term; and chart word trends, showing the relative frequency of words across the individual documents in a corpus.
There are many possibilities for using this tool in class. I developed this exercise so that introductory or intermediate writing students can use Voyant to begin their research. The goals for this exercise are:
- Give students time to use databases and practice fundamental search skills
- Force students to build a bibliography of potential sources early in the semester, giving them a body of text to come back to and examine more closely
- Generate keywords that will facilitate research further down the road
- Find ways to develop their research topics by seeing what their initial searches yield
After being introduced to a few databases (Lexis Nexis will be one of the better options for this exercise), students must build a small corpus of documents that are relevant to their topic. They will have to read titles or opening paragraphs to confirm that each article is relevant and not a duplicate of a text already in the corpus. After uploading the corpus to Voyant, they will "read" their topic from a distance, answering questions that will help them interpret their Voyant results.
Familiarize yourself with Voyant (http://voyant-tools.org/) and its documentation (http://docs.voyant-tools.org/). The easiest way to figure out how the program works is to upload any sort of sample corpus and play with the different tools to see what they do. You may also want to run through the assignment yourself to see where students might have trouble. (I built and uploaded a corpus in under 10 minutes to practice. Students will take longer.)
Step One: Build a Corpus
On your desktop, create a folder named "corpus".
Then choose a database and conduct a search on your topic. In order to build a usable corpus, you will want to find articles that are at least 500 words long (Lexis Nexis allows searching with word length minimums, so it will be a good place to start). You will have to vet your search results by reading both the title of the article and the first few sentences to make sure that they are relevant to your topic, and not duplicates of other articles in your corpus.
Your goal is to build a corpus of 20 documents. When you have found an article you feel would be useful, copy the text of the document and paste it into Notepad (Windows) or Microsoft Word (Mac). Save the document, giving it the name "01". Make sure you are saving the document as plain text (.txt) for the file type/format. Set the save location as the corpus folder on your desktop.
In a separate text document, copy and paste the URL of the article. This will not be used for this assignment, so you should not save it to the corpus folder. HOWEVER, you should use the compilation of URLs you are saving for further research later in the semester--at the end of this assignment you will have 20 articles you could go back to and read more closely.
Repeat this process for all 20 documents, making sure to name them "02," "03," and "04," through to "10," "11," "12," etc. When your corpus folder has the twenty .txt documents in it, you can move on to the next step.
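If you are comfortable running a short script, the following Python sketch checks that the corpus folder contains the files described above. It is only an optional illustration, not part of the original assignment: the function names are hypothetical, and it assumes the folder sits at Desktop/corpus in your home directory and that the files were saved as "01.txt" through "20.txt".

```python
from pathlib import Path


def expected_names(count=20):
    """Return the file names the assignment asks for: 01.txt, 02.txt, ... up to count."""
    return [f"{i:02d}.txt" for i in range(1, count + 1)]


def missing_files(folder, count=20):
    """List the expected corpus files that are not present in the folder."""
    folder = Path(folder)
    return [name for name in expected_names(count) if not (folder / name).is_file()]


if __name__ == "__main__":
    # Assumed location: a folder called "corpus" on the desktop.
    corpus = Path.home() / "Desktop" / "corpus"
    missing = missing_files(corpus)
    if missing:
        print("Still missing:", ", ".join(missing))
    else:
        print("All 20 documents are in place.")
```

An empty "missing" list means the folder is ready for the next step.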
Step Two: Upload Your Corpus
Right-click your corpus folder. Click "compress" (on a Mac) or "add to archive" (on Windows) and create a zip file of your corpus. Then go to the Voyant website (http://voyant-tools.org/). Click "Upload," then "Add," and then choose corpus.zip from your desktop.
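The zip file can also be created with a script instead of the right-click menu. The Python sketch below uses the standard zipfile module; the function name zip_corpus and the Desktop paths are assumptions made for illustration, and the upload itself still happens through the Voyant web page as described above.

```python
import zipfile
from pathlib import Path


def zip_corpus(folder, zip_path):
    """Write every .txt file in the folder into a zip archive; return the names added."""
    folder = Path(folder)
    txt_files = sorted(folder.glob("*.txt"))
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in txt_files:
            # Store only the file name so the archive has no nested folders.
            archive.write(path, arcname=path.name)
    return [path.name for path in txt_files]


if __name__ == "__main__":
    # Assumed locations: the corpus folder and the zip file both live on the desktop.
    desktop = Path.home() / "Desktop"
    if (desktop / "corpus").is_dir():
        added = zip_corpus(desktop / "corpus", desktop / "corpus.zip")
        print(f"Added {len(added)} files to corpus.zip")
```

Only the .txt documents are added, so a stray file in the folder would not end up in the upload.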
Step Three: Using Voyant
The first thing you should do is find the permanent URL for your corpus. On the blue bar at the top of the page, click the save button next to the question mark. This will bring up an export menu. Select the first option (Export a URL for this tool and current data). Copy the URL and paste it into your address bar.
Your initial results will probably not look interesting. They will say that "a," "the," and "and" are the most common words in your corpus. We have to apply a "stoplist" to exclude these common words from the results. At the end of your URL, add the following text:
Reload the page, and your corpus should now visualize correctly. Spend some time exploring your corpus. In particular, examine the word cloud in the top left. Also look under the "Summary" section on the left side. Scroll down and look at the lists of distinctive words.
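To see why the stoplist matters, here is a small Python sketch of the idea behind it: count the words in a text, then count again while skipping a list of common words. This is not Voyant's code or Voyant's actual stoplist. The function name top_words and the tiny STOPLIST set are made up for illustration.

```python
import re
from collections import Counter

# A tiny illustrative stoplist. Voyant's real stoplists are much longer.
STOPLIST = {"a", "an", "and", "the", "of", "to", "in", "is"}


def top_words(text, stoplist=STOPLIST, n=5):
    """Return the n most frequent words in text, skipping words in the stoplist."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(word for word in words if word not in stoplist)
    return counts.most_common(n)


if __name__ == "__main__":
    sample = "The law and the vote: the law passed."
    print("Without a stoplist:", top_words(sample, stoplist=set(), n=3))
    print("With a stoplist:   ", top_words(sample, n=3))
```

Without the stoplist, "the" leads the count. With it, a topic word such as "law" rises to the top, which is the same effect you should see in your word cloud after reloading the page.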
Step Four: Questions
Post answers to the following in a discussion forum on Canvas.
Looking at your word cloud:
What are the most common words in your articles?
Are there words you would exclude from future searches?
Are there any unique words?
Looking at "distinctive words," can you give a name to any groups of distinctive words?
In general, do you see any ways to narrow down your topic?
Step Five: Final Actions
Send yourself an email, attaching your document of URLs, and the URL for your Voyant corpus, so that you can come back to it later if necessary.
Strict evaluation is not necessary. This is meant to be a learning exercise for students. Some will get through all of the questions, while others will only just have built their corpus. Depending on the importance of keyword and search term generation in your course, you may extend the assignment as homework.
Here are some sample corpuses on a marijuana controversy that I have generated as examples:
This exercise was developed for RHE 306, an introductory writing course. It is widely adaptable by topic.
Medical Marijuana: http://voyant-tools.org/?