April 7, 2010

Wordle

For a writer, visualizing a story or an article can help (1) to find overused words and (2) to identify main concepts or themes. If the main character of a novel is named Harry and he's a wizard, those words should dominate a visualization. If the most common words are "just" or "back" or "obviously", the author should probably revise and take a careful look at word choice.
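For anyone who wants the raw numbers behind the picture, here is a minimal sketch of the counting step in Python. It is not how Wordle itself works; the watch-list, the threshold, and the function names are all assumptions made up for illustration, with the three filler words borrowed from the paragraph above.

```python
import re
from collections import Counter

# Hypothetical watch-list; the three words come from the paragraph above.
FILLER_WORDS = {"just", "back", "obviously"}


def word_counts(text):
    """Count words case-insensitively, keeping internal apostrophes (he's)."""
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    return Counter(words)


def overused(text, watch_list=FILLER_WORDS, threshold=2):
    """Return watch-list words that occur at least `threshold` times."""
    counts = word_counts(text)
    return {word: counts[word] for word in watch_list if counts[word] >= threshold}


# Made-up sample sentence, not taken from any real manuscript.
sample = "Harry just looked back. Obviously he just wanted to go back, just once."

print(word_counts(sample).most_common(3))  # the three most frequent words
print(overused(sample))                    # filler words worth a second look
```

The threshold of 2 is arbitrary; for a full manuscript a count relative to total length would probably be more useful.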

Wordle is a popular visualization tool that many writers use, although it wasn't created explicitly for writers. From the site:
Wordle is a toy for generating “word clouds” from text that you provide. The clouds give greater prominence to words that appear more frequently in the source text. You can tweak your clouds with different fonts, layouts, and color schemes. The images you create with Wordle are yours to use however you like. You can print them out, or save them to the Wordle gallery to share with your friends.
Jonathan Feinberg, creator of Wordle (full disclosure: he and my husband worked in the same group at IBM Research for two years), makes explicit that Wordle is a toy. It is not a tool for analysis. The reason relates to design: Wordle doesn't only count word frequencies and generate a visualization. It lets users make these word clouds pretty. Users can play with fonts, choose colors, even select whether Wordle should take capitalization into consideration. All of these decisions affect what the viewer judges to be important.

For example, if there are two words of the same size on a black background, one in white, the other in dark purple, which will stand out more? The viewer will give the white word more "weight" than the purple word because it's brighter against the background, although technically both words appeared in the text the same number of times.
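One way to put a rough number on "brighter against the background" is the contrast ratio defined in the WCAG accessibility guidelines. The sketch below uses that formula as a stand-in for perceived prominence, which is an assumption on my part: Wordle does not compute it, and the RGB value chosen for "dark purple" is a guess at a plausible shade.

```python
def _linearize(channel):
    """Convert one 0-255 sRGB channel to linear light (WCAG 2.0 formula)."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (R, G, B) color, from 0 (black) to 1 (white)."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(color_a, color_b):
    """WCAG contrast ratio between two colors, from 1 (none) up to 21."""
    l1, l2 = relative_luminance(color_a), relative_luminance(color_b)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


BLACK = (0, 0, 0)
WHITE = (255, 255, 255)
DARK_PURPLE = (64, 0, 64)  # assumed shade, chosen only for illustration

print(round(contrast_ratio(WHITE, BLACK), 2))
print(round(contrast_ratio(DARK_PURPLE, BLACK), 2))
```

White on black reaches the maximum ratio of 21:1, while a purple this dark stays close to 1:1, so two words of identical size and frequency end up with very different visual weight.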

I generated the following visualizations by running my previous post on Gender Guessing through Wordle:

Because of my design decisions, different words leap out in each visualization. In the first image, "writing", "gender", and "words" have approximately equal prominence. In the third, "GENDER" holds the most weight due to a combination of size, color, and location in the word cloud. In the final image, "words" pops the most, and "feminine" is fairly bright, much more so than "masculine", although the two are similar in size (and therefore in frequency).

For writers using Wordle as a revision tool, here are a few tips:
  1. Make multiple visualizations.
  2. Choose all caps or all lowercase so that words like "gender" and "Gender" aren't counted separately.
  3. Include a black and white visualization for a word cloud that doesn't apply color biases.
  4. Experiment with horizontal, vertical, and mixed orientations to see how this affects your understanding of word frequency/prominence.
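To see why the second tip matters, here is a small sketch comparing case-sensitive and case-folded counts. The `count_words` helper and the sample sentence are made up for illustration, and this is only an approximation of what a word-cloud tool does when it counts.

```python
from collections import Counter


def count_words(text, fold_case):
    """Count whitespace-separated words, optionally ignoring capitalization."""
    words = [w.strip('.,;:!?"()') for w in text.split()]
    if fold_case:
        words = [w.lower() for w in words]
    return Counter(w for w in words if w)


# Made-up sample text with three spellings of the same word.
sample = "Gender shapes writing. Does gender shape words? GENDER, gender, Gender."

print(count_words(sample, fold_case=False))  # three separate entries for one word
print(count_words(sample, fold_case=True))   # one combined entry
```

Counted case-sensitively, the word is split across three entries and none of them looks dominant; folded to lowercase, it becomes a single entry with a count of five.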
For writers using Wordle for fun: Enjoy! It's an interesting and illuminating exercise to view your words through a new lens.