March 11, 2010

Gender Guessing

A few weeks ago, a conversation began on the New England Society of Children's Book Writers and Illustrators (NESCBWI) listserv about the accuracy of voice, specifically with regard to gender. How can someone writing across gender lines tell if a character's voice is authentic? Elizabeth Bluemle, author and bookseller, posted a link to Gender Guesser, and writers immediately began to copy and paste excerpts of their manuscripts into this tool to test how male or female their characters sounded.

The tool is called Gender "Guesser" because, according to the site, "[w]hile Gender Guesser may be 60% - 70% accurate, it is not 100% accurate. This is better than random guessing (50%), but should not be interpreted as 'fact'."

This raises the question: what is it about a voice that sounds male or female? What do writers writing across gender need to consider?

From the website:
In 2003, a team of researchers from the Illinois Institute of Technology and Bar-Ilan University in Israel (Shlomo Argamon, Moshe Koppel, Jonathan Fine, and Anat Rachel Shimoni) developed a method to estimate gender from word usage. Their paper described a Bayesian network where weighted word frequencies and parts of speech could be used to estimate the gender of an author. Their approach made a distinction between fiction and non-fiction writing styles.

A simplified version of this work was implemented as the Gender Genie. They showed that fewer words were needed and that writing styles varied based on the forum. For example, fiction and non-fiction differs from blogs (informal writing). Even though the genres differ, there are still gender-specific word frequencies.

The claim is that a small subset of words can skew the "gender" of a writing sample, and these sets vary according to formal versus informal writing styles. The source code clearly shows how words are ranked as masculine or feminine. In the category of informal writing, the top five feminine words are: him, something, because, actually, and everything. The top five masculine words are: some, this, as, now, and good.
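To make the idea concrete, here is a minimal sketch of word-list scoring in Python. It is not the Gender Guesser source code: the real tool assigns each word its own weight, and I have not reproduced those weights here. This sketch assumes every listed word counts equally (a weight of 1), and the function names tally_gendered_words and guess are my own, chosen for illustration.

```python
import re
from collections import Counter

# The top five informal-writing words quoted above.
FEMININE_WORDS = {"him", "something", "because", "actually", "everything"}
MASCULINE_WORDS = {"some", "this", "as", "now", "good"}


def tally_gendered_words(text):
    """Return (feminine_count, masculine_count) for a writing sample.

    Assumption: each listed word counts as 1. The actual tool uses
    per-word weights, which are not reproduced in this sketch.
    """
    # Lowercase the sample and split it into simple word tokens.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    feminine = sum(counts[word] for word in FEMININE_WORDS)
    masculine = sum(counts[word] for word in MASCULINE_WORDS)
    return feminine, masculine


def guess(text):
    """Label a sample by whichever word list appears more often."""
    feminine, masculine = tally_gendered_words(text)
    if feminine > masculine:
        return "feminine"
    if masculine > feminine:
        return "masculine"
    return "undetermined"


if __name__ == "__main__":
    sample = "Actually, I gave him something because everything was good."
    print(tally_gendered_words(sample), guess(sample))
```

Even with only ten words in play, a single sentence can tip the tally one way or the other, which is a good reminder of how thin this kind of measure is.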

Two qualities leap out: first, with the exception of "him", feminine words are polysyllabic, whereas masculine words are monosyllabic. Second, masculine words are definite, e.g. "some" vs. "something," while the listed feminine words add vagueness or qualification.

Of course, word choice only accounts for the 10-20 percentage points of accuracy that the tool gains over the 50% of a coin toss. What makes up the remaining 30-40% of a character's voice?

Beyond the words themselves, there's the nature of thought, the types of observations a character makes, his or her metaphors for experiencing the world. Voice emerges from patterns of thought as well as from word choice and syntax, and these elements reinforce each other. It's also important to acknowledge that most people, and therefore most characters, will not register as 100% of a gender -- a certain level of ambiguity can actually (note the feminine word!) increase authenticity.

While a tool like Gender Guesser is fun, it provides only a superficial measure of "authenticity." A character's voice must work in multiple dimensions that cannot be easily quantified.

March 2, 2010


Although writing is viewed as a low-tech occupation (all one needs is a pencil and paper), the relationship between writers and technology is constantly growing more complex. There are technological tools created for writers as well as tools that writers have appropriated. Publishing platforms are evolving, and many writers are expressing emotions ranging from excitement to uncertainty to fear about what new platforms mean for "traditional" storytellers.

This blog is a space in which to explore the relationship between writers and technology, covering everything from software for writers to new forms of storytelling. If you'd like me to post about a specific tool or topic, please comment below.