The tool is called Gender "Guesser" because, according to the site, "[w]hile Gender Guesser may be 60% - 70% accurate, it is not 100% accurate. This is better than random guessing (50%), but should not be interpreted as 'fact'."
This raises the question: what is it about a voice that makes it sound male or female? What do writers writing across gender need to consider?
From the website:
In 2003, a team of researchers from the Illinois Institute of Technology and Bar-Ilan University in Israel (Shlomo Argamon, Moshe Koppel, Jonathan Fine, and Anat Rachel Shimoni) developed a method to estimate gender from word usage. Their paper described a Bayesian network where weighted word frequencies and parts of speech could be used to estimate the gender of an author. Their approach made a distinction between fiction and non-fiction writing styles. The claim is that a small subset of words can skew the "gender" of a writing sample, and these sets vary according to formal versus informal writing styles. The source code clearly shows how words are ranked as masculine or feminine. In the category of informal writing, the top five feminine words are: him, something, because, actually, and everything. The top five masculine words are: some, this, as, now, and good.
A simplified version of this work was implemented as the Gender Genie. They showed that fewer words were needed and that writing styles varied based on the forum. For example, fiction and non-fiction differ from blogs (informal writing). Even though the genres differ, there are still gender-specific word frequencies.
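The scoring idea described above (count the listed words, weight them, and compare the totals) can be sketched in a few lines of Python. This is a minimal illustration, not the Gender Guesser or Gender Genie source: it uses only the ten informal-writing words quoted above, gives every word a placeholder weight of 1.0 because the real weights are not reproduced here, and ignores parts of speech, the Bayesian model, and the formal/informal split. The function names `score_text` and `guess_gender` are hypothetical.

```python
import re
from collections import Counter

# The top five informal-writing words quoted above.
# ASSUMPTION: every weight is a placeholder of 1.0; these are NOT the
# weights used by Gender Guesser or Gender Genie.
FEMININE_WEIGHTS = {w: 1.0 for w in ("him", "something", "because", "actually", "everything")}
MASCULINE_WEIGHTS = {w: 1.0 for w in ("some", "this", "as", "now", "good")}


def score_text(text: str) -> tuple[float, float]:
    """Return (feminine_score, masculine_score): each listed word's
    weight multiplied by how often it appears as a whole word."""
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    feminine = sum(wt * counts[w] for w, wt in FEMININE_WEIGHTS.items())
    masculine = sum(wt * counts[w] for w, wt in MASCULINE_WEIGHTS.items())
    return feminine, masculine


def guess_gender(text: str) -> str:
    """Label the text by whichever weighted total is larger."""
    feminine, masculine = score_text(text)
    if feminine > masculine:
        return "feminine"
    if masculine > feminine:
        return "masculine"
    return "undecided"


if __name__ == "__main__":
    sample = "I actually told him something because everything felt wrong."
    print(score_text(sample), guess_gender(sample))  # (5.0, 0.0) feminine
```

Because words are matched whole, "something" counts toward the feminine list without also counting as "some". A closer imitation would keep a separate weight table for each style of writing and choose the table by genre.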
Two qualities leap out. First, with the exception of "him," the feminine words are polysyllabic, whereas the masculine words are all monosyllabic. Second, the masculine words are more definite (compare "some" with "something"), while the feminine words add vagueness or qualification.
Of course, word choice only accounts for the 10-20 percentage points of accuracy that the tool achieves beyond the 50% of a coin toss. What makes up the other 30-40% of a character's voice?
Beyond the words themselves, there's the nature of thought: the types of observations a character makes, his or her metaphors for experiencing the world. Voice emerges from patterns of thought as well as from word choice and syntax, and these elements reinforce each other. It's also important to acknowledge that most people, and therefore most characters, will not register as 100% one gender; a certain level of ambiguity can actually (note the feminine word!) increase authenticity.
While a tool like Gender Guesser is fun, it provides only a superficial measure of "authenticity." A character's voice must work in multiple dimensions that cannot easily be quantified.