A better alternative to text CAPTCHAs

From Rich Gossweiler, Maryam Kamvar, & Shumeet Baluja’s “What’s Up CAPTCHA?: A CAPTCHA Based On Image Orientation” (Google: 20-24 April 2009):

There are several classes of images which can be successfully oriented by computers. Some objects, such as faces, cars, pedestrians, sky, grass etc.

Many images, however, are difficult for computers to orient. For example, indoor scenes have variations in lighting sources, and abstract and close-up images provide the greatest challenge to both computers and people, often because no clear anchor points or lighting sources exist.

The average performance on outdoor photographs, architecture photographs and typical tourist type photographs was significantly higher than the performance on abstract photographs, close-ups and backgrounds. When an analysis of the features used to make the discriminations was done, it was found that the edge features play a significant role.

It is important not to simply select random images for this task. There are many cues which can quickly reveal the upright orientation of an image to automated systems; these images must be filtered out. For example, if typical vacation or snapshot photos are used, automated rotation accuracies can be in the 90% range. The existence of any of the cues in the presented images will severely limit the effectiveness of the approach. Three common cues are listed below:

1. Text: Usually the predominant orientation of text in an image reveals the upright orientation of an image.

2. Faces and People: Most photographs are taken with the face(s) / people upright in the image.

3. Blue skies, green grass, and beige sand: These are all revealing clues, and are present in many travel/tourist photographs found on the web. Extending this beyond color, in general, the sky often has few texture/edges in comparison to the ground. Additional cues found important in human tests include "grass", "trees", "cars", "water" and "clouds".

Second, due to sometimes warped objects, lack of shading and lighting cues, and often unrealistic colors, cartoons also make ideal candidates. … Finally, although we did not alter the content of the image, it may be possible to simply alter the color- mapping, overall lighting curves, and hue/saturation levels to reveal images that appear unnatural but remain recognizable to people.

To normalize the shape and size of the images, we scaled each image to a 180×180 pixel square and we then applied a circular mask to remove the image corners.

We have created a system that has sufficiently high human- success rates and sufficiently low computer-success rates. When using three images, the rotational CAPTCHA system results in an 84% human success metric, and a .009% bot-success metric (assuming random guessing). These metrics are based on two variables: the number of images we require a user to rotate and the size of the acceptable error window (the degrees from upright which we still consider to be upright). Predictably, as the number of images shown becomes greater, the probability of correctly solving them decreases. However, as the error window increases, the probability of correctly solving them increases. The system which results in an 84% human success rate and .009% bot success rate asks the user to rotate three images, each within 16° of upright (8-degrees on either side of upright).

A CAPTCHA system which displayed ≥ 3 images with a ≤ 16-degree error window would achieve a guess success rate of less than 1 in 10,000, a standard acceptable computer success rates for CAPTCHAs.

In our experiments, users moved a slider to rotate the image to its upright position. On small display devices such as a mobile phone, they could directly manipulate the image using a touch screen, as seen in Figure 12, or can rotate it via button presses.