Saturday, January 23, 2010

Sikuli: Visual Image Search and Automation

Just found this amazing paper: "Sikuli: Using GUI Screenshots for Search and Automation",
by MIT's Tom Yeh, Tsung-Hsiang Chang, and Robert C. Miller.



The idea is simple: how often have you wished you could use images instead of words?
They say an image is worth a thousand words, and sometimes that is actually the case.

In the graphical computer world we live in, there are things that just can't be translated into words, and using images would greatly simplify some tasks.

From their abstract:
We present Sikuli, a visual approach to search and automation of graphical user interfaces using screenshots. Sikuli allows users to take a screenshot of a GUI element (such as a toolbar button, icon, or dialog box) and query a help system using the screenshot instead of name.

But that's not all. Imagine you want to write a script that automates some task on your computer, and the script has to click an icon that may show up in different places on the screen.

Something like... this:

Sikuli also provides a visual scripting API for automating GUI interactions, using screenshot patterns to direct mouse and keyboard events. We report a web-based user study showing that searching by screenshot is easy to learn and faster to specify than keywords.
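To get a feel for the core idea behind "using screenshot patterns to direct mouse and keyboard events", here is a minimal sketch in Python. It is not Sikuli's API and not its actual matching algorithm: it swaps in a brute-force sum-of-squared-differences template match over small grayscale images stored as lists of lists. The names `find_pattern`, `click_center` and the `click` callback are hypothetical; in a real script `click` would be whatever mouse-click function your automation tool provides.

```python
from typing import Callable, List, Optional, Tuple

Image = List[List[int]]  # grayscale pixels, row-major: image[y][x] in 0..255


def find_pattern(screen: Image, pattern: Image,
                 max_mean_sq_diff: float = 0.0) -> Optional[Tuple[int, int]]:
    """Return the (x, y) top-left corner where `pattern` best matches `screen`,
    or None if even the best match has a mean squared difference above
    `max_mean_sq_diff`. Brute-force sum-of-squared-differences (SSD) search.
    """
    sh, sw = len(screen), len(screen[0])
    ph, pw = len(pattern), len(pattern[0])
    if ph > sh or pw > sw:
        return None

    best_pos: Optional[Tuple[int, int]] = None
    best_ssd = float("inf")
    for y in range(sh - ph + 1):
        for x in range(sw - pw + 1):
            ssd = 0
            for dy in range(ph):
                row_s, row_p = screen[y + dy], pattern[dy]
                for dx in range(pw):
                    d = row_s[x + dx] - row_p[dx]
                    ssd += d * d
                if ssd >= best_ssd:
                    break  # this window can no longer beat the best so far
            if ssd < best_ssd:
                best_ssd, best_pos = ssd, (x, y)

    if best_pos is None or best_ssd / (ph * pw) > max_mean_sq_diff:
        return None
    return best_pos


def click_center(screen: Image, icon: Image,
                 click: Callable[[int, int], None]) -> bool:
    """Find `icon` wherever it currently is and click its center.
    `click` is a hypothetical stand-in for a real mouse-click function."""
    pos = find_pattern(screen, icon)
    if pos is None:
        return False
    x, y = pos
    click(x + len(icon[0]) // 2, y + len(icon) // 2)
    return True


if __name__ == "__main__":
    icon = [[200, 10], [10, 200]]
    screen = [[0] * 6 for _ in range(5)]
    for dy, row in enumerate(icon):  # paste the icon at x=3, y=2
        for dx, v in enumerate(row):
            screen[2 + dy][3 + dx] = v
    print(find_pattern(screen, icon))  # (3, 2)
    click_center(screen, icon, lambda x, y: print("click at", x, y))
```

Because the script looks for the icon's pixels rather than a fixed coordinate, moving the icon to a different spot on the "screen" still yields a correct click. A real tool would also need to tolerate small rendering differences, which is what the `max_mean_sq_diff` threshold gestures at.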

Tools like this would make it easy to automate some fairly complex image-processing tasks, for instance a baby-monitoring script that checks whether the baby is sleeping face up.



A really interesting project we should all keep an eye on: Sikuli!
