WUSTL professor Weinberger receives NSF CAREER award

Young faculty researcher to develop software that allows computers to learn similarities between images, text or sound


Kilian Q. Weinberger, PhD, assistant professor of computer science & engineering in the School of Engineering & Applied Science at Washington University in St. Louis, has won a prestigious Faculty Early Career Development Award (CAREER award) from the National Science Foundation (NSF).

The awards are given “in support of the early career-development activities of those teacher-scholars who most effectively integrate research and education within the context of the mission of their organization” with the goal of “building a firm foundation for a lifetime of integrated contributions to research and education.”

Eighteen CAREER awards are currently “active” at Washington University in St. Louis.

Weinberger will use the projected five-year, $440,000 award to perfect a type of machine learning that could be useful for a broad array of applications.

Weinberger’s CAREER project, “New Directions for Metric Learning,” seeks to solve one of the fundamental problems of machine learning: how to compare individual texts, images or sounds. If an algorithm could perfectly determine whether two instances of a data type are similar or dissimilar, most subsequent machine learning and data analysis tasks would become trivial, he says.

“A common similarity measure between two data instances is the total squared difference of their attributes,” Weinberger says. “With this metric, similar instances end up close together and dissimilar instances are far apart. Although this distance is a convenient and intuitive measure of similarity, it ignores the fact that the meaning of similarity is inherently task- and data-dependent.

“For example, one person might be interested in organizing articles by author, whereas a second might organize them by topic. Given the nature of their respective tasks, both should use very different metrics to measure document similarity.”
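The measure Weinberger describes, and the reason a single fixed metric cannot serve both tasks, can be sketched in a few lines of Python. This is only an illustration: the two article attributes, their values and the per-task weights are invented here, and hand-set weights stand in for what a learned metric would discover on its own.

```python
def squared_distance(a, b):
    """Total squared difference of the attributes of two instances."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_squared_distance(a, b, weights):
    """Same measure, but each attribute is scaled by a task-specific weight."""
    return sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b))


# Hypothetical articles described by two made-up attributes:
# (writing-style score, topic score).
article_a = (0.9, 0.1)
article_b = (0.8, 0.9)  # style close to article_a, topic far from it
article_c = (0.1, 0.2)  # topic close to article_a, style far from it

by_author = (1.0, 0.0)  # assumed weights: only writing style counts
by_topic = (0.0, 1.0)   # assumed weights: only topic counts

# Under the author weights, article_b is the nearer neighbor of article_a;
# under the topic weights, article_c is.
nearest_by_author = min(
    (article_b, article_c),
    key=lambda other: weighted_squared_distance(article_a, other, by_author),
)
nearest_by_topic = min(
    (article_b, article_c),
    key=lambda other: weighted_squared_distance(article_a, other, by_topic),
)
```

With the same three articles, changing only the weights changes which article counts as most similar, which is the task dependence the quote points to.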

To deal with this difficulty, domain experts adjust their data representations by hand — but this is not a robust approach. It would be better if a software program could “learn” the metric (or data representation) that works best for each specific application, and this is the approach Weinberger plans to take.

“Such a metric can be learned,” Weinberger says, “by mapping the digital representation of the data into a high-dimensional representation, which is then deformed to move similar points closer together while keeping dissimilar data instances apart.

Weinberger’s birthday cake, made by a research colleague, is a schematic representation of data where similar points are pulled close together and dissimilar points moved apart by the deformation of the space they occupy. On the cake, the green points have moved closer to the green dot in the center, whereas the blue and red points have moved farther away from it.

“For illustration, imagine a sheet of paper with little dots. In a three-dimensional space you can fold the piece of paper such that any two points become close. If you want to move many pairs of points together at the same time, you might need several thousand dimensions.”
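The paper-folding picture can be tried directly. The sketch below is a toy built on assumptions, not Weinberger's method: the function name `fold_sheet`, the crease position and the gap between the folded layers are all invented for the example, and the fold is chosen by hand rather than learned.

```python
import math


def fold_sheet(point, crease_x=5.0, gap=0.1):
    """Place a 2-D point on a sheet folded over the vertical line x = crease_x.

    Points left of the crease stay flat at height 0. Points right of it are
    reflected across the crease and lifted to height `gap`, as if that half
    of the sheet had been folded over on top of the other half.
    """
    x, y = point
    if x <= crease_x:
        return (x, y, 0.0)
    return (2 * crease_x - x, y, gap)


# Two dots at opposite edges of a flat sheet that is 10 units wide.
dot_a, dot_b = (0.0, 0.0), (10.0, 0.0)

distance_flat = math.dist(dot_a, dot_b)
distance_folded = math.dist(fold_sheet(dot_a), fold_sheet(dot_b))
```

On the flat sheet the two dots are 10 units apart. After the fold they sit one above the other, separated only by the assumed gap of roughly 0.1 units. A single fold brings one chosen pair together. Bringing many pairs together at once is what calls for the extra dimensions mentioned in the quote.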

Weinberger expects medical screening will be one of the first applications for the new metric learning methods.

For the educational component of his grant, he plans to develop a K-12 curriculum module about machine learning, which he hopes will show students how fundamental mathematics is to the technologies they use in daily life.

Weinberger joined the WUSTL faculty in 2010 after a stint as a research scientist at Yahoo Research in Santa Clara, Calif., where he worked on spam filtering algorithms, multimedia search, high-dimensional data analysis and machine learning. His work on metric learning has won several outstanding paper awards.

Weinberger earned his bachelor’s degree in mathematics and computer science in 2002 from Oxford University in England, and his master’s and doctoral degrees in 2004 and 2007 in computer science from the University of Pennsylvania.