Sunday, January 30, 2011

How do recommender researchers test their algorithms and make their systems smarter?

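FastCompany today profiles Darren Vengroff, Chief Scientist at RichRelevance, who has come up with a way for researchers to test recommender-system algorithms. The idea is actually quite simple: package real-world data into a black box, let researchers upload their programs, and use that mechanism to measure how well each algorithm performs.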
There are many holy grails in online commerce, but one that has frustrated C-level executives and engineers alike is how to produce better recommendation algorithms. Produce better recommendations, and you’ll sell more stuff.
Historically, however, there has been one major structural impediment to making significant breakthroughs on this front. But the Chief Scientist at RichRelevance, which provides personalization solutions for the likes of Walmart, Sears, and, may just have fixed that. 
First, however, the impediment: The people who are likely to produce breakthroughs--the really smart smarty-pants in the math departments of the world’s universities--don’t have access to large bodies of real-world data. And without real-world data, they can come up with as many hypotheses and new types of math as they like, but they’ll never really know whether any of it actually works in the real world. It’s like trying to learn how to serve without tennis balls. You can swing as much as you like, but until you actually hit a real-live ball, you can never be sure if your swing would actually place a ball in the service box.
For their part, the people who have real-world data--the Amazons and eBays of the world--can’t share it with the researchers for reasons of customer privacy. “Even if we anonymize it, we’re handcuffed because we can’t give out data that can reasonably be used to reconstruct who someone really is,” the Chief Scientist, Darren Vengroff, tells Fast Company.
Vengroff, however, has come up with a novel solution: He’s created a “black box” of sorts with real-world data that researchers can use to run experiments on. Researchers won’t be able to look at the data, but they will be able to dump their algorithms in and have the box spit out results, which the researchers can then use to refine their hypotheses. 
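The black-box arrangement described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `BlackBoxEvaluator`, the session format (ordered lists of item ids), and the hit-rate metric are all hypothetical and are not taken from RichRelevance's actual system. The sketch shows the core idea: the submitted algorithm runs against private data inside the box, and only an aggregate score comes back out.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical interface: a recommender maps a shopper's history
# (item ids seen so far) to a ranked list of recommended item ids.
Recommender = Callable[[Sequence[str]], List[str]]


class BlackBoxEvaluator:
    """Holds private sessions and reveals only aggregate scores (illustrative)."""

    def __init__(self, sessions: List[List[str]], top_k: int = 3) -> None:
        # Keep only sessions with at least two items, so there is both
        # a history and a final item to hold out.
        self._sessions = [s for s in sessions if len(s) >= 2]
        self._top_k = top_k

    def evaluate(self, recommend: Recommender) -> Dict[str, float]:
        """Run the submitted algorithm and return summary numbers only."""
        hits = 0
        for session in self._sessions:
            history, held_out = session[:-1], session[-1]
            # The algorithm sees the history inside the box; the researcher
            # never receives the raw sessions, only the report below.
            ranked = recommend(tuple(history))[: self._top_k]
            if held_out in ranked:
                hits += 1
        n = len(self._sessions)
        return {"sessions": float(n), "hit_rate": hits / n if n else 0.0}


# Toy stand-in for the private data (made up for this example).
private_sessions = [["a", "b", "c"], ["a", "c"], ["b", "d"], ["x"]]
box = BlackBoxEvaluator(private_sessions, top_k=2)


def fixed_list_recommender(history: Sequence[str]) -> List[str]:
    # A deliberately naive baseline: always recommend the same items.
    return ["c", "b", "a"]


report = box.evaluate(fixed_list_recommender)
print(report)
```

A real service would also have to sandbox the submitted code so that it cannot leak the data through its outputs; the sketch ignores that concern.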
It’s a simple idea, but it wasn’t really possible to execute until the advent of the cloud. Now researchers from any part of the globe will be able to use the system to run experiments. (In principle, of course--in practice, a committee will vet proposals and choose which ones will actually run.) 
Vengroff, who once worked as a Principal Engineer at Amazon, says he got the idea for the project while attending a computing conference last fall. “In one of the sessions, there were three consecutive papers in a row where about two-thirds of the way through, I was really excited about what was being presented, and then they went down a different path than I thought they were going to go,” he says. “I realized if they only knew what the real-live data, that I look at every day, says, they wouldn’t have gone that way. They would have gone the right way and gotten to a much better solution.”
“Seeing these brilliant ideas get misapplied because of a very reasonable assumption about how shoppers might shop, but happens not to be true in the empirical data--I realized we’ve got to find a way to have this not happen anymore.” 
The system, which is in beta right now, will launch next month at a conference in Palo Alto. Says Vengroff: “We’ve got a significant new path that we think is really going to change things.”
E.B. Boyd is Fast Company’s Silicon Valley reporter.


