Wednesday, December 17, 2008

Netflix Progress Prize for 2008 Announced

On December 10 the official Netflix Prize website announced that this year's (2008) Progress Prize goes to BellKor in BigChaos.

It is our great honor to announce the winner of the Netflix Progress Prize for 2008 as team BellKor in BigChaos for their verified just-in-time submission on Sept 30 at 21:17:40 UTC achieving a 9.44% improvement over Cinematch. We congratulate the team of Yehuda Koren, Robert Bell and Chris Volinsky of AT&T Research Labs combined with Andreas Töscher and Michael Jahrer of Commendo Research for their superb work integrating many significant techniques to achieve this result.

In accord with the Rules the team has prepared a system description consisting of two papers, which we both make public below. We will be awarding the Prize in a presentation at the Netflix offices in Los Gatos on December 17, 2008 at 4pm. Andreas Töscher and Michael Jahrer will present a public talk at that time about their Prize algorithm. We will post a video of that presentation via the Forum.
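For readers wondering how the "9.44% improvement over Cinematch" figure relates to prediction error: the contest scores submissions by RMSE, and the percentage is the relative drop from Cinematch's RMSE. The sketch below is only an illustration of that arithmetic. The function names are my own, and the baseline RMSE of 0.9525 is an assumption that is not stated in the announcement above.

```python
def improvement_pct(baseline_rmse: float, team_rmse: float) -> float:
    """Relative RMSE reduction versus the baseline, in percent."""
    if baseline_rmse <= 0:
        raise ValueError("baseline_rmse must be positive")
    return (baseline_rmse - team_rmse) / baseline_rmse * 100.0


def rmse_for_improvement(baseline_rmse: float, pct: float) -> float:
    """RMSE a team needs in order to beat the baseline by pct percent."""
    return baseline_rmse * (1.0 - pct / 100.0)


if __name__ == "__main__":
    # Assumed Cinematch baseline; not taken from the announcement.
    ASSUMED_CINEMATCH_RMSE = 0.9525
    needed = rmse_for_improvement(ASSUMED_CINEMATCH_RMSE, 9.44)
    print(f"RMSE implied by a 9.44% improvement: {needed:.4f}")
```

The two helpers are inverses of each other, so the same pair can be used to check how far a given RMSE is from the 10% mark.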

On its website, the BellKor team provides the papers it has published in connection with this competition, for interested readers to download:


  • The BellKor 2008 Solution to the Netflix Prize. This is the document which lays out our overall strategy - as was required in the rules of the competition in order to claim the Progress Prize.

  • Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model. KDD 2008.

  • Recent Progress in Collaborative Filtering. RecSys 2008.

  • Factor in the Neighbors: Scalable and Accurate Collaborative Filtering. Submitted.

  • Chasing $1,000,000: How We Won The Netflix Progress Prize. ASA Statistical and Computing Graphics Newsletter. Volume 18, Number 2.

  • Lessons from the Netflix Prize Challenge. SIGKDD Explorations, Volume 9, Issue 2.

  • The BellKor Solution to the Netflix Prize. This is the document which lays out our overall strategy - as was required in the rules of the competition in order to claim the Progress Prize.

  • Scalable Collaborative Filtering with Jointly Derived Neighborhood Interpolation Weights. ICDM 2007.

  • Improved Neighborhood Based Collaborative Filtering. KDD 2007 Netflix Competition Workshop.

  • Modelling relationships at Multiple Scales to Improve Accuracy of Large Recommender Systems. KDD 2007.

Tuesday, December 16, 2008

    Reading List: Diversity in Recommenders

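    Last month Daniel Lemire compiled the literature he considers relevant to the diversity of recommendation lists in recommender systems, and some readers added their own suggestions in the comments. After an initial filtering, I used the export features of CiteULike and Refworks to produce the IEEE-style bibliography below of the articles that interest me, as a note for future reference: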

    [1] C. Clarke, M. Kolla, G. Cormack, O. Vechtomova, A. Ashkan, S. Büttcher and I. Mackinnon, "Novelty and diversity in information retrieval evaluation," in SIGIR '08: Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2008, pp. 659-666.

    [2] D. Fleder and K. Hosanagar, "Blockbuster Culture's Next Rise or Fall: The Impact of Recommender Systems on Sales Diversity," SSRN eLibrary, 2008.

    [3] D. Fleder and K. Hosanagar, "Recommender systems and their impact on sales diversity," in EC '07: Proceedings of the 8th ACM Conference on Electronic Commerce, 2007, pp. 192-199.

    [4] L. Iaquinta, M. de Gemmis, P. Lops, G. Semeraro, M. Filannino and P. Molino, "Introducing Serendipity in a Content-Based Recommender System," Hybrid Intelligent Systems, 2008. HIS '08. Eighth International Conference on, pp. 168-173, 2008.

    [5] Q. Le and A. Smola, "Direct Optimization of Ranking Measures," Apr 2007. [Online]. Available: http://arxiv.org/abs/0704.3359.

    [6] D. Lemire, S. Downes and S. Paquet, "Diversity in open social networks," 2008.

    [7] L. McGinty and B. Smyth, "On the Role of Diversity in Conversational Recommender Systems," 2003.

    [8] S. McNee, J. Riedl and J. Konstan, "Being accurate is not enough: How accuracy metrics have hurt recommender systems," in CHI '06: CHI '06 Extended Abstracts on Human Factors in Computing Systems, 2006, pp. 1097-1101.

    [9] K. Swearingen and R. Sinha, "Beyond algorithms: An HCI perspective on recommender systems," 2001.

    [10] Y. Xu and H. Yin, "Novelty and topicality in interactive information retrieval," J. Am. Soc. Inf. Sci. Technol., vol. 59, pp. 201-215, 2008.

    [11] C. Zhai, W. Cohen and J. Lafferty, "Beyond independent relevance: Methods and evaluation metrics for subtopic retrieval," in SIGIR '03: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2003, pp. 10-17.

    [12] F. Zhang, "Research on Recommendation List Diversity of Recommender Systems," Management of e-Commerce and e-Government, 2008. ICMECG '08. International Conference on, pp. 72-76, 2008.

    [13] M. Zhang and N. Hurley, "Avoiding monotony: Improving the diversity of recommendation lists," in RecSys '08: Proceedings of the 2008 ACM Conference on Recommender Systems, 2008, pp. 123-130.


    If you need electronic copies of these articles, see my CiteULike library (tag: Diversity) for details on where each paper can be downloaded.

    Saturday, December 13, 2008

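    [Updated] Using Decision Trees for Stock Prediction

    Data Mining Research has recently been publishing a series of articles on using decision trees for stock prediction. As of now (2008/11/12) the series has reached its fifth installment, on risk assessment, and I will update this post as the series progresses. Beyond this series, Data Mining Research has also published many articles on research trends and techniques in data mining. Interested readers may want to dig through the blog's older posts; you might come away with some pleasant surprises.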


    (First Published on 2008/10/14 : Last Updated on 2008/12/13)

    Thursday, December 4, 2008

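    [Book Excerpt]: The Fine Spirits of "Imperium"

    I never knew that Georgia produced brandy, but Ryszard Kapuscinski's Imperium (Chinese edition: 帝國:俄羅斯五十年) tells us an unshakable truth: in making spirits, as in every art, you must have taste, and the rest will follow.

    [Book cover: 帝國-俄羅斯五十年]
    I. First Encounters - The South, 1967 - Georgia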

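    Not everyone knows how brandy comes into being. To make brandy you need four things: wine, sunlight, oak, and time. Beyond these, as in every art, you must also have taste, and the rest will follow.

    In autumn, after the grape harvest, the wine is made and poured into oak barrels. All the secrets of brandy are hidden in the rings of the oak. As the oak grows, it gathers sunlight into its trunk, the way amber settles on the sea floor. The sunlight settles slowly into the rings of the oak, a long process that goes on for decades. A barrel made from young oak cannot produce good brandy. As the oak grows and its trunk turns silver, the tree is gaining in strength; the wood has gathered power, color, and fragrance. Not every oak will raise a good brandy. The best brandy is raised by solitary oaks that grow on dry ground in quiet places. ...

    Then the cooper begins to make the barrel, ...

    Wine is poured into the barrel, five hundred liters or a thousand, it varies, and the barrel is set on wooden trestles and left to itself. Nothing more needs to be done. One must wait, and when the time is right, things take their course. The wine now enters the oak, and the wood gives up everything: it releases sunlight, it releases fragrance, it releases color. The wood squeezes out its own juices and goes to work.

    That is why it needs quiet.

    ... The first brandy appears after three years, ... But in fact the brandy's age is more remarkable still: we have to add the age of the oak from which the barrel was made. The oak in this case is laboring for wine sealed away during the French Revolution.

    One can tell by taste whether a brandy is young or old. A young brandy is sharp, quick, and impulsive, and it tastes sour. An old one, by contrast, is kind and gentle, and only later begins to glow. An old brandy holds a great deal of warmth and a great deal of sunlight, and it goes calmly to one's head, without any exaggeration.

    And it does what it is meant to do.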

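    [Book Excerpt]: Su Wei on Alienation

    After taking part in a traditional dinner talk at a Yale college (耶魯布魯克學院), Su Wei was moved to write:


    [Book cover: 站在耶魯講台上]
    The "Distance" of the "Leisured" and the "Closeness" of the "Purposeful"

    -- We have been talking about "alienation" in modern life for a long time. People lament the estrangement between individuals, the thinning of human warmth, and the decline of public morals in the age of technology and information, to the point that the lament has nearly become a "politically correct" cliché: to talk about it is a pose, and not to talk about it is also a pose. In my humble view, many of the lofty poses of "critique" struck on the ideological plane (for example, relishing the worldly benefits of the "middle class" while tearfully denouncing the "middle-class way of life") are likewise no more than poses. They carry much of the flavor of the "leisured" and little of the substance of the "purposeful".

    Although talking about alienation seems to be in fashion, Su Wei is not pessimistic....

    The "leisure" of the so-called "middle class" is nothing to fear; being "purposeful" is what matters most and what has positive, constructive value. The social alienation and interpersonal barriers produced by today's age of information and technology (call it the post-industrial or post-capitalist age if you like) are a fact. But within a large institution, under the "purposeful" direction and arrangement of a humanistic education, deliberately strengthening personal contact between people, creating a healthy social atmosphere, and building up the positive qualities of a "mainstream society" or a so-called "middle-class way of life" is not only concrete but also feasible. If a "stance" is also a "pose", ....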

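    If My Heart Were a Lotus Flower

    ~ Lin Huiyin · Ma Yan's collected essays · 《蓮燈》 (Lotus Lamp) ~ In her essay 《高貴一種,有詩為證》, Ma Yan mentions that "more than ten years ago, before I knew about Ms. Lin's gossip and achievements, I read 《蓮燈》 quoted by someone else in a periodical" and liked it very much. Compared with Bian Zhilin and Xu Zhimo, it not only holds its own but is simply a notch above. The handling of rhyme and tonal pattern in the earlier part is clearly superior to Dai...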