Saturday, October 31, 2009
Friday, October 30, 2009
Vint Cerf in #IL2009 opening keynote
Bourbaki Archives
Via Peter Woit I see that there is a project to put the archives of Bourbaki online. These are the internal manuscripts that the group generated while writing their famous series of books.
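Who is Bourbaki? In the mid-1930s, a group of French mathematicians decided to rewrite the whole of mathematics in a purely deductive way, with set theory as the foundation, and published their work under the collective pseudonym N. Bourbaki. Their books start from the most abstract mathematical structures and only then gradually specialize. This general-to-specific ordering organizes mathematics in a way that is both concise and elegant, so mathematicians admired it greatly. Every summer the group gathered at a resort, discussing how to write the books each morning and holding a mathematics colloquium each afternoon. In the morning writing sessions, a draft became final only when it was approved unanimously. The completed work runs to dozens of volumes.
After the books were well received in academia, the group said they wanted to write mathematics textbooks for primary and secondary schools, and the French government of the time welcomed the idea. Spurred by Bourbaki's move (though more people believe the real spur was the Soviet Sputnik launch), the School Mathematics Study Group (SMSG) in the United States, led by Edward G. Begle, also developed a series of mathematics textbooks: the so-called New Math. But Bourbaki's school textbook project did not succeed as expected, and New Math, a clumsy imitation, came to be regarded by American educators as a disaster. In fact, the mathematics that mathematicians appreciate is some distance from what industry needs, and further still from what schoolchildren can study, so the failure was no surprise.
Although the school textbook attempt was less successful than their academic work, their ideas had a far-reaching influence on academia, and their way of writing books has become a legend passed down through generations of mathematics students.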
Monday, October 26, 2009
Francisco Martin #RecSys09 Industry Keynote Summary
Lesson 1 – Make sure a recommender is really needed! Do you have lots of recommendable items? Many diverse customers?… also think Return-on-Investment… a more sophisticated recommender may not deliver a better ROI.
Lesson 2 – Make sure the recommendations make strategic sense. Is the best recommendation for the customer also the best for the business? What is the difference between a good and a useful recommendation? Good recommendations vs. useful recs; obvious recommendations may not be useful; risky recs may deliver better long-term value (every system exists to serve business needs; never forget that)
Lesson 3 – Choose the right partner! Select the right rec vendor vs hire some #recsys09 students. If you are a big company the best you can do is to organize a contest (why not just say Netflix outright? LOL)
Lesson 4 – Forget about cold-start problems (!) …. just be creative. The internet has the data you need (somewhere…) (remember the old saying: We are limited only by our imagination)
Lesson 5 – Get the right balance between data and algorithms. 70% of the success of a #recsys is on the data, the other 30% on the algorithm (we have discussed this many times: worry about the data before you worry about the algorithm)
Lesson 6 – Finding correlated items is easy but deciding what, how, and when to present to the user is hard… or don't just recommend for the sake of it. Remember user attention is a scarce and valuable resource. Use it wisely! … don't make a recommendation to a customer who is just about to pay for items at the checkout! User interface should get at least 50% of your attention.
Lesson 7 – Don't waste time computing nearest neighbours (use social connections)… just mine the social graph. Might miss useful connections??
Lesson 8 – Don't wait to scale (Lessons 6, 7, 8, and 9 are clearly drawn from practical experience)
Lesson 9 – Choose the right feedback mechanism. Stars vs thumbs …. the YouTube problem. More research on implicit and other feedback mechanisms is needed. The perfect rating system is no rating system! … focus on the interface. Seems to me this is one of the gaps in current research… algorithms > data > interface
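Lesson 10 – Measure Everything! … business control and analytics is a big opportunity here. (Measure not only how accurate the predictions are; every step of the business process needs an evaluation mechanism. This is the insight of someone with real experience founding and running a business.)
Keynote Takeaway – Think about application context; Focus on interface as much as algorithms; Be creative with start-up data. … the UI needs to get the lion’s share of the effort (50%) compared to algorithms (5%), knowledge (20%), analytics (25%)
Every reader may have their own view of this final takeaway. Quantifying the weight of each factor in system development is hard, so in the end one is forced to offer a set of numbers that express one's own "experience values". The importance of the UI is beyond doubt, but why should the UI count for ten times as much as the algorithm? You, clever reader, surely have your own theory!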
Sunday, October 25, 2009
#RecSys09 topics: what's on recommender researchers' mind?
If you want a quick view of this year's hot topics, take a look at the tag cloud made by Barry Smyth, who teaches at University College Dublin. It looks like Netflix is still the big buzzword!
Why I Am Such an Infrequent Blogger?
“My assumption, always, is that everyone knows everything I know AND MORE. Rephrase. Everyone who is interested in the kinds of thing that interest me knows everything I know AND MORE. If they're not interested they don't know but don't want to. So there's no point in mentioning things that strike me as interesting, unless a) these are events in the last, say, 5 minutes (so those disposed to be interested might not be au fait or b) I'm up for proselytizing (those not disposed to be interested might be with enough encouragement).”
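Addendum:
@clickstone's blog Beyond Search is excellent, and readers interested in internet industry news or in the theory and practice of recommender systems should not miss it. Beyond Search has recently been unreachable for "unknown" technical reasons, so you can read his posts on the Douban mirror instead.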
Tuesday, October 20, 2009
The Wheels of Life: From Cradle to ...
Someone later modified this picture into A Cyclist's Wheels of Life, which seems very apt today, when everyone is crazy about cycling.
Wednesday, October 7, 2009
Sunday, October 4, 2009
What the heck?
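If you saw the news and want to change your name, I advise you not to bother. (Disclaimer: my own name is not on the top-ten most charming list)
Thoughts on user-based and item-based algorithms
Woken by an earthquake in the middle of the night, I got up and browsed the web aimlessly, and saw that xlvector was still hard at work over the National Day holiday, comparing the output diversity of user-based and item-based algorithms. I was greatly impressed, so I record his post below:
(I am now about to climb back into bed and go back to sleep)
A while ago I was chatting with wendong, and he mentioned that the results of user-based algorithms are less diverse than those of item-based algorithms. I have a few thoughts on this:
1) Diversity, as we know, means that the recommended items are pairwise not very similar, so different similarity measures actually yield different diversity measures.
2) Two kinds of similarity are commonly used, one content-based and one based on collaborative filtering. In my experiments, user-based results were more diverse than item-based results under both kinds of similarity.
3) Still, I think there exists some similarity measure whose corresponding diversity measure comes out better for item-based methods.
I wonder what everyone thinks about this: in a real system, which of user-based and item-based produces more diverse results?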
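Point 1 can be made concrete with a minimal Python sketch. This is only an illustration, not xlvector's actual experiment: the function names, the toy feature vectors, and the choice of "average pairwise 1 − cosine similarity" as the diversity score are all assumptions made for the example. It scores the same recommendation list under a content-based similarity and under a collaborative (rating-based) similarity, and the two scores differ.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def intra_list_diversity(items, vectors):
    """Average pairwise dissimilarity (1 - cosine) over a recommendation list."""
    pairs = list(combinations(items, 2))
    if not pairs:
        return 0.0
    total = sum(1.0 - cosine(vectors[a], vectors[b]) for a, b in pairs)
    return total / len(pairs)


# Hypothetical toy data: content features (e.g. genre flags) for each item.
content_vectors = {
    "A": [1, 0, 0],
    "B": [1, 0, 0],  # same genre as A, so content-wise identical
    "C": [0, 1, 0],
}

# Hypothetical toy data: ratings each item received from four users (0 = unrated).
rating_vectors = {
    "A": [5, 0, 0, 0],
    "B": [0, 4, 0, 0],  # rated by a different user than A
    "C": [0, 0, 3, 0],
}

recommended = ["A", "B", "C"]
content_div = intra_list_diversity(recommended, content_vectors)
cf_div = intra_list_diversity(recommended, rating_vectors)

print("content-based diversity:", content_div)
print("collaborative diversity:", cf_div)
```

With this toy data the list looks less diverse under the content-based similarity (A and B share a genre) than under the rating-based one (no two items share a rater), so any comparison of user-based and item-based output has to state which similarity the diversity is measured against.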
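If My Heart Were a Lotus
~ Lin Huiyin · Ma Yan's essay collection · Lotus Lamp ~ In her essay "One Kind of Nobility, with Poems as Proof" (《高貴一種,有詩為證》), Ma Yan mentions that "more than ten years ago, before I knew of Ms. Lin's gossip and achievements, I read 'Lotus Lamp' quoted by someone else in a journal" and liked it very much; compared with Bian Zhilin and Xu Zhimo it is not merely in no way inferior, it is simply a cut above. The handling of rhyme and tonal pattern in the earlier part is clearly superior to Dai...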
-
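I have never paid much attention to conference news, but within a month of this semester starting I heard several professors in a row talk about their views on the "value" of academic conferences, which prompted me to reconsider my earlier attitude, so I did a little homework over the past few days. I found the following three conference ranking lists quite useful and copy them here, partly as a memo and partly to share with...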
-
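This is an old post from many years ago. Some readers recently found it, which led to some very interesting conversations that I have recorded in the two posts below. If you are interested, you are welcome to read these short records and offer your criticism. Thank you. How to Evaluate a Recommender System (2) Notes on a Recommender System Conversation ----- Any work, including academic research and commercial projects, must have a measure of results...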
-
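Recently a friend took on a system development project that was neither big nor small (not a quick and dirty small job, but not a sum that would make the vendor rich either, hence "neither big nor small"). The client and the vendor had quite a dispute over the documentation delivery standard. After mediation, both sides finally agreed not to use the CMMI standard (heaven knows what a CMMI documentation standard is) and to use instead...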