The Road Less Traveled By: October 2009

Saturday, October 31, 2009

Friday, October 30, 2009

Vint Cerf in #IL2009 opening keynote

Vint Cerf, the man most often called the father of the Internet, was interviewed by Paul Holdengraber in the Internet Librarian 2009 opening keynote.

Bourbaki Archives

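I learned of the Bourbaki Archive from Ars Mathematica, and the news left me rather wistful. Having “strayed into the dusty net” of worldly affairs, I long ago handed back to heaven and earth what I studied in college, but I was still delighted to hear it.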

Via Peter Woit I see that there is a project to put the archives of Bourbaki online. These are the internal manuscripts that the group generated while writing their famous series of books.

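Who is Bourbaki? Around 1950, a group of French mathematicians decided to rewrite the whole of mathematics on the foundation of set theory, in a purely deductive manner, and they published their work under the shared pen name N. Bourbaki. Their books start from the most abstract mathematical structures and only then gradually specialize. This ordering, from the general to the particular, organizes mathematics in a way that is both concise and elegant, and mathematicians admired it greatly. Every summer the group gathered at a summer resort, spending each morning discussing how to write the books and each afternoon holding a mathematics colloquium. In the morning sessions a text counted as final only once it was approved unanimously. The completed works run to several dozen volumes.

After the work was well received in academia, the group said they wanted to write mathematics textbooks for primary and secondary schools, and the French government of the day welcomed the idea warmly. Spurred by this move from Bourbaki (although more people believe the real spur was the Soviet Sputnik satellite), the American mathematical community, through the School Mathematics Study Group (SMSG) led by Edward G. Begle, also developed a set of mathematics textbooks, which became the so-called New Math. But Bourbaki's school textbook project did not succeed as expected, and the New Math, an awkward imitation, came to be seen by American educators as a disaster. In fact, there is a gap between the mathematics that mathematicians appreciate and the mathematics that industry needs, and an even wider gap between it and what schoolchildren can study, so the failure was no great surprise.

Although their attempt at school textbooks was not as successful as their academic work, their ideas had a profound influence on academia, and their way of writing books is a legend passed down through generations of mathematics students.

(The account above is adapted from “Set Theory and Mathematics Education” (集合論與數學教育) by Professor 王九逵, published in Science Monthly (科學月刊), vol. 31, no. 3.)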

Monday, October 26, 2009

Francisco Martin #RecSys09 Industry Keynote Summary

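ACM Recommender System 2009 (October 22-25), one of the major annual events in the recommender systems community, has just wrapped up. Francisco J Martin, the founder of Strands, was invited as an industry speaker to give the Industry Keynote, “Top 10 Lessons Learned Developing, Deploying, and Operating Real-World Recommender Systems”, which contains plenty worth thinking about. Neal Lathia (MobBlog) condensed Dr. Martin's talk, Twitter-style, into ten concise summaries: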

Lesson 1 – Make sure a recommender is really needed! Do you have lots of recommendable items? Many diverse customers?… also think Return-on-Investment… a more sophisticated recommender may not deliver a better ROI.

Lesson 2 – Make sure the recommendations make strategic sense. Is the best recommendation for the customer also the best for the business? What is the difference between a good and useful recommendation? Good recommendations vs. useful recs; Obvious recommendations may not be useful; risky recs may deliver better long-term value (Every system exists to serve business needs; never forget that.)

Lesson 3 – Choose the right partner! Select the right rec vendor vs hire some #recsys09 students. If you are a big company the best you can do is to organize a contest (Why not just say Netflix outright? LOL)

Lesson 4 – Forget about cold-start problems (!) …. just be creative. The internet has the data you need (somewhere…) (Remember the old saying: We are limited only by our imagination)

Lesson 5 – Get the right balance between data and algorithms. 70% of the success of a #recsys is on the data, the other 30% on the algorithm (We have discussed this many times already: Worry about the data before you worry about the algorithm)

Lesson 6 – Finding correlated items is easy but deciding what, how, and when to present to the user is hard… or don't just recommend for the sake of it. Remember user attention is a scarce and valuable resource. Use it wisely! … don't make a recommendation to a customer who is just about to pay for items at the checkout! User interface should get at least 50% of your attention.

Lesson 7 – Don't waste time computing nearest neighbours (use social connections)… just mine the social graph. Might miss useful connections??

Lesson 8 – Don't wait to scale (6, 7, 8, and 9 are clearly lessons from hands-on experience)

Lesson 9 – Choose the right feedback mechanism. Stars vs thumbs …. the YouTube problem. More research on implicit and other feedback mechanisms is needed. The perfect rating system is no rating system! … focus on the interface. Seems to me this is one of the gaps in current research… algorithms > data > interface

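Lesson 10 – Measure Everything! … business control and analytics is a big opportunity here. (It is not enough to measure whether predictions are accurate; every step of the business process needs an evaluation mechanism. This is the insight of someone with real experience founding and running a company.)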

Keynote Takeaway – Think about application context; Focus on interface as much as algorithms; Be creative with start-up data. … the UI needs to get the lion’s share of the effort (50%) compared to algorithms (5%) , knowledge (20%), analytics (25%)

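As for the final Takeaway, every reader may have their own view. Quantifying the weight of each factor in developing a system is genuinely hard, so in the end one can only be pressed into giving a set of numbers that expresses one's own “rule of thumb”. The importance of UI is beyond doubt, of course, but why should UI count ten times as much as algorithms? You, clever reader, surely have a theory of your own!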

Sunday, October 25, 2009

#RecSys09 topic: what's on recommender researchers' mind?

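ACM Recommender System 2009, one of the major annual events in the recommender systems community, is being held in New York City these few days. After the conference, many well-versed scholars and industry analysts will no doubt share what they saw and the topics that drew attention there. Readers who want to know what's up and what's hot right away can follow the #RecSys09 hashtag on Twitter.

If you want an even quicker look at this year's hot topics, take a look at the tag cloud made by Barry Smyth, who teaches at University College Dublin. It looks like Netflix is still the big buzzword!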

Why I Am Such an Infrequent Blogger?

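A few days ago, @clickstone, the main driving force behind Resys, complained that I am not diligent enough and keep fobbing readers off with DailyMurmur. That is most unfair! Lacking the talent to defend myself, I will borrow Helen DeWitt's fine piece to explain Why I Am Such an Infrequent Blogger. I wonder whether that will satisfy clickstone?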

“My assumption, always, is that everyone knows everything I know AND MORE. Rephrase. Everyone who is interested in the kinds of thing that interest me knows everything I know AND MORE. If they're not interested they don't know but don't want to. So there's no point in mentioning things that strike me as interesting, unless a) these are events in the last, say, 5 minutes (so those disposed to be interested might not be au fait) or b) I'm up for proselytizing (those not disposed to be interested might be with enough encouragement).”

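Addendum:
@clickstone's blog Beyond Search is excellent; readers interested in Internet industry news and in the theory and practice of recommender systems should not miss it. Beyond Search has recently been unreachable for “unknown” technical reasons, so you can read his posts on the Douban mirror.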

Tuesday, October 20, 2009

Would you give up?

This is a test of human nature. What would you do?

via FRKNCNGZ.BLOG

The Wheels of Life: From Cradle to ...

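Today I came across a picture online titled “The Wheels of Life” and found it very amusing. Only after searching did I learn that it has been circulating on the web for a long time. I am truly late to the party.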

http://www.vijayforvictory.com/photo/wheel-of-life/1193/

Wheel of life » Vijay For Victory via kwout

Later someone modified the picture into A Cyclist's Wheels of Life, which feels very apt now that everyone is crazy about cycling.

http://trailcentral.blogspot.com/2007/08/wheels-of-life.html

Beyond TrailCentral: Wheels of life... via kwout

Wednesday, October 7, 2009

I Want to Study Water Margin (水滸傳)

Sunday, October 4, 2009

What the heck?

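Many people have already asked whether there is some gossip explaining why the British specialize in odd research. The answer is simple: as long as lazy media need to fill space, such studies will never die out. Still, the United Daily News (聯合報) story from Friday (2009/10/2), 《男叫Lee 女名Kelly 上床進度快》 (“Men named Lee and women named Kelly get into bed fastest”), is utterly nonsensical. When exactly was this study done? 2007 or 2009?

If the news makes you want to change your name, I advise you not to bother. (Disclaimer: my own name is not among the top ten most attractive.)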

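Thoughts on user-based and item-based algorithms

Woken by an earthquake in the middle of the night, I got up and browsed the web aimlessly, and saw that xlvector had kept working tirelessly through the October 1 National Day holiday, comparing how user-based and item-based algorithms fare on output diversity. I was greatly impressed, so I record his post here:

(At this moment I am about to climb back into bed for some more sleep)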

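Some time ago I was chatting with wendong, and he mentioned that the results of user-based algorithms are less diverse than those of item-based algorithms. I think this raises several questions.

1) Diversity, as we know, means that the recommended items are not very similar to one another pairwise, so different similarity measures in fact yield different diversity measures.

2) Two kinds of similarity are in common use, one based on content and one based on collaborative filtering. In my experiments, under both of these similarity measures, the results of the user-based algorithm are more diverse than those of the item-based algorithm.

3) But I think there may still be some similarity measure whose corresponding diversity comes out better under the item-based method.

I wonder how everyone sees this question: in a real system, which of user-based and item-based produces more diverse results?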
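To make point 1 concrete, here is a minimal sketch of one common way to score diversity: the average pairwise dissimilarity (1 minus similarity) over a recommendation list. This is not xlvector's code or experiment. The function names, the choice of cosine similarity, and the toy item vectors are all assumptions made for illustration. Swapping in a different `similarity` function gives a different diversity measure, which is exactly the dependence the post describes.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_u * norm_v)


def intra_list_diversity(items, vectors, similarity=cosine):
    """Average of (1 - similarity) over all pairs in a recommendation list."""
    pairs = list(combinations(items, 2))
    if not pairs:
        return 0.0  # a list with fewer than two items has no pairs to compare
    total = sum(1.0 - similarity(vectors[a], vectors[b]) for a, b in pairs)
    return total / len(pairs)


# Hypothetical item vectors: each position is one user's rating of the item,
# so this corresponds to a collaborative-filtering style similarity.
vectors = {
    "A": [1, 1, 0, 0],
    "B": [1, 1, 0, 0],
    "C": [0, 0, 1, 1],
}

narrow_list = ["A", "B"]  # two items rated by the same users
broad_list = ["A", "C"]   # two items rated by disjoint sets of users

print(intra_list_diversity(narrow_list, vectors))
print(intra_list_diversity(broad_list, vectors))
```

With content-based vectors (for example, tag or keyword weights) in place of the rating vectors, the same two lists could rank differently, so a comparison of user-based and item-based output only means something once the similarity measure is fixed.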


via xlvector.cn

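Postscript:
Please see 鄭昀's post from today, 《基于Google Reader发展起来的个性化推荐系统之三大问题》 (“Three Major Problems of Personalized Recommender Systems Built on Google Reader”); now that is professional work.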

Posted via web from imrchen's posterous
