Thursday, December 3, 2009
[Poetry Love] The Accident of Becoming a Poet
Saturday, November 28, 2009
Wednesday, November 25, 2009
Friday, November 20, 2009
Tomorrow's OS?
- Tomorrow's Operating System: Chrome OS (石墨工房)
- First Glimpse at Google Chrome OS (Lifehacker)
- Google Chrome OS: 5 Ways It’s Completely Different (Mashable)
- What the World's Leading Blogs Think of Chrome OS (谷奧 @guao)
- Chrome OS Finally Shows Its True Face: Do We Have Our Answers Yet?
Wednesday, November 11, 2009
Sunday, November 8, 2009
Saturday, October 31, 2009
Friday, October 30, 2009
Vint Cerf in #IL2009 opening keynote
Bourbaki Archives
Via Peter Woit I see that there is a project to put the archives of Bourbaki online. These are the internal manuscripts that the group generated while writing their famous series of books.
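Who is Bourbaki? Around 1950, a group of French mathematicians decided to rewrite all of mathematics on the foundation of set theory, in a purely deductive style, publishing under the shared pen name N. Bourbaki. Their books begin with the most abstract mathematical structures and only then gradually specialize. This general-to-specific ordering organizes mathematics in a way that is both concise and elegant, which is why mathematicians admire it. Every summer the group gathered at a resort; each morning they discussed how to write the books, and each afternoon they held a mathematics colloquium. In the morning writing sessions a draft became final only by unanimous approval, and the finished work runs to dozens of volumes.
After the books won acclaim in academia, the group said it wanted to write mathematics textbooks for primary and secondary schools, and the French government of the day welcomed the idea. Spurred by Bourbaki's move (though more people credit the shock of the Soviet Sputnik satellite), the School Mathematics Study Group (SMSG), led by Edward G. Begle, developed its own set of mathematics textbooks in the United States: the so-called New Math. Bourbaki's school textbook project did not succeed as hoped, and the New Math, which had slavishly imitated it, came to be regarded by American educators as a disaster. In fact, the mathematics that mathematicians appreciate is some distance from what industry needs, and even further from what schoolchildren can study, so the failure was no surprise.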
Monday, October 26, 2009
Francisco Martin #RecSys09 Industry Keynote Summary
Lesson 1 – Make sure a recommender is really needed! Do you have lots of recommendable items? Many diverse customers?… also think Return-on-Investment… a more sophisticated recommender may not deliver a better ROI.
Lesson 2 – Make sure the recommendations make strategic sense. Is the best recommendation for the customer also the best for the business? What is the difference between a good and a useful recommendation? Good recommendations vs. useful recs; obvious recommendations may not be useful; risky recs may deliver better long-term value (Every system exists to serve a business need. Never forget that.)
Lesson 3 – Choose the right partner! Select the right rec vendor vs hire some #recsys09 students. If you are a big company the best you can do is to organize a contest (Why not just say Netflix outright? LOL)
Lesson 4 – Forget about cold-start problems (!) …. just be creative. The internet has the data you need (somewhere…) (Remember the old saying: We are limited only by our imagination)
Lesson 5 – Get the right balance between data and algorithms. 70% of the success of a #recsys is on the data, the other 30% on the algorithm (We have discussed this many times: worry about the data before you worry about the algorithm)
Lesson 6 – Finding correlated items is easy but deciding what, how, and when to present to the user is hard… or don't just recommend for the sake of it. Remember user attention is a scarce and valuable resource. Use it wisely! … don't make a recommendation to a customer who is just about to pay for items at the checkout! User interface should get at least 50% of your attention.
Lesson 7 – Don't waste time computing nearest neighbours (use social connections)… just mine the social graph. Might miss useful connections??
Lesson 8 – Don't wait to scale (6, 7, 8, and 9 are clearly lessons from hands-on practice)
Lesson 9 – Choose the right feedback mechanism. Stars vs thumbs …. the YouTube problem. More research on implicit and other feedback mechanisms is needed. The perfect rating system is no rating system! … focus on the interface. Seems to me this is one of the gaps in current research… algorithms > data > interface
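Lesson 10 – Measure Everything! … business control and analytics is a big opportunity here. (Measure not only prediction accuracy; every step of the business process needs its own evaluation mechanism. This is the insight of someone who has actually founded and run a company.)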
Keynote Takeaway – Think about application context; Focus on interface as much as algorithms; Be creative with start-up data. … the UI needs to get the lion’s share of the effort (50%) compared to algorithms (5%) , knowledge (20%), analytics (25%)
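Every reader may have their own view of this final takeaway. Quantifying how much each factor weighs in building a system is genuinely hard, so in the end one can only offer a set of numbers that expresses one's own "rule of thumb". The importance of the UI is beyond question, but why should the UI count for ten times the algorithm? Clever reader, you surely have a theory of your own!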
Sunday, October 25, 2009
#RecSys09 Topics: What's on Recommender Researchers' Minds?
If you want a quick sense of this year's hot topics, take a look at the tag cloud made by Barry Smyth, who teaches at University College Dublin. It looks like Netflix is still the hottest buzzword!
Why I Am Such an Infrequent Blogger?
“My assumption, always, is that everyone knows everything I know AND MORE. Rephrase. Everyone who is interested in the kinds of thing that interest me knows everything I know AND MORE. If they're not interested they don't know but don't want to. So there's no point in mentioning things that strike me as interesting, unless a) these are events in the last, say, 5 minutes (so those disposed to be interested might not be au fait) or b) I'm up for proselytizing (those not disposed to be interested might be with enough encouragement).”
@clickstone's blog Beyond Search is excellent; readers interested in Internet industry news or in the theory and practice of recommender systems should not miss it. Beyond Search has recently been unreachable for "unspecified" technical reasons, so for now you can read his posts on the Douban mirror.
Tuesday, October 20, 2009
The Wheels of Life: From Cradle to ...
Someone later modified this picture into A Cyclist's Wheels of Life, which feels very apt now that everyone is crazy about cycling.
Wednesday, October 7, 2009
Sunday, October 4, 2009
What the heck?
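If the news has you thinking about changing your name, I would advise you not to bother. (Disclaimer: my own name is not on the top-ten list of most charming names.)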

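Thoughts on User-based and Item-based Algorithms
Woken by an earthquake in the middle of the night, I got up and wandered around the web, and saw that xlvector had kept working through the National Day (October 1) holiday, comparing the output diversity of user-based and item-based algorithms. I was impressed, so I am recording his notes here:
1) Diversity means that the recommended results are pairwise not very similar to each other, so each similarity measure in fact yields a different diversity measure.
2) Two kinds of similarity are in common use, one content-based and one based on collaborative filtering. In my experiments, under both similarity measures the user-based results are more diverse than those of the item-based algorithm.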
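The definition in point 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not xlvector's actual experiment: the function names `cosine` and `intra_list_diversity`, the choice of cosine similarity, and the toy vectors are all hypothetical. Diversity is computed as the average of (1 - similarity) over every pair of items in one recommendation list, so swapping in content vectors or collaborative-filtering vectors gives the two diversity measures mentioned in point 2.

```python
from itertools import combinations
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b[key] for key, value in a.items() if key in b)
    norm_a = sqrt(sum(value * value for value in a.values()))
    norm_b = sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def intra_list_diversity(items, vectors, similarity=cosine):
    """Average pairwise dissimilarity (1 - similarity) of one recommendation list.

    `vectors` maps an item id to its feature vector. Pass content features
    (tags, keywords) for a content-based measure, or the ratings each item
    received from users for a collaborative-filtering measure.
    """
    pairs = list(combinations(items, 2))
    if not pairs:
        return 0.0  # a list with fewer than two items has no pairs to compare
    total = sum(1.0 - similarity(vectors[i], vectors[j]) for i, j in pairs)
    return total / len(pairs)


# Hypothetical toy data: items "a" and "c" share a tag, "b" does not.
toy_vectors = {"a": {"x": 1.0}, "b": {"y": 1.0}, "c": {"x": 1.0}}
diversity = intra_list_diversity(["a", "b", "c"], toy_vectors)
```

To compare two algorithms in this way, one would compute this score for each user's top-N list from each algorithm and compare the averages.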
Wednesday, September 30, 2009
Random Thoughts and Gripes about Noise

Tuesday, September 8, 2009
Where do people show their true feelings?
Actually, I find the tool used to make this chart, Graph Jam Build, even more interesting; I should play with it when I have time.
Sunday, September 6, 2009
Saturday, September 5, 2009
Wednesday, September 2, 2009
Monday, August 31, 2009
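Are SNSs a Teacher's Helper?
According to a survey report by FacultyFocus, about half of the 1,900 college faculty surveyed have never used Twitter, and these non-users said they do not think the tool would help their teaching. A small share of respondents who do not currently use Twitter said there is a 50-50 chance they will try it as a learning tool.
When The Chronicle reported on this question, it asked the educators among its readers how many of them use Twitter to communicate with students. Judging from the FacultyFocus survey, teachers and students who already do so are a minority, and Twitter's 140-character limit (or feature) leaves educators unsure how effective the communication can be.
What I am curious about is this: whatever the type of social network, when teachers and students are in the same SNS and connected to each other, how does that affect the give-and-take of teaching and answering questions? Many educators already use blogs and wikis to support their teaching. What can social network services do for them?
Different social network services have different strengths. Twitter excels at spreading and replicating messages, while Facebook starts from connecting personal networks. What difference do these different ways of connecting make?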
As a teaching tool, PowerPoint can be awful. Is Twitter any better? One promotes passivity; the other, connectedness and interactivity (unless you follow people like us, who are about as responsive as a dining room table). The Chronicle of Higher Education raises an interesting question: should professors be tweeting with their students? Or is it a poor substitute for face-to-face interaction? Of course, some say brevity is the soul of wit, and 140 characters is very, very brief.
Sunday, August 30, 2009
Saturday, August 29, 2009
The Evolution of Retweeting
Twitter has incorporated other user-generated linguistic tools, such as using a hash symbol in front of a word to make it easily searchable (like "#conference09"). Another common technique is typing @ in front of a username to reply directly (but publicly) to the user, which Twitter also formalized after users adopted it. These linguistic tools have even trickled into other social media environments, including YouTube, Flickr, Facebook, and blogs. ... Currently, there is no set format for retweeting, which loosely consists of reposting someone's tweet and giving due credit. ... But the retweeting format is much more inconsistent and complex than the targeted reply and hashtag conventions, according to Microsoft Research social media scientist Danah Boyd, who recently posted a paper on the behavior of retweeting. Variations include typing the attribution at the end and using "via," "by," or "retweet" instead of "RT." What's more, people often add their own comments before or after a retweet. This becomes a problem with Twitter's 140-character limit, explains Boyd. Typing "RT @username" takes up characters, and so does adding a comment. To deal with this, users will paraphrase or omit part of the original text, sometimes leading to incorrect quotes.
Users often employ retweets to provide context in conversation, says Susan Herring, a professor of information science and linguistics at Indiana University and editor in chief of the Language@Internet journal. "I can't imagine that [the new Twitter tool] will be very satisfactory to Twitter retweeters," says Herring. "A retweet plus a comment is a conversation. A retweet alone could be an endorsement, but it's a stretch to view an exchange of endorsements as a conversation." Herring does agree that it will increase retweeting and broaden the range of users who retweet.
"People will continue to repurpose Twitter to meet their needs," predicts Herring. "I can't imagine that those who are passionate retweeters will discontinue their practices."
Tuesday, August 25, 2009
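The phenomenon is easy to explain. Many products come in different packagings (think of the paperback, hardcover, and collector's editions of a book), and even a single edition may have two or more records in the database (again with books, think of the first and second printings of the same title). How do we know that these different items are really the same product? Greg Linden has also written about this, using YouTube as the example, in "YouTube cries out for item authority", which describes the trouble and challenges the item authority problem creates for service providers.
As the edition data in Douban's database improves, the IQ of 豆瓣猜 (Douban's "guess" recommender) will rise considerably, and it will no longer recommend different editions of the same work to you. Pages for books that are out of print and no longer sold (such as the 1986 edition of Pride and Prejudice) will show the price of a recent edition to help you buy (for example, the 2006 Pride and Prejudice is on sale). For books with a dozen or more editions, the editions page will also show each edition's number of collectors and rating, to help everyone compare editions.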
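Once editions are linked to a common work, the "no more duplicate editions" behavior described above is simple to express. The following Python sketch is an assumption about how such a filter could look, not Douban's implementation: `dedupe_by_work`, the `edition_to_work` mapping, and all the IDs are hypothetical. The hard part of item authority is building that mapping in the first place; this only shows how it is used.

```python
def dedupe_by_work(ranked_editions, edition_to_work):
    """Keep only the highest-ranked edition of each work, preserving rank order.

    `ranked_editions` is a recommendation list, best first.
    `edition_to_work` maps an edition id to the id of the work it belongs to.
    Editions missing from the mapping are treated as works of their own.
    """
    seen_works = set()
    result = []
    for edition in ranked_editions:
        work = edition_to_work.get(edition, edition)
        if work in seen_works:
            continue  # another edition of this work is already recommended
        seen_works.add(work)
        result.append(edition)
    return result


# Hypothetical catalog: two editions of the same novel, plus one other book.
edition_to_work = {
    "pride-1986": "pride-and-prejudice",
    "pride-2006": "pride-and-prejudice",
    "emma-2001": "emma",
}
recommendations = dedupe_by_work(
    ["pride-2006", "emma-2001", "pride-1986", "unlinked-book"],
    edition_to_work,
)
```

The same mapping could also drive the other features mentioned, such as pointing an out-of-print edition's page at an edition that is still on sale.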
Friday, August 21, 2009
Thursday, August 20, 2009
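The PDF link I found is broken, so I have not been able to read the full text and can only imagine the outline from the abstract. Still, it is an interesting idea: recommender systems not only help businesses capture customer attention in the attention economy, they can also help programmers do their assigned work better. "We're limited only by our imagination" really does seem to be true.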
DebugAdvisor: A Recommender System for Debugging
In large software development projects, when a programmer is assigned a bug to fix, she typically spends a lot of time searching (in an ad-hoc manner) for instances from the past where similar bugs have been debugged, analyzed and resolved. Systematic search tools that allow the programmer to express the context of the current bug, and search through diverse data repositories associated with large projects can greatly improve the productivity of debugging. This paper presents the design, implementation and experience from such a search tool called DebugAdvisor.
The context of a bug includes all the information a programmer has about the bug, including natural language text, textual rendering of core dumps, debugger output etc.
Our key insight is to allow the programmer to collate this entire context as a query to search for related information. Thus, DebugAdvisor allows the programmer to search using a fat query, which could be kilobytes of structured and unstructured data describing the contextual information for the current bug. Information retrieval in the presence of fat queries and variegated data repositories, all of which contain a mix of structured and unstructured data is a challenging problem. We present novel ideas to solve this problem.
We have deployed DebugAdvisor to over 100 users inside Microsoft. In addition to standard metrics such as precision and recall, we present extensive qualitative and quantitative feedback from our users.
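To make the idea of a "fat query" a little more concrete, here is a minimal Python sketch. It is not DebugAdvisor's method, which the abstract does not describe; it is a naive bag-of-words baseline in which the whole bug context (description, stack trace, debugger output) is pasted in as one query and past bug reports are ranked by cosine similarity. The names `tokenize`, `cosine`, and `rank_past_bugs`, and the sample bug IDs and texts, are all hypothetical.

```python
import re
from collections import Counter
from math import sqrt


def tokenize(text):
    """Split free text, stack traces, and identifiers into lowercase tokens."""
    return re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counter objects)."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm = sqrt(sum(c * c for c in a.values())) * sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def rank_past_bugs(fat_query, past_bugs, top_k=3):
    """Return ids of the past bugs most similar to the query, best first.

    `fat_query` may be kilobytes of mixed text; `past_bugs` maps a bug id to
    the text stored for that bug. Bugs with no overlapping tokens are dropped.
    """
    query_vector = Counter(tokenize(fat_query))
    scored = [
        (cosine(query_vector, Counter(tokenize(text))), bug_id)
        for bug_id, text in past_bugs.items()
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [bug_id for score, bug_id in scored[:top_k] if score > 0]


# Hypothetical repository of previously resolved bugs.
past_bugs = {
    "bug-1": "NullPointerException in parser.parse_header",
    "bug-2": "timeout while connecting to database",
    "bug-3": "crash in parser parse_header on empty input",
}
matches = rank_past_bugs(
    "stack trace: crash at parser.parse_header with empty input", past_bugs
)
```

A real system would also have to handle the structured parts of the query and the variety of repositories that the abstract calls out as the challenging part.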
Monday, July 27, 2009
Er....Netflix Prize goes to...?
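I had assumed the final result was in the bag for BPC, with no suspense left. But just before the deadline another team, Ensemble, announced that it had also crossed the threshold (Breaking - Netflix Prize, we’ve got a winner, and it’s Greek! (updated)), and at one point even declared that it had beaten the previous leader and was the final winner (the figure below shows the scores currently posted on the Leaderboard). However, a commenter calling himself An Insider left a comment on that post explaining that the team may have misread the rules: BPC scored better on the Test Set and is the final winner. (Gavin Potter, the self-described "Just a guy in a garage", quickly wrote a post on his own blog explaining why BPC is the winner.)
So far Netflix has not formally announced the final winner. It has only announced that submissions are closed and that two teams have met the threshold.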
As of July 26, 2009 18:42:37 UTC, we have stopped gathering submissions for the Netflix Prize contest. There are submissions from two teams that meet the minimum requirements for the Grand Prize. We are contacting the lead team and we will report, as soon as possible, when and if we have a verified winner for the Grand Prize.
Whoever wins the contest, Daniel Lemire puts it well: fully encouraging creativity and diversity is the right direction for scholarship:
Both teams broke the 10% barrier by using a diverse coalition, by merging several different ideas. As Peter Turney recently stated: I am now more convinced than ever that science needs diverse explanations, techniques and opinions. We should actively reward creativity. Science is not merely about truth-seeking.
There are no whole-truths, but we can get by reasonably well with a large number of half-truths.
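For readers who want to see what the "10% barrier" measures: the contest scored submissions by root mean squared error (RMSE) against held-out ratings, and the Grand Prize threshold was a 10% reduction in RMSE relative to Netflix's own Cinematch system. The Python sketch below shows the arithmetic only; the function names and the numbers are illustrative assumptions, not actual contest scores.

```python
from math import sqrt


def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty rating lists of equal length")
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return sqrt(squared_error / len(actual))


def improvement_pct(baseline_rmse, candidate_rmse):
    """Percentage reduction in RMSE relative to a baseline system."""
    return 100.0 * (baseline_rmse - candidate_rmse) / baseline_rmse


# Illustrative numbers only (not real leaderboard values).
baseline = 1.0
candidate = 0.9
gain = improvement_pct(baseline, candidate)
```

Because the percentage depends on which held-out set the RMSE is computed on, two teams can both clear the barrier and still rank differently on the set that decides the prize, which is the point the commenter made about the Test Set.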
Saturday, July 25, 2009
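沐心泉 Trip, Microblog Edition - Day 1
I am trying to write down two days of impressions in the currently fashionable microblogging style. Put it this way: if yesterday I had carried a phone with Internet access and tweeted along the way (Twitter, though Plurk, Jaiku, or Facebook would do just as well, whichever you like; this is only a thought experiment), readers subscribed to my stream would have seen roughly what follows. I have not checked that every entry is under 140 characters, but I doubt any is far off.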
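09:10 [Departure]
The big and little beauties of the house were too excited last night and could not get out of bed this morning. I, who had a small drink last night, was the first one up. I called everyone to get up and wash, loaded the luggage into the car, bought a hot City Cafe latte at 7-Eleven, and then drove onto the Sun Yat-sen Freeway, heading south. We're off!!
10:10 [GPS]
My wife borrowed a GPS navigation unit from a colleague, but it was a huge disappointment. Over the next two days it got a position fix only three times, and even then the cold machine voice recommended routes along roads that do not exist. By unanimous vote of everyone in the car, the device was declared a joke.
11:30 [Nostalgic Lunch]
11:40 [Nostalgic Songs with Lunch]
The music playing in 三嘴滷 was all hits from the campus folk song era of 40 years ago: 葉佳修, 銀霞, 潘安邦, 蔡琴... My wife and I excitedly told our two daughters how popular these songs were ("This one was a big hit when Daddy was in junior high", "I know the lyrics to this one by heart"), and got cold stares in return...
13:00 [County Road 129]
With the GPS useless, I had to dig out a map and work out the route from Taichung's South District onto County Road 129 toward 新社鄉 (Xinshe Township). Watching the ambiguous road signs while checking them against the blurry print on the map, stopping and starting all the way, we finally crossed 中興嶺 and found the first destination of the day: the 白冷圳 canal.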
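15:34 [The Silent 白冷圳]
15:50 [The Grass Mud Horse in the Garden]
15:55 [Afternoon Tea]
The princess's garden is of course European in style, as fairy-tale imagination requires. Afternoon tea in the garden is a classic English afternoon tea run on a practical "guaranteed to fill you up" policy. The plating is not as refined as at a five-star hotel, but the sincerity is just as satisfying (grin)
16:05 [A Postcard Mailed to Myself]
16:45 [Self-Portrait Mania]
17:20 [Up Up Up]
17:30 [Dances with Mosquitoes]
18:00 [Seven at One Blow? That's Nothing]
With a stomach already stuffed by afternoon tea I could not fit in much more, so I ate a perfunctory dinner and then walked around the dining pavilion, partly to ease the bloated feeling and partly because moving legs are harder for mosquitoes to "bite". But there were simply too many mosquitoes, so I slapped at my legs as I walked, and every slap scored. By the time I left the restaurant I had killed 30 mosquitoes...
20:00 [Rummikub Night]
22:15 [Super Starlight Avenue]
23:55 [A Thatched Cottage amid Towering Trees]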
24:00 [Good Night]
Friday, July 10, 2009
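[Poetry Love] Father's Grassland, Mother's Song - Searching for Mongolia in Song
A commenter on YouTube compared the two singers' interpretations this way: "腾格尔's hometown is in Otog Banner, Ih Ju League, where desert encroachment has turned most of the land to sand and the grassland is almost a thing of the past. 腾格尔's singing is rugged and sorrowful, full of attachment to the lost grassland and helplessness at its desertification. 布仁's hometown is in Hulunbuir, where water and grass are plentiful, the forests are dense, and herders lead a leisurely pastoral life; 布仁's voice conveys boundless longing and reverie for the natural beauty of his homeland."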
Thursday, July 9, 2009
[Poetry Love] Lose a bloody man?
The day he moved out was terrible
That evening she went through hell
His absence wasn't a problem
But the corkscrew had gone as well
Wendy Cope's poems are full of the flavor of everyday life. Another one, Bloody Men, about relations between men and women, is also great fun:
Bloody men are like bloody buses---
You wait for about a year
And as soon as one approaches your stop
Two or three others appear.
You look at them flashing their indicators,
Offering you a ride.
You're trying to read the destinations,
You haven't much time to decide.
If you make a mistake, there is no turning back.
Jump off, and you'll stand there and gaze
While the cars and the taxis and lorries go by
And the minutes, the hours, the days
Monday, June 29, 2009
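The error was of course quickly noticed and corrected. Danny Sullivan explained that it evidently happened because Google, in using Wikipedia data, pulled the record for a different Michael Jackson. Matthew Hurst sees it as a textbook case of a slip in execution detail; he carefully analyzes the details of each step in a text mining task and adds, in earnest, that "Attention to detail will always be a killer feature!"
Finally, I cannot resist adding a slightly cynical note: as a Microsoft employee, how did Matthew Hurst feel while analyzing this Google slip!?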
Daily Murmur 2009/06/28