Tuesday, March 30, 2010

Monday, March 29, 2010

C'est La Vie

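Flowing Data has a fun feature called Data Underload. It takes the number of Google search results for a [sentence + specific keyword] query and turns the counts into a chart, and the results are often hilarious. Some of the numbers match what experience and common sense would suggest, and some do not. Sometimes the charted counts also differ from what I get when I run the query myself, and I can't tell why.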

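Today's (2010/03/29) topic: Life is like + ???. Thanks to the movie Forrest Gump, "life is like a box of chocolates" comes second only to "life is like that"! That top answer is wonderful; it has a world-weary air of having seen through it all, and one can't help but believe it. But when I ran the query "life is like a book" myself, the count far exceeded the figure in this chart?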



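After looking at the chart above, I got the itch and made a Chinese version, 【人生就像 + ???】 ("life is like + ???"). The result is not much of a surprise: everyone thinks life is like a song!?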


Sunday, March 28, 2010

15 Petabytes a day makes a smarter planet?

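Earlier this month I compiled the figures for How Big Is Digital Universe. The jaw-dropping numbers make one thing plain: humanity is as good at producing information as it is at producing garbage (er, information = garbage?). For the individual, learning to sift useful information out of the flood and to avoid wasted effort is a required course these days. For society, using technology to fully develop the value of data is a daunting challenge for researchers. IBM's Smarter Planet initiative evidently takes a positive, optimistic view of the power of technology: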


Saturday, March 27, 2010

Daniel Lemire's advice on How to Write Good Papers

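Daniel Lemire is a scholar I greatly admire. He is well known in the recommender systems field and is one of the pioneers of blogging in academia. His blog is not limited to recommender systems; it also covers algorithms and foundational questions in computer science theory. From time to time he draws on his own experience to offer advice on how to do research and write papers, and the posts are consistently excellent. Yesterday Daniel Lemire posted on SlideShare the slides from a presentation (a talk?) he gave to graduate students on how to write papers. After reading the slides I had only one thought: I have really been slacking off.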


Friday, March 26, 2010

[PDF] How to read a research paper

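Harvard professor Michael Mitzenmacher (author of the blog My Biased Coin) offers an excellent set of guidelines and suggestions on how to read a research paper (download the file). He is also willing to share the source (he wrote it in LaTeX) so that others can build on it and improve it. On his blog, Professor Mitzenmacher specifically recommends the final section, What to read, of Jason Eisner's How to Read a Technical Paper: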



What to read

  • creative web search
    • experiment with several searches
    • put yourself in an author's shoes; what phrases might they have used?
    • become a power searcher! (read the help pages for your search engine)

  • find related work
    • backward references: follow the bibliography to earlier papers
    • forward references: see who else has cited the work (via an interface such as Google Scholar)

  • has someone else already listed the right papers for you?
    • survey papers in journals (also called "review articles")
    • course syllabi
    • reading group webpages
    • chapters in textbooks
    • online tutorials
    • literature review chapters from dissertations
    • direct recommendations from friends or professors (perhaps at other institutions)

  • breadth-first exploration
    • read a lot of abstracts (and skim the papers as needed) before deciding which papers are best to read
    • it's okay to read multiple related papers at once, flipping back and forth so that they clarify one another
    • to get a feel for the research landscape in an area, flip through the proceedings of a relevant recent workshop, conference, or special-theme journal issue

  • when the going gets tough, switch to background reading
    • textbooks or tutorials
    • review articles
    • introductions and lit review chapters from dissertations
    • early papers that are heavily cited
    • sometimes Wikipedia

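Tuesday, March 23, 2010

[Video] Nature by Numbers

I think everyone would agree that mathematics is the mother of science. In fact mathematics is everywhere: in the nature around us and in the everyday life within our reach. Cristóbal Vila, from overseas, made a short film that beautifully presents the numbers and geometric phenomena visible in nature:


(For background on the film, see this website)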

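Wednesday, March 17, 2010

Backstage

Early this morning a canopy went up in front of the Earth God (Tudigong) shrine by the entrance to our community's parking garage. It cut the whole alley in half and blocked the garage exit so that only one side was left open. From the look of the canopy, a troupe had been hired to perform glove puppetry (budaixi). From first light the speakers behind the stage blared pop songs with a heavy, world-worn flavor (thick with nasal tones and vibrato, the kind whose lyrics are all regret or resentment), loud enough to make anyone's temper rise. Driving out of the garage to take the kids to school, I fumed inwardly: isn't the Earth God supposed to keep the neighborhood peaceful? Who staged this racket that leaves people no peace? Then it occurred to me that too much resentment might offend a certain someone, so I tamped down my anger, gritted my teeth, and put up with it.

After an hour or so of pop songs, the puppet show began. The exaggerated, distinctive narration and the gongs and drums echoed up and down the alley. There was no way to concentrate on work, so I went out to the mouth of the alley to take a look. In front of the Earth God shrine, from the side of the stage I could see backstage, and I was startled to find that the one holding the male and female puppets and acting out tales of romance was a child not yet old enough for elementary school. The balding middle-aged man sitting to one side was merely an assistant handing over puppets.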


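Monday, March 8, 2010

The inaugural issue of the Resys China e-magazine is out

The labor of love of Wendong (文棟) and the members of the Resys group, the inaugural issue of the Resys China e-magazine, is out. To borrow a line from Wendong's launch introduction: online reading is drifting more and more toward shallow reading, and we hope this hand-edited content will give everyone some food for thought and something to take away. I believe it will.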

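Sunday, March 7, 2010

A 20-Kilometer Long March

Yesterday I found on the map a cycling route from home to Taoyuan Airport, so in the afternoon I wheeled out a borrowed folding bike, charged toward the coast, and dashed back again. The cool breeze made it a thoroughly exhilarating ride.

(According to Google Map's route planner, the full trip is about 21.2 kilometers, hence the "20-kilometer long march")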



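Wednesday, March 3, 2010

How Big Is Digital Universe

Early papers in the data mining field often open with the information overload caused by advances in information technology, for example "because of advances in information technology, the data that humans store and collect is growing by multiples" (sample sentence: with the rapid advance of information technology, blah blah blah...). Read enough of these openings and they inevitably become tiresome, but data mining really is one of the means of tackling information overload, so one still has to read on patiently.

Still, just how much chaos information technology has brought to the world is a big question. Everyone knows information overload is a problem, but exactly how far over the load is, and how much information circulates in human society (whether in physical or digital form), is very hard to state clearly. Information is not physical matter that is neither created nor destroyed. It keeps increasing along with human mental and social activity, so the amount of information is a steadily growing number, not a fixed constant.

But writers, and consultants who make a living selling information, still have to come up with a few quantified figures to show that what they say is well founded and trustworthy.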


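In the MapReduce paper that Google published in 2008, Google says that in September 2007 it was processing 20,000 terabytes of data per day (see the figure below), and that the number keeps growing every day. Network equipment maker Cisco estimates that from 2008 to 2013 global IP network traffic will increase fivefold, with global traffic in 2013 reaching 667 exabytes (the original report says 2/3 zettabyte; 1 zettabyte = 10^21 bytes).


The theme of this year's February 25 issue of The Economist is the Data Deluge. The special report in that issue says that, according to a study, humanity produced (in the report's words: created, captured and replicated) 150 exabytes (1 exabyte = 1 billion gigabytes) of digital data in 2005, and that this year the figure will grow to 1,200 exabytes. After searching the web for related figures, I think the study in question is the Digital Universe report that IDC produced with EMC's sponsorship. First published in 2007, the report says humanity's capacity to produce digital data keeps growing and can increase tenfold in five years. The 2008 edition revised the estimates and projects that by 2012 humanity will produce as much as 1,800 exabytes of digital data.

  (Sources: Digital Universe, The Economist)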
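The figures quoted here mix terabytes, exabytes, and zettabytes, so a quick conversion helps put them on one scale. What follows is a minimal Python sketch, assuming decimal (SI) prefixes as in the post's own footnotes (1 zettabyte = 10^21 bytes, 1 exabyte = 1 billion gigabytes). The `UNITS` table and the `convert` helper are names made up for illustration; they do not come from any of the cited reports.

```python
# Decimal (SI) byte units, an assumption that matches the post's
# footnotes: 1 zettabyte = 10^21 bytes, 1 exabyte = 1 billion gigabytes.
UNITS = {
    "GB": 10**9,
    "TB": 10**12,
    "PB": 10**15,
    "EB": 10**18,
    "ZB": 10**21,
}


def convert(value, from_unit, to_unit):
    """Convert a quantity between the decimal byte units in UNITS."""
    return value * UNITS[from_unit] / UNITS[to_unit]


# Google's daily processing volume in September 2007, in petabytes.
google_daily_pb = convert(20_000, "TB", "PB")

# Cisco's "2/3 zettabyte" for 2013, expressed in exabytes.
cisco_2013_eb = convert(2 / 3, "ZB", "EB")

# The Economist's footnote: one exabyte expressed in gigabytes.
one_eb_in_gb = convert(1, "EB", "GB")

# Growth factor from 150 EB (2005) to 1,200 EB (2010).
growth_2005_to_2010 = 1200 / 150

print(f"Google, per day: {google_daily_pb:g} PB")
print(f"Cisco, 2013:     about {round(cisco_2013_eb)} EB")
print(f"1 EB in GB:      {one_eb_in_gb:g}")
print(f"2005 to 2010:    x{growth_2005_to_2010:g}")
```

On this scale, Google's 20,000 terabytes a day is 20 petabytes, and the jump from 150 to 1,200 exabytes is an eightfold increase.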

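These figures are only estimates, but they are good enough as material for bragging or as references for writing. For an individual, though, the amount of information one can encounter and handle in a lifetime is extremely limited. How to touch less junk information and waste less effort is the bigger question.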

Tuesday, March 2, 2010

Is "suffering" an indispensable part of research?

After reading the post on "mentoring vs. coaching" at Research as a second language, I can't help asking: Is "suffering" an indispensable part of research?

Mentoring vs. Coaching
The Centre for Development of Human Resources and Quality Management in Denmark is holding a conference to present the results of a PhD coaching project. The project, which involved PhD students from three universities, appears to have been a success.

Specifically, they discovered that:
• The participants got a lot out of the coaching.
• The coaching did not get in the way of traditional academic supervision.
• Individual coaching works better than workshops.
My experience confirms these conclusions. But the theme of the conference appears to be captured in the question, "Do you necessarily have to go through a lot of suffering to get a PhD?" That question, and the fact that the coaching was found "not to disturb" academic supervision, got me thinking about what I do. In fact, it got me reconsidering.

First, I do believe that "suffering" is an important part of research (in Danish, as Kierkegaard pointed out, suffering rhymes with science). Second, I've long noticed that "coaching" is often an inappropriate metaphor because outside of actual sports the "coach" is often not a master of the craft she coaches; rather she has a generalized ability to motivate others and help them get organized. (This, by the way, does not always mean she has an ability to get herself organized or get anything done herself.) Like me, she may not know very much about the area of scholarship that the PhD student is working in.

.......


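If My Heart Were a Lotus

~ Lin Huiyin · Ma Yan's collected essays · Lotus Lamp ~ In her essay "A Kind of Nobility, with Poems as Proof" (《高貴一種,有詩為證》), Ma Yan writes that "more than ten years ago, before I knew anything of Ms. Lin's gossip or accomplishments, I read 'Lotus Lamp' (《蓮燈》) quoted by someone else in a periodical" and liked it very much. Set beside Bian Zhilin and Xu Zhimo, it is not merely their equal; it is a cut above. The handling of rhyme and tonal pattern in the earlier part is clearly superior to Dai...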