Thursday, May 3, 2007

Birth of Data Mining

Data mining is the evolution of a field with a long history. The term "data mining" emerged in the late 1980s, and data mining research has flourished since the 1990s. Many believe that the birth of data mining (or knowledge discovery) traces back to the 1989 IJCAI Workshop on Knowledge Discovery in Databases, which took place in Detroit, Michigan, USA. The workshop report was published in AI Magazine, and its BibTeX entry can be found at the ACM Digital Library. The content of the document can be found at KDnuggets. (The proceedings of the workshop may also be of interest.)

The summary of the report is as follows:

The workshop confirmed that knowledge discovery in databases is an idea whose time has come. Some of the important research issues addressed in the workshop were:
  • Domain knowledge. It should be used to reduce search space, but used carefully so as not to prevent un-anticipated discoveries. While a specialized learning algorithm will outperform a general method, a desirable compromise is to develop a framework for augmenting the general method with the specific domain knowledge.
  • Dealing with Uncertainty. Databases typically have missing, incomplete or incorrect data items. Thus any discovery algorithm must deal with noise. Rules discovered in noisy data will necessarily be approximate.
  • Efficiency. Exponential and even high-order polynomial algorithms will not scale for dealing with large volumes of data. Efficient linear or sublinear (using sampling) algorithms are needed.
  • Incremental Approaches. Incremental algorithms are desirable for dealing with changing data. An incremental discovery system that can re-use its discoveries may be able to boot-strap itself.
  • Interactive Systems. Perhaps the best practical chance for discovery comes from systems where a "knowledge analyst" uses a set of intelligent, visual and perceptual tools for data analysis. Such tools would go far beyond the existing statistical tools and significantly enhance the human capabilities for data analysis. What tool features are necessary to support effective interaction? Algorithms need to be re-examined from this point of view (e.g. a neural network may need to generate explanations from its weights).
The incremental, interactive discovery methods may transform the static databases of today into evolving information systems of tomorrow. Caution is required for discovery on demographic databases, to avoid findings that are illegal or unethical. Some of the research issues that were little addressed in this workshop, but are likely to become more important in the future are:
  • Discovery Tools. Deductive and object-oriented database systems can provide some of the needed support for induction on large volumes of data. Parallel hardware may be effectively used. What additional operations should be provided by the tools to support discovery?
  • Complex Data. Dealing with more complex (not just relational) data, including text, geographic information, CAD/CAM, and visual images.
  • Better Presentation. The discovered knowledge can be represented not only as rules, but as text, graphics, animation, audio patterns, etc. Research on visualization and perceptual presentation is very relevant here.
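
To make the Efficiency and Incremental Approaches points in the summary a little more concrete, here is a minimal sketch of a single-pass, incrementally updated statistic. It is not taken from the workshop report; the class name `RunningStats` and the sample values are made up for illustration, and the update rule used is Welford's method for a running mean and variance. Each new record is folded in with constant work, so the total cost is linear in the number of records and nothing has to be recomputed when the data changes.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Count, mean, and variance maintained incrementally (Welford's method).

    Hypothetical helper for illustration only; not part of the 1989 report.
    """
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the current mean

    def update(self, value: float) -> None:
        # Fold one new record into the summary in O(1) time and memory.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance; undefined for fewer than two records.
        if self.count < 2:
            return float("nan")
        return self.m2 / (self.count - 1)


# Illustrative data: one pass over the records, no second scan needed.
stats = RunningStats()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(value)

print(stats.count, stats.mean, stats.variance)
```

When a new record arrives later, calling `stats.update(new_value)` refreshes the summary without revisiting the earlier data, which is the re-use of earlier results that the Incremental Approaches item asks for.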

IJCAI stands for International Joint Conferences on Artificial Intelligence.
