Thursday, May 3, 2007

Birth of Data Mining

Data mining is the evolution of a field with a long history. The term "data mining" emerged in the late 1980s, and research on data mining has flourished since the 1990s. Many believe that the birth of data mining (or knowledge discovery) traces back to the 1989 IJCAI workshop on Knowledge Discovery in Databases, which took place in Detroit, Michigan, USA. The workshop report was published in AI Magazine, and its BibTeX entry can be found at the ACM Digital Library. The content of the document can be found at KDnuggets. (The proceedings of the workshop may also be of interest.)

The summary of the report is as follows:

The workshop confirmed that knowledge discovery in databases is an idea whose time has come. Some of the important research issues addressed in the workshop were:
  • Domain knowledge. It should be used to reduce search space, but used carefully so as not to prevent un-anticipated discoveries. While a specialized learning algorithm will outperform a general method, a desirable compromise is to develop a framework for augmenting the general method with the specific domain knowledge.
  • Dealing with Uncertainty. Databases typically have missing, incomplete or incorrect data items. Thus any discovery algorithm must deal with noise. Rules discovered in noisy data will necessarily be approximate.
  • Efficiency. Exponential and even high-order polynomial algorithms will not scale for dealing with large volumes of data. Efficient linear or sublinear (using sampling) algorithms are needed.
  • Incremental Approaches. Incremental algorithms are desirable for dealing with changing data. An incremental discovery system that can re-use its discoveries may be able to boot-strap itself.
  • Interactive Systems. Perhaps the best practical chance for discovery comes from systems where a "knowledge analyst" uses a set of intelligent, visual and perceptual tools for data analysis. Such tools would go far beyond the existing statistical tools and significantly enhance the human capabilities for data analysis. What tool features are necessary to support effective interaction? Algorithms need to be re-examined from this point of view (e.g. a neural network may need to generate explanations from its weights).
The incremental, interactive discovery methods may transform the static databases of today into evolving information systems of tomorrow. Caution is required for discovery on demographic databases, to avoid findings that are illegal or unethical.

Some of the research issues that were little addressed in this workshop, but are likely to become more important in the future, are:
  • Discovery Tools. Deductive and object-oriented database systems can provide some of the needed support for induction on large volumes of data. Parallel hardware may be effectively used. What additional operations should be provided by the tools to support discovery?
  • Complex Data. Dealing with more complex (not just relational) data, including text, geographic information, CAD/CAM, and visual images.
  • Better Presentation. The discovered knowledge can be represented not only as rules, but as text, graphics, animation, audio patterns, etc. Research on visualization and perceptual presentation is very relevant here.
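
As a side note on the "Efficiency" point above, here is a minimal sketch of what "sublinear (using sampling)" could look like in practice. It is not taken from the report: the function name estimate_support, the record layout, and the sample size are all assumptions made for illustration. The idea is to estimate how often a condition holds by checking only a random sample of the records instead of scanning every one.

```python
import random


def estimate_support(rows, predicate, sample_size, seed=0):
    """Estimate the fraction of rows that satisfy `predicate`.

    Only `sample_size` randomly chosen rows are inspected, so the cost
    depends on the sample size rather than on the size of the table.
    (Hypothetical helper, for illustration only.)
    """
    if not rows:
        raise ValueError("rows must not be empty")
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    if sample_size >= len(rows):
        sample = rows  # sample would cover everything: just scan it all
    else:
        sample = rng.sample(rows, sample_size)  # sampling without replacement
    hits = sum(1 for row in sample if predicate(row))
    return hits / len(sample)


# Made-up data: 100,000 records, 30% of which have flag == 1.
rows = [{"flag": 1 if i % 10 < 3 else 0} for i in range(100_000)]

exact = sum(r["flag"] for r in rows) / len(rows)  # full scan
approx = estimate_support(rows, lambda r: r["flag"] == 1, sample_size=2_000)

print("exact:", exact)
print("estimate from 2,000 sampled rows:", approx)
```

The estimate is approximate by construction, which fits the report's remark that rules discovered under such constraints "will necessarily be approximate"; a larger sample narrows the error at the price of more work.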

IJCAI stands for International Joint Conferences on Artificial Intelligence.


