Data Warehousing and Data Mining
The process of discovering meaningful
patterns and trends, often previously
unknown, by sifting through large amounts of
data using pattern recognition, statistical,
and mathematical techniques.
A group of techniques that find
relationships that have not previously been
discovered.
What Is Data Mining?
Data mining (knowledge discovery in databases):
Extraction of interesting (non-trivial, implicit,
previously unknown and potentially useful)
information or patterns from data in large databases
Alternative names and their “inside stories”:
Knowledge discovery (mining) in databases (KDD),
knowledge extraction, data/pattern analysis, data
archeology, data dredging, information harvesting,
business intelligence, etc.
What is not data mining?
(Deductive) query processing.
Data Mining: Confluence of Multiple Disciplines
Data Mining: On What Kinds of Data?
Database-oriented data sets and applications
Relational database, data warehouse, transactional database
Advanced data sets and advanced applications
Data streams and sensor data
Time-series data, temporal data, sequence data (incl. bio-sequences)
Structured data, graphs, social networks and multi-linked data
Heterogeneous databases and legacy databases
Data Mining Applications
Data mining is a young discipline with wide and
diverse applications
There is still a nontrivial gap between general
principles of data mining and domain-specific,
effective data mining tools for particular
applications
Some application domains
Biomedical and DNA data analysis
Financial data analysis
Biomedical Data Mining and DNA
DNA sequences: 4 basic building blocks (nucleotides): adenine (A),
cytosine (C), guanine (G), and thymine (T).
Gene: a sequence of hundreds of individual nucleotides arranged in a
particular order
Humans have around 20,000 protein-coding genes (earlier estimates ran as high as 100,000)
Tremendous number of ways that the nucleotides can be ordered and
sequenced to form distinct genes
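As a rough illustration of how quickly the number of possible orderings grows: with 4 nucleotides, there are 4^n distinct sequences of length n. The sketch below only counts sequences; the function name `count_sequences` is made up for this example, and the count says nothing about which sequences actually occur as genes.

```python
def count_sequences(length: int, alphabet_size: int = 4) -> int:
    """Number of distinct sequences of the given length over the alphabet.

    alphabet_size defaults to 4 for the nucleotides A, C, G and T.
    """
    if length < 0:
        raise ValueError("length must be non-negative")
    return alphabet_size ** length


if __name__ == "__main__":
    # Print the count for a few illustrative lengths.
    for n in (1, 2, 10, 100):
        print(n, count_sequences(n))
```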
Semantic integration of heterogeneous, distributed genome
databases
Current: highly distributed, uncontrolled generation and use of a
wide variety of DNA data
Data cleaning and data integration methods developed in data
mining will help
DNA Analysis: Examples
Similarity search and comparison among DNA sequences
Compare the frequently occurring patterns of each class (e.g.,
diseased and healthy)
Identify gene sequence patterns that play roles in various diseases
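One simple way to make similarity search and class comparison concrete is to break each sequence into overlapping substrings of length k (k-mers). The sketch below is an illustration under that assumption, not the method prescribed by these notes: it compares two sequences by the Jaccard similarity of their k-mer sets, and uses k-mer counts to find patterns that occur more often in one class than in the other. The sequences and the names `kmer_counts` and `jaccard_kmers` are made up for the example.

```python
from collections import Counter


def kmer_counts(sequences, k):
    """Count every overlapping substring of length k across the sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def jaccard_kmers(a, b, k):
    """Jaccard similarity between the k-mer sets of two sequences."""
    ka = {a[i:i + k] for i in range(len(a) - k + 1)}
    kb = {b[i:i + k] for i in range(len(b) - k + 1)}
    if not ka and not kb:
        return 1.0  # both sequences are shorter than k
    return len(ka & kb) / len(ka | kb)


# Toy data, invented for illustration only.
healthy = ["ACGTACGT", "ACGTTGCA"]
diseased = ["GGGTACGG", "TTGGGTAC"]

# Counter subtraction keeps only k-mers that are more frequent in the first class.
enriched_in_diseased = kmer_counts(diseased, 3) - kmer_counts(healthy, 3)

if __name__ == "__main__":
    print(jaccard_kmers(healthy[0], diseased[0], 3))
    print(enriched_in_diseased.most_common(3))
```

Real sequence comparison usually relies on alignment-based tools; k-mer counting is used here only because it fits in a few lines.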
Association analysis: identification of co-occurring gene sequences
Most diseases are not triggered by a single gene but by a
combination of genes acting together
Association analysis may help determine the kinds of genes that
are likely to co-occur in target samples
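The co-occurrence idea can be sketched as support counting: for each pair of genes, measure the fraction of samples in which both appear, and keep the pairs above a chosen threshold. This is a minimal sketch that assumes each sample is a set of gene labels; the labels G1 to G4, the function name `frequent_pairs`, and the threshold are hypothetical. A full association analysis would also handle larger gene sets and derive rules with confidence values.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(samples, min_support):
    """Return gene pairs whose support (fraction of samples containing
    both genes) is at least min_support."""
    n = len(samples)
    pair_counts = Counter()
    for genes in samples:
        # Sort so each pair has one canonical ordering.
        for pair in combinations(sorted(set(genes)), 2):
            pair_counts[pair] += 1
    return {
        pair: count / n
        for pair, count in pair_counts.items()
        if count / n >= min_support
    }


# Toy samples with hypothetical gene labels, invented for illustration.
samples = [
    {"G1", "G2", "G3"},
    {"G1", "G2"},
    {"G2", "G3"},
    {"G1", "G2", "G4"},
]

if __name__ == "__main__":
    print(frequent_pairs(samples, min_support=0.5))
```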
Path analysis: linking genes to different disease development stages
Different genes may become active at different stages of the
disease
Develop pharmaceutical interventions that target the different
stages