Data Warehouse and Data Mining - read.pudn.com

2001 6 7

A data warehouse is a subject-oriented, integrated, non-volatile and time-variant collection of data in support of management's decisions [Inmon, 1996]. A data warehouse is a set of methods, techniques, and tools that may be leveraged together to produce a vehicle that delivers data to end-users on an integrated platform [Ladley, 1997].

A data warehouse is a process of creating, maintaining, and using a decision-support infrastructure [Appleton, 1995] [Haley, 1997] [Gardner, 1998].

Customer ID [Inmon, 1996] Data Mart, ODS

Data Mart -- Operational Data Store (ODS), DB, DW (Subject-Oriented), ETL

ETL: Extract/Transformation/Load (Microsoft DTS, IBM Visual Warehouse, etc.)
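
The three ETL steps named above can be illustrated with a minimal sketch. It does not reproduce how Microsoft DTS or IBM Visual Warehouse work; the source rows, the `sales` table, and the function names are all hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical source rows, standing in for an extract from an operational system.
SOURCE_ROWS = [
    {"cust_id": " 101 ", "city": "beijing", "amount": "12.50"},
    {"cust_id": "102", "city": "Shanghai ", "amount": "7.25"},
    {"cust_id": "", "city": "Tianjin", "amount": "3.00"},  # missing key, rejected
]


def extract(rows):
    """Extract: yield raw records from the source unchanged."""
    for row in rows:
        yield dict(row)


def transform(records):
    """Transform: trim whitespace, normalize case, cast types, drop bad rows."""
    for rec in records:
        cust_id = rec["cust_id"].strip()
        if not cust_id:
            continue  # a record without a customer key cannot be integrated
        yield (int(cust_id), rec["city"].strip().title(), float(rec["amount"]))


def load(conn, rows):
    """Load: write the cleaned rows into the (hypothetical) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (cust_id INTEGER, city TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(conn, source=SOURCE_ROWS):
    """Run extract, transform, load in sequence and return the loaded rows."""
    load(conn, transform(extract(source)))
    return conn.execute(
        "SELECT cust_id, city, amount FROM sales ORDER BY cust_id"
    ).fetchall()


if __name__ == "__main__":
    print(run_etl(sqlite3.connect(":memory:")))
```

Keeping each step a separate function mirrors the Extract/Transformation/Load split: the cleansing rules can change without touching how data is read or written.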

OLAP

ETL (Repository)

[Figure: data warehouse architecture with a central data warehouse and architected data marts [Pieter, 1998]. Labels: source databases (relational, application package, legacy, external data); data cleansing tool; data modeling tool; warehouse administration tools; data extraction, transformation, load; central data warehouse (RDBMS) with central metadata; metadata exchange; mid-tier data marts (RDBMS, MDB) with local metadata; architected data marts; end-user DW tools for data access and analysis.]

[Figure: the same architecture with an ODS. Labels: OLTP tools; ODS; hub for data extraction, transformation, load; central data warehouse and ODS; source databases, cleansing, modeling and administration tools, metadata, data marts, and end-user DW tools as in the previous figure.]

[Figure: non-architected data marts [Douglas Hackney, 2001]. Labels: i2 Supply Chain; packaged i2 Supply Chain non-architected data mart; Oracle Financials; packaged Oracle Financial data warehouse; Siebel CRM; subset data marts; 3rd party e-commerce; custom marketing data warehouse.]

[Figure: federated data warehouse architecture. Labels: i2 Supply Chain; Oracle Financials; Siebel CRM; common staging area; federated financial data warehouse; federated packaged i2 Supply Chain data marts; 3rd party e-commerce; real-time ODS; federated marketing data warehouse; subset data marts; analytical applications; real-time data mining and analytics; real-time segmentation, classification, qualification, offerings, etc.]

[Figure: BI framework. Labels: front- and back-office OLTP; e-business systems; external information providers; ETL tools & DW templates; data profiling & reengineering tools; demand-driven data acquisition & analysis; metadata interchange; federated data warehouse and data mart systems; decision engine models, rules and metrics; OLAP & data mining tools, analysis templates; analytic application development tools & components; analytic applications (HR, Financial, CRM, Supply Chain, and EPM Analytics & Reporting); EKP - Enterprise Knowledge Management Portal; business information & recommendations; informed decisions & actions.]

[Figure: data warehouse architecture. Labels: relational, package, legacy, and external sources; data clean tool; data staging; enterprise data warehouse; datamarts (RDBMS, ROLAP, MDB); end-user tools.]

ETL

ETL -[Alex Berson etc, 1999]

Internet

metadata repository [Martin Staudt, 2000] OLAP

Top-Down, Bottom-Up

Top-down Approach
Build Enterprise data warehouse
  Common central data model
  Data re-engineering performed once
  Minimize redundancy and inconsistency
  Detailed and history data; global data discovery
Build datamarts from the Enterprise Data Warehouse (EDW)
  Subset of EDW relevant to department
  Mostly summarized data
  Direct dependency on EDW data availability

[Figure: operational data and external data feed the enterprise warehouse, which feeds the local data marts.]
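
In the top-down approach a data mart is a department-relevant, mostly summarized subset of the EDW. The sketch below illustrates only that idea; the row layout `(department, month, amount)`, the sample values, and the function name are hypothetical and not taken from the slides.

```python
from collections import defaultdict

# Hypothetical detailed EDW rows: (department, month, amount).
EDW_ROWS = [
    ("marketing", "2001-05", 100.0),
    ("marketing", "2001-05", 50.0),
    ("marketing", "2001-06", 25.0),
    ("finance", "2001-05", 300.0),
]


def build_data_mart(edw_rows, department):
    """Return monthly totals for one department: a summarized subset of the EDW."""
    totals = defaultdict(float)
    for dept, month, amount in edw_rows:
        if dept == department:  # keep only the rows relevant to the department
            totals[month] += amount  # summarize detail rows to month level
    return dict(sorted(totals.items()))


if __name__ == "__main__":
    print(build_data_mart(EDW_ROWS, "marketing"))
```

Because the mart is derived entirely from `EDW_ROWS`, it can only be refreshed when the EDW data is available, which is the direct dependency noted in the list above.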

ROI -- --

EDB

Example of Star Schema
Sales Fact Table: Date, Product, Store, Customer; Measurements: unit_sales, dollar_sales, Yen_sales
Date: Date, Month, Year
Product: ProductNo, ProdName, ProdDesc, Category, QOH
Store: StoreID, City, State, Country, Region
Cust: CustId, CustName, CustCity, CustCountry

Example of Snowflake Schema
Sales Fact Table: Date, Product, Store, Customer; Measurements: unit_sales, dollar_sales, Yen_sales
Date: Date, Month
Month: Month, Year
Year: Year
Product: ProductNo, ProdName, ProdDesc, Category, QOH
Store: StoreID, City
City: City, State
State: State, Country
Country: Country, Region
Cust: CustId, CustName, CustCity, CustCountry

OLTP ---
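
A star schema like the one in the example can be sketched in SQLite as a fact table joined to its dimension tables. The table and column names below (`sales_fact`, `date_dim`, and so on) and the sample rows are assumptions chosen for illustration; only a few of the slide's columns are included.

```python
import sqlite3

# Hypothetical, trimmed-down star schema: one fact table, three dimensions.
SCHEMA = """
CREATE TABLE date_dim    (date_id TEXT PRIMARY KEY, month INTEGER, year INTEGER);
CREATE TABLE product_dim (product_no INTEGER PRIMARY KEY, prod_name TEXT, category TEXT);
CREATE TABLE store_dim   (store_id INTEGER PRIMARY KEY, city TEXT, country TEXT);
CREATE TABLE sales_fact (
    date_id      TEXT    REFERENCES date_dim(date_id),
    product_no   INTEGER REFERENCES product_dim(product_no),
    store_id     INTEGER REFERENCES store_dim(store_id),
    unit_sales   INTEGER,
    dollar_sales REAL
);
"""


def build_star(conn):
    """Create the schema and load a few made-up rows."""
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO date_dim VALUES (?, ?, ?)",
                     [("2001-05-30", 5, 2001), ("2001-06-07", 6, 2001)])
    conn.executemany("INSERT INTO product_dim VALUES (?, ?, ?)",
                     [(1, "Tea", "Drinks"), (2, "Rice", "Food")])
    conn.executemany("INSERT INTO store_dim VALUES (?, ?, ?)",
                     [(10, "Beijing", "China"), (20, "Tokyo", "Japan")])
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)", [
        ("2001-05-30", 1, 10, 3, 6.0),
        ("2001-06-07", 1, 20, 2, 4.0),
        ("2001-06-07", 2, 10, 5, 10.0),
    ])
    conn.commit()


def sales_by_category_and_country(conn, year):
    """Aggregate the fact table's measurements along two dimensions for one year."""
    return conn.execute("""
        SELECT p.category, s.country, SUM(f.unit_sales), SUM(f.dollar_sales)
        FROM sales_fact f
        JOIN date_dim d    ON d.date_id = f.date_id
        JOIN product_dim p ON p.product_no = f.product_no
        JOIN store_dim s   ON s.store_id = f.store_id
        WHERE d.year = ?
        GROUP BY p.category, s.country
        ORDER BY p.category, s.country
    """, (year,)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_star(conn)
    print(sales_by_category_and_country(conn, 2001))
```

In the snowflake variant, `store_dim` would hold only the store and city, with separate city, state, and country tables, so the same query would need additional joins along that hierarchy.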

(Metrics)

1. 2. 6. 7.

5. DASD 4. 3. [Inmon 1996]

/ / / 8. [Inmon 1996] SQL 10. 9.

11. I/O CPU 12. 13. 14. 15. 16. [Inmon 1996] 17. /

18. DBMS DBMS 10GB/100GB/TB DBMS Lock Commit CheckPoint Log DeadLock Rollback. DBMS DBMS DBMS DBMS DSS 19. DBMS 20. DBMS

DBMS DBMS OLAP 21. DASD/ [Inmon 1996] 22. DSS IT / 23. / / /

/ / / / / / / 24. CDC

[Inmon, 1999] 2000 5

DW (Meta Group Survey, 3000+)
  <10: 12%
  50-100: 22%
  100-500: 36%
  500-1000: 16%
  >1000: 14%
DW 100-500 DW

DW (Meta Group Survey, 3000+)
  <50GB: 12%
  50-250GB: 19%
  250-500GB: 8%
  500GB-1TB: 21%
  >1 TB: 40%

How Much?
  $3-6m for mid-size company, less if smaller, more if larger
  $10m+ for large organizations, large data sets
  10-50+% annual maintenance costs
  33% Hardware / 33% Software / 33% Services

How Long?
  2-4 years for 80/20 of full system for mid-size company
  6-12 months for initial iteration
  3-6 months for subsequent iterations

How Risky?
  For EDW projects, 20% (Meta) to 70% (OTR, DWN) fail
  High failure rate for non-business driven initiatives
  Very few systems meet the expectations of the business
  Failure not due to technology, due to soft issues
  Massive upside to successful projects (100% - 2000+% ROI)
  99% politics - 1% technology

Inmon, W.H., Building the Data Warehouse, John Wiley and Sons, 1996.
Ladley, John, Operational Data Stores: Building an Effective Strategy, Data Warehouse: Practical Advice from the Experts, Prentice Hall, Englewood Cliffs, NJ, 1997.
Gardner, Stephen R., Building the Data Warehouse, Communications of the ACM, September 1998, Volume 41, Number 9, 52-60.
Douglas Hackney, http://www.egltd.com, DW101: A Practical Overview, 2001.
Pieter R. Mimno, The Big Picture - How Brio Competes in the Data Warehousing Market, Presentation to Brio Technology, August 4, 1998.
Alex Berson, Stephen Smith, Kurt Thearling, Building Data Mining Applications for CRM, McGraw-Hill, 1999.
Martin Staudt, Anca Vaduva, Thomas Vetterli, The Role of Metadata for Data Warehousing, 2000.
W.H. Inmon, Ken Rudin, Christopher K. Buss, Ryan Sousa, Data Warehouse Performance, John Wiley & Sons, 1999.

Data Mining Upsides
Data Mining Downsides
Data Mining Use
Data Mining Industry and Application
Data Mining Costs

Clustering 22%
Direct Marketing 14%
Cross-Sell Models 12%
www.kdnuggets.com 2001/6/11 News

Data Mining Upsides
  Discovery of previously unknown relationships, trends, anomalies, etc.
  Powerful competitive weapon
  Automation of repetitive analysis
  Predictive capabilities

Data Mining Downsides
  Knowledge discovery technology immature
  Long learning and tuning cycles for some technologies
  Black box technology minimizes confidence
  VLDB (Very Large Data Base) requirements

Data Mining Uses
  Discover anomalies, outliers and exceptions in process data
  Discover behavior and predict outcomes of customer relationships
    Churn management
    Target marketing (market of one)
    Promotion management
  Fraud detection
  Pattern ID & matching (dark programs, science)

Data Mining Industry and Applications
  From research prototypes to data mining products, languages, and standards
    IBM Intelligent Miner, SAS Enterprise Miner, SGI MineSet, Clementine, MS/SQLServer 2000, DBMiner, BlueMartini, MineIt, DigiMine, etc.
    A few data mining languages and standards (esp. MS OLEDB for Data Mining).
  Application achievements in many domains
    Market analysis, trend analysis, fraud detection, outlier analysis, Web mining, etc.

Data Mining Costs
  Desktop tools: $500 and up (MSFT coming at low price point)
  Server / MF based: $20,000 to $700,000+
  Must also add cost of extensive consulting for high end tools
  Don't forget long training and learning curve time
  Ongoing process, not task automation software

1989 IJCAI Workshop on Knowledge Discovery in Databases
1991-1994 Workshops on Knowledge Discovery in Databases
  Knowledge Discovery in Databases (G. Piatetsky-Shapiro and W. Frawley, 1991)
  Advances in Knowledge Discovery and Data Mining (U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, 1996)
1995-1998 International Conferences on Knowledge Discovery in Databases and Data Mining (KDD95-98)
  Journal of Data Mining and Knowledge Discovery (1997)
1998 ACM SIGKDD, SIGKDD1999-2001 conferences, and SIGKDD Explorations
More conferences on data mining
  PAKDD, PKDD, SIAM-Data Mining, (IEEE) ICDM, DaWaK, SPIE-DM, etc.

Data Mining: Confluence of Multiple Disciplines
[Figure: data mining at the confluence of database technology, machine learning (AI), information science, statistics, visualization, and other disciplines.]

A Multi-Dimensional View of Data Mining
Databases to be mined
  Relational, transactional, object-relational, active, spatial, time-series, text, multi-media, heterogeneous, legacy, WWW, etc.
Knowledge to be mined
  Characterization, discrimination, association, classification, clustering, trend, deviation and outlier analysis, etc.
Techniques utilized
  Database-oriented, data warehouse (OLAP), machine learning, statistics, visualization, neural network, etc.
Applications adapted
  Retail, telecommunication, banking, fraud analysis, DNA mining, stock market analysis, Web mining, Weblog analysis, etc.

Research Progress in the Last Decade
  Multi-dimensional data analysis: Data warehouse and OLAP (on-line analytical processing)
  Association, correlation, and causality analysis
  Classification: scalability and new approaches
  Clustering and outlier analysis
  Sequential patterns and time-series analysis
  Similarity analysis: curves, trends, images, texts, etc.
  Text mining, Web mining and Weblog analysis
  Spatial, multimedia, scientific data analysis
  Data preprocessing and database compression
  Data visualization and visual data mining
  Many others, e.g., collaborative filtering

Research Directions [Han J. W., 2001]
  Web mining
  Towards integrated data mining environments and tools

  Vertical (or application-specific) data mining
  Invisible data mining
  Towards intelligent, efficient, and scalable data mining methods

Towards Integrated Data Mining Environments and Tools
  OLAP Mining: Integration of Data Warehousing and Data Mining
  Querying and Mining: An Integrated Information Analysis Environment
  Basic Mining Operations and Mining Query Optimization
  Vertical (or application-specific) data mining
  Invisible data mining

Querying and Mining: An Integrated Information Analysis Environment
  Data mining as a component of DBMS, data warehouse, or Web information system
    Integrated information processing environment
    MS/SQLServer-2000 (Analysis service)
    IBM IntelligentMiner on DB2
    SAS EnterpriseMiner: data warehousing + mining
  Query-based mining
    Querying database/DW/Web knowledge
    Efficiency and flexibility: preprocessing, on-line processing, optimization, integration, etc.

Vertical Data Mining
  Generic data mining tools? Too simple to match domain-specific, sophisticated applications
    Expert knowledge and business logic represent many years of work in their own fields!
    Data mining + business logic + domain experts
  A multi-dimensional view of data miners
    Complexity of data: Web, sequence, spatial, multimedia, ...
    Complexity of domains: DNA, astronomy, market, telecom, ...
  Domain-specific data mining tools
    Provide concrete, killer solution to specific problems
    Feedback to build more powerful tools

Invisible Data Mining
  Build mining functions into daily information services
    Web search engine (link analysis, authoritative pages, user profiles), adaptive web sites, etc.
    Improvement of query processing: history + data
    Making service smart and efficient
  Benefits from/to data mining research
    Data mining research has produced many scalable, efficient, novel mining solutions
    Applications feed new challenge problems to research

Towards Intelligent Tools for Data Mining
  Integration paves the way to intelligent mining
  Smart interface brings intelligence
    Easy to use, understand and manipulate
    One picture may be worth 1,000 words
    Visual and audio data mining
  Human-Centered Data Mining
  Towards self-tuning, self-managing, self-triggering data mining

Integrated Mining: A Booster for Intelligent Mining
  Integration paves the way to intelligent mining
  Data mining integrates with DBMS, DW, WebDB, etc.
    Integration inherits the power of up-to-date information technology: querying, MD analysis, similarity search, etc.
    Mining can be viewed as querying database knowledge
  Integration leads to standard interface/language, function/process standardization, utility, and reachability
  Efficiency and scalability bring intelligent mining to reality

CRISP-DM: CRoss-Industry Standard Process for Data Mining
XML, PMML
SOAP: Simple Object Access Protocol
OLE DB for Data Mining

DNA, MIS, ERP, CRM

E_Business

ETL

MIS, ERP, CRM

ERP, CRM, ETL

Any Questions?
