A data warehouse is a subject-oriented, integrated, non-volatile, and time-variant collection of data in support of management's decisions [Inmon, 1996]. A data warehouse is a set of methods, techniques, and tools that may be leveraged together to produce a vehicle that delivers data to end users on an integrated platform [Ladley, 1997].
A data warehouse is a process of creating, maintaining, and using a decision-support infrastructure [Appleton, 1995] [Haley, 1997] [Gardner, 1998].
Related terms: Data Mart; Operational Data Store (ODS); database (DB); data warehouse (DW, subject-oriented) [Inmon, 1996].
ETL (Extract/Transformation/Load) tools: Microsoft DTS, IBM Visual Warehouse, etc.
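As a rough illustration of the three ETL steps named above, the following minimal sketch extracts rows from a CSV export, applies a few cleansing rules, and loads the result into an in-memory SQLite table. The source data, the column names, the cleansing rules, and the `sales` table are all hypothetical; real tools such as those listed above work at a very different scale.

```python
import csv
import io
import sqlite3

# Hypothetical source extract: a CSV export from an operational system.
SOURCE_CSV = """customer_id,name,country,amount
 101, alice ,us,19.90
102,Bob,US,5.00
103,carol,jp,
"""

def extract(text):
    """Extract: read raw rows from the source export."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: trim whitespace, normalize case, drop rows without an amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # assumed cleansing rule: skip incomplete records
        cleaned.append((
            int(row["customer_id"].strip()),
            row["name"].strip().title(),
            row["country"].strip().upper(),
            float(amount),
        ))
    return cleaned

def load(rows, conn):
    """Load: write the cleaned rows into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(customer_id INTEGER, name TEXT, country TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(SOURCE_CSV)), conn)
loaded = conn.execute("SELECT * FROM sales ORDER BY customer_id").fetchall()
print(loaded)
```

Keeping the three steps as separate functions mirrors the Extract/Transformation/Load split and lets each step be tested on its own.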
[Figure: data warehouse architecture [Pieter, 1998]. Components shown: source databases (relational, application package, legacy, external data); data cleansing tool; data modeling tool; data extraction, transformation, and load (ETL, with repository); warehouse administration tools; central data warehouse with central metadata; metadata exchange; mid-tier RDBMS and MDB data marts with local metadata (architected data marts); end-user DW tools for data access and analysis.]
[Figure: the same architecture with an ODS. Additional components shown: ODS, OLTP tools, a hub for data extraction, transformation, and load, and a combined central data warehouse and ODS.]
[Figure: non-architected and federated data warehouse environments [Douglas Hackney, 2001]. Sources shown: i2 Supply Chain, Oracle Financials, Siebel CRM, 3rd-party e-commerce. Non-architected components: packaged i2 Supply Chain non-architected data mart, packaged Oracle Financials data warehouse, custom marketing data warehouse, subset data marts. Federated components: common staging area, federated financial data warehouse, federated packaged i2 Supply Chain data marts, federated marketing data warehouse, real-time ODS, subset data marts, analytical applications, real-time data mining and analytics, real-time segmentation, classification, qualification, offerings, etc.]
[Figure: BI environment. Components shown: front- and back-office OLTP; e-business systems; external information providers; ETL tools and DW templates; data profiling and reengineering tools; demand-driven data acquisition and analysis; metadata interchange; federated data warehouse and data mart systems.]
Top-Down and Bottom-Up

Top-down Approach

Build the enterprise data warehouse (EDW)
  Common central data model
  Data re-engineering performed once
  Minimize redundancy and inconsistency
  Detailed and history data; global data discovery
Build data marts from the EDW
  Subset of the EDW relevant to a department
  Mostly summarized data
  Direct dependency on EDW data availability

[Figure: operational data, external data, enterprise warehouse, local data marts.]
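To make the "subset of the EDW, mostly summarized" idea concrete, here is a minimal sketch that derives a departmental data mart from detailed EDW rows. The row layout, the department names, and the `build_data_mart` helper are hypothetical and only illustrate the dependency of the mart on the EDW.

```python
from collections import defaultdict

# Hypothetical detailed EDW rows: (department, month, product, amount).
EDW_SALES = [
    ("marketing", "2001-01", "widget", 120.0),
    ("marketing", "2001-01", "gadget", 80.0),
    ("marketing", "2001-02", "widget", 50.0),
    ("finance", "2001-01", "widget", 999.0),
]

def build_data_mart(edw_rows, department):
    """Return monthly totals for one department: a summarized subset of the EDW."""
    totals = defaultdict(float)
    for dept, month, _product, amount in edw_rows:
        if dept == department:       # keep only the department's subset
            totals[month] += amount  # summarize detail rows by month
    return dict(sorted(totals.items()))

marketing_mart = build_data_mart(EDW_SALES, "marketing")
print(marketing_mart)
```

Because the mart is computed from the EDW rows, it can only be refreshed when the EDW data is available, which is the direct dependency noted in the list above.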
Example of Star Schema

Sales Fact Table: Date, Product, Store, Customer (dimension keys); unit_sales, dollar_sales, Yen_sales (measurements)
Date dimension: Date, Month, Year
Store dimension: StoreID, City, State, Country, Region
Product dimension: ProductNo, ProdName, ProdDesc, Category, QOH
Cust dimension: CustId, CustName, CustCity, CustCountry

Example of Snowflake Schema

Sales Fact Table: Date, Product, Store (dimension keys); measurements
Date dimension, normalized: Date (Date, Month); Month (Month, Year); Year (Year)
Store dimension, normalized: Store (StoreID, City); City (City, State); State (State, Country); Country (Country, Region)
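The star schema described above can be sketched in SQLite as one fact table with a foreign key per dimension. This is a minimal sketch, not a production design: the table names (`date_dim`, `store_dim`, and so on), the column types, and the sample rows are assumptions chosen to match the attribute names in the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE date_dim    (date_key TEXT PRIMARY KEY, month TEXT, year INTEGER);
CREATE TABLE store_dim   (store_id INTEGER PRIMARY KEY, city TEXT, state TEXT,
                          country TEXT, region TEXT);
CREATE TABLE product_dim (product_no INTEGER PRIMARY KEY, prod_name TEXT,
                          prod_desc TEXT, category TEXT, qoh INTEGER);
CREATE TABLE cust_dim    (cust_id INTEGER PRIMARY KEY, cust_name TEXT,
                          cust_city TEXT, cust_country TEXT);
-- Fact table: one foreign key per dimension plus the measurements.
CREATE TABLE sales_fact (
    date_key     TEXT    REFERENCES date_dim(date_key),
    product_no   INTEGER REFERENCES product_dim(product_no),
    store_id     INTEGER REFERENCES store_dim(store_id),
    cust_id      INTEGER REFERENCES cust_dim(cust_id),
    unit_sales   INTEGER,
    dollar_sales REAL,
    yen_sales    REAL
);
""")

# Hypothetical sample rows.
conn.executemany("INSERT INTO date_dim VALUES (?, ?, ?)", [
    ("2001-06-11", "2001-06", 2001),
    ("2001-07-01", "2001-07", 2001),
])
conn.executemany("INSERT INTO store_dim VALUES (?, ?, ?, ?, ?)", [
    (1, "Tokyo", "Tokyo", "Japan", "Asia"),
    (2, "Boston", "MA", "USA", "North America"),
])
conn.execute("INSERT INTO product_dim VALUES (10, 'Widget', 'Basic widget', 'Hardware', 500)")
conn.execute("INSERT INTO cust_dim VALUES (7, 'Acme', 'Osaka', 'Japan')")
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?, ?, ?)", [
    ("2001-06-11", 10, 1, 7, 3, 30.0, 3600.0),
    ("2001-07-01", 10, 1, 7, 2, 20.0, 2400.0),
    ("2001-07-01", 10, 2, 7, 5, 50.0, 6000.0),
])

# A typical star-schema query: join the fact table to two dimensions
# and aggregate a measurement by dimension attributes.
rows = conn.execute("""
    SELECT d.year, s.region, SUM(f.dollar_sales)
    FROM sales_fact f
    JOIN date_dim d  ON f.date_key = d.date_key
    JOIN store_dim s ON f.store_id = s.store_id
    GROUP BY d.year, s.region
    ORDER BY s.region
""").fetchall()
print(rows)
```

In a snowflake variant, `store_dim` would keep only `store_id` and `city`, with separate city, state, and country tables, so the same query would need additional joins to reach `region`.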
DW (Meta Group Survey, 3000+)

  50-100: 22%
  100-500: 36%
  500-1000: 16%
  >1000: 14%
  Most common range: 100-500

DW size (Meta Group Survey, 3000+)

  <50GB: 12%
  50-250GB: 19%
  250-500GB: 8%
  500GB-1TB: 21%
  >1TB: 40%

How Much?
  $3-6m for a mid-size company; less if smaller, more if larger
  $10m+ for large organizations and large data sets
  10-50+% annual maintenance costs
  33% hardware / 33% software / 33% services

How Long?

  2-4 years for 80/20 of the full system for a mid-size company
  6-12 months for the initial iteration
  3-6 months for subsequent iterations

How Risky?

  For EDW projects, 20% (Meta) to 70% (OTR, DWN) fail
  High failure rate for non-business-driven initiatives
  Very few systems meet the expectations of the business
  Failure is due to soft issues, not technology
  Massive upside to successful projects (100% - 2000+% ROI)
  99% politics - 1% technology
Inmon, W.H., Building the Data Warehouse, John Wiley and Sons, 1996.
Ladley, John, Operational Data Stores: Building an Effective Strategy, in Data Warehouse: Practical Advice from the Experts, Prentice Hall, Englewood Cliffs, NJ, 1997.
Gardner, Stephen R., Building the Data Warehouse, Communications of the ACM, September 1998, Volume 41, Number 9, 52-60.
Douglas Hackney, DW101: A Practical Overview, http://www.egltd.com, 2001.
Pieter R. Mimno, The Big Picture - How Brio Competes in the Data Warehousing Market, Presentation to Brio Technology, August 4, 1998.
Alex Berson, Stephen Smith, Kurt Thearling, Building Data Mining Applications for CRM, McGraw-Hill, 1999.
Martin Staudt, Anca Vaduva, Thomas Vetterli, The Role of Metadata for Data Warehousing, 2000.
W.H. Inmon, Ken Rudin, Christopher K. Buss, Ryan Sousa, Data Warehouse Performance, John Wiley & Sons, 1999.
Data Mining Upsides
Data Mining Downsides
Data Mining Uses
Data Mining Industry and Applications
Data Mining Costs

www.kdnuggets.com News, 2001/6/11: Clustering 22%; Direct Marketing 14%; Cross-Sell Models 12%

Data Mining Upsides

  Discovery of previously unknown relationships, trends, anomalies, etc.
  Powerful competitive weapon
  Automation of repetitive analysis
  Predictive capabilities

Data Mining Downsides

  Knowledge discovery technology immature
  Long learning and tuning cycles for some technologies
  Black box technology minimizes confidence
  VLDB (Very Large Data Base) requirements

Data Mining Uses

  Discover anomalies, outliers, and exceptions in process data
  Discover behavior and predict outcomes of customer relationships
    Churn management
    Target marketing (market of one)
    Promotion management
  Fraud detection
  Pattern ID & matching (dark programs, science)

Data Mining Industry and Applications

  From research prototypes to data mining products, languages, and standards
    IBM Intelligent Miner, SAS Enterprise Miner, SGI MineSet, Clementine, MS/SQLServer 2000, DBMiner, BlueMartini, MineIt, DigiMine, etc.
    A few data mining languages and standards (esp. MS OLEDB for Data Mining)
  Application achievements in many domains
    Market analysis, trend analysis, fraud detection, outlier analysis, Web mining, etc.

Data Mining Costs

  Desktop tools: $500 and up (MSFT coming at low price point)
  Server / MF based: $20,000 to $700,000+
  Must also add the cost of extensive consulting for high-end tools
  Don't forget long training and learning curve time
  Ongoing process, not task automation software
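One of the uses listed above, discovering anomalies and outliers in process data, can be illustrated with a very simple z-score rule. This is only a sketch of one basic technique, not what any of the products named above implement; the `find_outliers` helper, the threshold of 2.0, and the sample transaction counts are hypothetical.

```python
import statistics

def find_outliers(values, threshold=2.0):
    """Return values whose distance from the mean exceeds `threshold` standard deviations."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values identical: nothing stands out
    return [v for v in values if abs(v - mean) / stdev > threshold]

# Hypothetical daily transaction counts with one unusual day.
daily_txn = [102, 98, 101, 97, 103, 99, 100, 250]
outliers = find_outliers(daily_txn)
print(outliers)
```

A single extreme value inflates both the mean and the standard deviation, so on small samples a rule like this is usually replaced by a more robust measure such as the median and interquartile range.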
1989: IJCAI Workshop on Knowledge Discovery in Databases
1991-1994: Workshops on Knowledge Discovery in Databases
  Knowledge Discovery in Databases (G. Piatetsky-Shapiro and W. Frawley, 1991)
  Advances in Knowledge Discovery and Data Mining (U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, 1996)
1995-1998: International Conferences on Knowledge Discovery in Databases and Data Mining (KDD95-98)
  Journal of Data Mining and Knowledge Discovery (1997)
1998: ACM SIGKDD, SIGKDD 1999-2001 conferences, and SIGKDD Explorations
More conferences on data mining
  PAKDD, PKDD, SIAM-Data Mining, (IEEE) ICDM, DaWaK, SPIE-DM, etc.

Data Mining: Confluence of Multiple Disciplines

[Figure: data mining at the confluence of database technology, machine learning (AI), information science, statistics, visualization, and other disciplines.]

A Multi-Dimensional View of Data Mining

  Databases to be mined
    Relational, transactional, object-relational, active, spatial, time-series, text, multimedia, heterogeneous, legacy, WWW, etc.
  Knowledge to be mined
    Characterization, discrimination, association, classification, clustering, trend, deviation and outlier analysis, etc.
  Techniques utilized
    Database-oriented, data warehouse (OLAP), machine learning, statistics, visualization, neural network, etc.
  Applications adapted
    Retail, telecommunication, banking, fraud analysis, DNA mining, stock market analysis, Web mining, Weblog analysis, etc.

Research Progress in the Last Decade

  Multi-dimensional data analysis: data warehouse and OLAP (on-line analytical processing)
  Association, correlation, and causality analysis
  Classification: scalability and new approaches
  Clustering and outlier analysis
  Sequential patterns and time-series analysis
  Similarity analysis: curves, trends, images, texts, etc.
  Text mining, Web mining, and Weblog analysis
  Spatial, multimedia, and scientific data analysis
  Data preprocessing and database compression
  Data visualization and visual data mining
  Many others, e.g., collaborative filtering

Research Directions [Han J. W., 2001]

  Web mining
  Towards integrated data mining environments and tools
  Vertical (or application-specific) data mining
  Invisible data mining
  Towards intelligent, efficient, and scalable data mining methods

Towards Integrated Data Mining Environments and Tools

  OLAP Mining: Integration of Data Warehousing and Data Mining
  Querying and Mining: An Integrated Information Analysis Environment
  Basic Mining Operations and Mining Query Optimization
  Vertical (or application-specific) data mining
  Invisible data mining

Querying and Mining: An Integrated Information Analysis Environment

  Data mining as a component of DBMS, data warehouse, or Web information system
    Integrated information processing environment
      MS/SQLServer-2000 (Analysis service)
      IBM IntelligentMiner on DB2
      SAS EnterpriseMiner: data warehousing + mining
  Query-based mining
    Querying database/DW/Web knowledge
    Efficiency and flexibility: preprocessing, on-line processing, optimization, integration, etc.

Vertical Data Mining
  Generic data mining tools? Too simple to match domain-specific, sophisticated applications
    Expert knowledge and business logic represent many years of work in their own fields!
    Data mining + business logic + domain experts
  A multi-dimensional view of data miners
    Complexity of data: Web, sequence, spatial, multimedia
    Complexity of domains: DNA, astronomy, market, telecom
  Domain-specific data mining tools
    Provide concrete, killer solutions to specific problems
    Feedback to build more powerful tools

Invisible Data Mining

  Build mining functions into daily information services
    Web search engines (link analysis, authoritative pages, user profiles), adaptive web sites, etc.
    Improvement of query processing: history + data
    Making service smart and efficient
  Benefits from/to data mining research
    Data mining research has produced many scalable, efficient, novel mining solutions
    Applications feed new challenge problems to research

Towards Intelligent Tools for Data Mining

  Integration paves the way to intelligent mining
  Smart interface brings intelligence
    Easy to use, understand, and manipulate
    One picture may be worth 1,000 words
    Visual and audio data mining
  Human-centered data mining
  Towards self-tuning, self-managing, self-triggering data mining

Integrated Mining: A Booster for Intelligent Mining

  Integration paves the way to intelligent mining
    Data mining integrates with DBMS, DW, WebDB, etc.
    Integration inherits the power of up-to-date information technology: querying, MD analysis, similarity search, etc.
  Mining can be viewed as querying database knowledge
    Integration leads to standard interface/language, function/process standardization, utility, and reachability
  Efficiency and scalability bring intelligent mining to reality

CRISP-DM: CRoss-Industry Standard Process for Data Mining
XML
PMML
SOAP: Simple Object Access Protocol
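The idea that mining can be viewed as querying database knowledge, together with the association analysis listed among the kinds of knowledge to be mined, can be illustrated by a small function that "queries" transaction data for frequently co-occurring item pairs. This is a minimal sketch of pair counting by support, not a full association-rule algorithm; the `frequent_pairs` helper, the baskets, and the support threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket transactions.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]

def frequent_pairs(transactions, min_support):
    """Return item pairs whose support (share of transactions containing both) meets the threshold."""
    counts = Counter()
    for basket in transactions:
        # Sort items so each pair has one canonical (alphabetical) form.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

pairs = frequent_pairs(TRANSACTIONS, min_support=0.6)
for pair, support in sorted(pairs.items()):
    print(pair, support)
```

Counting every pair is quadratic in basket size; scalable methods prune candidates instead, which is the kind of efficiency concern raised in the research directions above.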