Opportunities and Challenges along the ñInformation SuperHighwayî
MASSACHUSETTS INSTITUTE OF TECHNOLOGY SLOAN SCHOOL OF MANAGEMENT INFORMATION TECHNOLOGIES GROUP SEMANTIC INTEGRATION: Context Interchange (COIN) Project Presentation & Demonstration For Drs. Poindexter, Popp, Weishar, et al 1 July 2003 Stuart Madnick ([email protected]) Michael Siegel ([email protected]) Richard Wang ([email protected]) 1 Agenda Introduction to the Context Interchange (COIN) Project Motivation "[A]lthough there are many private and public databases that contain information potentially relevant to counterterrism programs, they lack the necessary context definitions (i.e., metadata) and access tools to enable interoperation with other databases and the extraction of meaningful and timely information." National Research Council (2002), "Making the Nation Safer" (emphasis added) Description of the COIN Context Mediation Technology Demonstration of the COIN Technology 2
COntext INterchange (COIN) Project Applications OUTPUT PROCESSI NG CONTEXT INPUT MEDIATIO PROCESSI NG N ODBC Driver * Automatic conflict detection and conversion Receivers - Derived data - Source Browser s Web selection
TRUSTE Publishing - SourceD attribution AGENTS * Automatic web wrapping - Semistructured text -Multisource query plan APPLICATIONS: Financial services, and electronic commerce, asset visibility, inexecution transit visibility. Web Pages Sources Data base s 3 Key COIN Technologies
Web Wrapper Extract selected information from web (HTML+XML) Allows web to be treated as large relational SQL database Handles dynamic web sites, cookies, login, etc. Performs SQL Joins & Unions involving DBs + Web sources Context Mediator Resolve semantic (meaning) differences Enable meaningful aggregation & comparison
4 Background on DARPA Support for Context Mediation Research Initial efforts funded as part of DARPA Intelligent Integration of Information (I3) Program Period: July 1993 - Sept 1998 Started under: Gio Wiederhold then under: Dave Gunning & Bob Neches I3 Program discontinued Other related activity: MIT Total Data Quality Management (TDQM) Since 1991 (web.mit.edu/tdqm) 5 Multiple Perspectives . . . old lady or young lady ? 6 Role Of Context 02-01-03 $ Context Context 01-02-03
? Context 03-02-01 CONTEXT VARIATIONS: Data: Databases - GEOGRAPHIC ( US vs. UK ) Web data - FUNCTIONAL (CASH MGMT vs. LOANS ) E-mail - ORGANIZATIONAL ( CITIBANK vs. CHASE ) 7 Types of Context Representational Ontological Temporal Example Temporal
Representational Currency: $ vs Francs before 2000, Scale factor: 1 vs 1000 thereafter Ontological Revenue: Includes vs excludes interest Revenue: Excludes interest before 1994 but incl. thereafter 8 Example : Context Differences ( from multiple web sources) Daimler Benz ( DCX ) P/E Ratio Financial Data ABC 11.6 Bloomberg 5.57 DBC 19.19 MarketGuide 7.46 9
Complementary Aggregation Example Q: How did CO2 emissions (total, per GDP, per capita) change over time (between 1990 and 2000) in Yugoslavia? Context 1: YUG as a geographic region bounded before the breakup Context 2: YUG as a legal autonomous state Related effort: - Laboratory for Information Globalization and Harmonization Technologies and Studies ( LIGHTS ) Project 10 World Banks World Dev. OAK Ridges CDIAC DB; Indicator DB; UN WRI; GSSD; EPAs Statistic Division; CO2 Emission Statistics Bureaus 1990 2000 Countr y
GDP Pop GDP Pop YUG 698.3 23. 7 1627. 8 10. 6 BIH 13.6 3.9 HRV 266.9
4.5 MKD 608.7 2.0 GDP in billions local currency; SVN 7162 2.0 Population in millions Olsen (Web) Fro m To 1990 2000 10.5 67.267 Country 1990 2000
YUG 35604 1548 0 USD YU G BIH 1279 USD BIH 2.086 HRV 5405 USD 8.089 MKD
3378 HR V USD MK D 64.757 SVN In 1000 tons per 3981 year USD SV N Context 1 225.93 Context 2 Country Code
Currency CurCod e Country 1990 2000 1990 2000 Yugoslavia YUG New Yug. Dinar YUN CO2 35604 29523
35604 1548 0 GDP 66.5 104.8 66.5 24.2 Marka BAM CO2/capita 1.5 1.28 1.5 1.46 CO2/GDP
535 282 535 640 Bosnia and Herzegovia BIH Croatia HRV Kuna HRK Macedonia MKD Denar MKD Slovenia SVN Tolar needed:
SIT Many sources eanings in sources & users might GDP/ 2800 4560 2800 1100 Total CO2 in 1000 tons per year; GDP in Capita billions USD; CO2/Capita in tons per 11 person; CO2/GDP in tons per million USD; GDP/Capita in USD per person differ The 1999 Overture Unit-of-measure mixup tied to loss of $125Million Mars Orbiter NASAs Mars Climate Orbiter was lost because engineers did not make a simple
conversion from English units to metric, an embarrassing lapse that sent the $125 million craft off course. . . . . . . The navigators ( JPL ) assumed metric units of force per second, or newtons. In fact, the numbers were in pounds of force per second as supplied by Lockheed Martin ( the 12 contractor ). The Context Interchange Approach Concept: Length Meters f() meters feet Feet Shared Ontologies Source Context Conversion Libraries
Context Mediator Receiver Context part length Context Transformation 17 Source Context Management Application Select partlength From catalog Where partno=12AY Receiver 13 COIN Conceptual Model (Ontology) 14
Another Context Example (Basis for Demo) Company Name DAIMLER-BENZ Net Income 614,995 97,736,992 Sales Context Mediation Services * Datastream Company Name DAIMLER-BENZ AG Net Income 346,577 Sales 56,268,168 * WorldScope Company Name DAIMLER BENZ CORP Net Income 615,000,000 Sales 97,737,000,000 * Disclosure
O&A DEM-USD Exchange Rate 1.00 German Mark= 0.58 US Dollar as 12/31/93 * OANDA Web Server Users & Appl. Systems * Wrapper Services 15 Some Context Differences Context Definitions Disclosure Country of Incorporation Money Amount As_Of_Date 3 Letters Currency Used Currency Conversion Currency Symbols Scale Factor 1 Disclosure Names Company
Names American with / as Date Style separator Worldscope USD Money Amount As_Of_Date 3 Letters DataStream Country of Incorporation Money Amount As_Of_Date 2 Letters 1000 Worldscope Names 1000 DataStream Names American with / as separator European with - as separator Olsen (OANDA) Web Source uses 3 Letter Currency Symbols and EuropeanDate Style
with / as a separator 16 Domain Model ur toC company cou ntry I ing d n yE format dateF mt fro mC u cur ren c
y te Inheritance Attribute Modifier f countryName rrency officialCu currencyType companyFinancials S a txnD scaleFactor r rT cu
e yp string ym nco rp exchangeRate number date companyName Some currency context possibilities: Currency is stated explicitly as part of record Currency not stated, but the same for all (e.g., US $) 17 Currency not stated or constant, but inferred by country COIN System Architecture SERVER PROCESSES MEDIATOR PROCESSES CLIENT PROCESSES Web Client
WWW Gateway COIN Repository N ODBC-compliant Apps Results (e.g Microsoft Excel) ODBC-Driver HTTPD-Daemon Web-site Data Store for Intermediate Results 18 System Demonstration Single Source Queries with Mediation Q6. Scenario: Using Context Interchange, you can look at the Disclosure data using Datastream Context. Query: Find out from Disclosure what Net Income for DAIMLER-BENZ was. Use Datastream Context. Capabilities Demonstrated:
Ability to perform Scale Factor Conversion, Date Format Conversion, Company Name Conversion. 19 Demonstration @ context2.mit.edu Source Context 20 Context Metadata (Partial) 21 Conflict Detection and Mediation Mediated Query in Datalog Date convert Scale factor convert Name convert 22 Mediated SQL Query & Result Mediated SQL Query Adjust scale factor Date format conversion Name conversion
Final results from Disclosure but in Datastream context 23 More Complex Example (4 sources: DB + Web) Databases Web source select WorldcAF.TOTAL_ASSETS, DiscAF.NET_SALES, DiscAF.NET_INCOME, DStreamAF.TOTAL_EXTRAORD_ITEMS_PRE_TAX, quotes.Last from WorldcAF, DiscAF, DStreamAF, quotes where WorldcAF.COMPANY_NAME = "DAIMLER-BENZ AG" and DStreamAF.AS_OF_DATE = "01/05/94" and WorldcAF.COMPANY_NAME = DStreamAF.NAME 24 and WorldcAF.COMPANY_NAME = DiscAF.COMPANY_NAME and WorldcAF.COMPANY_NAME = quotes.Cname; Conflict Table (1st part) 25 Conflict Table (2nd part) 26 Generated SQL (1st Part)
select worldcaf.total_assets, discaf.net_sales, ((discaf.net_income*0.001)*olsen.rate), (dstreamaf2.total_extraord_items_pre_tax*olsen2.rate), quotes.Last from (select date1, 'European Style -', '01/05/94', 'American Style /' from datexform where format1='European Style -' and date2='01/05/94' and format2='American Style /') datexform, (select dt_names, 'DAIMLER-BENZ AG' from name_map_dt_ws where ws_names='DAIMLER-BENZ AG') name_map_dt_ws, (select ds_names, 'DAIMLER-BENZ AG' from name_map_ds_ws where ws_names='DAIMLER-BENZ AG') name_map_ds_ws, (select 'DAIMLER-BENZ AG', ticker, exc from ticker_lookup2 where comp_name='DAIMLER-BENZ AG') ticker_lookup2, (select 'DAIMLER-BENZ AG', latest_annual_financial_date, current_outstanding_shares, net_income, sales, total_assets, country_of_incorp from worldcaf where company_name='DAIMLER-BENZ AG') worldcaf, (select country, currency from currencytypes where currency <> 'USD') currencytypes, (select exchanged, 'USD', rate, date from olsen where expressed='USD') olsen, (select company_name, latest_annual_data, current_shares_outstanding, net_income, net_sales, total_assets, location_of_incorp from discaf) discaf, 27 Generated SQL (Continued - Partial)
(select as_of_date, name, total_sales, total_extraord_items_pre_tax, earned_for_ordinary, currency from dstreamaf) dstreamaf, (select as_of_date, name, total_sales, total_extraord_items_pre_tax, earned_for_ordinary, currency from dstreamaf) dstreamaf2, (select char3_currency, char2_currency from currency_map where char3_currency <> 'USD') currency_map, (select country, currency from currencytypes where currency <> 'USD') currencytypes2, (select exchanged, 'USD', rate, '01/05/94' from olsen where expressed='USD' and date='01/05/94') olsen2, (select Cname, Last from quotes) quotes where currencytypes.country = discaf.location_of_incorp and currencytypes.currency = olsen.exchanged and dstreamaf.currency = dstreamaf2.currency and dstreamaf2.currency = currency_map.char2_currency and olsen.date = discaf.latest_annual_data and currency_map.char3_currency = currencytypes2.currency and currencytypes2.currency = olsen2.exchanged and name_map_dt_ws.dt_names = dstreamaf2.name and name_map_ds_ws.ds_names = discaf.company_name and ticker_lookup2.ticker = quotes.Cname and datexform.date1 = dstreamaf2.as_of_date and currencytypes.currency <> 'USD' and currency_map.char3_currency <> 'USD'
union select worldcaf2.total_assets, discaf2.net_sales, ((discaf2.net_income*0.001)*olsen3.rate), dstreamaf4.total_extraord_items_pre_tax, quotes2.Last from (select date1, 'European Style -', '01/05/94', 'American Style /' from datexform where format1='European Style -' and date2='01/05/94' and format2='American Style /') datexform2, (select dt_names, 'DAIMLER-BENZ AG' from name_map_dt_ws where ws_names='DAIMLER-BENZ AG') name_map_dt_ws2, (select ds_names, 'DAIMLER-BENZ AG' from name_map_ds_ws where ws_names='DAIMLER-BENZ AG') name_map_ds_ws2, (select 'DAIMLER-BENZ AG', ticker, exc from ticker_lookup2 where comp_name='DAIMLER-BENZ AG') ticker_lookup22, (select 'DAIMLER-BENZ AG', latest_annual_financial_date, current_outstanding_shares, net_income, sales, total_assets, country_of_incorp from worldcaf where company_name='DAIMLER-BENZ AG') worldcaf2, (select country, currency from currencytypes where currency <> 'USD') currencytypes3, (select exchanged, 'USD', rate, date from olsen where expressed='USD') olsen3, (select company_name, latest_annual_data,
current_shares_outstanding, net_income, net_sales, total_assets, location_of_incorp from discaf) discaf2, (select as_of_date, name, total_sales, total_extraord_items_pre_tax, earned_for_ordinary, currency from dstreamaf) dstreamaf3, (select 'USD', char2_currency from currency_map where char3_currency='USD') currency_map2, etc 28 Final Result 29 Execution Trace (1st Part - Partials) Parallel Execution ... Retrieving data From Web source 30 Execution Trace (Continued - Partials) ... Stock price returned
From Web source Another Web source used (for currency conversion) ... 31 The 1805 Overture In 1805, the Austrian and Russian Emperors agreed to join forces against Napoleon. The Russians promised that their forces would be in the field in Bavaria by Oct. 20. The Austrian staff planned its campaign based on that date in the Gregorian calendar. Russia, however, still used the ancient Julian calendar, which lagged 10 days behind. The calendar difference allowed Napoleon to surround Austrian General Mack's army at Ulm and force its surrender on Oct. 21, well before the Russian forces could reach him, ultimately setting the stage for Austerlitz. Source: David Chandler, The Campaigns of Napoleon, New York: MacMillan 1966, pg. 390. 32 Summar y to gather and integrate Tremendous opportunity information from many diverse sources But need to overcome many context challenges Context-type metadata plays a critical role COIN technology can be an important aid for
semantically meaningful information integration: - Scalable - Extensible - Application Domain Merging - Reuse and extension of ontologies and contexts 33 Appendix Some Useful Reference Material Documents Overview: http://computer.org/conferen/proceed/meta/1999/papers/84/smadnick.html "Metadata Jones and the Tower of Babel: The Challenge of Large-Scale Semantic Heterogeneity", (IEEE Meta-Data Conference) Theory of COIN: http://web.mit.edu/smadnick/www/wp/1997-03.pdf Context Interchange: New Features and Formalisms for the Intelligent Integration of Information (ACM TOIS) Contact us for more Web sites Main COIN web site: http://context2.mit.edu Miscellaneous demos: http://context2.mit.edu/coin/demos/ Self-explanatory demo: http://interchange.mit.edu:8080/gcms_v4/airCarMergeTop.html (Airfare and Car rental applications, includes ontology merging. Caution: still under development) 34 Appendix: Sample Applications
Airfare, Car Rental and Merged Travel Weather Global Price Comparison Airfare Aggregation Disaster Relief TASC Financial Example Web Services Demo Corporate Householding 35 Appendix: User or Program (via SQL Query) COIN Web-Wrapper Technology Select Edgar.Net_income From Edgar Where Edgar.Ticker=intc and Edgar.Form=10-Q Web page spec file * SQL Side
Web Wrapper Generat or HTML Side Ticker Net Income INTC 1,983 Data record returned * Spec file contains: Schema, Navigation rules, 36 and Extraction rules.
Huddle Up! 2:05- 2:45 Daily re-teach/enrichment time Switch Classes Small groups Support Staff Kathleen MacLaren - Inclusion Jeanie Wylie - Math Interventionist Monica Spray - Reading Interventionist Gradespeed This year parents will be able to view student grades through Gradespeed....
This first module is an introduction to information about mild cognitive impairment (MCI) and the four most common types of dementia—notably: (1) Alzheimer's disease (AD), (2) vascular dementia (VaD), (3) Lewy body dementia (LBD)—which encompasses dementia with Lewy bodies (DLB)...
Northern Nevada Winter Storm Event 2017. The National Weather Service advised local, state and federal partners of a series of atmospheric river storms coming to the northern Nevada area January 4 through January 10. In preparation of the anticipated storms...
L'acqua risulta bluperché quando la luce del sole, che contiene tutti i colori, vi penetra, alcuni colori vengono assorbiti dalle sue molecole, in particolare esse assimilano maggiormente i colori arancione e rosso, quindi quando la luce arriva ai nostri occhi...
Introduction: "It can be easy to feel overwhelmed or to sometimes wonder to yourself 'do I really belong here?'" Independent reflective writing activity. Students listen to quotes from graduating seniors designed to highlight overcoming challenges and initial loneliness. Discuss essay...
Ready to download the document? Go ahead and hit continue!