Our Secret Data Mart: This and other misadventures in my ten ...

Our Secret Data Mart: This and other misadventures in my ten ...

Our Secret Data Mart: This and other misadventures in my ten years as a business analyst Presentation to the NEODWSIG 3/23/2006 by Michael Mina Agenda My two bios

The goal of this presentation Why "misadventures"? Lessons learned Some tools I use The story of our secret data mart Parting advice http://michaelmina.info Copyright 2006 Michael Mina 2 My real world bio Worked as a business analyst since 1995 Medical Mutual Antares Management Solutions

Cleveland State University National City Medical Mutual Completed ad hoc data requests for many different departments. Developed and programmed the disease management outcomes reporting process. Recognized by CEO for efforts resulting in NCQA accreditation. Coordinated and prioritized development of reporting, system modifications and resolution of reporting problems for the Underwriting division. Recognized by VP for helping save $400K annually. http://michaelmina.info Copyright 2006 Michael Mina

3 My real world bio (cont'd) Antares Management Solutions Working in IT, I developed reporting from multiple data warehouses for business units and partner corporations Developed web-enabled GUI front-end metadata repository/data dictionary. Developed and taught courses in business intelligence tools and the metadata repository for business units and partner corporations. Contributed to data warehouse redesign project. Conceived and developed Quality Assurance methodology for maintenance of dimension tables. Assisted in the development of data marts and reporting for several departments.

http://michaelmina.info Copyright 2006 Michael Mina 4 My real world bio (cont'd) Cleveland State University Managed staff and developed analyses, reporting processes and databases for various departments Managed the Institutional Research component of the twoyear PeopleSoft 8 upgrade project. Recognized by both President and CIO for outstanding contributions to the project. Chaired Data Stewardship Committee (25 members) to manage University data and ensure that the University possesses the accurate data it requires.

National City Currently working on various data warehouse-related projects, including data warehouse metrics. http://michaelmina.info Copyright 2006 Michael Mina 5 My real world bio (cont'd) Founder of GCPCUG Data Warehousing SIG (met monthly 1/2000 - 4/2002) Developed and currently teach "Business Intelligence & Data Warehousing" continuing education course at CSU

http://michaelmina.info Copyright 2006 Michael Mina 6 My Bizarro world bio I'm a citizen of Afghanistan, with Hispanic and Japanese ancestry. I'm eligible for the Medicare Rx benefit. My wife is also Hispanic (but not from Afghanistan.) She is interested in dating others, and has a newborn that I do not know about. That's what some companies think--My family demographic data needs serious revision. (For more about Bizarro world, see "This am Bizarro page" at http://theages.superman.ws/

Encyclopaedia/bizarro.php) http://michaelmina.info Copyright 2006 Michael Mina 7 My Bizarro world bio (cont'd) My point? Enterprises vary greatly in the degree to which they pursue data quality. How important is data quality? And to whom does it matter? Some departments actually consider data quality a threat, regardless of what they say (more on that later.)

http://michaelmina.info Copyright 2006 Michael Mina 8 The goal of this presentation Share some of my experiences and the experiences of others (with their permission) for your edification Focus your attention on matters I believe are not addressed frequently enough Help you survive and thrive in this line of business http://michaelmina.info Copyright 2006 Michael Mina

9 Why "misadventures"? "Accentuate the positive" or learn from mistakes? In his classic "The Data Warehouse Lifecycle Toolkit", Kimball clearly expects almost everything to go well. He's a highly paid consultant, so it's no surprise he gets the cooperation he needs. I had NO experiences that ran as smoothly as his. http://michaelmina.info Copyright 2006 Michael Mina 10 Why "misadventures"? (cont'd)

Since becoming a business analyst in 1995, I've reported to 8 different managers at 4 different companies. I've seen more missteps than successfully completed data warehouse-related projects. Projects running past deadline Chameleon-like project requirements Broken promises, false accusations Other unorthodox, unprofessional behaviors http://michaelmina.info Copyright 2006 Michael Mina 11 Why "misadventures"? (cont'd)

As a part-time career consultant for eight years, people have sought my advice for dealing with a variety of unpleasant situations. Experience is the best teacher and bad experience can be the best of the best--especially if it's someone else's. http://michaelmina.info Copyright 2006 Michael Mina 12 Lessons I've learned as a Business Analyst I want to share lessons that are data analysis-related and career-related The latter are important: Do not ignore your own best

interests. I've seen People hired on Monday who had their department downsized on Friday People told their department survived a downsizing, just before someone was terminated (not for cause) So with that in mind, here are some http://michaelmina.info Copyright 2006 Michael Mina 13 Lessons I've learned as a Business Analyst

People define "Data Warehouse" differently Some people refer to the location of physical servers as the DW Some people call questionably modeled data sets that are used for reporting purposes a DW Some even call a TNF copy of a TNF production database a DW Analyst impact: Make sure you understand which definition is being used This is especially important during your interviews with people unaware of these multiple definitions. Don't bother "correcting" people, its counterproductive http://michaelmina.info Copyright 2006 Michael Mina

15 True dimensional models are a rarity With one possible exception, every DW data model I've encountered is TNF, or somewhere between TNF and dimensional, but never purely dimensional Examples: TNF portions of DWs Every date dimension table I've seen uses date as the key rather than a surrogate key Many decisionmakers cannot be sold on the value of dimensional modeling One Senior VP claimed he did not want "his" data warehoused. He wanted reporting from the operational system - it was "closer to the source" Fact: The data governance group was controlled by his rival.

Was it ignorance, politics, or both? http://michaelmina.info Copyright 2006 Michael Mina 16 True dimensional models are a rarity (cont'd) Analyst impact: Modeling is often easier without a dimension model requirement The opportunities to fully develop dimensional modeling skills may be limited http://michaelmina.info Copyright 2006 Michael Mina

17 Companies love MSRE ("misery") Multi-Source Reporting Environments more common than DWs One version of the truth often unworkable Too expensive and time consuming Deployment of revenue generating systems (and the need for information therefrom) will not wait for the data to be warehoused More common than DW "one version of the truth": Source systems as system of record Several witnesses (data sources) testify, and their "testimony" is adjudicated. Hence, MSRE. http://michaelmina.info

Copyright 2006 Michael Mina 18 Companies love MSRE (cont'd) Analyst impact: More data sources to understand Newer analysts need more mentoring, even when there is effective metadata management Opportunity to become very valuable to employer by understanding these sources http://michaelmina.info Copyright 2006 Michael Mina

19 Data quality can be a threat Do NOT assume the desire for data quality is universal "Jim" found out that many dimension tables were missing values that were needed to allow analysts to join them to the fact tables. He took it upon himself to develop a process to systematically detect values in the fact tables that were missing from the dimension tables. It generated 60+ pages (single spaced) of missing codes and descriptions. Jim proudly showed his manager, who looked as if he had seen a ghost. He thanked Jim, but the system was not implemented. http://michaelmina.info

Copyright 2006 Michael Mina 20 Data quality can be a threat (cont'd) The problem: Jim later learned that his department created the problem, and that not too many users were aware of it. There was no glory to be had in cleaning up one's own mess, especially when few people knew about the mistake in the first place. OOPS! Analyst impact: Be more politically astute Weigh the risks of acting independently. In this case, Jim had no regrets. Any guesses why?

http://michaelmina.info Copyright 2006 Michael Mina 21 Cost-effectiveness can be career-ineffective "Dave" worked for a state government plagued by significant budget constraints. The directive from on high was to cut costs. Dave's boss wanted to have her annual department meeting at an expensive location off site. Dave reminded her of the cost cutting directive. It was not appreciated. When his department was tasked with developing an interactive PDF document containing standard

reports from a data mart, Dave's staff put together a very low cost, scalable prototype using Access, Excel and pdf995 (freeware). http://michaelmina.info Copyright 2006 Michael Mina 22 Cost-effectiveness can be career-ineffective (cont'd) His boss wound up having the system developed using SQL server, running on a $30K server, and paying $10K to graphic designers for the interactive PDF. It was completed 18 months after the prototype. The prototype was about 80% as functional as the final version.

The problem: Dave's boss believed that one of her staff would gain more job satisfaction using SQL Server. Also, she needed help from a peer who had purchased a $30K server that was underutilized. And, her peer needed to justify his purchase. And so, a friendship was born http://michaelmina.info Copyright 2006 Michael Mina 23 Cost-effectiveness can be career-ineffective (cont'd) Analyst impact: Understand differences between the public sector and

private sectors. Some people just like to spend taxpayer money, and you can't stop them. The 80/20 rule works both ways: if you are strapped for cash, sometimes it is better to settle for 80% of the functionality at 20% of the cost. What's cost-effective for your employer may not be careereffective for you. You must decide which is a higher priority for you. In most cases, those who try to work cost-effectively are appreciated. http://michaelmina.info Copyright 2006 Michael Mina 24 Requirements management = danger management Requirements management is the sine qua non of

business analysis. It is challenging at best, dangerous at worst. Challenging: Determining who has asthma when that information is not explicitly captured in the claims data. Determining who is an employee Cleveland State University could not provide one single answer because they were required to report to multiple agencies that had multiple specifications. There was a Dept of Education definition of Employee There was an Ohio Civil Rights Commission definition as well. http://michaelmina.info Copyright 2006 Michael Mina

25 Requirements management = danger management (cont'd) Challenging (cont'd) Suppose I wanted a list of all Cleveland customers. Do I want: Customers that live in Cleveland? Customers that live in the Cleveland sales region? Customers that live in the Cleveland administrative region? Customers that are assigned to the Cleveland sales region, regardless of where they actually live? You get the idea Large scale requirements management (e.g., using IBM Rational RequisitePro.) http://michaelmina.info

Copyright 2006 Michael Mina 26 Requirements management = danger management (cont'd) Dangerous #1: Variable requirements - fixed deadline Consider Rob's case: Rob's manager kept reversing himself on his data requests "Paint this wagon red." "You know, this wagon would look better blue." "Why is this wagon blue? Everyone knows wagons should be red." On one occasion, Rob's boss significantly changed project requirements the day before the deadline. At 5:00 pm on the day of the deadline (he usually left at 5:00), he wrote Rob an e-mail criticizing him for not having the project complete.

Rob went to talk to his boss, but he had gone for the day. People on Rob's staff even provided proof that Rob's boss reversed himself on requirements. http://michaelmina.info Copyright 2006 Michael Mina 27 Requirements management = danger management (cont'd) The problem: Rob reported to someone who had a reality-allergy. Rob was eventually let go, officially because of downsizing. The solution: Rob documented his case to his boss in writing, along with

supporting documentation. His boss toned down the criticism, but the situation did not revert to normalcy. Rob also saw an attorney, only to learn that as a result of special circumstances, and through no fault of his own, he didn't have a case. Rob was eventually able to convince his boss to write him an outstanding letter of recommendation. http://michaelmina.info Copyright 2006 Michael Mina 28 Requirements management = danger management (cont'd) Analyst impact: You must be able to clearly articulate your concerns about

requirements, especially to those who don't understand the complexities of your work. You must document your concerns in writing, preferably in nonrepudiable form (e.g., e-mail), at minimum in a personal log. You must keep excellent records of project requirements, changes in requirements, preferably in nonrepudiable form. http://michaelmina.info Copyright 2006 Michael Mina 29 Requirements management = danger management (cont'd) Analyst impact (cont'd): Even better: publish all changes to the requirements, and all

issues related to the requirements, and ensure that stakeholders receive this information. This prevents any pretense to ignorance. These must be records in your personal possession. General access records are fine as a supplement to personal records. You cannot be too attached to your job, or your employer. You too may suddenly find yourself reporting to realityaverse management. Pay as much attention to managing your career as to doing your job. Rob did, and found a better job. http://michaelmina.info Copyright 2006 Michael Mina 30 Requirements management = danger management (cont'd)

Dangerous #2: Determining that requirements have been met Simple? NO! Who has final say as to whether or not a requirement has been met? Does he/she play political games? Often need to negotiate to agree that stated requirements have been met The problem: Bonuses for some managers can depend upon completion of certain projects by certain dates. This may result in political pressure to declare that requirements were met. Also, missing a deadline almost always impacts one negatively. http://michaelmina.info

Copyright 2006 Michael Mina 31 Requirements management = danger management (cont'd) Analyst impact: Be aware that compromises that often occur. These include Restating the requirements until the work that has been done meets the revised requirements (often accompanied by pretending that these were what the requirements should have been all along.) Loosely interpreting the requirements (like penumbras, emanations and such) Note: I am indicating neither approval nor disapproval for these. Do not compromise on compliance reporting unless you are willing to go to prison to benefit other people's careers. And

DONT think youre immune to such pressure, a friend of mine was not. He did the right thing by refusing to cooperate with senior management. (Note: he was eventually promoted into another department.) http://michaelmina.info Copyright 2006 Michael Mina 32 "One-time-only" means "several-times-at-least" I have yet to develop a one-time-only process one time only! Even if the results are due once, the requirements are often developed interatively, and that can mean rework. I have never regretted building a repeatable process.

One example: "Chris" was assigned a mentor that had started a particular analytical project manually rather than programmatically. He was tasked by his manager to assume responsibility for that analysis and automate it. However, his mentor felt that continuing the analysis manually was the best course of action, even though Chris and his mentor had the same manager. http://michaelmina.info Copyright 2006 Michael Mina 33 "One-time-only" means "several-times-at-least" (cont'd) One example (cont'd) When Chris met with his mentor to discuss his progress on

the project, his mentor could not accept that he was to automate the process. She insisted that, had she continued working on the project, it would have been done by then. Even though Chris followed his manager's instructions, his mentor's negative attitude toward his progress threatened to damage his career. The reality: The requirements for the analysis changed significantly one day. Had the process been done manually, the rework costs would have been high. Two days later, the requirements changed significantly, including twice in the same morning. http://michaelmina.info Copyright 2006 Michael Mina

34 "One-time-only" means "several-times-at-least" (cont'd) The problem: Did Chris' mentor want the opportunity to complete the "onetime-only" analysis? Did he/she want to micromanage the project? Was he/she jealous that Chris was able to automate this process when the mentor could not? Who knows? http://michaelmina.info Copyright 2006 Michael Mina 35

"One-time-only" means "several-times-at-least" (cont'd) Analyst impact: Everyone likes to be right. Understand that some people like to be right, so they learn when they're wrong and adjust (I hope that's you.) Other people like to be right, and think they are always right. You need to document their incorrectness. Carefully document: The requirements given to you Your decisions regarding implementation Any shortcomings of the alternatives to repeatibility. http://michaelmina.info Copyright 2006 Michael Mina 36

"One-time-only" means "several-times-at-least" (cont'd) Analyst impact (cont'd) Business rules are often developed iteratively. Common sense suggests that a process to implement those rules also be developed and executed iteratively. Build repeatability into your processes. If you expect some opposition to this, then do not advertise your plans. Repeatable processes often make for better discussion in job interviews than "one-time-only" projects. http://michaelmina.info Copyright 2006 Michael Mina 37

Understanding ETL process(es) is important To become a more effective business analyst, it is not enough to be familiar with the data in the warehouse. You must also understand: How the data is loaded The systems from which data is extracted Into which warehouse tables that data is loaded Which data is not loaded and why This information is often not captured at a sufficient level of detail in metadata repositories ETL breaks or fixes can cause strange differences in reports from one month to the next. These may not be detected until after the warehouse is loaded. http://michaelmina.info

Copyright 2006 Michael Mina 38 Understanding ETL process(es) is important (cont'd) Analyst impact: More information to understand Newer analysts need more mentoring, even when there is effective metadata management Opportunity to become very valuable to employer by understanding these processes http://michaelmina.info Copyright 2006 Michael Mina 39

Beware of IT-focused decisionmaking We must not be IT-bashers, but: People who understand technology better than business should not make business decisions, and People who understand business better than technology should not make technology decisions. There is a subculture within IT, less dominant than before, that I call "business-agnostic" - they don't really understand their employer's business, and some of them don't want to. They tend to be technophilic. Some are actually businessphobic. When they have significant decisionmaking authority, they promote suboptimal outcomes.

http://michaelmina.info Copyright 2006 Michael Mina 40 Beware of IT-focused decisionmaking (cont'd) Examples of IT-focus to the detriment of businessfocus: Example 1 A trainer who makes it known that she enjoyed working as a trainer, "except for the people" (i.e., the students.) Example 2 IT tells its SQL trainer (me) not to teach business analysts about the LEFT JOIN and the RIGHT JOIN because their queries would adversely affect system performance.

http://michaelmina.info Copyright 2006 Michael Mina 41 Beware of IT-focused decisionmaking (cont'd) The problem: The dimension tables are incomplete, and IT is at fault. A friend of mine ran a report that was short $12 million as a result. IT must either accept responsibility for the incomplete dimension tables, or accept that business analysts must work around it somehow. http://michaelmina.info

Copyright 2006 Michael Mina 42 Beware of IT-focused decisionmaking (cont'd) Analyst impact: It is NOT acceptable to hide from business analysts the fact that using the INNER JOIN could cause their reports to be millions of dollars out of balance. Weigh the cost of additional CPU time vs. the cost of allowing senior management to receive incorrect information - this is a no-brainer. I explained that I would not comply for the above reasons, and because compliance would damage my reputation. Your reputation is a personal asset. You should not allow an employer to squander it any more than you would allow them to raid your savings account.

http://michaelmina.info Copyright 2006 Michael Mina 43 Beware of IT-focused decisionmaking (cont'd) Example 3 IT and business analysts are displeased with the level of effort required to perform complex analyses. Consequently, IT seeks a replacement for the company standard BI tool. The problem: The warehouse's data model was antiquated and no longer supported the complexity of the evolving analytical environment.

Replacement of the BI tool was contraindicated. The solution must involve data model remediation. The quick fix - changing BI tools - would have been a waste of time and money. http://michaelmina.info Copyright 2006 Michael Mina 44 Beware of IT-focused decisionmaking (cont'd) Example 4 IT tells business analysts to stop using an uppercase conversion function in the WHERE clause in SQL because it makes queries perform poorly.

The problem: Many fields on which analysts must query are mixed case. IT did not bother contacting analysts before issuing this "ruling". Analyst impact: Learn to identify business-agnosticism and make plans to deal with it in a manner that allows IT to save face. Gradually bring IT decisionmakers to the understanding that business-agnostic decisions undermine their reputation and authority. http://michaelmina.info Copyright 2006 Michael Mina 45

Network intelligently The problem: The career advice often given regarding networking is not sophisticated enough. "Network, network, network!" is not a strategy. Volume cannot always compensate for lack of a definite plan. Analyst impact: Like any other undertaking, you should determine how much effort you are willing to give, what you want to take, and how to effectively accomplish both. Why do you (audience) think I (presenter) am here? http://michaelmina.info Copyright 2006 Michael Mina

46 Network intelligently (cont'd) Analyst impact (cont'd): Primary goals in networking should include Building skills you otherwise might not get to develop. Meeting people likely to help you with your career objectives. The NEODWSIG meets both of these criteria Seek opportunities to work with people in other departments and/or companies to achieve primary goals. Volunteer at places that will help you accomplish those same goals. If your job does not afford you intelligent networking opportunities, consider finding another.

http://michaelmina.info Copyright 2006 Michael Mina 47 Don't become "VP of the Weather" Avoid like the plague getting in situations where your responsibility greatly exceeds your authority to execute. I call this being "Vice President of the Weather" - you are held responsible for actions you cannot control. Often stems from decisionmakers not being held to account for their own unreasonable behavior. http://michaelmina.info

Copyright 2006 Michael Mina 48 Don't become "VP of the Weather (contd) For example: "Tim" was accused by a "Sam," a Vice President of IT, of making a bad decision in product selection. In front of a large group, Sam demanded to know why Tim made the bad selection. Tim pulled out an e-mail from Sam proving that Sam made the product selection, and Tim answered "because you told me to." Note to e-mail users: Don't let this happen to you! http://michaelmina.info

Copyright 2006 Michael Mina 49 Don't become "VP of the Weather" (cont'd) The problem: Sam was my boss' boss' boss' boss, and he made me Vice President of the Weather on two occasions. Occasion 1 My manager and I were made responsible for completely understanding the data of a client. The problem: The client did not allow us to talk freely to their subject matter experts. We were not permitted to study the client's data freely, nor

were we allocated the time to do so. The solution: You think there is one? Fortunately, matters never came to a head. http://michaelmina.info Copyright 2006 Michael Mina 50 Don't become "VP of the Weather" (cont'd) Occasion 2 Sam tasks me with developing a large set of reports from a managed care system for a client. The problem:

I was forbidden to talk with ANYONE working for the client. I could not ask questions about the deliverables, or the source data. The solution: I used the Force. Seriously, I had experience developing reports like this from a similarly structured database. With the judicious use of assumptions, I was able to deliver. http://michaelmina.info Copyright 2006 Michael Mina 51 Don't become "VP of the Weather" (cont'd)

Analyst impact Always gather information to document the unreasonableness of the request or assignment in the event that things go wrong. The more times you pull a rabbit out of a hat, the more likely you will be looked upon as a wizard. This is usually bad because you will be relied upon to cover the incompetence of others. Determine when failure IS an option. Properly managed, limited failure can deter unreasonable requests in the future. Realistically, you should be responsible for Your own actions The actions of those who report to you The actions of those you've misled (e.g., by giving bad advice, false information, etc.) http://michaelmina.info

Copyright 2006 Michael Mina 52 Some tools I use as a Business Analyst Tools Source-to-target data maps Data flow charts Ad hoc validation Informal metadata

Creative cross joins Date dimension table Truth tables Using Excel for query construction Complex case catalog http://michaelmina.info Copyright 2006 Michael Mina 54 Source-to-target data maps A table (e.g., in Excel) showing columns needed for your analysis, and the tables from which you plan to source them.

Best used when your analysis sources data from many tables. http://michaelmina.info Copyright 2006 Michael Mina 55 Source-to-target data maps (cont'd) This an example: http://michaelmina.info Copyright 2006 Michael Mina 56

Source-to-target data maps (cont'd) Advantages: Complex data analyses could take much longer without this tool. Helps analysts quickly identify columns and tables already being used by the process, as well as best sources for additional columns. Excel allows filtering on the columns, allowing for more focused review of the data. Can serve as an addendum to a requirements document Disadvantages: Requires some additional effort to develop and maintain http://michaelmina.info

Copyright 2006 Michael Mina 57 Data flow charts Not the same as data flow diagrams Similar to both flow charts and UML activity diagrams, but not exactly like either http://michaelmina.info Copyright 2006 Michael Mina 58 Data flow charts (cont'd) Data sources are listed by

name. qnnn = query tnnn = intermediate table Queries are executed in numerical order. Data sources and intermediate tables point to queries that source data from them. Queries point to intermediate tables they create, update, etc. http://michaelmina.info Copyright 2006 Michael Mina 59

Data flow charts (cont'd) Advantages: Facilitates making inevitable changes to your process Gives a graphical overview of your process Helps illustrate where changes in your process will have their impacts Disadvantages: Symbology not well developed (e.g., is it a create table, update table, or append table?) Some additional time is required to develop and update the data flow chart. http://michaelmina.info Copyright 2006 Michael Mina

60 Ad hoc validation Run an audit process parallel to your primary process. Goal: systematically perform reasonableness checks at each point in your process. At a minimum, create a table listing The name of each table in your process, and Its row count. http://michaelmina.info Copyright 2006 Michael Mina 61

Ad hoc validation (contd) Advantages: Helps identify where improper logic was used. Helps identify points in your process about which you are likely to be questioned by the stakeholders. For example: Your row count drops after an inner join. Is that a desired or undesired result? Disadvantages: Requires some additional effort Requires additional CPU time http://michaelmina.info Copyright 2006 Michael Mina 62

Informal metadata Formal metadata Often found in metadata repositories Physical characteristics of tables, columns, etc. Definitions of tables, columns, etc. Informal metadata Often found in the memories of subject matter experts May not be easily stored and retrieved using traditional metamodels http://michaelmina.info Copyright 2006 Michael Mina 63

Informal metadata (cont'd) Certain types of metadata are better attached to topics than to tables or columns. Consider this situation: Suppose we want to determine if a customer account is open, and there is a closed_date field in table A, the table that is the main source of account information. One would think that if closed_date were null, then the account would be open. But suppose that is the case only for accounts processed on system X. Accounts processed on system Y, however, are always missing their closed date on table A for various reasons. To determine if accounts processed on system Y are open, you must verify that, in another table B, the field open_status = 'Y' for the account in question.

http://michaelmina.info Copyright 2006 Michael Mina 64 Informal metadata (cont'd) Question: To which columns or tables should this very valuable metadata (i.e., how to identify an open account) be attached? To closed_date? To open_status? To table A? To table B? Answer: To something else entirely This is an example of what I call topical metadata, the topic

being "open customer account." Note: the term "topical metadata" is used frequently in the context of the semantic web, and rarely, if ever, in the context of relational databases. Nevertheless, I believe it is appropriate here. http://michaelmina.info Copyright 2006 Michael Mina 65 And Now, It's Time For The Informal Metadata Game Show! The Informal Metadata Game Show Which weighs more? A pound of feathers or a pound

of gold? http://michaelmina.info Copyright 2006 Michael Mina 68 Answer: A pound of feathers weighs more Metadata is key to this riddle because "a pound" is not "a pound" Of course, a quantity of feathers that weighs the same as a quantity of gold will take up more volume than the gold, but this is not a matter of density. http://michaelmina.info

Copyright 2006 Michael Mina 69 Answer: A pound of feathers weighs more (cont'd) 1 "pound" of feathers = 1 avoirdupois pound This is equal to 7,000 grains See http://www.britannica.com/eb/article-9011455 1 "pound" of gold = 1 Troy pound One Troy pound is 5,760 of the same grains used in the definition of avoirdupois pound See http://www.24carat.co.uk/weightsframe.html 1 pound of feathers weighs more than 1 pound of gold because "pound" has different meanings in each context.

http://michaelmina.info Copyright 2006 Michael Mina 70 Informal metadata (cont'd) A column called "weight" may thus have a different meaning for a particular row if another column called item_desc has the value "gold in that row. http://michaelmina.info Copyright 2006 Michael Mina 71

Creative cross joins What is a cross join (aka Cartesian product)? Suppose Table1 has r1 rows and c1 columns Suppose Table2 has r2 rows and c2 columns Table1 cross join Table2 is the combination of every row of Table1 with every row of Table2. Table1 cross join Table2 has (r1 x r2) rows and (c1 + c2) columns The problem is not so much the c1 + c2 as the r1 x r2. I tell my students that cross joins are like snake venom: often deadly, but they can be used to make very powerful medicine in the right hands. http://michaelmina.info

Copyright 2006 Michael Mina 72 Creative cross joins (cont'd) Cross join applications Date dimension table Truth tables Any application where you need to exhaust all possible combinations of N quantities http://michaelmina.info Copyright 2006 Michael Mina 73

Date dimension table Cross joins can be used to help create a date dimension table for your data warehouse, or any other database. This is an example done using Microsoft Access Create (manually or otherwise) one table listing the digits 0 through 9. http://michaelmina.info Copyright 2006 Michael Mina 74 Date dimension table (cont'd)

Write a cross join query to create a Cartesian product of the table with itself four times. Our SELECT statement will use the Cartesian product to create the integers 0 through 10,000. The DateAdd function will add these integers to a starting date (in this case, 1/1/1990). Different databases may use a different function to add integers to dates. Save these results in a table. http://michaelmina.info Copyright 2006 Michael Mina 75 Date dimension table (cont'd)

SELECT 1000*A.Digit+100*B.Digit+10*C.Digit+D.Digit AS DateKey, DateAdd("d",1000*A.Digit+100*B.Digit+10*C.Digit+D.Digit ,#01/01/1990#) AS DateValue INTO DateValues FROM Digits AS A, Digits AS B, Digits AS C, Digits AS D ORDER BY 1; http://michaelmina.info Copyright 2006 Michael Mina 76 Date dimension table (cont'd) Then, create a date dimension table using the previous table.

Use the row functions provided by your database to add the information you need. Remember to create a primary key on the Time dimension table. http://michaelmina.info Copyright 2006 Michael Mina 77 Date dimension table (cont'd) SELECT DateKey, DateValue, Day(DateValue) AS Day, Month(DateValue) AS Month, Int((Month(DateValue)+2)/3) AS Quarter, Year(DateValue) AS Year INTO TimeDimension

FROM DateValues http://michaelmina.info Copyright 2006 Michael Mina 78 Truth tables Truth tables can be used to help determine and validate business rules. They are created by Identifying a set of inputs exhausting all combinations of True and False for those inputs evaluating a business rule for each combination

This is an example done using Microsoft Access. http://michaelmina.info Copyright 2006 Michael Mina 79 Truth tables (cont'd) Create a two-column, two-row table like so: Cross join the tables (SQL on next slide) http://michaelmina.info Copyright 2006 Michael Mina

80 Truth tables (cont'd) SELECT tblTrueFalse.status_boolean, tblTrueFalse_1.status_boolean, tblTrueFalse_2.status_boolean, tblTrueFalse_3.status_boolean FROM tblTrueFalse, tblTrueFalse AS tblTrueFalse_1, tblTrueFalse AS tblTrueFalse_2, tblTrueFalse AS tblTrueFalse_3 ORDER BY tblTrueFalse.status_boolean, tblTrueFalse_1.status_boolean, tblTrueFalse_2.status_boolean, tblTrueFalse_3.status_boolean; http://michaelmina.info

Copyright 2006 Michael Mina 81 Truth tables (cont'd) This is the output http://michaelmina.info Copyright 2006 Michael Mina 82 Truth tables (cont'd) Select the Analyze It with Microsoft Excel option http://michaelmina.info

Copyright 2006 Michael Mina 83 Truth tables (cont'd) Add your business rules and evaluate them to make sure they correspond to reality. http://michaelmina.info Copyright 2006 Michael Mina 84 Truth tables (cont'd) Advantages:

Useful when you must persuade others of the validity of decision rules Useful in identifying holes in decision logic Disadvantages: Requires effort to develop http://michaelmina.info Copyright 2006 Michael Mina 85 Using Excel for query construction Text-related functions in Excel can be used to write SQL. Advantages:

Useful when you have complex and/or volatile decision rules Useful when you have a set of queries you would like to have coded consistently Disadvantages: Time consuming http://michaelmina.info Copyright 2006 Michael Mina 86 Using Excel for query construction (cont'd) Input some basic information in this section

http://michaelmina.info Copyright 2006 Michael Mina 87 Using Excel for query construction (cont'd) This is the main workspace. Query parameters are outlined. http://michaelmina.info Copyright 2006 Michael Mina 88 Using Excel for query construction (cont'd)

This is the first query, ready to be copied into Access. http://michaelmina.info Copyright 2006 Michael Mina 89 Using Excel for query construction (cont'd) This is the second query, ready to be copied into Access. http://michaelmina.info Copyright 2006 Michael Mina 90

Complex case catalog Create a catalog of complex cases you've worked on over the course of your career Business problem statements, specifications, personal notes, queries, code samples, etc. Advantages: Provides a portable knowledgebase to which you can refer regardless of your employer. Facilitates resume and interview preparation. Can provide ideas for writing articles. Disadvantages: Requires effort to maintain Legal issues may be associated with keeping copies http://michaelmina.info

Copyright 2006 Michael Mina 91 The story of our secret data mart The story of our secret data mart A coworker friend and I built a data mart without official authorization. Background: I was developing the first disease management outcomes reporting process for my employer. The problem: I was running up against deadlines and system resource issues.

My manager offered next to no support. In fact, she was getting angry with me for system problems I could not control (that dreaded Vice President of the Weather thing.) http://michaelmina.info Copyright 2006 Michael Mina 93 The story of our secret data mart (cont'd) The solution I spoke with a good friend in IT about the issue, and we decided to build a data mart. We knew that formal project authorization would not be forthcoming, so we worked on it behind the scenes.

Another problem: We had to account for our time. The solution I was "working with IT to help improve system performance" for my project My friend's time was spent on generic "performance improvement" http://michaelmina.info Copyright 2006 Michael Mina 94 The story of our secret data mart (cont'd) Another problem:

How do we suddenly reveal the existence of a data mart? The solution Evolutionary terminology From "IT thinks they need to create a summarized data set for us." To "IT is summarizing data for us to try to improve performance." To "I'm working with IT to decide which columns from which tables we need to summarize." To "It's really like a data mart" To "we used our data mart" Our managers either didn't notice, or pretended not to notice. http://michaelmina.info Copyright 2006 Michael Mina

95 The story of our secret data mart (cont'd) We got the work done by deadline. After that, people seemed to pretend that building a data mart was the plan all along. It's amazing how they love a winner. We planned additional data marts for additional diseases, but it was not to be. How did it end? Shortly thereafter, my friend was promoted into another department. Sometime after that, I was promoted into my friend's former department. Within one year, my former manager was demoted for unclear reasons.

http://michaelmina.info Copyright 2006 Michael Mina 96 The story of our secret data mart (cont'd) Analyst impact: Consider very carefully how willing you are to bend the rules. If your manager wants you to do a job with one hand tied behind your back, you need to ask why. We risked damaging our careers to help managers who lacked foresight accomplish their goals. I would not advise this, nor do it again. http://michaelmina.info

Copyright 2006 Michael Mina 97 Parting advice Deepen and expand your skill set. It increases Your marketability Your authority within your organization Seek opportunities to achieve, and to understand the impact of those achievements Develop a career portfolio Network effectively "As you do good work for your employer, whoever it may be, always, always ask 'what's in it for me?' " You owe it to yourself, your spouse/ family/ causes you support Think twice before building data marts, or anything

else, in secret. I lucked out once. I wouldnt do it again. http://michaelmina.info Copyright 2006 Michael Mina 98 Thank you!

Recently Viewed Presentations

  • New LARP Magnet System WBS

    New LARP Magnet System WBS

    Traces w/o copper plating . Shortest coil in MQXFAP1 and AP1b. Discrepancy Reports in next talk. No feedback from MQXFS01b & S3 analysis. 6/6/2019. ... We cannot yet finally distinguish between NT or T side of the coil. Observations. 12/7/2016....
  • Placement of Aeroneb with HFOV

    Placement of Aeroneb with HFOV

    Placement of Aeroneb with HFOV Jim Fink, PhD, RRT, FAARC [email protected] High Frequency Oscillation Ventilation : In Vitro Setup Infant Ventilation Demers et al. ATS San Diego, 2005 Aerosol Deposition 10.25±6.0% Effect of Nebulizer Position on Aerosol Delivery with HFOV...
  • Spatial Computation

    Spatial Computation

    Ideal Architecture Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 Thesis Statement Application-Specific Hardware (ASH): can be synthesized by adapting software compilation for predicated architectures, provides high-performance for...
  • Objective - To factor trinomials in the form,

    Objective - To factor trinomials in the form,

    Objective - To factor trinomials in the form,, where a is an integer > 1. Multiply. Factor. ( )( ) x x 2 2 1 3 + + sum Factoring Polynomials Objective - To factor trinomials in the form, Factoring...
  • MMU CPD Cluster Twilight Developing high quality inclusive

    MMU CPD Cluster Twilight Developing high quality inclusive

    MMU CPD Cluster Twilight Developing high quality inclusive provision through ITT placements at Priestnall School By Gareth D Morewood Director of Curriculum Support, Priestnall School, & Rachel J Wyllie and Emily J Watson MMU PGCE Students 17th January 2013 *...
  • Transformational Invariants Recognizing Patterns Perspective ...

    Transformational Invariants Recognizing Patterns Perspective ...

    Palindromes/ Symmetry Segmental Elements BOB STATS Metrical Schemes 000- - - 000 Size/Gender Frequency Comprehension Time Domain and Frequency Domain Representations of the syllable /ma/ Blue=Original Red=Compressed(75%) Spectral Compression TFT ALGORITHM (Hurtig & Turner, Patented 2003) Compressed Environmental Sounds Compressed...
  • Redefining Writing in the Twenty-first Century:

    Redefining Writing in the Twenty-first Century:

    Tertiary Audience: As with the secondary audience, the tertiary audience will need to see the pros of changing something that is standard and will need to see that the benefits of changing how we write will be worth the change....
  • Unmanned Aircraft Systems (UAS) 101 Presented to: National

    Unmanned Aircraft Systems (UAS) 101 Presented to: National

    Part 61 certificate holders can take online training at faasafety.gov instead of the knowledge exam. Must undergo TSA background security screening. UAS operators who fly under the small UAS rule must obtain a Remote Pilot Certificate (RPC). The Remote Pilot...