# Chapter 11 - Association Rules - Washburn University

Chapter 13: Association Rules, from *Data Mining for Business Intelligence* by Shmueli, Patel & Bruce

## What are Association Rules?

- Study of what goes with what
  - Customers who bought X also bought Y
  - What symptoms go with what diagnosis
- Transaction-based or event-based
- Also called market basket analysis and affinity analysis
- Originated with the study of customer transaction databases to determine associations among items purchased
- Used in many recommender systems

## Generating Rules

## Terms

- IF part = antecedent
- THEN part = consequent
- Item set = the items (e.g., products) comprising the antecedent or consequent
- Antecedent and consequent are disjoint (i.e., have no items in common)

## Tiny Example: Phone Faceplates

## Many Rules are Possible

For example, Transaction 1 supports several rules, such as:

- If red, then white (if a red faceplate is purchased, then so is a white one)
- If white, then red
- If red and white, then green
- plus several more

## Frequent Item Sets

- Ideally, we want to create all possible combinations of items
- Problem: computation time grows exponentially as the number of items increases
- Solution: consider only frequent item sets
- Criterion for frequent: support

## Support

- Support = number (or percent) of transactions that include both the antecedent and the consequent
- Example: support for the item set {red, white} is 4 out of 10 transactions, or 40%

## Apriori Algorithm

## Generating Frequent Item Sets

For k products:

1. User sets a minimum support criterion
2. Next, generate list of one-item sets that meet the support criterion

3. Use the list of one-item sets to generate list of two-item sets that meet the support criterion
4. Use list of two-item sets to generate list of three-item sets
5. Continue up through k-item sets

## Measures of Performance

- Confidence: the % of antecedent transactions that also have the consequent item set
- Lift = confidence / (benchmark confidence)
- Benchmark confidence = transactions with consequent as % of all transactions
- Lift > 1 indicates a rule that is useful in finding consequent item sets (i.e., more useful than just selecting transactions randomly)

## Alternate Data Format: Binary Matrix

## Process of Rule Selection

Generate all rules that meet specified support and confidence:

- Find frequent item sets (those with sufficient support; see above)

- From these item sets, generate rules with sufficient confidence

## Example: Rules from {red, white, green}

- {red, white} => {green} with confidence = 2/4 = 50%, i.e., (support {red, white, green}) / (support {red, white})
- {red, green} => {white} with confidence = 2/2 = 100%, i.e., (support {red, white, green}) / (support {red, green})
- Plus 4 more with confidence of 100%, 33%, 29% and 100%

If the confidence criterion is 70%, report only rules 2, 3 and 6.

## All Rules (XLMiner Output)

| Rule # | Conf. % | Antecedent (a) | Consequent (c) | Support(a) | Support(c) | Support(a U c) | Lift Ratio |
|---|---|---|---|---|---|---|---|
| 1 | 100 | Green | Red, White | 2 | 4 | 2 | 2.5 |
| 2 | 100 | Green | Red | 2 | 6 | 2 | 1.666667 |
| 3 | 100 | Green, White | Red | 2 | 6 | 2 | 1.666667 |
| 4 | 100 | Green | White | 2 | 7 | 2 | 1.428571 |
| 5 | 100 | Green, Red | White | 2 | 7 | 2 | 1.428571 |
| 6 | 100 | Orange | White | 2 | 7 | 2 | 1.428571 |

## Interpretation

- Lift ratio shows how effective the rule is in finding consequents (useful if finding particular consequents is important)
- Confidence shows the rate at which consequents will be found (useful in learning costs of promotion)
- Support measures overall impact
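The support, confidence, and lift figures quoted for the faceplate example can be checked with a short Python sketch. The transaction list is an assumption: the original faceplate table is not reproduced in this text, so the ten baskets below were chosen only to be consistent with the counts quoted in the slides (for instance, {red, white} appearing in 4 of 10 transactions). The helper names `support_count`, `confidence`, `lift`, and `rules_from` are illustrative and are not part of XLMiner.

```python
from itertools import combinations

# ASSUMPTION: hypothetical baskets, picked to match the counts quoted in the
# slides; the original faceplate table is not available in this text.
TRANSACTIONS = [
    {"red", "white", "green"},
    {"white", "orange"},
    {"white", "blue"},
    {"red", "white", "orange"},
    {"red", "blue"},
    {"white", "blue"},
    {"red", "blue"},
    {"red", "white", "blue", "green"},
    {"red", "white", "blue"},
    {"yellow"},
]


def support_count(itemset, transactions=TRANSACTIONS):
    """Number of transactions that contain every item in `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def confidence(antecedent, consequent, transactions=TRANSACTIONS):
    """support(a U c) / support(a)."""
    both = support_count(set(antecedent) | set(consequent), transactions)
    return both / support_count(antecedent, transactions)


def lift(antecedent, consequent, transactions=TRANSACTIONS):
    """Confidence divided by the benchmark confidence, support(c) / n."""
    benchmark = support_count(consequent, transactions) / len(transactions)
    return confidence(antecedent, consequent, transactions) / benchmark


def rules_from(itemset, min_confidence, transactions=TRANSACTIONS):
    """Every antecedent => consequent split of `itemset` meeting min_confidence."""
    itemset = frozenset(itemset)
    rules = []
    for size in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), size):
            a = frozenset(antecedent)
            c = itemset - a
            conf = confidence(a, c, transactions)
            if conf >= min_confidence:
                rules.append((a, c, conf))
    return rules


print(support_count({"red", "white"}))                     # 4
print(confidence({"red", "white"}, {"green"}))             # 0.5
print(confidence({"red", "green"}, {"white"}))             # 1.0
print(round(lift({"green"}, {"red", "white"}), 6))         # 2.5
print(len(rules_from({"red", "white", "green"}, 0.70)))    # 3
```

Under these assumed baskets, three of the six rules derived from {red, white, green} clear the 70% confidence criterion, which agrees with the count reported in the example.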

## Caution: The Role of Chance

- Random data can generate apparently interesting association rules
- The more rules you produce, the greater this danger
- Rules based on large numbers of records are less subject to this danger

## Example: Charles Book Club

| ChildBks | YouthBks | CookBks | DoItYBks | RefBks | ArtBks | GeogBks | ItalCook | ItalAtlas | ItalArt | Florence |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
| 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |

Row 1, e.g., is a transaction in which books were bought in the following categories: Youth, Do it Yourself, Geography.

## XLMiner Output

| Rule # | Conf. % | Antecedent (a) | Consequent (c) | Support(a) | Support(c) | Support(a U c) | Lift Ratio |
|---|---|---|---|---|---|---|---|
| 1 | 100 | ItalCook | CookBks | 227 | 862 | 227 | 2.320186 |
| 2 | 62.77 | ArtBks, ChildBks | GeogBks | 325 | 552 | 204 | 2.274247 |
| 3 | 54.13 | CookBks, DoItYBks | ArtBks | 375 | 482 | 203 | 2.246196 |
| 4 | 61.98 | ArtBks, CookBks | GeogBks | 334 | 552 | 207 | 2.245509 |
| 5 | 53.77 | CookBks, GeogBks | ArtBks | 385 | 482 | 207 | 2.230964 |
| 6 | 57.11 | RefBks | ChildBks, CookBks | 429 | 512 | 245 | 2.230842 |
| 7 | 52.31 | ChildBks, GeogBks | ArtBks | 390 | 482 | 204 | 2.170444 |
| 8 | 60.78 | ArtBks, CookBks | DoItYBks | 334 | 564 | 203 | 2.155264 |
| 9 | 58.4 | ChildBks, CookBks | GeogBks | 512 | 552 | 299 | 2.115885 |
| 10 | 54.17 | GeogBks | ChildBks, CookBks | 552 | 512 | 299 | 2.115885 |
| 11 | 57.87 | CookBks, DoItYBks | GeogBks | 375 | 552 | 217 | 2.096618 |
| 12 | 56.79 | ChildBks, DoItYBks | GeogBks | 368 | 552 | 209 | 2.057735 |

- Rules are arrayed in order of lift
- Information can be compressed, e.g., rules 2 and 7 have the same trio of books

## Summary

- Association rules (or affinity analysis, or market basket analysis) produce rules on associations between items from a database of transactions
- Widely used in recommender systems
- Most popular method is the Apriori algorithm
- To reduce computation, we consider only frequent item sets (those meeting the minimum support criterion)
- Performance is measured by confidence and lift
- Can produce a profusion of rules; review is required to identify useful rules and to reduce redundancy
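As a rough illustration of the level-wise search for frequent item sets summarized above (one-item sets first, then two-item sets, and so on), here is a minimal Python sketch. Several things are assumptions: the function name `frequent_itemsets` is hypothetical, `min_support` is treated as a transaction count rather than a percentage, and the toy baskets are invented to match the counts quoted for the faceplate example. The subset-pruning step is the usual Apriori refinement; the slides themselves only describe building each level from the previous one.

```python
from itertools import combinations

# ASSUMPTION: hypothetical baskets consistent with the faceplate counts
# quoted in the slides; not the original data table.
TRANSACTIONS = [
    {"red", "white", "green"},
    {"white", "orange"},
    {"white", "blue"},
    {"red", "white", "orange"},
    {"red", "blue"},
    {"white", "blue"},
    {"red", "blue"},
    {"red", "white", "blue", "green"},
    {"red", "white", "blue"},
    {"yellow"},
]


def frequent_itemsets(transactions, min_support):
    """Map each frequent item set (a frozenset) to its transaction count.

    `min_support` is a minimum number of transactions, not a percentage.
    """
    baskets = [frozenset(t) for t in transactions]

    def count(candidate):
        return sum(1 for t in baskets if candidate <= t)

    # One-item sets that meet the support criterion.
    current = {}
    for item in {i for t in baskets for i in t}:
        single = frozenset([item])
        n = count(single)
        if n >= min_support:
            current[single] = n

    frequent = {}
    k = 1
    while current:
        frequent.update(current)
        k += 1
        # Join frequent (k-1)-item sets into k-item candidates.
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == k}
        # Prune: every (k-1)-item subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        # Keep only the candidates that meet the support criterion.
        current = {}
        for c in candidates:
            n = count(c)
            if n >= min_support:
                current[c] = n
    return frequent


result = frequent_itemsets(TRANSACTIONS, min_support=2)
print(result[frozenset({"red", "white"})])            # 4
print(result[frozenset({"red", "white", "green"})])   # 2
```

Because a candidate is counted only when all of its smaller subsets are already frequent, most combinations are never examined, which is the computational saving the frequent item set criterion is meant to deliver.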
