IBM Research

What's in your wallet? Opportunity modeling approaches and applications

Claudia Perlich, Chief Scientist (formerly IBM Research)
Collaborators: Saharon Rosset, Rick Lawrence, Srujana Merugu, et al.
2006 IBM Corporation
Predictive Modeling Group, Mathematical Sciences, IBM Research

Publications & Recognition
- 2009: Finalist in the INFORMS Edelman competition
- 2007: Data Mining Practice Prize at KDD 2007, Predictive modeling for marketing, Runner-Up
- 2007: IBM Outstanding Technical Award, Opportunity models and validation for the Market Alignment Program (MAP)
- 2005: IBM Research Award for contributions to the Market Alignment Program (MAP)
- R. Lawrence, C. Perlich, S. Rosset, et al. Operations Research Improves Sales Force Productivity at IBM. Forthcoming, INFORMS Journal on Computing.
- J. Arroyo, M. Callahan, M. Collins, A. Ershov, I. Khabibrakhmanov, R. Lawrence, S. Mahatma, M. Niemaszyk, C. Perlich, S. Rosset, S. Weiss. Analytics-driven solutions for customer targeting and sales force allocation. IBM Systems Journal 46 (4) (2007).
- R. Lawrence, C. Perlich, S. Rosset, I. Khabibrakhmanov, S. Mahatma, S. Weiss. A Data Mining Case Study: Analytics-driven solutions for customer targeting and sales force allocation. Second Workshop on Data Mining Case Studies and Practice Prize at SIGKDD 2007.
- C. Perlich, S. Rosset, R. Lawrence, B. Zadrozny. High Quantile Modeling for Customer Wallet Estimation with Other Applications. 13th SIGKDD International Conference on Knowledge Discovery and Data Mining, 2007.
- C. Perlich, S. Rosset, B. Zadrozny. Quantile Modeling for Marketing. Workshop on Data Mining for Business Applications at 12th SIGKDD International Conference on Knowledge Discovery and Data Mining, 2006.
- S. Merugu, S. Rosset, C. Perlich. A New Multi-View Regression Approach with an Application to Customer Wallet Estimation. 12th SIGKDD International Conference on Knowledge Discovery and Data Mining, 2006.
- S. Rosset, C. Perlich, B. Zadrozny, S. Merugu, S. Weiss, R. Lawrence. Wallet Estimation Models. International Workshop on Customer Relationship Management: Data Mining Meets Marketing, NYU, 2005.
- C. Perlich, S. Rosset, B. Zadrozny. Modeling Quantiles. In Encyclopedia of Data Warehousing and Mining, Second Edition.
Copyright IBM Corporation 2010

Presentation Outline
- Wallet Definitions and Business Considerations
- Modeling Approaches
- Evaluation of Wallet Models
- Business Impact: Market Alignment Program (MAP)
2011 IBM Corporation

What is Wallet/Opportunity?
- Total amount of money that the customer (company) can spend in a certain product category in a given period
[Figure: IBM sales shown within the IT wallet, which in turn lies within company revenue]

Why Are We Interested in Wallet?
- Customer targeting
  - Focus on acquiring customers with high wallet
  - Evaluate customers' growth potential by combining wallet estimates and sales history
  - For existing customers, focus on high wallet, low share-of-wallet customers
- Sales force management
  - Make resource assignment decisions
  - Concentrate resources on untapped opportunity
  - Evaluate success of sales personnel and sales channels by the share-of-wallet they attain

Wallet Modeling Challenge
- The customer wallet is never observed
  - Nothing to fit a model
  - Even if you have a model, how do you evaluate it?
- Need a predictive approach from available data
  - Firmographics (Sales, Industry, Employees)
  - IBM sales and transaction history

Existing Approaches to Wallet Modeling
- Bottom up: learn a model for individual companies
  - Get true wallet values through surveys
  - Very expensive
  - Small, typically not representative sample
  - Unreliable because ill defined
  - Coarse level of IT categories
- Top down: this approach was used by IBM Market Intelligence in North America (called ITEM)
  - Use econometric models to assign total opportunity to a segment (e.g., industry × geography)
  - Assign to companies in the segment proportional to their size
  - Completely ad hoc, without any validation

Multiple Wallet Definitions
- TOTAL: total customer available budget in the relevant area (e.g., total IT)
  - Can we really hope to attain all of it?
- SERVED: total customer spending on IT products covered by IBM
  - Better definition for our marketing purposes
- REALISTIC: IBM spending of the best similar customers
[Figure: nested regions labeled IBM Sales, REALISTIC, SERVED, TOTAL and Company Revenue]

We formulate the problem as Quantile Estimation
- Imagine 1,000 customers with identical customer features
- Consider the distribution of the IBM sales to these customers: the opportunity is a high quantile of this distribution
[Figure: distribution of IBM sales across these customers, with the best customers in the upper tail]

Formally: Percentile of Conditional Distribution
- Distribution of IBM sales $s$ to the customer given customer attributes $x$: $s \mid x \sim f_{\theta,x}$
[Figure: conditional density of $s$ with $E(s \mid x)$ and the REALISTIC wallet marked]
- Two obvious ways to get at the pth percentile:
  - Estimate the conditional distribution by integrating over a neighborhood of similar customers, then take the pth percentile of spending in the neighborhood
  - Create a global model for the pth percentile: build global regression models, e.g., $s \mid x \sim N(\beta x, \sigma^2)$

Overview of analytical approaches
[Diagram: three families of approaches. Ad hoc: decomposition by industry and size. kNN: general kNN, varying k, distance and features. Optimization: quantile regression, with model forms linear, decision tree and quanting. Further boxes: linear model and adjustment; evaluation and validation via quantile loss and MAP feedback.]

K-Nearest Neighbor
- Universe of IBM customers with D&B information
- Distance metric:
  - Industry match
  - Euclidean distance on firmographics and past IBM sales
- Neighborhood sizes (k):
  - Neighborhood size has a significant effect on prediction quality
- Prediction:
  - Quantile of firms in the neighborhood
  - Scaling issues
[Figure: target company i and its neighborhood plotted by industry, employees and revenue; histogram of IBM sales (frequency) in the neighborhood with the wallet estimate marked]

Global Estimation: the Quantile Loss Function

The mean minimizes a sum of squared residuals:

$$\min_{\mu} \sum_{i=1}^{n} (y_i - \mu)^2$$

The median minimizes a sum of absolute residuals:

$$\min_{m} \sum_{i=1}^{n} |y_i - m|$$

The p-th quantile minimizes an asymmetrically weighted sum of absolute residuals:

$$\min_{\hat{y}} \sum_{i=1}^{n} L_p(y_i, \hat{y}), \qquad L_p(y, \hat{y}) = \begin{cases} p\,(y - \hat{y}) & \text{if } y \ge \hat{y} \\ (1 - p)\,(\hat{y} - y) & \text{if } y < \hat{y} \end{cases}$$

[Figure: $L_p$ plotted against the residual over the range -3 to 3, for p = 0.8 and p = 0.5 (absolute loss)]

Quantile Regression
- Traditional regression: estimation of the conditional expected value by minimizing the sum of squares:

$$\min_{\beta} \sum_{i=1}^{n} \bigl(y_i - f(x_i, \beta)\bigr)^2$$

- Quantile regression: minimize the quantile loss, with $L_p$ the quantile regression loss function defined above:

$$\min_{\beta} \sum_{i=1}^{n} L_p\bigl(y_i, f(x_i, \beta)\bigr)$$

- Implementation: assume a linear function $y = \beta x$; solution using linear programming
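The quantile loss and the linear fit above can be illustrated with a small, dependency-free sketch. It is not the linear-programming implementation the slide refers to: instead it relies on the known property that, for a line with intercept and slope, some optimal quantile regression line passes through at least two observations, so it simply enumerates all such lines. The intercept term, the synthetic data and all function names are assumptions made for illustration.

```python
from itertools import combinations
import random

def quantile_loss(y, y_hat, p):
    """L_p(y, y_hat): p*(y - y_hat) if y >= y_hat, else (1 - p)*(y_hat - y)."""
    diff = y - y_hat
    return p * diff if diff >= 0 else (1 - p) * (-diff)

def total_loss(xs, ys, a, b, p):
    """Sum of quantile losses of the line y = a + b*x over all observations."""
    return sum(quantile_loss(y, a + b * x, p) for x, y in zip(xs, ys))

def fit_linear_quantile(xs, ys, p):
    """Return (loss, a, b) for the best line through any two observations."""
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # a vertical line is not a function of x
        b = (ys[j] - ys[i]) / (xs[j] - xs[i])
        a = ys[i] - b * xs[i]
        loss = total_loss(xs, ys, a, b, p)
        if best is None or loss < best[0]:
            best = (loss, a, b)
    return best

# Hypothetical data: firm sales (x) and IBM revenue (y), spread growing with size.
random.seed(0)
xs = [random.uniform(10, 80) for _ in range(40)]
ys = [0.05 * x * random.uniform(0.2, 2.0) for x in xs]

loss, a, b = fit_linear_quantile(xs, ys, p=0.8)
# Share of observations on or below the fitted line (small tolerance for rounding).
share_on_or_below = sum(y <= a + b * x + 1e-9 for x, y in zip(xs, ys)) / len(xs)
print(f"80th percentile line: y = {a:.3f} + {b:.3f} x")
print(f"share of points on or below the line: {share_on_or_below:.2f}")
```

The enumeration costs on the order of n^3 operations, so it only suits toy data; at realistic scale a linear-programming or dedicated quantile regression solver would be the natural choice.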

Linear Quantile Regression (Koenker)
[Figure: IBM revenue (vertical axis) against firm sales (horizontal axis) with the fitted quantile line; the opportunity is marked for two example customers, C1 and C2]

Quantile Regression Tree
- Motivation: identify a locally optimal definition of neighborhood
- Inherently nonlinear
- Adjustments of M5/CART for quantile prediction:

  - Predict the percentile rather than the mean of the leaf
  - Splitting/pruning criteria: quantile or squared error loss?
[Figure: example tree with splits on Industry = Banking, Sales < 100K and IBM Rev 2003 > 10K; each leaf shows the frequency distribution of IBM sales with the wallet estimate marked]

Quanting
- Transform the quantile regression into a series of classification problems
  - Non-linearity, if non-linear classifiers are used
  - Theoretical guarantee: if the classifiers minimize the expected classification error, the quanting algorithm minimizes the quantile loss
- Training
  - Each classifier is trained to decide whether or not the conditional quantile is above a threshold T
  - Original observations are re-labeled and re-weighted to train each classifier appropriately, similar to the quantile loss
- Prediction
  - Find the threshold where the classifier predictions switch from one to zero
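The training and prediction steps described above can be sketched as follows. This is a minimal illustration, not the published quanting implementation: the "classifier" is a deliberately trivial weighted majority vote per customer segment, the weights p and 1 - p are one way to mirror the asymmetry of the quantile loss, and returning the midpoint between the last threshold voted one and the first voted zero is an assumed convention. The segments and spending values are hypothetical.

```python
from collections import defaultdict

def train_classifier(rows, threshold, p):
    """Weighted majority vote per segment for the label 'y > threshold'.

    Positives carry weight p and negatives weight 1 - p, mirroring the
    asymmetry of the quantile loss.
    """
    votes = defaultdict(float)
    for segment, y in rows:
        votes[segment] += p if y > threshold else -(1 - p)
    return {segment: 1 if v > 0 else 0 for segment, v in votes.items()}

def quanting_fit(rows, thresholds, p):
    """Train one classifier per threshold, thresholds in increasing order."""
    return [(t, train_classifier(rows, t, p)) for t in sorted(thresholds)]

def quanting_predict(models, segment):
    """Midpoint between the last threshold voted 1 and the first voted 0."""
    previous = None
    for threshold, classifier in models:
        if classifier.get(segment, 0) == 0:
            return threshold if previous is None else (previous + threshold) / 2
        previous = threshold
    return previous  # never switched: the quantile lies beyond the grid

# Hypothetical data: spending spread evenly up to 6000 (banking) or 2000 (retail).
rows = [("banking", 6000 * (i + 0.5) / 200) for i in range(200)]
rows += [("retail", 2000 * (i + 0.5) / 200) for i in range(200)]

models = quanting_fit(rows, thresholds=range(500, 6001, 500), p=0.8)
banking_estimate = quanting_predict(models, "banking")
retail_estimate = quanting_predict(models, "retail")
print(banking_estimate, retail_estimate)
```

With these weights a segment votes one exactly when more than a share 1 - p of its observations exceed the threshold, so the switch point brackets the segment's empirical p-quantile up to the spacing of the threshold grid. Any classifier that accepts observation weights could replace the majority vote.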

Example of classifier outputs and the resulting prediction:

C1000  C2000  C3000  C4000  C5000  C6000  Prediction
    1      1      0      0      0      0         250
    1      1      1      0      0      0         350

(Graphical model approach to SERVED Wallets)
[Figure: graphical model linking company firmographics, the SERVED wallet, IT spend with IBM and the historical relationship with IBM]
- Wallet is unobserved, all other variables are observed
- Two families of variables, firmographics and IBM relationship, are conditionally independent given wallet
- We develop inference procedures and demonstrate them
- Theoretically attractive, practically questionable

Empirical Evaluation of Quantile Estimation
- Setup
  - Four domains with relevant quantile modeling problems
  - Performance on test set in terms of 0.9 quantile loss
- Approaches: linear quantile regression, Q-kNN, quantile trees, bagged quantile trees, quanting
- Baselines
  - Best constant
  - Traditional regression models for expected values, adjusted under a Gaussian assumption (+1.28 standard deviations, the 0.9 quantile of the standard normal)
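The two baselines can be compared with a short sketch. It simplifies the second baseline: instead of a fitted regression model, the expected value is taken to be the plain training mean, to which 1.28 standard deviations are added. The log-normal data are hypothetical and only serve to show how the Gaussian adjustment behaves on a skewed target; all names are illustrative.

```python
import math
import random
import statistics

def quantile_loss(y, y_hat, p):
    """L_p(y, y_hat): p*(y - y_hat) if y >= y_hat, else (1 - p)*(y_hat - y)."""
    diff = y - y_hat
    return p * diff if diff >= 0 else (1 - p) * (-diff)

def mean_quantile_loss(ys, y_hat, p):
    """Average quantile loss of a constant prediction y_hat."""
    return sum(quantile_loss(y, y_hat, p) for y in ys) / len(ys)

p = 0.9
random.seed(1)
# Hypothetical, heavily skewed targets (log-normal), split into train and test.
train = [random.lognormvariate(0.0, 1.5) for _ in range(2000)]
test = [random.lognormvariate(0.0, 1.5) for _ in range(2000)]

# Baseline 1: best constant, the empirical p-quantile of the training targets.
best_constant = sorted(train)[math.ceil(p * len(train)) - 1]

# Baseline 2: expected value plus z_p standard deviations (z_0.9 is about 1.28).
z = statistics.NormalDist().inv_cdf(p)
gaussian_adjusted = statistics.mean(train) + z * statistics.stdev(train)

for name, pred in [("best constant", best_constant),
                   ("mean + 1.28 sd", gaussian_adjusted)]:
    test_loss = mean_quantile_loss(test, pred, p)
    print(f"{name:>15}: prediction = {pred:8.2f}, test loss = {test_loss:.4f}")
```

On the training data the empirical quantile is by construction at least as good as any other constant under the quantile loss; how far the Gaussian-adjusted value falls behind depends on how non-normal the targets are.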

Performance on Quantile Loss
[Table: quantile loss per approach and domain; best result in bold, variance in parentheses]
- Observations
  - Regression + 1.28 is not competitive (because the residuals are not normal)
  - Splitting criterion is irrelevant
  - Q-kNN is not competitive
  - Quanting (using decision trees) and bagged quantile trees perform comparably

Additional Insights
- Irrelevance of splitting criterion
  - Good news, because squared error is much more efficient
  - Reason: SSE measures the decrease of the conditional variance
    - SSE measures the goodness of the local neighborhood
    - A good estimate of the conditional distribution gives a good quantile
- Linear model does well on IBM and KDD-CUP98
  - Match of model bias
  - Both domains have strong autocorrelation: last year's donation/revenue is a great predictor of this year's
  - Hard for tree-based models to express linear relationships

Evaluating REALISTIC Wallet
- We still don't know the truth
- Quantile loss only evaluates the ability to predict a quantile, but is a quantile a good wallet?
  - Which quantile: 80%, 90%, 99%?
- Distribution is highly skewed
  - Most error measures are very sensitive to outliers
  - What is the right scale? Log?
- Even good survey data is not the truth
  - Not available on an IBM product level
  - Probably irrelevant for the REALISTIC wallet

MAP: Market Alignment Program
Re-deploying IBM sales resources
- Old Sales Process
  - Use prior-year revenue as proxy for future revenue generation
  - Assign quota based largely on recent revenue history
  - Focused on existing relationship
- New Sales Process Using MAP ...
  - Use OR models to develop a forward-looking view of opportunity by client
  - Assign quota based on future opportunity and productivity
  - Focused on future opportunity

The MAP process and components
[Diagram: components of the MAP process: integrated data, data model, MAP models, modeled opportunity (model estimates), MAP web interface, MAP workshops, IBM sales team interviews, expert feedback, validated opportunity, realign sales resources]

Explanatory features are extracted from multiple sources
- Sources: Dun & Bradstreet (D&B) data and IBM client transactions, combined through entity matching and feature extraction
- D&B features: industry, revenue (rank), employees, state, D&B structure code
- IBM transactional features: prior-year revenue in other product brands, long-term revenue in other product brands
- Train model against current year revenue based on previous year
- Apply model by rolling forward to current year and predicting future opportunity

MAP Validation and Expert Feedback

[Figure: scatter plot of expert-validated opportunity (log) against kNN model opportunity (log), both axes from 0 to 20]
- Experts accept opportunity (45%)
- Experts change opportunity (40%): increase (17%), decrease (23%)
- Experts reduced opportunity to 0 (15%)

Observations
- Many accounts are set to zero for external reasons
  - Exclude from evaluation since no model can predict the competitive environment
- Exponential distribution of opportunities
  - Evaluation on the original (non-log) scale suffers from huge outliers
- Experts seem to make percentage adjustments
  - Consider log scale evaluation in addition to original scale, and root as intermediate
- Suspect strong anchoring bias: 45% of opportunities were not touched
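The effect of the evaluation scale noted in these observations can be made concrete with a small sketch. It compares model and expert values on the original, root and log scale, after dropping accounts the expert set to zero, and summarizes each error distribution by its mean squared and mean absolute value. The account values and function names are hypothetical.

```python
import math

# Transformations applied to both the model value and the expert value.
SCALES = {"original": lambda v: v, "root": math.sqrt, "log": math.log}

def scale_errors(model, expert):
    """Mean squared and mean absolute error on each scale (six numbers)."""
    # Drop accounts set to zero: they are excluded from evaluation,
    # and log(0) is undefined.
    pairs = [(m, x) for m, x in zip(model, expert) if m > 0 and x > 0]
    results = {}
    for name, transform in SCALES.items():
        errs = [transform(m) - transform(x) for m, x in pairs]
        results[name, "mean_sq"] = sum(e * e for e in errs) / len(errs)
        results[name, "mean_abs"] = sum(abs(e) for e in errs) / len(errs)
    return results

# Hypothetical opportunities in dollars: the expert halves or doubles the
# model value on three accounts and zeroes out one account.
model = [10_000.0, 50_000.0, 200_000.0, 8_000_000.0]
expert = [20_000.0, 25_000.0, 0.0, 4_000_000.0]

criteria = scale_errors(model, expert)
for key, value in criteria.items():
    print(key, round(value, 3))
```

In this toy data every retained account differs by a factor of two, so the log-scale errors all have the same magnitude, while on the original scale the single large account dominates both statistics.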

Evaluation Measures
- Different scales to avoid outlier artifacts
  - Original: e = model - expert
  - Root: e = root(model) - root(expert)
  - Log: e = log(model) - log(expert)
- Statistics on the distribution of the errors
  - Mean of e^2
  - Mean of |e|
- Total of 6 criteria

Model Comparison Results
We count how often a model scores within the top 10 and top 20 for each of the 6 measures:

Model                     Rational        DB2             Tivoli
                          top 10  top 20  top 10  top 20  top 10  top 20
Displayed Model (kNN)        6       6       4       5       6       6
Max 03-05 Revenue            1       1       0       3       1       4
Linear Quantile 0.8          5       6       2       4       3       5
Regression Tree              1       3       2       4       1       2
Q-kNN 50 + flooring          2       3       6       6       4       6
Decomposition Center         0       0       3       5       0       4
Quantile Tree 0.8            0       1       2       4       1       4

Annotations on the slide: (Anchoring), (Best)

MAP Experiments Conclusions
- Q-kNN performs very well after flooring but is typically inferior prior to flooring
- 80th percentile linear quantile regression performs consistently well (flooring has a minor effect)
- Experts are strongly influenced by displayed opportunity (and displayed revenue of previous years)
- Models without last year's revenue don't perform well

- Use Linear Quantile Regression with q=0.8 in MAP 06

Scope and some of the tedious details
- 3 million customers
- 20 brands (product categories)
- 4 markets
- Annual model refresh
- The quantile is chosen for each brand and market separately, based on market insights on IBM market share
- Whitespace models for customers with no prior IBM revenue are built using the same methodology but only D&B features
- Entity matching between IBM customer records and the D&B hierarchy is HARD
- Evaluation remains somewhat subjective and we collect feedback

In 2008 MAP covered 50+ countries, ~100% of IBM revenue and opportunity
[Figure: MAP coverage by year, 2005 to 2008]
- Resources shifted to high growth markets and accounts
- Shifted resources performed >10 pts better

MAP output drives account segmentation and resource allocation decisions
[Figure: accounts segmented by validated revenue opportunity (vertical axis) and prior year actual revenue (horizontal axis)]
- Segments
  - Invest: high growth potential
  - Opportunistic: small accounts
  - Core Growth: modest growth potential
  - Core Optimize: flat or declining
- Resource implications
  - Shift resources to Core Growth and Invest accounts
  - Reduce resource overlap
  - 8,000 sellers shifted (2006 to 2009)

MAP drove significant revenue impact in 2008
[Figure: the same segmentation (Invest, Core Growth, Opportunistic, Core Optimize), annotated with $53B of revenue, $9B of revenue, 30,000 sellers and 3,000 sellers shifted (2008)]
- [3,000 sellers] x [$2M revenue / seller] x [10% performance improvement] = $600M (2008 revenue impact)

MAP Take away
- Interesting predictive modeling task that calls for an unorthodox loss function
- Combination of data mining AND expert feedback
- Integration into the annual sales management cycle
- Significant effort on data collection and preparation
- Many additional analytical tools were built on top of MAP
  - Territory definition and assignment
  - Quota assignment
- Substantial impact on the bottom line

Questions?