Data science for service change - DataSF | Office of the ...

Data science for service change
Presented by DataSF | datasf.org/science
City and County of San Francisco

What is data science?
Data Science: Applying advanced statistical tools to existing data to generate new insights
Service Change: Converting new data insights into (often small) changes to business processes
Smarter Work: More efficient and effective use of staff and resources

What complements data science? (and is really good stuff to do)

Approach: Performance Management
  Process: Define, visualize, often using dashboards, and manage to KPIs
  Outcome: Meet goals and KPI targets
  Examples: SF Scorecard, PublicWorks Stat & Stat starter kit

Approach: Evaluation
  Process: Assess a project, program or policy design or results
  Outcome: Better investment of resources; better policy decisions
  Examples: Evaluation of transitional kindergarten in SF

Approach: Policy Analysis
  Process: Define and assess alternatives using a broad range of tools
  Outcome: Report or memo with policy or program recommendations
  Examples: Shape Up SF Policy Analysis

Approach: Open Data
  Process: Publish civic data for use by the City and the public
  Outcome: Easier data sharing and reporting, new tools or services built on data
  Examples: SFPUC Adopt a Drain

Approach: DataScienceSF
  Process: Identify insights using advanced statistics tied to a service change
  Outcome: Smarter work on the ground in real time
  Examples: See rest of deck!

All approaches can lead to service improvement. It's about choosing the right tool for the job (and sometimes combining them)!

What's in the DataScienceSF Toolkit?

Statistical Methods
  Sentiment analysis
  Time series analysis
  Multilevel modeling
  Survival analysis
  A/B testing
  Data mining
  Missing data imputations
  Pattern recognition
  Machine learning
  Propensity score matching
  Classification and clustering
  Logistic, multinomial and multiple linear regression techniques
  Principal component and factor analysis
  Forecasting
  Network analysis

Tools
  Languages: Python, R, SQL, Javascript, NodeJS
  Libraries: SciPy, Pandas, Scikit-learn, GPText, OpenNLP, Mahout, + many others
  Data Engineering: Profiling, ETL, Job notices, APIs, Optimized data pipelines, Optimized data storage/access
  Visualization: D3.js, Gephi, R, Leaflet, PowerBI, ggplot2, shiny

User Experience Research
  Iterative prototyping
  Photo journaling and documenting
  Service blueprinting
  Journey mapping
  Ride-alongs
  Ethnographic field research and user observation
  Process mapping
  Usability testing

What is NOT data science?
  This: Service change. Not that: Academic research
  This: Small changes. Not that: Major overhauls / service disruptions
  This: Use existing data. Not that: Collecting new data (mostly ;)

Data Science Project Types

Project Type: Find the needle in the haystack
What to target?

Target areas, target categories, target individuals
Service Issue: Difficult to identify targets in a population
Data Science Process: Use existing data and predictive modeling to identify targets
Service Change: Engage with target subset of population
Result: Department resources are spent where most needed

Examples: Free fire alarms in New Orleans
Service Issue: Fire alarms go to homes that already have them
Data Science: ID homes with high prob. of no alarm
Service Change: Use list to shape outreach
Result: 2x increase in hit rate

Examples: Find the needle in the haystack

New Orleans Fire Alarms
Service Issue: New Orleans Fire Department (Nola FD) distributes free fire alarms to homes. But many homes they visited already had them, wasting Nola FD's resources.
Data Science: Nola's analytics team used public data to identify homes with a high probability of not having a fire alarm and provided Nola FD with a list.
Service Change: Nola FD used the list to determine where to offer fire alarms.
Result: With no increase in resources or patrols, Nola FD increased the hit rate of homes needing smoke alarms by 2x.

New York City Tax Compliance
Service Issue: New York City (NYC) conducts corporate tax audits. They are time consuming and 37% have no findings. They want to increase findings but maintain their number of audits.
Data Science: NYC analyzed historical audit records and identified patterns of businesses. Outliers were flagged as possible audit targets.
Service Change: The audit team targeted the flagged cases for audits.
Result: With the same staff levels, the audit team decreased the percent of cases with no finding from 37% to 22%, leading to increased revenues.

Project Type: Prioritize your backlog
What to prioritize?
Service Issue: Backlog is tackled via first in, first out (FIFO)
Data Science Process: Create a model to categorize and group past and current cases
Service Change: Prioritize cases based on categories in order of risk, need or opportunity
Result: Department addresses high priority cases first
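
The move from FIFO to category-based ordering can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Case` fields, the category labels and the weights in `CATEGORY_PRIORITY` are hypothetical, and in a real project the category would come from a model built on past cases rather than being typed in by hand.

```python
from dataclasses import dataclass


@dataclass
class Case:
    case_id: str
    days_open: int
    category: str  # hypothetical label, assumed to come from an upstream model


# Hypothetical weights (assumption): higher number means handle sooner.
CATEGORY_PRIORITY = {"high_risk": 3, "medium_risk": 2, "low_risk": 1}


def prioritize(cases):
    """Order cases by category priority first, then oldest first within a category."""
    return sorted(cases, key=lambda c: (-CATEGORY_PRIORITY[c.category], -c.days_open))


def fifo(cases):
    """Baseline: oldest case first, ignoring category."""
    return sorted(cases, key=lambda c: -c.days_open)


backlog = [
    Case("A", 120, "low_risk"),
    Case("B", 10, "high_risk"),
    Case("C", 45, "medium_risk"),
    Case("D", 60, "high_risk"),
]

fifo_ids = [c.case_id for c in fifo(backlog)]
ordered_ids = [c.case_id for c in prioritize(backlog)]
print("FIFO order:       ", fifo_ids)
print("Prioritized order:", ordered_ids)
```

Under FIFO the old low-risk case A is handled first; with the prioritized ordering both high-risk cases move to the front. The service change is the new ordering of the work list, not the model itself.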

Examples: Blight backlog in New Orleans
Service Issue: Backlog in blight enforcement
Data Science: Use data to grade cases per prior decisions
Service Change: Created abatement tool
Result: 1,500+ case backlog gone in 100 days

Examples: Prioritize your backlog

Boston Complaints
Service Issue: In Boston, they have a large list of residences with anti-social complaints filed against them.
Data Science: The analytics team pooled data from housing, police, and tax agencies to gauge the nature of complaints and identify the biggest contributors to complaints.
Service Change: The Air Pollution Control Commission expedited enforcement with the biggest contributors.
Result: With no change in resources, Boston saw a 55% reduction in police calls associated with the targeted residences.

New Orleans Blight
Service Issue: New Orleans (Nola) faced a significant backlog in blight enforcement due in part to bottlenecks in the decision-making process and missing information.
Data Science: Nola used data on the outcomes of previous blight cases to grade cases in the backlog and to recommend additional data to collect by field teams.
Service Change: The enforcement team used the results as an abatement decision tool to speed the decision-making process of whether to demolish or foreclose a home.
Result: Nola eliminated the 1,500+ case backlog in less than 100 days.

Project Type: Flag stuff early
How to detect?
Service Issue: Hard to predict future condition, which leads to reactive services
Data Science Process: Use historical and current data to create estimate ranges for potential outcomes
Service Change: Use estimates to change and tailor intervention points
Result: Department provides proactive early interventions

Examples: Use of force alerts in Charlotte
Service Issue: Excessive force has a negative impact on community
Data Science: Identify patterns to refine early warning
Service Change: Flagged recurring complaints
Result: Accuracy up 20%; false positives down 55%

Examples: Flag stuff early

Charlotte Police Violence
Service Issue: Excessive force violations by police officers have huge negative repercussions in the community and for police careers.
Data Science: The analytics team refined an early warning system, identifying patterns that often led to officers having negative interactions with the public.
Service Change: The department flagged recurring complaints against officers and notified supervisors when certain thresholds were reached.
Result: The CMPD system increased accuracy by 15-20% while reducing false positives by 55%.

Lead Poisoning in Chicago
Service Issue: In Chicago, a large number of children are thought to be exposed to lead paint in older houses.
Data Science: The analytics team built a model of exposure using data on homes, history of children's exposure at that address and conditions of neighborhood.
Service Change: They conducted targeted inspections and provided remediation funding to homes identified in the model.
Result: Chicago reached the most vulnerable families before severe health effects from lead contamination manifest.

Project Type: A/B test something
Which form?
[Graphic: Data Science, 62% respond; Service Change, 78% respond]
Service Issue: Costly outreach methods are not tested before implementation
Data Science Process: Statistical testing on outreach methods to identify which, when, and to whom to send
Service Change: Use statistically validated outreach method
Result: Department increases response rates

Examples: NYC Summons Redesign
Service Issue: 40% of those cited were no-shows, leading to costly arrests
Data Science: Redesigned and tested summons form
Service Change: Deployed new form and rescheduled timelines
Result: Currently evaluating impact

Examples: A/B test something

NOLA Community Health Program
Service Issue: In New Orleans, they have a low take-up rate of free primary care appointments.
Data Science: The analytics team tested different SMS reminders to those eligible for appointments.
Service Change: The department implemented the most successful SMS text.
Result: 60% increase in clients using free primary care appointments

NYC Summons Redesign
Service Issue: 40% of those cited for low-level violations did not take required next steps, leading to issuance of arrest warrants.
Data Science: Experiment and test redesign of summons process
Service Change: Reschedule court timelines to facilitate greater access
Result: Evaluating impact on use of costly arrest warrants (Project currently in progress)

Project Type: Optimize your resources
How to distribute?
Service Issue: Difficult to identify where to place or distribute resources to be most effective
Data Science Process: Use geospatial and/or other data to identify optimal distribution of resources
Service Change: Reallocate resources to optimal distribution
Result: Department decreases response times; increases volume

Examples: Chicago Pest Control
Service Issue: Challenging to predict outbreaks
Data Science: Analyze data associated with outbreaks
Service Change: Proactive targeting of leading indicators
Result: 15% drop in requests for service

Examples: Optimize your resources

Chicago Pest Control
Service Issue: Chicago's rodent baiting program finds it challenging to predict rodent outbreaks and locations, leading to spikes in 311 complaints.
Data Science: Predicted potential danger of outbreaks by using leading indicators and other data correlated with previous outbreaks.
Service Change: Directed rodent baiting to areas identified by leading indicators, including events like water main breaks.
Result: Resident requests for rodent control services dropped by 15%.

NOLA Ambulance Stand-by Location
Service Issue: In New Orleans, ambulance standby locations are chosen based on dispatcher habits or instincts.
Data Science: Analytics team used citywide analysis of data on accident patterns, traffic patterns, and crew readiness to identify optimal standby locations.
Service Change: Ambulances deployed at new optimized locations
Result: Targeting short response times to EMS calls (Project currently in progress)

What was the service change?
  Fire Alarms. From that: Random list. To this: Prioritized list
  Blight. From that: Staff evaluates all cases. To this: Tool evaluates easy cases
  Early Warning. From that: Focus on that set of officers. To this: Focus on this set of officers
  Summons. From that: Send original form. To this: Send new form
  Control. From that: Arrive at location X too late. To this: Arrive at location X early
Service Change = Small Business Process Change

Summary: The five project types
  Find the needle in the haystack
  Prioritize your backlog
  Flag stuff early
  A/B test something
  Optimize your resources
  Some combination
  Something else

DataScienceSF Cohort 1

ASR: Increase property tax revenues
Service Issue: When a property sells in SF, we either accept the sales price or modify it to collect property taxes. So which sales should you accept and which should you dig into?
Data Science: Our regression model identifies which sale prices are unusual for the location, time and property details
http://www.markersf.com/blog/
Service Change: The model splits properties into two lists: normal sale prices to enroll directly in tax collection and outlier sales for manual review by appraisers
Result: Expected: Increased revenue and faster time to revenue, reduced backlog, and more consistency in assessments
Project type: Prioritize your backlog
Full write up at datasf.org/showcase/datascience/

Evictions: Proactively prevent evictions
Service Issue: How can we make eviction prevention more proactive by identifying the most problematic eviction notices in real time?
Data Science: An algorithm combines data sources to identify eviction notice filings that are outside the norm
Service Change: A list of flagged eviction notices is sent to eviction prevention services to proactively review for service outreach
Result: Expected: Targeted eviction prevention that keeps residents in their homes
Project types: Find the needle in the haystack, Flag stuff early
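
Both the ASR and Evictions projects above rest on flagging records that fall outside the norm and routing only those to staff. A minimal Python sketch of that idea follows, assuming a single predictor (square footage) and made-up sales. The deck only says a regression model is used; the one-variable least squares fit, the median absolute deviation (MAD) rule, the threshold of 3 and every field name here are assumptions for illustration, not the ASR model.

```python
from statistics import median


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def split_sales(sales, threshold=3.0):
    """Return (normal_ids, review_ids) based on how unusual each residual is."""
    xs = [s["sqft"] for s in sales]
    ys = [s["price_k"] for s in sales]
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]

    # Robust spread (assumption): MAD is less distorted by the outliers
    # we are trying to find than a standard deviation would be.
    center = median(residuals)
    mad = median([abs(r - center) for r in residuals])
    scale = 1.4826 * mad  # rescales MAD to be comparable to a standard deviation

    normal, review = [], []
    for sale, r in zip(sales, residuals):
        is_outlier = scale > 0 and abs(r - center) / scale > threshold
        (review if is_outlier else normal).append(sale["id"])
    return normal, review


# Hypothetical sales: price in thousands of dollars; P4 sold far below the trend.
sales = [
    {"id": "P1", "sqft": 1000, "price_k": 1000},
    {"id": "P2", "sqft": 1200, "price_k": 1200},
    {"id": "P3", "sqft": 1500, "price_k": 1500},
    {"id": "P4", "sqft": 1800, "price_k": 900},
    {"id": "P5", "sqft": 2000, "price_k": 2000},
    {"id": "P6", "sqft": 2200, "price_k": 2200},
]

normal_ids, review_ids = split_sales(sales)
print("Enroll directly:", normal_ids)
print("Manual review:  ", review_ids)
```

The two returned lists mirror the service change described for ASR: one list goes straight through, the other goes to a person for review.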

Full write up at datasf.org/showcase/datascience/

ENV: Find new clients to help green our City
Service Issue: SF Environment offers financial incentives and technical assistance to help our constituents upgrade their lighting & refrigeration systems. But their list of leads is dwindling. How can they find new leads?
Data Science: Mashed together multiple data sources to identify characteristics of stronger leads
Service Change: New and longer list of property leads with enriched data for targeting marketing campaigns
Result: Expected: New customers and increased uptake of green subsidies
Project types: Find the needle in the haystack, Optimize your resources
Full write up at datasf.org/showcase/datascience/

DPH WIC: Help moms and babies stay in nutrition program
Service Issue: Since 2011, DPH has seen an increase in mothers dropping out of their nutrition program. Which moms are most at risk of dropout?
Data Science: Built a predictive model that identified moms and infants who are at greatest risk for dropping out
Service Change: Using the high-risk client profiles to conduct targeted interviews to identify program barriers and make service changes
Result: Expected: Reduce the dropout rate of moms, infants and children, leading to healthier outcomes for both
Project type: Flag stuff early
Full write up at datasf.org/showcase/datascience/

DPH BHS: Improve results and reduce costs in mental health care
Service Issue: A small fraction of mental health patients use a large % of resources. Can we identify high users early to improve their outcomes and reduce costs?
Data Science: Build predictive model to identify clients at greatest risk for becoming high users
Service Change: Expected: Targeted service model to direct high users to more stable and preventative services
Result: Expected: Reduction in high cost clients and use of high cost emergency services
Project types: Find the needle in the haystack, Flag stuff early

TTX: Increase response to tax letter
Service Issue: TTX wanted to use behavioral economics and A/B test to increase effectiveness of collection letter for unsecured personal property (a difficult type to collect on).
Data Science: DataSF helped organize a Behavioral Insights Training (BIT) workshop and provided guidance on A/B test
Service Change: Use whichever letter gets the best response
Result: Improved response rate by 17%. TTX continuing to apply BIT principles to other taxpayer communications
Project type: A/B test something
Full write up at datasf.org/showcase/datascience/
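
Choosing "whichever letter gets the best response" is safer when the difference is checked statistically first. The Python sketch below uses a two-proportion z-test, one common way to compare two response rates; the deck does not say which test TTX or DataSF used, so treat the method as an assumption. The 62% and 78% rates echo the A/B test slide, while the sample size of 200 letters per version is hypothetical.

```python
import math


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test for a difference between two response rates.

    Returns (z, p_value). Uses the pooled rate for the standard error,
    which is the usual form when the null hypothesis is "no difference".
    """
    rate_a = success_a / n_a
    rate_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 200 letters per version, 62% vs 78% responding.
z, p_value = two_proportion_z_test(success_a=124, n_a=200, success_b=156, n_b=200)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

With these made-up counts the difference is very unlikely to be chance, so switching to version B would be a defensible service change. With much smaller samples the same percentages could easily fail the test, which is the reason to test before rolling out.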

ART: Preserve City art for the future
Service Issue: The Arts Commission needs to accurately and efficiently project long-term costs to budget for art preservation
Data Science: Revised cost formula and new tool to provide long-term projections and prioritization of conservation projects on demand
Service Change: Use tool to model cost scenarios instead of manual, one-time process
Result: Expected: Reduction in staff time, more accurate cost estimates, and earlier identification of pieces in need of conservation
Project type: Optimize your resources
Full write up at datasf.org/showcase/datascience/

Overview of Phases
Cohort 2: Jan - June
[Timeline figure. Phases shown: Solicitation, Application due, Selection, Notify applicants, Project refining, Analysis & service change, Present. Dates shown: Oct, Nov, Nov 22, Nov 27, Dec 13, Dec, January - May, June]

Phase: Solicitation
Opportunities to learn more:
  Brown bags
  Office hours
  Invited presentations
Dates at datasf.org/science

[Timeline: April, May, Mid May, June, July - November, Dec]

Phase: Solicitation
How to prepare:
  Brainstorm projects using the project types
  Identify possible service changes
  Review data that could help
  Identify key staff members
Learn more at datasf.org/science

Phase: Application
Available at datasf.org/science
Brief online form:
  Problem statement (200 words max)
  Impact statement (100 words max)
  Service change statement
  Data overview
  Project champion

Phase: Application
Criteria to keep in mind
Above all else: A viable path to service change
  Question / problem answerable by data science
  Solvable within cohort time frame
  Impact
  Department commitment
  Data readiness

Phase: Selection Process
  Initial review
  Criteria assessment
  Application scoring
  Department follow-ups, as needed
Be available for questions (email or in person)
Estimating 5-10 projects per Cohort

Phase: Winners Announced
And gentle off-ramps for the rest
Some projects may not be appropriate for data science or for our timeline. We will help identify other opportunities that may be a better fit:
  Civic Bridge pro bono opportunities via the Mayor's Office of Civic Innovation
  STIR startup technology engagements via the Mayor's Office of Civic Innovation
  DataSF Dashboarding Services
  Controller's Performance Unit
  Data Academy classes
  External Data Science groups or volunteers
  Other technical assistance

Phase: Project refining
During this phase, we will:
  Meet to refine the scope
  Optionally, do initial site visits/interviews
  Prepare data for analysis
Outputs:
  Project charter
  Data exchanges and agreements, as needed

Phase: Analysis and service change
During this phase, we will:
  Conduct site visits, ride-alongs and interviews, as appropriate
  Conduct iterative analysis
  Implementation testing
  Handoff and training
Service Analysis Plan Review

Phase: Analysis and service change
What DataSF Brings: Statistical Methods, Tools, User Experience Research, Issue expertise
What You Bring: A good question & data, Project champion
Final Product is Algorithm + Tool: Algorithms that are scripted and automated (real time if needed), tied to some service change tool (e.g. list, service, alert), implemented together and maintained by department

Phase: Present (& Disseminate)
During this phase, we will:
  Present and celebrate the results with cohort
  As appropriate, write an article for DataSF Speaks (datasf.org/blog) and/or other venues
  Disseminate method and approach (not data) for other departments and cities to learn
Data Scientist will continue to be available during office hours for continued support

Visit datasf.org/science
At datasf.org/science:
  This powerpoint
  1 pager
  Sign up for office hours
  Sign up for brown bag
  Apply!
Other Resources: Civic Bridge

THANK YOU
@datasf | datasf.org | datasf.org/blog

Activity
Take 5 minutes by yourself:
  Brainstorm ideas
  Take your best idea and complete the form
With your neighbors:
  Review each top idea and refine/iterate
  Report out
