# Income Forecasting

## Introduction

For some, budgeting while planning for the future can be difficult. In general, people do not know exactly what to expect of their future income. Because of this, we decided to apply time-series prediction algorithms to forecast probable future income.

## Abstract

- This project is an income-forecasting program that estimates the probable income of a budget, given the budget's income history.
- We looked into several algorithms for forecasting over a time series, including:
  - Forward-Backward
  - Simple moving average (SMA)
  - Autoregressive integrated moving average (ARIMA)

- Our goal is to measure the accuracy of their predictions over the time series, as well as to compare their time complexity.

## Problem Statement

Natural language: Given an income history, forecast probable future income using different algorithmic approaches, while measuring their accuracy and execution time.

Formal: The forecasting algorithms are measured by comparing the level of error between the forecasted values and the actual values. The difference between the forecast value and the actual value for the corresponding period is expressed as

E(i) = |F(i) − Y(i)|

where the error E at period i is the absolute value of the forecast F at period i minus the actual value Y at period i. The lower the value of E, the more accurate the algorithm is for the given period i.
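The per-period error defined above can be sketched in a few lines of Python; the forecast/actual pairs used here are illustrative sample values:

```python
# E(i) = |F(i) - Y(i)|: absolute error between forecast and actual values
def absolute_errors(forecasts, actuals):
    return [abs(f - y) for f, y in zip(forecasts, actuals)]

# Two illustrative forecast/actual pairs
print(absolute_errors([226425, 236729], [228438, 239493]))  # [2013, 2764]
```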

## Problem Statement (cont.)

The mean of these absolute error values, known as the mean absolute error (MAE), is given by the equation:

MAE = (E(1) + E(2) + ... + E(n)) / n

where n is the number of absolute errors calculated and E(i) is the absolute error calculated at period i. The MAE shows the average accuracy of each algorithm on a given set of data for the sake of comparison.

## Introducing AR, MA and ARMA

Auto-Regressive (AR): a model in which Y(t) depends only on its own past values:

Y(t) = φ1·Y(t−1) + φ2·Y(t−2) + ... + φp·Y(t−p) + ε(t)

Moving Average (MA): a model in which Y(t) depends only on the random error terms, which follow a white-noise process.

Example:

Y(t) = ε(t) + θ1·ε(t−1) + θ2·ε(t−2) + ... + θq·ε(t−q)

ARMA(p,q) model: combines the AR and MA terms, providing a simple tool for time-series modeling.

## Pure ARIMA Models

ARIMA is a general statistical model which is widely used in the field of time-series analysis. ARIMA is an acronym for AutoRegressive Integrated Moving Average. Given a time series Y(t), the pure model can be written with the backshift operator B as:

φ(B)·(1 − B)^d·Y(t) = θ(B)·ε(t)

The order of an ARIMA model is usually denoted ARIMA(p,d,q), where:

- p (AR) is the order of the autoregressive part
- d (I) is the order of differencing
- q (MA) is the order of the moving-average process

## Implementing a Time Series Analysis

1) Visualize the time series
2) Stationarize the series
3) Plot ACF/PACF charts and find optimal parameters
4) Build the ARIMA model
5) Make predictions
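Steps 2, 4 and 5 above can be illustrated with a deliberately small ARIMA(1,1,0)-style sketch in plain Python. This is a toy least-squares fit on the differenced series, not the fitted model used for the results below; in practice a library such as statsmodels would do the estimation. The income figures are the ones tabulated later in this report.

```python
# Income history, 2005-2014, from the table in this report
income = [200969, 213189, 221205, 215543, 204402,
          213178, 228438, 239493, 246313, 250775]

# Step 2: stationarize the series by first differencing (d = 1)
diff = [b - a for a, b in zip(income, income[1:])]

# Step 4: fit an AR(1) coefficient on the differences by least squares:
# phi = sum(x_t * x_{t-1}) / sum(x_{t-1}^2)
num = sum(x_t * x_prev for x_t, x_prev in zip(diff[1:], diff[:-1]))
den = sum(x_prev * x_prev for x_prev in diff[:-1])
phi = num / den

# Step 5: forecast the next difference, then integrate back to a level
next_diff = phi * diff[-1]
forecast_2015 = income[-1] + next_diff
print(round(forecast_2015))
```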

## Using the ARIMA Model to Forecast Values

Run time: 0.05153418 s

| Year | Income ($) | ARIMA forecast ($) |
|------|------------|--------------------|
| 2005 | 200,969    |                    |
| 2006 | 213,189    |                    |
| 2007 | 221,205    |                    |
| 2008 | 215,543    |                    |
| 2009 | 204,402    |                    |
| 2010 | 213,178    |                    |
| 2011 | 228,438    | 226,425            |
| 2012 | 239,493    | 236,729            |
| 2013 | 246,313    | 245,369            |
| 2014 | 250,775    | 252,692            |
| 2015 | (unknown)  | 259,003            |
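From the actual and forecast columns for 2011–2014 above, the mean absolute error of the ARIMA forecasts can be recomputed; the result matches the 1,909.5 reported in the error comparison at the end of this report:

```python
# ARIMA actual vs. forecast values, 2011-2014, from the table above
actual   = [228438, 239493, 246313, 250775]
forecast = [226425, 236729, 245369, 252692]

# Mean absolute error: average of |forecast - actual| over the periods
errors = [abs(f - y) for f, y in zip(forecast, actual)]
mae = sum(errors) / len(errors)
print(mae)  # 1909.5
```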

## Simple Moving Average Model (SMA)

In a time series, the SMA takes an average of the most recent n values of Y, for some integer n. This is the so-called simple moving average model, and its equation for predicting the value of Y at time t+1 based on data up to time t is:

Ŷ(t+1) = (Y(t) + Y(t−1) + ... + Y(t−n+1)) / n

- n = number of periods in the moving average; n can be 3, 5, or more periods depending on the size of the data set
- Y = demand in each period of time

## Simple Moving Average (cont.)

Run time: 2.706e-06 s. Three time periods were used for the calculations.

SMA complexity: O(n)

| Year | Income ($) | SMA forecast ($) |
|------|------------|------------------|
| 2003 | 178,694    |                  |
| 2004 | 190,253    |                  |
| 2005 | 200,969    |                  |
| 2006 | 213,189    | 189,972          |
| 2007 | 221,543    | 201,470          |
| 2008 | 215,543    | 211,900          |
| 2009 | 204,402    | 216,758          |
| 2010 | 213,178    | 213,829          |
| 2011 | 228,438    | 211,041          |
| 2012 | 239,493    | 215,339          |
| 2013 | 246,313    | 227,036          |
| 2014 | 250,775    | 238,081          |
| 2015 | (unknown)  | 245,527          |
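The forecast column above can be reproduced directly: with n = 3, each forecast is the mean of the three preceding years' income. The income figures are transcribed from the table.

```python
# Income history from the SMA table above
income = {2003: 178694, 2004: 190253, 2005: 200969, 2006: 213189,
          2007: 221543, 2008: 215543, 2009: 204402, 2010: 213178,
          2011: 228438, 2012: 239493, 2013: 246313, 2014: 250775}

def sma_forecast(year, n=3):
    """Forecast a year's income as the mean of the n preceding years."""
    window = [income[year - k] for k in range(1, n + 1)]
    return sum(window) / n

print(round(sma_forecast(2006)))  # 189972
print(round(sma_forecast(2015)))  # 245527
```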

## Forward-Backward Algorithm

The Forward-Backward algorithm, applied to hidden Markov models (HMMs), evaluates the probability of a sequence of observations occurring when following a given sequence of states. This can be stated as computing

P(X(j) = i | y(1), ..., y(k))

where A and B are the transition and emission matrices, x is the sequence of states, y is the sequence of observations, i is the current state, k is the number of time steps, and X(j) = i denotes being in state i at time j.

## Forward-Backward (HMM) (cont.)

In an HMM, you observe a sequence of emissions but do not know the sequence of states the model went through to generate those emissions. Analyses of HMMs seek to recover the sequence of states from the observed data.

## Forward-Backward Implementation

```python
states = ('Future', 'Past')
end_state = 'E'

observations = ('260', '246', '250')

start_probability = {'Future': 0.6, 'Past': 0.4}

transition_probability = {
    'Future': {'Future': 0.69, 'Past': 0.3,  'E': 0.01},
    'Past':   {'Future': 0.4,  'Past': 0.59, 'E': 0.01},
}

emission_probability = {
    'Future': {'260': 0.5, '246': 0.4, '250': 0.1},
    'Past':   {'260': 0.1, '246': 0.3, '250': 0.6},
}
```

| Year | Income ($) | Forecast (p) |
|------|------------|--------------|
| 2012 | 239,493    | 225,245      |
| 2013 | 246,313    | 240,625      |
| 2014 | 250,775    | 246,621      |
| 2015 | P(x,y)     | 255,682      |

Backward probabilities for the three time steps:

```
{'Past': 0.00109578, 'Future': 0.0010418399999999998}
{'Past': 0.00394,    'Future': 0.00249}
{'Past': 0.01,       'Future': 0.01}
```

Posterior (smoothed) state probabilities:

```
{'Past': 0.1229889624426741, 'Future': 0.8770110375573259}
{'Past': 0.376771969049046,  'Future': 0.623228030950954}
{'Past': 0.7890472951586943, 'Future': 0.2109527048413057}
```

Run time: 0.0318363 s

## Forward-Backward (cont.)

Backward probability: assume that we start in a particular state (X(t) = x(i)). With T the transition matrix, the backward pass propagates a column vector from the end of the sequence toward the start.
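The recursions behind these numbers can be sketched as a minimal, self-contained forward-backward pass. This is an illustrative implementation, not the exact script that was timed; it assumes the start distribution {'Future': 0.6, 'Past': 0.4} (the two start probabilities must sum to 1, and this pair reproduces the smoothed probabilities shown in this report):

```python
# Model parameters from this report (start distribution assumed 0.6/0.4)
states = ('Future', 'Past')
end_state = 'E'
observations = ('260', '246', '250')
start_p = {'Future': 0.6, 'Past': 0.4}
trans_p = {'Future': {'Future': 0.69, 'Past': 0.3,  'E': 0.01},
           'Past':   {'Future': 0.4,  'Past': 0.59, 'E': 0.01}}
emit_p = {'Future': {'260': 0.5, '246': 0.4, '250': 0.1},
          'Past':   {'260': 0.1, '246': 0.3, '250': 0.6}}

def forward_backward(obs, states, start_p, trans_p, emit_p, end_state):
    # Forward pass: fwd[t][s] = P(y_1..y_t, X_t = s)
    fwd = []
    for t, y in enumerate(obs):
        cur = {}
        for s in states:
            if t == 0:
                prior = start_p[s]
            else:
                prior = sum(fwd[-1][r] * trans_p[r][s] for r in states)
            cur[s] = emit_p[s][y] * prior
        fwd.append(cur)

    # Backward pass: bwd[t][s] = P(y_{t+1}..y_T, end | X_t = s),
    # seeded with the transition into the end state
    bwd = [None] * len(obs)
    bwd[-1] = {s: trans_p[s][end_state] for s in states}
    for t in range(len(obs) - 2, -1, -1):
        bwd[t] = {s: sum(trans_p[s][r] * emit_p[r][obs[t + 1]] * bwd[t + 1][r]
                         for r in states)
                  for s in states}

    # Smoothing: posterior[t][s] is proportional to fwd[t][s] * bwd[t][s]
    posterior = []
    for t in range(len(obs)):
        norm = sum(fwd[t][s] * bwd[t][s] for s in states)
        posterior.append({s: fwd[t][s] * bwd[t][s] / norm for s in states})
    return fwd, bwd, posterior

fwd, bwd, posterior = forward_backward(observations, states, start_p,
                                       trans_p, emit_p, end_state)
```

The backward pass is seeded with the transition probability into the end state (0.01 for both states), which is why the last backward entry is {'Past': 0.01, 'Future': 0.01}; each smoothed posterior is the normalized product of the forward and backward values at that step.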

## Forecast Comparison

Error comparison (mean absolute error, as defined in the problem statement):

- MAE for SMA: 18,380.5
- MAE for ARIMA: 1,909.5
- MAE for FB: 8,029.67

## Conclusion

- ARIMA was the most accurate of the three algorithms.

- The ARIMA algorithm can be used to make fairly accurate yearly income predictions.
- The HMM Forward-Backward was the second-best model; however, its results can vary depending on the initial states, so the result can differ depending on the probability of reaching the next state. HMMs are well suited to other types of predictions, such as weather, shuffling, or stocks, where the states change constantly.

## Other Work in Income Forecasting

- Digit has an algorithm that learns a user's spending and earning patterns.
- A website called Buxfer offers personal-finance forecasting as well.
- There has been much work comparing the algorithms themselves, such as "Time Series Prediction Algorithms" by Kumara M.P.T.R.

## Questions and Answers

Q: What is the difference between using the ARMA and the ARIMA models?

A: The ARIMA model converts non-stationary data to stationary (by differencing) before operating on it.

Q: Why is the simple moving average (SMA) less accurate than ARIMA?

A: ARIMA takes more constraints into account: central moving averages (CMA), weighted moving averages (WMA), seasonality, and lag (failure to maintain a desired pace).

Q: What is the Big-O complexity of the HMM Forward-Backward algorithm?

A: O(n²·m), where n is the number of hidden (latent) states and m is the number of observed time steps.

Questions?

## References

1. https://en.wikipedia.org/wiki/Forecasting
2. http://www.slideshare.net/tharindurusira/time-series-prediction-algorithms-literature-review
3. http://www.bloomberg.com/news/articles/2015-07-14/should-you-let-an-algorithm-do-your-saving-for-you
4. https://catalog.data.gov/dataset?groups=finance3432#topic=finance_navigation
5. https://people.csail.mit.edu/rameshvs/content/hmms.pdf
6. http://web.stanford.edu/class/cs227/Readings/ConstraintsSurveyByKumar.pdf
7. http://docs.roguewave.com/imsl/java/6.1/manual/WordDocuments/api/com/imsl/stat/AutoARIMAEx2.htm