MCS 507 --- Individual MATLAB Problem 2 --- Fall 2004
Method of Least Squares for S&P 500 Index:
MATLAB Comparisons: Slash, Polyfit and SVD


MATLAB Output is Due Wednesday 20 October 2004 in Class.


General Problem Objectives:

Often in industry, a worker has to test several numerical procedures before selecting the one to be used in production. In MATLAB you can fit dependent ordinate data (e.g., in y) as a function of the independent coordinate x by three (3) methods, as long as the function being fitted is a polynomial; you will use two (2) of them. The model polynomials will be cubics (degree 3) and will be used to fit, separately, the quarterly Stock Market Index Log-Return Means and the quarterly Log-Return Volatilities (Standard Deviations) for Standard and Poor's 500 Data for the 16 years from 1988 to 2003.

Choose at least one of the first two fit methods in addition to SVD:

  1. polyfit and polyval polynomial functions (use the MATLAB help polyfit and help polyval commands).

  2. \ back-slash or (pseudo-)inverse operator (See MATLAB help SLASH).

  3. svd or Singular Value Decomposition function (see MATLAB help svd).
Remark: If you are familiar with the statistical packages SAS or SPSS, you can substitute for "svd" the GLM or Reg function of SAS or the Regression function of SPSS, and similarly for public-domain statistics packages like R.

The objective is to compare two of the three methods by fitting existing data, to see which of the two you should recommend to the boss.


Problem Statement:

The problem is to compare two of the three methods on the Standard and Poor's 500 Stock Index Data for the 16 years from 1988 to 2003 by fitting both the quarterly stock log-return means and log-return volatilities (standard deviations).

  1. S&P 500 Data:

    The data in raw form is available at Yahoo! Finance, "S&P 500, Symbol ^GSPC, Historical Prices," starting at URL:

    then Set Date Range -> Get Prices -> Download to Spreadsheet. The data is a table of Date, Open, High, Low, Close, Volume, Adj. Close items in reverse chronological order, but has been converted and edited to plain text format listing the date and closings only at

    with convenient slash field delimiters for the date, along with other MS Excel manipulations to simplify the input. A sample of the records is the first three lines of the date-closing file:

    and the last three lines:

    representing the "Day/Month/Year   Close" fields, respectively. The very last line is for the last trading day in 1987, to allow calculation of the change for the first trading day of 1988. Also, the data comes in backward time order and will have to be put in forward order.

    The number of trading days is about 250 per year but varies slightly, since the market is closed on weekends and holidays; these factors need to be taken into account to avoid bias when the data is used in investment models. Hence, the days of each year must be renumbered to exclude non-trading days and renormalized as the fraction of trading days for each year. Reading the needed data is also complicated, so, if desired, the following MATLAB script can be used to read the numerical data fields separately:

    Note that the data is stored as one long single row vector of length ndatecol, which is the total number of trading days. It is suggested that you separate the data into numerical vectors for Day, Month, Year and Close, each in proper time order.

  2. Log-Returns:

    The objective in this problem is to compute the log-returns (roughly the relative returns for small changes) of the entire closing vector, for example

  3. Floating Point Time Conversion:

    For each of the 4*16 = 64 quarters take the midpoint of the quarter

    noting that "(1988-1)" has replaced "1988" since iy=1 should start TM with 1988, for iy = 1:16 years and iqy = 1:4 quarters (Jan-Mar, Apr-Jun, Jul-Sep, Oct-Dec) of each year. A better calculation would count the official trading days and take the midpoint of each quarter's trading days, converted into a fraction of a year, but the above formula should be used for simplicity.

  4. Quarterly Means and Volatilities:

    The ultimate objective is to fit the means for each quarter, computed using the MATLAB mean function, and the volatilities for each quarter, computed using the MATLAB std function.

  5. Fitting Quarterly Means and Volatilities:

    Two cubic polynomial fit models are required, one for quarterly log-return means and one for quarterly log-return volatilities. Assign the quarterly means and volatilities to the time at

  6. Better Conditioned Mid-Year Variable:

    When polyfit is used with the raw mid-quarter times, MATLAB will rightly complain that the fit is badly conditioned, so you must center (c) and scale (s) the quarterly time at mid-quarter, such that
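Steps 2 to 6 above can be strung together as in the following minimal sketch. It runs on synthetic closing prices rather than the S&P 500 file, and several choices are assumptions rather than part of the assignment: the variable names LogRet, QMean and QVol, the mid-quarter formula (chosen only to agree with the "(1988-1)" remark in step 3), and centering and scaling by the mean and standard deviation of TM.

```matlab
% Synthetic stand-in for the data: 21 trading days per month, 1988-2003.
% (Hypothetical values; the real vectors come from the date-closing file.)
ny = 16; nq = 4; ndm = 21;
Month = repmat(kron(1:12, ones(1,ndm)), 1, ny);   % month of each trading day
Year  = kron(1987 + (1:ny), ones(1,12*ndm));      % year of each trading day
nd = numel(Year);
Close0 = 247;                                     % hypothetical last close of 1987
Close  = Close0*exp(cumsum(0.0004 + 0.01*sin(0.7*(1:nd))));

% Step 2: log-returns, one per trading day of 1988-2003, in forward time order.
LogRet = diff(log([Close0 Close]));

% Steps 3-4: mid-quarter times, then quarterly means and volatilities.
TM = zeros(1, ny*nq); QMean = TM; QVol = TM;
for iy = 1:ny
    for iqy = 1:nq
        iq = nq*(iy-1) + iqy;
        TM(iq) = (1988-1) + iy + (iqy - 0.5)/nq;  % assumed mid-quarter formula
        inq = (Year == 1987 + iy) & (ceil(Month/3) == iqy);
        QMean(iq) = mean(LogRet(inq));
        QVol(iq)  = std(LogRet(inq));
    end
end

% Step 6: centered and scaled time (assumed choice: c = mean, s = std).
c = mean(TM); s = std(TM);
tpoly = (TM - c)/s;
```

With these choices TM runs from 1988.125 to 2003.875 in steps of 0.25, and tpoly has zero mean and unit standard deviation, which keeps the powers up to tpoly.^3 of modest size.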


Some Hints on Methods:

  1. Use the MATLAB ones and size functions to construct the coefficient matrix A that multiplies the vector of polynomial coefficients a = [a1 a2 a3 a4] in the cubic case:

    noting that the array or component-wise exponentiation operator (.^) is needed rather than the regular matrix exponentiation operator (^) (WATCH those periods!).

  2. Use polyfit to get the fit coefficient vector

      "a = apoly",

    where "a" is the coefficient notation used in class, from the input

      "x = tpoly"

    vector data and the corresponding output

      "y = y-data"

    vector data, then use polyval to compute the predicted values of

      "ypred = ypoly".

  3. Find out how to use the vector output arguments of

    and

    to plot the 95% confidence intervals as upper and lower bounding curves

    against "the Time variable" for appropriate values of the multiplier "c", which is the ratio of "delpoly95" to "delpoly50 = delpoly", where "delpoly95" is the increment above and below the fit for a 95% confidence interval. Struct is a structure that is used to estimate the errors in the fit, and it is wise to use polyval to calculate the predicted polynomial values since the order of the MATLAB coefficient vector is not the same as that used in class. See for specification on the 50% error bounds. Caution: One MATLAB guide says that "c" should be "2", but "c" is very close to "3" for the usual normal distribution assumption.

  4. Use the back-slash function to solve the A*a=y problem with

    say, since when A has more rows (m) than columns (n) the back-slash operator finds the least squares solution, rather than applying the inverse as it does when m = n. Do not forget that you have to find the predicted values y = yslash given tpoly.

  5. Use svd (see help svd) to get the Singular Value Decomposition A = U*S*V', where V' is MATLAB for the transpose of V. Due to the unusual format of the MATLAB svd, you will have to do the SVD inversions with extra parentheses or extra steps to avoid MATLAB matrix algebra confusion, e.g.,

    Again you need to find the predicted y-values, say ysvd, so that they can be compared with the corresponding values from the other methods, say ypoly and yslash.
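Hints 1 to 5 can be tried together on a small synthetic data set before moving to the quarterly data, as in the sketch below. The data vector y is made up, the class ordering a1 + a2*t + a3*t^2 + a4*t^3 is an assumption, and the names aslash, asvd, yupper and ylower are illustrative. The multiplier c is computed here from normal quantiles with erfinv, which is one possible way to arrive at the value near 3 mentioned in hint 3.

```matlab
% Synthetic stand-in for the scaled mid-quarter times and the quarterly data.
tpoly = linspace(-1.7, 1.7, 64)';
y = 0.5 - 0.2*tpoly + 0.1*tpoly.^3 + 0.05*sin(7*tpoly);

% Hint 1: coefficient matrix for a1 + a2*t + a3*t^2 + a4*t^3 (assumed class order).
A = [ones(size(tpoly)) tpoly tpoly.^2 tpoly.^3];

% Hints 2-3: polyfit returns the coefficients in descending powers,
% so polyval is used to evaluate the fit and its 50% error estimate.
[apoly, Struct] = polyfit(tpoly, y, 3);
[ypoly, delpoly] = polyval(apoly, tpoly, Struct);
c = erfinv(0.95)/erfinv(0.50);   % ratio of 95% to 50% normal half-widths, about 2.9
yupper = ypoly + c*delpoly;
ylower = ypoly - c*delpoly;

% Hint 4: least squares solution by back-slash, then predicted values.
aslash = A\y;
yslash = A*aslash;

% Hint 5: economy-size SVD, A = U*S*V', inverted one factor at a time.
[U, S, V] = svd(A, 0);
asvd = V*(S\(U'*y));
ysvd = A*asvd;

plot(tpoly, y, 'o', tpoly, ypoly, '-', tpoly, yupper, '--', tpoly, ylower, '--');
xlabel('scaled mid-quarter time, tpoly'); ylabel('y and cubic fit');
legend('data', 'polyfit', 'upper 95%', 'lower 95%');
```

Under the assumed class ordering, fliplr(apoly)' should agree with aslash and asvd up to rounding error, which is a quick check that all three methods solve the same least squares problem.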


General Instructions:

For the two methods used to fit cubic polynomial models to both the quarterly means and the quarterly volatilities, you must also present documented output for:

  1. Plots using the MATLAB plot function comparing "y", "ypoly", "yslash" and "ysvd" against the scaled time vector "tpoly", with appropriate labels, where "y" is either the quarterly mean or volatility vector.

  2. Standard deviations (see help std) of the residual or deviation vectors, i.e., the differences between each of your two of the three predicted vectors ypoly, yslash and ysvd and the original vector y, where y is either the quarterly mean or volatility vector. This measures the least squares differences between the quarterly data and the cubic model at the same mid-quarter times.

  3. Standard deviations (std) of the differences (deviations) between the predicted vectors from each of the two of three methods. Is there much difference between the two of three methods considering the numerical precision in MATLAB (put this answer in your problem comments)? Make a table summarizing the variances of the differences: ypoly versus yslash, yslash versus ysvd and ysvd versus ypoly, where y is either the quarterly mean or volatility vector.

  4. Use the etime (see help etime) function of MATLAB to time each of the three methods, adding the etime needed to calculate the normal matrix A only to the "slash" and "svd" methods. The purpose here is to measure the efficiency of your MATLAB code. (Caution: Newer versions of MATLAB do not have flops, until a fix is prepared.)
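The comparisons and timings in items 2 to 4 might be organized along the following lines. This is only a sketch on synthetic data: the vector y, the timing variable names and the printed layout are all assumptions, and all three methods are shown even though only two are required.

```matlab
% Synthetic stand-in for the scaled times and one quarterly data vector.
tpoly = linspace(-1.7, 1.7, 64)';
y = 0.5 - 0.2*tpoly + 0.1*tpoly.^3 + 0.05*sin(7*tpoly);

% polyfit/polyval, timed with clock and etime.
t0 = clock;
apoly = polyfit(tpoly, y, 3);
ypoly = polyval(apoly, tpoly);
timepoly = etime(clock, t0);

% Back-slash, including the time to build A.
t0 = clock;
A = [ones(size(tpoly)) tpoly tpoly.^2 tpoly.^3];
yslash = A*(A\y);
timeslash = etime(clock, t0);

% SVD, again including the time to build A.
t0 = clock;
A = [ones(size(tpoly)) tpoly tpoly.^2 tpoly.^3];
[U, S, V] = svd(A, 0);
ysvd = A*(V*(S\(U'*y)));
timesvd = etime(clock, t0);

% Item 2: standard deviations of the residuals from the data.
sdres = [std(y - ypoly), std(y - yslash), std(y - ysvd)];
% Item 3: standard deviations of the differences between methods.
sddiff = [std(ypoly - yslash), std(yslash - ysvd), std(ysvd - ypoly)];

fprintf('method   std(residual)   etime (s)\n');
fprintf('polyfit  %13.6e   %9.6f\n', sdres(1), timepoly);
fprintf('slash    %13.6e   %9.6f\n', sdres(2), timeslash);
fprintf('svd      %13.6e   %9.6f\n', sdres(3), timesvd);
fprintf('std of differences (poly-slash, slash-svd, svd-poly): %g %g %g\n', sddiff);
```

For a problem this small a single call is likely to finish in less time than the clock can resolve, so one option is to repeat each method many times inside a loop and divide the elapsed time by the number of repetitions.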


Project Report:

Your professional, individual report needs the following parts:

  1. Cover Page: Put a project title, your name, your affiliation, date and other identifying information on this individual computer project. What you submit must be your own work (this is NOT a group project) and points will be deducted for similar work.

  2. Executive Summary: This is about a page summarizing the project and your results for a busy boss. This should be in the form of an outline or itemized list for easy and fast reading. Also, a summary graph would be helpful.

  3. Project Description or Introduction: Describe the project in your own words as an introduction to your report, in sufficient depth so that a reader such as yourself would understand it.

  4. Methods: Describe the mathematics and the algorithms behind the methods used to solve the problem, giving both advantages and disadvantages in a fair manner.

  5. Results: Describe the nature of the results and illustrate them with appropriate tables or plots. You can use MATLAB for plotting your results. Clearly label tables and plot figures in a professional manner.

  6. Discussion: Discuss the results, including how they can be used elsewhere for different industrial applications. Explain how and why the methods differ or do not differ.

  7. Acknowledgements: Acknowledge the resources you used in this project, including which version of MATLAB you used, the operating system, the computer or hardware platform, persons consulted (important: grade is discounted for similar reports and unacknowledged use of other sources), and any other resources used (references are listed in the References section).

  8. Conclusions: List what you have learned from this project and explain why it is significant.

  9. References: Cite all books, scientific papers, web-sites and other library or web resources that you used. Give author, title, journal name or book publisher or URL where appropriate, and date of publication or web access.

  10. Appendices: Include documented MATLAB source code and output.


Project Resources:

  1. UIC PCLabs MATLAB available software. Also check out your departmental computers such as in EECS and MSCS.

  2. MATLAB Help Page (Hanson).


Web Source: http://www.math.uic.edu/~hanson/mcs507/cp2f04.html

MCS 507 HomePage: http://www.math.uic.edu/~hanson/mcs507/

Email Comments or Questions (MCS 507 only please) to Professor Hanson, hanson A T uic edu