Assignment 1 for CS9860A, 2013

Assignment 1 asks you to use Octave, MATLAB, WEKA, or any open-source machine learning software to perform linear regression, logistic regression, and regression with regularization. You may need to write some simple code for data transformation or cleaning before applying the software.
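Cleaning often means dropping incomplete examples and rescaling the features. The following is a minimal plain-Python sketch. It assumes comma-separated input in which missing values are marked with `?`, a convention many UCI datasets use (check each dataset's description). The function names `load_clean_rows` and `standardize` are hypothetical.

```python
import csv
import io
import statistics


def load_clean_rows(text, missing="?"):
    """Parse comma-separated text into rows of floats, skipping blank lines
    and any row containing the missing-value marker (assumed to be '?')."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record or any(field.strip() == missing for field in record):
            continue  # drop incomplete examples rather than guessing values
        rows.append([float(field) for field in record])
    return rows


def standardize(rows):
    """Rescale every column to zero mean and unit population standard deviation.
    Constant columns are only centered (divided by 1.0) to avoid division by zero."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


rows = load_clean_rows("1,2,3\n4,?,6\n7,8,9\n")
print(rows)               # [[1.0, 2.0, 3.0], [7.0, 8.0, 9.0]]
print(standardize(rows))  # [[-1.0, -1.0, -1.0], [1.0, 1.0, 1.0]]
```

In practice you would compute the means and standard deviations on the training features only and reuse them for any test data.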

Download at least one dataset for regression and one for classification from the UCI Machine Learning Repository. Each dataset should have at least 100 examples and 10 attributes. You may also create such datasets yourself.
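If you create the datasets yourself, one simple approach is to draw random attributes and derive both targets from them. The following is a minimal sketch that uses hypothetical random weights, attributes in [-1, 1], and a fixed seed. It produces a regression target that is a linear function of the attributes. It also produces a binary class label that is 1 when that target is at or above its upper median.

```python
import random


def make_datasets(n=100, d=10, seed=0):
    """Generate n examples with d numeric attributes in [-1, 1].

    Returns (X, y_reg, y_cls): y_reg is a linear function of the attributes
    with hypothetical random weights, and y_cls is 1 when y_reg is at or
    above the upper median of y_reg, else 0.
    """
    rng = random.Random(seed)  # fixed seed so the data are reproducible
    weights = [rng.uniform(-2.0, 2.0) for _ in range(d)]
    X = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(n)]
    y_reg = [sum(w * x for w, x in zip(weights, row)) + 0.5 for row in X]
    upper_median = sorted(y_reg)[n // 2]
    y_cls = [1 if y >= upper_median else 0 for y in y_reg]
    return X, y_reg, y_cls


X, y_reg, y_cls = make_datasets()
print(len(X), len(X[0]), sum(y_cls))  # 100 10 50
```

Because `y_reg` here is exactly linear, part a below calls for adding random noise to it before fitting.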

a. Apply linear regression to the regression dataset and logistic regression to the classification dataset. If a dataset is already almost perfectly linear, add some random noise to its Y (target) variable.
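Two steps of part a can be sketched in plain Python: adding Gaussian noise to the target, and fitting logistic regression by batch gradient descent on the mean cross-entropy loss. The sketch assumes each example is a list of floats that starts with a constant 1.0 intercept term. The function names, the learning rate `alpha = 0.5`, the 2000 iterations, and the toy data are illustrative choices and are not prescribed by the assignment. In Octave the same update is usually written in vectorized form.

```python
import math
import random


def add_noise(y, sigma, seed=0):
    """Return a copy of the targets with zero-mean Gaussian noise (std sigma) added."""
    rng = random.Random(seed)
    return [yi + rng.gauss(0.0, sigma) for yi in y]


def sigmoid(z):
    """Logistic function, written to avoid math.exp overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logistic_regression(X, y, alpha=0.5, iters=2000):
    """Fit weights by batch gradient descent on the mean cross-entropy loss.
    Each row of X must begin with a constant 1.0 for the intercept."""
    m, d = len(X), len(X[0])
    theta = [0.0] * d
    for _ in range(iters):
        probs = [sigmoid(sum(t * x for t, x in zip(theta, row))) for row in X]
        grad = [sum((p - yi) * row[j] for p, yi, row in zip(probs, y, X)) / m
                for j in range(d)]
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta


def predict(theta, row):
    """Classify as 1 when the predicted probability is at least 0.5."""
    return 1 if sigmoid(sum(t * x for t, x in zip(theta, row))) >= 0.5 else 0


# Hypothetical 1-D data that is not linearly separable:
# the labels at x = -0.5 and x = 0.5 go against the overall trend.
xs = [-3.0, -2.0, -1.0, -0.5, 0.5, 1.0, 2.0, 3.0]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
X = [[1.0, x] for x in xs]
theta = logistic_regression(X, labels)
accuracy = sum(predict(theta, row) == yi for row, yi in zip(X, labels)) / len(X)
print(accuracy)  # 0.75
```

The two overlapping points are the only misclassified examples, so the training error of this linear decision boundary is 2 of 8.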

b. Analyze the training error from part a. Add new transformed attributes (such as x squared) to reduce the error.
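Part b works as follows. When the target depends on x squared, a model that uses only x underfits. Appending x squared as a new attribute column lets the same linear least-squares fit drive the training error down. The following is a minimal plain-Python sketch on a hypothetical one-attribute dataset with y = x squared. `solve` is a small Gaussian-elimination helper that stands in for Octave's backslash operator.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_least_squares(X, y):
    """Ordinary least squares via the normal equation (X^T X) theta = X^T y."""
    d = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(d)] for i in range(d)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(d)]
    return solve(A, b)


def training_mse(X, y, theta):
    """Mean squared error of the linear model theta on the training data."""
    return sum((sum(t * x for t, x in zip(theta, row)) - yi) ** 2
               for row, yi in zip(X, y)) / len(y)


# Hypothetical data: a target that is quadratic in a single attribute x.
xs = [i / 10 for i in range(-20, 21)]
y = [x * x for x in xs]

X_linear = [[1.0, x] for x in xs]        # intercept + x
X_quad = [[1.0, x, x * x] for x in xs]   # intercept + x + new attribute x squared

mse_linear = training_mse(X_linear, y, fit_least_squares(X_linear, y))
mse_quad = training_mse(X_quad, y, fit_least_squares(X_quad, y))
print(round(mse_linear, 3), mse_quad < 1e-12)  # 1.565 True
```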

c. Observe that taking part b to the extreme can cause overfitting of the training data. Add regularization to see how it prevents overfitting.
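The following is a minimal sketch of part c. It assumes L2 (ridge) regularization, which is one common choice. The assignment does not specify the penalty. Degree-7 polynomial attributes play the role of "b taken to the extreme", and the intercept is left unpenalized. The data, degree, and lambda values are hypothetical. A larger lambda shrinks the non-intercept weights and raises the training error. Whether it also lowers error on unseen points depends on the data, and the printed held-out MSE is there so you can check.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_ridge(X, y, lam):
    """Regularized normal equation (X^T X + lam * L) theta = X^T y, where L is the
    identity with its first diagonal entry zeroed so the intercept is not penalized."""
    d = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j and i > 0 else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(d)]
    return solve(A, b)


def mse(X, y, theta):
    """Mean squared error of the linear model theta on (X, y)."""
    return sum((sum(t * x for t, x in zip(theta, row)) - yi) ** 2
               for row, yi in zip(X, y)) / len(y)


def poly_features(x, degree=7):
    """Hypothetical extreme version of part b: attributes 1, x, x^2, ..., x^degree."""
    return [x ** k for k in range(degree + 1)]


# Hypothetical data: the line y = x plus alternating +/-0.3 perturbations.
xs = [-1.0 + 2.0 * i / 9 for i in range(10)]
y = [x + 0.3 * (-1) ** i for i, x in enumerate(xs)]
X = [poly_features(x) for x in xs]

# Held-out points between the training points, labeled by the unperturbed line.
xs_test = [-1.0 + 2.0 * (i + 0.5) / 9 for i in range(9)]
X_test = [poly_features(x) for x in xs_test]

results = {}
for lam in (0.0, 0.1, 1.0):
    theta = fit_ridge(X, y, lam)
    weight_norm = sum(t * t for t in theta[1:]) ** 0.5
    results[lam] = (mse(X, y, theta), weight_norm)
    print(f"lambda={lam}: train MSE={results[lam][0]:.4f}, "
          f"||theta[1:]||={weight_norm:.3f}, test MSE={mse(X_test, xs_test, theta):.4f}")
```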

d. For at least one of the above, solve the problem with both gradient descent and the normal equation, and compare the results.
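For part d, the following minimal plain-Python sketch fits the same linear-regression problem both ways and checks that the two answers agree. The data, the learning rate `alpha = 0.1`, and the 1000 iterations are hypothetical choices. `solve` is a small Gaussian-elimination helper. In Octave one would typically use the backslash operator or `pinv(X' * X) * X' * y` instead.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def normal_equation(X, y):
    """Closed-form least squares: solve (X^T X) theta = X^T y."""
    d = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(d)] for i in range(d)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(d)]
    return solve(A, b)


def gradient_descent(X, y, alpha=0.1, iters=1000):
    """Batch gradient descent on J(theta) = (1 / 2m) * sum((X theta - y)^2)."""
    m, d = len(X), len(X[0])
    theta = [0.0] * d
    for _ in range(iters):
        errors = [sum(t * x for t, x in zip(theta, row)) - yi for row, yi in zip(X, y)]
        grad = [sum(e * row[j] for e, row in zip(errors, X)) / m for j in range(d)]
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta


# Hypothetical data: a 3 x 3 grid of two attributes,
# y = 1 + 2*x1 - 3*x2 plus small fixed perturbations.
grid = [(x1, x2) for x1 in (-1.0, 0.0, 1.0) for x2 in (-1.0, 0.0, 1.0)]
noise = [0.03, -0.01, 0.02, 0.0, -0.02, 0.01, -0.03, 0.02, -0.01]
X = [[1.0, x1, x2] for x1, x2 in grid]
y = [1.0 + 2.0 * x1 - 3.0 * x2 + e for (x1, x2), e in zip(grid, noise)]

theta_ne = normal_equation(X, y)
theta_gd = gradient_descent(X, y)
print(max(abs(a - b) for a, b in zip(theta_ne, theta_gd)) < 1e-9)  # True
```

The normal equation solves a d x d system in one step, at roughly cubic cost in d, with no learning rate to tune. Gradient descent needs a learning rate and an iteration count, but each iteration costs only O(m d), so it scales better when the number of attributes is large. Well-scaled attributes, such as this symmetric grid, help it converge quickly.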

The deadline for submitting Assignment 1 is Nov 5. How to submit: TBD.