Project 3 - Linear Regression (Due 11/09)
Objective:
This project focuses on regression, in contrast to the classification tasks you worked on in the previous two projects. You will implement and compare various linear regression approaches.
An active Kaggle competition dataset is used so that you can see how your results compare with those of peers around the world.
Data Sets:
The House Prices - Advanced Regression Techniques dataset from https://www.kaggle.com/competitions/house-prices-advanced-regression-techniques
Tasks:
- Task 1 (20 pts): Visualize the dataset on a 2-D plane using PCA and t-SNE. Calculate the reconstruction error introduced by PCA, and experiment with t-SNE's parameters to find the combination that produces the best visual effect; describe the process that led you there. Note that from this project onward, standardization is assumed by default even when not explicitly mentioned. If you were to convert the regression problem into a classification problem, how many classes should there be based on the visualization? Write pseudo-code for this conversion. (A sketch of a possible PCA/t-SNE workflow is given after the task list.)
- Task 2 (45 pts): Implement multiple linear regression, polynomial regression, and logistic regression yourself. Compare and analyze the results; that is, explain why certain regression methods work better than others for this dataset. (A from-scratch linear regression sketch is given after the task list.)
- Task 3 (25 pts): Call the ridge regression, LASSO regression, ElasticNet regression, decision tree regression, and SVM regression routines in sklearn and discuss the results. (A sketch of these sklearn calls is given after the task list.)
- Task 4 (10 pts): Study the leaderboard and describe the approaches taken by the top 3 entrants. Discuss how you might improve your own approach.
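Starter Sketches:
The following is a minimal sketch of the Task 1 workflow, not a required solution. It assumes the competition's train.csv is in the working directory, uses only numeric columns with simple median imputation, and bins SalePrice into four quartile classes purely as an illustrative choice; your own analysis should justify the number of classes and the t-SNE settings.
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

df = pd.read_csv("train.csv")                       # assumes the Kaggle train.csv is local
y = df["SalePrice"].values
X = df.select_dtypes(include=np.number).drop(columns=["SalePrice", "Id"], errors="ignore")
X = X.fillna(X.median())                            # simple imputation, good enough for a sketch

X_std = StandardScaler().fit_transform(X)           # standardization assumed by default

# PCA to 2-D and its reconstruction error
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_std)
X_rec = pca.inverse_transform(X_pca)
recon_error = np.mean((X_std - X_rec) ** 2)
print(f"PCA reconstruction error (MSE): {recon_error:.4f}")
print(f"Explained variance ratio: {pca.explained_variance_ratio_}")

# t-SNE: perplexity and learning_rate are the main parameters to tune
tsne = TSNE(n_components=2, perplexity=30, learning_rate="auto",
            init="pca", random_state=0)
X_tsne = tsne.fit_transform(X_std)

# Convert the regression target to classes by binning SalePrice (4 quartile bins here)
labels = pd.qcut(y, q=4, labels=False)

fig, axes = plt.subplots(1, 2, figsize=(12, 5))
axes[0].scatter(X_pca[:, 0], X_pca[:, 1], c=labels, s=8, cmap="viridis")
axes[0].set_title("PCA (2-D)")
axes[1].scatter(X_tsne[:, 0], X_tsne[:, 1], c=labels, s=8, cmap="viridis")
axes[1].set_title("t-SNE (2-D)")
plt.show()
```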
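For Task 2, the sketch below shows one way to implement multiple linear regression from scratch via the least-squares normal equation, assuming X_std and y from the previous sketch. Polynomial regression can reuse the same solver on expanded features, and logistic regression would replace the closed form with gradient descent on the cross-entropy loss; those variations are left to you.
```python
import numpy as np

def fit_linear_regression(X, y):
    """Return coefficients (intercept first) from the least-squares normal equation."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])    # prepend a bias column of ones
    # lstsq uses the pseudo-inverse, which is more stable than inverting X^T X directly
    theta, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return theta

def predict(X, theta):
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    return Xb @ theta

theta = fit_linear_regression(X_std, y)              # X_std, y from the Task 1 sketch
rmse = np.sqrt(np.mean((predict(X_std, theta) - y) ** 2))
print(f"Training RMSE: {rmse:.2f}")
```
Note that training RMSE alone is not a fair comparison across models; hold out a validation set or use cross-validation when you analyze which regression works better.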
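For Task 3, a minimal sketch of the sklearn calls is shown below, again assuming X_std and y from the Task 1 sketch. The hyperparameter values are illustrative defaults, not tuned settings; part of the task is tuning them and discussing how the results differ.
```python
from sklearn.linear_model import Ridge, Lasso, ElasticNet
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score

models = {
    "Ridge": Ridge(alpha=1.0),
    "LASSO": Lasso(alpha=0.1),
    "ElasticNet": ElasticNet(alpha=0.1, l1_ratio=0.5),
    "DecisionTree": DecisionTreeRegressor(max_depth=6, random_state=0),
    "SVR": SVR(kernel="rbf", C=10.0),
}

for name, model in models.items():
    # 5-fold cross-validated RMSE for a rough comparison across models
    scores = cross_val_score(model, X_std, y, cv=5,
                             scoring="neg_root_mean_squared_error")
    print(f"{name:12s} RMSE: {-scores.mean():.2f} (+/- {scores.std():.2f})")
```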