Subject Code & Title :- DTSC12-200 Logistic Regression
Assessment Type :- Assignment 2
Assignment description :-
The same bank has hired you again, and management now wants to analyse the same data you analysed before for potential customers, this time using a different algorithm. They have heard about Logistic Regression and are keen to see whether it would improve their understanding of the data.
DTSC12-200 Logistic Regression Assignment 2
The dataset can be found at UCI:
The dataset has 48842 instances, already split into training and test sets. The datasets are in csv format, but the files need to be renamed to give them the .csv extension. There are 14 attributes, and there are missing values in some columns/rows. The missing values can be treated as a category of their own within that column, OR the rows with missing values can be deleted from the dataset. See the explanation and the accuracy obtained by the author in the file adult.name(.txt). The first 14 columns are the attributes (features) and the last column is the class label.
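The two options for missing values can be sketched as follows. This is an illustrative Python sketch only, not the R Markdown workflow the assignment requires. It assumes missing entries are marked with `?`; the three sample rows, the helper names, and the `Unknown` label are made up for illustration.

```python
import csv
import io

# Hypothetical sample with three attributes plus the class;
# the real files have 14 attributes plus the class.
SAMPLE = """39, State-gov, Bachelors, <=50K
50, ?, Bachelors, >50K
38, Private, ?, <=50K
"""

MISSING = "?"  # assumption: marker used for unknown values


def read_rows(text):
    """Parse comma-separated text and strip whitespace around each field."""
    reader = csv.reader(io.StringIO(text))
    return [[field.strip() for field in row] for row in reader if row]


def drop_missing(rows):
    """Option 1: delete every row that contains at least one missing value."""
    return [row for row in rows if MISSING not in row]


def missing_as_category(rows, label="Unknown"):
    """Option 2: keep every row and treat the missing marker as its own category."""
    return [[label if field == MISSING else field for field in row] for row in rows]


rows = read_rows(SAMPLE)
complete_rows = drop_missing(rows)
recoded_rows = missing_as_category(rows)
print(len(rows), len(complete_rows), len(recoded_rows))
```

Option 1 shrinks the dataset, while option 2 keeps every instance at the cost of an extra category per affected column. Whichever option is chosen should be applied in the same way to both the training and the test files.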
The dataset is already split into train-test (2/3, 1/3 random). The 48842 instances are split into train=32561 and test=16281. If the unknown values are removed, then 45222 instances are left, split into train=30162, test=15060.
Your task in this assignment is to use Logistic Regression to model the problem, and to come up with the best possible classification for the test set. The classifier needs to classify each instance into either >50K or <=50K for the predicted income. Remember that during training, only the training data can be used to build the model. The classifier metrics should always be based on the test set.
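The train-then-test discipline described above can be illustrated with a minimal sketch. It is written in plain Python for illustration only (the assignment itself expects R Markdown), and it fits a logistic regression by batch gradient descent on made-up toy data with a single feature. The function names, learning rate, and number of epochs are assumptions, not values prescribed by the assignment.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(X, w, b):
    """Return the predicted probability of the positive class for each row."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


# Made-up toy data: one scaled feature; label 1 stands for ">50K".
X_train = [[-2.0], [-1.5], [-1.0], [-0.5], [0.5], [1.0], [1.5], [2.0]]
y_train = [0, 0, 0, 0, 1, 1, 1, 1]
X_test = [[-1.2], [-0.8], [0.8], [1.2]]
y_test = [0, 0, 1, 1]

# The model sees only the training data while fitting.
w, b = fit_logistic(X_train, y_train)

# Metrics are computed on the held-out test data.
test_probs = predict_proba(X_test, w, b)
test_pred = [1 if p >= 0.5 else 0 for p in test_probs]
test_accuracy = sum(p == t for p, t in zip(test_pred, y_test)) / len(y_test)
print(test_accuracy)
```

The 0.5 cut-off used here is only a default; the choice of threshold is a separate decision that depends on the bank's objective.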
Deliverables: 2 pdf reports and the corresponding rmd files
The submission should include 2 pdf files both produced with rmd files (e.g., with different chunk options). The 2 pdf files are described below:
– Technical Report: all the code and the results including partial results should be visible. This would be used by the data analytics team at the bank to compare to other methods they may use.
– Management Report: a partial report with only the necessary items to help managers understand how the model might work for them. The management report should have no more than 3 pages.
Remember to ensure that you use rmd files to produce both reports. Reports not created directly from the rmd may be penalised.
Tips :-
You should try different ways of using logistic regression. A threshold can be adopted depending on the bank's objective, e.g., which error type is favoured by the bank? You can propose your own threshold.
– Compare the performance using different thresholds and state which one you adopted (and why).
– In the reports, explain the reasons you have chosen a particular threshold. Remember that metrics can often contradict each other (e.g., total accuracy differs from recall or precision, which will favour one class over the other). Show which goals you are achieving by comparing confusion matrices.
– For the management report, try to use simple words (avoid jargon) and lots of graph/table visualisations.
– You are allowed to reuse code from the workshops. You can also look for help on the Internet, but remember to follow the Academic Integrity Guidelines in Coding (pdf file in the assessment section).
– For DTSC71-200, compare at least 3 different models and assess them with extra metrics beyond accuracy and confusion matrices, e.g., precision and recall.
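The threshold comparison in the tips above can be sketched as follows. This is an illustrative Python sketch (the assignment itself expects R Markdown). The labels and predicted probabilities are made up, and the candidate thresholds 0.3, 0.5 and 0.7 are arbitrary choices rather than recommended values.

```python
def confusion_matrix(y_true, probs, threshold):
    """Count (TP, FP, TN, FN), predicting positive when prob >= threshold."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, probs):
        pred = 1 if p >= threshold else 0
        if pred == 1 and t == 1:
            tp += 1
        elif pred == 1 and t == 0:
            fp += 1
        elif pred == 0 and t == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def metrics(tp, fp, tn, fn):
    """Return (accuracy, precision, recall); undefined ratios are reported as 0.0."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Made-up test-set labels (1 stands for ">50K") and predicted probabilities.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
probs = [0.9, 0.7, 0.45, 0.35, 0.6, 0.4, 0.3, 0.2, 0.1, 0.05]

results = {}
for threshold in (0.3, 0.5, 0.7):
    cm = confusion_matrix(y_true, probs, threshold)
    results[threshold] = (cm, metrics(*cm))
    accuracy, precision, recall = results[threshold][1]
    print(f"threshold={threshold} TP,FP,TN,FN={cm} "
          f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

On this toy data, thresholds 0.3 and 0.5 give the same accuracy but very different recall, which is the kind of contradiction between metrics that the reports are expected to discuss. A lower threshold misses fewer >50K customers at the cost of more false alarms, while a higher threshold does the opposite.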