Course Case Part II

In this assignment, you are tasked with using the information in our course case to build a predictive model on a binary response variable (Y-variable). This assignment encompasses feature engineering, model preparation, variable selection, and model development. This assessment is worth 25% of the total grade for this course. Note that parts of this deliverable will be reused in the Analysis Report to Management assignment. Although there is no grade for insights in this deliverable, you are strongly encouraged to begin formulating and articulating them, as this will help save time later in the course.
A) Deliverable: Jupyter Notebook or .zip file containing your Jupyter Notebook. Please use the following naming convention: FamilyName_FirstName_A1_Classification_Analysis.ipynb
B) Modeling Criteria and Violation Penalties

Note that some criteria differ from those in A1: Regression Model Development (Individual). Your deliverable needs to meet the criteria listed below; failure to meet them will result in a reduction in your model points score. Your grade will be determined by the performance of your final model in scikit-learn as follows:

Final Model Points = Final Model Test Score (AUC) – Modeling Violation Penalties

Grading Table for Classification Model Development
Grade  Final Model Points
A      0.84 – 0.91
B      0.77 – 0.8399
C      0.70 – 0.7699 or 0.9101 – 1.00
D      0.50 – 0.6999
F      less than 0.50

Criterion 1 – Response Variable Usage
The response variable cannot be used in any form as an explanatory variable (the Y-variable cannot be used on the X-side).
Violation Penalty: Both of the following will occur if the response variable was used as an explanatory variable:
- The model will be rerun after this variable has been removed.
- Your final model points will be reduced by 0.025.

Criterion 2 – Model Types
Model types must be appropriate for the task at hand and come from statsmodels or scikit-learn (other packages and engines are not permitted).
Permitted Model Types:
- Logistic Regression
- K-Nearest Neighbors Classification (KNN)
- Classification Trees
- Random Forest (Classification)
- Gradient Boosted Models (GBM)
Note that you are permitted to adjust the optional arguments of the permitted model types.
Violation Penalty: Final models that are not in the list of permitted model types will be discarded, and the last appropriate model that ran in your code will be used as your final model. Final model points will be reduced by 0.025.

Criterion 3 – Code Is Well-Commented and Runs Without Errors
For this assignment, aim for a minimum of one quality comment for every 10 lines of code. Note that this criterion is less strict than in the Python for Data Science course, as you have developed further in your coding journey.
Violation Penalty: Not being well-commented will reduce your final model points by 0.025. Submitting code with at least one error will reduce your final model points by 0.025.

Criterion 4 – Code Processing Speed
Your code must process from beginning to end in 240 seconds or less, based on your computer's processing speed. Watch out for your hyperparameter tuning!
Violation Penalty: Not meeting this criterion will reduce your final model points by 0.025.

Criterion 5 – Model Output
Model results must be output as a dynamic string (i.e., f-string) at the end of your script. This must be the last thing that your Jupyter Notebook outputs. Writing this as markdown or exporting it as an Excel file is not acceptable (it must be a dynamic string). The output table of candidate models must be well-formatted and contain the following information:
- Model Name
- Training Accuracy
- Testing Accuracy
- AUC Score
- Confusion Matrix
It must be clear which model is your final model (label it accordingly).
Violation Penalty: Not including all of the above information in a well-formatted dynamic string will result in a 0.25 reduction in model points. If it is unclear which model was selected as the final model, the last model in the candidate model output will be utilized.

Criterion 6 – Model Parameter Requirements
The following requirements must be met when developing your models:
- random_state is set to 219
- test_size is set to 0.25
- the target variable is stratified
- max_depth for classification tree, random forest, and gradient boosted machine (GBM) models is less than or equal to 8
Violation Penalty: Not following the model parameter requirements will result in a one-grade-band reduction, and the random_state, test_size, and stratify arguments will be set appropriately before rerunning your model. Also, your model will be rerun with a max_depth of 8.

Criterion 7 – X-Variable Usage
The original and logarithmic versions of an x-variable may not be used in the same model. This does not include engineered features based on these variables.
Violation Penalty: Using both the original and logarithmic versions of an x-variable will result in the logarithmic version of the x-variable being removed from the model, after which the model will be run again. Also, this will result in a 0.25 reduction in model points.

Criterion 8 – Full Dataset Usage
You are not permitted to remove or modify any observations from the original dataset, with the exception of imputing missing values (you are not permitted to remove observations with missing values). Also, your Jupyter Notebook must be able to be run from the original dataset (no feature engineering or alterations in Excel or other tools are permitted).
Violation Penalty: If the above is violated, your Jupyter Notebook will be rerun from the original dataset, and any errors that result from this will be subject to Criterion 4 above. Additionally, your final model points will be reduced by 0.025.

Criterion 9 – Max AUC Score
Given the classification model types we are using throughout our course, it is possible that you may experience label leakage. This is where your training accuracy, testing accuracy, and AUC become unrealistically high, caused by your model tuning too closely to the y-variable you are trying to predict. For this reason, the max AUC score you may attain in the A grade band is 0.91.

Rubric: Classification Model Development (Total Points: 4)
A: 0.84 and above; B: 0.77 and above; C: 0.70 and above; D: 0.55 and above; F: less than 0.55.
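The split and model-parameter requirements above can be sketched as follows. This is a minimal illustration using a synthetic stand-in dataset (the `make_classification` call and the variable names `X`/`y` are placeholders, not the course-case data); the key lines are the `train_test_split` arguments and the `max_depth` cap:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score

# Hypothetical stand-in data; replace with the course-case features and binary Y.
X, y = make_classification(n_samples=400, n_features=8, random_state=219)

# Criterion 6: fixed random_state, 25% test split, stratified on the target.
X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.25,
    random_state=219,
    stratify=y,
)

# Criteria 2 and 6: a permitted model type, with max_depth capped at 8.
model = DecisionTreeClassifier(max_depth=8, random_state=219)
model.fit(X_train, y_train)

# AUC is computed from predicted probabilities for the positive class.
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
```

Note that `roc_auc_score` takes probabilities, not class labels, which is why `predict_proba(...)[:, 1]` is used rather than `predict`.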
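Criterion 8 requires imputing missing values rather than dropping rows. A minimal sketch, assuming a hypothetical DataFrame `df` with a numeric column `income` (median imputation shown here; the column names and values are illustrative only):

```python
import pandas as pd

# Hypothetical frame with one missing value; in the course case, impute rather
# than drop, so that no observations are removed from the original dataset.
df = pd.DataFrame({"income": [50.0, None, 70.0, 60.0], "churn": [0, 1, 0, 1]})

# Criterion 8: keep every observation; fill the gap with the column median.
df["income"] = df["income"].fillna(df["income"].median())
```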
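One way to satisfy Criterion 5's dynamic-string output is to collect each candidate's metrics in a list of dicts and format them with an f-string as the notebook's last output. The model names and metric values below are hypothetical placeholders:

```python
# Hypothetical candidate-model results gathered earlier in the notebook.
candidates = [
    {"name": "Logistic Regression", "train_acc": 0.81, "test_acc": 0.79,
     "auc": 0.83, "cm": [[52, 8], [11, 29]]},
    {"name": "Random Forest (FINAL MODEL)", "train_acc": 0.88, "test_acc": 0.82,
     "auc": 0.86, "cm": [[55, 5], [9, 31]]},
]

# Criterion 5: results as one dynamic string, the last thing the notebook outputs.
lines = [f"{'Model':<30}{'Train Acc':>10}{'Test Acc':>10}{'AUC':>8}  Confusion Matrix"]
for c in candidates:
    lines.append(
        f"{c['name']:<30}{c['train_acc']:>10.4f}{c['test_acc']:>10.4f}"
        f"{c['auc']:>8.4f}  {c['cm']}"
    )
summary = "\n".join(lines)
print(summary)
```

Labeling the final model directly in its name string (as above) makes it unambiguous which model is final, avoiding the default where the last candidate in the table is used.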
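To check Criterion 4's 240-second limit before submitting, you can time the notebook end to end; one simple sketch (the workload line is a trivial stand-in for your actual pipeline):

```python
import time

start = time.perf_counter()
# ... the full notebook's processing would run here ...
total = sum(i * i for i in range(100_000))  # stand-in workload
elapsed = time.perf_counter() - start

# Criterion 4: the whole run must finish within 240 seconds.
print(f"Elapsed: {elapsed:.2f} s (limit: 240 s)")
```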
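As a worked example of the grading formula (with hypothetical numbers): a final-model test AUC of 0.85 with two 0.025 violation penalties yields 0.80 final model points, which falls in the B band (0.77 – 0.8399):

```python
# Final Model Points = Final Model Test Score (AUC) - Modeling Violation Penalties
test_auc = 0.85                    # hypothetical final-model test AUC
penalties = 0.025 + 0.025          # e.g., two 0.025 violations
final_model_points = test_auc - penalties  # 0.85 - 0.05 = 0.80
```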