# Strategy for building a “good” predictive model

By Ian Morton. Ian worked in credit risk for big banks for a number of years. He learnt about how to (and how not to) build “good” statistical models in the form of scorecards using the SAS Language.

I think Ian's list below is a good starting point. I would add a few steps: gathering requirements and understanding the goal and success metrics at the top, and deployment and maintenance at the end.

**Initial investigations**

2. What is the outcome? Is it yes/no? Is it continuous?

3. Decide upon the model required (e.g. logistic regression for a yes/no outcome)
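As a rough illustration of steps 2 and 3, the outcome column can be inspected before a model family is chosen. This is a minimal sketch: the function name `outcome_type`, the `SUGGESTED_MODEL` mapping and the sample data are all hypothetical, and the heuristic (exactly two distinct values means yes/no) is an assumption rather than a rule from the original post.

```python
def outcome_type(values):
    """Classify an outcome column as 'binary', 'continuous' or 'other' (rough heuristic)."""
    distinct = {v for v in values if v is not None}
    if not distinct:
        raise ValueError("outcome column has no observed values")
    if len(distinct) == 2:
        return "binary"  # e.g. yes/no, 0/1, good/bad
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in distinct):
        return "continuous"
    return "other"


# Hypothetical mapping: the post only names logistic regression for yes/no outcomes.
SUGGESTED_MODEL = {"binary": "logistic regression"}

defaulted = ["yes", "no", "no", "yes"]  # made-up outcome values
kind = outcome_type(defaulted)
print(kind, "->", SUGGESTED_MODEL.get(kind, "not covered here"))
```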

**Getting the data ready**

5. Produce summary statistics to understand the distribution of the continuous variables
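A small sketch of what such summary statistics might look like, using only Python's standard library. The helper name `summarize` and the income figures are made up for illustration; in practice a statistics package would produce the same numbers.

```python
import statistics


def summarize(values):
    """Return basic summary statistics for one continuous variable."""
    data = sorted(v for v in values if v is not None)
    q1, median, q3 = statistics.quantiles(data, n=4)  # quartile cut points
    return {
        "count": len(data),
        "missing": len(values) - len(data),
        "mean": statistics.mean(data),
        "stdev": statistics.stdev(data),
        "min": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": data[-1],
    }


income = [21000, 35000, None, 48000, 52000, 250000]  # made-up values
for name, value in summarize(income).items():
    print(f"{name:>7}: {value}")
```

A mean far above the median, as in this made-up sample, is a quick hint that the distribution is skewed.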

6. Ask questions about data quality. For variables with data quality problems:

- remove these variables from any potential models? or,
- think about imputation? or,
- obtain accurate data?
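One way to inform the choice between these options is to measure how much of each variable is missing, and to see what a simple imputation would do. The sketch below assumes the data is a list of dictionaries with `None` marking a missing value. The helper names and the sample rows are hypothetical, and median imputation is just one possible technique, not one prescribed by the post.

```python
import statistics


def missing_rates(rows):
    """Fraction of rows in which each variable is missing (None or absent)."""
    columns = {key for row in rows for key in row}
    return {
        col: sum(row.get(col) is None for row in rows) / len(rows)
        for col in sorted(columns)
    }


def impute_median(rows, column):
    """Return copies of the rows with missing values in `column` set to the median."""
    observed = [row[column] for row in rows if row.get(column) is not None]
    fill = statistics.median(observed)
    return [
        {**row, column: fill if row.get(column) is None else row[column]}
        for row in rows
    ]


applicants = [  # made-up data
    {"age": 34, "income": 40000},
    {"age": None, "income": 52000},
    {"age": 51, "income": None},
    {"age": 29, "income": 31000},
]
print(missing_rates(applicants))
print(impute_median(applicants, "age"))
```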

7. Convert continuous variables into categorical variables
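A minimal sketch of one way to do this conversion, assuming equal-frequency (quantile) grouping. The post does not say which grouping method to use, so the method, the function name `quantile_bins` and the sample ages are all illustrative assumptions.

```python
import bisect
import statistics


def quantile_bins(values, n_bins=4):
    """Assign each value a bin index 0..n_bins-1 using quantile cut points."""
    cuts = statistics.quantiles(values, n=n_bins)  # n_bins - 1 cut points
    bins = [bisect.bisect_right(cuts, v) for v in values]
    return bins, cuts


ages = [1, 2, 3, 4, 5, 6, 7, 8]  # made-up values
bins, cuts = quantile_bins(ages, n_bins=4)
print("cut points:", cuts)
print("bin per value:", bins)
```

The cut points should be derived once and then reused unchanged on any other data the model is applied to.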

**Modelling**

8. Check for multicollinearity / correlation between variables (e.g. variance inflation factors or correlation tests)
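Variance inflation factors can be sketched without any external library by using the fact that the VIF of predictor j equals the j-th diagonal entry of the inverse of the predictors' correlation matrix. The code below is an illustrative sketch under that identity. The helper names and the data are hypothetical, and a statistics package would normally be used instead.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [
        list(row) + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular (perfect collinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vifs(columns):
    """Map each predictor name to its variance inflation factor."""
    names = list(columns)
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    inverse = invert(corr)
    return {name: inverse[i][i] for i, name in enumerate(names)}


predictors = {"x1": [1, 2, 3, 4, 5], "x2": [2, 1, 4, 3, 5]}  # made-up data
print(vifs(predictors))
```

With only two predictors the result reduces to 1 / (1 - r²), where r is their correlation.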

9. Check for interactions

10. Choose the variable selection approach for the logistic regression (e.g. forward, backward, stepwise)

11. Choose the baseline attribute for each categorical variable

12. Create a random variable as a sanity check. It must not step into the model; if it does, something is wrong.
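A sketch of how such a check variable might be added, assuming the data is held as a list of dictionaries. The column name `random_check`, the uniform distribution and the fixed seed are illustrative choices, not details from the post.

```python
import random


def add_random_variable(rows, name="random_check", seed=42):
    """Return copies of the rows with an extra column of pure noise."""
    rng = random.Random(seed)  # fixed seed so the check is reproducible
    return [{**row, name: rng.random()} for row in rows]


accounts = [{"age": 34}, {"age": 51}, {"age": 29}]  # made-up data
for row in add_random_variable(accounts):
    print(row)
```

Because the noise column carries no information about the outcome, a selection procedure that keeps it is a warning sign about the procedure or the data.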

13. Split the dataset into two parts (ratio 80%/20%)

- using random selection without replacement
- the larger sample is the *build* dataset
- the smaller sample is the *test* dataset
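The 80%/20% split by random selection without replacement can be sketched as follows. The function name `split_build_test` and the seed are hypothetical; shuffling the row indices once and cutting the shuffled list guarantees that no row lands in both samples.

```python
import random


def split_build_test(rows, build_fraction=0.8, seed=42):
    """Randomly split rows into a build sample and a test sample, without replacement."""
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    rng.shuffle(indices)  # each row index appears exactly once
    cut = round(len(rows) * build_fraction)
    build = [rows[i] for i in indices[:cut]]
    test = [rows[i] for i in indices[cut:]]
    return build, test


records = list(range(100))  # stand-in for 100 data rows
build, test = split_build_test(records)
print(len(build), "build rows,", len(test), "test rows")
```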

14. Put all variables from the *build* dataset (including interactions and the random variable) into the model and run it

- Check the odds ratios: do they make sense? and
- Check the coefficients: do they make sense?
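In a logistic regression the odds ratio for a variable is the exponential of its coefficient, so the two checks are closely linked. The sketch below converts a set of fitted coefficients into odds ratios. The coefficient values and variable names are invented for illustration and do not come from any real model.

```python
import math


def odds_ratios(coefficients):
    """Convert logistic regression coefficients to odds ratios (intercept excluded)."""
    return {
        name: math.exp(beta)
        for name, beta in coefficients.items()
        if name != "intercept"
    }


fitted = {  # hypothetical coefficients
    "intercept": -2.0,
    "missed_payment": 0.6931471805599453,  # ln(2): doubles the odds
    "homeowner": -0.4,
    "random_check": 0.0,  # no effect: odds ratio of 1
}
for name, ratio in odds_ratios(fitted).items():
    print(f"{name}: {ratio:.2f}")
```

An odds ratio above 1 means the attribute raises the odds of the outcome relative to the baseline, and below 1 means it lowers them, which is what "making sense" can be judged against.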

**Check the model**

15. Do diagnostic checks and plots of the fit (e.g. Somers' D, residuals, etc.)
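For a yes/no outcome, Somers' D can be computed by comparing every event with every non-event: it is the share of concordant pairs minus the share of discordant pairs, with ties counting as neither. This is a brute-force sketch with made-up scores, and the function name `somers_d` is hypothetical. For a binary outcome the result equals 2 × AUC − 1.

```python
def somers_d(scores, outcomes):
    """Somers' D of predicted scores against a 0/1 outcome (pairwise, O(n^2))."""
    events = [s for s, y in zip(scores, outcomes) if y == 1]
    non_events = [s for s, y in zip(scores, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = discordant = 0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1  # event scored higher than non-event
            elif e < n:
                discordant += 1  # event scored lower than non-event
    return (concordant - discordant) / (len(events) * len(non_events))


predicted = [0.10, 0.40, 0.35, 0.80]  # made-up model scores
actual = [0, 0, 1, 1]
print(somers_d(predicted, actual))
```

A value of 1 means the scores separate events from non-events perfectly, and a value near 0 means they rank no better than chance.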

16. Put all variables from the *test* dataset (including interactions and the random variable) into a new model and run it

- Are the coefficients similar to those of the model fitted on the *build* dataset? and
- Are the odds ratios similar to those of the model fitted on the *build* dataset?
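The comparison between the two fitted models can be made systematic by flagging variables whose coefficient changes sign or shifts by more than some tolerance. The sketch below is one possible way to do this. The function name, the 25% relative tolerance and the coefficient values are arbitrary assumptions for illustration, not thresholds from the post.

```python
def compare_coefficients(build, test, rel_tol=0.25):
    """Flag variables whose coefficient changes sign or moves by more than rel_tol."""
    flagged = {}
    for name in sorted(set(build) & set(test)):
        b, t = build[name], test[name]
        if b * t < 0:
            flagged[name] = "sign change"
        elif abs(t - b) > rel_tol * max(abs(b), abs(t)):
            flagged[name] = "large shift"
    return flagged


build_model = {"age": 0.50, "income": -0.30, "tenure": 0.20}  # hypothetical
test_model = {"age": 0.52, "income": 0.10, "tenure": 0.40}  # hypothetical
print(compare_coefficients(build_model, test_model))
```

Because odds ratios are the exponentials of the coefficients, a sign change here corresponds to an odds ratio crossing 1.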

**Start again**

17. Go back to the start: fine-tune the grouping of the data, and put variables in or take variables out.
