Regression Diagnostics Outlier Detection [Python]

I took some code that ports R’s regression diagnostic plots to Python and augmented it to match the changes I had previously made to the R version.

Basically, that means highlighting the row names of outliers, as opposed to simply printing their row numbers, as well as adding guide lines.

This plot shows studentized (t) residuals plotted against leverage, with the limits of their respective outlier ranges marked. Points are also flagged when their Cook’s distance score exceeds .1 (blue) or .5 (red); that score is based on its F probability distribution.
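For anyone who wants to see the quantities behind that plot, here is a minimal sketch of how leverage, externally studentized residuals, and Cook's distance can be computed. It is not the notebook's code: it assumes a single predictor so it can stay in plain Python, and the function name `simple_ols_diagnostics` and the dictionary keys are made up for illustration. It returns the raw Cook's distance and leaves out the F distribution step.

```python
import math


def simple_ols_diagnostics(x, y):
    """Per-row diagnostics for a one-predictor least-squares fit y = a + b*x.

    Hypothetical helper for illustration only; a real model with several
    terms would need the full hat matrix instead of the shortcut used here.
    """
    n = len(x)
    p = 2  # fitted parameters: intercept and slope
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
    intercept = y_mean - slope * x_mean

    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sse = sum(e * e for e in resid)
    mse = sse / (n - p)

    rows = []
    for xi, e in zip(x, resid):
        # Leverage: diagonal of the hat matrix for simple regression.
        h = 1.0 / n + (xi - x_mean) ** 2 / sxx
        # Residual variance re-estimated with this row left out.
        s2_loo = (sse - e * e / (1.0 - h)) / (n - p - 1)
        # Externally studentized residual.
        t = e / math.sqrt(s2_loo * (1.0 - h))
        # Cook's distance (raw value, not an F percentile).
        cooks = (e * e / (p * mse)) * h / (1.0 - h) ** 2
        rows.append({"leverage": h, "t_resid": t, "cooks_d": cooks})
    return rows
```

A quick sanity check on any output: the leverages should sum to the number of fitted parameters (2 here), and a row with a large residual should stand out in both `t_resid` and `cooks_d`.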


I’ve been working on my outlier detection methods and found a useful way to highlight them. My code now outputs the outliers in question at the very end of the regression process.
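The reporting step could look roughly like the sketch below, which lists flagged rows by name instead of by position. The function name `flag_outliers` and the `t_limit=3.0` cutoff are my assumptions for illustration; the .1 and .5 levels are the blue and red cutoffs from the plot, applied to whatever Cook's score is passed in.

```python
def flag_outliers(labels, t_resid, cooks_score,
                  t_limit=3.0, warn=0.1, severe=0.5):
    """Return (label, reasons) pairs for rows that trip any limit.

    Hypothetical helper: t_limit is an assumed cutoff, while warn and
    severe mirror the blue (.1) and red (.5) levels used in the plot.
    """
    flagged = []
    for label, t, score in zip(labels, t_resid, cooks_score):
        reasons = []
        if abs(t) > t_limit:
            reasons.append(f"|t| = {abs(t):.2f} > {t_limit}")
        if score > severe:
            reasons.append(f"Cook's score {score:.2f} > {severe} (red)")
        elif score > warn:
            reasons.append(f"Cook's score {score:.2f} > {warn} (blue)")
        if reasons:
            flagged.append((label, reasons))
    return flagged


def print_outlier_report(flagged):
    """Print one line per flagged row, keyed by its row name."""
    for label, reasons in flagged:
        print(f"{label}: " + "; ".join(reasons))
```

Passing the DataFrame index as `labels` is what makes the output show row names (dates, in my case) rather than bare row numbers.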

This is based on Federal Reserve data, as available from their API. The regression in question is based on DGS10 and, not surprisingly, the correlated significant terms are other measures that are directly related to mortgages and interest rates. I have more interaction terms, but because of the way I’m using matches from the non-interacted terms, only these showed up.

What I like about this is that I KNOW those years are when we had economic crises (the 2008 housing bubble burst and the 2020 COVID crisis), and what the outlier detection is flagging is the economic tomfoolery that was necessary to sustain the economy.

The notebook is available here
