Partial Correlation Robust Dimension Reduction using KFolds

I wrote some Python code that tabulates partial correlation significance across k folds for robust dimension reduction.

This has been on my to-do list for a while.

I wrote this for a few reasons. One was my stock correlation analysis flipping signs on me across partitions. Another was that the Bayes regression tutorial mentioned using correlation analysis for factor reduction (which seems to be pretty common across supervised methods in general). The k-fold modification came from the idea that it makes sense to sample correlations, similar to how Bayesian methods sample distributions with Markov chains; it's not the same kind of sampling, just the same concept of sampling to estimate a metric. I've learned that correlations can give false positives (a variable that is only occasionally significant across subsamples shows up often), and almost no article discusses checking for significant coefficients of determination.
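To make the idea concrete, here is a minimal sketch of the k-fold tabulation (my own simplification for illustration, not the code in the gist below): for each feature, compute its partial correlation with the target, controlling for the remaining features, on every fold, and record the coefficient, p-value, and a significance flag per fold.

```python
import pandas as pd
from scipy import stats
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

def partial_corr(df, x, y, covars):
    """Partial correlation of x and y controlling for covars:
    correlate the residuals of two regressions on the covariates."""
    Z = df[covars].values
    rx = df[x].values - LinearRegression().fit(Z, df[x].values).predict(Z)
    ry = df[y].values - LinearRegression().fit(Z, df[y].values).predict(Z)
    return stats.pearsonr(rx, ry)  # (r, p-value)

def kfold_pcorr_table(df, target, k=5, alpha=0.05, seed=0):
    """Tabulate each feature's partial correlation with the target
    (controlling for the other features) on every fold."""
    features = [c for c in df.columns if c != target]
    rows = []
    kf = KFold(n_splits=k, shuffle=True, random_state=seed)
    for fold, (idx, _) in enumerate(kf.split(df)):
        sub = df.iloc[idx]  # this fold's subsample
        for f in features:
            covars = [c for c in features if c != f]
            r, p = partial_corr(sub, f, target, covars)
            rows.append({'fold': fold, 'feature': f,
                         'r': r, 'p': p, 'significant': p < alpha})
    return pd.DataFrame(rows)
```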

Gist: https://gist.github.com/thistleknot/ce1fc38ea9fcb1a8dafcfe6e0d8af475

Notebook with full model and some added purity scoring (tabulation of sign / significance as a percentage): https://github.com/thistleknot/python-ml/blob/master/code/pcorr-significance.ipynb
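The purity scoring amounts to something like the following rough sketch (my own names, assuming the fold-level table produced by the sketch above, not the notebook's exact code): for each feature, the percentage of folds in which it came up significant and how consistent its sign was.

```python
import pandas as pd

def purity_scores(table, alpha=0.05):
    """Per-feature purity: fraction of folds where the partial correlation
    is significant, plus the consistency of its sign across folds.
    `table` is the fold-level output of kfold_pcorr_table above."""
    def score(g):
        return pd.Series({
            'pct_significant': (g['p'] < alpha).mean(),
            'sign_purity': max((g['r'] > 0).mean(), (g['r'] < 0).mean()),
        })
    return table.groupby('feature').apply(score)

# e.g. keep features significant in >= 80% of folds with a stable sign:
# scores = purity_scores(kfold_pcorr_table(df, target='y'))
# keep = scores[(scores['pct_significant'] >= 0.8) & (scores['sign_purity'] == 1.0)].index
```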
