I wrote some Python code that tabulates partial correlation significance across k folds for robust dimension reduction
This has been on my to-do list for a while.
I wrote this for a few reasons. One is that my stock correlation analysis kept flipping signs on me across partitions. Another is that the Bayesian regression tutorial mentioned using correlation analysis for factor reduction, which seems to be pretty common across any type of supervised method. I added the k-fold modification because it makes sense to sample correlations, similar to how Bayesian methods sample distributions with Markov chains. It's not the same type of sampling, just the same concept of sampling to estimate a metric. I've learned that correlations can give false positives (a variable is only occasionally significant, which happens often when using subsamples), and articles almost never discuss finding significant coefficients of determination.
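For anyone who wants the gist without opening the notebook, here is a minimal sketch of the idea, not the notebook's actual code. It assumes partial correlations are taken from the inverse covariance (precision) matrix, that each subsample is the data with one fold held out, and that significance comes from a two-sided t-test. The function names `partial_corr_with_target` and `tabulate_folds`, the `alpha=0.05` cutoff, and the output keys are all made up for illustration.

```python
import numpy as np
from scipy import stats


def partial_corr_with_target(data):
    """Partial correlation of each predictor with the last column (the target),
    controlling for all other columns, plus two-sided p-values."""
    n, p = data.shape
    # Partial correlations come from the precision (inverse covariance) matrix.
    precision = np.linalg.pinv(np.cov(data, rowvar=False))
    d = np.sqrt(np.diag(precision))
    pcorr = -precision / np.outer(d, d)
    r = pcorr[:-1, -1]  # predictors vs. target
    # Degrees of freedom: n - 2 - (number of controlling variables) = n - p.
    dof = n - p
    t = r * np.sqrt(dof / (1.0 - r**2))
    pvals = 2.0 * stats.t.sf(np.abs(t), dof)
    return r, pvals


def tabulate_folds(X, y, n_splits=10, alpha=0.05, seed=0):
    """Tabulate, per predictor, how often its partial correlation with y is
    significant and how often it is positive across k subsamples."""
    data = np.column_stack([X, y])
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(data)), n_splits)
    signs, significant = [], []
    for held_out in folds:
        # Assumption: each subsample is everything except the held-out fold.
        keep = np.setdiff1d(np.arange(len(data)), held_out)
        r, pvals = partial_corr_with_target(data[keep])
        signs.append(np.sign(r))
        significant.append(pvals < alpha)
    signs = np.array(signs)
    significant = np.array(significant)
    return {
        "pct_significant": significant.mean(axis=0),
        "pct_positive": (signs > 0).mean(axis=0),
    }
```

A predictor whose `pct_significant` is well below 1.0, or whose `pct_positive` sits somewhere between 0 and 1, is the kind of unstable variable described above: significant or sign-consistent only in some subsamples.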
Notebook with the full model and some added purity scoring (a tabulation of sign and significance as percentages): https://github.com/thistleknot/python-ml/blob/master/code/pcorr-significance.ipynb