Using ICPSR polling data on American 8th and 10th graders, I transform a set of predictor terms into what I call a “semiotic grid” of 1’s and 0’s, which is then used to identify a class of 1’s and 0’s for the desired outcomes of 3 specific response terms: GPA, gang fights, and (gasp) presence of psychedelic drug use.
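To make the idea of a grid of 1’s and 0’s concrete, here is a minimal sketch in Python (used purely for illustration; the project itself relies on bestglm, which is an R package). The column names, thresholds, and the helper to_binary_grid are all hypothetical and are not the actual survey variables or the coding I used.

```python
def to_binary_grid(rows, rules):
    """Turn raw records into a grid of 1's and 0's.

    rows  : list of dicts, one per respondent (hypothetical fields)
    rules : dict mapping a new indicator name to a yes/no test on a row
    """
    return [
        {name: int(bool(test(row))) for name, test in rules.items()}
        for row in rows
    ]

# Hypothetical respondents; these are NOT real survey records.
rows = [
    {"grade": 8, "gpa": 3.7, "fights": 0},
    {"grade": 10, "gpa": 2.1, "fights": 2},
]

# Hypothetical cut-offs, chosen only to show the mechanics.
rules = {
    "grade_10": lambda r: r["grade"] == 10,
    "high_gpa": lambda r: r["gpa"] >= 3.5,
    "any_fight": lambda r: r["fights"] > 0,
}

grid = to_binary_grid(rows, rules)
for row in grid:
    print(row)
```

Every predictor and every response ends up on the same 1/0 scale, which is what lets the same logistic-style machinery be reused for each of the three outcomes.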
I use Monte Carlo resampling to achieve class balancing and run a modified bestglm algorithm under cross validation to get a wider set of terms, which then go through cross-validated holdout analysis and are tabulated. That’s just for initial factor reduction and pooling of potential candidates. These terms then go through more class balancing and another round of cross validation, this time using the actual, unmodified bestglm, to arrive at a final regression formula as well as the terms that are always significant in the population, closing with ROC.
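The balancing and tabulating steps can be sketched roughly as follows, again in Python for illustration only. This is an assumption about the general shape of the approach, not my actual code: balanced_resample, tabulate_terms, the 50-per-class size, the 0.6 cut-off, and the term names "a" through "d" are all hypothetical, and the model selection step (bestglm) is left out entirely.

```python
import random
from collections import Counter

def balanced_resample(labels, n_per_class, rng):
    """Draw n_per_class row indices (with replacement) from each class."""
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    idx = []
    for y in sorted(by_class):
        idx.extend(rng.choices(by_class[y], k=n_per_class))
    rng.shuffle(idx)
    return idx

def tabulate_terms(selected_per_run, min_share=0.5):
    """Keep terms that were selected in at least min_share of the runs."""
    counts = Counter(t for terms in selected_per_run for t in set(terms))
    n_runs = len(selected_per_run)
    return sorted(t for t, c in counts.items() if c / n_runs >= min_share)

rng = random.Random(42)          # fixed seed so the draw is reproducible
labels = [1] * 5 + [0] * 95      # hypothetical rare outcome: 5% positives
idx = balanced_resample(labels, n_per_class=50, rng=rng)
class_counts = Counter(labels[i] for i in idx)
print(class_counts[0], class_counts[1])

# Hypothetical terms picked by a selector on three separate resamples.
runs = [["a", "b"], ["a", "c"], ["a", "b", "d"]]
kept = tabulate_terms(runs, min_share=0.6)
print(kept)
```

Sampling with replacement lets a rare class be drawn up to the same size as the common one. Tabulating across runs is what filters out terms that only show up by luck in a single resample.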
I am offering this project as a kind of open house for potential employers, so you can judge whether my skillset would be a good fit for what you hope to do with numbers.
This model was derived from samples only. I did not dare touch the population until I was ready to do so. I used cross validation on the training/validation partitions and then ran the resulting set of factors through a holdout partition, again with cross validation. I then pooled these factors across multiple samples, kept the elements they had in common, and applied them to the population. What you see is a scientifically reproducible finder of significant factors. If I increase the specificity or change the seed, different elements will surface to the top, but the overall patterns should be the same. I’m really excited. I’ve been wanting to do this for a really long time. The only things left for me to do are to finish the classification matrix and do more work on time series forecasting, and then I will have learned what I really wanted to at Fullerton. This is for GPA. And those factors are
Based on ICPSR's "Monitoring the Future: A Continuing Study of American Youth (8th- and 10th-Grade Surveys), 2012[-2017]"