Category Archives: My Work

Markowitz Profiles (Modern Portfolio Theory)

I’ve significantly improved on my Markowitz algorithm.

It’s a merge of two tutorials (though I’ve read three). Currently I derive weights based on the optimal Sharpe ratio. At the moment I don’t take any risk-free rate into consideration (i.e. it’s simply return/volatility).

I was trying two different solver methods. SciPy’s minimize was taking a really long time, so I tried another tutorial that made use of cvxopt and quadratic programming (which I use to draw the efficient frontier rapidly). I was unable to find the optimal value based on the Sharpe ratio using the cvxopt solver (it would only solve either the bottom or topmost portion of the line), so I looked at FinQuant, which uses a Monte Carlo simulation of 5,000 portfolios to derive the optimal points (i.e. max return, min volatility, and max Sharpe ratio).

So I fell back on Monte Carlo to do it for me, i.e. “the path of least resistance,” and the margin of error is acceptable.
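A minimal sketch of that Monte Carlo approach, assuming a (periods × assets) NumPy array of periodic returns — the function name and parameters are illustrative, not the actual code:

```python
import numpy as np

def max_sharpe_monte_carlo(returns, n_portfolios=5000, seed=0):
    """Sample random long-only portfolios; keep the best Sharpe ratio.

    Sharpe here is simply mean return / volatility, with no risk-free
    rate, matching the simplification described above.
    """
    rng = np.random.default_rng(seed)
    mu = returns.mean(axis=0)            # expected return per asset
    cov = np.cov(returns, rowvar=False)  # asset covariance matrix

    best_sharpe, best_w = -np.inf, None
    for _ in range(n_portfolios):
        w = rng.random(returns.shape[1])
        w /= w.sum()                     # weights sum to 1 (long only)
        vol = np.sqrt(w @ cov @ w)
        sharpe = (w @ mu) / vol
        if sharpe > best_sharpe:
            best_sharpe, best_w = sharpe, w
    return best_w, best_sharpe
```

With 5,000 random draws the result is only approximate, but it is fast and simple enough to backtest with.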

The important thing is it’s very fast and straightforward (KISS), so I will use this to backtest with.

What I like about Markowitz is it assumes past behavior is a good model for the future (t-tests on stock returns are used to determine whether two partitions of returns are equal). It’s been said that yesterday’s price is the best predictor of tomorrow’s. The same goes for returns.
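That t-test can be run with SciPy; a hedged example on simulated daily returns (the split point and distribution parameters are made up for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
returns = rng.normal(0.0005, 0.01, 500)  # simulated daily returns

# Partition the series and test whether the two mean returns are equal
first_half, second_half = returns[:250], returns[250:]
t_stat, p_value = stats.ttest_ind(first_half, second_half, equal_var=False)

# A large p-value fails to reject equality of the two periods' means,
# i.e. past behavior looks consistent with the future
```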


Trying to land a job in Data Science

“Believe in yourself. Do what you like. Do not work or learn just because you want a job or you want to earn money. Learn because you want to grow.” – Kurtis Pykes

Thank you for sharing your journey, which not only depicted your low days but also highlighted how you overcame them and worked on yourself.
The best takeaway from the session for me was “Take the criticism as a way of learning and improving yourself rather than harsh feedback.” – Manpreet Budhraja

Disclaimer: this is just me trying to make sense of my position.

I’ve struggled with an inferiority complex, or what others might call imposter syndrome. I’m certainly not trying to fake it till I make it, nor do I think I’m simply being a dilettante about my passions.

Those concepts, though related, differ from each other. For example, I struggle to arrive where I want to be (a data science role) while at the same time being intimidated by some of the big-data processing capabilities of my peers. Ultimately I do want that good cheese money, but I don’t consider not having it as disqualifying me from laying claim to the title. A woman told me a long time ago, “you are what you say you are,” but I believe what Jung said more so.

“You are not what you say you are, you are what you repeatedly do” — which follows Will Durant’s take on Aristotle: “Excellence is not an act, but a habit.”

Take, for example, my housing projects. I didn’t need a license to be qualified; I simply needed to do it.

Add to that mix being a bit more than a touch sensitive to criticism. But I’ve been working on that last part: trying to listen without judgment, merely accepting the points of view as areas to grow into. There’s also Aristotle’s idea that man is a political animal — the polis is the teacher of man. Meaning you are not perfect and must learn from others.

I blame a set of factors for not being in the role I thought I’d be in, given my credentials:

Companies vet higher-level candidates more stringently; the bar is raised. Different companies have different expectations for what it means to be a data scientist. For example: math knowledge? Tools used? How advanced are you in them (SQL is a big one, but so are network data frameworks)? It’s a kind of elitism of skills, though there’s an element of class in it as well. Companies don’t expect fresh college graduates to have them, nor should they. Basically, you may have trained at some formal level but still be evaluated as not good enough… often, and more so than for other types of roles.

“How many years of experience would you say you have?” If the answer doesn’t match the career length on your resume, you will be evaluated as too old. Companies want to hire malleable fresh college graduates to mold and shape.

Kaggle projects have become the norm for evaluating candidates.
Higher-level pay means fewer candidates make it past the gate.

Entry-level requirements
3 years of paid experience is considered minimal for entry-level positions (and data science is often a mid- to senior-level role).

Striving for some level of formal recognition is misguided, at least in the belief that it’s going to land you the position you want. With my master’s degree I presumed it would be an easy transfer to data science, but that has not been the case. In two years I’ve had 5 interviews for data science roles (that’s the gatekeeping). One I passed, but it was for correcting statistical scholarly publications, which I turned down after some advice from a former professor.

Companies will sell you on “titles” that you strive to earn to feel qualified but that ultimately don’t get you where you want (sophists). What really matters is not buying into the hype and simply practicing it. I know many people who worked at AT&T in software engineering who didn’t have formal education in software concepts. They were simply bright people who applied themselves. Which is the point: education can be beneficial, but it’s not a cure-all. What excellence really is, is a habit.

My advice is twofold.

First and foremost: you have to do you. You can’t be in a job you don’t love. Don’t be discouraged. Always strive for more.
Second, be practical. You need three trade skills: one you strive for, one you are practical and efficient in, and one that is a fallback.
Mine happen to be Data Science, System Administration, and Housing Construction.
But they weren’t always those. At one time it was system admin/dev, PC repair, and dishwashing.
But as you progress and evolve, you pivot into higher tiers. My fallback plan isn’t so much a means to make money as a means to avoid becoming destitute (i.e. I can live in the houses I’ve built). But the point is: I didn’t wait for anyone to qualify me. I just did them and eventually made something out of my experience and opportunities. You also need to be ready for when the skill you’ve been striving for presents itself as an opportunity you’ve already habitualized (whether or not you’ve received formal “qualified” training or recognition).

Regression Diagnostics Outlier Detection [Python]

I took some code that transplanted R’s regression diagnostic plots to Python, and I augmented it to match what I had previously done in R.

Basically, highlighting the row names of outliers as opposed to simply printing their index numbers, as well as adding guide lines.

This plot shows studentized residuals mapped against leverage, with their respective outlier-flagging range limits, plus whether Cook’s distance exceeds 0.1 (blue) or 0.5 (red), based on its F probability distribution score.


I’ve been working on my outlier detection methods and found a useful way to highlight them. My code now outputs, at the very end of the regression process, the outliers in question.

This is based on Federal Reserve data as available from their API. The regression in question is based on DGS10 (the 10-year Treasury yield), and not surprisingly the significant correlated terms are other measures directly related to mortgages and interest rates. I have more interaction terms, but due to the way I’m matching from the non-interacted terms, only these showed up.

What I like about this is that I KNOW those years are when we had economic crises (the 2008 housing bubble burst and the 2020 COVID crisis), and what the outlier detection is flagging is the economic tomfoolery that was necessary to sustain the economy.

The notebook is available here.

2010 Census Data Cluster Analysis

I’ve aggregated the clustering work I’ve been doing.

Based on 2010 Census Data, a Statistical Abstract of the United States

The optimal k as found by NbClust’s 30 tests was 2 (majority rule). I used Box-Cox-transformed, PCA-whitened, and weighted variables, as well as fviz_dist to visualize the similarities of clusters.

Violin plots to aggregate the actual clusters

A 3D PCA plot showing the top 3 principal components, which made up 77% of the proportion of variance.
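That 77% figure corresponds to summing the explained-variance ratios of the first three components. A scikit-learn sketch on stand-in data (not the census set):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Correlated stand-in features (mixing 8 independent signals)
X = rng.normal(size=(100, 8)) @ rng.normal(size=(8, 8))

X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=3).fit(X_scaled)

top3_variance = pca.explained_variance_ratio_.sum()  # proportion in top 3 PCs
coords = pca.transform(X_scaled)                     # (100, 3) points for a 3D plot
```

The `coords` array feeds directly into a 3D scatter (e.g. matplotlib’s `projection="3d"`), colored by cluster label.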

And finally, a pairplot to show the 2 clusters mapped against every possible scatterplot of the original data.

factoextra fviz_dist

Using 2010 Census Data (Statistical Abstract of the United States)

I’ve scaled the data according to PCA variance.

This is a graph of factoextra’s fviz_dist:
red = ~0 Euclidean distance (similar, good for clustering)
blue = dissimilar


The clustering-tendency score came out to 0.5639438; closer to 1 means the data is very clusterable.
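That score reads like the Hopkins-style statistic factoextra’s `get_clust_tendency` reports, where values near 1 mean highly clusterable and near 0.5 mean essentially random. A Python approximation, assuming that interpretation:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def hopkins(X, sample_frac=0.2, seed=0):
    """Hopkins statistic: ~0.5 for uniform data, near 1 for clustered data."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n, d = X.shape
    m = max(1, int(sample_frac * n))

    nn = NearestNeighbors(n_neighbors=2).fit(X)

    # Distance from m sampled real points to their nearest *other* real point
    idx = rng.choice(n, size=m, replace=False)
    w = nn.kneighbors(X[idx], n_neighbors=2)[0][:, 1]

    # Distance from m uniform points in the bounding box to the nearest real point
    U = rng.uniform(X.min(axis=0), X.max(axis=0), size=(m, d))
    u = nn.kneighbors(U, n_neighbors=1)[0][:, 0]

    return u.sum() / (u.sum() + w.sum())
```

Clustered data pulls the real-point distances down and the uniform-point distances up, pushing the ratio toward 1.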

This is very interesting.

PCA Clustergram Violin Plots with Pairplot

Based on the last census data (2010?)

I used Clustergram with PCA scaling to create the cluster labels, and it did a great job of separating the features (i.e. unsupervised). The best-performing k based on fitness was 2 groups (which definitely makes things easier to see).
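Clustergram’s fitness-based choice of k isn’t reproduced here, but the same “best k = 2” selection can be sketched with scikit-learn’s KMeans and silhouette scores on synthetic stand-in data:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Two well-separated synthetic groups standing in for the census features
X = np.vstack([rng.normal(0, 1, (60, 4)), rng.normal(5, 1, (60, 4))])

scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)  # the k with the best silhouette
```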

I’ve arranged the violin plots next to each other for easy comparison.

This would be nice to have in a dashboard, to create custom segments based on various k.

I would like to do an ANOVA using TSS, WSS, and BSS to confirm the populations are different.
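That decomposition satisfies TSS = WSS + BSS, which yields a Calinski-Harabasz-style pseudo-F for comparing clusters. A sketch (the function names are mine, not from the analysis):

```python
import numpy as np

def anova_decomposition(X, labels):
    """Split the total sum of squares into within- and between-cluster parts."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    grand_mean = X.mean(axis=0)
    tss = ((X - grand_mean) ** 2).sum()

    wss = bss = 0.0
    for g in np.unique(labels):
        members = X[labels == g]
        centroid = members.mean(axis=0)
        wss += ((members - centroid) ** 2).sum()
        bss += len(members) * ((centroid - grand_mean) ** 2).sum()
    return tss, wss, bss  # tss == wss + bss

def pseudo_f(X, labels):
    """Pseudo-F statistic: large values mean well-separated clusters."""
    n, k = len(X), len(np.unique(labels))
    tss, wss, bss = anova_decomposition(X, labels)
    return (bss / (k - 1)) / (wss / (n - k))
```

A large pseudo-F would support the claim that the two cluster populations really are different.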