Files

https://github.com/thistleknot/Python-Stock/blob/master/ANN.ipynb

https://github.com/thistleknot/Python-Stock/blob/master/RNN.ipynb

This single file covers the core machine learning workflow for non-time-series data. (Reposted after I did some major code cleanup.) The caret library is awesome!

What does it do?

* Derives interaction and squared predictor terms.

* Sets a training and holdout partition.

* Uses cross validation for all models

* Ensemble comparison: Reports RMSE of each model (caret/models handles the factor reduction!) and gives summary information on the best model

* Prints out best forecast compared with holdout

* Shows correlation of forecast/holdout as goodness of fit

* Iterates through every column of the original dataset as the dependent term
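The first bullet can be sketched in a few lines. The linked script is R and uses caret; this is only a minimal Python illustration of deriving squared and pairwise interaction terms, and the function name `expand_features` and the `x1:x2` naming scheme are assumptions made for the example, not taken from the script.

```python
from itertools import combinations


def expand_features(rows, names):
    """Append squared terms and pairwise interactions to each row of predictors.

    `rows` is a list of equal-length numeric lists; `names` labels the columns.
    Both the function name and the output naming scheme are hypothetical.
    """
    new_names = list(names)
    new_names += [f"{n}^2" for n in names]
    new_names += [f"{a}:{b}" for a, b in combinations(names, 2)]

    expanded = []
    for row in rows:
        squares = [x * x for x in row]
        interactions = [row[i] * row[j]
                        for i, j in combinations(range(len(row)), 2)]
        expanded.append(list(row) + squares + interactions)
    return new_names, expanded


names, rows = expand_features([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
                              ["x1", "x2", "x3"])
print(names)
print(rows[0])
```

Three predictors become nine columns, which is why the factor reduction done by the models matters.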

Models applied

* KNN

* ANN (Neural network)

* Random Forest

* XGBoost

* Elastic Net

* Best subset using leaps

It runs very fast.

You can easily retrofit this for any of your own data and derive your own inferences.

#machinelearning

Code: https://github.com/thistleknot/matrixMultiplicationRegression/blob/master/caret%20elastic%20nnet%20xg%20leaps%20models.R

Based on the following tutorials

https://rpubs.com/sergiomora123/Bitcoin_nnet

https://medium.com/@salsabilabasalamah/cross-validation-of-an-artificial-neural-network-f72a879ea6d5

Github: https://github.com/thistleknot/ANN_trading

I noticed some glaring mistakes in the tutorials I was following. For example, they apply normalization to ALL the data instead of extracting the normalization parameters from the training data and then using THOSE to normalize the test data and denormalize the test predictions. These are things I went over during my self-imposed sabbatical.
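Here is a minimal sketch of the train-only approach. It assumes min-max scaling (the same idea applies to mean/standard deviation scaling), and the helper names `fit_minmax`, `normalize` and `denormalize` are made up for this example, not taken from the tutorials or my repo.

```python
def fit_minmax(train):
    """Learn the scaling parameters from the TRAINING data only."""
    lo, hi = min(train), max(train)
    if hi == lo:
        raise ValueError("training data has zero range")
    return lo, hi


def normalize(values, params):
    """Scale values with parameters that were fitted on the training data."""
    lo, hi = params
    return [(v - lo) / (hi - lo) for v in values]


def denormalize(values, params):
    """Map scaled values (e.g. model predictions) back to original units."""
    lo, hi = params
    return [v * (hi - lo) + lo for v in values]


train = [10.0, 20.0, 30.0]
test = [25.0, 40.0]

params = fit_minmax(train)             # the test data is never consulted
test_scaled = normalize(test, params)  # [0.75, 1.5]
restored = denormalize(test_scaled, params)
print(test_scaled, restored)
```

Note that a test value can land outside 0 to 1 (the 1.5 above). That is expected: the model never got to peek at the test range.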

I’m really really excited to see how this plays out when I do step-forward testing on my stocks.

What you see here is a regression line of predicted vs. actual values on the test data, which was not used in building the model. These are RETURNS.
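For anyone who wants to reproduce that kind of check, here is a small Python sketch of fitting the predicted-vs-actual line and computing the correlation as a goodness-of-fit number. The return values below are invented purely for illustration; they are not my results.

```python
from math import sqrt
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Made-up holdout returns, for illustration only
predicted = [0.010, -0.020, 0.030, 0.000, 0.015]
actual = [0.020, -0.010, 0.020, 0.010, 0.005]

slope, intercept = fit_line(predicted, actual)
print("slope:", slope, "intercept:", intercept)
print("correlation:", pearson(predicted, actual))
```

A slope near 1 with an intercept near 0 means the predictions are well calibrated; the correlation only says they move together.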

Threshold of ideation: when sensory input hits memory space, it hits what Libet calls the conscious mental field

Consciousness is merely reflection. I came to this understanding from the idea of integration (what vars do I have?) and from how we see something after sensory input (when we collect all the pieces together). The idea was confirmed by a little reading of McEvilley, as well as by the Sun/Moon motif (the moon reflects the sun’s light). A Platonic idea, to me, is a regression equation, a cognit: a minimal “Planck” idea. What this means is that a certain mix of elements (factors, think Aristotle’s Categories), each with its own range, makes up an expression (instance) of an idea. This is an instance (particular, many, material reality) of an idea (class). These are what we are aware of as ideas when we integrate the elements of the factors and draw the connection to an idea, and bingo: reflection (integrating various elements into factors into a form/idea is a method/mode of integration/reflection). This is what we term stream of consciousness (eidetic memory), as ideas flit across either as instances or as forms (if we rise to being Platonists).

Anyways, it’s not hard to create this reflection in computers, since decision logic at the point of ideation is, imo, consciousness.

This isn’t a new theory. It is somewhat of a merge between Plotinus’ view of ideas and the view that all systems (beings) have some level of consciousness (which I describe in terms of physics and their reductionist interactions, which qualify as a type of being performing integration of the elements of factors pertaining to that entity’s being; e.g. a rock probably has the chemical properties of dirt to attend to), together with integrated information theory and phi, as Chalmers explained them (it is not his theory).

An artificial neural network is merely an additive polynomial equation. So… I think consciousness can be mimicked with some type of adaptive regression equation modeling.

I’m trying to do it right. I’m trying to do CV over a shuffled sample (drawn without replacement), and I got it working.

    set.seed(256)
    # Shuffle the row indices once; every fold is cut from this fixed ordering
    preset_rng <- sample(nrow(MyData), replace = FALSE)
    divisions <- 10

    final_end <- length(preset_rng)
    unitsize  <- floor(final_end / divisions)  # how far the window moves per fold
    part_size <- floor(final_end * 0.9)        # training rows per fold (90%)

    for (i in 1:divisions) {
      # preset1 = train index, preset2 = test index
      initial_start <- 1 + (i - 1) * unitsize
      end_position  <- initial_start + part_size - 1

      if (end_position <= final_end) {
        # the training window fits without running off the end
        preset1 <- preset_rng[initial_start:end_position]
      } else {
        # the window runs off the end, so wrap around to the start
        left <- end_position - final_end
        preset1 <- c(preset_rng[initial_start:final_end], preset_rng[1:left])
      }

      # everything outside the training window is the test set
      preset2 <- setdiff(preset_rng, preset1)

      # fit on MyData[preset1, ] and evaluate on MyData[preset2, ] here
    }
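The same rotation idea can be sketched in Python to confirm that every row lands in the test set exactly once. This is not a line-for-line port: the function name `rotating_folds` is made up, the shuffle comes from Python's own random generator (so the actual ordering will differ from R's `set.seed(256)`), and the wrap-around is done with a modulo instead of two branches.

```python
import math
import random


def rotating_folds(n_rows, divisions=10, train_frac=0.9, seed=256):
    """Return (train, test) index lists using a wrap-around training window."""
    order = list(range(n_rows))
    random.Random(seed).shuffle(order)      # shuffle once, without replacement

    unit = n_rows // divisions              # how far the window moves per fold
    part = math.floor(n_rows * train_frac)  # training rows per fold

    folds = []
    for i in range(divisions):
        start = i * unit
        # take `part` consecutive positions, wrapping past the end of `order`
        train = [order[(start + k) % n_rows] for k in range(part)]
        train_set = set(train)
        test = [idx for idx in order if idx not in train_set]
        folds.append((train, test))
    return folds


folds = rotating_folds(100)
for train, test in folds[:3]:
    print(len(train), len(test))
```

With 100 rows and 10 divisions each fold holds out a different block of 10 rows, so the ten test sets tile the whole dataset.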

Made this paper for a Data Mining class

I did further work on the paper that you see here:

I swapped GDP for seasonally adjusted total residential construction, lagged it by 7 or 8 months, lagged the consumer confidence index by 3 months, and achieved an r^2 of .99. Then I did a linear forecast of the independent variables, plugged it into a multiple regression algorithm over the past 6 months,

and found something striking.
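Lagging a predictor just means pairing each month of the target with the predictor value from some months earlier. A minimal Python sketch of that alignment, with a hypothetical helper `lag_align` and toy numbers in place of the real FRED series:

```python
def lag_align(predictor, target, lag):
    """Pair predictor[t - lag] with target[t] for every t where both exist."""
    if len(predictor) != len(target):
        raise ValueError("series must have the same length")
    if lag < 0 or lag >= len(predictor):
        raise ValueError("lag must be between 0 and len(series) - 1")
    return list(zip(predictor[:len(predictor) - lag], target[lag:]))


# Toy monthly series, not real data
construction = [1, 2, 3, 4, 5]
price = [10, 20, 30, 40, 50]

pairs = lag_align(construction, price, lag=2)
print(pairs)  # [(1, 30), (2, 40), (3, 50)]
```

Each extra month of lag costs one row of data, which matters when the regression only looks at the past 6 months.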

- Further discussion:
- Need to do covariance testing on the independent variables to ensure I’m not measuring collinearity (someone from Reddit advised)
- A simple linear regression of the independent variable did do better than the neural network algorithm!
- Which means that my neural network isn’t really doing much, it’s performing worse than a simple linear regression.
- It would be one thing if the simple linear regression of the independent variable produced no predictive value (say less than 50%) and my ANN produced greater, but the ANN produced lower than a simple linear regression analysis would have yielded.

Takeaways

- Use different domains. I’d maybe have to measure the price of something other than housing, something that has a horrible linear regression prediction rate, and see if I could beat it.

Excerpts from my reddit post

…Coded in C#, so it requires .NET Framework 4.5+

The neural network is a C# rewrite of David Miller’s Vimeo code (written by my classmate Ronald Swain),

Uses Marcus Cuda’s Fred API

Thank you to Jeff Heaton, whose books on neural networks helped me make sense of David Miller’s code in my own C++ version (also on GitHub)

I got at least a 70% prediction rate. **If one prunes rows whose standard deviation is over the 3rd quartile (of the simulation runs, i.e. final.csv rows), one gets a 77% success rate of up/down movement since 2000**! Edit: might not be very useful considering that this is across dates vs. a specific date. Edit 2: **arbitrary check for stdev’s above .1 = 83% success rate!**
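The pruning step can be sketched like this in Python. The `stdev` and `hit` field names are assumptions made for the example (the real final.csv layout is whatever the batch files produce), and the five rows are invented just to show the mechanics.

```python
from statistics import quantiles


def hit_rate(rows):
    """Fraction of rows where the up/down call was correct."""
    return sum(1 for r in rows if r["hit"]) / len(rows)


def prune_above_q3(rows):
    """Drop rows whose standard deviation across runs exceeds the 3rd quartile."""
    q3 = quantiles([r["stdev"] for r in rows], n=4, method="inclusive")[2]
    return [r for r in rows if r["stdev"] <= q3]


# Invented rows, for illustration only
rows = [
    {"stdev": 0.01, "hit": True},
    {"stdev": 0.02, "hit": True},
    {"stdev": 0.03, "hit": False},
    {"stdev": 0.04, "hit": True},
    {"stdev": 0.05, "hit": False},
]

kept = prune_above_q3(rows)
print(hit_rate(rows), hit_rate(kept))  # 0.6 0.75
```

The caveat from the first edit still applies: the quartile is computed over all dates at once, so this is a look-back filter rather than something known on a specific date.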

Requires http://gnuwin32.sourceforge.net/packages/coreutils.htm; add the /bin folder of the installed coreutils directory to your “path” variable in sysdm.cpl (sorry newb’s!) so the batch file can find the *paste* and *cut* [external] **GNU binaries** (i.e. the exec’s that are used in the batch file)

Program: [updated at 12/14 7:33 PST, now predicts future prices beyond just 1 month!]

Use **RunMove.bat**

(Run.bat produces *prices* which are close, but are obviously way off in other aspects. So I decided to use a simpler up/down prediction method while keeping the old price prediction method in place.)

both batch files (*run vs runmove*) export to a file called **final.csv**

The program accepts default values (i.e. just keep pushing Enter until you get a prompt again). Correction: you have to specify how many simulations you want to run (the last input)

Lastly, the code for both is on GitHub:

https://github.com/thistleknot/FredAPI

&

https://github.com/thistleknot/CIS480Project

misc:

Sliding Window breakdown: http://imgur.com/gallery/nxhtgaQ/new

Sliding window code: http://pastebin.com/6hrczUkq

Here is where I’ll be discussing my ventures into Artificial Intelligence.

So far I have a working prototype of feed-forward and backpropagation, thanks to Jeff Heaton’s book, Introduction to the Math of Neural Networks, as well as many other sources.