Machine Learning is a Well Executed Scam
I’m a computer programmer who really enjoys what I do (Editor's Note: aka, a nerd).
I love creating solutions to problems that would normally be impossible for one person to solve without the power of code and a computer behind them.
For the past couple of years, I’ve done a deep dive into machine learning, and I’ve learned one thing: in its current form, it will never be useful for anything beyond the most basic problem-solving tasks.
What is machine learning? It seems hard for the general public to grasp, but machine learning is not as complicated as it seems.
Let’s say you want to predict the price of a house. Certain factors go into deciding that price: square footage, number of bedrooms and bathrooms, the neighborhood where the house is located, and so on. We also have a list of recently sold homes, each with its sale price along with those same features. The computer assigns a weight to each feature, runs the weighted features through a function, and gets a price. If the prediction isn’t close to the actual sale price, it tweaks the weights and runs it through again. When the predictions finally get close, it has a set of weights for each feature, and now when you feed in information about a new house, you’ll get a price that’s fairly accurate.
This is essentially brute force learning.
After millions and millions of repetitions of changing the weights, running them through the function, and comparing predicted prices to known prices, you’ll have a decent predictive algorithm. On a simpler scale, it’s like trying to find decent reception for a radio station. If I turn the dial one way, does the reception get better? If yes, I keep turning that way. If not, I know I need to turn the other way. I repeat the process until I’ve found the optimal reception.
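The weight-tweaking loop above can be sketched in a few lines of Python. Everything here is made up for illustration, including the home prices and the learning rate; the point is just the shape of the process: guess, measure the error, nudge the weights, repeat.

```python
# Toy dataset (made-up numbers): (square feet, bedrooms, bathrooms) -> sale price.
homes = [
    ((1500, 3, 2), 300_000),
    ((2000, 4, 3), 400_000),
    ((1200, 2, 1), 240_000),
    ((1800, 3, 2), 360_000),
]

weights = [0.0, 0.0, 0.0]  # one weight per feature, all starting at zero

def predict(features, weights):
    # weighted sum of the features: the "function" the weights run through
    return sum(f * w for f, w in zip(features, weights))

# The brute-force loop: after each guess, nudge every weight in whichever
# direction shrinks the gap between the predicted and the actual price.
learning_rate = 1e-8
for _ in range(20_000):
    for features, price in homes:
        error = predict(features, weights) - price
        for i, f in enumerate(features):
            weights[i] -= learning_rate * error * f

for features, price in homes:
    print(features, "predicted:", round(predict(features, weights)), "actual:", price)
```

After enough passes, the predicted prices land close to the known sale prices, and the learned weights can then price a house that wasn’t in the training list.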
This process works for something easy like pricing a house, but when you’re dealing with more complicated subjects, there are many places where mistakes can creep in, and it becomes extremely likely that your predictive function will be useless.
What if we don’t fully understand the factors that go into the question we’re trying to answer?
For example, a person building a pricing model for sports cars might not know that the number of cylinders in the engine is an important factor, and so leaves that data point out.
Now it doesn’t matter how much data you throw at it: you’re most likely not going to get a good predictive function.
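To see how much a missing factor costs, here’s a minimal sketch with made-up sports-car data where price happens to track cylinder count. For simplicity, the model that never sees the cylinder column is reduced to the best it could ever do with no informative features: guessing the average price.

```python
# Made-up sports-car data: (cylinders, price). Price tracks cylinders here.
cars = [(4, 60_000), (6, 90_000), (8, 120_000), (12, 180_000)]
prices = [p for _, p in cars]

# Without the cylinder column, the best constant guess is the mean price.
mean_price = sum(prices) / len(prices)
error_without = sum(abs(p - mean_price) for p in prices) / len(prices)

# With the cylinder column, a simple least-squares slope fits the data.
slope = sum(c * p for c, p in cars) / sum(c * c for c, _ in cars)
error_with = sum(abs(p - slope * c) for c, p in cars) / len(cars)

print(f"average error without cylinders: {error_without:,.0f}")
print(f"average error with cylinders:    {error_with:,.0f}")
```

No amount of extra training shrinks the first error, because the information the model needs simply isn’t in its inputs.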
Another area where it’s easy to introduce error is training the function on bad data. Using the car example again: if the data says a specific car has 6 cylinders when in reality it has 8, I’ll never get a very accurate predicted price. This sort of thing is actually extremely common in the data science world. When you have a database with millions and millions of records, it’s just a fact that some values are going to be incorrect or even completely missing.
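The same invented toy data shows the bad-data problem: corrupt a single cylinder count and the fitted price-per-cylinder drifts away from the true relationship.

```python
clean = [(4, 60_000), (6, 90_000), (8, 120_000), (12, 180_000)]
# Identical cars, except one 8-cylinder record was mis-entered as 6.
dirty = [(4, 60_000), (6, 90_000), (6, 120_000), (12, 180_000)]

def fit_slope(data):
    # least-squares price-per-cylinder through the origin
    return sum(c * p for c, p in data) / sum(c * c for c, _ in data)

clean_slope = fit_slope(clean)
dirty_slope = fit_slope(dirty)

# One bad record skews every prediction the model makes from then on.
print(f"price per cylinder, clean data: {clean_slope:,.0f}")
print(f"price per cylinder, dirty data: {dirty_slope:,.0f}")
```

With one corrupted record out of four the drift is large; in a real dataset of millions of records, the errors are less obvious but they are always there.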
After I was comfortable with machine learning, I tried to implement a real-life project.
I set up a program that, each day, would download historical stock data in 10-second intervals for the past 60 trading days. Using that data, it would calculate what your profit or loss would have been if you had bought a call or put option at each 10-second interval. I then used technical-analysis functions to create values as data points, and the model would train overnight on that data. The next day, using a live stream from Interactive Brokers, it would create the same technical-analysis data points and produce a recommendation: buy a call, buy a put, or do nothing. Using my test account on Interactive Brokers, it would then buy and sell the option depending on the recommendation.
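The labeling step of that pipeline looked roughly like this minimal sketch. Everything here is simplified and hypothetical: a plain price list stands in for the downloaded 10-second bars, a single moving-average gap stands in for the technical-analysis indicators, and "profit" is reduced to whether the price rose or fell over a fixed look-ahead horizon.

```python
# Hypothetical 10-second closing prices (made-up numbers).
prices = [100.0, 100.2, 100.1, 100.4, 100.3, 100.0, 99.8, 100.1, 100.5, 100.6]

HORIZON = 3  # how many bars ahead we check to score the trade

def label(i):
    # Which side would have profited: price up -> call, down -> put.
    future = prices[i + HORIZON]
    if future > prices[i]:
        return "call"
    if future < prices[i]:
        return "put"
    return "none"

def sma_gap(i, window=3):
    # Price minus its short moving average: a toy momentum feature
    # standing in for real technical-analysis indicators.
    sma = sum(prices[i - window + 1 : i + 1]) / window
    return prices[i] - sma

# One (feature, outcome) row per bar that has both history and a future.
rows = [(sma_gap(i), label(i)) for i in range(2, len(prices) - HORIZON)]
for feature, outcome in rows:
    print(round(feature, 3), outcome)
```

The real version generated many indicators per interval and trained on 60 days of them overnight, but the shape of the training data was the same: features on one side, a call/put/nothing outcome on the other.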
Obviously, since I’m sitting here telling you about my project instead of keeping my super-secret money-making scheme under wraps – it didn’t work that well.
The machine learning bot would win big on some days and lose big on others. I’d go back and compare the data points from when it bought an option, and yep – the historical data showed that the same setup had gone well before, but for whatever reason it went the other way this time.
This shows that really complicated systems are extremely difficult to model, and there’s no real “learning” going on. It’s just brute repetitive force memorizing a pattern, and because it doesn’t have absolutely every piece of relevant data, it simply can’t work. For example, did Trump say something that made the market swing that day? It couldn’t account for that.
But even if it did have absolutely every data point you could throw at it, including a comment by the former President, you still wouldn’t be able to train your model, because now you’ve exponentially increased your training time. You wouldn’t have a model ready for the next trading day.
So what’s the solution?
I think quantum computing is going to be a game changer: eventually we’ll be able to train a model on massive amounts of data relatively quickly. We also eventually need plug-in modules for problems, so we can tackle really complicated problems using smaller, highly accurate models that work together to form a bigger solution.
Think of building a program you can talk to: rather than doing everything from scratch, you combine a voice-recognition module with a language parser, so that when it hears “Let’s eat, Grandma” it knows you didn’t mean you wanted to eat your grandmother.
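A sketch of what that composition might look like. The module names and interfaces here are pure invention, with trivial stubs standing in for real recognition and parsing models; the point is that each module does one job and the pipeline just wires them together.

```python
def recognize_speech(audio: bytes) -> str:
    # Stub for a plug-in speech-to-text module (a real one would
    # run a trained model over the audio bytes).
    return "let's eat grandma"

def parse_intent(text: str) -> dict:
    # Stub for a plug-in language-parsing module that recovers the
    # missing comma: Grandma is being addressed, not eaten.
    if text == "let's eat grandma":
        return {"action": "eat", "addressee": "grandma"}
    return {"action": "unknown"}

def assistant(audio: bytes) -> dict:
    # The "bigger solution" is just small modules composed in sequence.
    return parse_intent(recognize_speech(audio))

print(assistant(b"<raw audio bytes>"))
```

Swapping in a better recognizer or parser improves the whole assistant without retraining one giant model from scratch.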
Eventually, we’ll have something that really will be more akin to learning than the brute force process we have now.
In the meantime, be suspicious when you read that a company is going to revolutionize something by using machine learning. Unless the data team fully understands the subject, has all the necessary data, and has all the necessary examples to train on, they’re not actually going to get very far.