History will look back on our time as the beginning of the artificial intelligence revolution. Today, artificial intelligence is beating us at Go, translating and inventing languages, helping us decide what to buy, writing for us, and composing music. As you might expect, the endpoint security industry is benefiting greatly from AI, using it for everything from detecting threats to spotting unusual network activity.
However, the hard part of a complex new technology, apart from actually inventing and building it, is sometimes figuring out how to explain it to customers: how it works and why it is valuable.
The secret sauce is not the algorithm; it is the data.
Everyone wants to say they use deep learning and neural networks, but the specific machine learning algorithm used is irrelevant. The details of how the algorithm is written are less important than what the algorithm does: any machine learning algorithm is good if it produces useful, efficient models. The most important factor in making a useful model is the quality of the data. You can have machine learning without sophisticated algorithms, but not without good data. For malware, this means having a diverse and representative set of both the malicious and the benign files that users are likely to encounter.
Understanding the numbers
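Three numbers describe model performance: precision, recall, and accuracy. Precision tells you how many of the files the model flags are actually malicious, so low precision means many false positives. Recall tells you how much of the bad stuff the model detects. Accuracy is the share of all files, malicious and benign, that the model classifies correctly.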
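To make these numbers concrete, here is a minimal sketch that computes them from the four outcome counts of a test run, using the standard definitions. The function name `detection_metrics` and the test-set counts are hypothetical, chosen only for illustration; they do not come from any real product evaluation.

```python
def detection_metrics(tp, fp, tn, fn):
    """Compute precision, recall, and accuracy from outcome counts.

    tp: malicious files correctly flagged (true positives)
    fp: benign files wrongly flagged (false positives)
    tn: benign files correctly passed (true negatives)
    fn: malicious files missed (false negatives)
    """
    flagged = tp + fp          # everything the model called malicious
    malicious = tp + fn        # everything that really was malicious
    total = tp + fp + tn + fn

    # By convention, report 0.0 when a denominator is zero.
    precision = tp / flagged if flagged else 0.0
    recall = tp / malicious if malicious else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return precision, recall, accuracy


# Hypothetical test set: 1,000 files, 50 of them malicious.
precision, recall, accuracy = detection_metrics(tp=40, fp=10, tn=940, fn=10)

# A model that never flags anything, scored on the same test set.
idle_precision, idle_recall, idle_accuracy = detection_metrics(
    tp=0, fp=0, tn=950, fn=50
)
```

With these made-up counts, the first model scores 0.8 on precision and recall and 0.98 on accuracy. The model that never flags anything still reaches 0.95 accuracy while detecting nothing, which is why accuracy alone says little when benign files far outnumber malicious ones.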
But the numbers do not matter
Now that you understand the numbers behind model performance and that performance is more important than the algorithm, it is time for one last curveball: the numbers do not matter either. The reason is simple: the model is ultimately only a single component of a larger system.
So what matters?
So how do you evaluate a security product holistically? You might think it is by measuring detection rates, but you probably don’t have a representative test set of recent, diverse, live, and relevant malware. Even if you had a good test set, malware is always changing, and you need to know if the product vendor responds quickly to new threats. In other words, you do not just want a product with great detection; you want a product with great detection that’s always improving and keeping up with a constantly evolving threat landscape.
In Summary
Don’t be distracted by talk about specific algorithms. While machine learning and AI are useful, it does not matter if a product uses a neural network, stochastic gradient descent, adaptive boosted random forests, or whatever. Instead, focus on what matters: threat response time. SentinelOne is so confident it can keep up with new threats that it offers a ransomware protection warranty, which provides customers with financial support of $1,000 per endpoint, or up to $1 million per company.
* This article was originally posted on the SentinelOne blog and has been reposted and edited with permission.