Anton Chuvakin wrote a great blog post about the future of machine learning in cybersecurity. Alex Vaystikh wrote a great commentary on Anton’s post. I followed Alex’s lead and am posting my comment on his comment in this blog…
I’d argue, though, that “confidence” is not a politically-correct form of “correctness”, pardon the pun. “Correct” is a binary term, and “70% correctness” is just shorthand for getting 70% of the answers right and 30% of them wrong, which implies an absolute knowledge of what’s right and wrong, knowledge that does not exist in the world of big, sparse, ugly data. “Confidence”, on the other hand, is model-oriented: it represents the measure of trust the analyst can put in the result of the computation, based on the quality of the data, the goodness of fit, and the reliability of the source.
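To make the distinction concrete, here is a minimal Python sketch (the function names and the simple multiplicative combination are my own illustrative assumptions, nothing more): correctness demands a labeled ground truth for every answer, whereas a confidence score can be computed from the model and its inputs alone.

```python
# Correctness (accuracy) presumes an absolute ground truth: a label for
# every prediction. Getting 7 of 10 right is the "70% correctness" shorthand.
def accuracy(predictions, labels):
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Confidence is model-oriented: a measure of trust derived from the model
# and its inputs. The multiplicative combination below is a hypothetical
# illustration, not an established formula.
def confidence(data_quality, goodness_of_fit, source_reliability):
    return data_quality * goodness_of_fit * source_reliability

print(accuracy([1, 0, 1, 1, 0, 1, 1, 0, 1, 1],
               [1, 0, 1, 1, 1, 1, 0, 0, 1, 0]))   # 0.7 -- needs labels
print(confidence(0.8, 0.9, 0.95))                 # 0.684 -- needs no labels
```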
“Fast” is a function of the desired confidence level, at least in the field of behavioral analytics: higher confidence comes at the cost of speed, and in most practical applications it is not worth it, considering the typically poor quality of the data we deal with. Instead of sweating teraflops to improve confidence from 90% to 92%, I’d rather apply multi-dimensional validators, such as cross-domain correlation and peer group analysis, to amplify the signal and reduce false positives.
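As a hedged illustration of what such validators might look like, here is a short Python sketch combining cross-domain corroboration with a peer-group z-score; the thresholds, domain names, and numbers are made-up assumptions, not a reference implementation.

```python
from statistics import mean, stdev

def peer_group_zscore(value, peer_values):
    """How far an entity's metric deviates from its peer group baseline."""
    mu, sigma = mean(peer_values), stdev(peer_values)
    return 0.0 if sigma == 0 else (value - mu) / sigma

def corroborated_alert(domain_scores, peer_z,
                       score_threshold=0.9, z_threshold=3.0, min_domains=2):
    """Fire an alert only when the anomaly shows up in several independent
    domains AND the entity stands out from its peers: amplifying the signal
    instead of chasing the last few points of per-model confidence."""
    hot = [d for d, s in domain_scores.items() if s >= score_threshold]
    return len(hot) >= min_domains and abs(peer_z) >= z_threshold

# Hypothetical example: two domains corroborate, and the entity's outbound
# traffic is far above its peer group's, so the alert fires.
scores = {"endpoint": 0.95, "network": 0.91, "identity": 0.40}
z = peer_group_zscore(4200, [300, 350, 280, 310, 295])  # MB sent per day
print(corroborated_alert(scores, z))  # True
```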
Which leaves us with “explainable”, and I cannot emphasize strongly enough how important it is. As I mentioned in my post on Machine Learning, Prescriptive Analytics is still in the distant future, and the human analyst is going to be our ultimate customer for quite a while. A non-deterministic, unsupervised, and unexplainable algorithm is not going to gain much trust with the analyst, no matter how cool the algorithm is. Transparency, intuitiveness, and relevance to the analyst’s experience are the keys to a successful product.