Artificial Intelligence and Cyber Security — Part 1
The tactics, techniques, and procedures (TTPs) used by cybercriminals today have far outpaced traditional detection methods, which are predicated on a defensive mindset and optimized around the organizational perimeter with layered defenses. With the cyber threat landscape shifting to Advanced Persistent Threats (APTs), ransomware, and insider threats, the methods used to prevent, detect, and respond to attacks have undergone a paradigm shift. Signature-based detection, scanning endpoints for malware, and maintaining good system hygiene are not enough to keep organizations safe from attacks of this velocity and variety. The acute shortage of highly skilled cybersecurity professionals is not helping either.
The other big change in the market is the growing dominance of artificial intelligence (AI) and machine learning across all aspects of business, affecting every industry from healthcare to manufacturing, retail to telecom, and banking, among others. Machine learning models are helping with customer acquisition, customer management, content management, logistics, supply chain, portfolio management, and risk management; virtually no business function is untouched by AI. Cybersecurity naturally becomes one of the biggest use cases for AI, not only for threat monitoring, detection, response, and remediation but also for more traditional areas like compliance, privacy, and identity and access management.
But what are the business benefits of applying AI to cybersecurity? The answer in one word is ‘substantial’. AI can improve productivity, reduce cost, increase revenues, correlate millions of data points effortlessly to reduce risk, improve accuracy, and provide consistency in response.
The next question that comes to mind is: what type of machine learning is used for cybersecurity? The answer is simple: data is data. AI doesn’t differentiate between weather, financial, medical, or cyber data. All the generic types of machine learning are applicable to cybersecurity data, including:
- Unsupervised Learning — Detecting anomalies, data labeling, identifying patterns of normal behavior, clustering, and communalizing
- Supervised Learning (Classification) — Categorization of threats, sources, data classification, deep learning for image categorization
- Supervised Learning (Regression) — Predicting threat priority, severity, data volumes
- Recommendation Systems — Association between events, recommendations, and advisory
- Natural Language Processing — Building knowledge graphs, analyzing social media sentiment, semantic analysis of policies and rules
- Reinforcement Learning — Risk and reward for a response, remediation action
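To make the first item in the list concrete, here is a minimal sketch of unsupervised anomaly detection. It does not use any particular product or library; it simply scores each observation by how far it sits from the median, scaled by the median absolute deviation (a robust z-score). The function names, the threshold of 3.5, and the failed-login counts are all hypothetical and chosen only for illustration.

```python
from statistics import median


def robust_z_scores(values):
    """Score each value by its distance from the median, scaled by the MAD."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread around the median: this sketch treats nothing as anomalous.
        return [0.0 for _ in values]
    # 0.6745 rescales the MAD so the scores are comparable to standard z-scores.
    return [0.6745 * (v - med) / mad for v in values]


def find_anomalies(values, threshold=3.5):
    """Return the indices whose robust z-score exceeds the (assumed) threshold."""
    scores = robust_z_scores(values)
    return [i for i, score in enumerate(scores) if abs(score) > threshold]


# Hypothetical hourly failed-login counts for a single account.
failed_logins = [2, 3, 1, 2, 4, 3, 2, 48, 3, 2]
anomalous_hours = find_anomalies(failed_logins)
print(anomalous_hours)
```

No labels are needed here: the "normal" behavior is learned from the data itself, which is what makes this style of detection attractive when labeled attack data is scarce.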
Having seen the business benefits and AI types, we would like to know the specific cybersecurity use cases for machine learning and AI. There are quite a few, ranging from network traffic analysis, malware analysis, and threat analysis to consumer web analytics (fraud) and user behavior analysis. A few of the key use cases are listed below:
- Malicious Traffic Identification using a risk score of traffic flows
- Fraudulent and/or Risky user identification using behavioral analytics and user risk scores
- Identifying phishing websites using page ranking, community detection, image classification
- Identifying Botnet domains using domain risk score and reputation metrics
- Detecting database attacks using abnormal activity detection
- Security intelligence consolidation and correlation using knowledge graphs
- Automated threat disposition by learning analyst behavior
- Threat categorization and attack phase determination by learning threat frameworks
- Data classification and crown jewels identification
- Account takeover and fraud detection using reputation and risk scores
We shall look into the details of each of the above use cases in the upcoming parts of this series.
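As a small preview of the botnet-domain use case, the sketch below computes one feature that could feed a domain risk score: the Shannon entropy of the leftmost label of a domain name. Algorithmically generated domains often look more random than human-chosen ones, so they tend to score higher. This is an assumption made for illustration, not the scoring method of any specific product; entropy alone is a weak signal and would normally be combined with reputation and other features. The function names and example domains are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Shannon entropy, in bits per character, of a non-empty string."""
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def domain_entropy(domain):
    """Entropy of the leftmost label of a domain (one illustrative feature)."""
    label = domain.lower().split(".")[0]
    return shannon_entropy(label) if label else 0.0


# Hypothetical examples: a familiar name versus a random-looking one.
for name in ["google.com", "xk2q9vz7tbw4.net"]:
    print(name, round(domain_entropy(name), 2))
```

A higher value only suggests that a name looks machine-generated; legitimate services also use random-looking hostnames, which is one reason a single feature should not decide a verdict on its own.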
While applying AI to cybersecurity sounds like a ‘silver bullet’ or a ‘unicorn with a rainbow’, several issues and challenges stand in the way of a ‘perfect marriage’. First, the data is high volume, dirty, and often unlabeled, which makes it very hard to apply any kind of supervised learning. Second, the resulting models are essentially ‘black boxes’ that provide a label prediction and a confidence score but offer limited reasoning or interpretability. With limited explainability and probabilistic output, their recommendations are also hard to audit. Last but not least, machine learning models, particularly the unsupervised ones used for anomaly detection and pattern matching, are quite noisy and have a penchant for producing lots of false positives.
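The false-positive problem is partly a matter of arithmetic: when real attacks are rare, even an accurate detector raises mostly false alarms. The sketch below applies Bayes' rule to show this. The base rate, detection rate, and false-positive rate are made-up numbers for illustration, not measurements from any real system, and the function name is hypothetical.

```python
def alert_precision(base_rate, true_positive_rate, false_positive_rate):
    """Fraction of raised alerts that correspond to real attacks (Bayes' rule)."""
    true_alerts = base_rate * true_positive_rate
    false_alerts = (1 - base_rate) * false_positive_rate
    total_alerts = true_alerts + false_alerts
    if total_alerts == 0:
        return 0.0  # the detector never fires
    return true_alerts / total_alerts


# Assumed numbers: 1 event in 1,000 is malicious, the detector catches 99%
# of malicious events and wrongly flags 1% of benign ones.
precision = alert_precision(
    base_rate=0.001,
    true_positive_rate=0.99,
    false_positive_rate=0.01,
)
print(f"{precision:.1%} of alerts are real attacks")
```

Under these assumed numbers, roughly nine out of ten alerts an analyst sees would be false positives, even though the detector itself looks excellent on paper.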
In conclusion, the application of AI and machine learning to cybersecurity has already taken center stage, and there are several new startups in this domain. The next stage of this ‘cat and mouse’ game between organizations and adversaries will include adversarial AI used to attack both AI and non-AI systems, including threat detection systems. The writing is on the wall: organizations either embrace AI technologies in cybersecurity to defend against these attacks or perish.