A Grey-Box Approach for Detecting Malicious User Interactions in Web Applications

This research leverage a graph-based representation of known attack scenarios (sequences of URLs from syslogs) to train an algorithm to classify user interactions as benign or malicious. Isomorphism in subgraph of the new interactions are verified to find logic flaws

Context

Moder applications implement a multi-tier model involving multiple application providers that interface together in order to supply a service. For example, an e-commerce may leverage on an authentication service and on a payment service, and a blog may leverage on a service to handle a comment or forum section.

when there is a flow that combine multiple services, more subtle vulnerabilities may emerge, such as logic workflow bypass
these vulnerabilities are mostly managed manually
logic vulnerability and BFT (business flow tampering) may occur in these scenario, exploiting the complicated flow of the application

Approach

A black-box testing approach offers good opportunities to detect flaws in a service workflow, but (i) active fuzzing techniques have a limited coverage, while (ii) passive learning techniques have usually a high false positive rate

white-box testing techniques instead can leverage the source code but (i) don’t take into account service workflows or the user interactions with the GUI (graphical user interface), (ii) suffer from high false negatives rates

The approach of the authors is in between:

we first propose a black-box approach that leverages a set of real world user interactions with the service, and further applies unsupervised clustering techniques such as co-clustering in order to infer a behavioral model for the service
This model is represented through a behavior graph where nodes represent elementary user requests, and edges represent logical transitions between requests occuring within the same session with the service
Then, our system leverages information about possible security flaws that may have been detected using standard white-box techniques
These flaws are represented as attack graphs where nodes capture possible vulnerabilities and edges capture attack paths through the target service
Our system uses the information in the attack graph in order to prune the behavior graph by eliminating from the model possible attacks that may have occurred during the learning phase
Therefore we proceed with an intermediary step where we automatically convert the attack graph into an intermediate event graph (see ESG (event sequence graph))
Finally, we prune the behavior model from possible attacks by applying standard graph matching and subgraph isomorphism to both behavior graph and the event graph

Components of the model

Behavior graph generator

a set of real world user interactions are processed. These are http requests, or access logs
unsupervised learning is applied to infer behavior classes
common pattern are associated with observable features that are further represented into a statistical model. This model captures all user behaviors when they connect to the service. The different actions are mapped as a set of URLs
the output of this module is a set of different user behaviors

Attack graph mediator

in order to separate benign and malicious behaviors, our system leverages a semi-supervised learning approach, using ground truth attack scenarios represented within an attack graph
this module “translate” the ground truth attack graph into an intermediate event graph that is usable by the next module

Behavior graph pruning This module applies sub-graph isomorphism to both the event graph and the user behavior graph. The purpose of this module is:

identifying and isolating malicious behaviors and attacks that may have occurred during the learning phase, and that are similar to known attack scenarios represented in the previous event graph
identifying all benign user interactions with the service, and associates them with different behavior classes
unknown pattern are classified as new possible attack behavior

User interactions as a sequence of URLs

URLs are used as a way to represent user behaviors

Testing the model

a case study was presented. The authors collaborated with a service provider and collected data over a period of six weeks, including 33.7 GB of data in syslog format that consists of 1.3 million active user sessions
each log includes: timestamp, anonymized user identity and remote IP address, URL of the service
the service provider informed the authors of certain malicious patterns (user interaction) that they knew could be exploited. This was the output of a risk analysis. These malicious interactions could lead to CSRF (cross-site request forgery), cookie theft, identity theft
this information were then mapped into a set of malicious attack graph paths, used then in the attack graph mediator (see attack examples)
the new interactions were therefore classified into benign or malicious, given the ground truth

Attack examples

Video Game Purchase Fraud

an attacker proceeds first by executing a cookie hijack and then impersonates the identity of another victim user
after connecting to the service, the attacker is able to edit the user info, and then the attacker changes his email address
Later, the attacker commands a game, and receives the confirmation payment. By exploiting the vulnerability of the service, the attacker will let the victim user be charged for the game, and delivered to the new email address

phishing with IP spoofing

The attacker connects to the WiFi of the victim user, and behaves as the legitimate user (IP spoofing)
Then he proceeds with an email phishing exploit
In this scenario, the attacker connects to the service, may access all the account details for the victim user, and may then send as many phishing emails as needed on behalf of the victim user account

Media access without paying

The attacker connects to the media service, and he exploits a vulnerability in the application that allows him to access from a media content to another without access to the billing service

Results

32 classes of real user behavior were identified. The most interesting ones includes:

Class 1: This class behavior represents user interactions when connecting to the billing service
Class 2: This class behavior captures the workflow for the application that provides access to users’ calling history and their internet consumption
Class 3: This class behavior represents the way users access to the email service, including their ability to send any number of emails (counter)
Class 4: This class behavior depicts the way users access to the media service, that allows them to pay when accessing contents
Class 5: This class behavior represents the flow used to buy a new phone

The authors were able to identify the malicious interactions, and even new malicious behavior, by recognizing same or similar behavioral patterns that could lead to the same outcomes (isomorphic sub graphs)

References

(Ben Jaballah, Kheir, et al., 2016)

GUI testing wiki

Explorer

A Grey-Box Approach for Detecting Malicious User Interactions in Web Applications

Context

Approach

Components of the model

Testing the model

Attack examples

Results

References

Graph View

Table of Contents

Backlinks