Extending The Value of Security Testing by Adopting Variant Analysis

2020-08-26

CodeQL

Continuous Security Testing and Reduction in Security Debt

Security testing is a key activity in a security program. Given the business requirements for continuous delivery, security testing must be repeatable and continuous, with specific metrics that identify the success of the initiative. Organizations with external or internal teams for application security testing usually have a vulnerability management process that allows them to respond to identified vulnerabilities.

While this is a reactive approach, it is still needed, as automated tools are not yet mature enough to be as effective as a professional security analyst with some experience and context (IMHO). A security issue identified during testing often reappears in a different feature or code path. Coverage is also non-trivial to measure during black-box or gray-box application security testing.

Given this situation, it is difficult to measure the value of security testing. Security testing often causes friction or delay in engineering processes because blocker issues are discovered late in the release cycle.

Engineering teams can significantly increase the value of application security testing if they can build a system that ensures:

1. All instances of a given vulnerability are identified and fixed
2. Appropriate security gates are implemented in the SDLC to prevent re-introducing similar vulnerabilities

This solves a key problem for engineering teams: they don’t just fix individual vulnerabilities but fix a whole class of vulnerabilities in their product, continuously reducing security debt by using security testing output effectively.

While [2] is beyond the scope of this article, I would personally look at GitLab CI Application Security Capabilities for quick wins in this regard.

In this post, I will focus on how to identify all instances of a vulnerability using variant analysis.

What is Variant Analysis

Variant analysis is the process of using a known vulnerability as a seed to find similar problems in your code. Security engineers typically perform variant analysis to identify similar possible vulnerabilities and to ensure these threats are properly fixed across multiple codebases.

https://semmle.com/variant-analysis

For example, consider this scenario:

You have a web application exposed to your external users
The security team reports an authorization issue where a malicious user can manipulate a URL such as https://app.example.com/projects/:project_id/issues to read resources outside their access scope

This issue can be fixed by enforcing appropriate authorization controls. A simple mitigation is to use user-scoped queries when looking up a resource from the database:

SELECT * FROM projects WHERE project_id = :project_id AND user_id = :user_id
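
The same idea can be sketched in application code. This is a minimal illustration, not code from any real product: Project, ProjectStore, findUnscoped and findForUser are hypothetical names, and an in-memory list stands in for the database table.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical type standing in for a row of the projects table.
class Project {
    final int projectId;
    final int userId;
    final String name;

    Project(int projectId, int userId, String name) {
        this.projectId = projectId;
        this.userId = userId;
        this.name = name;
    }
}

// Hypothetical in-memory store; a real application would query a database.
class ProjectStore {
    private final List<Project> rows;

    ProjectStore(List<Project> rows) {
        this.rows = rows;
    }

    // Vulnerable pattern: filters on project_id only, so any caller can read any project.
    Optional<Project> findUnscoped(int projectId) {
        return rows.stream()
                .filter(p -> p.projectId == projectId)
                .findFirst();
    }

    // Mitigated pattern: mirrors "WHERE project_id = :project_id AND user_id = :user_id".
    Optional<Project> findForUser(int projectId, int userId) {
        return rows.stream()
                .filter(p -> p.projectId == projectId && p.userId == userId)
                .findFirst();
    }
}
```

With the scoped lookup, a request for another user's project simply yields an empty result, so the caller can respond with "not found" without revealing that the resource exists.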

Adopting variant analysis for this example will require the following:

1. Model the security issue in a form that can be applied to a program representation, for example a control flow graph
2. Scan the product codebase(s) for all instances of the same security issue using the model created in [1]
3. Add the security issue model to a repository and use it for continuous scanning as part of the CI security pipeline

The benefits will be:

Every iteration of the above exercise enriches the security models repository for the product
Security issues are discovered effectively, with a very high degree of product context, as part of the SDLC
Each security testing exercise not only finds issues but also adds valuable security issue models to this repository
Releases ship free from variants of past issues with a fairly high degree of confidence
New code is prevented from re-introducing variants of past vulnerabilities

Variant Analysis with CodeQL

By automating variant analysis, CodeQL enables product security teams to find zero-days and variants of critical vulnerabilities.

https://semmle.com/codeql

CodeQL is available for free from GitHub.

Get started with CodeQL
I really recommend you play the game
I use CodeQL locally with Visual Studio Code
CI automation can be set up using the CodeQL CLI

What does it look like

import java

from Callable call
where call.getName() = "isEmpty"
select call.getAReference()

Example CodeQL query that finds all calls to methods named isEmpty in a Java code base

[Example] Using CodeQL with Damn Vulnerable Java Application (DVJA)

We will use Damn Vulnerable Java Application to demonstrate variant analysis using CodeQL. DVJA contains an Insecure Direct Object Reference (IDOR) vulnerability, which is documented in detail for reference.

The issue occurs in the following line of code, where the controller retrieves a User record from the database using user-supplied input without validation.

user = userService.find(getUserId());

https://github.com/abhisek/dvja/blob/597ece1ab79ffffea7289d49b0c443bb2ebcbd16/src/main/java/com/appsecco/dvja/controllers/UserAction.java#L88

The find method in UserService is defined as below:

public User find(int id) {
    return entityManager.find(User.class, id);
}

https://github.com/abhisek/dvja/blob/597ece1ab79ffffea7289d49b0c443bb2ebcbd16/src/main/java/com/appsecco/dvja/services/UserService.java#L42

While DVJA is a very small code base and all instances of this issue can be identified trivially with grep or manual code review, this will not be the case with a sufficiently large code base with complex code paths.

Defining a Model for Generic Detection

We can apply taint propagation to model this condition in a way that is fairly generic yet specific to our product context:

Any class which is a child of ActionSupport is an action class (Controller)
Any variable in an action class is a potential user-supplied value set by the Struts2 framework
Any user-supplied value reaching the 2nd parameter of entityManager.find without validation is a potential security vulnerability
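
To make the idea of taint propagation concrete, here is a toy sketch in Java. It is only an illustration of the concept (reachability from a source to a sink that stops at validation), not how CodeQL implements data flow. TaintGraph and the node labels used with it are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy data-flow graph: nodes are labels, an edge means "value flows from -> to".
class TaintGraph {
    private final Map<String, List<String>> edges = new HashMap<>();
    private final Set<String> sanitizers = new HashSet<>();

    void addFlow(String from, String to) {
        edges.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    // Marks a node as validation logic; taint does not propagate past it.
    void addSanitizer(String node) {
        sanitizers.add(node);
    }

    // Breadth-first search: true if the sink is reachable from the source
    // without passing through a sanitizer node.
    boolean reaches(String source, String sink) {
        Deque<String> queue = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        queue.add(source);
        seen.add(source);
        while (!queue.isEmpty()) {
            String node = queue.remove();
            if (node.equals(sink)) {
                return true;
            }
            if (sanitizers.contains(node)) {
                continue; // validated value: stop propagating taint here
            }
            for (String next : edges.getOrDefault(node, Collections.emptyList())) {
                if (seen.add(next)) {
                    queue.add(next);
                }
            }
        }
        return false;
    }
}
```

In this picture, a controller field is the source, the second argument of entityManager.find is the sink, and any path between them that does not cross a validation node is reported as a potential variant.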

I wrote a fairly verbose CodeQL query to model the above constraints, which you can read and test on LGTM. It produces the expected results.

Variant Analysis DVJA Results

Using this query, we can find a new variant of the same vulnerability (IDOR) in the ProductAction class as well.

I will break down the query here to reduce complexity during explanation. CodeQL queries can be a bit complex to start with; however, they become easier to read and write as we become familiar with program structure, control flow and the CodeQL APIs.

CodeQL Query Explanation

We will do the following to write a CodeQL query that implements the model described earlier in this post:

1. Define a CodeQL class for the source of user-supplied input
2. Define a CodeQL class for the sink, the insecure database query
3. Define a taint propagation configuration to find code paths between [1] and [2]

Source

Start by defining some base CodeQL classes, which I will use to define the source of user-supplied input as per our model:

// Any class or interface named ActionSupport (matched by simple name only).
class ActionSupport extends ClassOrInterface {
  ActionSupport() {
    exists(ClassOrInterface i | i.hasName("ActionSupport") and this = i)
  }
}

// Any type named BaseController that has ActionSupport as a supertype.
class BaseController extends ClassOrInterface {
  BaseController() {
    exists(ClassOrInterface i, ActionSupport a |
      i.hasName("BaseController") and
      i.hasSupertype(a) and
      this = i
    )
  }
}

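Next, we will define the input class that matches our constraint for a potential source of user-supplied input:

// Any variable contained in a class that has BaseController as a supertype.
// Per our model, Struts2 may set such a variable from user input.
class ControllerClassUserInput extends Variable {
  ControllerClassUserInput() {
    exists(Variable v, Class c, BaseController a|
      c.contains(v) and
      c.hasSupertype(a) and
      this = v
    )
  }
}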

Sink

Now we will define the classes that model our vulnerable code, i.e. a call to entityManager.find(...) with a user-supplied parameter:

// Any class or interface named EntityManager (matched by simple name only).
class EntityManager extends ClassOrInterface {
  EntityManager() {
    exists(ClassOrInterface i | i.hasName("EntityManager") and this = i )
  }
}

// Despite its name, this models the find method declared on EntityManager;
// its call sites are obtained later through getAReference().
class EntityManagerCall extends Callable {
  EntityManagerCall() {
    exists(Callable c |
      c.hasName("find") and c.getDeclaringType() instanceof EntityManager and this = c
    )
  }
}

Taint Propagation Configuration

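Finally, we will define the taint tracking configuration that finds all possible code paths from source to sink:

// Requires "import java" and "import semmle.code.java.dataflow.TaintTracking"
// at the top of the query.
class Config extends TaintTracking::Configuration {
  Config() {
    this = "EntityManagerFindIDOR"
  }

  // Source: any read of a variable that belongs to a controller class.
  override predicate isSource(DataFlow::Node source) {
    exists(ControllerClassUserInput ci |
      source.asExpr() = ci.getAnAccess()
    )
  }

  // Sink: any argument of a call to EntityManager.find.
  // Note that this is broader than the 2nd parameter named in the model.
  override predicate isSink(DataFlow::Node sink) {
    exists(EntityManagerCall fn |
      fn.getAReference().getAnArgument() = sink.asExpr()
    )
  }
}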

Result

The full query is available here for testing:

https://lgtm.com/query/3584843823142271495/

The query can be further improved by using Guards to model validation logic, which would help avoid false positives.
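
As an illustration of the kind of validation logic such a guard would need to recognize, here is a hypothetical controller-side check. It is not taken from DVJA: OwnershipGuard and requireOwnId are made-up names, and the rule that a user may only look up their own id is an assumption for the sake of the example.

```java
// Hypothetical validation helper, not part of DVJA.
class OwnershipGuard {
    // Returns the requested id only when it matches the authenticated user's id;
    // otherwise the request is rejected before any database lookup happens.
    static int requireOwnId(int requestedId, int sessionUserId) {
        if (requestedId != sessionUserId) {
            throw new SecurityException("access denied for user id " + requestedId);
        }
        return requestedId;
    }
}
```

A value that has passed a check like this before reaching entityManager.find is one that an improved query could reasonably treat as validated and stop reporting.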

Thanks for reading. I can be reached on Twitter for queries or flame :) If you work in a product security team, I would love to know if you think this approach is feasible for you.