Monthly ArchiveOctober 2018

JASP Dashboards

Matt Konda No Comments

JASP is a platform for security automation.  We currently focus on monitoring AWS environments for potential security issues.

Throughout September and October, we have been refining JASP dashboards.  The goal is to give user’s the simplest possible summary view of how they are doing.  We wanted to help convey a sense of how a user’s environment stacks up.  We found that people want to know not just what their issues are but how they are doing … relative to other companies and relative to where they should be.  To do that, we did some number crunching and analysis of typical security issues we find.

To implement this, we drew a bit from SSL Labs Grades and Github contributions.

The goal was to show:

  • Roughly how you are doing in simple terms.  (Nobody wants a “D”!)
  • How we calculated your grade and some history so you can see improvement or quality sliding right there in the dashboard
  • Our “The One Thing™” idea – the one thing you should go fix today.

Below is an example from one of the intentionally insecure environments we test with.  Note that the services are clickable and drill through to the specific security issues that are causing the grade for the given service.

JASP users should be able to see their dashboards now at:  We are adding notifications now to alert users if their grade changes and to ensure that we communicate what their grades are periodically.

Live Coding a Glue Task at AppSecUSA – Video

Matt Konda No Comments

Here is the video from the Glue and live coding talk at AppSecUSA.

Live Coding a New Glue Task at AppSecUSA

Matt Konda No Comments

At AppSecUSA, OWASP Glue, a project we contribute heavily to, was asked to present in the project showcase.  I put together an overview talk about how the tool is architected and what we use it for.  Then, we added a new task live during the talk.  I thought that was enough fun that it was worth blogging about.  I would like to start by acknowledging Omer Levi Hevroni, of Soluto, who has also contributed and continued support for Glue over the past year +.

Why Glue?

We started working on Glue because we found that most security tools don’t integrate all that cleanly with CI/CD or other developer oriented systems.  Our goal is always to get the checking as close to the time of code authorship as possible.  Further, we thought that having it wrap open source tools would make them more accessible.  Philosophically, I’m not that excited about tools (see earlier post) but we should definitely be using them as much as we can provided they don’t get in our way.  Ultimately, we wrote Glue to make it easy to run a few security tools and have the output dump into Jira or a CSV.

Glue Architecture

So we defined Glue with the following core concepts:

  • Mounters – these make the code available.  Could be pulling from GitHub or opening a docker image.
  • Tasks:  Files – these analyze files.  Examples are ClamAV and hashdeep.
  • Tasks:  Code – these are some level of code analysis.  Some are first class static tools, others are searching for dependencies or secrets.
  • Tasks:  Live – these are running some kind of live tool like Zap against a system.
  • Filters – filters take the list of findings and eliminate some based on whatever the filter criteria are.  An example is limiting ZAP X-Frame-Options findings to 1 per scan instead of 1 per page.
  • Reporters – reporters take the list of findings and push them wherever you want them to end up.  JIRA, TeamCity, Pivotal Tracker, CSV, JSON, etc.

Live Coding

OK, so a common task is to say

“Hey, this is cool but I want to do this with a tool that Glue doesn’t yet support.”

Great.  That should be easy.  Let’s walk through the steps.  We’ll do this with a python dependency checker:  safety.

Set Up the Task

A good way to start a new task is to copy an existing one.  In this case, we’re going to do a python based task so let’s copy the bandit task and alter to match:

require 'glue/tasks/base_task'  # We need these libraries.  The base_task gives us a report method.
require 'json'
require 'glue/util'
require 'pathname'

# This was written live during AppSecUSA 2018.
class Glue::SafetyCheck < Glue::BaseTask  # Extend the base task.

Glue::Tasks.add self  # Glue is dynamic and discovers tasks based on a static list.  This adds this task to the list.

def initialize(trigger, tracker)
  super(trigger, tracker)
  @name="SafetyCheck"                             # This is the name of the check.  -t safetycheck (lowered)
  @description="Source analysis for Python"
  @stage= :code                                   # Stage indicates when the check should run.
  @labels<<"code"<<"python"                       # The labels allow a user to run python related tasks.  -l python

Now we have the start of a class that defines our SafetyCheck task.

Run the Tool :  Safety Check

Here we just need to implement the run method and tell the task how to call the tool we want to run.  In this case, we want it to create json.  The resulting code is:

def run
  rootpath = @trigger.path
  @result=runsystem(true, "safety", "check", "--json", "-r", "#{rootpath}/requirements.txt")

The @trigger is set when the Task is initialized (see above) and includes information about where the code is.  We use that to know where the path that we want to analyze is.  Then we use one of the provided util methods runsystem to invoke safety check with the parameters we want.

Note that we are putting the result in the @result instance variable.

Parse the Results

Once the tool runs, we have the output in our @result variable.  So we can look at it and parse out the JSON as follows:

 def analyze
    puts @result
    results = clean_result(@result)
      parsed = JSON.parse(results)
      parsed.each do |item|  
        source = { :scanner => @name, :file => "#{item[0]} #{item[2]} from #{@trigger.path}/requirements.txt", :line => nil, :code => nil }
        report "Library #{item[0]} has known vulnerabilities.", item[3], source, severity("medium"), fingerprint(item[3]) 
    rescue Exception => e
      Glue.warn e.message
      Glue.warn e.backtrace
      Glue.warn "Raw result: #{@result}"

Here, we call a clean_result method on the result first.  You can look here for detail, but it is just pulling the warnings that the tool emits that make the output not valid JSON.  We do this to make sure the JSON is parseable for our output.  This is a common issue with open source tools.  I don’t underestimate the value of making these things just work.

The magic here is really in the two lines that set the source and then report it.  The source in Glue terms is the thing that found the issue and where it found it.  In this case, we’re setting it to be our Task (SafetyCheck) and the library name in the output from file containing the dependencies. (requirements.txt)

The report method is supplied by the parent BaseTask class and takes as arguments:  description, detail, source, severity and fingerprint.

def report description, detail, source, severity, fingerprint
    finding = @trigger.appname, description, detail, source, severity, fingerprint, )
    @findings << finding

You can see if you look closely that we set all of the severities to Medium here because safety check doesn’t give us better information.  We also use the supplied fingerprint method to make sure that we know if we have a duplicate.  You can also see that the result of calling report is that we have a new finding and the finding is added to the array of findings that were created by this task.  We get the trigger name, the check name, a timestamp, etc. just by using the common Finding and BaseTask classes.

Making Sure the Tool is Available

In addition to run and analyze the other method we expect to have in our Task is a method called supported?.  The purpose of this method is to check that the tool is available.  Here’s the implementation we came up with for safety check, which doesn’t have a version call from the CLI.

 def supported?
    supported=runsystem(true, "safety", "check", "--help")
    if supported =~ /command not found/
      Glue.notify "Install python and pip."
      Glue.notify "Run: pip install safety"
      Glue.notify "See:"
      return false
      return true

The idea here is to run the tool in a way that tests if it is available and alerts the user if it is not.  Graceful degredation as it were…

The Meta

Here is the code from the Tasks ruby file that runs all the tasks.

if task.stage == stage and task.supported?
  if task.labels.intersect? tracker.options[:labels] or      # Only run tasks with labels
    ( run_tasks and run_tasks.include? task_name.downcase )  # or that are explicitly requested.

    Glue.notify "#{stage} - #{task_name} - #{task.labels}"

Here you see the supported?, run and analyze methods getting called.  You also see the labels and tasks being applied.  Its not magic, but it might look weird when you first jump in.


We wanted to create a new task.  We did it in about 20 minutes in a talk.  Of course, I wrote the code around it so it was easy.  But it does illustrate how simple it is to add a new task to Glue.

If you want to see this check in action, you can run something like:

bin/glue -t safetycheck

We’re getting the Docker image updated, but once available you can run:

docker run owasp/glue -t safetycheck

We hope people can digest and understand Glue.  We think it is an easy way to get a lot of automation running quickly.  We even run a cluster of Kubernetes nodes doing analysis.  This in addition to integrating into Jenkins builds or git hooks.  Happy Open Source coding!!!

How it Works: TOTP Based MFA

Aaron Bedra No Comments


Multi-Factor Authentication has become a requirement for any application that values security. In fact, it has become a regulatory requirement in some industries and is being adopted as a requirement in several others. We often discover misconceptions or downright misunderstandings about how MFA works and think this is a topic worth diving into. This particular article will focus on one of the most common second factors, Time based One Time Password, or TOTP.

TOTP authentication uses a combination of a secret and the current time to derive a predictable multi-digit value. The secret is shared between the issuer and the user in order to compare generated values to determine if the user in fact posses the required secret. You may have heard this incorrectly referred to as “Google Authenticator”. While Google had a major part in popularizing this method, it has nothing to do with how TOTP actually works. Any site may create and issue tokens and any mobile application with a correct implementation of TOTP generation may produce a one time value. In this article we will implement server side TOTP token issuing and discuss its security requirements.

To read more about TOTP token generation, please take a look at RFC 6238.

The example code in this article is written in Java. This task can be accomplished in any programming language that supports the underlying cryptographic functions.

Establishing a Seed

The foundation for the security of a TOTP token begins with the seed. This value is used in conjunction with the current time to derive the instance of the token. Because time can be calculated it is not suitable as the only value for our token. Choosing a seed is incredibly important and should not be left up to the user. Seeds should be randomly generated using a Cryptographically Secure Pseudo Random Number Generator. You can determine the number of bytes you want to use, and in this example we are using 64. We will use the SecureRandom implementation provided by the Java language.

static String generateSeed() {
    SecureRandom random = new SecureRandom();
    byte[] randomBytes = new byte[SEED_LENGTH_IN_BYTES];
    return printHexBinary(randomBytes);

In order to cosume the token we will return the hex representation of the bytes generated. This allows us to pass the value around a bit easier.

Establishing a counter

The other side of TOTP token generation relies on the current time. We take the current time represented as a long, which is the number of seconds since epoch. This can be derived using System.currentTimeMillis() / 1000L. Next, we take the value and divide it by our period, or the number of seconds the token will be valid before rotating. We will use a value of 30 in our example, which is the recommended setting. Finally, we need to put the value into a byte array. There are a few ways to do this, but the following method is on the conservative side accounting for non 64 bit longs and possible endian differences.

private static byte[] counterToBytes(long time) {
    long counter = time / PERIOD;
    buffer = new byte[Long.SIZE / Byte.SIZE];
    for (int i = 7; i >= 0; i--) {
        buffer[i] = (byte)(counter & 0xff);
        counter = counter >> 8;
    return buffer;

Once we have this value we can execute our hmac operation to produce the long form of our OTP value.

Generating a Value

Using the seed as a key and the counter as a message, we will derive our long form OTP value. The value returned from our HMac operation will be truncated in order to produce the 6 digit value we will compare against in the end. The following code is an HMacSHA1 operation using the standard Java encryption libraries. While RFC 6238 describes the possible options of HMacSHA256 and HMacSHA512, they are not viable when distributing the secret for use on most mobile authenticator applications.

private static byte[] hash(final byte[] key, final byte[] message) {
    try {
        Mac hmac = Mac.getInstance("HmacSHA1");
        SecretKeySpec keySpec = new SecretKeySpec(key, "RAW");
        return hmac.doFinal(message);
    } catch (NoSuchAlgorithmException | InvalidKeyException e) {
        log.error(e.getMessage(), e);
        return null;

With the ability to hash our seed and message we can now derive our TOTP value:

static String generateInstance(final String seed, final byte[] counter) {
    byte[] key = hexToBytes(seed);
    byte[] result = hash(key, counter);

    if (result == null) {
        throw new RuntimeException("Could not produce OTP value");

    int offset = result[result.length - 1] & 0xf;
    int binary = ((result[offset]     & 0x7f) << 24) |
                 ((result[offset + 1] & 0xff) << 16) |
                 ((result[offset + 2] & 0xff) << 8)  |
                 ((result[offset + 3] & 0xff));

    StringBuilder code = new StringBuilder(Integer.toString(binary % POWER));
    while (code.length() < DIGITS) code.insert(0, "0"); 
    return code.toString(); 

Using the seed as the key and the counter as the message, we perform the necessary conversions, perform the hash, and truncate the message according to the specification. Finally, we divide the result by 10 to the power of the expected digits (in our case 6) and convert that to a string value. If the result is less than 6 digits we pad it with zeros.

Providing the Secret to the User

In order to provide the secret to the user, we need to provide a consistent string in a format that allows the user to generate tokens reliably. There are several ways to do this, including simply providing the secret and issuer to the user directly. The most common method is by providing a QR code that contains the information. For this example we will use the Zebra Crossing library.

class QrCode {
    private static final int WIDTH = 350;
    private static final int HEIGHT = 350;

    static void generate(String applicationName, String issuer, String path, String secret) {
        try {
            String qrdata = String.format("otpauth://totp/%s?secret=%s&issuer=%s", applicationName, secret, issuer);
            generateQRCodeImage(qrdata, path);
        } catch (WriterException | IOException e) {
            System.out.println("Could not generate QR Code: " + e.getMessage());

    private static void generateQRCodeImage(String text, String filePath) throws WriterException, IOException {
        QRCodeWriter qrCodeWriter = new QRCodeWriter();
        BitMatrix bitMatrix = qrCodeWriter.encode(text, BarcodeFormat.QR_CODE, WIDTH, HEIGHT);
        Path path = FileSystems.getDefault().getPath(filePath);
        MatrixToImageWriter.writeToPath(bitMatrix, "PNG", path);

Executing this code will save a PNG with the corresponding QR code. In a real world situation, you would render this image directly to the user for import by their application of choice.

Consuming the Token

In order to test our implementation we will need a program that can accept an otpauth:// string or a QR Code. This can be done a number of ways. If you want to do this via a mobile device, you can use Google Authenticator or Authy. Both of these programs will scan a QR code. If you want to try locally, 1Password provides a way to import a QR image by adding a label to a login and selecting One-Time Password as the type. You can import the created image using the QR code icon and selecting the path to the generated image. Once the secret is imported it will start producing values. You can use these as your entry values into the example program.

Protecting the Seed

It is important to respect the secret for what it is — a secret. With any secret we must do our part to protect it from misuse. How do we do this? Like any other piece of persisted sensitive information, we encrypt it. Because these secrets are not large in size, we have a number of options at our disposal. The important part is not to manage this step on your own. Take advantage of a system that can encrypt and decrypt for you and just worry about storing the encrypted secret value. There are cloud based tools like Amazon KMS, Google Cloud Key Management, and Azure Key Vault as well as services you can run like Thycotic Secret Server, CyberArk Conjur, and Hashicorp Vault. All of these options require some kind of setup, and some are commercial products.

Setting up Secret Storage

To keep this example both relevant and free of cost to run, we will use Hashicorp Vault. Vault is an open source project with an optional enterprise offering. It’s a wonderful project with capabilities far past this example. There are a number of ways to install Vault, but since it is a single binary, the easiest way is to download the binary and run it. Start vault with the development flag:

λ vault server -dev

During the boot sequence you will be presented with an unseal key and a root token. The example program will expect the VAULT_TOKEN environment variable to be set to the root token provided. Your output will be similar to the following:

λ vault server -dev ==> Vault server configuration:

Api Address:
Cgo: disabled
Cluster Address:
Listener 1: tcp (addr: "", cluster address: "", max_request_duration: "1m30s", max_request_size: "33554432", tls: "disabled")
Log Level: (not set)
Mlock: supported: false, enabled: false
Storage: inmem
Version: Vault v0.11.2
Version Sha: 2b1a4304374712953ff606c6a925bbe90a4e85dd

WARNING! dev mode is enabled! In this mode, Vault runs entirely in-memory
and starts unsealed with a single unseal key. The root token is already
authenticated to the CLI, so you can immediately begin using Vault.

You may need to set the following environment variable:


The unseal key and root token are displayed below in case you want to
seal/unseal the Vault or re-authenticate.

Unseal Key: mkniY94IlJngQz07gfPZlQnZnvEHMXWQ3/MiFegsfr8=
Root Token: 4uYnD1vVZZcNkbYe03t0cLkh

Development mode should NOT be used in production installations!

Once Vault is booted you will need to enable the Transit Backend. This allows us to create an encryption key inside of Vault and seamlessly encrypt and decrypt information.

λ export VAULT_ADDR=
λ vault secrets enable transit
Success! Enabled the transit secrets engine at: transit/

λ vault write -f transit/keys/how_it_works_totp
Success! Data written to: transit/keys/how_it_works_totp

λ echo "my secret" | base64

λ vault write transit/encrypt/myapp plaintext=Im15IHNlY3JldCIgDQo=
Key Value
--- -----
ciphertext vault:v1:/HeILzBTv+JbxdaYeKLVB9RVH9o/b+Lilrja88VhCuaSSlvUY+IzHp2Uλ vault write transit/encrypt/myapp plaintext=Im15IHNlY3JldCIgDQo=

λ vault write -field=plaintext transit/decrypt/myapp ciphertext=vault:v1:/HeILzBTv+JbxdaYeKLVB9RVH9o/b+Lilrja88VhCuaSSlvUY+IzHp2U | base64 -d
"my secret"

We can now encrypt and decrypt our TOTP secrets. The only thing left is to persist those secrets so that they can be referenced on login. For this example we will not be creating a complete user system, but we will setup a database and create an entry with an encrypted seed value to show what the end to end process will resemble.

To encrypt and decrypt our seed we can use a Vault library. The following example demonstrates the essential pieces:

String encryptSeed(String seed) throws VaultException {
    final Map<String, Object> entry = new HashMap<>();
    entry.put("plaintext", seed);
    final LogicalResponse response = client.logical().write("transit/encrypt/myapp", entry);
    return response.getData().get("ciphertext");

String decryptSeed(String ciphertext) throws VaultException {
    final Map<String, Object> entry = new HashMap<>();
    entry.put("ciphertext", ciphertext);
    final LogicalResponse response = client.logical().write("transit/decrypt/myapp", entry);
    return response.getData().get("plaintext");

Note the lack of error handling. In a production system you would want to handle the negative and null cases appropriately.

Finally, we take the output of the vault encryption operation and store it in our database. The sample code contains database handling logic, but it is typical boilerplate database code an not directly relevant to explaining the design of a TOTP system.


By now it should be pretty obvious that time synchronization is of the utmost importance. If the server and client differ more than the period, the token comparison will fail. The RFC describes methods for determining drift and tolerance for devices that have drifted for too many periods. This example does not address drift or resynchronization, but it is recommended that a production implementation address this issue.

Running the Example

The source code for this example is available on Github. Make sure you have read an executed all of the steps above before attempting to run the example. Additionally, you will need to have PostgreSQL installed and running. If this is your first time running the example, you will need to be sure to import the generated token using your preferred application before attempting to type in a value. You should have your MFA token generator application open and the test token selected. You can setup and execute the program by running:

createdb totp
mvn flyway:migrate
mvn compile
# For Unix users
# For Windows users
mvn -q exec:java -Dexec.mainClass=com.jemurai.howitworks.totp.Main

You will be prompted to enter your token value. After pressing return the program will echo the value you entered, the expected token value, and if the values match. This is the core logic necessary to confirm a TOTP based MFA authentication sequence. If your token values do not match, make sure to enter your token value with plenty of time to spare on the countdown. Because we have not implemented a solution that accounts for drift, the value must be entered during the same period the server generates the expected value. If this is your first time running the example you will need to import the QR code that was generated before the input prompt. If everything was done correctly you will see output similar to the following:

MFA Token:
Entered: 808973 : Generated: 808973 : Match: true

At this point you have successfully implemented server side TOTP based MFA and used a client side token generator to validate the implementation.

Security Pitfalls of TOTP

For a long time TOTP or really, just OTP based MFA was the best option. It was popularized by RSA long before smart phones were capable of generating tokens. This method is fundamentally secure but is open to human error. Well crafted phishing attacks can obtain and replay TOTP based MFA responses. Several years ago FIDO and U2F were introduced and this is now the “most secure” option available. It comes with benefits and drawbacks and like all solutions should be carefully considered before use.


Multi-Factor Authentication is an important part of the security of your information systems. Any system providing access to sensitive information should employ the use of MFA to protect that information. With credential theft via phishing on the rise, this could be one of the most important controls you establish. While this article examines an implementation of TOTP token based MFA, you should seek an established provider like Okta, OneLogin, Duo, Auth0, etc. to provide a production ready solution.

Technology and Security: AI, Cloud, IoT

Matt Konda No Comments

So … someone asked me the following question, so I figured I’d put my answer in a blog post.

In what ways are evolving technology like cloud, AI, IoT affecting the cybersecurity landscape? What kind of cybersecurity threats and risks can they bring to the enterprise?

As technology moves forward, it has huge implications on security.

We talk about AI but what that really is behind the scenes is vast amounts of data.  The security implications of the data are significant.  We are already training AI to create sentencing systems that are unfair based on the training data.  The training data reflect our biases, and then in fact, reinforce them.  AI isn’t just in the background either, it is helping to land planes, drive cars, identify faces in video, identify security events and a million other things.  Also, many great AI systems now have gaps in explainability – meaning we don’t even know how they know what they are telling us.

As a user of AI, I would be very concerned about the integrity of the training data.  Many people believe that we can subvert AI / ML algorithms by feeding them malicious data.  Of course, those same systems are still often processing data that comes from users, so as we look at software, we have the problem of separating control from data.  The scope and nature of the data sets (often photos, video, etc.) pose new challenges as well.  In many cases, security is an afterthought, with access to the data bolted on for users but not proactively designed in.

In terms of the cloud, many things in the cloud give us significant improvements in security.  It is easy to pay a little more to have an HSM, a WAF, encrypted data, a key management system, centralized log management, etc.  That does, however, force us into the major clouds (AWS, Azure, GCP) where these items are offered.  Note that the complexity of the cloud is a major concern.  We recommend that our clients use the “infrastructure as code” model, and use tools like Terraform or CloudFormation to provision systems.  That will allow them to audit, track (and often roll back) changes as needed.

Many organizations embraced cloud as a way to reduce friction, which is another way of saying bending the rules.  Without oversight, teams may be creating large environments that don’t follow proper security practices.  We routinely find services exposed to the internet that companies weren’t aware of.  We see poor password management policies, unencrypted data and lots more.  For CIO’s, a big takeaway is to actively manage the cloud.  We actually build a product related to cloud security

A great mini case study is AWS Macie.  It reads S3 buckets and categorizes data, then alerts the data owner about what types of data there is and where it moves.  Seems pretty awesome and powerful.  But if I can read the index, now I know a lot about what data you have and I know it much faster than I did before.

IoT is even more of a problem.  Often IoT applies to devices that are hard to manage and weren’t built to be updated.  So we see lots of potential vulnerabilities in the IoT landscape.  There is also an explosion in the number of devices and a they are connected in much less structured ways – often via relatively open home internet.  IoT also comes with custom operating systems, which are often unpatched and ill suited for long term security.

As a takeaway, as more information is distributed to more places, of course this brings escalating privacy concerns.  Our position on this is generally to have clear classification schemes that can be applied to data in a consistent way across these different scenarios.  And then to use AI in the cloud to find the data.  (wink wink)



Facebook Access Tokens Were An AppSec Issue That Tools Won’t Catch

Matt Konda No Comments

Yesterday I was doing training for a team of architects, project managers and developers at a client and I realized that a deeper discussion of the issue at Facebook might be instructive.

In looking at the issue in more detail, I came to understand that it makes a perfect learning example for why application security training and team communication around security is so important.

In particular I want to look at:

  • What were the issues?
  • How should they have been prevented?
  • How tools could not have helped to identify these issues.


Before I go through this though, let me stop and say this:  Facebook may have a great program.  They are likely just as vulnerable as anyone else.  This is not a critique of their process or response.  I am not aware of any of the back story or processes used (or not used) in this case.  The goal of this post is to talk about the learning points we can take away.

The Gory Details

Part 1:  Facebook describes the first problem as an error related to the View As function.  Essentially, the View As feature allows a user to see what a post will look like to another user.  In that view, a link to upload a video was inadvertently included in the preview.

A full discussion of this issue with a villain hat might have revealed that the View As page should not still have an upload link.  This doesn’t feel like a big deal, but it is a classic example of developers and product owners and stakeholders not communicating adequately about requirements or security requirements.  I would guess a pen test would gloss right over this because even the security expert might not be able to assert that the link shouldn’t be there.

Part 2:  The application then generated an access token that was used during that process.  The access token was scoped to the mobile application.   This seems again like an obvious potential issue.  Someone could have or should have asked, is the scope for this access token correct?  This probably is a security question.

Part 3:  The access token was scoped to the wrong user, the “View As” user.  This again is an issue that if I said it out loud, even a product manager could have told us that it was wrong.  So this looks like an example where just having the conversation or making security visible would have substantially helped to eliminate the security issue.

Tools and How To Find The Issue

As a point of note:  none of these items could be found by static analysis, dynamic analysis, RASP/IAST, etc.

Ways to find this issue include:

  • Code Review
  • Security Requirements
  • Security Unit Tests

From my perspective, a take away is that we should be peer reviewing any code that includes access tokens.  It also suggests that training would be helpful.


Security Update

Facebook Login Update

Validating Search Engine Indexers

Aaron Bedra No Comments


Not all bots are created equal. Some bots are good, some bots are bad, and some bots are not what they appear to be. This article will discuss an aspect of bots that attempt to exploit the sensitive nature of SEO optimization rules.


We love search engines. If it weren’t for search engines most of our sites would never be discovered. When the search engine bot comes knocking, we best let it in, or we will suffer the consequences of not existing. Let’s face it, if you aren’t on the first page of the search results, you don’t exist. Let’s break down a request made by a search engine bot: – – [24/Sep/2018:07:58:32 -0400] “GET / HTTP/1.1” 200 2947 “-” “Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +”

We can identify this request as coming from Google’s indexing bot. We do this by examining the User Agent provided during the request. Unfortunately, this header can be set to any value and cannot be trusted. While we understand this, it is common for site operators to always allow this user agent for fear of vanishing from the Internet.

We are now presented with a problem. How do we know that the actor identifying as the Google indexer actually belongs to Google? Luckily the popular search engines typically provide documentation on how to validate an actor. For example, Google documents this process at At this point you are probably wondering how this pattern applies to other search engine bots. The good news is that while the User Agents and valid domains may change, the process is still the same. It goes something like this:

  • Does the User Agent match a list of provided user agent strings?
  • If yes, perform a reverse lookup on the IP address of the actor
  • Examine the domain of the lookup and ensure it matches the provided domain(s)
  • If yes, perform a forward lookup of the host and check to see if the forward lookup matches the IP address provided

If all of these checks pass, the actor is a valid search engine bot. If any of these checks fails, it isn’t. We are now faced with the problem of validating any actor that claims to be a search engine indexer. Blindly allowing these actors introduces a technical weakness that could lead to a loss event. It is interesting to observe this trend in action. You can do so by searching your logs for actors that match a pattern and performing the verification steps manually. You will likely fine a number of actors that fail the test. We have observed that even on low traffic sites, there are requests every day that fail validation.

We all know that doing this manually doesn’t scale and that neither does manual IP blocking. As usual, this is a process that can be automated.

NGINX Bot Verifier

It’s important to keep edge processing at the edge. This is a common mistake in application security. It’s typically easiest to take all application logic and put it in the application. This makes deployment and operations easier. The problem with this practice is that it allows requests to make it to your application that never should have. This presents an opportunity for an actor to exploit a vulnerability in your application. It’s effectively lack of input validation on a meta level. In general, if the request doesn’t look right and your application shouldn’t process it, it should be rejected before your application has to try.

Applying this idea leads us to an obvious choice; do it at the webserver layer. There are options for this, but NGINX is the most widely used webserver available. It also has a nice API for creating custom modules and request handlers. Because this validation process only looks at the User Agent and IP address of the request, it requires very little work to perform the validation and keep actors from hitting our application.

I took some time recently to put this idea into an NGINX module. You can find the code at It’s an open source project and all ideas, pull requests, and issues are welcome. The module handles the validation steps described and works for the following search engines:

  • Google
  • Yahoo!
  • Bing
  • Baidu
  • Yandex

Because the validation happens inline with the request, and the validation requires a response from DNS, that validation can introduce perceived latency. In order to minimize this the module is backed by a Redis cache. All validation results are cached to prevent latency for subsequent requests. The cache expiry is handled by a timeout configured in the module directives and simply expires over a period of time since the last validation.

Why Does it Matter?

You’ve got a lot to do. The point of automating away threats like this is to reduce the noise and increase the signal of more sophisticated attacks. It produces a heightened awareness of threats and allows you to better understand what types of attacks you are seeing day to day and provides valuable information that could help you understand what attackers are after. Data produced by these types of tools also supports threat models and risk analysis to better support your overall information security program.


This is just one of the many problems you face every day in the world of web application defense. There are many ways to solve this problem, but I encourage you to do so. There are some search engines that make validation difficult. For example, DuckDuckGo uses Amazon EC2 instances that don’t resolve back to DuckDuckGo and thus cannot follow the typical validation process. There are other search engines that present different challenges. If you do decide to give this module a try, let us know. If you find it useful, have questions, or would like to see improvements, let us know. We believe we can make the world a little safer by sharing, and we are hoping that this starts a broader conversation on handling bad robots.