
The Missing Guide to AWS API Gateway Access Logs

· 31 min read
Alex DeBrie

I've spent a fair bit of time with API Gateway over the past few years. It's an awesome (if occasionally frustrating) service for building serverless web APIs using Lambda functions.

In previous blog posts, I took a close look at the internals of API Gateway and ran a performance test for different API setups. We've explored the intricacies of custom authorizers in API Gateway and how to connect API Gateway directly to other AWS services.

In this post, we're continuing the deep dive on API Gateway. Here, we'll be looking at API Gateway access logging. Access logging can save your bacon when debugging a gnarly API Gateway issue, but you need to understand some nuance before you can use it correctly. We'll dig into the details here so that you'll be logging like Paul Bunyan in no time.

This post is a doozy. If you're new to API Gateway, I'd recommend reading the whole thing to get a feel for how logs work. Otherwise, use the following Table of Contents to skip to the section you need:

Background on API Gateway Access Logs

Let's get started with the basics -- what are access logs and why are they useful?

Access logs refer to a single log line that is written out for each request that hits your API Gateway instance. They serve as a general summary of the request -- what time the request occurred, the HTTP method and path that was requested, and the response latency.

If you're familiar with the Apache web server or know all the letters in the LAMP stack, you've probably spent some time digging through access logs. As we walk through the specifics of API Gateway's access logs, you'll see some inspiration from Apache. If you've never used or even heard of Apache, have no fear -- I'm pretty close to the same.

Access logs are useful for two main reasons:

  • Debugging: If you get a spike in 500 Internal Server Error responses, you can locate an access log to point you in the right direction to start your investigation.
  • Performance analysis: You can analyze your access logs to look for performance degradations over time or to identify slow endpoints.

While performance analysis is helpful, you can often accomplish this more easily with tools like CloudWatch Metrics. I've found that API Gateway logs are much more helpful in the debugging use case.

This is particularly true given how broad API Gateway is. API Gateway has a lot of elements, which means a lot of ways it can go wrong.

You could configure your custom authorizer wrong.

You could mess up your method response template.

You could process a request incorrectly in your Lambda function.

You could process a request correctly in your Lambda function but return the wrong shape back to API Gateway.

When I'm building web APIs, I like to alert on 500 responses to end users as that's the most salient point of user pain. However, if your access logs are a mess, you're going to have a bad time diagnosing the source of the user pain. Debugging a system that spans multiple AWS services turns into a Sherlock Holmes story without the satisfying payoff at the end.

Do yourself a favor by structuring your logs correctly.

Access logs vs. execution logs

When configuring API Gateway logs, you will notice that there are two types of logs -- access logs and execution logs. In the API Gateway console, you can configure them in the following screen:

API Gateway console logging configuration

As noted above, access logs are a single log line that is logged out on each request that comes to API Gateway, and they're often used for detecting errors or performing data analysis.

Execution logs are detailed logs about API Gateway internals. They show everything that is happening within API Gateway on a particular request, including the request and response to your authorizer (if any), the request and response to your integration, whether you are using a usage plan, the method response transformation, and more.

This information can be useful when debugging specific requests, but it's also so. much. data.

Here's the execution log output for a single request I made to API Gateway:

Execution log example

You don't need to be able to read this. Just understand there are *a lot* of logs.

This single request generated 32 log lines. If you have this enabled for all requests, this can cost a pretty penny in CloudWatch charges. Unlike access logs, you don't have any control over the format of the logs.

In general, I disable API Gateway execution logs in the normal course of business. If I have a hairy API Gateway issue that I'm trying to debug, I might enable them for a brief time. Once I've figured out my issue, I'll disable them again.

For the rest of this post, we'll be focused purely on API Gateway access logs.

Basic access log configuration

API Gateway gives you a decent amount of flexibility in configuring your access logs. Specifically, you can include any of 75+ different fields in your access logs, all the way from the request time and the status code to the authorizer latency and your AWS account ID.

A bunch of those fields aren't useful, and we'll go deep on the fields you should and shouldn't be logging in the next section. However, in this section, we're going to cover some high-level points about configuring your API Gateway access logs.

Access log format

When configuring your access logs, you get to choose an output format for your access logs. In doing so, you'll be constructing a string to be formatted by API Gateway. These strings can use values from the $context object that will be formatted based on the actual values of your specific request.

The API Gateway docs show four general formats, but you can make any format you choose.

Of the four formats that API Gateway shows, you basically have three types of options:

  • Terse, but obscure: You can use the Common Log Format (CLF) or CSV format to have a brief string describing your requests.

  • Verbose, but human-readable: You can use JSON to provide an object that's easily processed downstream and is easily readable when browsing in the CloudWatch Logs console.

  • XML: Who hurt you?

Let's take a look at the first two options, starting with CLF and CSV. With these formats, your logs will look something like the following:

Common Log Format access log example

You might be able to parse some of the information there, such as the leading timestamp, the HTTP method (GET), the path that was used, and even the 200 status code.

But some of the other values are harder to understand at a glance:

Common Log Format access log example annotated

What is this UUID? And where does this 319 value come from?

These obscure formats can make it harder to visually scan data in the console or even to write search queries or CloudWatch Insights queries to find a needle in the haystack.

Fun fact: The Common Log Format comes from the Apache webserver. I told you there would be Apache influences! There's one more, later on in this post.

I tend to prefer JSON because it's human-readable. If you're browsing the logs in CloudWatch, it might look as follows:

JSON access log example

Notice how it's easy to pick up -- you know exactly what the requestId UUID value is, and you know that the 530 value refers to the latency on the response.

This simplicity comes with a downside -- cost. CloudWatch Logs charges by the GB for both ingestion and storage. The more verbose your logs, the higher this cost will be.

The CLF version of the log was only 103 bytes. In comparison, the JSON version was 250 bytes -- more than twice as much. Additionally, these log lines were pretty sparse. As we get into the fields below, we'll be adding significantly more fields to our logs.

Ultimately, the ease is worth the additional cost to me. Make sure you're accounting for your most valuable resource -- developer time -- and not pinching pennies that end up costing you dollars.
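To put rough numbers on that tradeoff, here's a back-of-the-envelope sketch using the 103-byte and 250-byte log lines from above. The traffic volume and the price per GB are assumptions for illustration only; check the current CloudWatch Logs pricing for your region, and remember this ignores storage charges.

```typescript
// Rough monthly CloudWatch Logs ingestion cost for access logs.
// pricePerGb is an assumed figure, not an official price.
function monthlyIngestionCost(
  bytesPerLine: number,
  requestsPerMonth: number,
  pricePerGb: number
): number {
  const gigabytes = (bytesPerLine * requestsPerMonth) / 1e9;
  return gigabytes * pricePerGb;
}

const assumedRequestsPerMonth = 1e8; // hypothetical: 100 million requests
const assumedPricePerGb = 0.5; // hypothetical: USD per GB ingested

const clfMonthlyCost = monthlyIngestionCost(103, assumedRequestsPerMonth, assumedPricePerGb);
const jsonMonthlyCost = monthlyIngestionCost(250, assumedRequestsPerMonth, assumedPricePerGb);

// Extra monthly spend for choosing JSON over CLF under these assumptions.
const extraJsonCost = jsonMonthlyCost - clfMonthlyCost;
```

Under these assumptions the difference is a few dollars a month, which is why I don't sweat the verbosity. Your numbers will grow as you add fields, so it's worth rerunning the math with your real log line size.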

Logging IAM role

Next, let's get into everyone's favorite topic -- permissions. Like anything in AWS, you need to make sure you have the proper IAM configuration to write your access logs correctly. And there's one quirk with API Gateway access logs permissions that has bitten me a few times.

To allow your API Gateway to write to a CloudWatch Logs log group, you need to associate an IAM role that has permissions to write to CloudWatch Logs.

The key here is that a single IAM role is configured for all API Gateway APIs in a region of your AWS account. It's a singleton resource, rather than being an IAM role for each API Gateway API that you deploy.

In the API Gateway console, click on one of your deployed APIs. At the very bottom of the left-hand side, you should see a "Settings" option. Click that, and you will see the CloudWatch log role ARN for your API.

API Gateway CloudWatch Logs Role ARN settings

Again, while this appears to be in the context of your chosen API, it actually applies to all APIs in your current region. If you're configuring this via CloudFormation, you'll set it up as the AWS::ApiGateway::Account resource.

I've been bitten by this singleton resource in the following flow:

  1. I deploy Service A, which has an API Gateway instance and configures the AWS::ApiGateway::Account with an IAM role created in Service A's stack.

  2. I deploy Service B, which also has an API Gateway instance and thus also configures the AWS::ApiGateway::Account resource with an IAM role in Service B's stack. This overrides the current setting for AWS::ApiGateway::Account.

  3. I remove Service B. This has the impact of deleting the IAM role while still leaving its value in the AWS::ApiGateway::Account configuration. However, because that role no longer exists, all of your APIs will stop writing access logs to CloudWatch.

Additionally, even if you redeploy Service A, it won't update the value in AWS::ApiGateway::Account because, from CloudFormation's view, it doesn't look like that value has changed.

This can result in you silently losing logs and not finding out until the moment you need it most -- when you want to debug an issue. 🤯

Off the top of my head, I don't know of any other singleton AWS resources, and it's frustrating because it requires coordination across services.

If you want to avoid this problem, here's how to handle it:

First, if you are using the Serverless Framework to deploy your API Gateway, you don't need to do anything. The Framework uses a custom resource that handles API Gateway logging in a way that won't break if you remove the service.

If you are using a different mechanism (SAM, CloudFormation, or CDK), you have two options:

  • Deploy a separate, standalone service that configures the IAM role and API Gateway Account resource in each region you use; or

  • Add DeletionPolicy: Retain for the IAM role that is configured for your API Gateway Account. Even if the service is removed, the IAM role will persist and be able to write logs to CloudWatch.
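As a rough illustration, here's what a standalone logging stack could look like if you combined both ideas, written as a CloudFormation template object that gets serialized to JSON. The logical IDs (ApiGatewayLoggingRole, ApiGatewayAccount) are names I made up for this sketch, and I'm assuming the AWS-managed AmazonAPIGatewayPushToCloudWatchLogs policy is acceptable for your account.

```typescript
// Sketch of a standalone "logging" stack as a CloudFormation template object.
// Logical IDs below are hypothetical; rename them to fit your conventions.
const loggingStackTemplate = {
  AWSTemplateFormatVersion: "2010-09-09",
  Resources: {
    ApiGatewayLoggingRole: {
      Type: "AWS::IAM::Role",
      // Retain keeps the role around even if this stack is deleted.
      DeletionPolicy: "Retain",
      Properties: {
        AssumeRolePolicyDocument: {
          Version: "2012-10-17",
          Statement: [
            {
              Effect: "Allow",
              Principal: { Service: "apigateway.amazonaws.com" },
              Action: "sts:AssumeRole",
            },
          ],
        },
        ManagedPolicyArns: [
          "arn:aws:iam::aws:policy/service-role/AmazonAPIGatewayPushToCloudWatchLogs",
        ],
      },
    },
    ApiGatewayAccount: {
      Type: "AWS::ApiGateway::Account",
      Properties: {
        // Points the region-wide API Gateway setting at the role above.
        CloudWatchRoleArn: { "Fn::GetAtt": ["ApiGatewayLoggingRole", "Arn"] },
      },
    },
  },
};

// Serialize to a JSON template that can be deployed once per region.
const loggingStackJson = JSON.stringify(loggingStackTemplate, null, 2);
```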

I don't love either of these solutions. The second one requires sharing knowledge across your team and strict compliance. And no matter which approach you use, if one person on your team does it incorrectly, it could prevent all logs from writing to CloudWatch.

If you really want to get fancy, a previous team of mine used option 1 combined with a linter that ensured no service stacks tried to configure their own AWS::ApiGateway::Account resource. The deploy would be blocked for a violation of this rule.
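I don't have that team's linter to share, but the core check is small. Here's a minimal sketch, assuming you already have each service's synthesized CloudFormation template parsed into an object; the function names and the shape of the check are my own illustration, not the original tool.

```typescript
// Minimal shapes for the parts of a CloudFormation template we inspect.
interface CfnResource {
  Type: string;
}

interface CfnTemplate {
  Resources?: Record<string, CfnResource>;
}

// Returns the logical IDs of any AWS::ApiGateway::Account resources in a template.
function findApiGatewayAccountResources(template: CfnTemplate): string[] {
  const resources = template.Resources ?? {};
  return Object.keys(resources).filter(
    (logicalId) => resources[logicalId]?.Type === "AWS::ApiGateway::Account"
  );
}

// Throws if a service stack tries to manage the region-wide account setting.
// Call this from CI before deploying, and skip it for the dedicated logging stack.
function assertNoAccountResource(stackName: string, template: CfnTemplate): void {
  const offenders = findApiGatewayAccountResources(template);
  if (offenders.length > 0) {
    throw new Error(
      `${stackName} must not define AWS::ApiGateway::Account (found: ${offenders.join(", ")})`
    );
  }
}
```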

Access logging fields

We've covered the basics. Now it's time to get to the meat of this post -- what fields can I log and what do they mean? More importantly, what should I log? Remember, there are over 75 different fields on the $context object that you can log!

Selfishly, I'm writing this post for the next time I need to configure access logs. There are so many fields that you can log, and it can be hard to parse through the documentation to understand what they mean or why they'd be useful. This is an opinionated look at the fields I recommend logging, with some details on why you want to log them. For some fields, I'll also mention why I don't want to log them.

The examples below will all use JSON format, as that's what I prefer. However, you can strip out the field names to just log the field values in CLF or CSV format.

Because there are so many fields, I'm going to break them up into five groups that I'll cover in turn. The five groups are:

  • General request info
  • Integration info
  • Authorizer info
  • Caller info
  • Other fields

If you're using a lot of API Gateway features, you might have a lot of fields! It's not uncommon for your log format to look like this:

Annotated access log format

Let's dig into the details.

General request info

The first set of fields are the ones that describe the overall details of your request itself. These are going to be most similar to the logs from the Apache or Nginx access files, including the timestamp of the request, the HTTP method and path, and the status code of the response.

I recommend using the following fields for general request info:

{
  "requestTime": "$context.requestTime",
  "requestId": "$context.requestId",
  "httpMethod": "$context.httpMethod",
  "path": "$context.path",
  "resourcePath": "$context.resourcePath", // Not supported by HTTP API. Use $context.routeKey instead.
  "routeKey": "$context.routeKey", // Only supported by HTTP API
  "status": $context.status, // Note: no quotation marks around the value
  "responseLatency": $context.responseLatency, // Note: no quotation marks around the value
  "xrayTraceId": "$context.xrayTraceId" // Optional -- only if using X-Ray. Not supported by HTTP API
}

A few of these are pretty obvious:

  • requestTime: The time of the request (Surprise!). Note that this is in the (slightly bizarre) Apache log format that will look something like 26/Jan/2021:21:47:35 +0000.
  • httpMethod: The method of the request (GET, POST, DELETE, etc.)
  • status: The status code returned by the response (200, 400, 404, 500, etc.)
  • responseLatency: The total time it took from when the request reached API Gateway to when the response was returned.
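If you need to work with requestTime downstream, you'll probably want a real date. Here's a small sketch that parses the Apache-style timestamp shown above into a Date. The parseRequestTime helper is my own; it assumes English three-letter month names and a numeric UTC offset like +0000.

```typescript
const MONTH_NAMES = [
  "Jan", "Feb", "Mar", "Apr", "May", "Jun",
  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
];

// Parses a timestamp such as "26/Jan/2021:21:47:35 +0000" into a Date.
function parseRequestTime(requestTime: string): Date {
  const pattern =
    /^(\d{1,2})\/([A-Za-z]{3})\/(\d{4}):(\d{2}):(\d{2}):(\d{2}) ([+-])(\d{2})(\d{2})$/;
  const match = pattern.exec(requestTime);
  if (match === null) {
    throw new Error(`Unrecognized requestTime: ${requestTime}`);
  }
  const part = (index: number): string => match[index] ?? "";

  const monthIndex = MONTH_NAMES.indexOf(part(2));
  if (monthIndex === -1) {
    throw new Error(`Unrecognized month in requestTime: ${requestTime}`);
  }

  // Offset of the logged time from UTC, in minutes (e.g. +0100 is 60).
  const offsetMinutes =
    (part(7) === "-" ? -1 : 1) * (Number(part(8)) * 60 + Number(part(9)));

  // Treat the clock fields as UTC, then subtract the offset to get true UTC.
  const utcMillis = Date.UTC(
    Number(part(3)), monthIndex, Number(part(1)),
    Number(part(4)), Number(part(5)), Number(part(6))
  );
  return new Date(utcMillis - offsetMinutes * 60000);
}
```

For example, parseRequestTime("26/Jan/2021:21:47:35 +0000").toISOString() gives "2021-01-26T21:47:35.000Z".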

Note that the values for both status and responseLatency are not quoted. Because these are numbers, we want them to show up as numbers in our JSON so that we can easily do math with them. However, we can't do the same with other status and latency fields in subsequent sections.
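Hand-writing these format strings gets error-prone once the quoting rules differ per field. One way to keep it manageable is to generate the string from a list of fields. This is only a sketch: the LogField shape and the buildAccessLogFormat helper are names I'm inventing here, not part of any AWS SDK.

```typescript
// One entry per log field. `numeric` marks values that are always numbers,
// so they can be emitted without quotation marks.
interface LogField {
  key: string;
  variable: string;
  numeric?: boolean;
}

// Builds the single-line JSON format string that API Gateway expects.
function buildAccessLogFormat(fields: LogField[]): string {
  const pairs = fields.map(({ key, variable, numeric }) =>
    numeric ? `"${key}":${variable}` : `"${key}":"${variable}"`
  );
  return `{${pairs.join(",")}}`;
}

const generalRequestFields: LogField[] = [
  { key: "requestTime", variable: "$context.requestTime" },
  { key: "requestId", variable: "$context.requestId" },
  { key: "httpMethod", variable: "$context.httpMethod" },
  { key: "path", variable: "$context.path" },
  // Only mark a field numeric if it can never be logged as "-".
  { key: "status", variable: "$context.status", numeric: true },
  { key: "responseLatency", variable: "$context.responseLatency", numeric: true },
];

const generalRequestFormat = buildAccessLogFormat(generalRequestFields);
```

The resulting string can be pasted into the console or passed to whichever deployment tool you use.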

Let's take a closer look at the others.

requestId is a unique ID given to the request by API Gateway. It must be included in your log format. Important note: this is not the same as the request ID of your Lambda function invocation (if you're using a Lambda function to process the request). This API Gateway request ID value will be available in your Lambda function or in your custom authorizers as event.requestContext.requestId. However, if you want to log the request ID of the Lambda function in your access logs, you'll need to use $context.integration.requestId (discussed below).
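To make the distinction concrete, here's a sketch of pulling both IDs out inside a Lambda function so you can include them in your own application logs. The interfaces are trimmed-down stand-ins for the real event and context objects, and correlationIds is a hypothetical helper. I'm assuming the Node.js runtime, where the invocation's own request ID is exposed as context.awsRequestId.

```typescript
// Minimal stand-ins for the Lambda proxy event and context objects.
interface ApiGatewayProxyEventLike {
  requestContext: { requestId: string };
}

interface LambdaContextLike {
  awsRequestId: string;
}

// Collects both request IDs so they can be attached to every application log line.
function correlationIds(
  event: ApiGatewayProxyEventLike,
  context: LambdaContextLike
): { apiGatewayRequestId: string; lambdaRequestId: string } {
  return {
    // Same value as $context.requestId in the access log.
    apiGatewayRequestId: event.requestContext.requestId,
    // The Lambda invocation's own ID, which is what
    // $context.integration.requestId is expected to show in the access log.
    lambdaRequestId: context.awsRequestId,
  };
}
```

Calling this at the top of your handler and logging the result gives you a way to jump between an access log line and the function's logs in either direction.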

You may have noticed there are two path properties -- $context.path and $context.resourcePath. Both are useful, and there are subtle differences.

$context.path will log the actual, specific path of the request. Thus, if you're calling api.myapp.com/users/1234, the value for $context.path will be /users/1234.

On the other hand, $context.resourcePath will include the path pattern used to handle the request. In the example above, that would be /users/{userId}.

Using the resourcePath can be very useful for identifying patterns in your API. In the querying section below, there's an example of finding failed requests by resource path to help debug troublesome endpoints.

Note: if you are using the new HTTP API, you'll need to use $context.routeKey instead of $context.resourcePath. Though different names, they serve the same purpose.
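As a sketch of why the path pattern matters, here's how you might group server errors by route once you've pulled some access log lines out of CloudWatch. It assumes the JSON format above, with status logged as a number, and falls back to routeKey for HTTP APIs. The function name and sample usage are illustrative only.

```typescript
// The subset of access log fields this sketch relies on.
interface RouteLogEntry {
  status: number;
  resourcePath?: string; // REST APIs
  routeKey?: string; // HTTP APIs
}

// Counts 5xx responses per route pattern from raw JSON log lines.
function countServerErrorsByRoute(logLines: string[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const line of logLines) {
    const entry = JSON.parse(line) as RouteLogEntry;
    if (entry.status < 500) {
      continue;
    }
    const route = entry.resourcePath ?? entry.routeKey ?? "unknown";
    counts[route] = (counts[route] ?? 0) + 1;
  }
  return counts;
}
```

If you grouped by $context.path instead, /users/1234 and /users/5678 would land in separate buckets and the pattern would be much harder to spot.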

Finally, you can add $context.xrayTraceId if you're using AWS X-Ray for monitoring your system. If you're using X-Ray, plugging the trace ID directly into the X-Ray interface can drastically cut your debugging time. As of time of writing, HTTP API does not support X-Ray and thus you cannot use this with HTTP APIs.

Integration info

The second group of access log fields are for your endpoint's integration. The integration refers to the service that processes the request and returns a response. In most cases, this will be a Lambda function, though it could also be another AWS service (via a service integration) or even an HTTP endpoint.

If you're confused about the terminology of integrations and other components of API Gateway, check out my detailed overview of API Gateway.

For the integration information, I like to log the following fields:

{
  "integrationRequestId": "$context.integration.requestId", // Most important!
  "functionResponseStatus": "$context.integration.status",
  "integrationLatency": "$context.integration.latency",
  "integrationServiceStatus": "$context.integration.integrationStatus"
}

Let's walk through each of these.

First, the $context.integration.requestId is the most helpful field to log here. This is going to be the actual request ID for your Lambda function invocation. If you want to go from a 500 response in your API Gateway logs to the actual Lambda function invocation that failed, you'll want to use this property.

Side note -- this is my biggest complaint around the default AWS monitoring tools. It can be really hard (unless you learn these tricks) to go from a general problem -- "I had ten 500 responses on my getUser endpoint!" -- into the specific debugging details you need ("Ok, now out of my 100k invocations, which ten were the bad ones ...").
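Once the integration request ID is in your access logs, finding the bad invocations becomes a simple filter. Here's a rough sketch over raw JSON log lines, assuming the field names from the format above. The failedInvocationIds helper is my own naming.

```typescript
// The subset of access log fields this sketch relies on.
interface IntegrationLogEntry {
  status: number;
  integrationRequestId: string;
}

// Returns the Lambda request IDs behind every 5xx response.
function failedInvocationIds(logLines: string[]): string[] {
  return logLines
    .map((line) => JSON.parse(line) as IntegrationLogEntry)
    // Skip requests that never reached the integration (logged as "-").
    .filter((entry) => entry.status >= 500 && entry.integrationRequestId !== "-")
    .map((entry) => entry.integrationRequestId);
}
```

Each ID that comes back can then be searched for in the function's own log group to find the failing invocation.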

Next, notice that we're logging two different status properties. The first one -- $context.integration.status -- refers to the status returned by the code in your Lambda function, if you're using a proxy integration. Thus, if your Lambda code ends with something like:

return {
  statusCode: 200,
  body: JSON.stringify({ message: "User created successfully!" }),
};

then the value for $context.integration.status will be the value in the statusCode field.

On the other hand, $context.integration.integrationStatus refers to the status code from the service itself. In the case of Lambda, this is likely to be a 200, even if you return a 500 in your response object. This is because Lambda itself was working correctly, even if your function returned something different.

You may want to omit this value entirely. It's likely to be 200 unless you've configured something incorrectly or the AWS service is having an outage.

Finally, I like to log $context.integration.latency to get a feel for the latency of my actual Lambda function. This is particularly helpful if you're also using a custom authorizer in your request as you can see where the real hotspots are in your request flow.

One key point -- you might notice that we're surrounding the status and latency fields in quotation marks this time, rather than leaving them as actual numbers like we do with the overall request status and latency. It's possible your integration won't be hit on a request to API Gateway, such as if the requested route doesn't exist or if the request is blocked by the custom authorizer. In that case, the value for these properties will be a dash (-). If you don't surround that in quotation marks, you'll have invalid JSON and won't be able to easily parse it when searching your CloudWatch Logs. This is a tad frustrating as it complicates doing math on these fields.
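If you do want to do math on these quoted fields after the fact, you can convert them yourself and treat the dash as "no value". A minimal sketch, with helper names of my own choosing:

```typescript
// Converts a quoted log value to a number, treating "-" (or anything
// non-numeric) as missing.
function numberOrNull(raw: string): number | null {
  if (raw === "-" || raw.trim() === "") {
    return null;
  }
  const value = Number(raw);
  return isNaN(value) ? null : value;
}

// Averages only the requests that actually reached the integration.
function averageLatency(rawValues: string[]): number | null {
  const values = rawValues
    .map((raw) => numberOrNull(raw))
    .filter((value): value is number => value !== null);
  if (values.length === 0) {
    return null;
  }
  return values.reduce((sum, value) => sum + value, 0) / values.length;
}
```

For instance, averageLatency(["120", "-", "80"]) ignores the dash and returns 100.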

Two final notes here:

  • First, there is a $context.integration.error field. I've found this to be pretty generic and thus not useful information above and beyond the integration status code.
  • Second, there are some shorthand versions of these properties ($context.integrationLatency instead of $context.integration.latency). They don't have them for all fields, so I prefer to use the longer, namespaced versions for consistency and so that I can easily see what is integration-related. Your preferences may differ.

Authorizer info

The third section is related to custom authorizers. If you're not using custom authorizers, you can skip this section. If you want to learn more about custom authorizers, check out my guide to custom authorizers in API Gateway.

Custom authorizers are kind of like a second integration. You're calling out to a Lambda function, and there are all kinds of ways that can go wrong. Like your integration, you want to make sure you're logging enough to point you in the right direction.

But if you're looking at the docs to see what you can log for authorizers, it can be ... confusing.

API Gateway access log authorizer fields

There are different versions of (what appear to be) similar fields, and it's hard to know which ones will work. There are three different namespaces (authorize, authorizer, and authenticate) that have similar fields.

Based on testing, I'd think about the three namespaces as follows:

  • authorize refers to the code + results from your Lambda custom authorizer itself;
  • authorizer refers to the Lambda service;
  • authenticate is a mystery. I couldn't get it to log anything while testing custom authorizers or Cognito auth. The only time I could get it to log anything was when I tried a route that did not exist in my API Gateway. In that case, it provided the same values as the overall request status code (403) and response latency (0).

Even with this mental model, there are still a lot of fields to consider. After some extensive testing, I've settled on the following authorizer log fields:

For traditional API Gateway APIs:

{
  "authorizeResultStatus": "$context.authorize.status",
  "authorizerServiceStatus": "$context.authorizer.status",
  "authorizerLatency": "$context.authorizer.latency",
  "authorizerRequestId": "$context.authorizer.requestId"
}

Let's walk through each of these:

  • $context.authorize.status: This indicates whether your custom authorizer allowed or denied the request. You'll get a 200 if it was allowed or a 403 if it was denied. However, if the request is missing the required auth header or if your custom authorizer throws an error, you won't get a value at all, just a -.
  • $context.authorizer.status: This indicates whether the custom authorizer itself responded successfully. You should get a 200 if there was a successful response (whether allow or deny), and a 500 if you throw a hard error in the authorizer.
  • $context.authorizer.latency: This shows how long the custom authorizer actually took to run. Useful for seeing whether your authorizer is a bottleneck and should be optimized or cached.
  • $context.authorizer.requestId: Like the integration request ID, this helps you identify the specific Lambda invocation for this authorizer request. It can help with debugging if you threw an error.

Additionally, like the integration section, we're putting quotation marks around the status and latency values. If your authorizer is not invoked, it will return a string value of -, which will break your JSON if it's not quoted.
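Putting the integration and authorizer status fields together, you can sketch a first-pass triage for a failed request on a traditional API. This is a heuristic built from the behavior described above, not an official decision table, and the category names are made up for this example.

```typescript
// Field names match the JSON log format used in this post.
interface RestApiLogEntry {
  status: number;
  authorizeResultStatus: string;
  authorizerServiceStatus: string;
  functionResponseStatus: string;
  integrationServiceStatus: string;
}

type FailureSource =
  | "none"
  | "authorizer-error"
  | "authorizer-denied"
  | "integration-service"
  | "function-response"
  | "api-gateway";

// Guesses where to start looking for a failed request.
function likelyFailureSource(entry: RestApiLogEntry): FailureSource {
  if (entry.status < 400) {
    return "none";
  }
  // The authorizer Lambda itself blew up.
  if (entry.authorizerServiceStatus === "500") {
    return "authorizer-error";
  }
  // The authorizer ran fine and explicitly denied the request.
  if (entry.authorizeResultStatus === "403") {
    return "authorizer-denied";
  }
  // The integration was reached, but the service call itself failed.
  if (entry.integrationServiceStatus !== "-" && entry.integrationServiceStatus !== "200") {
    return "integration-service";
  }
  // The function ran and chose to return an error status code.
  if (entry.functionResponseStatus !== "-" && Number(entry.functionResponseStatus) >= 400) {
    return "function-response";
  }
  // Nothing downstream reported a failure, e.g. a missing route, a missing
  // auth header, or a response shape API Gateway could not use.
  return "api-gateway";
}
```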

There are some additional properties you will see in the docs, but I found them not to be helpful. A brief rundown:

  • $context.authorize.error: This gives a generic error message -- The client is not authorized to perform this operation. -- whenever an explicit deny is returned from the authorizer. You can use the authorizer status code instead.
  • $context.authorizer.error: Similar to the above, this gives a generic error message -- Execution failed due to an authorizer error -- whenever the custom authorizer failed to respond properly. Like the above, you can use the authorizer service status code.
  • $context.authorize.latency: Try as I might, I could never get this to show anything higher than 0. Use the $context.authorizer.latency for an accurate latency measure here.

For HTTP APIs:

{
  "authorizeResultStatus": "$context.authorizer.status",
  "authorizerRequestId": "$context.authorizer.requestId"
}

Fortunately, the HTTP API has simplified it a bit. There is only the authorizer namespace for properties. You'll only be logging two properties:

  • $context.authorizer.status: This is the status code indicating whether your authorizer allowed (200) or denied (403) the request;
  • $context.authorizer.requestId: Like the integration request ID, this helps you identify the specific Lambda invocation for this authorizer request. It can help with debugging if you threw an error.

Caller info

The fourth category of access log fields is around caller info -- who is making this request?

Most of these fields are less helpful in common debugging cases, but they may be useful for your needs. The most common ones are:

{
  "ip": "$context.identity.sourceIp",
  "userAgent": "$context.identity.userAgent",
  "principalId": "$context.authorizer.principalId",
  "cognitoUser": "$context.identity.cognitoIdentityId",
  "user": "$context.identity.user"
}

Let's quickly review them:

  • $context.identity.sourceIp: The IP address of the client making the request.
  • $context.identity.userAgent: The User Agent (e.g. Chrome, Safari, Postman) making the request.
  • $context.authorizer.principalId: If you're using custom authorizers, this is the principalId value you return in your response. You can use this to identify the user in your application making a request.
  • $context.identity.cognitoIdentityId: If you're using a Cognito authorizer, this is the Cognito user ID that made the request.
  • $context.identity.user: If you're using IAM authentication on your endpoint, this is the IAM user that made the request.

Other fields

We're getting into the esoteric part of field exploration. There are a few fields that I've never used, but you may want depending on your situation.

The first three are all specifics about the API Gateway instance itself. These will probably only be useful to you if you aggregate multiple log groups into a single location:

  • $context.accountId: Your AWS account id
  • $context.apiId: The API Gateway ID (e.g. x9f1xamjw2)
  • $context.stage: The stage of your API Gateway. Likely dev (the default for the Serverless Framework) or Prod (the default for the SAM CLI), though perhaps something different if you're in the 19% of respondents that uses multiple stages!

Beyond that, there are some additional fields like the protocol used, the AWS WAF response, or an epoch timestamp if you are a true glutton for punishment. I won't help you here -- you'll need to check the docs yourself.

Summary

If you want the TL;DR, copy-pastable string for JSON configuration, here's what I go with.

For traditional API Gateways that are using a custom authorizer:

'{"requestTime":"$context.requestTime","requestId":"$context.requestId","httpMethod":"$context.httpMethod","path":"$context.path","resourcePath":"$context.resourcePath","status":$context.status,"responseLatency":$context.responseLatency,"xrayTraceId":"$context.xrayTraceId","integrationRequestId":"$context.integration.requestId","functionResponseStatus":"$context.integration.status","integrationLatency":"$context.integration.latency","integrationServiceStatus":"$context.integration.integrationStatus","authorizeResultStatus":"$context.authorize.status","authorizerServiceStatus":"$context.authorizer.status","authorizerLatency":"$context.authorizer.latency","authorizerRequestId":"$context.authorizer.requestId","ip":"$context.identity.sourceIp","userAgent":"$context.identity.userAgent","principalId":"$context.authorizer.principalId"}'

For traditional API Gateways that are not using a custom authorizer:

'{"requestTime":"$context.requestTime","requestId":"$context.requestId","httpMethod":"$context.httpMethod","path":"$context.path","resourcePath":"$context.resourcePath","status":$context.status,"responseLatency":$context.responseLatency,"xrayTraceId":"$context.xrayTraceId","integrationRequestId":"$context.integration.requestId","functionResponseStatus":"$context.integration.status","integrationLatency":"$context.integration.latency","integrationServiceStatus":"$context.integration.integrationStatus","ip":"$context.identity.sourceIp","userAgent":"$context.identity.userAgent"}'

For HTTP APIs with a custom authorizer:

'{"requestTime":"$context.requestTime","requestId":"$context.requestId","httpMethod":"$context.httpMethod","path":"$context.path","routeKey":"$context.routeKey","status":$context.status,"responseLatency":$context.responseLatency,"integrationRequestId":"$context.integration.requestId","functionResponseStatus":"$context.integration.status","integrationLatency":"$context.integration.latency","integrationServiceStatus":"$context.integration.integrationStatus","authorizeResultStatus":"$context.authorizer.status","authorizerRequestId":"$context.authorizer.requestId","ip":"$context.identity.sourceIp","userAgent":"$context.identity.userAgent","principalId":"$context.authorizer.principalId"}'

For HTTP APIs without a custom authorizer:

'{"requestTime":"$context.requestTime","requestId":"$context.requestId","httpMethod":"$context.httpMethod","path":"$context.path","routeKey":"$context.routeKey","status":$context.status,"responseLatency":$context.responseLatency,"integrationRequestId":"$context.integration.requestId","functionResponseStatus":"$context.integration.status","integrationLatency":"$context.integration.latency","integrationServiceStatus":"$context.integration.integrationStatus","ip":"$context.identity.sourceIp","userAgent":"$context.identity.userAgent"}'

Configuration examples

Now that we've seen which fields we want to log, let's see how to do it. Below are examples for some of the major serverless deployment tools. You may need to customize the fields that get logged, but you can use this as a starting point.

Serverless Framework

For the Serverless Framework, access logs are configured in the provider.logs property.

For traditional API Gateways, your configuration will be:

provider:
  name: aws
  ...
  logs:
    restApi:
      accessLogging: true
      format: '{"requestTime":"$context.requestTime","requestId":"$context.requestId","httpMethod":"$context.httpMethod","path":"$context.path","resourcePath":"$context.resourcePath","status":$context.status,"responseLatency":$context.responseLatency}'
      executionLogging: false # Turn off execution logs b/c they're too noisy.

If you're using an HTTP API, your configuration is:

provider:
  name: aws
  ...
  logs:
    httpApi:
      format: '{"requestTime":"$context.requestTime","requestId":"$context.requestId","httpMethod":"$context.httpMethod","path":"$context.path","routeKey":"$context.routeKey","status":$context.status,"responseLatency":$context.responseLatency}'

Note that the Serverless Framework will also handle all of the work around the IAM role so that you don't have to mess with it.

Serverless Application Model (SAM)

If you're using AWS SAM, you will need to configure access logs within your Api or HttpApi resource.

For a traditional API Gateway, your configuration would look as follows:

Resources:
  AccessLoggedApi:
    Type: AWS::Serverless::Api
    Properties:
      AccessLogSetting:
        DestinationArn: !GetAtt AccessLogGroup.Arn
        Format: '{"requestTime":"$context.requestTime","requestId":"$context.requestId","httpMethod":"$context.httpMethod","path":"$context.path","resourcePath":"$context.resourcePath","status":$context.status,"responseLatency":$context.responseLatency}'
  AccessLogGroup:
    Type: AWS::Logs::LogGroup
SAM will pass these properties through to an AWS::ApiGateway::Stage resource. Notice that you will need to create your own Log Group.

If you are using HTTP APIs, the format will be similar:

Resources:
  AccessLoggedApi:
    Type: AWS::Serverless::HttpApi
    Properties:
      AccessLogSettings:
        DestinationArn: !GetAtt AccessLogGroup.Arn
        Format: '{"requestTime":"$context.requestTime","requestId":"$context.requestId","httpMethod":"$context.httpMethod","path":"$context.path","routeKey":"$context.routeKey","status":$context.status,"responseLatency":$context.responseLatency}'
  AccessLogGroup:
    Type: AWS::Logs::LogGroup

There are two small differences here:

  1. The Type changes to AWS::Serverless::HttpApi
  2. Amazingly, the settings property is AccessLogSettings (with an "s" at the end!) as opposed to AccessLogSetting (no "s") for a traditional API Gateway.

Cloud Development Kit (CDK)

For the CDK fans out there, you can configure API Gateway access logs there too. There are even a few handy, if under-baked, helpers.

If you have your own format string, you can configure a traditional API Gateway as follows:

const logGroup = new cwlogs.LogGroup(this, "ApiGatewayAccessLogs");
new apigateway.RestApi(this, "books", {
  deployOptions: {
    accessLogDestination: new apigateway.LogGroupLogDestination(logGroup),
    accessLogFormat: apigateway.AccessLogFormat.custom(
      '{"requestTime":"$context.requestTime","requestId":"$context.requestId","httpMethod":"$context.httpMethod","path":"$context.path","resourcePath":"$context.resourcePath","status":$context.status,"responseLatency":$context.responseLatency}'
    ),
  },
});

If you love types, you can use the AccessLogField helper class to give you typing on your format:

const logGroup = new cwlogs.LogGroup(this, "ApiGatewayAccessLogs");
new apigateway.RestApi(this, "books", {
  deployOptions: {
    accessLogDestination: new apigateway.LogGroupLogDestination(logGroup),
    accessLogFormat: apigateway.AccessLogFormat.custom(
      `{"requestTime":"${AccessLogField.contextRequestTime()}","requestId":"${AccessLogField.contextRequestId()}","httpMethod": ...`
    ),
  },
});

Additionally, it even has some standard formats built in:

const logGroup = new cwlogs.LogGroup(this, "ApiGatewayAccessLogs");
new apigateway.RestApi(this, "books", {
  deployOptions: {
    accessLogDestination: new apigateway.LogGroupLogDestination(logGroup),
    accessLogFormat: apigateway.AccessLogFormat.jsonWithStandardFields(),
  },
});

That said, you're likely going to want to go with a custom format. The built-in format has some fields that aren't likely to be useful (e.g. protocol) and is missing some very useful fields (path, integrationStatus, etc.).
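Those long format strings are easy to get wrong by hand, especially around which values are quoted. Below is a minimal sketch of one way to build the string from a list of fields. The `buildAccessLogFormat` helper and the `FieldSpec` type are hypothetical names for illustration, not part of the CDK or any AWS library. The only rule it encodes is the one used throughout this post: string values are wrapped in quotes, while numeric values like status and responseLatency are left bare.

```typescript
// Hypothetical helper (not part of any AWS library): describes one log field.
type FieldSpec = { variable: string; numeric?: boolean };

// Builds a JSON access log format string. String fields are quoted;
// numeric fields are left unquoted so they are logged as JSON numbers.
function buildAccessLogFormat(fields: Record<string, FieldSpec>): string {
  const parts = Object.keys(fields).map((key) => {
    const spec = fields[key];
    const value = spec.numeric ? spec.variable : `"${spec.variable}"`;
    return `"${key}":${value}`;
  });
  return `{${parts.join(",")}}`;
}

// Same fields as the REST API examples above.
const format = buildAccessLogFormat({
  requestTime: { variable: "$context.requestTime" },
  requestId: { variable: "$context.requestId" },
  httpMethod: { variable: "$context.httpMethod" },
  path: { variable: "$context.path" },
  resourcePath: { variable: "$context.resourcePath" },
  status: { variable: "$context.status", numeric: true },
  responseLatency: { variable: "$context.responseLatency", numeric: true },
});

console.log(format);
```

You could then pass the resulting string to AccessLogFormat.custom() in place of the hand-written literal.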

Querying your access logs

We've spent a bunch of time carefully hand-crafting our access logs, but our logs aren't there to look pretty. They're meant to be useful.

In this final section, we'll see how to query our access logs to find the information we need. We'll look at common use cases for two different methods: CloudWatch Logs search and CloudWatch Logs Insights.

Finding 5XX responses in CloudWatch Logs

Let's start with the original log searching system in CloudWatch Logs. To find this, navigate to the CloudWatch Log Groups section of the AWS console. Find the Log Group for your API Gateway access logs and click on it.

CloudWatch Logs Log Groups

The following page will show all the different Log Streams for this Log Group. You often won't know which specific Log Stream to start with, so click the Search all button to search across all Log Streams.

CloudWatch Logs Search All

This will aggregate all the logs for your Log Group over a given period. If you have a busy API Gateway, this can be a lot.

CloudWatch Logs unfiltered access logs

We'll use the search bar to narrow it down. CloudWatch Logs search has a robust filtering language, but it can be overwhelming to get started. For now, we'll use the JSON pattern matching syntax.

If your logs are formatted as JSON, you can use JSON selectors to filter to the proper log lines you need. The syntax is:

{ $.<fieldName> <comparator> <value> }

For example, if I want to find all logs that returned a status code of 5XX, I would use the following filter pattern:

{ $.status >= 500 }

If I apply that to my logs, you can see it filters down to more specific logs:

CloudWatch Logs 500 status results

There are two nice things to note here:

  • Because the status value is a number, we can get all 5XX responses by specifying status >= 500 rather than stringing together multiple exact matches (if status == 500 or if status == 502 ...)
  • When investigating the first access log, we can see that there was a 500 response from the integration itself. Using the logged integration request ID, we can find the logs for our Lambda function by searching within the Lambda's Log Group.

Our structured, informative logs make it much easier to diagnose the issue in our API Gateway.
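If it helps to see the idea in code, here is a rough local sketch of what a numeric JSON filter pattern like `{ $.status >= 500 }` selects. This is not how CloudWatch implements filtering. The `matchesNumericFilter` function and the sample log lines are made up for illustration, and the sketch only handles top-level fields with numeric comparisons.

```typescript
// Made-up access log lines in the JSON style used in this post.
const sampleLines: string[] = [
  '{"requestId":"a1","httpMethod":"GET","path":"/users/1","status":200,"responseLatency":35}',
  '{"requestId":"b2","httpMethod":"GET","path":"/users/2","status":502,"responseLatency":2900}',
  '{"requestId":"c3","httpMethod":"POST","path":"/users","status":400,"responseLatency":12}',
  '{"requestId":"d4","httpMethod":"GET","path":"/users/3","status":500,"responseLatency":41}',
];

type Comparator = ">=" | ">" | "<=" | "<" | "=";

// Hypothetical local stand-in for { $.<fieldName> <comparator> <value> }.
// Assumption of this sketch: only JSON numbers are treated as comparable.
function matchesNumericFilter(
  line: string,
  field: string,
  comparator: Comparator,
  value: number
): boolean {
  let parsed: unknown;
  try {
    parsed = JSON.parse(line);
  } catch {
    return false; // Lines that are not valid JSON never match.
  }
  if (typeof parsed !== "object" || parsed === null) return false;
  const actual = (parsed as Record<string, unknown>)[field];
  if (typeof actual !== "number") return false;
  switch (comparator) {
    case ">=":
      return actual >= value;
    case ">":
      return actual > value;
    case "<=":
      return actual <= value;
    case "<":
      return actual < value;
    case "=":
      return actual === value;
    default:
      return false;
  }
}

// Equivalent in spirit to searching with { $.status >= 500 }.
const serverErrors = sampleLines.filter((line) =>
  matchesNumericFilter(line, "status", ">=", 500)
);

console.log(serverErrors);
```

A single range comparison picks up both the 500 and the 502 without listing each status code separately.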

One final note -- you can also create a Metric Filter based on any particular query in CloudWatch Logs. This allows you to write a metric to CloudWatch Metrics on the occurrence of any particular log line. You can then alert on the metric or quickly track its value over time.

This is a great way to get CloudWatch Metrics without actually writing the metrics within your application code, or even to create metrics that you couldn't handle within your application code. For example, if you're having issues with an endpoint that is occasionally slow, you could create a metric that records a data point whenever the latency is over a defined threshold. When those instances arise, you could trigger an alarm that would prompt additional investigation.

Finding aggregates with CloudWatch Logs Insights

The new kid on the block for CloudWatch Logs analysis is the CloudWatch Logs Insights tool. While traditional CloudWatch Logs search works for certain patterns, CloudWatch Logs Insights gives you additional capabilities, including support for aggregations.

To get started, navigate to the CloudWatch Logs Insights section of the AWS console. Use the log group selector dropdown to find the Log Group(s) you want to search.

CloudWatch Logs Insights log group selection

An Insights query runs through a series of stages, each separated by a pipe (|). The syntax might feel familiar if you've worked with Splunk, SumoLogic, or other log processing tools.

Imagine you want to find all non-successful responses (status code >= 400), group by the endpoint and the status code returned. You could use the following query:

filter status >= 400
| stats count(*) as exceptionCount by httpMethod, resourcePath, status
| sort exceptionCount desc

Let's walk through this line by line.

First, we filter down to records that have a status field greater than or equal to 400. This filters out our successful responses.

Tip: If you wrote JSON-formatted logs, each field in the log will be automatically parsed for you, and you can use it directly in a query. If not, you can parse your own fields using the parse command.

Second, we want to do some aggregations. We'll use the stats directive with the count() function to count the instances of our logs. We'll group by httpMethod, resourcePath, and status to get the grouping we want.

Tip: This is why you want the resourcePath in addition to path in your logs. The resourcePath provides better grouping to find endpoint-specific issues with high-cardinality path parameters stripped out.

Finally, we sort by the counted value so that our most problematic endpoints show up first.

Below is a look at the query in action:

CloudWatch Logs Insights results

From the results, you can see that my "GetUser" endpoint (GET /users/{id}) is the most problematic, as I have lots of unsuccessful responses, including some troublesome 5XX errors. This can be a nice starting place for exploration, and I can use my original query above to identify specific 5XX requests for my GetUser endpoint.

CloudWatch Logs Insights has a lot of power, and the ability to do aggregations on logs is a nice way to get a feel for problems at a general level before switching into specific requests.
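To make the three stages of that query concrete, here is a small local sketch of the same filter, group, and sort logic. It only illustrates what the query computes, not how Logs Insights runs it. The `countErrorsByEndpoint` function, the interfaces, and the sample records are all hypothetical.

```typescript
interface AccessLog {
  httpMethod: string;
  resourcePath: string;
  status: number;
}

interface ErrorGroup extends AccessLog {
  exceptionCount: number;
}

// Hypothetical local equivalent of:
//   filter status >= 400
//   | stats count(*) as exceptionCount by httpMethod, resourcePath, status
//   | sort exceptionCount desc
function countErrorsByEndpoint(logs: AccessLog[]): ErrorGroup[] {
  const groups: Record<string, ErrorGroup> = {};
  for (const log of logs) {
    if (log.status < 400) continue; // filter stage: drop successful responses
    const key = `${log.httpMethod} ${log.resourcePath} ${log.status}`;
    if (groups[key]) {
      groups[key].exceptionCount += 1; // stats stage: count per group
    } else {
      groups[key] = {
        httpMethod: log.httpMethod,
        resourcePath: log.resourcePath,
        status: log.status,
        exceptionCount: 1,
      };
    }
  }
  // sort stage: most frequent groups first
  return Object.keys(groups)
    .map((key) => groups[key])
    .sort((a, b) => b.exceptionCount - a.exceptionCount);
}

// Made-up records for illustration.
const sampleLogs: AccessLog[] = [
  { httpMethod: "GET", resourcePath: "/users/{id}", status: 404 },
  { httpMethod: "GET", resourcePath: "/users/{id}", status: 200 },
  { httpMethod: "GET", resourcePath: "/users/{id}", status: 500 },
  { httpMethod: "GET", resourcePath: "/users/{id}", status: 404 },
  { httpMethod: "POST", resourcePath: "/users", status: 400 },
  { httpMethod: "GET", resourcePath: "/users/{id}", status: 500 },
  { httpMethod: "GET", resourcePath: "/users/{id}", status: 404 },
  { httpMethod: "GET", resourcePath: "/users/{id}", status: 200 },
];

const errorGroups = countErrorsByEndpoint(sampleLogs);
console.log(errorGroups);
```

Because the grouping key uses resourcePath rather than path, requests for different user IDs land in the same group.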

Conclusion

If you're still reading this, hats off to you. You made it through quite a bear of a post.

In this post, we went from A to Z on API Gateway access logs. We covered a lot of ground.

First, we learned what access logs are, why they're useful, and how they differ from execution logs.

Second, we looked at some of the high-level configuration details around access logs, including the format and the IAM role configuration.

Third, we did a deep dive on the fields you do (and don't) want to log in your access logs.

Fourth, we saw some example configuration for popular IAC tools.

Finally, we saw how to query access logs to get the most use out of them.

I hope this was useful to you! Good luck, and happy logging.

Questions on this post? Leave them in the comments below or shoot me an email.