
Logging at scale with canonical log lines


5 mins read

22 September, 2023
Boris Tane
Founder @Baselime

I was introduced to the concept of canonical log lines (also called wide events or service logs) when I read this article from the Stripe Blog. It had an impact. I’ve changed my approach to logs since.

It’s a simple yet powerful concept: in addition to your conventional logs, you add a single log line at the end of every request that includes all the key vitals of the request, from the request method and path to the status code of the response, along with the request ID, user ID, and details of any database calls.

This enables you to efficiently identify outliers and the common characteristics of failed requests. It works so well that I’ve been using canonical log lines everywhere I can since.

Why are canonical log lines so powerful?

If you’re already doing structured logging (and you should be), you already have all the details about a request sprinkled across multiple log lines. It usually looks something like the code block below.

// Pseudo-code
async function handler(req, res, next) {
    try {
        // Log the incoming request
        console.log("Request", JSON.stringify(req));

        // Log the database call and its duration
        const { data, duration } = await database.get();
        console.log("Database", JSON.stringify({ data, duration }));

        // Log the response before sending it
        const response = { data };
        console.log("Response", JSON.stringify(response));
        res.send(response);
        next();
    } catch (error) {
        console.log("There was an error", error);
        next(error);
    }
}

With this setup, you get three log lines: the details of the request, the details of the database call, and the details of the response (or error). Once you have the unique request ID of a specific request, you can view all of its logs sequentially and figure out why a particular behaviour occurred.
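For a single request, those three lines might look roughly like this (the exact fields depend on what you log; the values here are purely illustrative):

Request  {"method": "POST", "path": "/users", "requestId": "f8ddb912-3da6-4050-82bd-6bb7a8587cd9"}
Database {"data": {"user": "5703383e-c2d8-4f72-b7d2-ab7e3d9d4e00"}, "duration": 54}
Response {"data": {"user": "5703383e-c2d8-4f72-b7d2-ab7e3d9d4e00"}}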

However, it’s much harder to answer questions whose answers are spread across multiple log lines, for example:

  • Which paths are failing the most?
  • Is this user experiencing more failures than others?
  • Is there a correlation between database latency and errors?

This is where canonical log lines shine; I’d argue they’re borderline magical. In addition to all the existing logs, add a single log line at the end of the execution that summarizes the key telemetry data about the request. These log lines are still human-readable, provide the full picture of a single request, and are easy to aggregate by your logs processor. For example:

{
    "message": "Canonical POST /users",
    "id": "f8ddb912-3da6-4050-82bd-6bb7a8587cd9",
    "durationMS": 326,
    "request": {
        "http": {
            "method": "POST",
            "path": "/users",
            "user": "5703383e-c2d8-4f72-b7d2-ab7e3d9d4e00"
        }
    },
    "response": {
        "status": "200"
    },
    "database": {
        "numCalls": 6,
        "numFailures": 1
    }
}

Putting all the important telemetry data in a single log line makes it easy for everyone to query for answers, without complex joins or manually piecing things together with request IDs, especially during production outages. This is particularly useful when you know something is wrong (a high number of support requests) but not yet quite what or where. You can quickly query this data with multiple filters and group-bys to identify the source of the issue.
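For instance, with canonical events parsed by your logs processor, “which paths are failing the most?” becomes a single group-by. A minimal sketch of the idea, assuming the events are available as an array of parsed JSON objects shaped like the example above:

// Pseudo-code: `events` stands in for canonical log lines
// already parsed by your logs processor.
function failingPaths(events) {
    const failures = {};
    for (const event of events) {
        // Count 4xx and 5xx responses as failures
        if (Number(event.response.status) >= 400) {
            const path = event.request.http.path;
            failures[path] = (failures[path] || 0) + 1;
        }
    }
    // Highest failure count first
    return Object.entries(failures).sort((a, b) => b[1] - a[1]);
}

In practice your logs processor runs this kind of aggregation for you; the point is that every field you need lives on a single event.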

It is possible to get to these answers with traditional logs, but it’s much more complex. Moreover, you’re generally under duress when you need those insights during production incidents - we all know that every second feels like an entire hour.

With canonical log lines, not only is it possible to get your answers quickly, it’s easy.

How to implement canonical log lines?

The concept of canonical log lines is so simple that it can be implemented in any stack or language. If you search the package registry of your favourite programming language, you’ll likely find a library for it.

Or you can build it yourself - a crude implementation might look like this:

// Pseudo-code
async function handler(req, res, next) {
    const canonical = {};
    try {
        canonical["request"] = req;

        // Your business logic

        canonical["response"] = response;
        next();
    } catch (error) {
        // Your error handling
        canonical["error"] = error;
        next(error);
    } finally {
        // Always emit the canonical log line, success or failure
        console.log("Canonical", JSON.stringify(canonical));
    }
}

Populate an object with the key vitals of your request throughout your code, and print it once the request is completed.
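For example, in an Express-style app you could wrap this in a middleware that attaches the canonical object to the request and emits it when the response finishes, so every handler can contribute fields along the way. A minimal sketch, not production-ready:

// Pseudo-code: Express-style middleware that always emits the
// canonical log line when the response finishes.
function canonicalLogLine() {
    return (req, res, next) => {
        const start = Date.now();
        // Handlers add fields to req.canonical as the request progresses
        req.canonical = {
            request: { method: req.method, path: req.path },
        };
        res.on("finish", () => {
            req.canonical.durationMS = Date.now() - start;
            req.canonical.response = { status: res.statusCode };
            console.log("Canonical", JSON.stringify(req.canonical));
        });
        next();
    };
}

// app.use(canonicalLogLine());

Even a sketch like this needs care: aborted connections, handlers that never send a response, and crashes before the middleware runs can all drop the canonical line.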

I recommend JSON as the format for canonical log lines. JSON lets you add as much high-cardinality, high-dimensionality metadata as you want, is consistent across programming languages, and supports nested objects. Your logs processor should be able to parse JSON and still present it in a UI that isn’t overwhelming.

However, if you decide to build your own canonical log lines, there are multiple failure modes to handle: you need to ensure the canonical line is always emitted, regardless of what happened during the request, and you’ll have to keep improving your library or middleware to avoid dropping lines in those edge cases. Instead of building it yourself, your favourite logging library probably supports canonical log lines in one form or another, or you can find one on your package registry that does. I’ve used AWS Lambda Powertools before and it worked well:

import { Logger } from "@aws-lambda-powertools/logger";
import { getUser } from "../database";

const logger = new Logger({ serviceName: "my-service" });

export async function main(input: Record<string, any>) {
    const canonicalLogger = logger.createChild();
    try {
        canonicalLogger.appendKeys({ input });

        // Print a regular log,
        // without all the canonical metadata
        logger.info("Fetching user", { id: input.id });

        const user = await getUser(input.id);
        canonicalLogger.appendKeys({ user });
        return true;
    } catch (error) {
        canonicalLogger.appendKeys({ error });
        return false;
    } finally {
        // Print the canonical log with the input,
        // the user and the error, if any
        canonicalLogger.info("Canonical");
    }
}

Better logging with canonical log lines

If you’re not doing canonical log lines yet, try them on one of your services - your team will thank you for it during the next production incident. They’re lightweight, easy to implement, and keep the flexibility of traditional logs.

Once you’re getting the full benefits of canonical log lines, start experimenting with distributed tracing. It takes the concepts behind canonical log lines and extends them for higher visibility across distributed services.
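A span is, in many ways, a canonical log line with superpowers: one record per unit of work, enriched with attributes and linked to the rest of the request across services. A minimal sketch, assuming an OpenTelemetry Node.js SDK is already configured (field names and req.userId are illustrative):

// Minimal sketch, assuming the OpenTelemetry SDK is already set up.
import { trace } from "@opentelemetry/api";

const tracer = trace.getTracer("my-service");

async function handler(req, res) {
    await tracer.startActiveSpan(`${req.method} ${req.path}`, async (span) => {
        try {
            span.setAttribute("user.id", req.userId);
            // Your business logic
            span.setAttribute("http.status_code", 200);
        } catch (error) {
            span.recordException(error);
        } finally {
            // Like the canonical log line, the span is always emitted
            span.end();
        }
    });
}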

We’re building Baselime to enable you to search through, alert on, and automatically detect errors across all your telemetry data, fast. It works extremely well with canonical log lines.
