Observability Glossary

Cardinality

Cardinality is a measure of how many distinct values a dataset contains. For example, in a set of structured logs, the cardinality of a field is the count of unique values that field takes across the logs. This concept is important in observability because datasets with high cardinality typically contain more insights into the behaviour of an application.
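As a minimal sketch of that definition, the snippet below counts the distinct values of each field in a handful of structured logs. The log records, the field names, and the field_cardinality helper are all hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical structured logs; the field names are illustrative only.
logs = [
    {"request_id": "abc-1", "status": 200, "region": "eu"},
    {"request_id": "abc-2", "status": 200, "region": "eu"},
    {"request_id": "def-3", "status": 500, "region": "us"},
    {"request_id": "ghi-4", "status": 200, "region": "eu"},
]


def field_cardinality(records):
    """Return the number of distinct values seen for each field."""
    values = defaultdict(set)
    for record in records:
        for field, value in record.items():
            values[field].add(value)
    return {field: len(seen) for field, seen in values.items()}


cardinality = field_cardinality(logs)
print(cardinality)
```

Here request_id is unique per log, so its cardinality equals the number of logs, while status and region only take two values each.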

For example, a typical "high cardinality" field is request ID. You can create a unique request ID for every request going through your application. This field can have millions or even billions of possible values. If your observability solution supports high cardinality data, you can visualise all the logs for a specific request ID, or visualise the duration of all requests where the request ID starts with abc. In other words, you can filter and group by any value of the field.
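The prefix query mentioned above could look roughly like this when done by hand over in-memory events. The events, the duration_ms field, and the durations_for_prefix helper are assumptions made for the example; a real observability tool would run the equivalent query in its own query language.

```python
from statistics import mean

# Hypothetical events; request IDs and durations are made up.
events = [
    {"request_id": "abc-001", "duration_ms": 120},
    {"request_id": "abc-002", "duration_ms": 80},
    {"request_id": "xyz-003", "duration_ms": 300},
]


def durations_for_prefix(records, prefix):
    """Collect the durations of all requests whose ID starts with prefix."""
    return [
        record["duration_ms"]
        for record in records
        if record["request_id"].startswith(prefix)
    ]


abc_durations = durations_for_prefix(events, "abc")
print(abc_durations, mean(abc_durations))
```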

Essentially, with high cardinality data you gain deeper insights into the behaviour of your application without sacrificing the type of data you store.

High cardinality data requires a new type of data store that can handle the complexity of storing and querying this data quickly and cost-efficiently. Time series databases that have worked well for monitoring fall short here, as high cardinality can lead to more expensive storage and slower queries.
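One way to see why this happens is a back-of-the-envelope calculation. It assumes a time series database that keeps one series per unique combination of label values, which is a common design but not true of every store. Under that assumption, the upper bound on the number of series is the product of the label cardinalities. The label names and counts below are hypothetical.

```python
from math import prod

# Hypothetical label cardinalities for a single metric.
label_cardinalities = {"endpoint": 50, "status_code": 5, "region": 10}


def max_series(cardinalities):
    """Upper bound on series count: one series per label-value combination."""
    return prod(cardinalities.values())


baseline = max_series(label_cardinalities)

# Adding one high cardinality label multiplies the bound by its cardinality.
with_user_id = max_series({**label_cardinalities, "user_id": 1_000_000})

print(baseline, with_user_id)
```

In this sketch, a single user_id label with a million values turns a bound of a few thousand series into a bound of billions.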

Explore related concepts

Wide event

A wide event is a comprehensive data structure that contains all relevant details of a particular request or transaction within a single event, rather than distributing them across multiple events or logs.

Canonical logs

Canonical logs are logs you add at the end of every request or transaction. Each of these logs contains all the important parameters about the request, such as user ID, duration, and response status code. Adding all these details in a single log line enables you to debug faster and perform more complex aggregations without the need to join multiple log lines together.
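A minimal sketch of the idea, assuming a hypothetical handle_request function and JSON as the log format: the fields are collected while the request is processed, and one log line is emitted at the end.

```python
import json
import time


def handle_request(user_id, path):
    """Process a request and emit one canonical log line at the end."""
    start = time.monotonic()
    event = {"user_id": user_id, "path": path}

    # Placeholder for the real request handling, which is out of scope here.
    status_code = 200

    event["status_code"] = status_code
    event["duration_ms"] = round((time.monotonic() - start) * 1000, 3)

    # All details about the request go into one log line.
    line = json.dumps(event, sort_keys=True)
    print(line)
    return line


line = handle_request("user-42", "/checkout")
```

Because every field lives on the same line, an aggregation such as "average duration per status code" needs no join across log lines.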

Dimensionality

Dimensionality is the number of attributes or features in a dataset. It is a measure of the complexity and size of the data, which can impact the performance of algorithms and the ability to visualize and interpret the data.
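To contrast this with cardinality, the sketch below counts the distinct attribute names across a set of events rather than the distinct values of one attribute. The events and the dimensionality helper are hypothetical.

```python
# Hypothetical events; not every event carries every attribute.
events = [
    {"request_id": "a1", "status": 200},
    {"request_id": "b2", "status": 500, "region": "eu"},
    {"request_id": "c3", "status": 200, "user_id": "u-7"},
]


def dimensionality(records):
    """Count the distinct attribute names that appear across all records."""
    return len({field for record in records for field in record})


dims = dimensionality(events)
print(dims)
```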