Cardinality describes how many distinct values a dataset contains. For example, in a set of structured logs, the cardinality of a field is the number of unique values that field takes across the logs. This concept is important in observability because datasets with high cardinality typically offer more insight into the behaviour of an application.
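To make the definition concrete, here is a minimal sketch that counts the distinct values of each field in a handful of structured log records. The records, the field names and the `field_cardinality` helper are made up for illustration.

```python
from collections import defaultdict


def field_cardinality(logs):
    """Return {field: number of distinct values seen for that field}."""
    seen = defaultdict(set)
    for record in logs:
        for field, value in record.items():
            seen[field].add(value)
    return {field: len(values) for field, values in seen.items()}


# Hypothetical structured log records.
logs = [
    {"level": "info", "status": 200, "request_id": "abc-001"},
    {"level": "info", "status": 200, "request_id": "abc-002"},
    {"level": "error", "status": 500, "request_id": "xyz-003"},
    {"level": "info", "status": 404, "request_id": "abc-004"},
]

cardinality = field_cardinality(logs)
print(cardinality)
```

In this sample, `level` has a cardinality of 2 while `request_id` has one distinct value per record, which is what makes it a high cardinality field as the dataset grows.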
For example, a typical "high cardinality" field is the request ID. You can create a unique request ID for every request going through your application, so this field can have millions or even billions of possible values. If your observability solution supports high cardinality data, you can visualise all the logs for a specific request ID, or visualise the duration of all requests where the request ID starts with abc. Your options are limitless.
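The two queries described above can be sketched in plain Python over an in-memory list of records. This is only an illustration: the record shape, the `duration_ms` field and both helper functions are assumptions, and a real observability tool would expose this through its own query language.

```python
# Hypothetical log records; only "finished" entries carry a duration.
logs = [
    {"request_id": "abc-001", "message": "request started"},
    {"request_id": "abc-001", "message": "request finished", "duration_ms": 120},
    {"request_id": "abc-002", "message": "request finished", "duration_ms": 80},
    {"request_id": "xyz-003", "message": "request finished", "duration_ms": 300},
]


def logs_for_request(records, request_id):
    """All log records that belong to one specific request ID."""
    return [r for r in records if r["request_id"] == request_id]


def durations_with_prefix(records, prefix):
    """Durations of all requests whose request ID starts with the prefix."""
    return [
        r["duration_ms"]
        for r in records
        if r["request_id"].startswith(prefix) and "duration_ms" in r
    ]


one_request = logs_for_request(logs, "abc-001")
abc_durations = durations_with_prefix(logs, "abc")
print(len(one_request), abc_durations)
```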
Essentially, with high cardinality data you gain deeper insights into the behaviour of your application without having to restrict the kinds of data you store.
High cardinality data requires a new type of data store that can handle the complexity of storing and querying this data quickly and cost-efficiently. Time series databases that have worked well for monitoring struggle here, as high cardinality can lead to more expensive storage and slower queries.
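One way to see why this happens is to count how many separate series a label-based time series store would have to track. The sketch below assumes a model in which every distinct combination of label values becomes its own series. That model is an assumption for illustration, since the text does not name a specific database, and the label names and sample data are hypothetical.

```python
def count_series(samples, label_names):
    """Number of distinct label-value combinations, i.e. series to store."""
    return len({tuple(s[name] for name in label_names) for s in samples})


endpoints = ["/login", "/cart", "/checkout"]
statuses = [200, 500]

# Hypothetical samples: 1000 requests, each with its own request ID.
samples = [
    {
        "endpoint": endpoints[i % 3],
        "status": statuses[i % 2],
        "request_id": f"req-{i}",
    }
    for i in range(1000)
]

low = count_series(samples, ["endpoint", "status"])
high = count_series(samples, ["endpoint", "status", "request_id"])
print(low, high)
```

With only `endpoint` and `status` as labels, the number of series stays small and bounded. Adding `request_id` as a label makes the series count grow with the number of requests, which under this model is where the extra storage and slower queries come from.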