Monitoring your ClickHouse Cloud deployment

Overview

This guide describes the monitoring and observability capabilities available to enterprise teams running production deployments of ClickHouse Cloud. Enterprise customers frequently ask about out-of-the-box monitoring features, integration with existing observability stacks such as Datadog and AWS CloudWatch, and how monitoring in ClickHouse Cloud compares to monitoring a self-hosted deployment.

You can monitor a ClickHouse Cloud deployment with the following methods:

Section	Description	Wakes idle services?	Setup required
Cloud Console dashboards	Day-to-day monitoring with built-in dashboards for service health, resource utilization, and query performance	No	None
Notifications	Alerts for scaling events, errors, mutations, and billing	No	None (customizable)
Prometheus endpoint	Export metrics to Grafana, Datadog, or other Prometheus-compatible tools	No	API key + scraper config
System table queries	Deep debugging and custom analysis via direct SQL queries against `system` tables	Yes	SQL queries
Community and partner integrations	Datadog agent integration, community monitoring tools, and the Billing & Usage API	Varies	Tool-specific
Advanced dashboard reference	Detailed reference for each advanced dashboard visualization, including troubleshooting examples	No	None

Quick start

Open the ClickHouse Cloud console to the Monitoring tab. This blog captures common things to watch out for when getting started.

For most users, the Cloud Console dashboards provide everything needed to monitor service health, resource utilization, and query performance without any configuration. If you need to integrate with an external monitoring stack, start with the Prometheus-compatible metrics endpoint.
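As a rough illustration of what a scraper does with the output of a Prometheus-compatible endpoint, the sketch below parses Prometheus text exposition format into a dictionary. The sample payload and its metric names are invented for illustration and are not actual ClickHouse Cloud metric names. Fetching the payload (endpoint URL, API key) is deliberately left out, since those details depend on your setup. The parser is simplified and assumes sample lines carry no trailing timestamp.

```python
from typing import Dict

# Hypothetical payload in Prometheus text exposition format.
# Metric names and values are made up for illustration only.
SAMPLE_PAYLOAD = """\
# HELP example_queries_total Hypothetical counter of executed queries
# TYPE example_queries_total counter
example_queries_total{service="demo"} 128
example_memory_bytes{service="demo"} 2.5e+09
"""


def parse_metrics(payload: str) -> Dict[str, float]:
    """Map each sample line 'name{labels} value' to {'name{labels}': value}.

    Assumption: lines have no trailing timestamp, so the last
    whitespace-separated token is always the sample value.
    """
    samples: Dict[str, float] = {}
    for line in payload.splitlines():
        line = line.strip()
        # Skip blank lines and '# HELP' / '# TYPE' comment lines.
        if not line or line.startswith("#"):
            continue
        series, value = line.rsplit(None, 1)
        samples[series] = float(value)
    return samples


metrics = parse_metrics(SAMPLE_PAYLOAD)
for series, value in metrics.items():
    print(series, value)
```

In practice a Prometheus server, Grafana agent, or Datadog agent handles this parsing for you; the sketch only shows the shape of the data those tools consume.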

System impact considerations

The approaches above rely on one of three mechanisms: metrics managed by ClickHouse Cloud, the Prometheus endpoint, or direct queries against system tables. The last of these queries the production ClickHouse service itself, which adds query load to the system under observation and prevents ClickHouse Cloud instances from idling, which can increase costs. Additionally, because monitoring and production are coupled, a failure of the production system may also affect monitoring.

Querying system tables directly works well for deep introspection and debugging but is less appropriate for real-time production monitoring. The Cloud Console dashboards and the Prometheus endpoint both use pre-scraped metrics that do not wake idle services, making them better suited for ongoing production monitoring. When choosing a method, weigh the detailed analysis that system tables allow against the operational overhead they add.
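For the debugging case, a system table query might look like the sketch below. It only builds the SQL text and does not connect to any service. `build_slow_query_sql` is a hypothetical helper, and the duration threshold, one-hour window, and row limit are arbitrary choices for illustration. Keep in mind that running the resulting query against a service counts as query load and wakes an idle service.

```python
def build_slow_query_sql(min_duration_ms: int = 1000, limit: int = 10) -> str:
    """Build a query listing the slowest finished queries of the last hour.

    Hypothetical helper: thresholds and the time window are illustrative.
    """
    if min_duration_ms < 0 or limit <= 0:
        raise ValueError("min_duration_ms must be >= 0 and limit must be > 0")
    return (
        "SELECT event_time, query_duration_ms, query\n"
        "FROM system.query_log\n"
        "WHERE type = 'QueryFinish'\n"
        f"  AND query_duration_ms >= {int(min_duration_ms)}\n"
        "  AND event_time > now() - INTERVAL 1 HOUR\n"
        "ORDER BY query_duration_ms DESC\n"
        f"LIMIT {int(limit)}"
    )


sql = build_slow_query_sql(min_duration_ms=500, limit=5)
print(sql)
```

The printed statement can be pasted into the SQL console or sent through whichever client you already use. For continuous monitoring, the dashboards or the Prometheus endpoint remain the lower-overhead choice.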
