SigNoz - Overview

Purpose: SigNoz is an open-source alternative to popular APM (Application Performance Monitoring) and observability tools such as Datadog and New Relic. Its primary goal is to give developers a unified interface for monitoring applications, detecting performance bottlenecks, and troubleshooting issues in production environments. It does this by collecting and visualizing metrics, traces, and logs.

Architecture: The SigNoz architecture consists of several key components:

Frontend: A React-based user interface that provides visualizations and dashboards for analyzing collected data.
Query Service (Go Backend): The backend component, written in Go, handles querying, aggregation, and processing of telemetry data. It exposes APIs that the frontend uses to retrieve and manipulate observability data. The data it works on is collected through OpenTelemetry.
OpenTelemetry Collector: An agent that receives telemetry data from applications and forwards it to the backend. It can be configured to receive data in various formats (e.g., OTLP, Jaeger, Prometheus).
ClickHouse: A columnar database used as the primary store for telemetry data (metrics, traces, and logs). Its column-oriented design is optimized for analytical queries, which suits observability workloads.
ZooKeeper: Coordinates ClickHouse clusters in high-availability (HA) setups.
Alertmanager: Handles alerts and notifications. It can be configured to use either the legacy Alertmanager or SigNoz's own implementation.
Schema Migrator: Applies ClickHouse schema migrations, either synchronously or asynchronously.
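
To make the data path from application to collector concrete, here is a minimal sketch of a single span encoded as an OTLP/HTTP JSON export request, using only the Python standard library. It assumes a collector is listening for OTLP/HTTP on the conventional port 4318 at localhost; the service name, span name, and helper function names are hypothetical, and a real application would normally use an OpenTelemetry SDK rather than build payloads by hand.

```python
import json
import secrets
import time
import urllib.request

# Assumption: a collector accepts OTLP/HTTP on the conventional port 4318.
COLLECTOR_URL = "http://localhost:4318/v1/traces"


def build_trace_payload(service_name, span_name, duration_ms):
    """Build a one-span OTLP/JSON export request (hypothetical helper)."""
    end_ns = time.time_ns()
    start_ns = end_ns - duration_ms * 1_000_000
    span = {
        "traceId": secrets.token_hex(16),  # 16 random bytes, hex-encoded
        "spanId": secrets.token_hex(8),    # 8 random bytes, hex-encoded
        "name": span_name,
        "kind": 2,  # SPAN_KIND_SERVER in the OTLP trace schema
        "startTimeUnixNano": str(start_ns),
        "endTimeUnixNano": str(end_ns),
    }
    return {
        "resourceSpans": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service_name}},
            ]},
            "scopeSpans": [{"scope": {"name": "manual-sketch"}, "spans": [span]}],
        }]
    }


def send(payload, url=COLLECTOR_URL):
    """POST the payload to the collector; only works if one is running."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


# Build and inspect the payload without requiring a live collector.
payload = build_trace_payload("checkout", "GET /cart", duration_ms=42)
print(json.dumps(payload, indent=2))
```

Calling send(payload) would hand the span to the collector, which forwards it to the backend for storage in ClickHouse as described above. The call is left out of the example so that it runs without any services started.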

Key Functionalities:

Metrics Monitoring: Collects and visualizes application metrics like request rates, error rates, and latencies. Supports PromQL.
Distributed Tracing: Uses OpenTelemetry to track requests across multiple services and identify performance bottlenecks. Traces are displayed as flamegraphs and Gantt charts.
Logs Management: Provides a centralized log management solution with capabilities for filtering, searching, and analyzing logs.
Infrastructure Monitoring: Gathers infrastructure metrics from various sources (e.g., Kubernetes, EC2) to provide a holistic view of system performance.
Alerting: Allows users to define alerts based on telemetry data and receive notifications when specified thresholds are exceeded.
Managed OpenTelemetry Agent: Allows automatic OpenTelemetry instrumentation for supported frameworks and languages.
Data Correlation: Links logs, metrics, and traces so that related signals can be examined together.
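
One way to picture correlation between signals is a join on a shared trace ID. The sketch below is illustrative only and does not show how SigNoz implements it. It assumes each log record carries the trace ID of the request that emitted it, which the OpenTelemetry log data model allows; the Span and LogRecord classes, their fields, and the sample data are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Span:
    trace_id: str
    name: str
    duration_ms: float


@dataclass
class LogRecord:
    trace_id: str
    severity: str
    body: str


def logs_for_slow_traces(spans, logs, threshold_ms):
    """Map each trace that has a span slower than threshold_ms to its log bodies."""
    # Index log bodies by the trace they belong to.
    logs_by_trace = defaultdict(list)
    for record in logs:
        logs_by_trace[record.trace_id].append(record.body)
    # Collect the traces that contain at least one slow span.
    slow = {s.trace_id for s in spans if s.duration_ms > threshold_ms}
    return {tid: logs_by_trace.get(tid, []) for tid in sorted(slow)}


spans = [
    Span("a1", "GET /cart", 35.0),
    Span("b2", "GET /checkout", 910.0),
]
logs = [
    LogRecord("a1", "INFO", "cart loaded"),
    LogRecord("b2", "ERROR", "payment gateway timeout"),
    LogRecord("b2", "WARN", "retrying payment"),
]

print(logs_for_slow_traces(spans, logs, threshold_ms=500))
# {'b2': ['payment gateway timeout', 'retrying payment']}
```

Starting from a latency symptom in the trace data, the join surfaces the log lines written during the same request, which is the kind of cross-signal navigation the feature list describes.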