Badger - Overview
Purpose:
Badger is an embeddable, persistent, and fast key-value database written in Go. It is designed as a performant alternative to non-Go key-value stores such as RocksDB and is optimized for SSDs. It serves as the underlying database for Dgraph and targets workloads that need high read and write throughput over terabytes of data.
Architecture:
Badger's architecture is based on an LSM (Log-Structured Merge) tree combined with a value log. Following the WiscKey design, it separates keys from values, which reduces write amplification and improves performance on SSDs. Major components include:
- LSM Tree: Stores sorted keys together with either small values or pointers to values in the value log. It consists of multiple levels, each larger than the one above it, and each holding SSTables (Sorted String Tables).
- Value Log: Stores the actual values, appended sequentially to files. Appending avoids random writes, and keeping large values out of the LSM tree means compactions rewrite far less data. Value log garbage collection reclaims space from obsolete values.
- MemTable: An in-memory sorted structure (a skiplist) that buffers writes before they are flushed to SSTables.
- Manifest: Keeps track of SSTables and their levels in the LSM tree.
- Cache: Caches frequently accessed data blocks and index information to improve read performance.
These components interact as follows: writes are first inserted into the MemTable, and a full MemTable is flushed to disk as a level 0 SSTable. Compaction then moves data from one level to the next, merging and sorting SSTables to keep reads efficient. Values larger than a configurable threshold are stored in the value log, and the LSM tree stores only a pointer to their location there; smaller values are kept in the LSM tree alongside their keys.
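The write path described above can be illustrated with a small toy model. This is a sketch under simplifying assumptions, not Badger's code: `toyStore`, `sstable`, `valuePointer`, and the `threshold` and `memLimit` parameters are hypothetical names, everything lives in memory, and durability, compaction, and garbage collection are left out. It only shows how a size threshold decides whether a value is stored inline or appended to a value log, and how a full MemTable becomes a sorted level 0 table.

```go
package main

import (
	"fmt"
	"sort"
)

// valuePointer records where a value lives inside the toy value log.
type valuePointer struct {
	offset, length int
}

// entry is what the LSM side keeps for a key: the value itself (small
// values) or a pointer into the value log (large values).
type entry struct {
	inline []byte
	ptr    *valuePointer
}

// sstable is one flushed MemTable: an immutable, sorted run of keys.
type sstable struct {
	keys    []string // sorted at flush time
	entries map[string]entry
}

// toyStore is a hypothetical in-memory model of the write path.
type toyStore struct {
	threshold int              // values larger than this go to the value log
	memLimit  int              // number of entries that triggers a flush
	vlog      []byte           // append-only value log
	mem       map[string]entry // stands in for the MemTable
	level0    []sstable        // flushed tables, oldest first
}

func newToyStore(threshold, memLimit int) *toyStore {
	return &toyStore{threshold: threshold, memLimit: memLimit, mem: map[string]entry{}}
}

// Put appends large values to the value log and buffers the entry in memory.
func (s *toyStore) Put(key string, value []byte) {
	var e entry
	if len(value) > s.threshold {
		e.ptr = &valuePointer{offset: len(s.vlog), length: len(value)}
		s.vlog = append(s.vlog, value...)
	} else {
		e.inline = append([]byte(nil), value...)
	}
	s.mem[key] = e
	if len(s.mem) >= s.memLimit {
		s.flush()
	}
}

// flush turns the current MemTable into a sorted level 0 table.
func (s *toyStore) flush() {
	t := sstable{entries: s.mem}
	for k := range s.mem {
		t.keys = append(t.keys, k)
	}
	sort.Strings(t.keys)
	s.level0 = append(s.level0, t)
	s.mem = map[string]entry{}
}

// Get checks the MemTable first, then level 0 tables from newest to oldest,
// and follows the pointer into the value log when the value is not inline.
func (s *toyStore) Get(key string) ([]byte, bool) {
	e, ok := s.mem[key]
	for i := len(s.level0) - 1; !ok && i >= 0; i-- {
		e, ok = s.level0[i].entries[key]
	}
	if !ok {
		return nil, false
	}
	if e.ptr != nil {
		return s.vlog[e.ptr.offset : e.ptr.offset+e.ptr.length], true
	}
	return e.inline, true
}

func main() {
	s := newToyStore(8, 2)
	s.Put("a", []byte("tiny"))
	s.Put("b", []byte("a much larger value"))
	s.Put("c", []byte("x"))
	v, _ := s.Get("b")
	fmt.Printf("b=%q level0 tables=%d vlog bytes=%d\n", v, len(s.level0), len(s.vlog))
}
```

In this model only the value for "b" exceeds the threshold, so it is the only one that reaches the value log; the LSM side holds a small pointer for it, which is the property that keeps compaction cheap in a key-value-separated design.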
Key Functionalities:
- Key-Value Storage: Provides APIs for storing and retrieving data using keys and values.
- ACID Transactions: Supports concurrent ACID transactions with Serializable Snapshot Isolation (SSI) guarantees, ensuring data consistency and integrity.
- Snapshots: Allows creating consistent snapshots of the database for backup or point-in-time recovery.
- Iterators: Offers iterators for efficiently scanning key-value pairs in key order, with prefix filtering and access to multiple versions of a key.
- TTL Support: Supports setting Time-To-Live (TTL) values for keys, enabling automatic expiration and cleanup of data.
- Value Log Garbage Collection: Recovers disk space occupied by outdated values in the value log.
- Backup and Restore: Provides functionality for backing up and restoring the database.
- Streaming: Provides a stream reader and a stream writer for efficient bulk data transfer and processing.
- Encryption: Supports encryption-at-rest for data security.
- Metrics: Exposes metrics for monitoring database performance and health.
- Command-Line Tool: Includes a command-line tool for tasks such as backup, restore, and data inspection.
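Several of the functionalities listed above (snapshot reads, versioned keys, TTL expiry, and prefix iteration) rest on the same idea of keeping multiple timestamped versions per key. The following is a minimal sketch of that idea, assuming a logical clock instead of wall-clock time. It does not use Badger's API: `mvStore`, `version`, `Set`, `GetAt`, and `ScanPrefix` are hypothetical names introduced only for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// version is one committed write of a key.
type version struct {
	commitTs  uint64 // logical commit timestamp
	value     string
	expiresAt uint64 // 0 means no TTL; otherwise the logical time of expiry
}

// mvStore is a hypothetical multi-version map, not Badger's implementation.
type mvStore struct {
	ts   uint64               // logical clock, advanced by every Set
	data map[string][]version // versions per key, oldest first
}

func newMVStore() *mvStore { return &mvStore{data: map[string][]version{}} }

// Set commits a new version and returns its commit timestamp.
// ttl is measured in logical ticks; 0 disables expiry.
func (s *mvStore) Set(key, value string, ttl uint64) uint64 {
	s.ts++
	v := version{commitTs: s.ts, value: value}
	if ttl > 0 {
		v.expiresAt = s.ts + ttl
	}
	s.data[key] = append(s.data[key], v)
	return s.ts
}

// GetAt returns the newest version visible at readTs, treating an
// expired version as absent. Reading at an older readTs acts as a snapshot.
func (s *mvStore) GetAt(key string, readTs uint64) (string, bool) {
	vs := s.data[key]
	for i := len(vs) - 1; i >= 0; i-- {
		if vs[i].commitTs <= readTs {
			if vs[i].expiresAt != 0 && vs[i].expiresAt <= readTs {
				return "", false
			}
			return vs[i].value, true
		}
	}
	return "", false
}

// ScanPrefix lists, in key order, the keys with the given prefix that are
// visible at readTs.
func (s *mvStore) ScanPrefix(prefix string, readTs uint64) []string {
	var keys []string
	for k := range s.data {
		if _, ok := s.GetAt(k, readTs); ok && strings.HasPrefix(k, prefix) {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	return keys
}

func main() {
	s := newMVStore()
	snap := s.Set("user:1", "alice", 0) // commit timestamp 1
	s.Set("user:1", "alice-v2", 0)      // commit timestamp 2
	s.Set("user:2", "bob", 1)           // commit timestamp 3, expires at 4
	s.Set("order:9", "book", 0)         // commit timestamp 4

	old, _ := s.GetAt("user:1", snap)
	cur, _ := s.GetAt("user:1", s.ts)
	fmt.Println(old, cur, s.ScanPrefix("user:", 3), s.ScanPrefix("user:", s.ts))
}
```

A reader holding the timestamp `snap` keeps seeing the first value of "user:1" even after it is overwritten, which is the behavior a snapshot or read-only transaction relies on. A real store would also need conflict detection between concurrent transactions and background cleanup of expired and superseded versions, both of which this sketch omits.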