Concepts

Open Source Components

Filebeat

Positioning: Lightweight log collector
Description: An open-source log collection component installed on container nodes, responsible for real-time monitoring of log files at specified paths. It reads log data through input modules, applies preprocessing such as multiline log aggregation and field filtering, and then either forwards the logs to Kafka or delivers them directly to storage components via output modules.
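
Filebeat itself is configured through filebeat.yml rather than code, but the input → processing → output flow described above can be emulated in a few lines. The sketch below is illustrative only: the log path, broker address, and topic name (app-logs) are assumptions, and multiline aggregation is reduced to a single "line starts a new entry" pattern.

```python
# Illustrative sketch of Filebeat's input -> processing -> output flow,
# NOT Filebeat itself (which is configured via filebeat.yml).
# Assumes a local Kafka broker and the hypothetical topic "app-logs".
import json
import re
import time

from kafka import KafkaProducer  # pip install kafka-python

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda d: json.dumps(d).encode("utf-8"),
)

NEW_ENTRY = re.compile(r"^\d{4}-\d{2}-\d{2}")  # a line that starts a new log entry

def tail_and_forward(path: str, topic: str = "app-logs") -> None:
    """Tail a log file, aggregate multiline entries, forward them to Kafka."""
    buffer: list[str] = []
    with open(path, encoding="utf-8") as f:
        f.seek(0, 2)  # start at end of file, like a live tail
        while True:
            line = f.readline()
            if not line:
                time.sleep(0.5)
                continue
            if NEW_ENTRY.match(line) and buffer:
                # Previous entry is complete: ship it as one event.
                producer.send(topic, {"message": "".join(buffer).rstrip()})
                buffer = []
            buffer.append(line)

tail_and_forward("/var/log/app/app.log")
```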

Elasticsearch

Positioning: Distributed search and analytics engine
Description: A full-text search engine based on Lucene that stores log data as JSON documents and provides near-real-time search. It supports dynamic mapping for automatic field-type detection and achieves fast keyword searches through inverted indexes, making it well suited to log search and monitoring alerts.
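
A minimal sketch of both behaviors using the official elasticsearch-py client (8.x API assumed): dynamic mapping infers field types from the first document, and the match query is served by the inverted index. The index name and cluster URL are illustrative.

```python
from elasticsearch import Elasticsearch  # pip install elasticsearch

es = Elasticsearch("http://localhost:9200")

# Dynamic mapping: field types are inferred from the first document indexed.
es.index(
    index="logstash-2023.10.01",
    document={"@timestamp": "2023-10-01T12:00:00Z",
              "level": "ERROR",
              "message": "connection refused by upstream"},
)
es.indices.refresh(index="logstash-2023.10.01")  # make the doc searchable now

# Keyword search served by the inverted index.
resp = es.search(index="logstash-2023.10.01",
                 query={"match": {"message": "refused"}})
print(resp["hits"]["total"])
```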

ClickHouse

Positioning: Columnar analytical database
Description: A high-performance columnar database designed for OLAP scenarios, using the MergeTree engine to store log data at petabyte scale. It supports high-speed aggregation queries, time-based partitioning, and TTL-based data expiry, making it well suited to log analysis and statistical reporting in batch-computation scenarios.
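
A sketch of the points above as ClickHouse DDL, issued through the clickhouse-driver package; the table schema, partition granularity, and 30-day TTL are assumptions, not prescriptions:

```python
from clickhouse_driver import Client  # pip install clickhouse-driver

client = Client(host="localhost")

client.execute("""
CREATE TABLE IF NOT EXISTS logs (
    ts      DateTime,
    level   LowCardinality(String),
    message String
)
ENGINE = MergeTree
PARTITION BY toYYYYMMDD(ts)      -- one partition per day
ORDER BY (level, ts)             -- sort key drives range scans
TTL ts + INTERVAL 30 DAY         -- expired rows are dropped automatically
""")

# Typical OLAP-style aggregation: error counts per hour.
rows = client.execute("""
SELECT toStartOfHour(ts) AS hour, count() AS errors
FROM logs
WHERE level = 'ERROR'
GROUP BY hour
ORDER BY hour
""")
```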

Kafka

Positioning: Distributed message queue
Description: As the messaging middleware of the log pipeline, it provides high-throughput log buffering: Filebeat publishes log data to Topics, and downstream consumers read it at their own pace. This absorbs traffic peaks and enables asynchronous consumption, keeping the collection side stable even when the Elasticsearch cluster hits a processing bottleneck.
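
A sketch of the producing side of that buffer, assuming a local broker and the hypothetical topic app-logs; the acks, batching, and compression settings shown are common choices, not required ones:

```python
import json

from kafka import KafkaProducer  # pip install kafka-python

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    acks="all",               # wait for all in-sync replicas: durability first
    linger_ms=50,             # small batching window to raise throughput
    compression_type="gzip",  # trade CPU for network bandwidth
)
for i in range(1000):
    event = {"seq": i, "message": f"log line {i}"}
    producer.send("app-logs", json.dumps(event).encode("utf-8"))
producer.flush()  # the broker now holds the backlog until consumers catch up
```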

Core Functionality Concepts

Log Collection Pipeline

Description: The end-to-end path from log generation to storage, comprising four stages: collection → transmission → buffering → storage. It supports two pipeline modes (the consumer leg of Buffer Mode is sketched after this list):

  • Direct Write Mode: Filebeat → Elasticsearch/ClickHouse
  • Buffer Mode: Filebeat → Kafka → Elasticsearch
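
A minimal sketch of Buffer Mode's consumer leg, assuming the hypothetical topic app-logs and index app-logs-2023.10.01; the consumer drains Kafka at whatever rate Elasticsearch can absorb, and runs until interrupted:

```python
import json

from elasticsearch import Elasticsearch, helpers  # pip install elasticsearch
from kafka import KafkaConsumer                   # pip install kafka-python

es = Elasticsearch("http://localhost:9200")
consumer = KafkaConsumer(
    "app-logs",
    bootstrap_servers="localhost:9092",
    group_id="es-writer",
    value_deserializer=lambda b: json.loads(b),
)

def actions():
    # Turn each Kafka record into a bulk-index action.
    for record in consumer:
        yield {"_index": "app-logs-2023.10.01", "_source": record.value}

# helpers.bulk batches the writes, decoupling ES throughput from ingest rate.
helpers.bulk(es, actions())
```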

Index

Description: The logical data partitioning unit in Elasticsearch, analogous to a table structure in databases. It supports time-based rolling index creation (e.g., logstash-2023.10.01) and automated hot-warm-cold tiered storage via Index Lifecycle Management (ILM).
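
A sketch of an ILM policy implementing such hot-warm-cold tiering (elasticsearch-py 8.x; the policy name, phase timings, and actions are assumptions):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.ilm.put_lifecycle(
    name="logs-policy",
    policy={
        "phases": {
            # Roll over to a fresh index daily while writes are hot.
            "hot": {"actions": {"rollover": {"max_age": "1d"}}},
            # Compact older indices down to a single shard.
            "warm": {"min_age": "3d", "actions": {"shrink": {"number_of_shards": 1}}},
            # Drop replicas on cold data to reclaim disk.
            "cold": {"min_age": "14d", "actions": {"allocate": {"number_of_replicas": 0}}},
            # Delete the index entirely after 30 days.
            "delete": {"min_age": "30d", "actions": {"delete": {}}},
        }
    },
)
```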

Shards and Replicas

Description:

  • Shard: The physical storage unit resulting from Elasticsearch's horizontal splitting of an index, supporting distributed scalability.
  • Replica: A copy of each shard, providing high availability and balancing query load. Both counts are set when the index is created, as sketched below; the shard count is fixed afterwards, while the replica count can be changed at any time.
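
A sketch of setting both counts at creation time (elasticsearch-py 8.x; index name and counts are assumptions):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")
es.indices.create(
    index="app-logs-2023.10.01",
    settings={
        "number_of_shards": 3,    # fixed after creation; plan for growth
        "number_of_replicas": 1,  # can be raised or lowered at any time
    },
)
```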

Columnar Storage

Description: The core storage mechanism of ClickHouse, in which data is compressed and stored column by column, significantly reducing I/O. It supports the following features (the materialized-view case is sketched after this list):

  • Vectorized query execution engine
  • Data partitioning and sharding
  • Materialized views for pre-aggregation
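
A sketch of the pre-aggregation case, reusing the hypothetical logs table from the ClickHouse sketch above; every insert into logs incrementally updates the hourly rollup:

```python
from clickhouse_driver import Client

client = Client(host="localhost")
client.execute("""
CREATE MATERIALIZED VIEW IF NOT EXISTS logs_hourly
ENGINE = SummingMergeTree
ORDER BY (hour, level)
AS SELECT
    toStartOfHour(ts) AS hour,
    level,
    count() AS cnt          -- rolled up incrementally on every insert
FROM logs
GROUP BY hour, level
""")
```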

Key Technical Terms

Ingest Pipeline

Description: The data preprocessing pipeline in Elasticsearch, capable of performing ETL operations such as field renaming, Grok parsing, and conditional logic before data is written.
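
A sketch of such a pipeline (elasticsearch-py 8.x; the pipeline id, Grok pattern, and field names are assumptions): Grok parses the raw line, the parsed text replaces the original message field, and a conditional drop discards DEBUG noise before indexing.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")
es.ingest.put_pipeline(
    id="logs-pipeline",
    processors=[
        # Grok parsing: split the raw line into structured fields.
        {"grok": {"field": "message",
                  "patterns": ["%{TIMESTAMP_ISO8601:ts} %{LOGLEVEL:level} %{GREEDYDATA:msg}"]}},
        # Field renaming: replace the raw line with the parsed text.
        {"remove": {"field": "message"}},
        {"rename": {"field": "msg", "target_field": "message"}},
        # Conditional logic: drop noisy DEBUG events before they are stored.
        {"drop": {"if": "ctx.level == 'DEBUG'"}},
    ],
)

# Apply the pipeline at index time.
es.index(index="app-logs-2023.10.01", pipeline="logs-pipeline",
         document={"message": "2023-10-01T12:00:00 ERROR disk full"})
```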

Consumer Group

Description: Kafka's parallel-consumption mechanism: multiple instances within the same consumer group consume messages from different partitions in parallel, with each partition assigned to exactly one group member. Ordering is therefore guaranteed within a partition, not across the whole topic.
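
A sketch of a group member; running several copies of this script with the same group_id splits the topic's partitions among them (kafka-python; topic and group names are assumptions):

```python
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "app-logs",
    bootstrap_servers="localhost:9092",
    group_id="log-indexers",   # membership key for the consumer group
    enable_auto_commit=True,   # periodically record consumed offsets
)
for record in consumer:
    # Ordering is guaranteed only within record.partition, not across them.
    print(record.partition, record.offset, record.value)
```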

TTL (Time To Live)

Description: Data retention strategy, supporting two implementations (both sketched after this list):

  • Elasticsearch: Automatically deletes expired indices through ILM policies.
  • ClickHouse: Automatically deletes table partitions via TTL expressions.
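
Minimal sketches of both mechanisms, reusing the hypothetical names from the earlier sketches; the 7-day retention is an arbitrary example:

```python
# ClickHouse: shorten the table TTL so older parts are expired on merge.
from clickhouse_driver import Client

Client(host="localhost").execute(
    "ALTER TABLE logs MODIFY TTL ts + INTERVAL 7 DAY"
)

# Elasticsearch: an ILM delete phase removes whole indices once they age out.
from elasticsearch import Elasticsearch

Elasticsearch("http://localhost:9200").ilm.put_lifecycle(
    name="logs-7d-retention",
    policy={"phases": {"delete": {"min_age": "7d",
                                  "actions": {"delete": {}}}}},
)
```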

Replication Factor

Description: A per-Topic redundancy setting in Kafka that defines how many copies of each partition are kept on different Brokers, improving data reliability. It is fixed when the topic is created, as sketched below.
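
A sketch of setting the replication factor at topic-creation time via kafka-python's admin client; the topic name and sizing are assumptions:

```python
from kafka.admin import KafkaAdminClient, NewTopic

admin = KafkaAdminClient(bootstrap_servers="localhost:9092")
admin.create_topics([
    NewTopic(name="app-logs",
             num_partitions=6,       # parallelism ceiling for consumer groups
             replication_factor=3),  # 3 copies of each partition on 3 brokers
])
```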

Data Flow Model