This overview introduces the key features of the Monitoring & Ops module for Inference Services: core capabilities that help users efficiently monitor, analyze, and optimize AI service operations.
Realtime Pod Logs
Stream logs from Replica pods associated with inference services in real time. Debug issues instantly and track service behavior across deployments.
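As a minimal sketch of what consuming such a stream can look like, the snippet below filters streamed replica-pod log lines down to errors. The line format (`timestamp LEVEL message`) and the `tail_errors` helper are illustrative assumptions, not the platform's actual API; in practice the raw stream would come from the module's UI or from a mechanism like `kubectl logs -f`.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional

@dataclass
class LogEntry:
    timestamp: str
    level: str
    message: str

# Assumed log-line format: "2024-05-01T10:00:00Z LEVEL message".
LINE_RE = re.compile(r"^(\S+)\s+(DEBUG|INFO|WARN|ERROR)\s+(.*)$")

def parse_line(line: str) -> Optional[LogEntry]:
    m = LINE_RE.match(line)
    return LogEntry(*m.groups()) if m else None

def tail_errors(stream: Iterable[str]) -> Iterator[LogEntry]:
    """Yield only ERROR entries from a live log stream."""
    for line in stream:
        entry = parse_line(line.rstrip("\n"))
        if entry and entry.level == "ERROR":
            yield entry

# Example: filter a replica pod's streamed lines for errors.
lines = [
    "2024-05-01T10:00:00Z INFO model loaded",
    "2024-05-01T10:00:01Z ERROR CUDA out of memory",
]
errors = list(tail_errors(lines))
```

Because `tail_errors` is a generator over any iterable of lines, the same code works whether the lines arrive from a file, a socket, or a live pod-log stream.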
Token Throughput
Measure token processing rates to evaluate model performance and scalability.
Request Traffic Analytics
Analyze request volume and latency, and track successful and failed queries per second (QPS), to maintain service reliability and meet SLAs.
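The metrics above can be sketched from raw request records. The `Request` fields and the `traffic_summary` helper below are illustrative assumptions, not the module's real schema; p95 latency uses the nearest-rank method.

```python
import math
from dataclasses import dataclass
from typing import List

@dataclass
class Request:
    ts: float          # arrival time, seconds
    latency_ms: float  # end-to-end latency
    status: int        # HTTP status code

def traffic_summary(requests: List[Request], window_s: float) -> dict:
    """Compute QPS, failure rate, and p95 latency over a time window."""
    n = len(requests)
    failed = sum(1 for r in requests if r.status >= 500)
    lat = sorted(r.latency_ms for r in requests)
    # Nearest-rank p95: the ceil(0.95 * n)-th smallest latency.
    p95 = lat[math.ceil(0.95 * n) - 1] if lat else 0.0
    return {
        "qps": n / window_s,
        "failed_qps": failed / window_s,
        "error_rate": failed / n if n else 0.0,
        "p95_latency_ms": p95,
    }

# Example: 10 requests over a 5-second window, one server error.
reqs = [Request(ts=0.5 * i, latency_ms=100.0 + 10 * i, status=200)
        for i in range(9)]
reqs.append(Request(ts=4.5, latency_ms=900.0, status=500))
summary = traffic_summary(reqs, window_s=5.0)
```

Tracking failed QPS separately from total QPS distinguishes a genuine traffic drop from a spike in errors, which is what SLA alerting typically keys on.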