ROSA/CP4D Monitoring
- ROSA Monitoring
- Application Monitoring
- Infrastructure Monitoring
- Monitoring Tools
- Alerts or Notification
- Monitoring Best Practices
ROSA Monitoring
- ROSA performs centralized monitoring and maintains an alerting system across all AWS accounts. Platform audit logs are collected and reviewed by the event monitoring system and the Red Hat SRE team. Alerts are triggered and forwarded to each account's alerting channels.
Application Monitoring
- Applications deployed on ROSA clusters are monitored by several tools, including the internal Red Hat OpenShift and IBM Cloud Pak for Data monitoring components and the Instana Enterprise Observability server. Custom alerts, together with the built-in ones, capture the state of all application services and notify end users through various alerting channels.
Infrastructure Monitoring
- AWS manages the infrastructure and configuration of Red Hat OpenShift Services and notifies AWS cluster accounts about any service changes or interruptions. IBM Cloud Managed Services actively monitors the availability and performance of server instances deployed in clusters on AWS.
Monitoring Tools
- Instana APM
- OpenShift AlertManager
- Prometheus
- Amazon CloudWatch
- Synthetic Monitoring
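As an illustration of the synthetic monitoring item above, the following is a minimal sketch of an endpoint probe, not the configuration used by any of the listed tools. The names `ProbeResult`, `http_status`, and `probe` are hypothetical, and the rule that any 2xx or 3xx status counts as "available" is an assumption made for the example.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ProbeResult:
    url: str
    available: bool
    status: Optional[int]      # None when no HTTP response was received
    latency_seconds: float


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code of a GET request to url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError but still carry a status code.
        return err.code


def probe(url: str, fetch: Callable[[str], int] = http_status) -> ProbeResult:
    """Run one synthetic check and record availability and latency."""
    start = time.monotonic()
    try:
        status: Optional[int] = fetch(url)
    except OSError:
        # URLError, timeouts, and connection failures are OSError subclasses.
        status = None
    latency = time.monotonic() - start
    # Assumption: 2xx and 3xx responses mean the endpoint is available.
    available = status is not None and 200 <= status < 400
    return ProbeResult(url, available, status, latency)
```

The `fetch` parameter is injectable so the probe logic can be exercised without network access; a scheduler would call `probe` at a fixed interval and hand failed results to the alerting layer.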
Alerts or Notification
- Multiple types of alerts are configured in client environments, including alerts on:
  - application endpoint availability
  - resource constraints, such as memory, CPU, file system, and network utilization
  - stress on internal cluster resources: control plane, API server, pods, and nodes
  - EC2 status and performance checks
- Notifications are sent to the client’s preferred alerting channel and to the IBM Cloud Services PagerDuty channel.
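The resource alerts and the two notification targets described above can be sketched as a threshold check followed by a fan-out to channels. This is only an illustration: the threshold values, the channel names, and the functions `evaluate` and `route` are hypothetical and do not come from the actual Prometheus, Alertmanager, or Instana configuration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical thresholds, expressed as fractions of capacity.
THRESHOLDS: Dict[str, float] = {
    "memory": 0.90,
    "cpu": 0.85,
    "filesystem": 0.80,
    "network": 0.75,
}


@dataclass(frozen=True)
class Alert:
    resource: str
    utilization: float
    threshold: float


def evaluate(utilization: Dict[str, float],
             thresholds: Dict[str, float] = THRESHOLDS) -> List[Alert]:
    """Return an alert for every known resource at or above its threshold."""
    return [
        Alert(name, value, thresholds[name])
        for name, value in sorted(utilization.items())
        if name in thresholds and value >= thresholds[name]
    ]


def route(alerts: List[Alert], channels: List[str]) -> List[Tuple[str, Alert]]:
    """Pair every alert with every channel that should be notified."""
    return [(channel, alert) for alert in alerts for channel in channels]


# Example: memory and file system breach their limits, CPU does not.
firing = evaluate({"memory": 0.95, "cpu": 0.40, "filesystem": 0.80})
# Assumed channel names: the client's channel plus the PagerDuty channel.
deliveries = route(firing, ["client-channel", "pagerduty"])
```

Each firing alert is delivered to both channels, mirroring the dual notification path in the bullet above. In a real deployment this logic would normally live in alerting rules and receiver configuration rather than in application code.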
Monitoring Best Practices
- The IBM Cloud Managed Services team follows industry best practices for monitoring and logging, using tools built for cloud-native applications. This includes 24/7 supervision of:
  - cluster and application metrics
  - performance metrics
  - availability status
  - infrastructure state, availability, and health