
📜 VXRAIL INFRASTRUCTURE Daily Health Report

Objective

Check the overall health of the VXRAIL cluster.

Prerequisites

  • Access to vCenter

Procedure & Validation

  • Connect to vSphere using the S1 account

> Check the state of the hosts

  • Choose the Datacenter > Hosts & Clusters > Hosts:
    Check that the Status value is Normal
  • Validation: All hosts must be in the « Normal » state.
    (A « Warning » state does not require an incident, unless it occurs on multiple nodes or across all nodes within the same cluster.)

> Check the state of the VMs

  • Choose the Datacenter > VMs > Virtual Machines:
    Check that the Status value is Normal
  • Validation: All VMs must be in the « Normal » state.
    (A « Warning » state does not require an incident, unless it occurs on multiple nodes or across all nodes within the same cluster.)

> Check the state of the Datastores

  • Choose the Datacenter > Datastores > Datastores:
    Check that the Status value is Normal
  • Validation: All Datastores must be in the « Normal » state.
    (A « Warning » state does not require an incident, unless it occurs on multiple Datastores or across all Datastores within the same cluster.)
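
The Warning rule shared by the three status checks above can be sketched in a few lines of Python. This is only an illustration, not part of the procedure: the function name and the (name, cluster, status) tuples are hypothetical, the statuses are assumed to be copied by hand from the vSphere views, and two points are assumptions about how to read the rule: "multiple" is counted per cluster, and any state other than Normal or Warning always calls for an incident.

```python
from collections import Counter


def needs_incident(entities):
    """Decide whether the daily check should raise an incident.

    entities: list of (name, cluster, status) tuples, one per host,
    VM or datastore (hypothetical input shape).
    """
    warnings_per_cluster = Counter()
    for _name, cluster, status in entities:
        if status == "Normal":
            continue
        if status != "Warning":
            # Assumption: anything other than Normal/Warning needs an incident.
            return True
        warnings_per_cluster[cluster] += 1
    # Assumption: a single Warning in a cluster is tolerated,
    # two or more Warnings in the same cluster are not.
    return any(count > 1 for count in warnings_per_cluster.values())
```

For example, `needs_incident([("esx01", "A", "Normal"), ("esx02", "A", "Warning")])` returns `False`, while two Warnings in cluster A return `True`.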

> Check the performance of the hosts

  • Choose the Datacenter > select the desired cluster > Monitor > vSphere DRS > CPU Utilization & Memory Utilization: verify that the value for each node in each cluster does not exceed 80%

  • Validation: Memory and CPU usage should not exceed 80%.
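
As a rough sketch of the 80% rule, the following Python helper lists every node and metric above the threshold. The function name and the input dictionary are hypothetical, and the percentages are assumed to have been read manually from the vSphere DRS utilization view.

```python
THRESHOLD_PERCENT = 80  # limit stated in the validation step


def over_threshold(usage):
    """Return (node, metric, value) for every reading above the threshold.

    usage: {node_name: {"cpu": percent, "memory": percent}}
    (hypothetical input shape).
    """
    return [
        (node, metric, value)
        for node, metrics in sorted(usage.items())
        for metric, value in sorted(metrics.items())
        if value > THRESHOLD_PERCENT  # exactly 80 does not exceed the limit
    ]
```

An empty result means the check passes; any returned entry names the node and metric to mention in the ticket.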

> Run the Skyline Health test

  • Choose the Datacenter > select the desired cluster > Monitor > vSAN > Skyline Health: verify that no critical error exists (satellite nodes have no health score)

  • Value ranges:
    • 81-100 = On Target/Healthy.
    • 61-80 = Degraded health.
    • 0-60 = Unhealthy.
  • Validation: The score must be above 80 (On Target/Healthy range).
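
The score bands above can be expressed as a small lookup. This is a minimal sketch with a hypothetical function name, assuming the score is read manually from the Skyline Health page as a whole number between 0 and 100.

```python
def classify_health_score(score):
    """Map a Skyline Health score to the band listed in this procedure."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    if score >= 81:
        return "On Target/Healthy"
    if score >= 61:
        return "Degraded health"
    return "Unhealthy"
```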

Send email and Create tickets:

Send a summary email on the state of the VXRAIL infrastructure, using the template in this folder: MCO ACT - VxRail Infrastructure Health on ddmmyyyy
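
If the email is prepared by a script, the dated string can be built as follows. This is a sketch under two assumptions that the post does not confirm: that the string is used as the email subject, and that ddmmyyyy stands for the report date as day, month and four-digit year without separators. The function name is hypothetical.

```python
from datetime import date


def report_subject(day):
    """Build the report title with the date formatted as ddmmyyyy."""
    return f"MCO ACT - VxRail Infrastructure Health on {day.strftime('%d%m%Y')}"
```

Calling `report_subject(date.today())` gives the title for the current day's report.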

A ticket must be created and assigned to Support. The subject of the ticket depends on the failed metric and the type of error:

Status of the Node

  • If there is a service failure or a connection issue with the vCenter:
    Incidents/Infrastructure/Virtualisation/VMware/ESxi _Services et Agents
  • If the failure occurs at the physical layer of the node:
    Incidents/Infrastructure/Virtualisation/VxRail/Hardware

Status of VMs

  • If the failure occurs at the ESXi layer:
    Incidents/Infrastructure/Virtualisation/VMware/ESxi _Services et Agents
  • Other cases:
    Incidents/Infrastructure/Virtualisation/VM bloquée

Status of Datastore

  • If the failure is due to saturation:
    Incidents/Infrastructure/Virtualisation/VMware/Saturation du DataStore
  • If the failure occurs at the ESXi layer:
    Incidents/Infrastructure/Virtualisation/VMware/ESxi _Services et Agents

Performance of the Hosts

Incidents/Infrastructure/Virtualisation/VMware/ESxi _Services et Agents

Status of vSAN

Incidents/Infrastructure/Virtualisation/VMware/ESxi _Services et Agents

If the vCenter is unreachable

Incidents/Infrastructure/Virtualisation/VMware/VCenter injoignable
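
The ticket routing above can be summarised as a lookup table, which may help if ticket creation is scripted. The category strings are copied verbatim from this procedure; the check and cause keys and the function name are hypothetical labels chosen for this sketch.

```python
ESXI_SERVICES = "Incidents/Infrastructure/Virtualisation/VMware/ESxi _Services et Agents"

# (failed check, cause) -> ticket category; keys are hypothetical labels.
TICKET_CATEGORIES = {
    ("node", "service_or_vcenter_connection"): ESXI_SERVICES,
    ("node", "physical_layer"): "Incidents/Infrastructure/Virtualisation/VxRail/Hardware",
    ("vm", "esxi_layer"): ESXI_SERVICES,
    ("vm", "other"): "Incidents/Infrastructure/Virtualisation/VM bloquée",
    ("datastore", "saturation"): "Incidents/Infrastructure/Virtualisation/VMware/Saturation du DataStore",
    ("datastore", "esxi_layer"): ESXI_SERVICES,
    ("host_performance", None): ESXI_SERVICES,
    ("vsan", None): ESXI_SERVICES,
    ("vcenter_unreachable", None): "Incidents/Infrastructure/Virtualisation/VMware/VCenter injoignable",
}


def ticket_category(check, cause=None):
    """Return the ticket category for a failed check; raises KeyError if unknown."""
    return TICKET_CATEGORIES[(check, cause)]
```

For instance, `ticket_category("datastore", "saturation")` returns the Saturation du DataStore category, and `ticket_category("vsan")` returns the ESXi services category.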


This post is licensed under CC BY 4.0 by the author.