Problem
In our vSphere environment, we encountered issues related to the default configuration of vSphere HA (High Availability) advanced settings. Specifically, the default setting for the unknown state monitor period did not align with our operational needs, leading to unnecessary alerts during HA resets:
– The default value of 10 for the unknown state monitor period was insufficient in certain conditions, causing alerts to trigger too frequently.
– With different vCenter versions in the environment, it became challenging to consistently apply the correct settings across clusters.
– Manually adjusting these settings and reconfiguring HA on multiple clusters was time-consuming and error-prone.
Troubleshooting
1. Understanding the Issue:
The frequent alerts were tied to a single HA advanced option, named either `das.config.fdm.policy.unknownStateMonitorPeriod` or `das.config.fdm.unknownStateMonitorPeriod` depending on the vCenter version. The default value of 10 was not appropriate for our environment.
2. Identifying the Cause:
We discovered that in vCenter 7.0 Update 1 and newer, the name of this advanced setting changed (the script below switches back to the original name from 8.0.2 onward). Because the setting was configured inconsistently across clusters, alerts fired whenever an HA reset occurred. These alerts were unnecessary: they reflected timing, not actual HA failures.
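To make the version-dependent naming concrete, here is a minimal sketch of the mapping. `Get-HAOptionKey` is a hypothetical helper, not part of the original script, and the version boundaries (7.0.1 and 8.0.2) are taken from the script in the Solution section rather than independently verified. It compares `[version]` objects, so multi-digit components are ordered numerically instead of character by character.

```powershell
# Hypothetical helper: maps a version string to the HA option key.
# Boundaries mirror the script in the Solution section (an assumption, not verified here).
function Get-HAOptionKey {
    param(
        [Parameter(Mandatory = $true)][string]$Version
    )
    # [version] gives numeric, component-wise comparison
    $v = [version]$Version
    if ($v -ge [version]'7.0.1' -and $v -lt [version]'8.0.2') {
        'das.config.fdm.unknownStateMonitorPeriod'
    } else {
        'das.config.fdm.policy.unknownStateMonitorPeriod'
    }
}

# Why the cast matters: string comparison goes character by character
[version]'7.0.10' -gt [version]'7.0.9'   # True
'7.0.10' -gt '7.0.9'                     # False
```

For example, `Get-HAOptionKey -Version '7.0.3'` returns the key without `policy`, while `'8.0.2'` and `'6.7.0'` both return the key with it.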
3. Exploring a Solution:
We needed an automated way to adjust the monitor period on all clusters and ensure that HA would re-enable with the new settings applied. Additionally, we wanted a solution that could handle multiple vCenter versions and ensure consistency.
4. Key Requirements for the Solution:
– Check the current HA advanced settings and adjust them based on the vCenter version.
– Automatically disable and re-enable HA after modifying the setting.
– Provide clear feedback on the status of the operation.
– Support both vCenter 7.x and 8.x environments.
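The first requirement, checking the current settings, can be sketched as a read-only report that changes nothing. This assumes an active PowerCLI session (`Connect-VIServer` has already been run). `Get-HAMonitorPeriodReport` is a hypothetical name, and querying both key names at once is a simplification of the version check used in the script below.

```powershell
# Hypothetical read-only audit; requires an existing PowerCLI connection.
function Get-HAMonitorPeriodReport {
    param(
        [string]$ClusterName = '*',
        [int]$DesiredValue = 30
    )
    # Both names the setting can have, depending on version
    $keys = 'das.config.fdm.policy.unknownStateMonitorPeriod',
            'das.config.fdm.unknownStateMonitorPeriod'

    foreach ($cluster in (Get-Cluster -Name $ClusterName | Where-Object { $_.HAEnabled })) {
        $settings = @(Get-AdvancedSetting -Entity $cluster -Name $keys -ErrorAction SilentlyContinue)
        if ($settings.Count -eq 0) {
            # Neither key is set, so the cluster runs on the default
            [pscustomobject]@{ Cluster = $cluster.Name; Key = $null; Value = $null; Compliant = $false }
            continue
        }
        foreach ($s in $settings) {
            [pscustomobject]@{
                Cluster   = $cluster.Name
                Key       = $s.Name
                Value     = $s.Value
                # Compare as strings, since the stored value may be a string or a number
                Compliant = ("$($s.Value)" -eq "$DesiredValue")
            }
        }
    }
}
```

Running `Get-HAMonitorPeriodReport -ClusterName 'Prod*' | Format-Table` before and after the change gives a quick view of which clusters still need attention.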
Solution
<#
.SYNOPSIS
    FixTimerHA: checks the vSphere HA advanced settings and adjusts them.
.DESCRIPTION
    This script checks the HA settings and adjusts them based on the following article:
    https://knowledge.broadcom.com/external/article?legacyId=2017778
.NOTES
    Script:  FixTimerHA
    Version: 1.0 (Tested)
    Date:    October 9, 2024
    Author:  Kabir Ali - info@whatkabirwrites.nl
    Version history:
    0.1 - Oct 9 - Initial version
    1.0 - Oct 9 - Tested version
.EXAMPLE
    .\FixTimerHA.ps1 -vCenter "Prod-vCenter02.local.domain" -vCenterUser "administrator@vsphere.local" -vCenterPass "VMware1!" -ClusterName "Prod01"
#>
Param (
    [Parameter(Mandatory = $true)][string]$vCenter,
    [Parameter(Mandatory = $true)][string]$vCenterUser,
    [Parameter(Mandatory = $true)][string]$vCenterPass,
    [Parameter(Mandatory = $false)][string]$ClusterName = "*"
)
# Try to connect to vCenter
Try {
    # Attempt to connect
    $connection = Connect-VIServer -Server $vCenter -User $vCenterUser -Password $vCenterPass -ErrorAction Stop
    # Check if the connection was successful
    if ($connection) {
        Write-Host "Successfully connected to vCenter: $vCenter" -ForegroundColor Green
    }
} Catch {
    Write-Warning "Failed to connect to vCenter. Error: $_"
    Exit
}
# Define the new value for the monitor period
$newValue = 30
# Get the clusters matching the requested name (default: all)
$clusters = Get-Cluster -Name $ClusterName
foreach ($cluster in ($clusters | Where-Object { $_.HAEnabled -eq $true })) {
    # Use the version of the first host in the cluster as a proxy for the vCenter version
    $firstHost = $cluster | Get-VMHost | Select-Object -First 1
    # Cast to [version] so the comparisons below are numeric rather than string-based
    $vCenterVersion = [version]$firstHost.Version
    # Set the correct option key for this version
    if ($vCenterVersion -ge [version]"8.0.2") {
        $optionKey = "das.config.fdm.policy.unknownStateMonitorPeriod"
    } elseif ($vCenterVersion -ge [version]"7.0.1" -and $vCenterVersion -lt [version]"8.0.2") {
        $optionKey = "das.config.fdm.unknownStateMonitorPeriod"
    } else {
        $optionKey = "das.config.fdm.policy.unknownStateMonitorPeriod"
    }
    # Show progress
    Write-Host "Checking advanced settings of cluster: $($cluster.Name)"
    # Get the advanced HA option
    $advancedSetting = Get-AdvancedSetting -Entity $cluster -Name $optionKey -ErrorAction SilentlyContinue
    # Check if the value is already correct
    if ($advancedSetting.Value -eq $newValue) {
        Write-Host "New value matches the current value. No update needed."
    } else {
        # Set new value
        Write-Host "Updating the advanced settings to the new value." -ForegroundColor Red
        New-AdvancedSetting -Entity $cluster -Type ClusterHA -Name $optionKey -Value $newValue -Force -Confirm:$false | Out-Null
        # Disable and re-enable HA on the cluster
        Write-Host "Disabling HA." -ForegroundColor Green
        $cluster | Set-Cluster -HAEnabled:$false -Confirm:$false | Out-Null
        # Track disable task progress
        $DisableTask = Get-Task | Where-Object { $_.Name -eq "Unconfiguring vSphere HA" }
        while ($DisableTask -and $DisableTask.PercentComplete -ne 100) {
            Write-Host "Task running to disable HA."
            Start-Sleep -Seconds 10
            $DisableTask = Get-Task | Where-Object { $_.Name -eq "Unconfiguring vSphere HA" }
        }
        # Enable HA
        Write-Host "Enabling HA." -ForegroundColor Green
        $cluster | Set-Cluster -HAEnabled:$true -Confirm:$false | Out-Null
        # Track enable task progress
        $EnableTask = Get-Task | Where-Object { $_.Name -eq "Configuring vSphere HA" }
        while ($EnableTask -and $EnableTask.PercentComplete -ne 100) {
            Write-Host "Task running to enable HA."
            Start-Sleep -Seconds 10
            $EnableTask = Get-Task | Where-Object { $_.Name -eq "Configuring vSphere HA" }
        }
        # Validate change
        $advancedSetting = Get-AdvancedSetting -Entity $cluster -Name $optionKey -ErrorAction SilentlyContinue
        # Check if the value is now correct
        if ($advancedSetting.Value -eq $newValue) {
            Write-Host "Applied advanced settings successfully." -ForegroundColor Green
        } else {
            Write-Host "Applying advanced settings failed. Please manually update the advanced settings!" -ForegroundColor Red
            Start-Sleep -Seconds 3
        }
    }
}
# Disconnect only the session this script opened, not every connected server
Disconnect-VIServer -Server $connection -Confirm:$false
Write-Host "Successfully disconnected from vCenter: $vCenter" -ForegroundColor Green
The output will be similar to this:
Successfully connected to vCenter: Prod-vCenter02.local.domain
Checking advanced settings of cluster: Prod-AMS-Gold01
Updating the advanced settings to the new value.
Disabling HA.
Task running to disable HA.
Enabling HA.
Task running to enable HA.
Task running to enable HA.
Task running to enable HA.
Task running to enable HA.
Applied advanced settings successfully.
Successfully disconnected from vCenter: Prod-vCenter02.local.domain
This script automates the process of updating the HA settings across clusters based on the vCenter version. It adjusts the `unknownStateMonitorPeriod`, disables and re-enables HA, and verifies that the change is successful.
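The polling loops in the script wait for as long as the HA task takes. If a task never reaches 100%, they would wait indefinitely. One possible refinement, shown here only as a sketch, is a generic wait helper with a timeout. `Wait-ForCondition` and its parameters are hypothetical names, not part of the original script or of PowerCLI.

```powershell
# Hypothetical helper: poll a condition until it is true or a timeout expires.
function Wait-ForCondition {
    param(
        [Parameter(Mandatory = $true)][scriptblock]$Condition,
        [int]$TimeoutSeconds = 600,
        [int]$IntervalSeconds = 10
    )
    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    while (-not (& $Condition)) {
        # Give up once the deadline has passed
        if ((Get-Date) -ge $deadline) { return $false }
        Start-Sleep -Seconds $IntervalSeconds
    }
    return $true
}

# Possible use inside the script (requires a PowerCLI session, so left commented out):
# $finished = Wait-ForCondition -Condition {
#     -not (Get-Task | Where-Object { $_.Name -eq 'Configuring vSphere HA' -and $_.PercentComplete -ne 100 })
# }
# if (-not $finished) { Write-Warning "Timed out waiting for HA to be configured." }
```

The helper returns `$true` when the condition is met and `$false` on timeout, so the caller decides whether to continue with the next cluster or stop.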