vRealize Orchestrator Endpoints Health Check

The ANZ region is one of the most highly virtualized regions globally. A number of our customers run bleeding-edge technologies, and many of them run exceptionally mature orchestrated solutions that rely on a complex ecosystem of native and third-party REST hosts and plugins.

“With great power comes great responsibility”

The success of these solutions depends on the availability of the REST hosts and plugins, and it becomes the responsibility of the cloud admins to ensure they are always available. In this post, I will try to solve one of these business problems.

Use Case

I was working with a customer running a complex VM provisioning process using vRealize Automation and Orchestrator. It involved 12+ REST API solutions such as Jenkins, Artifactory, CMDB, Vaults and NSX, plus a number of plugins such as IPAM. Provisioning success was directly dependent on the availability and stability of these endpoints (REST hosts and plugins). The customer wanted an automated report that could be sent to the cloud operations team before the start of business, notifying them of any broken endpoints.

Solution

After giving some thought to the possible solutions, I decided to write a vRO workflow. Since vRO is closest to these endpoints, and the workflow can be scheduled and integrated with vRLI, vROPS, etc., it wasn’t a hard decision to make.

I wanted the report in JSON format, as it gives the customer the flexibility to use it with any third-party data tools as well. The monitoring solution uses a module for each endpoint, added to the vRO workflow. These modules are repeatable units of code (vRO Actions) that can be extended to any number of endpoints. All the REST hosts are stored in configuration elements for easy access. The customer was using more than one REST host for each system, so each health check action returns an array of objects, e.g.:

[
  { Prod Endpoint healthCheck JSON},
  { Dev Endpoint healthCheck JSON}
]

The heart of this solution is making a GET call to the given endpoint and making a health decision based on the REST response.
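The GET-and-decide step can be sketched roughly as follows. This is a minimal illustration in Python, not the vRO action code used for the customer (vRO actions are written in JavaScript). The function names `http_get`, `check_endpoint` and `check_system` are hypothetical, and the rule "healthy means HTTP 200" is an assumption taken from the sample report rather than a confirmed part of the original logic.

```python
import urllib.error
import urllib.request


def http_get(url, timeout=10):
    """Return (status_code, body_text) for a GET request.

    Network-level failures (DNS, refused connection, timeout) surface as OSError.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # Non-2xx responses still carry a status code and body worth reporting.
        return err.code, err.read().decode("utf-8", errors="replace")


def check_endpoint(report_type, endpoint, fetch=http_get, expected_status=200):
    """Build one health report entry for a single REST host."""
    report = {
        "reportType": report_type,
        "endpoint": endpoint,
        "connectionType": "REST",
    }
    try:
        status, body = fetch(endpoint)
    except OSError as err:
        # The host could not be reached at all.
        report["isHealthy"] = False
        report["errors"] = "Request failed: %s" % err
        return report

    report["restStatus"] = status
    report["isHealthy"] = status == expected_status  # assumed health rule
    if report["isHealthy"]:
        report["restResponse"] = body
    else:
        report["errors"] = "HTTP status code: %d, expected %d. Error received %s" % (
            status, expected_status, body)
    return report


def check_system(hosts, fetch=http_get):
    """Check every (report_type, url) pair of one system; returns a list of reports."""
    return [check_endpoint(name, url, fetch) for name, url in hosts]
```

Passing `fetch` as a parameter keeps the decision logic separate from the transport, so the same check can be exercised without touching a live endpoint, for example `check_system([("NSX Prod", "https://nsxprod.fluffyclouds.com")], fetch=my_stub)` where `my_stub` is any function returning a `(status, body)` tuple.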

The high-level tasks in each module are listed in the flowchart below, which shows an example Artifactory health report with two REST hosts.

Workflow logical flow

The above process produces an array of JSON reports. We repeated this process for all the REST hosts and plugins added to vRO.

Once these individual health reports are generated, we merge them into one large reporting JSON.
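As a rough illustration of the merge step, the sketch below wraps the per-module arrays under a single `healthReports` key, which is the shape of the finished report in this post. It is written in Python for illustration only. The function name `merge_reports` and the trimmed sample entries are hypothetical.

```python
import json


def merge_reports(module_reports):
    """Wrap the per-module report arrays in one top-level document."""
    return {"healthReports": [list(reports) for reports in module_reports]}


# Trimmed, illustrative module outputs: one array per system.
ipam = [{"reportType": "IPAM Plugin", "connectionType": "Plugin", "isHealthy": True}]
nsx = [
    {"reportType": "NSX Prod", "connectionType": "REST", "isHealthy": True},
    {"reportType": "NSX Dev", "connectionType": "REST", "isHealthy": True},
]

merged = merge_reports([ipam, nsx])
print(json.dumps(merged, indent=2))
```

Keeping one inner array per system preserves the grouping of Prod and Dev hosts, at the cost of one extra level of nesting for whoever consumes the report.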

After the merge, the finished report looks like this:

{
  "healthReports": [
    [
      {
        "reportType": "IPAM Plugin",
        "connectionType": "Plugin",
        "endpoint": "ipam.fluffyclouds.com",
        "restStatus": "",
        "restResponse": "82 :EA defintion names available for: ipam.fluffyclouds.com",
        "isHealthy": true
      }
    ],
    [
      {
        "reportType": "NSX Prod",
        "endpoint": "https://nsxprod.fluffyclouds.com",
        "restStatus": 200,
        "restResponse": "{\"vcConfigStatus\":{\"connected\":true,\"lastInventorySyncTime\":1591271616410}}",
        "connectionType": "REST",
        "isHealthy": true
      },
      {
        "reportType": "NSX Dev",
        "endpoint": "https://nsxdev.fluffyclouds.com",
        "restStatus": 200,
        "restResponse": "{\"vcConfigStatus\":{\"connected\":true,\"lastInventorySyncTime\":1591271616410}}",
        "connectionType": "REST",
        "isHealthy": true
      }
    ],
    [
      {
        "reportType": "Artifactory",
        "endpoint": "https://artifactory.fluffyclouds.com/artifactory",
        "connectionType": "Rest",
        "isHealthy": true,
        "restStatus": 200,
        "restResponse": "{\n  \"repo\" : \\fluffyclouds \\ documents}"
      }
    ], 
    [
      {
        "reportType": "Jenkins Prod",
        "endpoint": "https://jenkins.fluffyclouds.com",
        "connectionType": "Rest",
        "isHealthy": false,
        "errors": "[\"HTTP status code: 401, expected 200.  Error received Invalid password/token\"]"
      }
    ]
  ]
}
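A report in this shape is easy to scan for failures. The Python sketch below flattens the nested `healthReports` array and keeps the entries whose `isHealthy` flag is not true. The function name `unhealthy_entries` is hypothetical, the sample data is a trimmed version of the report above, and treating a missing `isHealthy` key as unhealthy is an assumption.

```python
def unhealthy_entries(report):
    """Flatten the nested healthReports array and keep entries that are not healthy."""
    return [
        entry
        for group in report.get("healthReports", [])
        for entry in group
        # Assumption: an entry without an isHealthy flag counts as unhealthy.
        if not entry.get("isHealthy", False)
    ]


report = {
    "healthReports": [
        [{"reportType": "NSX Prod",
          "endpoint": "https://nsxprod.fluffyclouds.com",
          "isHealthy": True}],
        [{"reportType": "Jenkins Prod",
          "endpoint": "https://jenkins.fluffyclouds.com",
          "isHealthy": False}],
    ]
}

for entry in unhealthy_entries(report):
    print(entry["reportType"], entry["endpoint"])
# prints: Jenkins Prod https://jenkins.fluffyclouds.com
```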

Report Scheduling

The next task is to schedule this workflow. For that, I used a vRO policy to kick off the workflow at 07:30 A.M. every morning, which gives the cloud operations team enough time to troubleshoot if need be.

Report Consumption

This report can be consumed with either vRLI or vROPS.
I decided on a vRLI integration, as it fulfilled the requirements.
I enabled the JSON parser in vRLI to parse this report and created a dashboard for it. I also set up alerts in vRLI to notify the cloud operations team whenever a host reports isHealthy: false.
Report consumption is a detailed topic and may need a dedicated post. I will leave that for another time.

As orchestration evolves, we will see more creative solutions to these issues in the market. Until then, I have a happy customer who is ready to react to ever-changing business dimensions.

Author – Barjinder Singh
