Recently I’ve been working on quite a few VMware Cloud on AWS (VMC for short) projects for customers. For those of you who are new to the technology, VMC provides customers with one or more SaaS Datacenters deployed to Amazon’s Public Cloud, built on the same VMware infrastructure technology they are familiar with in their own datacenters: virtualization provided by vSphere, software-defined storage provided by vSAN, and software-defined networking and security powered by NSX-T.
My customers have loved how the VMware Site Reliability Engineering (SRE) team takes away the headache of infrastructure management, including monitoring and logging, and lets them focus on deploying and optimizing their applications in VMC, configuring a whole host of Add-On features in VMC, and taking advantage of direct access to native AWS services such as EC2 instances and S3 endpoints.
They have consistently been impressed by how quickly a VMC Software Defined Datacenter (SDDC) can be set up, in just one or two hours. Workload mobility powered by VMware HCX has allowed them to immediately start migrating virtual machines from on-premises to the cloud non-disruptively, online, rapidly at scale, and without any IP address or MAC address changes.
Migrating workloads to VMC on AWS and workload mobility is a big topic worthy of a separate post coming soon.
The Case for More than One SDDC
A VMC SDDC can scale up to a significant size: with SDDC version 1.11 the current hard limit is 20 clusters and a maximum of 300 ESXi hosts per SDDC. (The 300-host ceiling is imposed by the VMC SRE team; the VMware ConfigMax tool states customers can deploy 20 clusters with up to 16 ESXi hosts per cluster, which would otherwise allow 320 hosts.)
However, particularly for the larger customers I’ve worked with, there is still a need for more than one SDDC:
- To provide availability for applications, for example by deploying two SDDCs to two AWS Availability Zones: ap-southeast-2a and ap-southeast-2b, and then distributing application components across two or more SDDCs.
- Stretched cluster SDDCs may be required to provide resilience to Availability Zone failure for traditional applications that don’t have built-in availability.
- Some customers have different teams managing Production and Non-Production vCenter environments for security and administrative purposes. As there is a one-to-one relationship between an SDDC and vCenter, more than one SDDC makes sense to segregate Identity Management.
- A customer’s corporate security policy may require separation of Production and Non-Production networking, e.g. Production and Non-Production network segments should not be available to virtual machines in a single vCenter environment.
The diagram below shows an example SDDC layout which addresses these requirements.
Network Connectivity Between VMC SDDCs: Pre-SDDC Version 1.11
Virtual machines in one SDDC will in most cases need to access virtual machines and services in another SDDC. So for customers with two or more SDDCs, how have I approached providing network connectivity between them up until now?
For customers with AWS Direct Connect, traffic from one VMC SDDC to another routes via their on-premises datacenter, traversing their Direct Connect provider over AWS Private Virtual Interfaces. This is an inefficient path and, depending on the customer’s datacenter location, can incur significant latency. Direct Connect bandwidth is also consumed unnecessarily.
There are a number of enhancements I have designed for customers to address the issues above, which others have also written about in depth elsewhere on the Internet.
For example, customers can set up direct VPN connections between SDDCs, or configure each SDDC with a VPN attachment to a customer-managed AWS Transit Gateway. However, the roughly 1.25Gbps bandwidth limit for each VPN attachment means performance is not optimal, and the configuration adds complexity for customers to set up and maintain. Because VPN connections terminate on public IP addresses, this traffic path is often not acceptable to larger corporations from a data security perspective.
Introducing VMware Transit Connect: SDDC Version 1.11
VMware Transit Connect is a VMware-managed connectivity solution between VMC SDDCs. Under the covers it uses the AWS Transit Gateway (TGW) construct. It provides high bandwidth, low latency connectivity between SDDCs in an SDDC Group within a single AWS Region. It also enables connectivity between an SDDC Group and multiple native AWS Virtual Private Clouds (VPCs), as well as customers’ on-premises environments connected via an AWS Direct Connect Gateway. The feature is available to customers now by requesting the SDDC version 1.11 release; although it is currently classed as a Technical Preview, it is expected to become GA by SDDC version 1.12 or sooner.
Some of the key benefits:
- Connectivity between SDDCs is greatly simplified by using a hub and spoke model
- Low latency, high bandwidth connectivity between SDDCs, up to 50Gbps per VPC (SDDC) attachment to the Transit Connect
- The underlying AWS Transit Gateway technology is by design highly available within the AWS Region
- The VTGW is created inside the VMware-managed account, so like the VMC SDDC itself, it is managed by VMware
Routing Between SDDCs via the VTGW
The diagram below shows routing between SDDCs provided by an SDDC Group and the VTGW. In this example my customer has three SDDCs, one in each of the three Sydney Region Availability Zones. Let’s step through the process of getting this up and running.
Step 1: Create SDDCs
Deploy and configure the required VMC SDDCs from the VMC Console.
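The Console wizard is the usual path, but SDDC deployment can also be driven programmatically via the VMC public API. The sketch below builds the request body for creating an SDDC; the endpoint path and field names reflect the VMC API as I understand it, so treat them as assumptions and verify against the current VMC API reference before use.

```python
# Sketch: building a create-SDDC request for the VMC public API.
# Endpoint path and field names are assumptions based on the VMC API
# reference at the time of writing -- verify before use.

def build_sddc_request(name: str, region: str, num_hosts: int) -> dict:
    """Request body for POST https://vmc.vmware.com/vmc/api/orgs/{org_id}/sddcs."""
    return {
        "name": name,
        "provider": "AWS",
        "region": region,            # e.g. "ap-southeast-2" for Sydney
        "num_hosts": num_hosts,      # a production SDDC starts at 3 hosts
        "deployment_type": "SingleAZ",  # "MultiAZ" for a stretched cluster SDDC
    }

# One request per SDDC in the design; an API client would POST each payload
# authenticated with a CSP API token in the csp-auth-token header.
payload = build_sddc_request("SDDC1", "ap-southeast-2", 3)
```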
Step 2: Create SDDC Group
With SDDC version 1.11 a new tab is available at the top of the VMC Console, called SDDC Groups.
The create SDDC Group wizard is a very simple process: give the Group a name and specify which SDDCs will be part of it. The process takes a few minutes to complete, after which you will see your SDDC Group members indicating L3 Connectivity Status: CONNECTED.
Note: The selected SDDCs must be in the same Region for this release.
Step 3: VTGW and Routing
The create SDDC Group process automatically creates a VTGW (VMware Transit Connect Gateway), and connects each SDDC to the VTGW via VPC attachment. The process auto-populates route tables in each SDDC and VTGW as per the connectivity policy. The resultant route table from the diagram for the VTGW would look similar to below.
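As an illustration, using example prefixes assumed here (one /20 summarizing each SDDC's segments), the VTGW route table would contain one entry per SDDC attachment along these lines:

```
Destination       Target
10.10.0.0/20   -> VPC attachment: SDDC 1
10.10.16.0/20  -> VPC attachment: SDDC 2
10.10.32.0/20  -> VPC attachment: SDDC 3
```

Each SDDC's own route table gains the reverse entries, pointing the other SDDCs' prefixes at the VTGW.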
You can verify the learned and advertised routes in the SDDC Console -> Networking & Security -> SDDC Interconnectivity.
Step 4: Setup Security Rules
For each SDDC, Compute Gateway firewall rules need to be set up to allow the required virtual machines on network segments to communicate with each other. A system-defined group called ‘Deployment Group other SDDC Prefixes’ is automatically created and populated with subnets from the other SDDCs. This group is automatically updated as network segments are added and removed in other SDDCs.
Next, manually create a Compute Group called ‘local workload segments’ containing all the prefixes for network segments attached to the Compute Gateway. Finally, create Compute Gateway firewall rules to allow inbound and outbound traffic to and from the other SDDC prefixes, as in the example below.
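The same group and rule can be expressed as NSX-T Policy API payloads, sent through the SDDC's NSX proxy URL. The paths, group IDs, and field names below are my best-effort reading of the NSX-T Policy API and are assumptions to verify against the NSX-T API reference, not a definitive recipe:

```python
# Sketch: NSX-T Policy API payloads for the Compute Gateway setup above.
# Paths, group IDs, and field names are assumptions -- verify against the
# NSX-T Policy API reference for your SDDC version.

def local_segments_group(prefixes: list) -> dict:
    """Body for PATCH /policy/api/v1/infra/domains/cgw/groups/local-workload-segments."""
    return {
        "display_name": "local workload segments",
        "expression": [
            {"resource_type": "IPAddressExpression", "ip_addresses": list(prefixes)}
        ],
    }

def cgw_allow_rule(name: str, src: list, dst: list) -> dict:
    """Body for a rule under /policy/api/v1/infra/domains/cgw/gateway-policies/default."""
    return {
        "display_name": name,
        "source_groups": src,        # policy paths of source groups
        "destination_groups": dst,   # policy paths of destination groups
        "services": ["ANY"],
        "action": "ALLOW",
        "scope": ["/infra/labels/cgw-all"],  # apply on all CGW interfaces
    }

group = local_segments_group(["10.10.0.0/20"])  # example local prefix
# Allow traffic from local segments to the other SDDCs' prefixes; the path of
# the system-defined group is illustrative.
rule = cgw_allow_rule(
    "to-other-sddcs",
    ["/infra/domains/cgw/groups/local-workload-segments"],
    ["/infra/domains/cgw/groups/deployment-group-other-sddc-prefixes"],
)
```

A matching inbound rule (source and destination groups swapped) completes the pair.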
Now you are all set up with optimized, high bandwidth, low latency network connectivity between your SDDCs.
Recommendation: Customers should configure the Distributed Firewall in each SDDC to further restrict traffic flows between virtual machines in each SDDC based on the required security policy for each application.
Virtual Machine to Virtual Machine Traffic Flow
Now that all requirements are in place, let’s follow an example traffic flow from a virtual machine in SDDC 1 to a virtual machine in SDDC 2 via VMware Transit Connect.
Step 1: VM1 (10.10.0.2) in SDDC 1 sends an ICMP Echo request to 10.10.16.2.
Step 2: The Distributed Firewall applied to VM1 vNIC processes the request and allows the traffic.
Step 3: The Compute Gateway validates its firewall rules and allows the traffic.
Step 4: The SDDC 1 router examines its routing tables and finds a route via the VTGW.
Step 5: The packet traverses the direct VPC attachment to the VTGW. The VTGW examines its routing table and finds a route to the destination network via the VPC attachment for SDDC 2.
Step 6: The packet traverses the direct VPC attachment to SDDC 2. The SDDC 2 router processes it and finds a route via the Compute Gateway.
Step 7: The Compute Gateway validates its firewall rules and allows the traffic.
Step 8: The Distributed Firewall applied to VM2 vNIC processes the request and allows the traffic.
Step 9: VM2 receives the ICMP Echo request from 10.10.0.2.
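At each hop above, the forwarding decision is simply a longest-prefix-match lookup in the local route table. A minimal sketch using Python's ipaddress module, with the addresses from the walkthrough (the /20 prefix lengths are assumed for illustration):

```python
# Minimal longest-prefix-match lookup, mirroring the VTGW forwarding decision.
import ipaddress

def lookup(route_table, dst):
    """Return the target of the most specific prefix matching dst."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, target) for net, target in route_table
               if addr in ipaddress.ip_network(net)]
    # Longest (most specific) prefix wins
    return max(matches, key=lambda m: ipaddress.ip_network(m[0]).prefixlen)[1]

# Illustrative VTGW route table: one VPC attachment per SDDC
vtgw_routes = [
    ("10.10.0.0/20",  "VPC attachment: SDDC 1"),
    ("10.10.16.0/20", "VPC attachment: SDDC 2"),
]

print(lookup(vtgw_routes, "10.10.16.2"))  # → VPC attachment: SDDC 2
```

The SDDC-side routers do the same lookup, with the other SDDCs' prefixes pointing at the VTGW attachment.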
With the new VMware Transit Connect in VMC on AWS, you have optimized, high bandwidth, low latency routing between workloads in SDDCs, whilst preserving stateful protection of workloads applied at the vNIC via the distributed firewall policy in each SDDC. An SDDC Group is very easy to set up, and the VTGW is automatically deployed, updated and maintained by VMware.
With a VTGW and an SDDC Group deployed, a whole host of exciting options are opened up, including virtual machine mobility between SDDCs across the VTGW, powered by HCX cloud to cloud site-pairing.
That’s it for today, I will author part two of this post soon where I’ll discuss network connectivity between SDDCs and on-premises datacenters via the VTGW.