Ken’s Virtual Realty

March 9, 2009

When is it OK to default on your VI?

Filed under: LinkedIn, Virtualization — Ken Cline @ 10:40 PM

I’ve noticed something about engineers. They’re never happy with the way something is configured out of the box – there’s always a better way! Well, I have a different philosophy:

“If you don’t have a very good reason to change a default value, don’t change it!”

To me, this seems totally obvious – in most cases, the default values are there for a reason.

They’re set to their values based on testing, customer feedback, or empirical knowledge – rarely because someone just picked a number. As an example, let’s explore the configuration of the virtual switch (vSwitch) used to support the Service Console in a VMware ESX Server. Assume the following:

  • vSwitch0 is configured with two port groups – one for the Service Console, the other for VMotion
  • There are two physical NICs (pNICS) (vmnic0 & vmnic1) associated with vSwitch0
  • We want Service Console and VMotion traffic to pass over different pNICs
  • In the event of the failure of one path (pNIC, cable, physical switch (pSwitch)), we want the traffic to automatically fail over to the second path
  • There are two separate physical switches available to support our environment
  • [Optional, but recommended] You’re aware that VLANs are not intended to provide security and that you will have commingling of data on the wire when using VLANs. Aware of this risk, you choose to use VLANs for traffic segmentation anyway: you configure your Service Console port group on “VLAN SC” and your VMotion port group on “VLAN VMotion”. Obviously, you’ll need to assign the appropriate VLAN number to each port group as well as configure the appropriate ports on the pSwitch to support 802.1Q trunking.
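To make the target layout concrete, here’s a minimal sketch in Python (not anything VMware ships) that models the intended vSwitch0 configuration. The VLAN IDs are hypothetical placeholders for “VLAN SC” and “VLAN VMotion”:

```python
# Illustrative model of the target vSwitch0 layout - the VLAN IDs are
# hypothetical placeholders, not values from the original post.
vswitch0 = {
    "uplinks": ["vmnic0", "vmnic1"],          # two pNICs for redundancy
    "port_groups": {
        "Service Console": {"vlan": 10},      # "VLAN SC" (hypothetical ID)
        "VMotion":         {"vlan": 20},      # "VLAN VMotion" (hypothetical ID)
    },
}

# Sanity checks against the stated goals: two physical paths, and
# distinct VLANs so the two traffic types stay segmented on the trunk.
assert len(vswitch0["uplinks"]) == 2
vlans = [pg["vlan"] for pg in vswitch0["port_groups"].values()]
assert len(set(vlans)) == len(vlans)
print("layout OK")
```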

Scenario 1

Many people will want to do the following:

  • Configure vSwitch0 to use IP Hash as the load balancing algorithm
  • Configure the physical switch (pSwitch) to use 802.3ad static link aggregation
  • Configure the Service Console port group as follows:
    • vmnic0 – Active
    • vmnic1 – Standby
    • Load Balancing – vSwitch Port Based
  • Configure the VMotion port group as follows:
    • vmnic0 – Standby
    • vmnic1 – Active
    • Load Balancing – vSwitch Port Based

Now, let’s look at what’s been achieved…

Figure 1

As shown in Figure 1, all paths are redundant – but there are some issues with this configuration! First, when using IP Hash as the load balancing algorithm on the vSwitch, it is recommended that you not use any other load balancing approach for your port groups:

“All adapters in the NIC team must be attached to the same physical switch or an appropriate set of stacked physical switches. (Contact your switch vendor to find out whether 802.3ad teaming is supported across multiple stacked chassis.) That switch or set of stacked switches must be 802.3ad-compliant and configured to use that link-aggregation standard in static mode (that is, with no LACP). All adapters must be active. You should make the setting on the virtual switch and ensure that it is inherited by all port groups within that virtual switch.”

Next, remember that we have NO control over inbound network traffic. That means that the pSwitch could easily send inbound packets to us on our inactive pNIC. I’m sure the vSwitch will figure this out and deliver the packets, but it’s counterintuitive and (in my opinion) a bad practice.

To continue with the list of problems with this configuration, consider that by specifying Port Based load balancing at the Port Group, you’re defeating the purpose of using IP Hash on the vSwitch! The reason for using IP Hash is to enable link aggregation and the potential for a single vNIC to pass more than a single pNIC’s worth of bandwidth. By explicitly specifying Active and Standby pNICs, you’ve eliminated that possibility.
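The trade-off is easier to see with numbers. Below is a toy sketch of IP-hash uplink selection using a commonly cited illustrative formula (XOR of the last octets of the source and destination IPs, modulo the number of active uplinks) – the formula and the addresses are assumptions for demonstration, not VMware’s exact implementation:

```python
def ip_hash_uplink(src_ip, dst_ip, active_uplinks):
    """Pick an uplink by hashing source/destination IPs (illustrative formula)."""
    src_last = int(src_ip.rsplit(".", 1)[1])
    dst_last = int(dst_ip.rsplit(".", 1)[1])
    return active_uplinks[(src_last ^ dst_last) % len(active_uplinks)]

uplinks = ["vmnic0", "vmnic1"]

# With both uplinks active, different destinations can hash to different
# pNICs, so one vNIC can use more than one pNIC's worth of bandwidth.
for dst in ("10.0.0.10", "10.0.0.11"):
    print(dst, "->", ip_hash_uplink("10.0.0.5", dst, uplinks))

# With an explicit Active/Standby override, only one uplink is active:
# every flow lands on the same pNIC, and the aggregation benefit is gone.
print(ip_hash_uplink("10.0.0.5", "10.0.0.10", ["vmnic0"]))
```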

One other thing that can be an issue is the use of link aggregation and IP Hash load balancing at all. This is not a technical problem, but a political one. I’ve worked in many large environments and have frequently run into issues when having to interface with the networking (or storage, or security) team. Ideally, all parties involved in infrastructure support will be 100% behind the virtualization effort – unfortunately, the world is far from an ideal place! In general, I like to minimize the number of touch points between the various entities – especially in the early stages of implementation…just to reduce friction and allow time for familiarity to breed acceptance.

Scenario 1 – Summary

  • Overall Score: F – The fact that this is an invalid configuration forces an overall score of “F”
  • Valid Configuration: F – When the vSwitch is configured for IP Hash load balancing, it is improper to override it at the port group level
  • Fault Tolerance: A – There is no single point of failure in the network path, assuming you have stackable switches; if your switches don’t support stacking, the grade drops to a “C”
  • Simplicity: C – There is a lot of complexity in this configuration
  • Politics: B – This configuration uses IP Hash load balancing, which requires coordination outside the virtualization team

Scenario 2

OK, so not everyone will go quite so overboard with the “tweaking”. Let’s make a couple of changes that will at least make this a valid configuration: remove the load balancing override on the port groups and set all vmnics Active on all port groups. This means we’ll have something like this (the changes from the previous configuration are the Active/Standby and load balancing settings):

  • Configure vSwitch0 to use IP Hash as the load balancing algorithm
  • Configure the physical switch (pSwitch) to use 802.3ad static link aggregation
  • Configure the Service Console port group as follows:
    • vmnic0 – Active
    • vmnic1 – Active
    • Load Balancing – default to IP Hash
  • Configure the VMotion port group as follows:
    • vmnic0 – Active
    • vmnic1 – Active
    • Load Balancing – default to IP Hash

Here’s the result of our labors:

Figure 2

At least we have a valid configuration! The benefit of using this configuration is that, if you have multiple VMotions active at the same time, you may use both of the pNICs in the vSwitch to increase your bandwidth utilization, which could help when you’re trying to evacuate a bunch of VMs from a host that’s in need of maintenance.
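To see why concurrent VMotions can spread across both pNICs, here’s a toy calculation using the same illustrative XOR-of-last-octet hash (an assumption for demonstration, not VMware’s exact implementation) for four simultaneous migration streams:

```python
from collections import Counter

def ip_hash_uplink(src_last, dst_last, n_uplinks):
    """Illustrative IP-hash: XOR of the last IP octets, modulo uplink count."""
    return (src_last ^ dst_last) % n_uplinks

# Hypothetical evacuation: four simultaneous VMotion streams from this host
# (last octet 21) to four destination hosts; count streams per uplink index.
dests = [22, 23, 24, 25]
load = Counter(ip_hash_uplink(21, d, 2) for d in dests)
print(load)  # two streams on each uplink - both pNICs carry traffic
```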

Scenario 2 – Summary

  • Overall Score: B – This is a good, relatively simple configuration, with the advantage that you can use the bandwidth of both pNICs to expedite the migration of virtual machines off the host if you need to
  • Valid Configuration: A – This is a valid configuration
  • Fault Tolerance: A – There is no single point of failure in the network path, assuming you have stackable switches; if your switches don’t support stacking, the grade drops to a “C”
  • Simplicity: C – There is complexity in this configuration, but it is reasonable and serves a valid purpose
  • Politics: B – This configuration uses IP Hash load balancing, which requires coordination outside the virtualization team

Scenario 3

Alright, now let’s see how we can go about simplifying this whole thing:

  • Configure vSwitch0 to use default port-based load balancing algorithm
  • Do not configure the physical switch (pSwitch) to use 802.3ad static link aggregation
  • Configure the Service Console port group as follows:
    • vmnic0 – Active
    • vmnic1 – Standby
    • Load Balancing – default to port-based
  • Configure the VMotion port group as follows:
    • vmnic0 – Standby
    • vmnic1 – Active
    • Load Balancing – default to port-based

Figure 3

With this scenario, we’ve achieved our objectives of having SC & VMotion traffic on separate interfaces, each with a failover adapter available. We also can deterministically say that, assuming no path failures, we KNOW which interface is carrying which type of traffic. An even bigger benefit, in many organizations, is that the configuration required on the physical switch is none, zero, zip, nada!
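The failover behavior described here can be sketched as a small simulation. The selection logic below is an illustrative stand-in for ESX’s teaming policy (first healthy Active adapter, then first healthy Standby), not its actual code:

```python
def uplink_for(port_group, failed=frozenset()):
    """First healthy Active pNIC, else first healthy Standby pNIC (illustrative)."""
    for nic in port_group["active"] + port_group["standby"]:
        if nic not in failed:
            return nic
    return None  # no path left

sc      = {"active": ["vmnic0"], "standby": ["vmnic1"]}
vmotion = {"active": ["vmnic1"], "standby": ["vmnic0"]}

# Normal operation: deterministic separation of the two traffic types.
print(uplink_for(sc), uplink_for(vmotion))  # vmnic0 vmnic1

# The vmnic0 path fails: Service Console fails over to vmnic1,
# VMotion is unaffected (both types now share one pNIC).
print(uplink_for(sc, {"vmnic0"}), uplink_for(vmotion, {"vmnic0"}))  # vmnic1 vmnic1
```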

Scenario 3 – Summary

  • Overall Score: B+ – This is a good, relatively simple configuration. The biggest negatives: 1) VMotion traffic is tied to a single interface, and 2) there’s more complexity than you have to have
  • Valid Configuration: A – This is a valid configuration
  • Fault Tolerance: A – There is no single point of failure in the network path, assuming you connect to two different physical switches
  • Simplicity: B – There is limited complexity in this configuration
  • Politics: A – This configuration uses vSwitch port-based load balancing, which requires no coordination outside the virtualization team

Scenario 4

Finally, let’s get rid of all the complexity that we can – making this the simplest configuration that meets our goals:

  • Configure vSwitch0 to use default port-based load balancing algorithm
  • Do not configure the physical switch (pSwitch) to use 802.3ad static link aggregation
  • Configure the Service Console port group as follows:
    • vmnic0 – Active
    • vmnic1 – Active
    • Load Balancing – default to port-based
  • Configure the VMotion port group as follows:
    • vmnic0 – Active
    • vmnic1 – Active
    • Load Balancing – default to port-based

Basically, what we’ve done is let everything default. All the adapters are active, the load balancing method is virtual switch port based, and nothing is overridden by the port groups. This yields the configuration shown in Figure 4, below.

Figure 4

Here we’ve also achieved our objectives of having SC & VMotion traffic on separate interfaces, each with a failover adapter available. But wait! How do we know this to be the case? We did not configure Active and Standby adapters, so how do we know each type of traffic will have its own pNIC? Well, since the default load balancing scheme is based on the vSwitch port and there are two pNICs and only two vSwitch ports in use – that’s the default behavior! The thing that we’ve lost in this configuration is the ability to deterministically know which interface is carrying which type of traffic. I contend that’s not a big deal. We also retain the advantage of no pSwitch configuration required.
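A rough sketch of why the defaults work out this way – port-based placement modeled as simple round-robin over the active uplinks (an approximation for illustration, not ESX’s actual algorithm):

```python
from itertools import cycle

def assign_uplinks(ports, uplinks):
    """Round-robin assignment of vSwitch ports to active uplinks - an
    illustrative approximation of ESX's port-based policy, not its code."""
    rr = cycle(uplinks)
    return {port: next(rr) for port in ports}

# Two ports in use (Service Console, VMotion) and two active pNICs:
# each traffic type lands on its own uplink without any Active/Standby
# configuration - though WHICH port gets which pNIC is not guaranteed.
print(assign_uplinks(["Service Console", "VMotion"], ["vmnic0", "vmnic1"]))
```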

Scenario 4 – Summary

  • Overall Score: A – This is the simplest configuration that meets our goals: everything defaults, nothing is overridden, and no physical switch configuration is required
  • Valid Configuration: A – This is a valid configuration
  • Fault Tolerance: A – There is no single point of failure in the network path, assuming you connect to two different physical switches
  • Simplicity: A – There is no unavoidable complexity in this configuration
  • Politics: A – This configuration uses vSwitch port-based load balancing, which requires no coordination outside the virtualization team

Summary

I recognize that there are other load balancing options available in VI-3; however, I don’t really see them as being overly useful. Therefore, I’m not going to cover them in this discussion.

So, what does all this mean? Basically, if you’re concerned about the amount of time it takes to migrate virtual machines off your hosts and you have the ability to use 802.3ad link aggregation, then Scenario 2 is your best bet. This scenario does have more complexity than is absolutely necessary, but provides the benefit of reduced evacuation time when you need to shut a host down quickly.

If, on the other hand, you cannot – or don’t want to – use 802.3ad link aggregation, then Scenario 4 is for you. Honestly, for a large percentage of environments that are deploying VI-3, this is the way to go. I gave it an “A” for a reason. It is the simplest option to implement and maintain, it will serve your needs admirably, and besides…it follows my mantra of “If you don’t have a very good reason to change a default value, don’t change it!”

Let me know what you think – am I right or am I barking up the wrong tree?


7 Comments »

  1. Ken,

    I think you are barking up the RIGHT tree. 🙂 My client base is SMB and for most of their environments, the default networking load balancing (port based) is fine and in fact does a very good job for them. I never really subscribed to the idea that vMotion traffic would be so intense that it would need to be isolated and given dedicated network cards. During implementations, when P2Vs are hot and heavy, the amount of vMotion increases, but once the environment stabilizes and normal operations resume, I find in my client environments that the virtual machines tend to settle in and not bounce around so much to warrant dedicated bandwidth (which would be taken away from something else i.e. iSCSI or Application traffic).

    I tend to like the KISS approach and agree that limiting touch points from other groups during implementations helps adoption of the new technology.

    Great post.

    Thanks.
    CARLO.

    Comment by Carlo Costanzo — March 9, 2009 @ 11:10 PM

  2. Carlo – thanks for the comment! Yes, the only time that I see a need for “enhanced” VMotion capabilities is when you are putting a host into maintenance mode. In some larger environments, it’s not uncommon to have 40, 50, or more VMs on a host – being able to accelerate their movement to other hosts can be a God-send…but for the majority of deployments, KISS it!

    Comment by Ken Cline — March 10, 2009 @ 8:52 AM

    • Jason actually wrote a great little article on increasing the number of simultaneous vMotions VC will handle at a time.

      http://www.boche.net/blog/index.php/2009/01/05/guest-blog-entry-vmotion-performance/

      Carlo

      Comment by Carlo — March 16, 2009 @ 3:27 PM

      • Thanks, Carlo! Yes, I was aware of Jason’s post and I should have included a reference to it in my original post. Thanks for linking to it here, it helps add completeness!

        btw – please update your bookmarks to point to my new (correctly named) blog over at http://kensvirtualreality.wordpress.com/!

        Comment by Ken Cline — March 16, 2009 @ 3:56 PM


  3. Great article Ken !

    It’s great to see that I’m not the only one with that opinion (KISS or only change default with a good reason)! At the current project we’ve had a lot of discussion about this specific topic, discussed various complex configurations, and at the end the conclusion was indeed : keep it simple!

    Keep up the great work!

    Comment by Matthijs Haverink — March 10, 2009 @ 10:43 AM

  4. I read a lot about ESX networking while setting up our new hosts and got very confused because we have HP switches.

    I ended up getting our IT consulting company to help me with our networking and switch etc. setup.

    What we ended up doing is the exact same thing as Scenario 4 — 1 vSwitch with 2 pNICs for SC, vmKernel, and Production, on the main subnet, we are a very small ESX cluster.

    With the second set of 2 pNICs, we did the same thing for backend SC and backend vmKernel to another vSwitch, but very different IP addresses because these portgroups were for SC and vmKernel SAN traffic, which we VLAN’d on the HP switch per se.

    This is all on 1 HP switch BTW, dual switches would not be practical in our small environment.

    If I ever get 1 more dual NIC per host, I can remove Production from the first set of pNICs on each host to another vSwitch which we set up and left “blank” for this eventuality.

    Comment by Tom — March 10, 2009 @ 4:26 PM

