Converge already! (Struggling with WLBS)

I hate when things don’t go my way. One server in our NLB (Network Load Balancing) cluster refused to rejoin the cluster. When I issued a wlbs start, it tried for a couple of seconds to join, but then stayed stuck in the ‘Converging’ state. A couple of times I saw an entry in the System Log: " ... does not have the same number or type of port rules ...".

I tried:

  • Reboot: worked the first time, but did not help after that
  • Compare rules 1: compared output of wlbs display: changed all load parameters to ‘Equal’ (I normally give the servers weights that take into account the number of processors and MB of RAM)
  • Compare rules 2: compared output of wlbs display: identical
  • Compare rules 3: compared output of regedit -e wlbs.reg.%COMPUTERNAME%.reg HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\WLBS: no significant differences
  • Recreate rules 1: recreate all 18 port rules on rogue server (so much fun :-/ ): did not help
  • Recreate rules 2: change ALL servers to use only 1 rule: did not help
  • Curse: did not help
  • Restart service: disable NLB on network adapter, press OK, re-enable NLB => Bingo! Server back in the cluster without a blink.
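
For the "compare rules" steps, eyeballing two dumps side by side is error-prone, so a small script could do the comparison. The Python sketch below is only an illustration: it assumes each server's wlbs display output (or registry export) has been saved to a text file, treats the content as plain lines without parsing any particular format, and uses hypothetical names (normalize, diff_dumps, server-a, server-b) that are not part of WLBS.

```python
import difflib


def normalize(dump):
    """Strip surrounding whitespace and drop blank lines,
    so purely cosmetic differences are ignored."""
    return [line.strip() for line in dump.splitlines() if line.strip()]


def diff_dumps(dump_a, dump_b, name_a="server-a", name_b="server-b"):
    """Return unified-diff lines for two text dumps.
    An empty list means the normalized dumps are identical."""
    return list(
        difflib.unified_diff(
            normalize(dump_a),
            normalize(dump_b),
            fromfile=name_a,
            tofile=name_b,
            lineterm="",
        )
    )
```

To use it, read the two saved files into strings, pass them to diff_dumps and print the returned lines. Host-specific values (host name, host priority, dedicated IP address) will always show up as differences, so those would need to be filtered out or ignored by hand.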

Now I only have to re-create the 18 rules on all servers and I’m done!

Mental note to self: check whether I can build a dedicated load-balancing device in Linux, one that (1) takes server load into account (gives work to the least busy server), (2) monitors response time on individual ports (and automatically disables non-responsive ports), and (3) has a web interface, so I can configure it from any server in the subnet.
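
The selection logic in points (1) and (2) could look roughly like the Python sketch below. It is a toy model under stated assumptions, not a design: Backend, load, healthy_ports and pick_backend are hypothetical names, the load figure is assumed to come from some external measurement, and the health checks that would fill in healthy_ports are not shown.

```python
from dataclasses import dataclass, field


@dataclass
class Backend:
    name: str
    load: float  # assumed metric, e.g. CPU utilisation between 0.0 and 1.0
    healthy_ports: set = field(default_factory=set)  # ports that answered the last check


def pick_backend(backends, port):
    """Return the least-loaded backend that is healthy on the given port,
    or None if no backend currently serves that port."""
    candidates = [b for b in backends if port in b.healthy_ports]
    if not candidates:
        return None
    return min(candidates, key=lambda b: b.load)
```

A port that stops responding is simply removed from that server's healthy_ports, which takes the server out of rotation for that port only while it keeps serving the others.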
