Slow LVS-NAT responses
I recently had to investigate an LVS-NAT (via ldirectord) system that appeared to be adding a significant delay to many requests that were passing through it.
The HTTP requests I was looking at were taking around 600ms (with a very large variance) to complete via the director machine, instead of a consistent 35ms when made directly to one of the web server nodes!
Looking at the problem more deeply I found that larger response packets were not making it back to the browser correctly, causing repeated sends using fragmented packets instead.
After much digging, thinking and creative Googling I tracked down a similar report and a much later response (also together here) that suggested there may be a problem in later versions (since 6.0.3) of Debian Squeeze due to more network card offloading being on by default. The combination of a recent Squeeze kernel, LVS-NAT and certain network card drivers (in my case r8169) results in slow network performance.
The temporary fix is simple enough - check which network offloading options are enabled and disable them until things work.
ethtool -k eth0 # display settings
ethtool -K eth0 gro off # disable gro
For me (and in the post linked above) the offending setting was "gro" or "generic-receive-offload".
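If you are not sure which offloads to try, it helps to list only the ones that are currently on. The following is a minimal sketch, not something from the original investigation: enabled_offloads is a hypothetical helper name, and it assumes the usual "feature-name: on/off" layout of ethtool -k output.

```shell
# Hypothetical helper: reads `ethtool -k` output on stdin and prints the
# name of every feature reported as "on" (including "on [fixed]").
# Assumes each feature line looks like "name: on" or "name: off".
enabled_offloads() {
    awk -F': ' '$2 ~ /^on/ {
        gsub(/^[ \t]+/, "", $1)   # strip indentation of sub-features
        print $1
    }'
}
```

Run it as "ethtool -k eth0 | enabled_offloads", then switch candidates off one at a time with ethtool -K (for example "gro off", "gso off" or "tso off") and re-test the slow request after each change.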
To make this permanent, the ethtool package supports some handy /etc/network/interfaces settings, e.g.:
auto eth0
iface eth0 inet static
    address 192.168.123.123
    netmask 255.255.255.0
    gateway 192.168.123.254
    offload-gro off
Comments
Mate, you saved the day!
Very good man, you save me!
Thanks!!
Wow, you saved me a lot of time debugging the slowness in my upgraded load-balancer. I wonder if this also influences Squeeze machines in general
Googled for a whole night without any useful information before stumbling on this great blog post.