Cisco ASA & ESX: strange ARP behavior
Last week I ran into a very strange problem with a Cisco ASA firewall. The firewall is configured with multiple interfaces, including a DMZ interface. The DMZ contains both physical and virtual servers; the virtual servers are VMware servers running in a blade environment.
I enabled Unicast Reverse Path Forwarding on the DMZ interface to prevent spoofing:
ip verify reverse-path interface DMZ
I also configured a transparent (identity) static NAT rule between the Inside network and the DMZ network, and multiple static NAT rules between the DMZ network and the Outside network. I left proxy ARP at its default setting.
The customer complained about login and connectivity problems on the DMZ servers, especially between different DMZ servers. After some research I noticed that all the problems involved DMZ servers in the blade environment.
I ran some connectivity tests and noticed strange ICMP behavior on those servers. When I pinged from one DMZ VMware server to another DMZ server on the same ESX host, the first ping received an echo reply, but subsequent pings failed. Looking at the server’s ARP table, I noticed that the firewall had answered every ARP broadcast with its own MAC address.
On various Internet forums, everybody points to the proxy ARP feature and says you should disable it. Proxy ARP is enabled by default and I always leave it enabled; until now I had never run into this problem. After I disabled proxy ARP for the DMZ interface
sysopt noproxyarp DMZ
the problem was solved, because the firewall no longer answers ARP queries except those for its own interface address. Digging a bit deeper into the forums, I never found a single thread that explains why disabling proxy ARP solves this particular problem.
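To make the behavior described above concrete, here is a simplified model (not Cisco’s implementation) of how an ASA interface decides whether to answer an ARP request: it always answers for its own interface address, and with proxy ARP enabled it also answers for addresses that NAT rules publish on that interface; with sysopt noproxyarp it answers only for its own address. The class name, the addresses and the scenario are hypothetical: the example assumes, purely for illustration, a static NAT whose mapped range overlaps the DMZ subnet. That is one way a firewall could end up answering for DMZ hosts, not a confirmed cause of the problem in this post.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class AsaInterface:
    """Hypothetical, simplified model of one ASA interface's ARP responder."""
    name: str
    address: ipaddress.IPv4Address
    # Enabled by default; 'sysopt noproxyarp <interface>' turns it off.
    proxy_arp: bool = True
    # Mapped networks that static NAT rules publish on this interface (assumed input).
    mapped_networks: list = field(default_factory=list)

    def answers_arp_for(self, target: str) -> bool:
        ip = ipaddress.IPv4Address(target)
        if ip == self.address:
            return True  # the interface always answers for its own address
        if not self.proxy_arp:
            return False  # noproxyarp: nothing else gets an answer
        # Proxy ARP: answer for any address inside a published mapped network.
        return any(ip in net for net in self.mapped_networks)


# Hypothetical DMZ 192.168.50.0/24 with a static whose mapped range overlaps it.
dmz = AsaInterface(
    name="DMZ",
    address=ipaddress.IPv4Address("192.168.50.1"),
    mapped_networks=[ipaddress.IPv4Network("192.168.0.0/16")],
)
print(dmz.answers_arp_for("192.168.50.20"))  # True: firewall claims a DMZ server's IP
dmz.proxy_arp = False
print(dmz.answers_arp_for("192.168.50.20"))  # False: the real server answers instead
print(dmz.answers_arp_for("192.168.50.1"))   # True: own interface address
```

In this model, a DMZ server that caches the firewall’s reply sends its traffic for the other server to the firewall’s MAC address instead, which matches the symptom seen in the server’s ARP table.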
In my opinion this problem is related to the VMware environment, because I don’t have these problems with physical DMZ servers. So why can’t DMZ servers on the same ESX host see each other, and why does the firewall respond to their ARP queries at all?
The blade environment (ESX hosts, network configuration and SAN configuration) will be changed in the near future, so I hope to find the exact cause of and solution to the problem then. Does anybody have suggestions?
March 23rd, 2010 at 2:18 pm
I just want to say thank you. I have been troubleshooting this issue for a week now and noticed exactly the same behavior in my packet captures. Cisco hasn’t published anything on this issue and their TAC has been troubleshooting the wrong problem from day one.
Thanks again for this article. I can breathe easier now.
V/R
James
April 13th, 2010 at 8:26 pm
I’m seeing the same kind of issue, except I don’t think it’s exclusive to VMs. I also have a VMware environment. My default gateway is an MPLS router, and my firewall is my connection to the Internet. After some reading on Cisco’s forums, the proxy ARP behavior seems to be a by-product of NAT: the firewall apparently barges into the conversation whenever an ARP request goes out for my router. This caused a lot of problems for my branch offices when they connected to our web portal; their connections would randomly time out instead of connecting. I’m asking around on Cisco’s forum to find out what will happen when I use the sysopt noproxyarp command on my inside interface.
June 14th, 2010 at 7:36 pm
We just ran into exactly the same issue: our VM hosts were getting ARP replies from the firewall, and once we disabled proxy ARP the issue went away. Our problem now is getting all internal subnets to communicate with the DMZ servers. Our main site can reach the DMZ, but our two other sites (on different subnets) cannot. Any suggestions would be much appreciated.
June 15th, 2010 at 8:41 am
Disabling proxy ARP affects how NAT works on that specific interface. If you are using a Cisco PIX or ASA firewall, check your NAT configuration first; you may need to add NAT exemptions or static NAT entries for the missing internal subnets.
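Checking the NAT configuration for missing subnets comes down to comparing the internal subnets that must reach the DMZ with the source networks your NAT exemption or static entries already cover. Below is a minimal sketch of that comparison, assuming you have copied both lists out of the firewall configuration by hand; the helper name uncovered_subnets and all subnets are hypothetical.

```python
import ipaddress


def uncovered_subnets(internal, covered):
    """Return the internal subnets not fully contained in any covered network.

    'covered' stands for the source networks of existing NAT exemption or
    static NAT entries toward the DMZ (hypothetical input, collected by hand).
    """
    covered_nets = [ipaddress.ip_network(n) for n in covered]
    missing = []
    for subnet in internal:
        net = ipaddress.ip_network(subnet)
        if not any(net.subnet_of(c) for c in covered_nets):
            missing.append(subnet)
    return missing


# Hypothetical example: main site plus two other sites on different subnets.
internal_sites = ["10.1.0.0/16", "10.2.0.0/16", "10.3.0.0/16"]
existing_rules = ["10.1.0.0/16"]  # only the main site is covered today
print(uncovered_subnets(internal_sites, existing_rules))
```

Each subnet it reports is a candidate for a new NAT exemption or static NAT entry.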