To troubleshoot EIGRP you obviously need a solid understanding of the routing protocol itself. Of course, this doesn’t only apply to EIGRP.
Troubleshooting EIGRP on Cisco devices is mainly about logging the correct information to a syslog server, the buffer or the console, and knowing what the output of the various show commands means.
Neighbor instability on a router mostly has one of the following causes:
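In my experience the usual suspects are a mismatched autonomous system number or mismatched K-values, an authentication mismatch, an access list blocking the EIGRP multicast address 224.0.0.10, and flapping or unidirectional links. A quick sketch of the commands to verify each of these (the interface name is just an example):

```
show ip eigrp neighbors
show ip protocols
show interfaces FastEthernet0/0
debug eigrp packets
```

The first shows neighbor state, hold timers and uptime; the second shows the AS number and K-values in use; the third reveals interface flaps and input errors; the debug shows hellos arriving and authentication failures.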
Many problems with EIGRP appear when using external routes, for example when redistributing another routing protocol such as OSPF into EIGRP. The most common cause of these problems is forgetting to set the metrics. When using the redistribute command it is important to specify the metrics with the default-metric <metric> or redistribute <protocol> metric <metric> command.
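As a sketch, assuming OSPF process 1 is redistributed into EIGRP AS 10 (both numbers are just examples), the seed metric can be set in either of two ways:

```
router eigrp 10
 redistribute ospf 1 metric 10000 100 255 1 1500
 ! or one seed metric for everything redistributed into this process:
 default-metric 10000 100 255 1 1500
```

The five values are bandwidth (in kbps), delay (in tens of microseconds), reliability, load and MTU. Without one of these two commands, redistributed routes get an infinite metric and never show up in the EIGRP topology.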
Black-hole routing caused by summarization is also a common problem when using EIGRP. It can occur when manual summary routes are configured.
The picture shows a case of black-hole routing. Routers A and B summarize the different /24 networks into one /16 network towards router X. Now suppose the link between router A and router C fails. Because of the summarization between router A and router X, router X isn’t aware of the lost link, so router X keeps sending traffic for network 10.1.1.0/24 to both router A and router B ((un)equal-cost load balancing). All traffic sent to router A gets lost in the process.
A solution to this problem is connecting routers A and B with a physical link, or creating a GRE tunnel between the two if a physical link isn’t possible.
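For reference, the manual summary from the picture would be configured per interface, something like this (interface and AS number are examples):

```
interface Serial0/0
 ip summary-address eigrp 10 10.1.0.0 255.255.0.0
```

Note that the summary also installs a local discard route to Null0 on the summarizing router, which is exactly why traffic for the lost more-specific prefix is silently dropped instead of being forwarded on.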
As mentioned before, troubleshooting the routing protocol is done with the correct show, logging and debug commands. Important commands for troubleshooting EIGRP are:
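A non-exhaustive set of standard IOS commands for this:

```
show ip eigrp neighbors
show ip eigrp topology
show ip eigrp interfaces
show ip route eigrp
show ip protocols
debug eigrp packets
debug ip eigrp
```

Be careful with the debug commands on production routers; on a busy box the packet debugging alone can generate enough output to affect the device.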
Troubleshooting a routing protocol can only be done if you know what the protocol is actually doing. When troubleshooting, it is necessary to work methodically and start eliminating possible causes of the routing problem. Eliminating possibilities narrows the scope, which helps in finding the actual problem.
Recently a colleague of mine noticed something strange in the STP configuration of a couple of HP ProCurve switches. He had a network, configured by another party, with switches running spanning tree in MST and RSTP mode. He noticed a lot of topology changes, but couldn’t find out where they were coming from.
Yesterday I was in another native HP ProCurve environment with two 5412zl switches and multiple 3500yl switches. All the switches have MST configured and I noticed the same strange behavior. One switch had an uptime of 29 days, but more than 700,000 topology changes. The last change was 11 hours ago. I checked the logging and noticed that ports going up and down aren’t counted as topology changes. I have looked at different forums, but cannot find a reason for the topology changes.
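For anyone who wants to check their own switches: on ProCurve the topology change counter and the time since the last change are visible with commands along these lines (names from memory, they may differ slightly per software version):

```
show spanning-tree
show spanning-tree detail
show logging -r
```

The overview shows the topology change count and last-change timer, the detailed view shows per-port counters, and the reversed event log shows the newest entries first, which makes it easier to correlate a recent change with a port event.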
My colleague and I will try to investigate the problem further when we have some time left ;-). I hope we can come back to this issue, but maybe some of you have already noticed the same problem and know the cause of it…