While looking through the logs on a switch (Cisco Catalyst 3550), I noticed the following messages appearing multiple times in the logging buffer.
-Process= "Pool Manager", ipl= 0, pid= 5
-Traceback= 1A57D0 1A6DF4 161B3C 1B2BF0 1B2E38 1C6440
Jan 26 14:45:48.970 CET: %SYS-2-MALLOCFAIL: Memory allocation of 1680 bytes failed from 0x161B38, alignment 0
Pool: I/O Free: 7412 Cause: Memory fragmentation
Alternate Pool: None Free: 0 Cause: No Alternate pool
That doesn’t look good, but the customer hadn’t received any complaints about network problems or performance issues. I did some research on the switch’s memory, but couldn’t find any strange behavior. Memory allocation and the buffers both looked normal. The command show memory failures alloc listed some memory allocation failures, but the error message had already told me that. I found an article on the Cisco website concerning this error message, but that didn’t help much either.
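One detail in the message is easy to miss: the I/O pool still reports more free bytes than the request needed, so the allocation failed because no single contiguous block was large enough, which is what "Memory fragmentation" means. The following Python sketch pulls the relevant fields out of the message to make that comparison explicit. The regular expressions and the function name parse_mallocfail are my own and only assume the message layout shown above.

```python
import re

# The three message lines exactly as they appear in the switch log above.
LOG = """\
Jan 26 14:45:48.970 CET: %SYS-2-MALLOCFAIL: Memory allocation of 1680 bytes failed from 0x161B38, alignment 0
Pool: I/O Free: 7412 Cause: Memory fragmentation
Alternate Pool: None Free: 0 Cause: No Alternate pool
"""

# Assumed patterns, derived only from the layout of the lines above.
HEADER_RE = re.compile(
    r"%SYS-2-MALLOCFAIL: Memory allocation of (\d+) bytes failed from (0x[0-9A-Fa-f]+)"
)
# Anchored at line start so the "Alternate Pool:" line is not matched.
POOL_RE = re.compile(r"^Pool: (.+?) Free: (\d+) Cause: (.+)$", re.MULTILINE)


def parse_mallocfail(text):
    """Return the key fields of one MALLOCFAIL message, or None if absent."""
    header = HEADER_RE.search(text)
    pool = POOL_RE.search(text)
    if not header or not pool:
        return None
    return {
        "requested": int(header.group(1)),  # bytes the caller asked for
        "caller": header.group(2),          # address the request came from
        "pool": pool.group(1),              # memory pool that was used
        "free": int(pool.group(2)),         # total free bytes in that pool
        "cause": pool.group(3).strip(),     # reason reported by IOS
    }


info = parse_mallocfail(LOG)
if info and info["free"] > info["requested"]:
    # Enough memory in total, but not in one contiguous piece.
    print(f"{info['pool']} pool: {info['free']} bytes free, "
          f"{info['requested']} requested, cause: {info['cause']}")
```

Run against a saved copy of the full log, the same function could be applied per message to see whether the failures always hit the I/O pool.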
The switch is running IOS 12.1(13)EA1a, which is marked as deferred. The latest deferral notice I can find on the Cisco website is about IOS 12.1(19)EA1. The notice lists bugs involving memory leaks. The next step I took was checking the Bug Toolkit for the running IOS.
I searched for all bugs in the running IOS, and the Bug Toolkit reported 391 bugs. Narrowing the search with the string “%SYS-2-MALLOCFAIL” left three bugs. One of them concerns a possible problem with spanning-tree and the creation of a loop in the network. Looking at the logs of another switch, I noticed multiple MAC flap messages and BPDUGuard messages at the same time as the memory message. This indicates a possible loop in the network.
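Matching timestamps between two switches by eye gets tedious when the message repeats often. A minimal Python sketch of that correlation is shown below. It assumes both switches have synchronized clocks and log in the timestamp format shown earlier. The lines in OTHER_SWITCH_LOG are made up for illustration (they are not taken from the customer's switch), and the function names, the five-second window and the year are my own assumptions.

```python
import re
from datetime import datetime, timedelta

# Matches "Jan 26 14:45:48.970 CET: %FACILITY-SEVERITY-MNEMONIC:" style lines.
LINE_RE = re.compile(
    r"^(\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}\.\d{3}) \w+: %([A-Z0-9_]+-\d-[A-Z0-9_]+):"
)


def parse_events(lines, year):
    """Return (timestamp, message name) pairs for lines that look like IOS log entries.

    The log carries no year, so the caller has to supply one (an assumption).
    """
    events = []
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            stamp = " ".join(match.group(1).split())  # collapse padding spaces
            when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S.%f")
            events.append((when, match.group(2)))
    return events


def correlate(failures, others, window=timedelta(seconds=5)):
    """For each failure, list the other switch's messages logged within the window."""
    hits = []
    for when, _name in failures:
        nearby = [name for t, name in others if abs(t - when) <= window]
        hits.append((when, nearby))
    return hits


# The MALLOCFAIL line from the switch in this article.
THIS_SWITCH_LOG = [
    "Jan 26 14:45:48.970 CET: %SYS-2-MALLOCFAIL: Memory allocation of 1680 bytes failed from 0x161B38, alignment 0",
]

# Hypothetical lines standing in for the other switch's log.
OTHER_SWITCH_LOG = [
    "Jan 26 09:00:00.000 CET: %LINK-3-UPDOWN: Interface FastEthernet0/4, changed state to up",
    "Jan 26 14:45:47.512 CET: %SW_MATM-4-MACFLAP_NOTIF: Host 0000.0000.0001 in vlan 1 is flapping between port Fa0/1 and port Fa0/2",
    "Jan 26 14:45:49.101 CET: %SPANTREE-2-BLOCK_BPDUGUARD: Received BPDU on port Fa0/3 with BPDU Guard enabled. Disabling port.",
]

YEAR = 2009  # assumed; the log itself does not say

failures = [e for e in parse_events(THIS_SWITCH_LOG, YEAR) if e[1] == "SYS-2-MALLOCFAIL"]
others = parse_events(OTHER_SWITCH_LOG, YEAR)

for when, nearby in correlate(failures, others):
    print(when.time(), nearby)
```

A wider window is worth trying if the switches do not use NTP, since clock drift between them would hide real matches.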
The bug concerns the following behavior:
Spanning-tree BPDUs (802.1d and 802.1w/802.1s) are sent to the incorrect destination MAC address. Consequently, other switches in the network will not process the BPDUs. If the network has been designed with a physical loop, spanning-tree will not correctly block the loop, causing traffic levels to increase and users to not be able to send data. In most cases, switch management will only be possible via the console port due to looping packets. The log might also contain %SYS-2-MALLOCFAIL messages, which indicate that the switch is running out of I/O memory. Spanning-tree loops are just one cause, but not the only one, of this message. Additional testing will help to confirm that the log messages are generated due to a spanning-tree loop that occurs as a result of this specific issue.
The switch is running Per-VLAN Spanning Tree, which is comparable to the standard Spanning Tree Protocol (IEEE 802.1D) but runs a separate instance for each VLAN. This bug could be the cause of the failed memory allocations, so I recommended that the customer upgrade to the latest IOS. He will do so as soon as possible and will let me know if the problem recurs.