HPE Aruba switches have the concept of user-based tunnelling. In short, the wired connections behave like a wireless connection. All traffic from the wired client is tunnelled to the central controller. This provides functions like central firewalling and micro-segmentation by blocking inter-user traffic.
Yesterday I had a customer complaining that multiple clients weren’t able to communicate. After investigation, the problem was narrowed down to one HPE 2930F stack. The stack had been working without problems, but now I found the following error message in the log.
I 01/17/20 10:16:01 05563 dca: ST1-CMDR: Failed to apply user role THIN_CLIENT_UBT-3007-3_7Z4q with tunnel redirect to 8021X client F44D306E2DB9 on port 3/21 as user tunnel is not operational.
The EventID description is: “This log event informs the user that Tunneled-node-server-redirect is enabled in the user role but per user tunnel feature is disabled.”
I checked the switch with “show tunneled-node-server”, and the feature was enabled. I deleted the “tunneled-node-server” configuration and reapplied it to the switch, but the error message remained.
To solve the problem: CHECK THE LICENSES ON THE MOBILITY MASTER
Jan 17 07:39:10 stm: <304109> <5640> <WARN> |stm| No available license type PEFNG for Tunneled Node 54:80:28:cf:4a:4b
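Each switch that uses user-based tunnelling consumes a license on the Mobility Master. The log line above shows that no PEFNG license was available for this tunneled node.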
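If you have a saved log from the Mobility Master, a quick scan for this message can save some time. The sketch below is only an illustration: the regular expression is modelled on the single log line quoted above, so other software releases may word the message differently, and the function name is made up for this example.

```python
import re

# Pattern modelled on the one log line quoted in this post (an assumption:
# other releases may format the message differently).
LICENSE_RE = re.compile(
    r"No available license type (?P<license>\S+) for Tunneled Node "
    r"(?P<mac>(?:[0-9a-f]{2}:){5}[0-9a-f]{2})",
    re.IGNORECASE,
)


def find_license_failures(lines):
    """Return (license_type, switch_mac) pairs for every matching log line."""
    hits = []
    for line in lines:
        match = LICENSE_RE.search(line)
        if match:
            hits.append((match.group("license"), match.group("mac").lower()))
    return hits


sample = [
    "Jan 17 07:39:10 stm: <304109> <5640> <WARN> |stm| No available license "
    "type PEFNG for Tunneled Node 54:80:28:cf:4a:4b",
]
print(find_license_failures(sample))
```

Every MAC address that shows up here belongs to a switch that could not get a license for its tunnel.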
AOS switches have the option to monitor / copy traffic from port A to port B. You also have the option to send the monitored traffic to a remote switch or even to a remote host. When the remote host is running Wireshark, the monitored traffic can be analysed on the remote host.
First you need to configure the switch to send a copy of the traffic to a remote host. Use the following commands to create a monitor session to a remote host. In this case the switch uses IP address 172.18.9.2 and UDP port 10999, and the remote host has IP address 172.18.11.233.
ASW-C01# conf t
ASW-C01(config)# monitor
 mac                   MAC address.
ASW-C01(config)# mirror
 endpoint              Remote mirroring destination configuration.
 <1-4>                 Mirror destination number.
ASW-C01(config)# mirror 1
 name                  Mirroring destination name string.
 port                  Mirroring destination monitoring port.
 remote                Remote mirroring destination configuration.
ASW-C01(config)# mirror 1 remote
 ip                    Remote mirroring destination configuration.
ASW-C01(config)# mirror 1 remote ip
 IP-ADDR               Enter an IP address.
ASW-C01(config)# mirror 1 remote ip 172.18.9.2
 <1-65535>             Remote mirroring UDP encapsulation port.
ASW-C01(config)# mirror 1 remote ip 172.18.9.2 10999
 IP-ADDR               Remote mirroring UDP encapsulation destination ip addr.
ASW-C01(config)# mirror 1 remote ip 172.18.9.2 10999 172.18.11.233
 truncation            Enable truncation for Remote mirroring.
 <cr>
ASW-C01(config)# mirror 1 remote ip 172.18.9.2 10999 172.18.11.233
The destination switch must be configured before proceeding.
Has the remote switch been configured (y/n)? y
Next you need to configure the interface for which you would like to analyse the traffic.
ASW-C01(config)# int 4/3
ASW-C01(eth-4/3)# monitor
 all                   Monitor all traffic.
 <cr>
ASW-C01(eth-4/3)# monitor all both
 mirror                Mirror destination.
ASW-C01(eth-4/3)# monitor all both mirror 1
 no-tag-added          Don’t add VLAN tag for this untagged-port
 <1-4>                 Mirror destination number.
 <cr>
ASW-C01(eth-4/3)# monitor all both mirror 1
Traffic from port 4/3 is now sent to the remote host. Now start Wireshark on the remote host and create a capture filter to capture only packets for port UDP/10999.
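The capture filter itself is just a short string in pcap filter syntax. As a minimal sketch, the hypothetical helper below builds that string from the UDP port and, optionally, the IP address of the mirroring switch. The helper name is made up for illustration; the port and address are the ones from the mirror command above.

```python
import ipaddress


def erm_capture_filter(udp_port, switch_ip=None):
    """Build a pcap capture filter for mirrored traffic on the given UDP port."""
    if not 1 <= udp_port <= 65535:
        raise ValueError("UDP port must be between 1 and 65535")
    parts = [f"udp port {udp_port}"]
    if switch_ip is not None:
        # ip_address() raises ValueError when the address is malformed.
        ipaddress.ip_address(switch_ip)
        parts.append(f"src host {switch_ip}")
    return " and ".join(parts)


print(erm_capture_filter(10999))
print(erm_capture_filter(10999, "172.18.9.2"))
```

Limiting the filter to the switch address is optional, but it helps when more than one switch mirrors traffic to the same host.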
At first Wireshark displays packets like the ones below, which are useless for traffic analysis, because the packets are encapsulated as HP ERM packets.
So the final step is to decode the traffic. Just right-click on a packet and choose the option “Decode As…”. You could also choose from the menu Analyze >> Decode As…
Change the column Current from (none) to HP_ERM from the drop down list and choose OK.
HP ERM, the Hewlett-Packard Encapsulated Remote Mirror protocol, is used by the HPE (Hewlett Packard Enterprise) switches based on ProVision ASICs, formerly of the ProCurve family and now branded under Aruba Networks, a Hewlett Packard Enterprise company. Unlike Cisco RSPAN, HP ERM encapsulates the frames to be mirrored inside UDP datagrams with a proprietary header, allowing it to be transported over any IP network (like Cisco ERSPAN).
Now the packets should be “readable” for traffic analysis.
I guess something that many HPE Aruba wireless engineers have to do these days is migrating the “old” AOS 6.x environment to the new AOS 8.x with Mobility Masters. I am not going to explain what the differences between both are and what a Mobility Master does, but I have a tip when you need to migrate remote access points (RAPs).
Yesterday evening I had to perform a migration from three 7030 controllers to an environment with redundant Mobility Masters. The solution contains 35 campus APs and 20 remote APs. The migration should be relatively simple...
I installed the Mobility Masters, added the licences, created the node hierarchy, and added all the configuration for WiFi, AAA, RAP ports, user-roles, VLANs and so on. To make life easier I exported the whitelist db on the old controller (local-userdb export) and imported it into the AOS 8.4 (or 8.5) environment (local-userdb import).
I was able to add one controller to the Mobility Masters by distributing the APs in the 6.5 environment over the remaining 2 controllers. So the old environment had 2 controllers left, one for the campus APs and one for the RAPs. I wanted to migrate the campus APs first. Can’t be that difficult... just provision the campus APs with the IP address of the AOS 8.4 controller and wait for the magic to happen. All campus APs came online, so I added the second controller to the Mobility Master.
Since I now had two controllers, I started with the lc-cluster configuration. I configured the lc-cluster group-profile and added the two controllers, including the vrrp-ip and rap-public-ip, as shown below.
Add the correct NAT and firewall configuration to the firewall environment
Add a LC-RAP-pool to the Mobility Master >> /mm level
BAM, ready to migrate the RAPs. I converted the first one by reprovisioning the AP on the old controller and configuring the new public IP. Here we go!! You only need to wait and wait and wait and wait... but the RAP didn’t come online!!! What can go wrong??
Created a whitelist entry? – yes, export / import and I see the entries via show whitelist-db rap
Created a LC-RAP-pool? – yes, show lc-rap-pool on /mm level
Correct LC-cluster config? – yes, rap-public-ip is configured correctly
Firewall config? – Config is correct. Firewall policy is matched and packet sniffer shows that packets are forwarded to controller
Controller hardware limit reached? No, there are currently 35 campus APs and the limit of a 7030 is 64 APs
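The hardware limit check is simple arithmetic, sketched below with the numbers from this post. It assumes that remote APs count against the same 64-AP limit as campus APs and that a single controller must be able to carry all APs on its own; the function name is made up for illustration.

```python
def ap_headroom(limit, *ap_counts):
    """Return how many more APs fit under the controller's AP limit."""
    return limit - sum(ap_counts)


# 35 campus APs and 20 remote APs against the 64-AP limit of a 7030.
print(ap_headroom(64, 35, 20))
```

A negative result would mean the controller cannot carry all APs by itself.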
Time for some deeper troubleshooting. The RAPs get stuck at the logon role in the user-table. Time for some crypto debugging. Let’s start with “logging security process crypto level debugging” to check whether a VPN tunnel is established.
Hhhhmmmm, there is a strange error message!!!
|ike| HMAC_SHA1_96 ESN_0 <– R Notify: INTERNAL_ADDRESS_FAILURE (ESP spi=ea87a200)#SEND 80
Seems like the RAP IP pool isn’t configured correctly. So I double-checked, but the pool is definitely configured correctly. Okay, another test. One RAP is onsite, so I factory defaulted the RAP and converted it from scratch. SAME RESULT AND SAME ERROR message. Hhhhmmm, I am AMFX certified, but can’t even get a RAP connected?!?!?! So it is time to ask another AMFX’er, my colleague Peter (AMFX#36). Luckily he was willing to power up his RAP and provision it to the customer’s environment.
Okay, let’s add the RAP to the whitelist, let Peter do the conversion magic from IAP to RAP and check what happens. BAM, RAP connects, updates software and is online……. Difference, other hardware. Our RAP is RAP303-H, the customer is using RAP155P. RAP155P is supported in AOS 8.4, so that can’t be the problem.
So what is another difference…. THE WHITELIST. Our RAP was added manually after the configuration was completely done and wasn’t imported at the beginning. So let’s check the whitelist again and there it is!!!! I didn’t really check all columns from the show whitelist-db rap, but one of the columns on the right is called Cluster-InnerIP. The output below shows the column, normally there are even more columns, so you have to scroll to view the Cluster-InnerIP.
You notice that the manually added entry gets an IP address assigned from the LC-RAP-pool and all imported RAPs don’t get an IP address. This definitely explains the INTERNAL_ADDRESS_FAILURE error message. The cert-type is changed to factory-cert after the RAP is converted and online.
So I removed one of the imported entries and added it again (manually) to the whitelist db and guess what. The Cluster-InnerIP is assigned to the RAP and the RAP connects directly and without any problem.
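With many whitelist entries it is easy to overlook the ones without a Cluster-InnerIP. The sketch below shows the idea on already-parsed entries. Everything in it is hypothetical: the dictionary keys, the sample MAC and IP addresses, and the assumption that an unassigned address shows up as an empty value or 0.0.0.0. It does not parse the real “show whitelist-db rap” output.

```python
def entries_missing_inner_ip(entries):
    """Return the AP MAC addresses of entries without a Cluster-InnerIP."""
    missing = []
    for entry in entries:
        inner_ip = (entry.get("cluster_inner_ip") or "").strip()
        # Assumption: an unassigned address is shown as empty or 0.0.0.0.
        if inner_ip in ("", "0.0.0.0"):
            missing.append(entry["ap_mac"])
    return missing


# Made-up sample data: one manually added entry and two imported entries.
whitelist = [
    {"ap_mac": "00:00:5e:00:53:01", "cluster_inner_ip": "192.0.2.10"},
    {"ap_mac": "00:00:5e:00:53:02", "cluster_inner_ip": ""},
    {"ap_mac": "00:00:5e:00:53:03", "cluster_inner_ip": "0.0.0.0"},
]
print(entries_missing_inner_ip(whitelist))
```

Each MAC address in the result is a candidate for removing and adding again by hand.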
So in my opinion there could be two reasons why this issue arises:
I imported the whitelist-db too early, because I hadn’t configured the LC-RAP-pool yet;
The import from 6.5 to 8.4 doesn’t work for RAP entries;
To be sure, I removed all entries from the whitelist db and imported the whitelist again. Again the same result, the RAPs don’t get a Cluster-InnerIP assigned. Conclusion:
TIP of the WEEK: when migrating from 6.5 to 8.4 MANUALLY add the whitelist-db rap entries to the 8.4 environment!!!
Or does someone have a better way to import the whitelist for remote APs?!?!?!
The HPE Aruba switches have this cool feature called downloadable user-roles (DUR). DUR enables the switch to use a central ClearPass server to download user-roles to the switch for authenticated users.
More and more customers want to implement wired authentication to strengthen the security level of their network. Via DUR the switches perform an HTTPS API request against ClearPass to download the user-role configuration. This makes the configuration of multiple switches easier, because you don’t need to configure the user-roles locally on the switches anymore, but you push them from a central server. The communication between switch and ClearPass is illustrated in the picture below.
I won’t describe the whole DUR configuration step-by-step, but below you can find the most important configuration for the switch.
For the HTTP GET to work the switch needs to trust the certificate chain from ClearPass. In ArubaOS 16.08 and later the certificate is automatically downloaded when specifying the option “clearpass” when configuring the RADIUS client. Another very important step for DUR to work is NTP time sync. The time on the switches needs to be in sync and here a “problem” arises.
After a switch power outage, the switch has to sync its time with an NTP server, and the time needs to be in sync before the first wired clients start authenticating. Even when I use the “iburst” option with the NTP server for aggressive polling, I see that the time isn’t always synced in time.
Below you see the output from “show log -r” when the client authenticates, but the switch hasn’t synced its time yet.
I 02/12/19 10:55:46 04908 ntp: ST1-CMDR: The system clock time was changed by 918813141 sec 661757827 nsec. The new time is Tue Feb 12 10:55:46 2019
I 01/01/90 01:03:11 04911 ntp: ST1-CMDR: The NTP Server 10.10.1.1 is unreachable.
I 01/01/90 01:02:55 00584 WebMacAuth: ST1-CMDR: Port 1/1, re-auth timeout 10 too short.
I 01/01/90 01:02:55 05747 DFP: ST1-CMDR: device_fingerPrinting: Hardware Rules updated successfully for port:1/1, protocol:80, client:08:00:0F:9D:45:BF
W 01/01/90 01:02:55 05204 dca: ST1-CMDR: Failed to apply user role VOIP___DUR-3005-1_7Z4q to macAuth client 08000F9D45BF on port 1/1: user role is invalid.
W 01/01/90 01:02:55 05620 dca: ST1-CMDR: macAuth client 08000F9D45BF on port 1/1 assigned to initial role as downloading failed for user role VOIP___DUR-3005-1.
I 01/01/90 01:02:51 00076 ports: ST1-CMDR: port 1/1 is now on-line
I 01/01/90 01:02:51 00435 ports: ST1-CMDR: port 1/1 is Blocked by STP
The port is placed in the initial-role, which by default is the role denyall. The “problem” with this default role is that it has no reauthentication period configured, so the connected clients will not automatically reauthenticate after an X-period of time.
User Role Information
   Name                              : denyall
   Type                              : predefined
   Reauthentication Period (seconds) : 0
   Cached Reauth Period (seconds)    : 0
   Logoff Period (seconds)           : 300
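To find out afterwards which clients were hit by this, you can look for user-role failures that carry the boot-default date in a saved log. The sketch below is a rough illustration under a few assumptions: the line layout is taken from the log excerpt earlier in this post, the date 01/01/90 is treated as “clock not synced yet”, and the function name is made up.

```python
import re

# Layout taken from the log excerpt in this post:
# severity, date, time, event ID, subsystem, message.
LOG_RE = re.compile(
    r"^(?P<severity>[A-Z]) (?P<date>\d{2}/\d{2}/\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}) (?P<event_id>\d{5}) "
    r"(?P<subsystem>[^:]+): (?P<message>.*)$"
)
UNSYNCED_DATE = "01/01/90"  # assumption: the date the switch shows before NTP sync


def role_failures_before_sync(lines):
    """Return event IDs of user-role failures logged with the boot-default date."""
    flagged = []
    for line in lines:
        match = LOG_RE.match(line.strip())
        if not match or match.group("date") != UNSYNCED_DATE:
            continue
        message = match.group("message")
        if ("Failed to apply user role" in message
                or "downloading failed for user role" in message):
            flagged.append(match.group("event_id"))
    return flagged


sample = [
    "I 02/12/19 10:55:46 04908 ntp: ST1-CMDR: The system clock time was changed by 918813141 sec 661757827 nsec.",
    "W 01/01/90 01:02:55 05204 dca: ST1-CMDR: Failed to apply user role VOIP___DUR-3005-1_7Z4q to macAuth client 08000F9D45BF on port 1/1: user role is invalid.",
    "W 01/01/90 01:02:55 05620 dca: ST1-CMDR: macAuth client 08000F9D45BF on port 1/1 assigned to initial role as downloading failed for user role VOIP___DUR-3005-1.",
    "I 01/01/90 01:02:51 00076 ports: ST1-CMDR: port 1/1 is now on-line",
]
print(role_failures_before_sync(sample))
```

Failures with a normal date are ignored on purpose, because they point to a different cause than the clock.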
To “fix” this issue I added a new local user-role to the switch and configured this user-role as initial-role. I added the reauthentication period to the user-role, so the clients reauthenticate when time isn’t synced yet and they receive this initial-role from the switch. The configuration of the role is displayed below.
class ipv4 "IP_ANY_ANY"
   10 match ip 0.0.0.0 255.255.255.255 0.0.0.0 255.255.255.255
exit
policy user "DENYALL"
   10 class ipv4 "IP_ANY_ANY" action deny
exit
aaa authorization user-role name "reauth-role"
   policy "DENYALL"
   reauth-period 30
   vlan-id 1
exit
To use this role as initial-role you need to execute the following command.
Next I tested the role by rebooting the switch. After rebooting I noticed that the switch port was placed in the “reauth-role”, because the error message “assigned to initial role as downloading failed for user role” appeared in the logs. In ClearPass I saw another authentication request from the client after X seconds. At that moment the time on the switch was in sync and the switch port was configured with the correct user-role.
=============================================
Edited: February 13th 2019
I created a topic on the AirHeads community on this matter and HPE Aruba responded with:
A software fix for the clock reset on cold boot/power loss issue on the 2930F and 2540 is in the works, and is expected to be released by the end of February.
I used MacOS X already in the past on an “old” MacBook and I have an iMac at home, but recently I am using a MacBook Pro for work. This blog is just a wrap up for “things” that I use often, but for some reason I always forget.