Problem with Vlan-mon and Intel X710/X722

Bug reports
andlui9
Posts: 37
Joined: 20 Jan 2017, 11:46

Problem with Vlan-mon and Intel X710/X722

Post by andlui9 »

I recently had the opportunity to install Accel-PPP on several well-specified servers with good network cards. However, I found that whenever I use the VLAN monitoring feature (vlan-mon), with either the PPPoE server or the IPoE server, Linux produces the call trace shown below. When vlan-mon is turned off, the problem does not occur. I tried updating the network card driver (i40e) and switching CPU power management from powersave to performance mode, both in the BIOS and in Linux; neither attempt helped. I am using Debian 9 with a 4.9 kernel.

I even added the following kernel options to control CPU frequency and idle management, "intel_pstate=disable idle=poll intel_idle.max_cstate=1", also without success. I also set the scaling governor to "performance", again without success.
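
To confirm that these settings are actually in effect, the governor can be read back from sysfs and the boot options from /proc/cmdline. The following is only a minimal sketch: `governor_summary` is a hypothetical helper name, and it assumes the cpufreq sysfs files are exposed under /sys/devices/system/cpu, which depends on the cpufreq driver in use.

```shell
# Hypothetical helper: counts how many CPUs use each scaling governor.
# $1 is the sysfs CPU directory (assumed default: /sys/devices/system/cpu).
governor_summary() {
    cpu_dir=${1:-/sys/devices/system/cpu}
    for f in "$cpu_dir"/cpu[0-9]*/cpufreq/scaling_governor; do
        # Skip CPUs that do not expose a readable governor file.
        [ -r "$f" ] && cat "$f"
    done | sort | uniq -c | awk '{ print $2 ": " $1 " CPU(s)" }'
}

# Usage:
#   governor_summary      # one line per governor in use
#   cat /proc/cmdline     # shows the kernel options the system booted with
```

If more than one governor is listed, the "performance" setting did not reach every CPU.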

The call trace data is below:
[16611.374728] INFO: rcu_sched self-detected stall on CPU
[16611.374772] 21-...: (5250 ticks this GP) idle=89f/140000000000001/0 softirq=9653/9653 fqs=1733
[16611.374808] (t=5251 jiffies g=717042 c=717041 q=7008)
[16611.374840] Task dump for CPU 21:
[16611.374841] accel-pppd R running task 0 17318 1 0x00000008
[16611.374844] ffffffff9d719a00 ffffffff9caa953b 0000000000000015 ffffffff9d719a00
[16611.374846] ffffffff9cb830ad ffff8987ee5596c0 ffffffff9d64fd80 0000000000000000
[16611.374849] ffffffff9d719a00 00000000ffffffff ffffffff9cae51ca 0000000000000001
[16611.374851] Call Trace:
[16611.374853] <IRQ>
[16611.374859] [<ffffffff9caa953b>] ? sched_show_task+0xcb/0x130
[16611.374863] [<ffffffff9cb830ad>] ? rcu_dump_cpu_stacks+0x92/0xb2
[16611.374866] [<ffffffff9cae51ca>] ? rcu_check_callbacks+0x75a/0x8b0
[16611.374870] [<ffffffff9cafb770>] ? tick_sched_do_timer+0x30/0x30
[16611.374872] [<ffffffff9caebda8>] ? update_process_times+0x28/0x50
[16611.374874] [<ffffffff9cafb170>] ? tick_sched_handle.isra.12+0x20/0x50
[16611.374876] [<ffffffff9cafb7a8>] ? tick_sched_timer+0x38/0x70
[16611.374878] [<ffffffff9caec87e>] ? __hrtimer_run_queues+0xde/0x250
[16611.374880] [<ffffffff9caecf5c>] ? hrtimer_interrupt+0x9c/0x1a0
[16611.374883] [<ffffffff9d021b27>] ? smp_apic_timer_interrupt+0x47/0x60
[16611.374887] [<ffffffff9d02025e>] ? apic_timer_interrupt+0x9e/0xb0
[16611.374887] <EOI>
[16611.374896] [<ffffffffc033c21c>] ? i40e_find_filter+0x2c/0x70 [i40e]
[16611.374900] [<ffffffffc0341b54>] ? i40e_add_filter+0x54/0x140 [i40e]
[16611.374904] [<ffffffffc0343722>] ? i40e_vsi_add_vlan+0xe2/0x2f0 [i40e]
[16611.374908] [<ffffffffc0343963>] ? i40e_vlan_rx_add_vid+0x33/0x50 [i40e]
[16611.374912] [<ffffffffc0404afc>] ? vlan_mon_nl_cmd_add_vlan_mon+0x17c/0x2c0 [vlan_mon]
[16611.374915] [<ffffffff9cf4b9f5>] ? genl_family_rcv_msg+0x1c5/0x360
[16611.374917] [<ffffffff9cefce3e>] ? __kmalloc_reserve.isra.35+0x2e/0x80
[16611.374921] [<ffffffff9cbe9ae6>] ? kmem_cache_alloc_node_trace+0x156/0x5a0
[16611.374923] [<ffffffff9cf4bb90>] ? genl_family_rcv_msg+0x360/0x360
[16611.374925] [<ffffffff9cf4bc12>] ? genl_rcv_msg+0x82/0xc0
[16611.374927] [<ffffffff9cf4b194>] ? netlink_rcv_skb+0xa4/0xc0
[16611.374929] [<ffffffff9cf4b814>] ? genl_rcv+0x24/0x40
[16611.374931] [<ffffffff9cf4ab6a>] ? netlink_unicast+0x18a/0x230
[16611.374933] [<ffffffff9cf4af67>] ? netlink_sendmsg+0x357/0x3b0
[16611.374936] [<ffffffff9cef5996>] ? sock_sendmsg+0x36/0x40
[16611.374938] [<ffffffff9cef6428>] ? ___sys_sendmsg+0x2c8/0x2e0
[16611.374940] [<ffffffff9cf49593>] ? netlink_insert+0x1a3/0x320
[16611.374942] [<ffffffff9cf4979e>] ? netlink_autobind.isra.30+0x8e/0xd0
[16611.374944] [<ffffffff9cc0953a>] ? __check_object_size+0xfa/0x1d8
[16611.374946] [<ffffffff9cef4935>] ? move_addr_to_user+0xb5/0xd0
[16611.374948] [<ffffffff9cef6d31>] ? __sys_sendmsg+0x51/0x90
[16611.374951] [<ffffffff9ca03b7d>] ? do_syscall_64+0x8d/0x100
[16611.374953] [<ffffffff9d01e3ce>] ? entry_SYSCALL_64_after_swapgs+0x58/0xc6
lbw
Posts: 27
Joined: 09 Mar 2019, 00:16

Re: Problem with Vlan-mon and Intel X710/X722

Post by lbw »

Try turning off with 'ethtool' any VLAN tag related acceleration and see if that solves your issue.
andlui9
Posts: 37
Joined: 20 Jan 2017, 11:46

Re: Problem with Vlan-mon and Intel X710/X722

Post by andlui9 »

I tried that, but without success.

This is the command I used:
ethtool -K enp134s0f2 rxvlan off txvlan off
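
Whether the change took effect can be checked by reading the feature state back with `ethtool -k` (lowercase). The following is a minimal sketch, not a tested fix: it assumes the usual `name: on|off [fixed]` output format, and `vlan_offload_state` is a hypothetical helper name. It also reports rx-vlan-filter, because the call trace above passes through i40e_vlan_rx_add_vid, which appears to be the driver's VLAN filter path rather than the tag offload path.

```shell
# Hypothetical helper: reads `ethtool -k <iface>` output on stdin and prints
# the state of the VLAN offload and filter features, flagging [fixed] ones.
vlan_offload_state() {
    awk -F': ' '
        /^[ \t]*(rx|tx)-vlan-(offload|filter):/ {
            name = $1; sub(/^[ \t]+/, "", name)
            split($2, parts, " ")
            note = (parts[2] == "[fixed]") ? " (fixed, cannot be changed)" : ""
            print name "=" parts[1] note
        }
    '
}

# Usage (interface name taken from the command above):
#   ethtool -k enp134s0f2 | vlan_offload_state
```

A feature reported as fixed is not affected by `ethtool -K`, so turning off rxvlan and txvlan would leave it unchanged.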
lbw
Posts: 27
Joined: 09 Mar 2019, 00:16

Re: Problem with Vlan-mon and Intel X710/X722

Post by lbw »

I came across this today:

https://serverfault.com/questions/73205 ... untu-14-04

Disable LRO if enabling ip forwarding or bridging

WARNING: The ixgbe driver supports the Large Receive Offload (LRO) feature. This option offers the lowest CPU utilization for receives but is completely incompatible with routing/ip forwarding and bridging. If enabling ip forwarding or bridging is a requirement, it is necessary to disable LRO using compile time options as noted in the LRO section later in this document. The result of not disabling LRO when combined with ip forwarding or bridging can be low throughput or even a kernel panic.

Perhaps it's related?
andlui9
Posts: 37
Joined: 20 Jan 2017, 11:46

Re: Problem with Vlan-mon and Intel X710/X722

Post by andlui9 »

In my case it cannot even be changed; it was already disabled.
lbw
Posts: 27
Joined: 09 Mar 2019, 00:16

Re: Problem with Vlan-mon and Intel X710/X722

Post by lbw »

It might be worth switching to an X540 or X520 to see if that solves your problem. They are relatively cheap.
dimka88
Posts: 866
Joined: 13 Oct 2014, 05:51

Re: Problem with Vlan-mon and Intel X710/X722

Post by dimka88 »

andlui9, please provide the output of `ethtool -k enp134s0f2`.
slima
Posts: 6
Joined: 16 Nov 2020, 11:42

Re: Problem with Vlan-mon and Intel X710/X722

Post by slima »

andlui9, did you solve the problem?
Phyllo
Posts: 11
Joined: 25 Aug 2021, 19:28

Re: Problem with Vlan-mon and Intel X710/X722

Post by Phyllo »

I just ran into the same issue with an X710-DA2 NIC on Debian 10, on an HP DL 360 G9, using the latest i40e driver.
As soon as I ran accel-cmd show sessions, the CPU would stall and the server needed a hard reboot, as it would never shut down.

I tried one of the suggestions with slightly altered syntax, and it appeared to work, as it had not locked up my CPU yet.
(Update: It did eventually lock up and caused major network issues. Avoid the X710-DA2 NIC!)

Code:

ethtool -K ens2f1 rxvlan off tx-vlan-offload off
ethtool -K ens2f1 rxvlan off rx-vlan-offload off
Out of curiosity, does anyone know of another NIC with VLAN offloading that won't crash accel-ppp?

The default feature set for the NIC can be seen below:
Note: Anything tagged [fixed] is unchangeable.

Code:

Features for ens2f1:
rx-checksumming: on
tx-checksumming: on
        tx-checksum-ipv4: on
        tx-checksum-ip-generic: off [fixed]
        tx-checksum-ipv6: on
        tx-checksum-fcoe-crc: off [fixed]
        tx-checksum-sctp: on
scatter-gather: on
        tx-scatter-gather: on
        tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
        tx-tcp-segmentation: on
        tx-tcp-ecn-segmentation: on
        tx-tcp-mangleid-segmentation: off
        tx-tcp6-segmentation: on
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: on
receive-hashing: on
highdma: on
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: on
tx-gre-csum-segmentation: on
tx-ipxip4-segmentation: on
tx-ipxip6-segmentation: on
tx-udp_tnl-segmentation: on
tx-udp_tnl-csum-segmentation: on
tx-gso-partial: on
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: on
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: on
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]
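
Since only features without the [fixed] tag can be toggled, it can help to filter a listing like this one down to the features that are currently on and still changeable. This is a rough sketch that assumes the `name: state [fixed]` layout shown above; `changeable_on_features` is a hypothetical helper name.

```shell
# Hypothetical helper: reads `ethtool -k <iface>` output on stdin and prints
# the names of features that are on and not tagged [fixed], i.e. the ones
# `ethtool -K` could still switch off.
changeable_on_features() {
    awk -F': ' '
        $2 == "on" {
            name = $1; sub(/^[ \t]+/, "", name)
            print name
        }
    '
}

# Usage (interface name taken from the listing above):
#   ethtool -k ens2f1 | changeable_on_features
```

For the listing above, rx-vlan-offload and tx-vlan-offload would be printed, while rx-vlan-filter would not, because it is tagged [fixed].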
Last edited by Phyllo on 13 Sep 2021, 19:02, edited 4 times in total.
dimka88
Posts: 866
Joined: 13 Oct 2014, 05:51

Re: Problem with Vlan-mon and Intel X710/X722

Post by dimka88 »

Hello Phyllo, you can use the X520-DA2; everything works with this NIC.