
Problem with Vlan-mon and Intel X710/X722

Posted: 21 Jan 2020, 18:54
by andlui9
I recently had the opportunity to install some Accel-PPP servers on good hardware with good network cards. However, I found that when I use the VLAN monitoring feature (vlan-mon), for both the PPPoE server and the IPoE server, Linux produces the call trace shown below. When I turn vlan-mon off, the problem does not happen. I tried updating the network card driver (i40e) and also switching the CPU power management mode from powersave to performance, in both the BIOS and Linux; neither attempt helped. I am using Debian 9 with the 4.9 kernel.

I even added the following kernel options to control CPU frequency scaling, "intel_pstate=disable idle=poll intel_idle.max_cstate=1", also without success. I also set the scaling governor to "performance", without success either.
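For anyone repeating these checks, here is a minimal sketch for confirming which scaling governor is actually active. The function name `governors` is made up for illustration, and the sysfs path is the usual cpufreq location, which may be absent on systems where no cpufreq driver is loaded.

```shell
# Print the distinct scaling governors currently in use across all CPUs.
# The optional first argument overrides the sysfs base directory
# (hypothetical helper, not part of accel-ppp or ethtool).
governors() {
    base="${1:-/sys/devices/system/cpu}"
    # Entries without a scaling_governor file are silently skipped.
    cat "$base"/cpu*/cpufreq/scaling_governor 2>/dev/null | sort -u
}

# Usage on a live system:
#   governors            # a single line "performance" means every CPU uses it
#   cat /proc/cmdline    # shows whether the boot options were applied
```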

Below is the call trace:
[16611.374728] INFO: rcu_sched self-detected stall on CPU
[16611.374772] 21-...: (5250 ticks this GP) idle=89f/140000000000001/0 softirq=9653/9653 fqs=1733
[16611.374808] (t=5251 jiffies g=717042 c=717041 q=7008)
[16611.374840] Task dump for CPU 21:
[16611.374841] accel-pppd R running task 0 17318 1 0x00000008
[16611.374844] ffffffff9d719a00 ffffffff9caa953b 0000000000000015 ffffffff9d719a00
[16611.374846] ffffffff9cb830ad ffff8987ee5596c0 ffffffff9d64fd80 0000000000000000
[16611.374849] ffffffff9d719a00 00000000ffffffff ffffffff9cae51ca 0000000000000001
[16611.374851] Call Trace:
[16611.374853] <IRQ>
[16611.374859] [<ffffffff9caa953b>] ? sched_show_task+0xcb/0x130
[16611.374863] [<ffffffff9cb830ad>] ? rcu_dump_cpu_stacks+0x92/0xb2
[16611.374866] [<ffffffff9cae51ca>] ? rcu_check_callbacks+0x75a/0x8b0
[16611.374870] [<ffffffff9cafb770>] ? tick_sched_do_timer+0x30/0x30
[16611.374872] [<ffffffff9caebda8>] ? update_process_times+0x28/0x50
[16611.374874] [<ffffffff9cafb170>] ? tick_sched_handle.isra.12+0x20/0x50
[16611.374876] [<ffffffff9cafb7a8>] ? tick_sched_timer+0x38/0x70
[16611.374878] [<ffffffff9caec87e>] ? __hrtimer_run_queues+0xde/0x250
[16611.374880] [<ffffffff9caecf5c>] ? hrtimer_interrupt+0x9c/0x1a0
[16611.374883] [<ffffffff9d021b27>] ? smp_apic_timer_interrupt+0x47/0x60
[16611.374887] [<ffffffff9d02025e>] ? apic_timer_interrupt+0x9e/0xb0
[16611.374887] <EOI>
[16611.374896] [<ffffffffc033c21c>] ? i40e_find_filter+0x2c/0x70 [i40e]
[16611.374900] [<ffffffffc0341b54>] ? i40e_add_filter+0x54/0x140 [i40e]
[16611.374904] [<ffffffffc0343722>] ? i40e_vsi_add_vlan+0xe2/0x2f0 [i40e]
[16611.374908] [<ffffffffc0343963>] ? i40e_vlan_rx_add_vid+0x33/0x50 [i40e]
[16611.374912] [<ffffffffc0404afc>] ? vlan_mon_nl_cmd_add_vlan_mon+0x17c/0x2c0 [vlan_mon]
[16611.374915] [<ffffffff9cf4b9f5>] ? genl_family_rcv_msg+0x1c5/0x360
[16611.374917] [<ffffffff9cefce3e>] ? __kmalloc_reserve.isra.35+0x2e/0x80
[16611.374921] [<ffffffff9cbe9ae6>] ? kmem_cache_alloc_node_trace+0x156/0x5a0
[16611.374923] [<ffffffff9cf4bb90>] ? genl_family_rcv_msg+0x360/0x360
[16611.374925] [<ffffffff9cf4bc12>] ? genl_rcv_msg+0x82/0xc0
[16611.374927] [<ffffffff9cf4b194>] ? netlink_rcv_skb+0xa4/0xc0
[16611.374929] [<ffffffff9cf4b814>] ? genl_rcv+0x24/0x40
[16611.374931] [<ffffffff9cf4ab6a>] ? netlink_unicast+0x18a/0x230
[16611.374933] [<ffffffff9cf4af67>] ? netlink_sendmsg+0x357/0x3b0
[16611.374936] [<ffffffff9cef5996>] ? sock_sendmsg+0x36/0x40
[16611.374938] [<ffffffff9cef6428>] ? ___sys_sendmsg+0x2c8/0x2e0
[16611.374940] [<ffffffff9cf49593>] ? netlink_insert+0x1a3/0x320
[16611.374942] [<ffffffff9cf4979e>] ? netlink_autobind.isra.30+0x8e/0xd0
[16611.374944] [<ffffffff9cc0953a>] ? __check_object_size+0xfa/0x1d8
[16611.374946] [<ffffffff9cef4935>] ? move_addr_to_user+0xb5/0xd0
[16611.374948] [<ffffffff9cef6d31>] ? __sys_sendmsg+0x51/0x90
[16611.374951] [<ffffffff9ca03b7d>] ? do_syscall_64+0x8d/0x100
[16611.374953] [<ffffffff9d01e3ce>] ? entry_SYSCALL_64_after_swapgs+0x58/0xc6

Re: Problem with Vlan-mon and Intel X710/X722

Posted: 21 Jan 2020, 23:26
by lbw
Try turning off with 'ethtool' any VLAN tag related acceleration and see if that solves your issue.

Re: Problem with Vlan-mon and Intel X710/X722

Posted: 31 Jan 2020, 17:54
by andlui9
I did it, but without success.

I did it like this:
ethtool -K enp134s0f2 rxvlan off txvlan off
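
To confirm that the command above took effect, the feature state can be read back with `ethtool -k` (lowercase). A minimal sketch, assuming the feature names match those that `ethtool -k` prints on this system; the helper name `vlan_features` is made up for illustration.

```shell
# Reduce `ethtool -k` output (read from stdin) to the VLAN-related features.
# Hypothetical helper; it only filters text and does not change any setting.
vlan_features() {
    grep -E '^[[:space:]]*(rx|tx)-vlan-'
}

# Usage on a live system (interface name taken from the post above):
#   ethtool -k enp134s0f2 | vlan_features
```

Any feature that still shows "on" after the change, or that is marked [fixed], was not affected by the command.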

Re: Problem with Vlan-mon and Intel X710/X722

Posted: 11 Feb 2020, 05:38
by lbw
I came across this today:

https://serverfault.com/questions/73205 ... untu-14-04

Disable LRO if enabling ip forwarding or bridging

WARNING: The ixgbe driver supports the Large Receive Offload (LRO) feature. This option offers the lowest CPU utilization for receives but is completely incompatible with routing/ip forwarding and bridging. If enabling ip forwarding or bridging is a requirement, it is necessary to disable LRO using compile time options as noted in the LRO section later in this document. The result of not disabling LRO when combined with ip forwarding or bridging can be low throughput or even a kernel panic.

Perhaps it's related?

Re: Problem with Vlan-mon and Intel X710/X722

Posted: 19 Feb 2020, 14:47
by andlui9
In my case it does not even allow the setting to be changed; it was already disabled.

Re: Problem with Vlan-mon and Intel X710/X722

Posted: 23 Feb 2020, 03:01
by lbw
It might be worth switching to an X540 or X520 to see if that solves your problem. They are relatively cheap.

Re: Problem with Vlan-mon and Intel X710/X722

Posted: 23 Feb 2020, 11:58
by dimka88
andlui9, please provide the output of `ethtool -k enp134s0f2`.

Re: Problem with Vlan-mon and Intel X710/X722

Posted: 05 Jan 2021, 09:45
by slima
andlui9, did you solve the problem?

Re: Problem with Vlan-mon and Intel X710/X722

Posted: 25 Aug 2021, 19:41
by Phyllo
I just ran into the same issue with the X710-DA2 NIC on Debian 10, on an HP DL 360 G9 with the latest i40e driver.
As soon as I ran accel-cmd show sessions, the CPU would stall and the server would require a hard reboot, as it would never shut down.

I tried one of the suggestions with slightly altered syntax, and it appeared to work, as it had not locked up my CPU yet.
(Update: It did eventually lock up and caused major network issues. Avoid the X710-DA2 NIC!)

Code:

ethtool -K ens2f1 rxvlan off tx-vlan-offload off
ethtool -K ens2f1 rxvlan off rx-vlan-offload off
Out of curiosity, does anyone know of any other NIC with VLAN offloading that won't crash accel-ppp?

The default feature set for the NIC can be seen below:
Note: Anything tagged [fixed] is unchangeable.

Code:

Features for ens2f1:
rx-checksumming: on
tx-checksumming: on
        tx-checksum-ipv4: on
        tx-checksum-ip-generic: off [fixed]
        tx-checksum-ipv6: on
        tx-checksum-fcoe-crc: off [fixed]
        tx-checksum-sctp: on
scatter-gather: on
        tx-scatter-gather: on
        tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
        tx-tcp-segmentation: on
        tx-tcp-ecn-segmentation: on
        tx-tcp-mangleid-segmentation: off
        tx-tcp6-segmentation: on
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: on
receive-hashing: on
highdma: on
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: on
tx-gre-csum-segmentation: on
tx-ipxip4-segmentation: on
tx-ipxip6-segmentation: on
tx-udp_tnl-segmentation: on
tx-udp_tnl-csum-segmentation: on
tx-gso-partial: on
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: on
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: on
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]
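
Since anything tagged [fixed] cannot be changed with `ethtool -K`, it can help to list just those features. In the listing above, rx-vlan-filter is "on [fixed]", so the offload commands do not touch it. A minimal sketch, with the helper name `fixed_features` made up for illustration:

```shell
# Print the names of all features marked [fixed] in `ethtool -k` output
# (read from stdin). Hypothetical helper; it only parses text.
fixed_features() {
    awk -F': ' '/\[fixed\]/ { sub(/^[ \t]+/, "", $1); print $1 }'
}

# Usage on a live system (interface name taken from the post above):
#   ethtool -k ens2f1 | fixed_features | grep vlan
```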

Re: Problem with Vlan-mon and Intel X710/X722

Posted: 03 Sep 2021, 18:07
by dimka88
Hello Phyllo, you can use the X520-DA2; everything works with that NIC.