list_add corruption

TeCHNoiD
Posts: 10
Joined: 24 Nov 2016, 00:35

list_add corruption

Post by TeCHNoiD »

I posted in the wrong section; please move this to the bug reports.

The kernel keeps panicking and then rebooting.

It happens when part of the network loses power and then comes back up: the PPPoE sessions time out, billing tries to close them through accel-cmd, and the error occurs. About 300 sessions are affected at once, out of ~1500 in total on this server.

Code:

Jun  7 10:28:41 gw kernel: ------------[ cut here ]------------
Jun  7 10:28:41 gw kernel: WARNING: CPU: 11 PID: 2837 at lib/list_debug.c:33 __list_add+0x8e/0xc0
Jun  7 10:28:41 gw kernel: list_add corruption. prev->next should be next (ffff88079aa00828), but was           (null). (prev=ffff8807c0ef22b8).
Jun  7 10:28:41 gw kernel: Modules linked in: xt_conntrack xt_TCPMSS nf_conntrack_h323 nf_conntrack_sip nf_nat_pptp nf_nat_proto_gre nf_conntrack_pptp nf_conntrack_proto_gre snd_pcm ixgbe(O) snd_timer snd iTCO_wdt iTCO_vendor_support x86_pkg_temp_thermal soundcore coretemp crc32c_intel pcspkr ghash_clmulni_intel cryptd lpc_ich i2c_i801 mfd_core i2c_smbus wmi xts gf128mul cbc iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi tg3 libphy bnx2 fuse jfs reiserfs btrfs ext4 jbd2 ext2 mbcache linear raid10 raid1 raid0 dm_raid raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq dm_snapshot dm_bufio dm_crypt dm_mirror dm_region_hash dm_log dm_mod firewire_core crc_itu_t sl811_hcd xhci_pci xhci_hcd usb_storage aic94xx libsas lpfc qla2xxx megaraid_sas megaraid_mbox megaraid_mm aacraid sx8 hpsa
Jun  7 10:28:41 gw kernel:  cciss 3w_9xxx 3w_xxxx 3w_sas mptsas scsi_transport_sas mptfc scsi_transport_fc mptspi mptscsih mptbase imm parport sym53c8xx initio arcmsr aic7xxx aic79xx scsi_transport_spi sr_mod cdrom sg sd_mod pdc_adma sata_inic162x sata_mv ata_piix ahci libahci sata_qstor sata_vsc sata_uli sata_sis sata_sx4 sata_nv sata_via sata_svw sata_sil24 sata_sil sata_promise pata_via pata_jmicron pata_marvell pata_sis pata_netcell pata_pdc202xx_old pata_atiixp pata_amd pata_ali pata_it8213 pata_serverworks pata_oldpiix pata_artop pata_it821x pata_hpt3x2n pata_hpt3x3 pata_hpt37x pata_hpt366 pata_cmd64x pata_sil680 pata_pdc2027x
Jun  7 10:28:41 gw kernel: CPU: 11 PID: 2837 Comm: accel-pppd Tainted: G           O    4.9.16-gentoo #1
Jun  7 10:28:41 gw kernel: Hardware name: Supermicro SYS-6018R-WTR/X10DRW-i, BIOS 2.0a 07/26/2016
Jun  7 10:28:41 gw kernel:  ffffc90008c3fd68 ffffffff81480092 ffffc90008c3fdb8 0000000000000000
Jun  7 10:28:41 gw kernel:  ffffc90008c3fda8 ffffffff810b6751 0000002100000000 ffff88082169dcb8
Jun  7 10:28:41 gw kernel:  ffff88079aa00828 ffff8807c0ef22b8 ffff88082169dcc8 ffff88079aa007c0
Jun  7 10:28:41 gw kernel: Call Trace:
Jun  7 10:28:41 gw kernel:  [<ffffffff81480092>] dump_stack+0x67/0x95
Jun  7 10:28:41 gw kernel:  [<ffffffff810b6751>] __warn+0xd1/0xf0
Jun  7 10:28:41 gw kernel:  [<ffffffff810b67bf>] warn_slowpath_fmt+0x4f/0x60
Jun  7 10:28:41 gw kernel:  [<ffffffff8149e92e>] __list_add+0x8e/0xc0
Jun  7 10:28:41 gw kernel:  [<ffffffff8166d64a>] ppp_ioctl+0x46a/0xa60
Jun  7 10:28:41 gw kernel:  [<ffffffff81219420>] do_vfs_ioctl+0x90/0x5a0
Jun  7 10:28:41 gw kernel:  [<ffffffff81208a4e>] ? ____fput+0xe/0x10
Jun  7 10:28:41 gw kernel:  [<ffffffff812199a9>] SyS_ioctl+0x79/0x90
Jun  7 10:28:41 gw kernel:  [<ffffffff819248ae>] entry_SYSCALL_64_fastpath+0x1c/0xac
Jun  7 10:28:41 gw kernel: ---[ end trace dc0d0e3e4f060251 ]---
Jun  7 10:29:08 gw kernel: ------------[ cut here ]------------
Jun  7 10:29:08 gw kernel: WARNING: CPU: 2 PID: 1911 at lib/list_debug.c:59 __list_del_entry+0xa4/0xd0
Jun  7 10:29:08 gw kernel: list_del corruption. prev->next should be ffff88082169dcb8, but was           (null)
Jun  7 10:29:08 gw kernel: Modules linked in: xt_conntrack xt_TCPMSS nf_conntrack_h323 nf_conntrack_sip nf_nat_pptp nf_nat_proto_gre nf_conntrack_pptp nf_conntrack_proto_gre snd_pcm ixgbe(O) snd_timer snd iTCO_wdt iTCO_vendor_support x86_pkg_temp_thermal soundcore coretemp crc32c_intel pcspkr ghash_clmulni_intel cryptd lpc_ich i2c_i801 mfd_core i2c_smbus wmi xts gf128mul cbc iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi tg3 libphy bnx2 fuse jfs reiserfs btrfs ext4 jbd2 ext2 mbcache linear raid10 raid1 raid0 dm_raid raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq dm_snapshot dm_bufio dm_crypt dm_mirror dm_region_hash dm_log dm_mod firewire_core crc_itu_t sl811_hcd xhci_pci xhci_hcd usb_storage aic94xx libsas lpfc qla2xxx megaraid_sas megaraid_mbox megaraid_mm aacraid sx8 hpsa
Jun  7 10:29:08 gw kernel:  cciss 3w_9xxx 3w_xxxx 3w_sas mptsas scsi_transport_sas mptfc scsi_transport_fc mptspi mptscsih mptbase imm parport sym53c8xx initio arcmsr aic7xxx aic79xx scsi_transport_spi sr_mod cdrom sg sd_mod pdc_adma sata_inic162x sata_mv ata_piix ahci libahci sata_qstor sata_vsc sata_uli sata_sis sata_sx4 sata_nv sata_via sata_svw sata_sil24 sata_sil sata_promise pata_via pata_jmicron pata_marvell pata_sis pata_netcell pata_pdc202xx_old pata_atiixp pata_amd pata_ali pata_it8213 pata_serverworks pata_oldpiix pata_artop pata_it821x pata_hpt3x2n pata_hpt3x3 pata_hpt37x pata_hpt366 pata_cmd64x pata_sil680 pata_pdc2027x
Jun  7 10:29:08 gw kernel: CPU: 2 PID: 1911 Comm: accel-pppd Tainted: G        W  O    4.9.16-gentoo #1
Jun  7 10:29:08 gw kernel: Hardware name: Supermicro SYS-6018R-WTR/X10DRW-i, BIOS 2.0a 07/26/2016
Jun  7 10:29:08 gw kernel:  ffffc900030c3cc0 ffffffff81480092 ffffc900030c3d10 0000000000000000
Jun  7 10:29:08 gw kernel:  ffffc900030c3d00 ffffffff810b6751 0000003b030c3d80 ffff88082169dcb8
Jun  7 10:29:08 gw kernel:  ffff88082169dc00 ffff88079aa00840 ffff88079aa0083c ffff88085bdf6660
Jun  7 10:29:08 gw kernel: Call Trace:
Jun  7 10:29:08 gw kernel:  [<ffffffff81480092>] dump_stack+0x67/0x95
Jun  7 10:29:08 gw kernel:  [<ffffffff810b6751>] __warn+0xd1/0xf0
Jun  7 10:29:08 gw kernel:  [<ffffffff810b67bf>] warn_slowpath_fmt+0x4f/0x60
Jun  7 10:29:08 gw kernel:  [<ffffffff81416af1>] ? avc_has_perm+0x31/0x120
Jun  7 10:29:08 gw kernel:  [<ffffffff8149ea04>] __list_del_entry+0xa4/0xd0
Jun  7 10:29:08 gw kernel:  [<ffffffff8149ea3d>] list_del+0xd/0x30
Jun  7 10:29:08 gw kernel:  [<ffffffff8166bc0b>] ppp_disconnect_channel+0x6b/0xd0
Jun  7 10:29:08 gw kernel:  [<ffffffff8166bcce>] ppp_unregister_channel+0x5e/0xe0
Jun  7 10:29:08 gw kernel:  [<ffffffff81672362>] pppox_unbind_sock+0x22/0x30
Jun  7 10:29:08 gw kernel:  [<ffffffff81673a15>] pppoe_release+0x65/0x170
Jun  7 10:29:08 gw kernel:  [<ffffffff81724b7f>] sock_release+0x1f/0x80
Jun  7 10:29:08 gw kernel:  [<ffffffff81724bf2>] sock_close+0x12/0x20
Jun  7 10:29:08 gw kernel:  [<ffffffff8120888a>] __fput+0xaa/0x230
Jun  7 10:29:08 gw kernel:  [<ffffffff81208a4e>] ____fput+0xe/0x10
Jun  7 10:29:08 gw kernel:  [<ffffffff810d244e>] task_work_run+0x7e/0xa0
Jun  7 10:29:08 gw kernel:  [<ffffffff8100258f>] exit_to_usermode_loop+0x8f/0xa0
Jun  7 10:29:08 gw kernel:  [<ffffffff81002a48>] syscall_return_slowpath+0x58/0x60
Jun  7 10:29:08 gw kernel:  [<ffffffff8192493c>] entry_SYSCALL_64_fastpath+0xaa/0xac
Jun  7 10:29:08 gw kernel: ---[ end trace dc0d0e3e4f060252 ]---

Is this a software bug, or is something wrong with the memory on our server?
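
For anyone reading the first trace: the message "prev->next should be next ... but was (null)" matches the consistency check that the kernel's list debugging runs before linking a new node between two neighbours. The snippet below is only a minimal user-space sketch in C modeled on that check, not the kernel source. The names `checked_list_add`, `checked_list_add_tail` and `demo_stale_tail` are hypothetical, and the "node zeroed while still linked" scenario is just one assumed way to end up with a NULL `prev->next`. It does not by itself tell a software bug apart from bad memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Minimal stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/* An empty circular list points at itself in both directions. */
static void init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/*
 * Link `new` between `prev` and `next`, but first verify that the two
 * neighbours still point at each other. Returns 0 on success and -1 if
 * the list looks corrupted. (Assumption of this sketch: it refuses the
 * insert, whereas the kernel check reports the problem with a WARNING.)
 */
static int checked_list_add(struct list_head *new,
                            struct list_head *prev,
                            struct list_head *next)
{
    if (next->prev != prev) {
        fprintf(stderr, "list_add corruption. next->prev should be prev (%p), "
                "but was %p. (next=%p).\n",
                (void *)prev, (void *)next->prev, (void *)next);
        return -1;
    }
    if (prev->next != next) {
        fprintf(stderr, "list_add corruption. prev->next should be next (%p), "
                "but was %p. (prev=%p).\n",
                (void *)next, (void *)prev->next, (void *)prev);
        return -1;
    }
    next->prev = new;
    new->next = next;
    new->prev = prev;
    prev->next = new;
    return 0;
}

/* Append `new` at the tail, i.e. between head->prev and head. */
static int checked_list_add_tail(struct list_head *new, struct list_head *head)
{
    return checked_list_add(new, head->prev, head);
}

/*
 * Hypothetical scenario: node `a` is wiped (for example freed and reused
 * as zeroed memory) while the list head still points at it. The next
 * append then finds prev->next == NULL, as in the first warning.
 */
static int demo_stale_tail(void)
{
    struct list_head head, a, b;
    int rc;

    init_list_head(&head);
    rc = checked_list_add_tail(&a, &head);   /* head <-> a */
    assert(rc == 0);

    a.next = NULL;                           /* simulate the stale node */
    a.prev = NULL;

    return checked_list_add_tail(&b, &head); /* prev == &a, a.next == NULL */
}
```

Calling `demo_stale_tail()` takes the second branch of the check, because `head.prev` still points at `a` while `a.next` is no longer `&head`. The point of the sketch is only that the check fires at the next insert after the damage, so the task that trips the warning is not necessarily the one that caused it.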
Dmitry
Administrator
Posts: 954
Joined: 09 Oct 2014, 10:06

Re: list_add corruption

Post by Dmitry »

Wrong site.
This belongs on LKML.
TeCHNoiD
Posts: 10
Joined: 24 Nov 2016, 00:35

Re: list_add corruption

Post by TeCHNoiD »

So the problem is in the kernel, do I understand that correctly?
TeCHNoiD
Posts: 10
Joined: 24 Nov 2016, 00:35

Re: list_add corruption

Post by TeCHNoiD »

I rolled back to the longterm 3.16 branch and the problem disappeared; everything is stable. Apparently accel has not been adapted to the 4.x kernel branch yet, or they broke something there, since subjectively the 4.x kernel feels somewhat slower overall.