accel-ppp crash at d06572417e1e500d7bd56859335d2f7dd0f3fd8f

Bug reports
_longhorn_
Posts: 36
Joined: 03 Sep 2015, 14:37

accel-ppp crash at d06572417e1e500d7bd56859335d2f7dd0f3fd8f

Post by _longhorn_ »

Good day. Today I upgraded the kernel to 4.1.24, the ixgbe driver to 4.3.15, and while I was at it accel-ppp to the latest dev version. After a few hours of operation accel crashed with an out-of-memory condition. Before that, the box ran kernel 3.14.58, ixgbe 4.1.5, and the accel 1.10 release, and there were no problems at all. dmesg dumped the following:

[Tue May 24 12:53:44 2016] accel-pppd invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
[Tue May 24 12:53:44 2016] accel-pppd cpuset=/ mems_allowed=0
[Tue May 24 12:53:44 2016] CPU: 3 PID: 10092 Comm: accel-pppd Tainted: G O 4.1.24-nas.1 #1
[Tue May 24 12:53:44 2016] Hardware name: System manufacturer System Product Name/P8Z77-M, BIOS 2105 09/10/2013
[Tue May 24 12:53:44 2016] 0000000000000286 0000000000000000 ffffffff814d1dd1 0000000000000007
[Tue May 24 12:53:44 2016] 00000000000201da 0000000000000000 ffffffff814d1174 ffff8800d74c5800
[Tue May 24 12:53:44 2016] ffff8800c57dbb48 ffff8800d99e7e70 0000000000000000 ffff88011fdf6b00
[Tue May 24 12:53:44 2016] Call Trace:
[Tue May 24 12:53:44 2016] [<ffffffff814d1dd1>] ? dump_stack+0x47/0x5b
[Tue May 24 12:53:44 2016] [<ffffffff814d1174>] ? dump_header+0x95/0x20f
[Tue May 24 12:53:44 2016] [<ffffffffa03e5089>] ? i915_gem_shrinker_oom+0x1b9/0x210 [i915]
[Tue May 24 12:53:44 2016] [<ffffffff81136053>] ? oom_kill_process+0x1d3/0x3b0
[Tue May 24 12:53:44 2016] [<ffffffff81135aaf>] ? find_lock_task_mm+0x3f/0xa0
[Tue May 24 12:53:44 2016] [<ffffffff811365a5>] ? __out_of_memory+0x315/0x540
[Tue May 24 12:53:44 2016] [<ffffffff81136963>] ? out_of_memory+0x53/0x70
[Tue May 24 12:53:44 2016] [<ffffffff8113bdf4>] ? __alloc_pages_nodemask+0x924/0xa10
[Tue May 24 12:53:44 2016] [<ffffffff8127c6a9>] ? queue_unplugged+0x29/0xc0
[Tue May 24 12:53:44 2016] [<ffffffff8117b781>] ? alloc_pages_current+0x91/0x110
[Tue May 24 12:53:44 2016] [<ffffffff81134b7c>] ? filemap_fault+0x1ac/0x420
[Tue May 24 12:53:44 2016] [<ffffffffa010ea21>] ? ext4_filemap_fault+0x31/0x50 [ext4]
[Tue May 24 12:53:44 2016] [<ffffffff8115b24f>] ? __do_fault+0x3f/0xd0
[Tue May 24 12:53:44 2016] [<ffffffff8115eb52>] ? handle_mm_fault+0xda2/0x14d0
[Tue May 24 12:53:44 2016] [<ffffffff811e2831>] ? ep_poll+0x1f1/0x3e0
[Tue May 24 12:53:44 2016] [<ffffffff8104c608>] ? __do_page_fault+0x1a8/0x470
[Tue May 24 12:53:44 2016] [<ffffffff811e3bc8>] ? SyS_epoll_wait+0x88/0xe0
[Tue May 24 12:53:44 2016] [<ffffffff814d8cb2>] ? page_fault+0x22/0x30
[Tue May 24 12:53:44 2016] Mem-Info:
[Tue May 24 12:53:44 2016] active_anon:707452 inactive_anon:180081 isolated_anon:1
active_file:272 inactive_file:14 isolated_file:0
unevictable:0 dirty:0 writeback:393 unstable:0
slab_reclaimable:8830 slab_unreclaimable:9139
mapped:211 shmem:214 pagetables:4334 bounce:0
free:6245 free_pcp:0 free_cma:0
[Tue May 24 12:53:44 2016] Node 0 DMA free:15428kB min:28kB low:32kB high:40kB active_anon:164kB inactive_anon:208kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15984kB managed:15892kB mlocked:0kB dirty:0kB writeback:0kB mapped:8kB shmem:0kB slab_reclaimable:12kB slab_unreclaimable:68kB kernel_stack:0kB pagetables:4kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[Tue May 24 12:53:44 2016] lowmem_reserve[]: 0 3406 3851 3851
[Tue May 24 12:53:44 2016] Node 0 DMA32 free:8656kB min:6920kB low:8648kB high:10380kB active_anon:2636948kB inactive_anon:527096kB active_file:840kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3563748kB managed:3489828kB mlocked:0kB dirty:0kB writeback:0kB mapped:616kB shmem:760kB slab_reclaimable:31080kB slab_unreclaimable:27088kB kernel_stack:1184kB pagetables:14920kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:5156 all_unreclaimable? yes
[Tue May 24 12:53:44 2016] lowmem_reserve[]: 0 0 444 444
[Tue May 24 12:53:44 2016] Node 0 Normal free:896kB min:900kB low:1124kB high:1348kB active_anon:192696kB inactive_anon:193020kB active_file:248kB inactive_file:192kB unevictable:0kB isolated(anon):4kB isolated(file):0kB present:522240kB managed:454900kB mlocked:0kB dirty:0kB writeback:1572kB mapped:220kB shmem:96kB slab_reclaimable:4228kB slab_unreclaimable:9400kB kernel_stack:1072kB pagetables:2412kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:2688 all_unreclaimable? yes
[Tue May 24 12:53:44 2016] lowmem_reserve[]: 0 0 0 0
[Tue May 24 12:53:44 2016] Node 0 DMA: 6*4kB (UEM) 3*8kB (UEM) 5*16kB (UEM) 4*32kB (UM) 3*64kB (UEM) 1*128kB (E) 2*256kB (UE) 2*512kB (EM) 3*1024kB (UEM) 3*2048kB (EMR) 1*4096kB (M) = 15424kB
[Tue May 24 12:53:44 2016] Node 0 DMA32: 565*4kB (UEM) 25*8kB (UM) 8*16kB (UM) 8*32kB (M) 2*64kB (M) 0*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB (R) 1*4096kB (R) = 9116kB
[Tue May 24 12:53:44 2016] Node 0 Normal: 69*4kB (UEMR) 17*8kB (UMR) 12*16kB (MR) 6*32kB (MR) 3*64kB (R) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 988kB
[Tue May 24 12:53:44 2016] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[Tue May 24 12:53:44 2016] 1291 total pagecache pages
[Tue May 24 12:53:44 2016] 627 pages in swap cache
[Tue May 24 12:53:44 2016] Swap cache stats: add 1028806, delete 1028179, find 5007/6368
[Tue May 24 12:53:44 2016] Free swap = 0kB
[Tue May 24 12:53:44 2016] Total swap = 4101116kB
[Tue May 24 12:53:44 2016] 1025493 pages RAM
[Tue May 24 12:53:44 2016] 0 pages HighMem/MovableOnly
[Tue May 24 12:53:44 2016] 35338 pages reserved
[Tue May 24 12:53:44 2016] 0 pages hwpoisoned
[Tue May 24 12:53:44 2016] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
[Tue May 24 12:53:44 2016] [ 199] 0 199 8242 10 20 3 61 0 systemd-journal
[Tue May 24 12:53:44 2016] [ 214] 0 214 10011 48 22 3 76 -1000 systemd-udevd
[Tue May 24 12:53:44 2016] [ 8844] 0 8844 6876 0 18 4 70 0 cron
[Tue May 24 12:53:44 2016] [ 8845] 0 8845 13796 4 34 3 168 -1000 sshd
[Tue May 24 12:53:44 2016] [ 8846] 107 8846 16319 11 33 3 164 0 zabbix_agentd
[Tue May 24 12:53:44 2016] [ 8848] 0 8848 4964 0 14 4 69 0 systemd-logind
[Tue May 24 12:53:44 2016] [ 8852] 105 8852 10531 30 25 3 71 -900 dbus-daemon
[Tue May 24 12:53:44 2016] [ 8859] 0 8859 64580 4 28 3 247 0 rsyslogd
[Tue May 24 12:53:44 2016] [ 8861] 0 8861 1064 4 7 3 36 0 acpid
[Tue May 24 12:53:44 2016] [ 8921] 107 8921 16319 119 32 3 153 0 zabbix_agentd
[Tue May 24 12:53:44 2016] [ 8922] 107 8922 16319 25 32 3 155 0 zabbix_agentd
[Tue May 24 12:53:44 2016] [ 8923] 107 8923 16319 17 32 3 167 0 zabbix_agentd
[Tue May 24 12:53:44 2016] [ 8924] 107 8924 16319 48 32 3 147 0 zabbix_agentd
[Tue May 24 12:53:44 2016] [ 8925] 107 8925 16319 31 32 3 153 0 zabbix_agentd
[Tue May 24 12:53:44 2016] [ 8928] 106 8928 6726 220 17 3 92 0 zebra
[Tue May 24 12:53:44 2016] [ 8939] 106 8939 7925 245 18 3 316 0 bgpd
[Tue May 24 12:53:44 2016] [ 8953] 0 8953 4341 13 14 3 37 0 watchquagga
[Tue May 24 12:53:44 2016] [ 8964] 0 8964 5054 4 15 3 64 0 xinetd
[Tue May 24 12:53:44 2016] [ 8977] 0 8977 3604 4 12 3 39 0 agetty
[Tue May 24 12:53:44 2016] [10053] 0 10053 2022613 885488 3762 11 1020507 0 accel-pppd
[Tue May 24 12:53:44 2016] [11477] 0 11477 8140 56 20 3 68 0 systemd-udevd
[Tue May 24 12:53:44 2016] [11478] 0 11478 8140 56 20 3 68 0 systemd-udevd
[Tue May 24 12:53:44 2016] [11479] 0 11479 8140 56 20 3 68 0 systemd-udevd
[Tue May 24 12:53:44 2016] [11480] 0 11480 8140 56 20 3 68 0 systemd-udevd
[Tue May 24 12:53:44 2016] [11488] 0 11488 8140 56 20 3 66 0 systemd-udevd
[Tue May 24 12:53:44 2016] [11489] 0 11489 8140 57 20 3 65 0 systemd-udevd
[Tue May 24 12:53:44 2016] Out of memory: Kill process 10053 (accel-pppd) score 919 or sacrifice child
[Tue May 24 12:53:44 2016] Killed process 10053 (accel-pppd) total-vm:8090452kB, anon-rss:3541952kB, file-rss:0kB

It also churned out about 10 GB of logs, mostly entries like [2016-05-24 12:46:51]: error: ppp23: ppp_unit_read: short read 0

I haven't done anything about it yet. Should I try an older kernel or roll back accel?
dimka88
Posts: 866
Joined: 13 Oct 2014, 05:51
Contact:

Re: accel-ppp crash at d06572417e1e500d7bd56859335d2f7dd0f3fd8f

Post by dimka88 »

One option is to go back to the old kernel. But if you want to stay on the newer kernels, we need a core file, and you should build accel-ppp with -DCMAKE_BUILD_TYPE=Debug and -DMEMDEBUG=TRUE.
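
For reference, a minimal sketch of what such a debug build might look like, assuming a standard out-of-tree CMake build from an already checked-out accel-ppp source tree (the paths and the install step are illustrative, not taken from this thread):

Code: Select all

# assumes the accel-ppp sources are already checked out in ./accel-ppp
mkdir build && cd build
# Debug build with the memory-debugging option mentioned above
cmake -DCMAKE_BUILD_TYPE=Debug -DMEMDEBUG=TRUE ../accel-ppp
make
make install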
_longhorn_
Posts: 36
Joined: 03 Sep 2015, 14:37

Re: accel-ppp crash at d06572417e1e500d7bd56859335d2f7dd0f3fd8f

Post by _longhorn_ »

dimka88 wrote:One option is to go back to the old kernel. But if you want to stay on the newer kernels, we need a core file, and you should build accel-ppp with -DCMAKE_BUILD_TYPE=Debug and -DMEMDEBUG=TRUE.
Thanks, I built it with the debug options. While I was writing this, it already died again, but there is no core file. Could you tell me what I'm doing wrong?
_longhorn_
Posts: 36
Joined: 03 Sep 2015, 14:37

Re: accel-ppp crash at d06572417e1e500d7bd56859335d2f7dd0f3fd8f

Post by _longhorn_ »

I tried 4.4.11, same result. I built 3.14.70 and so far it's working. It looks like pppoe is thoroughly broken in the newer kernels.
dimka88
Posts: 866
Joined: 13 Oct 2014, 05:51
Contact:

Re: accel-ppp crash at d06572417e1e500d7bd56859335d2f7dd0f3fd8f

Post by dimka88 »

_longhorn_ wrote:
dimka88 wrote:One option is to go back to the old kernel. But if you want to stay on the newer kernels, we need a core file, and you should build accel-ppp with -DCMAKE_BUILD_TYPE=Debug and -DMEMDEBUG=TRUE.
Thanks, I built it with the debug options. While I was writing this, it already died again, but there is no core file. Could you tell me what I'm doing wrong?
In sysctl.conf:

Code: Select all

kernel.core_uses_pid = 1
kernel.core_pattern = /root/core-%e-%p
and then run:

Code: Select all

ulimit -c unlimited
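
One practical note on the steps above: the sysctl entries take effect only after they are reloaded, and ulimit -c applies only to the shell it was run in and to processes started from that shell. If accel-pppd is started from an init script or a systemd unit, the core size limit has to be raised there as well. A rough sketch, assuming a systemd-managed service named accel-ppp.service (the unit name is an assumption, not confirmed in this thread):

Code: Select all

# reload the new core_uses_pid / core_pattern settings
sysctl -p

# for a systemd-started daemon, set the limit in the unit file instead:
#   [Service]
#   LimitCORE=infinity
systemctl daemon-reload
systemctl restart accel-ppp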
_longhorn_
Posts: 36
Joined: 03 Sep 2015, 14:37

Re: accel-ppp crash at d06572417e1e500d7bd56859335d2f7dd0f3fd8f

Post by _longhorn_ »

Following the advice from a related thread, I updated to the latest master and am now waiting to see what happens.
_longhorn_
Posts: 36
Joined: 03 Sep 2015, 14:37

Re: accel-ppp crash at d06572417e1e500d7bd56859335d2f7dd0f3fd8f

Post by _longhorn_ »

Since the update it hasn't crashed. Looks like the problem is solved :) Thanks!