zash
Posts: 12
Joined: 03 Apr 2016, 22:17

accel-ppp locking, 200% cpu and fills logs until there's no space left.

Post by zash »

Hi,

I've noticed it twice already on one server.

accel-ppp had 200%+ CPU in top output.

uname -a
Linux pppoe2 4.4.12 #1 SMP Sun Jun 5 21:22:38 CEST 2016 x86_64 GNU/Linux (Debian Jessie 8.2); it happened on kernel 4.4.6 as well.

accel-ppp was installed from commit e3e804a0266840ef0a9cc13c4f6d1ac8fc68c676 (it happened on commit 06572417e1e500d7bd56859335d2f7dd0f3fd8f as well).
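
For reference, building and installing from a pinned commit looks roughly like this (a minimal sketch; the repository URL is a placeholder and the cmake options are assumptions, adjust both to your setup):

Code: Select all

# fetch the sources and pin the commit in question
git clone <accel-ppp-git-url> accel-ppp
cd accel-ppp
git checkout e3e804a0266840ef0a9cc13c4f6d1ac8fc68c676

# out-of-tree cmake build and install (options such as RADIUS/shaper support
# depend on how your existing binary was configured)
mkdir build && cd build
cmake -DCMAKE_INSTALL_PREFIX=/usr ..
make && make install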

cat /etc/accel-ppp.conf

Code: Select all

[modules]
log_file
#log_syslog
pppoe
auth_chap_md5
radius
shaper
#connlimit

[core]
log-error=/var/log/accel-ppp/core.log
thread-count=4

[common]
#single-session=deny

[ppp]
verbose=1
min-mtu=1280
max-mtu=1500
#mtu=1492
#mru=1492
ipv4=require
ipv6=deny
ipv6-intf-id=0:0:0:1
ipv6-peer-intf-id=0:0:0:2
ipv6-accept-peer-intf-id=1
lcp-echo-interval=12
lcp-echo-timeout=60
unit-cache=1
mppe=deny
ccp=0

[pppoe]
verbose=1
called-sid=mac
interface=eth5
interface=eth5.2011
pado-delay=0,50:500,60:550,70:600,80:650,90:700,100:750,110:800,120:850,130:900,140:950,150:1000,160:1050,170:1100,180:1150,190:1200,200:1250,210:1300,220:1350,230:1400,240:1450,250:1500,260:1550,270:1600,280:1650,290:1700,300:1750,310:1800,320:1850,330:1900,340:1950,350:2000,360:2050,370:2100,380:2150,390:2200,400:2250,410:2300,420:2350,430:2400,440:2450,450:2500,460:2550,470:2600,480:2650,490:2700,500:2750,510:2800,520:2850,530:2900,540:2950,550:3000
mppe=deny

[dns]
dns1=8.8.4.4
dns2=8.8.8.8

[radius]
nas-identifier=accel-ppp
nas-ip-address=10.1.1.2
gw-ip-address=10.1.1.1
server=10.1.1.2,4LTPv4BJ8gfdgffg324243oFDosdf,auth-port=1812,acct-port=1813,req-limit=50,fail-timeout=0,max-fail=10,weight=1
dae-server=10.1.1.2:3799,hFdxbwfnLDFFDSfs234oiOQdefdLS
acct-interim-interval=600
acct-timeout=0
verbose=0

[log]
log-file=/var/log/accel-ppp/accel-ppp.log
log-emerg=/var/log/accel-ppp/emerg.log
log-fail-file=/var/log/accel-ppp/auth-fail.log
#log-debug=/dev/stdout
#syslog=accel-pppd,daemon
#log-tcp=127.0.0.1:3000
copy=1
#color=1
#per-user-dir=per_user
#per-session-dir=per_session
#per-session=1
level=3

[shaper]
time-range=1,11:00-23:00
time-range=2,23:00-11:00
attr=Filter-Id
up-limiter=police
down-limiter=tbf
verbose=1
#down-limiter=htb
#leaf-qdisc=fq_codel 10240 1024 5 100 1514 ecn

[cli]
verbose=1
telnet=10.1.1.2:5010
tcp=127.0.0.1:2001
password=qPpLgfdFSfdsfds32423122Fgfd
history-file=/dev/null

[snmp]
master=0
agent-name=accel-ppp

[connlimit]
limit=10/min
burst=3
timeout=60

ls /var/log/accel-ppp

Code: Select all

drwxr-xr-x 12 root root 4.0K Jun  6 06:25 ..
-rw-------  1 root root 125G Jun 10 12:11 accel-ppp.log
-rw-------  1 root root 276K Jun 10 11:23 auth-fail.log
-rw-r--r--  1 root root    0 Dec 22 15:09 core.log
-rw-r--r--  1 root root 4.3K Apr 14 02:20 emerg.log
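
As a stopgap against the disk filling up, a size-based logrotate rule for these files can at least cap the growth until the underlying loop is fixed (a sketch, assuming stock logrotate; note it normally only runs once a day from cron):

Code: Select all

# write a size-based rotation rule for the accel-ppp logs
cat > /etc/logrotate.d/accel-ppp <<'EOF'
/var/log/accel-ppp/*.log {
        size 100M
        rotate 4
        compress
        missingok
        notifempty
        copytruncate
}
EOF

# force an immediate rotation to free space right away
logrotate -f /etc/logrotate.d/accel-ppp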

The log is filled with:
tail -f /var/log/accel-ppp.log

Code: Select all

[2016-06-10 06:25:03]: error: ppp891: ppp_unit_read: short read 0
[2016-06-10 06:25:03]: error: ppp891: ppp_unit_read: short read 0
[2016-06-10 06:25:03]: error: ppp891: ppp_unit_read: short read 0
[2016-06-10 06:25:03]: error: ppp891: ppp_unit_read: short read 0
[... the same line repeats endlessly ...]
When I checked with 'ifconfig | grep ppp891', there was no such interface.

I couldn't log in to the accel-ppp CLI via telnet.
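
With the [cli] section above, the CLI would normally be reachable as follows (a sketch, assuming accel-cmd from the same build is installed; here it simply would not let me in):

Code: Select all

# interactive telnet CLI, address and port from the [cli] section
telnet 10.1.1.2 5010

# non-interactive CLI over the tcp= socket (password from the [cli] section)
accel-cmd -H 127.0.0.1 -P 2001 -p <cli-password> show stat
accel-cmd -H 127.0.0.1 -P 2001 -p <cli-password> show sessions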

The other server reacted with an immediate OOM crash:

Code: Select all

 active_file:1050 inactive_file:766 isolated_file:0
 unevictable:0 dirty:931 writeback:1583 unstable:0
 slab_reclaimable:5726 slab_unreclaimable:24648
 mapped:771 shmem:633 pagetables:9472 bounce:0
 free:28950 free_pcp:0 free_cma:0
[  561.088832] Node 0 Normal free:45040kB min:45448kB low:56808kB high:68172kB active_anon:5181756kB inactive_anon:740148kB active_file:876kB inactive_file:804kB unevictable:0kB isolated(anon):128kB isolated(file):0kB present:6291456kB managed:6190296kB mlocked:0kB dirty:0kB writeback:968kB mapped:268kB shmem:68kB slab_reclaimable:13136kB slab_unreclaimable:52276kB kernel_stack:2176kB pagetables:18392kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:10440 all_unreclaimable? yes
[  561.088838] lowmem_reserve[]: 0 0 0 0
[  561.088841] Node 1 DMA free:15904kB min:116kB low:144kB high:172kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[  561.088845] lowmem_reserve[]: 0 3207 5924 5924
[  561.088848] Node 1 DMA32 free:34692kB min:24112kB low:30140kB high:36168kB active_anon:2662948kB inactive_anon:532344kB active_file:1160kB inactive_file:540kB unevictable:0kB isolated(anon):232kB isolated(file):0kB present:3378660kB managed:3302492kB mlocked:0kB dirty:1336kB writeback:1596kB mapped:1140kB shmem:1048kB slab_reclaimable:3508kB slab_unreclaimable:22772kB kernel_stack:368kB pagetables:10512kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:10620 all_unreclaimable? yes
[  561.088853] lowmem_reserve[]: 0 0 2717 2717
[  561.088855] Node 1 Normal free:20164kB min:20432kB low:25540kB high:30648kB active_anon:2144172kB inactive_anon:536172kB active_file:2164kB inactive_file:1720kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2883584kB managed:2782912kB mlocked:0kB dirty:2388kB writeback:3768kB mapped:1676kB shmem:1416kB slab_reclaimable:6260kB slab_unreclaimable:23544kB kernel_stack:400kB pagetables:8984kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:23780 all_unreclaimable? yes
[  561.088860] lowmem_reserve[]: 0 0 0 0
[  561.088862] Node 0 Normal: 991*4kB (UME) 670*8kB (UME) 413*16kB (UME) 180*32kB (UME) 11*64kB (UE) 3*128kB (U) 0*256kB 2*512kB (UE) 0*1024kB 2*2048kB (ME) 4*4096kB (M) = 44284kB
[  561.088872] Node 1 DMA: 0*4kB 0*8kB 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB
[  561.088881] Node 1 DMA32: 413*4kB (ME) 324*8kB (UME) 174*16kB (UME) 66*32kB (ME) 28*64kB (UME) 8*128kB (ME) 1*256kB (U) 0*512kB 1*1024kB (E) 2*2048kB (UE) 4*4096kB (M) = 33716kB
[  561.088890] Node 1 Normal: 316*4kB (UME) 301*8kB (UME) 150*16kB (UME) 68*32kB (UME) 19*64kB (UME) 2*128kB (E) 3*256kB (UME) 1*512kB (U) 2*1024kB (ME) 1*2048kB (U) 1*4096kB (M) = 19192kB
[  561.088901] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[  561.088902] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[  561.088903] 4374 total pagecache pages
[  561.088905] 1429 pages in swap cache
[  561.088906] Swap cache stats: add 1458688, delete 1457259, find 1657/2981
[  561.088907] Free swap  = 0kB
[  561.088908] Total swap = 5820412kB
[  561.088909] 3142422 pages RAM
[  561.088910] 0 pages HighMem/MovableOnly
[  561.088911] 69521 pages reserved
[  561.088912] 0 pages hwpoisoned
[  561.088913] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
[  561.088916] [  741]     0   741    10220       54      22       3      133         -1000 systemd-udevd
[  561.088919] [  751]     0   751     8242      267      19       3       74             0 systemd-journal
[  561.088921] [ 1439]     0  1439     9270        0      23       3      104             0 rpcbind
[  561.088923] [ 1453]   106  1453     9320       53      24       3      149             0 rpc.statd
[  561.088925] [ 1465]     0  1465     5839        0      16       3       61             0 rpc.idmapd
[  561.088927] [ 1554]     0  1554     4756       17      15       3       45             0 atd
[  561.088928] [ 1558]   109  1558    14891      337      34       3      106             0 ulogd
[  561.088930] [ 1559]     0  1559     6876       16      19       3       71             0 cron
[  561.088932] [ 1561]     0  1561    13796       54      32       3      168         -1000 sshd
[  561.088934] [ 1569]     0  1569     4964        0      15       3       66             0 systemd-logind
[  561.088936] [ 1583]   105  1583    10531       54      25       3       96          -900 dbus-daemon
[  561.088938] [ 1598]     0  1598     3180        0      11       3       57             0 ntpd
[  561.088940] [ 1599]   108  1599     3218       19      12       3       61             0 ntpd
[  561.088942] [ 1600]   108  1600     4254       49      14       3       63             0 ntpd
[  561.088944] [ 1605]     0  1605    64668       54      28       3      235             0 rsyslogd
[  561.088946] [ 1607]     0  1607     1064       48       8       3       37             0 acpid
[  561.088947] [ 1634]     0  1634     3604       49      11       3       38             0 agetty
[  561.088950] [ 1639]     0  1639  4425172  2947007    8593      20  1437664             0 accel-pppd
[  561.088951] [ 1640]     0  1640     4843       90      14       3       33             0 irqbalance
[  561.088953] [ 1647]  1000  1647     4109      163      14       3      195             0 bird
[  561.088955] [ 1695]   110  1695    31196      530      34       3     2749             0 unbound
[  561.088957] [ 1768]     0  1768    22796        2      42       3      319             0 nginx
[  561.088959] [ 1769]    33  1769    22880        0      42       3      399             0 nginx
[  561.088961] [ 1770]    33  1770    22880        0      42       3      385             0 nginx
[  561.088963] [ 1859]    33  1859    23450      176      48       3     3137             0 uwsgi
[  561.088965] [ 2135]    33  2135    23450      173      49       3     3135             0 uwsgi
[  561.088966] [ 2150]   104  2150    13313       53      29       3      163             0 exim4
[  561.088968] [ 2167]    33  2167    23450       16      47       3     3145             0 uwsgi
[  561.088970] [ 2168]    33  2168    23450       25      48       3     3143             0 uwsgi
[  561.088972] [ 2169]    33  2169    23450       16      47       3     3145             0 uwsgi
[  561.088974] [ 2170]    33  2170    23450       23      48       3     3143             0 uwsgi
[  561.088976] [ 2215]     0  2215    20682       84      45       3      184             0 sshd
[  561.088977] [ 2246]     0  2246     6168      268      16       3      609             0 bash
[  561.088979] Out of memory: Kill process 1639 (accel-pppd) score 941 or sacrifice child
[  561.089028] Killed process 1639 (accel-pppd) total-vm:17700688kB, anon-rss:11787776kB, file-rss:252kB
Last edited by zash on 10 Jun 2016, 12:42, edited 3 times in total.
Dmitry
Administrator
Posts: 954
Joined: 09 Oct 2014, 10:06

Re: accel-ppp locking, 200% cpu and fills logs until there's no space left.

Post by Dmitry »

Hi,
it is a kernel bug, so downgrade the kernel to < 4.4.
zash
Posts: 12
Joined: 03 Apr 2016, 22:17

Re: accel-ppp locking, 200% cpu and fills logs until there's no space left.

Post by zash »

Dmitry wrote: Hi,
it is a kernel bug, so downgrade the kernel to < 4.4.
Hi.

As an emergency measure I downgraded accel-ppp to 1.10.2, and it seems to be working so far.

Previously, no matter whether I killed the accel-ppp process to restart it or rebooted the system, accel-ppp would start up at 200%+ CPU and then either crash with an OOM or fill the logs.

Funnily enough, it had been working fine for the past 5 months or so on two Dell PowerEdge R610 servers (I have been running kernel 4.4 pretty much from the very beginning of using accel-ppp).

It is as if some rogue PPPoE client started sending malformed packets that caused accel-ppp to crash (for example, when I stopped the PPPoE server that was hanging, the clients would reconnect to the other one and it would crash too; this stopped after downgrading accel-ppp to 1.10.2 on both servers).

Apart from downgrading accel-ppp to 1.10.2, I decreased the MTU of my PPPoE Ethernet interfaces from 1508 back to the standard 1500 and changed the maximum PPP MTU in the accel-ppp config back to 1492.
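
Roughly, that amounts to the following (a sketch; eth5 and eth5.2011 are the PPPoE interfaces from the config above):

Code: Select all

# drop the carrier interfaces back to the standard Ethernet MTU
ip link set dev eth5 mtu 1500
ip link set dev eth5.2011 mtu 1500

# and in /etc/accel-ppp.conf, cap the negotiated PPP MTU again:
#   [ppp]
#   max-mtu=1492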

However, if I see it happening again I will downgrade the kernel to 4.3 (4.4 has nice multipath routing that I use in conjunction with BGP, which would be a shame to lose).

Thanks
ddobrev
Posts: 8
Joined: 04 Jul 2016, 12:44

Re: accel-ppp locking, 200% cpu and fills logs until there's no space left.

Post by ddobrev »

Were you able to fix the problem by downgrading to 1.10.2, or did you have to downgrade the kernel too?

Please let me know, because I'm facing a similar situation on my 3 servers.
ddobrev
Posts: 8
Joined: 04 Jul 2016, 12:44

Re: accel-ppp locking, 200% cpu and fills logs until there's no space left.

Post by ddobrev »

Also, why is the problem not yet fixed for kernels 4.4 and later?
Dmitry
Administrator
Posts: 954
Joined: 09 Oct 2014, 10:06

Re: accel-ppp locking, 200% cpu and fills logs until there's no space left.

Post by Dmitry »

So did you try a fresh 4.4?
hugleo
Posts: 59
Joined: 13 Apr 2016, 14:28

Re: accel-ppp locking, 200% cpu and fills logs until there's no space left.

Post by hugleo »

I'm running accel-ppp with the Debian backported kernels 4.4 and 4.5.
I'm also using the 1.10.2 version.
No problems here.
Maybe the problem is with the git version?
Dmitry
Administrator
Posts: 954
Joined: 09 Oct 2014, 10:06

Re: accel-ppp locking, 200% cpu and fills logs until there's no space left.

Post by Dmitry »

Try this commit: 2de5323e832b0eaa3aaea748f60d4d481f0e3e87
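
To pick that up in an existing checkout, a fetch and checkout followed by the usual rebuild should be enough (a rough sketch):

Code: Select all

cd accel-ppp
git fetch
git checkout 2de5323e832b0eaa3aaea748f60d4d481f0e3e87
# rebuild and reinstall as for the original install, then restart accel-pppd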
ddobrev
Posts: 8
Joined: 04 Jul 2016, 12:44

Re: accel-ppp locking, 200% cpu and fills logs until there's no space left.

Post by ddobrev »

I will try it tomorrow and report back.
(Current kernel 4.4.0-24-generic & accel-ppp version 9bd003f95f8609b0330dca028e5a6b6e78bcabb8)
_longhorn_
Posts: 36
Joined: 03 Sep 2015, 14:37

Re: accel-ppp locking, 200% cpu and fills logs until there's no space left.

Post by _longhorn_ »

Dmitry wrote: Try this commit: 2de5323e832b0eaa3aaea748f60d4d481f0e3e87
I observed the same thing on my setup and wrote about it here. I have updated my installation to try it, and so far it works. I will keep watching and report back. Thank you very much!