Here are easy steps to reproduce a bug in the RADIUS request queue implementation that I experienced in production (described in this post: viewtopic.php?f=11&p=4318#p4318).
1. Take a default accel-ppp config with RADIUS authentication (a low req-limit=5 is used here so that the bug triggers quickly):
Code:
[radius]
nas-identifier=accel-ppp
nas-ip-address=192.168.89.200
gw-ip-address=192.168.89.200
server=192.168.89.201,secret1,auth-port=1812,acct-port=1813,req-limit=5
dae-server=192.168.89.200:3799,secret2
acct-interim-interval=30
acct-timeout=0
verbose=1
Code:
pon test1
accel-ppp# show sessions
 ifname | username | calling-sid       | ip         | rate-limit | type  | comp | state  | uptime
--------+----------+-------------------+------------+------------+-------+------+--------+----------
 ppp0   | test1    | 08:00:27:20:e3:21 | 10.3.3.160 |            | pppoe |      | active | 00:01:12
Code:
ip route add blackhole 192.168.89.201
Code:
pon test2
pon test3
pon test4
Code:
tail -f /var/log/accel-ppp/debug.log
[2018-02-18 21:01:55.829] enp0s3: f6d503d584def8c6: radius(1): queue 0x7f6d80013cf8
Code:
ip route del blackhole 192.168.89.201
Code:
pon test5
pon test6
pon test7
Code:
accel-ppp# show sessions
 ifname | username | calling-sid       | ip         | rate-limit | type  | comp | state  | uptime
--------+----------+-------------------+------------+------------+-------+------+--------+----------
 ppp0   | test1    | 08:00:27:20:e3:21 | 10.3.3.160 |            | pppoe |      | active | 00:05:15
        |          | 08:00:27:20:e3:21 |            |            | pppoe |      | start  | 00:02:21
        |          | 08:00:27:20:e3:21 |            |            | pppoe |      | start  | 00:02:14
        |          | 08:00:27:20:e3:21 |            |            | pppoe |      | start  | 00:02:14
        |          | 08:00:27:20:e3:21 |            |            | pppoe |      | start  | 00:01:09
        |          | 08:00:27:20:e3:21 |            |            | pppoe |      | start  | 00:01:09
        |          | 08:00:27:20:e3:21 |            |            | pppoe |      | start  | 00:01:09
Code:
[2018-02-18 20:59:02.025] enp0s3: f6d503d584def8c0: radius(1): req_enter 1
[2018-02-18 20:59:02.268] enp0s3: f6d503d584def8c0: radius(1): req_exit 0
[2018-02-18 20:59:02.277] ppp0: f6d503d584def8c0: radius(1): req_enter 1
[2018-02-18 20:59:02.294] ppp0: f6d503d584def8c0: radius(1): req_exit 0
[2018-02-18 21:00:55.664] enp0s3: f6d503d584def8c1: radius:connect: Invalid argument
[2018-02-18 21:00:55.664] enp0s3: f6d503d584def8c1: radius: no available servers
[2018-02-18 21:00:56.528] enp0s3: f6d503d584def8c2: radius(1): req_enter 2
[2018-02-18 21:00:56.528] enp0s3: f6d503d584def8c2: radius:connect: Invalid argument
[2018-02-18 21:00:56.528] enp0s3: f6d503d584def8c2: radius: no available servers
[2018-02-18 21:00:57.408] enp0s3: f6d503d584def8c3: radius(1): req_enter 3
[2018-02-18 21:00:57.408] enp0s3: f6d503d584def8c3: radius:connect: Invalid argument
[2018-02-18 21:00:57.408] enp0s3: f6d503d584def8c3: radius: no available servers
[2018-02-18 21:01:02.298] ppp0: f6d503d584def8c0: radius(1): req_enter 4
[2018-02-18 21:01:02.298] ppp0: f6d503d584def8c0: radius:connect: Invalid argument
[2018-02-18 21:01:02.298] ppp0: f6d503d584def8c0: radius: no available servers
[2018-02-18 21:01:25.743] enp0s3: f6d503d584def8c4: radius(1): req_enter 5
[2018-02-18 21:01:25.743] enp0s3: f6d503d584def8c4: radius:connect: Invalid argument
[2018-02-18 21:01:25.743] enp0s3: f6d503d584def8c4: radius: no available servers
[2018-02-18 21:01:32.298] ppp0: f6d503d584def8c0: radius(1): queue 0x7f6d80007d88
[2018-02-18 21:01:55.829] enp0s3: f6d503d584def8c5: radius(1): queue 0x7f6d80013cf8
[2018-02-18 21:02:02.590] enp0s3: f6d503d584def8c7: radius(1): queue 0x7f6d80017098
[2018-02-18 21:02:17.617] enp0s3: f6d503d584def8c8: radius(1): queue 0x7f6d800182a8
[2018-02-18 21:02:17.617] enp0s3: f6d503d584def8c9: radius(1): queue 0x7f6d800122c4
................
Sessions get stuck once the RADIUS server's "request count" reaches the "req-limit" value. Requests are queued indefinitely and are never dequeued, even after RADIUS communication is restored, which causes an authentication outage. accel-cmd show stat shows a steadily increasing "queue length". Currently the only way to recover is to restart the accel-ppp daemon.
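To watch for the symptom, the "queue length" value can be pulled out of the accel-cmd show stat output. This is only a rough sketch: the helper name queue_lengths is made up for illustration, and it assumes the statistics contain lines of the form "queue length: N", which may differ between accel-ppp versions.

```shell
# Hypothetical helper: print every "queue length" value found on stdin.
# Assumption: the stat output has lines such as "  queue length: 7".
queue_lengths() {
    awk -F': *' '/queue length/ { print $2 }'
}

# Usage sketch (needs a running accel-ppp daemon):
#   while sleep 5; do accel-cmd show stat | queue_lengths; done
```

If the printed number keeps growing while the RADIUS server is reachable again, the daemon is most likely in the stuck state described above.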
To avoid the bug, set "req-limit=0" in the RADIUS configuration to skip request queuing altogether. It is advised to use it in conjunction with the "connlimit" module.
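As a sketch of the workaround, the req-limit value from the config in step 1 could be switched to 0 with a small helper. The function name disable_req_limit is hypothetical and the config path is whatever your installation uses; a .bak copy of the file is kept.

```shell
# Hypothetical helper: set every req-limit=N to req-limit=0 in a config file.
# Assumption: the caller passes the path to the accel-ppp config as $1.
disable_req_limit() {
    sed -i.bak -E 's/req-limit=[0-9]+/req-limit=0/g' "$1"
}

# Usage sketch (the path is an assumption, adjust to your setup):
#   disable_req_limit /etc/accel-ppp.conf
```

With the config above, the server line would then end in req-limit=0. The daemon presumably has to re-read its configuration before the change takes effect.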