Radius request queue bug. Steps to reproduce included.

Bug reports
Post Reply
kompex
Posts: 5
Joined: 30 Jun 2017, 07:00

Radius request queue bug. Steps to reproduce included.

Post by kompex »

Hello.

Here are the easy steps to reproduce a bug with the radius' queue implementation that I experienced in production from this post: viewtopic.php?f=11&p=4318#p4318

1. Take a default accel-ppp config with radius authentication (For easy demonstration to trigger the bug quickly low req-limit=5 setting is used):

Code: Select all

 
[radius]
nas-identifier=accel-ppp
nas-ip-address=192.168.89.200
gw-ip-address=192.168.89.200
server=192.168.89.201,secret1,auth-port=1812,acct-port=1813,req-limit=5 
dae-server=192.168.89.200:3799,secret2
acct-interim-interval=30
acct-timeout=0
verbose=1
2. Start a single pppoe session that connects to accel-ppp (to simulate already existing connections):

Code: Select all

pon test1

accel-ppp# show sessions
 ifname | username |    calling-sid    |     ip     | rate-limit | type  | comp | state  |  uptime  
--------+----------+-------------------+------------+------------+-------+------+--------+----------
 ppp0   | test1    | 08:00:27:20:e3:21 | 10.3.3.160 |            | pppoe |      | active | 00:01:12 
3. On the accel-ppp server null route the radius ip address to simulate lost connection to radius server

Code: Select all

ip addr add blackhole 192.168.89.201
4. Start a few other pppoe sessions.

Code: Select all

pon test2
pon test3
pon test4
5. Wait until radius server's "request count" parameter in accel-cmd show stat output reaches "req-limit=5" parameter in /etc/accel-ppp.conf. Then requests will be queued.

Code: Select all

tail -f /var/log/accel-ppp/debug.log
[2018-02-18 21:01:55.829] enp0s3: f6d503d584def8c6: radius(1): queue 0x7f6d80013cf8
6. Restore communication to radius server

Code: Select all

ip addr del blackhole 192.168.89.201
7. Start a few other pppoe sessions.

Code: Select all

pon test5
pon test6
pon test7
8. Notice that all sessions are still stuck even after restoring communication. Radius still keeps queuing requests from new connections (test5, test6 and test7).

Code: Select all


accel-ppp# show sessions
 ifname | username |    calling-sid    |     ip     | rate-limit | type  | comp | state  |  uptime  
--------+----------+-------------------+------------+------------+-------+------+--------+----------
 ppp0   | test1    | 08:00:27:20:e3:21 | 10.3.3.160 |            | pppoe |      | active | 00:05:15 
        |          | 08:00:27:20:e3:21 |            |            | pppoe |      | start  | 00:02:21 
        |          | 08:00:27:20:e3:21 |            |            | pppoe |      | start  | 00:02:14 
        |          | 08:00:27:20:e3:21 |            |            | pppoe |      | start  | 00:02:14 
        |          | 08:00:27:20:e3:21 |            |            | pppoe |      | start  | 00:01:09
        |          | 08:00:27:20:e3:21 |            |            | pppoe |      | start  | 00:01:09
        |          | 08:00:27:20:e3:21 |            |            | pppoe |      | start  | 00:01:09
Radius log:

Code: Select all

[2018-02-18 20:59:02.025] enp0s3: f6d503d584def8c0: radius(1): req_enter 1
[2018-02-18 20:59:02.268] enp0s3: f6d503d584def8c0: radius(1): req_exit 0
[2018-02-18 20:59:02.277] ppp0: f6d503d584def8c0: radius(1): req_enter 1
[2018-02-18 20:59:02.294] ppp0: f6d503d584def8c0: radius(1): req_exit 0  
[2018-02-18 21:00:55.664] enp0s3: f6d503d584def8c1: radius:connect: Invalid argument
[2018-02-18 21:00:55.664] enp0s3: f6d503d584def8c1: radius: no available servers
[2018-02-18 21:00:56.528] enp0s3: f6d503d584def8c2: radius(1): req_enter 2
[2018-02-18 21:00:56.528] enp0s3: f6d503d584def8c2: radius:connect: Invalid argument
[2018-02-18 21:00:56.528] enp0s3: f6d503d584def8c2: radius: no available servers
[2018-02-18 21:00:57.408] enp0s3: f6d503d584def8c3: radius(1): req_enter 3
[2018-02-18 21:00:57.408] enp0s3: f6d503d584def8c3: radius:connect: Invalid argument
[2018-02-18 21:00:57.408] enp0s3: f6d503d584def8c3: radius: no available servers
[2018-02-18 21:01:02.298] ppp0: f6d503d584def8c0: radius(1): req_enter 4
[2018-02-18 21:01:02.298] ppp0: f6d503d584def8c0: radius:connect: Invalid argument
[2018-02-18 21:01:02.298] ppp0: f6d503d584def8c0: radius: no available servers
[2018-02-18 21:01:25.743] enp0s3: f6d503d584def8c4: radius(1): req_enter 5
[2018-02-18 21:01:25.743] enp0s3: f6d503d584def8c4: radius:connect: Invalid argument
[2018-02-18 21:01:25.743] enp0s3: f6d503d584def8c4: radius: no available servers
[2018-02-18 21:01:32.298] ppp0: f6d503d584def8c0: radius(1): queue 0x7f6d80007d88
[2018-02-18 21:01:55.829] enp0s3: f6d503d584def8c5: radius(1): queue 0x7f6d80013cf8
[2018-02-18 21:02:02.590] enp0s3: f6d503d584def8c7: radius(1): queue 0x7f6d80017098
[2018-02-18 21:02:17.617] enp0s3: f6d503d584def8c8: radius(1): queue 0x7f6d800182a8
[2018-02-18 21:02:17.617] enp0s3: f6d503d584def8c9: radius(1): queue 0x7f6d800122c4
................
Conclusion:

Sessions get stuck after radius' server "request count" reaches "req-limit" parameter. Requests are queued indefinitely and never removed even after radius communication is restored causing authentication outage. accel-cmd show stat shows increasing "queue length" parameter. The current necessary step to fix it is to restart accel-ppp daemon.

To not hit the bug it is necessary to setup "req-limit=0" parameter in radius configuration to skip request queuing altogether. It is adviced to use it in conjunction with the "connlimit" module.
dimka88
Posts: 866
Joined: 13 Oct 2014, 05:51
Contact:

Re: Radius request queue bug. Steps to reproduce included.

Post by dimka88 »

Hi, kompex. This is a good research.
Dmitry
Администратор
Posts: 954
Joined: 09 Oct 2014, 10:06

Re: Radius request queue bug. Steps to reproduce included.

Post by Dmitry »

thank you
commit 956cb489628621a2298937afeaed57cccf605bb2
Post Reply