Mass authorization and radreqlimit

Post by brodayga » 09 Nov 2017, 12:09

The problem: during mass authorization (after a reboot, for example) accel-ppp stalls, yet after some time (sometimes a minute, sometimes half an hour) it may suddenly authorize everyone at once.

Code: Select all

radius(1, 192.168.55.2):
  state: active
  fail count: 0
  request count: 50
  queue length: 2840
  auth sent: 9355
  auth lost(total/5m/1m): 4492/541/97
  auth avg query time(5m/1m): 0/0 ms
 
30 seconds later:

Code: Select all

radius(1, 192.168.55.2):
  state: active
  fail count: 0
  request count: 50
  queue length: 2836
  auth sent: 9405
  auth lost(total/5m/1m): 4544/524/98
  auth avg query time(5m/1m): 0/0 ms
 
Note that the average query time is 0, which, as I understand it, means that all requests are being lost; yet tcpdump shows the RADIUS server replying instantly.
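
To put numbers on the two snapshots above, here is a minimal Python sketch that parses both `show stat` excerpts and prints how the counters moved in those 30 seconds. The helper names (`parse_radius_stat`, `delta`) are made up for illustration, and reading "lost" as "timed out without a processed reply" is an assumption, not something confirmed by the accel-ppp sources.

```python
import re

def parse_radius_stat(text):
    """Extract a few auth counters from one radius section of `show stat`."""
    stats = {}
    for line in text.splitlines():
        line = line.strip()
        m = re.match(r"(request count|queue length|auth sent):\s*(\d+)$", line)
        if m:
            stats[m.group(1)] = int(m.group(2))
            continue
        m = re.match(r"auth lost\(total/5m/1m\):\s*(\d+)/(\d+)/(\d+)$", line)
        if m:
            stats["auth lost"] = int(m.group(1))  # keep only the running total
    return stats

# The two excerpts from the post, taken 30 seconds apart.
FIRST = """radius(1, 192.168.55.2):
  state: active
  fail count: 0
  request count: 50
  queue length: 2840
  auth sent: 9355
  auth lost(total/5m/1m): 4492/541/97
"""
SECOND = """radius(1, 192.168.55.2):
  state: active
  fail count: 0
  request count: 50
  queue length: 2836
  auth sent: 9405
  auth lost(total/5m/1m): 4544/524/98
"""

def delta(a, b):
    """Difference b - a for the counters we care about."""
    return {k: b[k] - a[k] for k in ("queue length", "auth sent", "auth lost")}

before = parse_radius_stat(FIRST)
after = parse_radius_stat(SECOND)
d = delta(before, after)
print(d)
```

In this interval about as many requests are counted as lost as are sent, while the queue barely shrinks, which matches the "stall" described above.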
Spoiler
14:51:34.237355 IP 192.168.55.4.35274 > 192.168.55.2.1812: RADIUS, Access Request (1), id: 0x01 length: 121
14:51:34.237467 IP 192.168.55.4.43921 > 192.168.55.2.1812: RADIUS, Access Request (1), id: 0x01 length: 122
14:51:34.237582 IP 192.168.55.4.49290 > 192.168.55.2.1812: RADIUS, Access Request (1), id: 0x01 length: 122
14:51:34.240386 IP 192.168.55.2.1812 > 192.168.55.4.49290: RADIUS, Access Accept (2), id: 0x01 length: 38
14:51:34.241054 IP 192.168.55.2.1812 > 192.168.55.4.43921: RADIUS, Access Accept (2), id: 0x01 length: 39
14:51:34.241307 IP 192.168.55.2.1812 > 192.168.55.4.35274: RADIUS, Access Accept (2), id: 0x01 length: 39
14:51:34.327338 IP 192.168.55.4.45302 > 192.168.55.2.1812: RADIUS, Access Request (1), id: 0x01 length: 122
14:51:34.331028 IP 192.168.55.4.41401 > 192.168.55.2.1812: RADIUS, Access Request (1), id: 0x01 length: 123
14:51:34.331351 IP 192.168.55.2.1812 > 192.168.55.4.45302: RADIUS, Access Accept (2), id: 0x01 length: 38
14:51:34.333792 IP 192.168.55.2.1812 > 192.168.55.4.41401: RADIUS, Access Accept (2), id: 0x01 length: 37
14:51:34.455568 IP 192.168.55.4.47378 > 192.168.55.2.1812: RADIUS, Access Request (1), id: 0x01 length: 122
14:51:34.455758 IP 192.168.55.4.47058 > 192.168.55.2.1812: RADIUS, Access Request (1), id: 0x01 length: 122
14:51:34.458763 IP 192.168.55.2.1812 > 192.168.55.4.47058: RADIUS, Access Accept (2), id: 0x01 length: 39
14:51:34.458982 IP 192.168.55.2.1812 > 192.168.55.4.47378: RADIUS, Access Accept (2), id: 0x01 length: 38
14:51:35.550846 IP 192.168.55.4.33677 > 192.168.55.2.1812: RADIUS, Access Request (1), id: 0x01 length: 121
14:51:35.553966 IP 192.168.55.2.1812 > 192.168.55.4.33677: RADIUS, Access Accept (2), id: 0x01 length: 39
14:51:36.820320 IP 192.168.55.4.40998 > 192.168.55.2.1812: RADIUS, Access Request (1), id: 0x01 length: 120
14:51:36.823517 IP 192.168.55.2.1812 > 192.168.55.4.40998: RADIUS, Access Accept (2), id: 0x01 length: 39
14:51:37.037387 IP 192.168.55.4.49338 > 192.168.55.2.1812: RADIUS, Access Request (1), id: 0x01 length: 122
14:51:37.037410 IP 192.168.55.4.52571 > 192.168.55.2.1812: RADIUS, Access Request (1), id: 0x01 length: 122
14:51:37.038662 IP 192.168.55.4.46203 > 192.168.55.2.1812: RADIUS, Access Request (1), id: 0x01 length: 121
14:51:37.040079 IP 192.168.55.4.57684 > 192.168.55.2.1812: RADIUS, Access Request (1), id: 0x01 length: 121
14:51:37.040195 IP 192.168.55.2.1812 > 192.168.55.4.52571: RADIUS, Access Accept (2), id: 0x01 length: 38
14:51:37.040553 IP 192.168.55.2.1812 > 192.168.55.4.49338: RADIUS, Access Accept (2), id: 0x01 length: 39
14:51:37.040974 IP 192.168.55.2.1812 > 192.168.55.4.46203: RADIUS, Access Accept (2), id: 0x01 length: 39
14:51:37.042403 IP 192.168.55.2.1812 > 192.168.55.4.57684: RADIUS, Access Accept (2), id: 0x01 length: 38
14:51:37.113835 IP 192.168.55.4.41866 > 192.168.55.2.1812: RADIUS, Access Request (1), id: 0x01 length: 120
14:51:37.116303 IP 192.168.55.2.1812 > 192.168.55.4.41866: RADIUS, Access Accept (2), id: 0x01 length: 38
14:51:37.179430 IP 192.168.55.4.43462 > 192.168.55.2.1812: RADIUS, Access Request (1), id: 0x01 length: 121
14:51:37.182978 IP 192.168.55.2.1812 > 192.168.55.4.43462: RADIUS, Access Accept (2), id: 0x01 length: 38
14:51:37.273601 IP 192.168.55.4.42513 > 192.168.55.2.1812: RADIUS, Access Request (1), id: 0x01 length: 122
14:51:37.273795 IP 192.168.55.4.56158 > 192.168.55.2.1812: RADIUS, Access Request (1), id: 0x01 length: 121
14:51:37.275163 IP 192.168.55.2.1812 > 192.168.55.4.56158: RADIUS, Access Reject (3), id: 0x01 length: 23
14:51:37.277431 IP 192.168.55.2.1812 > 192.168.55.4.42513: RADIUS, Access Accept (2), id: 0x01 length: 38
14:51:37.380881 IP 192.168.55.4.52520 > 192.168.55.2.1812: RADIUS, Access Request (1), id: 0x01 length: 120
14:51:37.383898 IP 192.168.55.2.1812 > 192.168.55.4.52520: RADIUS, Access Accept (2), id: 0x01 length: 39
14:51:37.437572 IP 192.168.55.4.58629 > 192.168.55.2.1812: RADIUS, Access Request (1), id: 0x01 length: 122
14:51:37.440487 IP 192.168.55.2.1812 > 192.168.55.4.58629: RADIUS, Access Accept (2), id: 0x01 length: 38
14:51:37.522017 IP 192.168.55.4.49668 > 192.168.55.2.1812: RADIUS, Access Request (1), id: 0x01 length: 121
14:51:37.525073 IP 192.168.55.2.1812 > 192.168.55.4.49668: RADIUS, Access Accept (2), id: 0x01 length: 46
14:51:37.541408 IP 192.168.55.4.53648 > 192.168.55.2.1812: RADIUS, Access Request (1), id: 0x01 length: 121
14:51:37.544793 IP 192.168.55.2.1812 > 192.168.55.4.53648: RADIUS, Access Accept (2), id: 0x01 length: 46
14:51:37.636611 IP 192.168.55.4.34503 > 192.168.55.2.1812: RADIUS, Access Request (1), id: 0x01 length: 122
14:51:37.639543 IP 192.168.55.2.1812 > 192.168.55.4.34503: RADIUS, Access Accept (2), id: 0x01 length: 46
14:51:37.673187 IP 192.168.55.4.32878 > 192.168.55.2.1812: RADIUS, Access Request (1), id: 0x01 length: 123
14:51:37.676458 IP 192.168.55.2.1812 > 192.168.55.4.32878: RADIUS, Access Accept (2), id: 0x01 length: 39
14:51:37.823515 IP 192.168.55.4.50993 > 192.168.55.2.1812: RADIUS, Access Request (1), id: 0x01 length: 123
14:51:37.827048 IP 192.168.55.2.1812 > 192.168.55.4.50993: RADIUS, Access Accept (2), id: 0x01 length: 38
14:51:38.186891 IP 192.168.55.4.59239 > 192.168.55.2.1812: RADIUS, Access Request (1), id: 0x01 length: 120
14:51:38.190660 IP 192.168.55.2.1812 > 192.168.55.4.59239: RADIUS, Access Accept (2), id: 0x01 length: 46
14:51:38.409908 IP 192.168.55.4.39612 > 192.168.55.2.1812: RADIUS, Access Request (1), id: 0x01 length: 122
14:51:38.411621 IP 192.168.55.2.1812 > 192.168.55.4.39612: RADIUS, Access Reject (3), id: 0x01 length: 23
In the accel-ppp logs:
Spoiler
.....
[2017-11-09 14:53:22]: debug: ipoe6566: radius(1): req_exit 49
[2017-11-09 14:53:22]: debug: ipoe6566: radius(1): wakeup 0x2c58b18
[2017-11-09 14:53:22]: warn: ipoe6566: radius: server(1) not responding
[2017-11-09 14:53:22]: warn: ipoe6566: radius: no available servers
[2017-11-09 14:53:22]: warn: ipoe6566: authentication failed
[2017-11-09 14:53:22]: debug: ipoe6566: terminate
[2017-11-09 14:53:22]: info: ipoe6566: ipoe: session finished
[2017-11-09 14:26:10]: info: ipoe6567: create interface ipoe6567 parent eth2
[2017-11-09 14:26:10]: debug: ipoe6567: radius(1): queue 0x7f1fd4725bc8
[2017-11-09 14:53:05]: debug: ipoe6567: radius(1): wakeup 0x7f1fd4725bc8 -1
[2017-11-09 14:53:05]: info: ipoe6567: send [RADIUS(1) Access-Request id=1 <User-Name "10.213.8.60"> <NAS-Identifier "accelipup"> <NAS-IP-Address 192.168.55.4> <NAS-Port 9772> <NAS-Port-Id "ipoe6567"> <NAS-Port-Type Ethernet> <Calling-Station-Id "4c:5e:0c:14:1a:91"> <Called-Station-Id "eth2"> <Framed-IP-Address 10.213.8.60> <User-Password>]
[2017-11-09 14:53:23]: debug: ipoe6567: radius(1): req_exit 49
[2017-11-09 14:53:23]: debug: ipoe6567: radius(1): wakeup 0x7f1fd07b5c48
[2017-11-09 14:53:23]: warn: ipoe6567: radius: server(1) not responding
[2017-11-09 14:53:23]: warn: ipoe6567: radius: no available servers
[2017-11-09 14:53:23]: warn: ipoe6567: authentication failed
[2017-11-09 14:53:24]: debug: ipoe6567: terminate
[2017-11-09 14:53:24]: info: ipoe6567: ipoe: session finished
[2017-11-09 14:26:10]: info: ipoe6568: create interface ipoe6568 parent eth2
[2017-11-09 14:26:10]: debug: ipoe6568: radius(1): queue 0x27aae18
[2017-11-09 14:53:06]: debug: ipoe6568: radius(1): wakeup 0x27aae18 -1
[2017-11-09 14:53:06]: info: ipoe6568: send [RADIUS(1) Access-Request id=1 <User-Name "10.38.0.20"> <NAS-Identifier "accelipup"> <NAS-IP-Address 192.168.55.4> <NAS-Port 9773> <NAS-Port-Id "ipoe6568"> <NAS-Port-Type Ethernet> <Calling-Station-Id "4c:5e:0c:14:1a:91"> <Called-Station-Id "eth2"> <Framed-IP-Address 10.38.0.20> <User-Password>]
[2017-11-09 14:53:24]: debug: ipoe6568: radius(1): req_exit 49
[2017-11-09 14:53:24]: debug: ipoe6568: radius(1): wakeup 0x7f1fd4a2e198
[2017-11-09 14:53:24]: warn: ipoe6568: radius: server(1) not responding
[2017-11-09 14:53:24]: warn: ipoe6568: radius: no available servers
[2017-11-09 14:53:24]: warn: ipoe6568: authentication failed
[2017-11-09 14:53:24]: debug: ipoe6568: terminate
[2017-11-09 14:53:24]: info: ipoe6568: ipoe: session finished
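
The timestamps in the log above can be turned into waiting times with a short Python sketch. It assumes that a `queue 0x...` line marks the moment a request is put into the req-limit queue and that the first following `wakeup` marks the moment it leaves the queue; that reading of the debug output is a guess. The function name `session_timeline` is hypothetical.

```python
import re
from datetime import datetime

# Three lines for session ipoe6567, copied from the log above.
LOG = """[2017-11-09 14:26:10]: debug: ipoe6567: radius(1): queue 0x7f1fd4725bc8
[2017-11-09 14:53:05]: debug: ipoe6567: radius(1): wakeup 0x7f1fd4725bc8 -1
[2017-11-09 14:53:23]: warn: ipoe6567: radius: server(1) not responding
"""

LINE_RE = re.compile(r"\[(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d)\]: \w+: (\w+): (.*)")

def session_timeline(log_text, session):
    """Return the first 'queued', 'dequeued' and 'timed_out' timestamps."""
    events = {}
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if not m or m.group(2) != session:
            continue
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        msg = m.group(3)
        if "queue 0x" in msg:
            events.setdefault("queued", ts)
        elif "wakeup" in msg and "queued" in events:
            events.setdefault("dequeued", ts)
        elif "not responding" in msg:
            events.setdefault("timed_out", ts)
    return events

ev = session_timeline(LOG, "ipoe6567")
queue_wait = int((ev["dequeued"] - ev["queued"]).total_seconds())
send_to_timeout = int((ev["timed_out"] - ev["dequeued"]).total_seconds())
print(f"waited in queue: {queue_wait} s, send to 'not responding': {send_to_timeout} s")
```

Under that assumption the session spent about 27 minutes in the queue, and then another 18 seconds passed between the send and the "not responding" warning.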

RADIUS settings:

Code: Select all

[radius]
nas-identifier=accelipup
nas-ip-address=192.168.55.4
gw-ip-address=192.168.55.4
server=192.168.55.2,*******,auth-port=1812,acct-port=0,req-limit=50,fail-timeout=0
server=192.168.55.2,*******,auth-port=0,acct-port=1813,fail-timeout=0
dae-server=192.168.55.4:3799,********
verbose=1
timeout=1
max-try=1
acct-timeout=500
Another interesting point: accounting, which is configured as a separate server entry, also stops working at the same time even though no req-limit (queue) is set for it, and judging by tcpdump its replies also arrive instantly.
Spoiler
radius(2, 192.168.55.2):
state: active
fail count: 0
request count: 0
queue length: 0
acct sent: 4800
acct lost(total/5m/1m): 87/1/0
acct avg query time(5m/1m): 10437/0 ms
interim sent: 94116
interim lost(total/5m/1m): 46984/4492/1912
interim avg query time(5m/1m): 0/0 ms

15 minutes later, although a minute before that everything was still just as bad:
Spoiler
ipoe:
starting: 0
active: 6633
delayed: 0
radius(1, 192.168.55.2):
state: active
fail count: 0
request count: 0
queue length: 0
auth sent: 13573
auth lost(total/5m/1m): 6504/1027/0
auth avg query time(5m/1m): 226/3 ms
radius(2, 192.168.55.2):
state: active
fail count: 0
request count: 0
queue length: 0
acct sent: 6837
acct lost(total/5m/1m): 90/3/0
acct avg query time(5m/1m): 216/4 ms
interim sent: 101209
interim lost(total/5m/1m): 48571/803/0
interim avg query time(5m/1m): 59/0 ms

I forgot to mention: one CPU core is pegged at 100% during this and returns to normal once all subscribers have been authorized.

Post by brodayga » 09 Nov 2017, 13:33

There seems to be a similar problem in the thread viewtopic.php?f=11&t=762. They write there that it hangs completely, but maybe they just did not wait long enough for it to recover.

Post by brodayga » 21 Jan 2018, 18:40

This issue is still relevant. I can run any necessary tests and provide logs.

Post by dimka88 » 21 Jan 2018, 19:24

Please show your full accel-ppp.conf. Are you using compat?

Post by kompex » 21 Jan 2018, 22:23

Hi.

Hope you don't mind me posting in english.

I had a similar problem described in: viewtopic.php?f=11&t=762, sessions would be stuck in 'start' state (even for a few hours if I remember) unless I restarted accel-ppp or reloaded the config with a higher req-limit (for radius) which would clear the 'queue length' attribute in 'show stat'

This was happening seemingly at random (once or twice every 2 months). There was no server restart that would cause mass reauthentication of all sessions. All new pppoe connections from CPEs would simply, all of a sudden, get stuck in the 'start' state and never reach the 'active' state (accel-cmd show sessions). Maybe the sessions would eventually become 'active' as brodayga says, but I never waited to see; I would take action immediately whenever I noticed this happening. What was noticeable is that the affected accel-ppp server stopped sending interim updates for all 'active' sessions present on it.

I don't think I experienced this exact problem from viewtopic.php?f=11&t=762 since I run newer version with this commit included: [92af4b] https://sourceforge.net/p/accel-ppp/cod ... 572b54df9/

But I recently had a server reboot (power outage) that caused mass reauthentication [1300 sessions] while running commit [5dbd7c] (the latest version as of posting), and the symptoms were pretty much what brodayga describes.

Sessions were stuck in 'start' state and I got impatient and used the trick from viewtopic.php?f=11&t=762, increased req-limit from 50 to 200, reloaded accel-ppp config and boom everyone connected.

The only special thing in my setup is on the freeradius side: I enforce Simultaneous-Use=1 and use the "checkrad" script https://freeradius.org/radiusd/man/checkrad.html to connect to accel-ppp via telnet and check whether a session is up, in order to clear stale sessions. Other than that, my config is pretty much like the one in viewtopic.php?f=11&t=762

Without req-limit for radius in accel-ppp config this script would open tons of telnet connections to accel-ppp on freeradius server. This is why I keep req-limit=50.

It would be good to know whether brodayga uses Simultaneous-Use=1 and a checkrad script too. Maybe the combination of accel-ppp's radius req-limit and freeradius' checkrad/Simultaneous-Use=1 causes sessions to get stuck in the 'start' state for so long.

However what brodayga describes and what is described in viewtopic.php?f=11&t=762 seems to be related.

Post by brodayga » 22 Jan 2018, 07:09

I do not use compat. The configurations vary, and so do the versions. There are pptp, pppoe, and ipoe (started by DHCP and by ipup).
But the behavior is the same everywhere: if there are many users and req-limit is set for RADIUS, accel-ppp stalls. This happens precisely at the moment of mass authorization, on servers where 2000-3000 or more users start connecting simultaneously. Without req-limit everything seems fine, except that the billing system then cannot handle the load and starts responding slowly, which is a problem too. Once the users are authorized, everything runs without problems.
tcpdump shows that the replies arrive instantly:
14:51:34.237582 IP 192.168.55.4.49290 > 192.168.55.2.1812: RADIUS, Access Request (1), id: 0x01 length: 122
14:51:34.240386 IP 192.168.55.2.1812 > 192.168.55.4.49290: RADIUS, Access Accept (2), id: 0x01 length: 38
But neither the accel-ppp status nor its logs show them being processed.
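
The "replies arrive instantly" claim can be checked mechanically. Below is a minimal Python sketch that pairs each Access-Request in a tcpdump text capture with its reply and reports the latency. Since every packet in the capture carries id 0x01 and each request uses a different source port, the sketch pairs them by the NAS-side UDP port; the function name `reply_latencies_us` is invented for this example.

```python
import re
from datetime import datetime, timedelta

# The request/reply pair quoted in the post.
DUMP = """14:51:34.237582 IP 192.168.55.4.49290 > 192.168.55.2.1812: RADIUS, Access Request (1), id: 0x01 length: 122
14:51:34.240386 IP 192.168.55.2.1812 > 192.168.55.4.49290: RADIUS, Access Accept (2), id: 0x01 length: 38
"""

PKT_RE = re.compile(
    r"(\d\d:\d\d:\d\d\.\d{6}) IP (\S+)\.(\d+) > (\S+)\.(\d+): RADIUS, Access (\w+)"
)

def reply_latencies_us(dump):
    """Return request-to-reply latencies in microseconds, in reply order."""
    pending = {}      # NAS source port -> timestamp of the request
    latencies = []
    for line in dump.splitlines():
        m = PKT_RE.match(line)
        if not m:
            continue
        ts = datetime.strptime(m.group(1), "%H:%M:%S.%f")
        src_port, dst_port, kind = int(m.group(3)), int(m.group(5)), m.group(6)
        if kind == "Request":
            pending[src_port] = ts
        elif dst_port in pending:          # Accept or Reject going back to the NAS
            elapsed = ts - pending.pop(dst_port)
            latencies.append(elapsed // timedelta(microseconds=1))
    return latencies

lat = reply_latencies_us(DUMP)
print(lat)
```

For the pair quoted above the reply comes back in under 3 ms, far below the configured timeout=1, so the loss is unlikely to be on the wire.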
I can try to set up a test bench with freeradius to describe the steps for reproducing this situation.

To kompex:
Your problem is similar, but I don't use freeradius.

Post by dimka88 » 22 Jan 2018, 07:24

As a temporary workaround, try using connlimit. If you can describe how to reproduce this on a test bench with freeradius, that would be great and, I think, would speed up finding and fixing the cause.

Post by brodayga » 22 Jan 2018, 07:58

connlimit is the only thing that helps.

Post by Che » 01 Feb 2018, 14:08

Hello.
Exactly the same problem here.
ipup: during mass authorization of ~3500 sessions accel-ppp freezes solid; it can stay stuck for half an hour and then start authorizing after all.

Post by kompex » 17 Feb 2018, 14:57

So it doesn't really need an unexpected server reboot (mass reauthentication) to cause the sessions to get stuck. Here's what happened today on my accel-ppp nas.

I stopped bird (ospf) because I was upgrading to a new version. As a result, the radius servers were unreachable for approximately a minute, until the bird daemon upgrade completed and the ospf routes to the radius servers were relearned.

I am using the latest accel-ppp commit [54f225].

Here's my [radius] section (removed real ips):

Code: Select all

[radius]
nas-identifier=BNG2
nas-ip-address=100.100.100.2
gw-ip-address=100.100.100.2
server=192.168.100.2,secret1,auth-port=1812,acct-port=1813,req-limit=50,fail-timeout=0,max-fail=10
server=192.168.101.2,secret2,auth-port=1812,acct-port=0,req-limit=50,fail-timeout=0,max-fail=10,backup
dae-server=100.100.100.2:3799,secret3
acct-interim-interval=600
acct-timeout=0
verbose=0
Then after 40 minutes I come back and I notice that no new pppoe session managed to connect since then (they are stuck in start state).

Code: Select all

 
accel-ppp# show sessions
... skipped rest ...
 ppp303 | mm19xx     | xx:xx:xx:6e:e8:c6 | 10.xx.xx.236  | 12288/2048   | pppoe |      | active | 00:54:33
 ppp301 | op12xx     | xx:xx:xx:68:9d:79 | 10.xx.xx.74   | 10240/2048   | pppoe |      | active | 00:54:33
 ppp304 | ra24xx     | xx:xx:xx:90:8d:82 | 10.xx.xx.65   | 10240/2048   | pppoe |      | active | 00:54:07
 ppp293 | pa30xx     | xx:xx:xx:8e:b3:b5 | 10.xx.xx.180  | 12288/2048   | pppoe |      | active | 00:50:07
 ppp33  | gt31xx     | xx:xx:xx:da:6f:45 | 10.xx.xx.92   | 10240/2048   | pppoe |      | active | 00:48:50
 ppp227 | op36xx     | xx:xx:xx:a8:67:e4 | 10.xx.xx.206  | 10240/2048   | pppoe |      | active | 00:47:54
 ppp305 | pu35xx     | xx:xx:xx:84:15:62 | 10.xx.xx.170  | 10240/2048   | pppoe |      | active | 00:45:09
        |            | xx:xx:xx:ec:6f:03 |               |              | pppoe |      | start  | 00:38:03
        |            | xx:xx:xx:e4:09:4e |               |              | pppoe |      | start  | 00:36:22
        |            | xx:xx:xx:5a:85:10 |               |              | pppoe |      | start  | 00:35:59
        |            | xx:xx:xx:84:f4:99 |               |              | pppoe |      | start  | 00:35:27
        |            | xx:xx:xx:e4:09:4e |               |              | pppoe |      | start  | 00:34:13
        |            | xx:xx:xx:84:f4:99 |               |              | pppoe |      | start  | 00:31:35
        |            | xx:xx:xx:e4:09:4e |               |              | pppoe |      | start  | 00:26:37
        |            | xx:xx:xx:34:9f:f5 |               |              | pppoe |      | start  | 00:25:03
        |            | xx:xx:xx:c8:d7:ef |               |              | pppoe |      | start  | 00:23:16
        |            | xx:xx:xx:4c:d0:ff |               |              | pppoe |      | start  | 00:22:31
        |            | xx:xx:xx:76:53:4a |               |              | pppoe |      | start  | 00:22:14
        |            | xx:xx:xx:ce:28:92 |               |              | pppoe |      | start  | 00:21:23
        |            | xx:xx:xx:da:6a:1c |               |              | pppoe |      | start  | 00:20:27
        |            | xx:xx:xx:e4:09:4e |               |              | pppoe |      | start  | 00:20:26
        |            | xx:xx:xx:84:fb:20 |               |              | pppoe |      | start  | 00:19:58
        |            | xx:xx:xx:e4:09:64 |               |              | pppoe |      | start  | 00:19:53
        |            | xx:xx:xx:44:cb:4c |               |              | pppoe |      | start  | 00:19:01
        |            | xx:xx:xx:da:b0:04 |               |              | pppoe |      | start  | 00:15:14
        |            | xx:xx:xx:a4:84:e3 |               |              | pppoe |      | start  | 00:14:11
        |            | xx:xx:xx:e2:56:9d |               |              | pppoe |      | start  | 00:13:58
        |            | xx:xx:xx:ce:2f:4b |               |              | pppoe |      | start  | 00:13:44
        |            | xx:xx:xx:84:f4:99 |               |              | pppoe |      | start  | 00:13:36
        |            | xx:xx:xx:2c:26:1d |               |              | pppoe |      | start  | 00:13:09
        |            | xx:xx:xx:a0:da:84 |               |              | pppoe |      | start  | 00:12:04
        |            | xx:xx:xx:ce:2f:4b |               |              | pppoe |      | start  | 00:11:07
        |            | xx:xx:xx:96:3f:49 |               |              | pppoe |      | start  | 00:10:27
        |            | xx:xx:xx:ce:28:92 |               |              | pppoe |      | start  | 00:07:35
        |            | xx:xx:xx:da:6a:1c |               |              | pppoe |      | start  | 00:06:25
        |            | xx:xx:xx:ce:28:92 |               |              | pppoe |      | start  | 00:04:04
        |            | xx:xx:xx:62:22:fe |               |              | pppoe |      | start  | 00:03:10
        |            | xx:xx:xx:da:6a:1c |               |              | pppoe |      | start  | 00:03:07
        |            | xx:xx:xx:6e:a7:44 |               |              | pppoe |      | start  | 00:01:35
        |            | xx:xx:xx:da:6a:1c |               |              | pppoe |      | start  | 00:00:26
        

Notice how the queue length for primary radius server is higher than the request limit. Also notice that the backup radius server was never sent any requests for auth.

Code: Select all

accel-ppp# show stat
uptime: 0.21:08:14
cpu: 2%
mem(rss/virt): 13992/459348 kB
core:
  mempool_allocated: 2921080
  mempool_available: 205292
  thread_count: 4
  thread_active: 1
  context_count: 336
  context_sleeping: 0
  context_pending: 0
  md_handler_count: 623
  md_handler_pending: 0
  timer_count: 633
  timer_pending: 0
sessions:
  starting: 33
  active: 292
  finishing: 0
pppoe:
  starting: 0
  active: 325
  delayed PADO: 0
  recv PADI: 5882
  drop PADI: 2017
  sent PADO: 3865
  recv PADR(dup): 579(0)
  sent PADS: 579
  filtered: 0
radius(1, 192.168.100.2):
  state: active
  fail count: 0
  request count: 50
  queue length: 338
  auth sent: 535
  auth lost(total/5m/1m): 0/0/0
  auth avg query time(5m/1m): 0/0 ms
  acct sent: 763
  acct lost(total/5m/1m): 0/0/0
  acct avg query time(5m/1m): 0/0 ms
  interim sent: 15621
  interim lost(total/5m/1m): 2/0/0
  interim avg query time(5m/1m): 0/0 ms
radius(2, 192.168.101.2):
  state: active
  fail count: 0
  request count: 0
  queue length: 0
  auth sent: 0
  auth lost(total/5m/1m): 0/0/0
  auth avg query time(5m/1m): 0/0 ms
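
A check like the one described above (queue length above zero while the request count sits at req-limit, and an untouched backup server) could be automated roughly as follows. This is only a sketch: the helper names `radius_sections` and `saturated` are hypothetical, and the threshold of 50 is simply the req-limit from the [radius] section quoted earlier.

```python
import re

# Radius part of the `show stat` output above (a few counters only).
STAT = """radius(1, 192.168.100.2):
  state: active
  fail count: 0
  request count: 50
  queue length: 338
  auth sent: 535
radius(2, 192.168.101.2):
  state: active
  fail count: 0
  request count: 0
  queue length: 0
  auth sent: 0
"""

def radius_sections(stat_text):
    """Map server IP -> selected counters from each radius(...) section."""
    servers = {}
    current = None
    for line in stat_text.splitlines():
        head = re.match(r"radius\((\d+), ([\d.]+)\):", line.strip())
        if head:
            current = servers.setdefault(head.group(2), {})
            continue
        field = re.match(r"\s+(request count|queue length|auth sent):\s*(\d+)", line)
        if field and current is not None:
            current[field.group(1)] = int(field.group(2))
    return servers

def saturated(servers, req_limit):
    """Servers whose in-flight requests hit req_limit while a queue exists."""
    return [ip for ip, s in servers.items()
            if s.get("request count", 0) >= req_limit and s.get("queue length", 0) > 0]

servers = radius_sections(STAT)
print("saturated:", saturated(servers, req_limit=50))
print("never used for auth:", [ip for ip, s in servers.items() if s.get("auth sent") == 0])
```

Run periodically against `accel-cmd show stat` output, a check of this kind could raise an alert before subscribers start calling.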
  

Modules that I make use of:

Code: Select all

[modules]
log_file
pppoe
auth_chap_md5
radius
shaper
net-snmp
connlimit
It looks as if, once communication with radius stops, accel-ppp never re-establishes it, so no one can be authenticated and sessions get stuck. (Auth requests get queued up beyond req-limit and are never cleared.)

The suggestion with using [connlimit] module for mass reauthentication somewhat helped, but this problem still persists from time to time unfortunately like this event above has shown.

I had to restart accel-ppp in this case to reauthenticate users.

======== EDIT ========

This can be easily reproduced. Just blackhole the radius ip addresses on the accel-ppp server to simulate lost connection.

e.g:

ip route add blackhole IP_OF_RADIUS_SERVER1
ip route add blackhole IP_OF_RADIUS_SERVER2

Now start a dozen or so pppoe sessions.

Then remove the blackhole routes after a while (e.g 2 or 3 minutes):

ip route del blackhole IP_OF_RADIUS_SERVER1
ip route del blackhole IP_OF_RADIUS_SERVER2

Then you will notice that all new pppoe sessions after the blackhole was applied are stuck in 'start' state. This includes new pppoe sessions that are attempting to connect after the blackhole was removed.

The only fix is pretty much to restart the accel-ppp daemon.
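
The reproduction steps above could be scripted roughly like this. It is a sketch only: the addresses are placeholders from the documentation range (192.0.2.0/24), not real servers, the function names are invented, and by default nothing is executed (the commands are just printed). Really applying the routes needs root on the accel-ppp host.

```python
import subprocess

# Placeholder addresses (assumption); replace with the real RADIUS server IPs.
RADIUS_IPS = ["192.0.2.1", "192.0.2.2"]

def blackhole_cmds(action, radius_ips):
    """Build `ip route add|del blackhole <ip>` commands without running them."""
    if action not in ("add", "del"):
        raise ValueError("action must be 'add' or 'del'")
    return [["ip", "route", action, "blackhole", ip] for ip in radius_ips]

def run(cmds, dry_run=True):
    """Print the commands, or execute them when dry_run is False (needs root)."""
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)

# Step 1: cut off RADIUS, then start a dozen or so pppoe sessions by hand.
run(blackhole_cmds("add", RADIUS_IPS))
# Step 2: after 2 or 3 minutes, restore connectivity and watch `show sessions`.
run(blackhole_cmds("del", RADIUS_IPS))
```

Keeping the dry run as the default avoids accidentally blackholing RADIUS on a production NAS.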
