Dispatcher loses nodes and freezes

Viewing 4 posts - 1 through 4 (of 4 total)
  • 21st March 2019 at 3:03 pm #17339

    Hello!

    We are currently using Muster v.9.0.12 in limited mode (4 nodes max) on a Fedora 29 workstation.
    About every 2 days the dispatcher loses all connected nodes (they turn gray in Muster Console).
    In the dispatcher log files we found:
    03/21/2019 06:38:39 – [SEVR] [CORE] Instance connection(192.168.1.230:61785) didn’t send an heartbeat in 720 seconds, scheduling disconnection!
    03/21/2019 06:38:39 – [SEVR] [CORE] Instance connection(CG_03:14) from (192.168.1.230:61785) terminated
    03/21/2019 06:40:02 – [SEVR] [CORE] Queuing new client connection(14) from 192.168.1.230:59599

    We changed “Heartbeat in seconds” to 180 seconds for all nodes, but without success.
    An attempt to soft-restart the dispatcher causes it to freeze. It can then only be restarted by killing the process. After killing it manually and starting the dispatcher again, all nodes are online and working properly.
    The same dispatcher behavior occurred on a Windows 7 workstation with Muster v.9.0.12 and v.9.0.10.

    In addition, on version 9.0.12 under Fedora 29, no connection is established with notifiers on any of the nodes. Dispatcher log sample:
    03/21/2019 06:45:27 – [SEVR] [CORE] Accepting new notificator connection(17) from 192.168.1.230:59989
    03/21/2019 06:45:27 – [SEVR] [CORE] Notification connection(17) from (192.168.1.230:59989) terminated
    03/21/2019 06:45:31 – [SEVR] [CORE] Accepting new notificator connection(17) from 192.168.1.126:63604
    03/21/2019 06:45:31 – [SEVR] [CORE] Notification connection(17) from (192.168.1.126:63604) terminated
    03/21/2019 06:45:32 – [SEVR] [CORE] Accepting new notificator connection(17) from 192.168.1.188:53266
    03/21/2019 06:45:32 – [SEVR] [CORE] Notification connection(17) from (192.168.1.188:53266) terminated
    03/21/2019 06:45:33 – [SEVR] [CORE] Accepting new notificator connection(17) from 192.168.1.230:59996
    03/21/2019 06:45:33 – [SEVR] [CORE] Notification connection(17) from (192.168.1.230:59996) terminated
    03/21/2019 06:45:37 – [SEVR] [CORE] Accepting new notificator connection(17) from 192.168.1.126:63605
    03/21/2019 06:45:37 – [SEVR] [CORE] Notification connection(17) from (192.168.1.126:63605) terminated
    03/21/2019 06:45:39 – [SEVR] [CORE] Accepting new notificator connection(17) from 192.168.1.188:53267
    03/21/2019 06:45:39 – [SEVR] [CORE] Notification connection(17) from (192.168.1.188:53267) terminated
    03/21/2019 06:45:40 – [SEVR] [CORE] Accepting new notificator connection(17) from 192.168.1.230:60003
    03/21/2019 06:45:40 – [SEVR] [CORE] Notification connection(17) from (192.168.1.230:60003) terminated
    03/21/2019 06:45:44 – [SEVR] [CORE] Accepting new notificator connection(17) from 192.168.1.126:63609
    03/21/2019 06:45:44 – [SEVR] [CORE] Notification connection(17) from (192.168.1.126:63609) terminated
    and so on…
    In Windows 7, there was no such problem with notifiers.

    21st March 2019 at 3:21 pm #17340

    On Linux, do the following when you experience the issue: send a soft restart, which should close as many threads as possible. After about 10-15, locate the dispatcher PID and collect the output of gstack.
    Send us the output together with a complete dispatcher.log through our ticketing support. Also make sure that you do not have more than 4 hosts racing for a connection and that the notificator port is correct on both the client and the server side.
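
A minimal sketch of the "locate the dispatcher PID and collect the output of gstack" step, in Python. The process name `dispatcher` and the output file name are assumptions for illustration (the thread does not state the binary name), so adjust them to your install. It only relies on the standard `pgrep` and `gstack` commands being on the PATH, and `gstack` may need to run as the same user as the dispatcher or as root.

```python
import shutil
import subprocess


def parse_pids(pgrep_output):
    """Turn pgrep's whitespace-separated output into a list of integer PIDs."""
    return [int(token) for token in pgrep_output.split()]


def find_pids(process_name):
    """Return the PIDs of processes whose name matches exactly (pgrep -x)."""
    result = subprocess.run(
        ["pgrep", "-x", process_name],
        capture_output=True, text=True, check=False,
    )
    return parse_pids(result.stdout)


def collect_stack(pid, out_path):
    """Run gstack on one PID and save stdout and stderr to out_path."""
    if shutil.which("gstack") is None:
        raise RuntimeError("gstack was not found on the PATH")
    result = subprocess.run(
        ["gstack", str(pid)],
        capture_output=True, text=True, check=False,
    )
    with open(out_path, "w", encoding="utf-8") as out:
        out.write(result.stdout)
        out.write(result.stderr)
    return result.returncode


def collect_dispatcher_stacks(process_name="dispatcher"):
    """Dump one stack file per matching process.

    "dispatcher" is an assumed process name, not taken from the thread.
    """
    paths = []
    for pid in find_pids(process_name):
        path = f"gstack_{pid}.txt"  # hypothetical output file name
        collect_stack(pid, path)
        paths.append(path)
    return paths
```

Calling `collect_dispatcher_stacks()` after the soft restart would leave one `gstack_<pid>.txt` file per matching process, which can be attached to the ticket next to dispatcher.log.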

    25th March 2019 at 10:41 am #17366

    Thanks for the response. Ticket created.

    25th March 2019 at 10:54 am #17367

    For anyone with a similar problem: make sure that no more than 4 clients are trying to connect to the Dispatcher when running in evaluation mode.
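
One way to check this from the log itself is to count the distinct addresses that open connections. The sketch below is in Python and assumes the line formats quoted earlier in this thread ("Queuing new client connection(...) from IP:port" and "Accepting new notificator connection(...) from IP:port"). Other Muster versions may log differently, and the function names are made up for illustration.

```python
import re
from collections import Counter

# Matches the source IP in "new connection" lines as quoted in this thread.
# "... terminated" lines are deliberately not matched.
NEW_CONNECTION = re.compile(
    r"(?:Queuing new client|Accepting new notificator) connection\(\d+\) "
    r"from (\d{1,3}(?:\.\d{1,3}){3}):\d+"
)


def count_connecting_hosts(lines):
    """Map each client IP to the number of connection attempts it made."""
    hosts = Counter()
    for line in lines:
        match = NEW_CONNECTION.search(line)
        if match:
            hosts[match.group(1)] += 1
    return hosts


def exceeds_limit(hosts, limit=4):
    """True if more distinct hosts tried to connect than the limit allows."""
    return len(hosts) > limit
```

To run it against a real file, open dispatcher.log with `open(path, encoding="utf-8", errors="replace")`, pass the file object to `count_connecting_hosts`, and inspect the result. A host appearing here only shows that it attempted a connection, not that it was accepted.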
