
Posted by User Bot


04 Feb, 2025

Updated at 08 Feb, 2025

Elastic Agent service installed on Windows without noncrash failure flag

On Windows, services have a variety of options for automatic restarts, depending on the failure mode (similar to systemd service options).

The problem is that the FAILURE_ACTIONS_ON_NONCRASH_FAILURES (or the "Enable actions for stops with errors") option isn't enabled, which essentially means a non-zero exit code isn't treated as a restartable failure.

When installing Elastic Agent from the official Windows MSI package, this is the service config I end up with:

PS C:\> sc.exe qfailure 'Elastic Agent'
[SC] QueryServiceConfig2 SUCCESS

SERVICE_NAME: Elastic Agent
        RESET_PERIOD (in seconds)    : 10
        REBOOT_MESSAGE               :
        COMMAND_LINE                 :
        FAILURE_ACTIONS              : RESTART -- Delay = 15000 milliseconds.

PS C:\> sc.exe qfailureflag 'Elastic Agent'
[SC] QueryServiceConfig2 SUCCESS

SERVICE_NAME: Elastic Agent
FAILURE_ACTIONS_ON_NONCRASH_FAILURES:  FALSE

This restarts the service when it crashes in a completely unhandled fashion, but not when it manages to "cleanly" fail by returning a non-zero exit code. Something like a Go panic, or os.Exit(1), doesn't trigger a restart.

While this can sometimes be desirable, nothing in the code suggests to me that it is intentional. My impression is that the intent is simply to restart on failures.

More information is available in the Microsoft docs; search for SERVICE_FAILURE_ACTIONS_FLAG (winsvc.h). (Weirdly, the forum blocks me from linking to it.)

I think this flag should be set, to guarantee a restart in all situations.

Without this flag, in practice, you either have to restart the agent manually or deploy additional configuration to override the default behaviour.
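For reference, the override I have in mind would look roughly like the following. This is a minimal sketch, assuming an elevated PowerShell prompt and the default service name 'Elastic Agent' shown in the output above; I haven't checked whether an upgrade or reinstall of the agent resets it.

```powershell
# Enable failure actions for stops with errors (non-crash failures).
# Assumes the service is registered as 'Elastic Agent', as in the output above.
sc.exe failureflag 'Elastic Agent' 1

# Confirm the change; FAILURE_ACTIONS_ON_NONCRASH_FAILURES should now report TRUE.
sc.exe qfailureflag 'Elastic Agent'
```

The existing failure actions (restart after 15000 milliseconds) are left untouched; this only changes which kinds of stop count as a failure.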

Thoughts? Should this end up as a GitHub Issue?
