Repeated monit alerts about apnscp or httpd not running (they are)

I keep receiving monit alerts that apnscp is not running. When I tail the logs on the server, there are no issues; ApisCP is running and everything works. The pid file exists and has a proper timestamp.

I had the same issue on another server; running upcp -sb and rebooting solved it there. Now monit is reporting that httpd isn’t running, even though it is, with no logged errors and a pid file that exists with a proper timestamp.

What else can I check or do to solve this issue?

Pushover shows:

[p101] apache
restart - process is not running
[p108] restart - failed protocol test [generic] at /usr/local/apnscp/storage/run/apnscp.sock -- cannot create a unix socket for /usr/local/apnscp/storage/run/apnscp/sock

The last message just came at 9:15am but the files have a timestamp of 9:02 from my last restart of the service.

The httpd service is Apache; the specific check is http://localhost/monit.html, which typically returns a blank 200. If a check happens to run during a reload, a restart, or a child dying unexpectedly, it can produce a false positive. Likewise, a restart that replaces the pid file at /run/httpd.pid can trigger it.
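To rule out a stale pid file, one option is to compare the recorded pid against the process table. This is only a rough sketch: pid_alive is a hypothetical helper, not part of monit or ApisCP, and when run as a non-root user kill -0 also fails for processes owned by someone else.

```shell
# Hypothetical helper: succeed only if the pid recorded in a pid file
# belongs to a process that currently exists.
pid_alive() {
    pidfile=$1
    [ -r "$pidfile" ] || return 1
    # Keep only the first line and strip whitespace
    pid=$(head -n 1 "$pidfile" | tr -d '[:space:]')
    # Reject empty or non-numeric content
    case $pid in
        ''|*[!0-9]*) return 1 ;;
    esac
    # Signal 0 only performs the existence/permission check; nothing is sent
    kill -0 "$pid" 2>/dev/null
}
```

For example, pid_alive /run/httpd.pid && echo "httpd pid is live" would confirm that the pid file still points at a running process.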

During each check batch

Check storage/logs/start.log and possibly access_log. This occurs when a panel update coincides with a monitoring check, when bots behave badly (the panel eventually blocks them), or when you’ve organically hit the worker count from multiple concurrent users interacting with the panel. To check whether workers maxed out:

grep 'Maxed workers' /usr/local/apnscp/storage/logs/start.log

The limit may be raised in [apnscpd] => max_workers.

# double worker limit
cpcmd scope:set cp.config apnscpd max_workers 10
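Before raising the limit, it may help to see how often the cap was actually hit. A small sketch, assuming the log contains the literal phrase used in the grep above; count_maxed is a hypothetical helper name.

```shell
# Hypothetical helper: print how many log lines report the worker cap.
count_maxed() {
    logfile=$1
    # An unreadable or missing log counts as zero hits
    [ -r "$logfile" ] || { echo 0; return 0; }
    # grep -c prints the match count; it exits 1 when the count is 0,
    # so "|| true" keeps the function's exit status at 0 in that case.
    grep -c 'Maxed workers' "$logfile" || true
}
```

Running count_maxed /usr/local/apnscp/storage/logs/start.log prints the number of hits; a result of 0 suggests the worker limit is not the cause.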

The Apache thing stopped erroring; no idea what was going on there.

As for the p108 server and apnscp service not running, there’s nothing signaling an error in either start.log or access_log.

Lots of 200 codes in the access_log.

::1 - - [14/Mar/2025:13:51:10 -0500] "HEAD / HTTP/1.1" 302 - "-" "apnscp Internal check"
::1 - - [14/Mar/2025:13:51:10 -0500] "HEAD /apps/login HTTP/1.1" 200 - "-" "apnscp Internal check"
::1 - - [14/Mar/2025:13:51:42 -0500] "GET / HTTP/1.1" 302 - "-" "Monit/5.34.5"
::1 - - [14/Mar/2025:13:52:12 -0500] "HEAD / HTTP/1.1" 302 - "-" "apnscp Internal check"
::1 - - [14/Mar/2025:13:52:12 -0500] "HEAD /apps/login HTTP/1.1" 200 - "-" "apnscp Internal check"
::1 - - [14/Mar/2025:13:53:14 -0500] "HEAD / HTTP/1.1" 302 - "-" "apnscp Internal check"
::1 - - [14/Mar/2025:13:53:14 -0500] "HEAD /apps/login HTTP/1.1" 200 - "-" "apnscp Internal check"
::1 - - [14/Mar/2025:13:53:48 -0500] "GET / HTTP/1.1" 302 - "-" "Monit/5.34.5"
::1 - - [14/Mar/2025:13:54:21 -0500] "HEAD / HTTP/1.1" 302 - "-" "apnscp Internal check"
::1 - - [14/Mar/2025:13:54:22 -0500] "HEAD /apps/login HTTP/1.1" 200 - "-" "apnscp Internal check"
::1 - - [14/Mar/2025:13:55:26 -0500] "HEAD / HTTP/1.1" 302 - "-" "apnscp Internal check"
::1 - - [14/Mar/2025:13:55:26 -0500] "HEAD /apps/login HTTP/1.1" 200 - "-" "apnscp Internal check"
::1 - - [14/Mar/2025:13:55:53 -0500] "GET / HTTP/1.1" 302 - "-" "Monit/5.34.5"
::1 - - [14/Mar/2025:13:56:29 -0500] "HEAD / HTTP/1.1" 302 - "-" "apnscp Internal check"
::1 - - [14/Mar/2025:13:56:29 -0500] "HEAD /apps/login HTTP/1.1" 200 - "-" "apnscp Internal check"
::1 - - [14/Mar/2025:13:57:30 -0500] "HEAD / HTTP/1.1" 302 - "-" "apnscp Internal check"
::1 - - [14/Mar/2025:13:57:30 -0500] "HEAD /apps/login HTTP/1.1" 200 - "-" "apnscp Internal check"
::1 - - [14/Mar/2025:13:57:59 -0500] "GET / HTTP/1.1" 302 - "-" "Monit/5.34.5"
::1 - - [14/Mar/2025:13:58:32 -0500] "HEAD / HTTP/1.1" 302 - "-" "apnscp Internal check"
::1 - - [14/Mar/2025:13:58:32 -0500] "HEAD /apps/login HTTP/1.1" 200 - "-" "apnscp Internal check"
::1 - - [14/Mar/2025:13:59:34 -0500] "HEAD / HTTP/1.1" 302 - "-" "apnscp Internal check"
::1 - - [14/Mar/2025:13:59:34 -0500] "HEAD /apps/login HTTP/1.1" 200 - "-" "apnscp Internal check"

Yet the error comes up every 2 minutes.

Hmm, if it’s happening every 2 minutes then it sounds like “apnscp/sock” is to blame. It should be “apnscp.sock”. Check /etc/monit.d for that service definition and, once corrected, run systemctl restart monit.
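One way to look for the mistyped path is a fixed-string search across the monit definitions. A sketch only; find_bad_sock is a hypothetical helper, and the file layout under /etc/monit.d is assumed rather than known.

```shell
# Hypothetical helper: list monit definition lines that reference
# "apnscp/sock" (slash) instead of "apnscp.sock" (dot).
find_bad_sock() {
    dir=$1
    # -r recurse, -n show line numbers, -F treat the pattern as a fixed string
    grep -rnF 'apnscp/sock' "$dir"
}
```

find_bad_sock /etc/monit.d prints any offending lines and exits non-zero when none are found. After editing, monit -t checks the control file syntax before systemctl restart monit.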

Sorry, typo. Here’s a copy and paste.

p108: [p108] apnscp
restart - failed protocol test [generic] at /usr/local/apnscp/storage/run/apnscp.sock – Cannot create unix socket for /usr/local/apnscp/storage/run/apnscp.sock

The sock exists as does the pid and nothing is in the logs to show a failure.

Every 2 minutes I get the monit alert via Pushover, but ApisCP is running and the pid and sock files are the ones created this morning after a server reboot.

[root@p108 ~]# ll /usr/local/apnscp/storage/run/
total 67684
-rw-r--r--  1 root   root          6 Mar 14 08:02 apnscpd.pid
srw-------  1 apnscp apnscp        0 Mar 14 08:02 apnscp.sock

The service was not restarting indefinitely; rather, the event was stuck, so at every notification interval a new event was dispatched as if the service had restarted.

# monit validate apnscp

Process 'apnscp'
  status                       OK
  monitoring status            Monitored
  monitoring mode              active
  on reboot                    start
  pid                          204418
  parent pid                   1
  uid                          0
  effective uid                0
  gid                          0
  uptime                       1d 4h 47m
  threads                      1
  children                     5
  cpu                          0.0%
  cpu total                    0.1%
  memory                       0.1% [56.9 MB]
  memory total                 0.7% [351.3 MB]
  security attribute           -
  filedescriptors              10 [1.0% of 1024 limit]
  total filedescriptors        84
  read bytes                   1.4 B/s [27.4 GB total]
  disk read bytes              0 B/s [18.9 MB total]
  disk read operations         0.7 reads/s [11848154 reads total]
  write bytes                  0.2 B/s [422.6 MB total]
  disk write bytes             0 B/s [442.2 MB total]
  disk write operations        0.0 writes/s [163589 writes total]
  port response time           80.063 ms to localhost:2082 type TCP/IP protocol HTTP
  unix socket response time    204.255 ms to /usr/local/apnscp/storage/run/apnscp.sock type TCP protocol generic
  data collected               Sat, 15 Mar 2025 12:49:26

The first clue is from monit validate apnscp above. The uptime field shows how long the service has been running; at over a day, it clearly has not been restarting (no panel update, no restart on “major” releases).

The second clue comes from inspecting the events spooled in /var/spool/monit:

# ls -la /var/spool/monit

-rw-r--r--   1 root root 283 Mar 14 00:03 1741928608_564175c185a0
-rw-r--r--   1 root root 283 Mar 14 00:05 1741928744_564175c185a0
-rw-r--r--   1 root root 198 Mar 14 00:08 1741928880_564175c185a0

Remove those files, then run systemctl restart monit. This may be related to tildeslash/monit #1120, which resulted in a regression. I’m addressing the changes in the Monit/Argos builds and will have something out later this weekend once completed.
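For anyone who would rather keep the stuck events for inspection, here is a cautious variation that moves them aside instead of deleting them. This departs from the plain removal suggested above; stash_monit_events and the backup directory are hypothetical.

```shell
# Hypothetical helper: move spooled monit event files into a backup
# directory and print how many were moved.
stash_monit_events() {
    spool=$1
    backup=$2
    mkdir -p "$backup" || return 1
    moved=0
    for f in "$spool"/*; do
        # Skip the unexpanded glob (empty spool) and anything not a regular file
        [ -f "$f" ] || continue
        mv "$f" "$backup"/ && moved=$((moved + 1))
    done
    echo "$moved"
}
```

Something like stash_monit_events /var/spool/monit /root/monit-events-backup followed by systemctl restart monit should have the same effect as removing the files.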