After upcp, email stopped working. Dovecot looking for wrong disk - quotactl(Q_XGETQUOTA, /dev/sda2) failed: No such device

After running upcp, all email stopped working.
Checking Dovecot, I see this error:

Error: Failed to get quota resource STORAGE: quota-fs: errno=19, quotactl(Q_XGETQUOTA, /dev/sda2) failed: No such device

Dovecot is looking for sda2 for some reason. I don’t know why: the disk is vda2, not sda2.

 lsblk
NAME   MAJ:MIN RM    SIZE RO TYPE MOUNTPOINT
sda      8:0    0    368K  1 disk 
sr0     11:0    1   1024M  0 rom  
vda    252:0    0   1001G  0 disk 
├─vda1 252:1    0    600M  0 part /boot/efi
└─vda2 252:2    0 1000,4G  0 part /

What’s reported from these 3 commands?

doveconf | grep -B2 -A2 quota
ls -la /home/virtual/FILESYSTEMTEMPLATE/siteinfo/dev /home/virtual/DOMAIN/dev
grep -Ei 'fatal|error' /var/log/maillog | grep dovecot | tail -n25
Jun 18 17:50:01 orion dovecot[4139419]: cgroup: Error: Script terminated abnormally, exit status 89
Jun 18 17:51:03 orion dovecot[4139419]: imap(asa@site.tld)<4182813><X>: Error: Failed to get quota resource STORAGE: quota-fs: errno=19, quotactl(Q_XGETQUOTA, /dev/sda2) failed: No such device
Jun 18 17:51:03 orion dovecot[4139419]: imap(asa@site.tld)<4182813><X>: Error: Failed to get quota resource STORAGE: quota-fs: errno=19, quotactl(Q_XGETQUOTA, /dev/sda2) failed: No such device
Jun 18 17:51:04 orion dovecot[4139419]: imap(asa@site.tld)<4182823><X>: Error: Failed to get quota resource STORAGE: quota-fs: errno=19, quotactl(Q_XGETQUOTA, /dev/sda2) failed: No such device
Jun 18 17:51:04 orion dovecot[4139419]: imap(asa@site.tld)<4182823><X>: Error: Failed to get quota resource STORAGE: quota-fs: errno=19, quotactl(Q_XGETQUOTA, /dev/sda2) failed: No such device
Jun 18 17:51:06 orion dovecot[4139419]: imap(asa@site.tld)<4182856><X>: Error: Failed to get quota resource STORAGE: quota-fs: errno=19, quotactl(Q_XGETQUOTA, /dev/sda2) failed: No such device
Jun 18 17:51:06 orion dovecot[4139419]: imap(asa@site.tld)<4182856><X>: Error: Failed to get quota resource STORAGE: quota-fs: errno=19, quotactl(Q_XGETQUOTA, /dev/sda2) failed: No such device
Jun 18 17:51:08 orion dovecot[4139419]: imap(asa@site.tld)<4182866><X>: Error: Failed to get quota resource STORAGE: quota-fs: errno=19, quotactl(Q_XGETQUOTA, /dev/sda2) failed: No such device
Jun 18 17:51:08 orion dovecot[4139419]: imap(asa@site.tld)<4182866><X>: Error: Failed to get quota resource STORAGE: quota-fs: errno=19, quotactl(Q_XGETQUOTA, /dev/sda2) failed: No such device
Jun 18 17:54:02 orion dovecot[4139419]: cgroup: Error: Fatal: Failed to locate process under PID 4184316
Jun 18 17:54:02 orion dovecot[4139419]: cgroup: Error: Script terminated abnormally, exit status 89
Jun 18 17:54:02 orion dovecot[4139419]: cgroup: Error: Fatal: Failed to locate process under PID 4184321
Jun 18 17:54:02 orion dovecot[4139419]: cgroup: Error: Script terminated abnormally, exit status 89
Jun 18 17:54:02 orion dovecot[4139419]: cgroup: Error: Fatal: Failed to locate process under PID 4184328
Jun 18 17:54:02 orion dovecot[4139419]: cgroup: Error: Script terminated abnormally, exit status 89
Jun 18 17:54:02 orion dovecot[4139419]: cgroup: Error: Fatal: Failed to locate process under PID 4184332
Jun 18 17:54:02 orion dovecot[4139419]: cgroup: Error: Script terminated abnormally, exit status 89
Jun 18 17:54:02 orion dovecot[4139419]: cgroup: Error: Fatal: Failed to locate process under PID 4184336
Jun 18 17:54:02 orion dovecot[4139419]: cgroup: Error: Script terminated abnormally, exit status 89
Jun 18 17:54:02 orion dovecot[4139419]: cgroup: Error: Fatal: Failed to locate process under PID 4184340
Jun 18 17:54:02 orion dovecot[4139419]: cgroup: Error: Script terminated abnormally, exit status 89
Jun 18 17:54:02 orion dovecot[4139419]: cgroup: Error: Fatal: Failed to locate process under PID 4184344
Jun 18 17:54:02 orion dovecot[4139419]: cgroup: Error: Script terminated abnormally, exit status 89
Jun 18 17:54:02 orion dovecot[4139419]: cgroup: Error: Fatal: Failed to locate process under PID 4184348
Jun 18 17:54:02 orion dovecot[4139419]: cgroup: Error: Script terminated abnormally, exit status 89

(replaced email and X)
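The tail above mixes two distinct messages: the quota-fs `quotactl(...)` failure and the `cgroup:` helper failures. A rough sketch for tallying them separately, assuming log lines shaped like the ones pasted here. The function name `summarize_dovecot_errors` is made up for illustration and is not part of ApisCP or Dovecot:

```shell
# Hypothetical helper: read maillog lines on stdin and count the two
# error families seen in this thread. Matching is purely textual.
summarize_dovecot_errors() {
    awk '
        /dovecot\[/ && /quotactl\(/     { quota++ }   # quota-fs lookups that failed
        /dovecot\[/ && /cgroup: Error/  { cgroup++ }  # cgroup helper script failures
        END { printf "quota-fs=%d cgroup=%d\n", quota, cgroup }
    '
}

# Usage (path taken from the grep command above):
#   grep -Ei 'fatal|error' /var/log/maillog | summarize_dovecot_errors
```

Seeing which family dominates, and whether the counts change after a package update, makes it easier to tell the two problems apart.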

dnf clean all
dnf update -y dovecot23-apnscp
systemctl restart dovecot

See how this works. Is it happening to just 1 site? Is procfs mounted within that filesystem?
chroot /home/virtual/siteXX/fst ls -1 /proc/cmdline

Dovecot is looking for the PID, which it cannot find. Either procfs isn’t mounted within the vfs, or the process is terminating before the helper script has time to find it.
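To answer the "just 1 site?" question in one pass, the per-site chroot check can be looped. A minimal sketch, assuming sites live under /home/virtual as siteNNN/fst (as in the command above) and that a readable proc/cmdline inside the fst is a fair proxy for procfs being mounted. `sites_missing_procfs` is a hypothetical name:

```shell
# Print every site directory under BASE whose fst has no readable
# proc/cmdline. BASE defaults to /home/virtual; the site[0-9]* glob is
# an assumption based on the siteXX naming used in this thread.
sites_missing_procfs() {
    base=${1:-/home/virtual}
    for fst in "$base"/site[0-9]*/fst; do
        [ -d "$fst" ] || continue                  # skips the unmatched glob too
        [ -r "$fst/proc/cmdline" ] || printf '%s\n' "${fst%/fst}"
    done
}

# Usage:
#   sites_missing_procfs              # checks /home/virtual
#   sites_missing_procfs /some/base   # or any other base directory
```

No output means every site found has procfs visible inside its filesystem.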

Yes, it’s happening to more than one site, and it’s mounted. Here are the full logs:

[root@orion ~]# chroot /home/virtual/site126/fst ls -1 /proc/cmdline
/proc/cmdline
[root@orion ~]# ls -la /home/virtual/FILESYSTEMTEMPLATE/siteinfo/dev /home/virtual/X/dev
/home/virtual/FILESYSTEMTEMPLATE/siteinfo/dev:
totalt 0
drwxr-xr-x 2 root root    146 25 maj  2024 .
drwxr-xr-x 9 root root    158 14 mar  2023 ..
lrwxrwxrwx 1 root root     12 14 mar  2023 log -> /.socket/log
crw-rw-rw- 1 root root   1, 3 14 mar  2023 null
crw-rw-rw- 1 root root   1, 8 14 mar  2023 random
brw-rw---- 1 root disk   8, 2 14 mar  2023 sda2
lrwxrwxrwx 1 root root     15 14 mar  2023 stderr -> /proc/self/fd/2
lrwxrwxrwx 1 root root     15 14 mar  2023 stdin -> /proc/self/fd/0
lrwxrwxrwx 1 root root     15 14 mar  2023 stdout -> /proc/self/fd/1
crw-rw-rw- 1 root tty    5, 0 14 mar  2023 tty
crw-rw-rw- 1 root root   1, 9 14 mar  2023 urandom
brw-rw---- 1 root disk 252, 2 25 maj  2024 vda2
crw-rw-rw- 1 root root   1, 5 14 mar  2023 zero

/home/virtual/X/dev:
totalt 0
drwxr-xr-x 1 root root    146 25 maj  2024 .
drwxr-xr-x 1 root root     74 16 maj  2023 ..
lrwxrwxrwx 1 root root     13 14 mar  2023 fd -> /proc/self/fd
lrwxrwxrwx 1 root root     12 14 mar  2023 log -> /.socket/log
crw-rw-rw- 1 root root   1, 3 14 mar  2023 null
crw-rw-rw- 1 root tty    5, 2 14 mar  2023 ptmx
drwxr-xr-x 2 root root      6 14 mar  2023 pts
crw-rw-rw- 1 root root   1, 8 14 mar  2023 random
brw-rw---- 1 root disk   8, 2 14 mar  2023 sda2
lrwxrwxrwx 1 root root     15 14 mar  2023 stderr -> /proc/self/fd/2
lrwxrwxrwx 1 root root     15 14 mar  2023 stdin -> /proc/self/fd/0
lrwxrwxrwx 1 root root     15 14 mar  2023 stdout -> /proc/self/fd/1
crw-rw-rw- 1 root tty    5, 0 14 mar  2023 tty
crw-rw-rw- 1 root root   1, 9 14 mar  2023 urandom
brw-rw---- 1 root disk 252, 2 25 maj  2024 vda2
crw-rw-rw- 1 root root   1, 5 14 mar  2023 zero
[root@orion ~]# doveconf | grep -B2 -A2 quota
lmtp_proxy_rawlog_dir = 
lmtp_rawlog_dir = 
lmtp_rcpt_check_quota = no
lmtp_save_to_detail_mailbox = no
lmtp_user_concurrency_limit = 0
--
mail_nfs_storage = no
mail_plugin_dir = /usr/lib64/dovecot
mail_plugins = quota acl cgroup fts fts_lucene zlib
mail_prefetch_count = 0
mail_privileged_group = 
--
  imapsieve_mailbox2_from = INBOX.Spam
  imapsieve_mailbox2_name = *
  quota = fs:User:user
  quota2 = fs:Account:group
  sieve_editheader_max_header_size = 1k
  sieve_execute_bin_dir = /usr/libexec/dovecot/sieve
--
process_shutdown_filter = 
protocols = imap pop3
quota_full_tempfail = no
rawlog_dir = 
recipient_delimiter = +
--
  imap_logout_format = bytes=%i/%o
  mail_max_userip_connections = 25
  mail_plugins = quota acl cgroup fts fts_lucene zlib imap_quota imap_acl imap_sieve
}
protocol pop3 {
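Both dev listings in this paste contain an sda2 node (8, 2) next to the vda2 node (252, 2), while lsblk shows no sda2 partition at all. A small sketch for pulling just the block-device nodes and their major:minor pairs out of `ls -la` output so they can be compared against lsblk's MAJ:MIN column. `list_block_nodes` is a made-up helper name, and the field positions assume the `ls -la` layout shown above:

```shell
# Hypothetical helper: read `ls -la` output on stdin and print
# "name major:minor" for each block device node (mode starts with "b").
list_block_nodes() {
    awk '$1 ~ /^b/ {
        gsub(",", "", $5)        # "8," -> "8"
        print $NF, $5 ":" $6     # e.g. "sda2 8:2"
    }'
}

# Usage:
#   ls -la /home/virtual/FILESYSTEMTEMPLATE/siteinfo/dev | list_block_nodes
```

This is only a diagnostic aid; it does not by itself show that the extra node is what breaks mail.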

The error is the same as before after the update
Error: Failed to get quota resource STORAGE: quota-fs: errno=19, quotactl(Q_XGETQUOTA, /dev/sda2) failed: No such device

That error is not relevant. The failure is due to the following lines:

Jun 18 17:54:02 orion dovecot[4139419]: cgroup: Error: Script terminated abnormally, exit status 89
Jun 18 17:54:02 orion dovecot[4139419]: cgroup: Error: Fatal: Failed to locate process under PID 4184340

In particular, the source code previously read:

	if (NULL == (procname = read_process_name(pid))) {
		i_fatal("Failed to locate process under PID %d", pid);
	}

Now it reads:

	if (NULL == (procname = read_process_name(pid))) {
		i_error("Failed to locate process under PID %d", pid);
		return 1;
	}

This is updated in release 3. You can verify it’s installed with: rpm -q dovecot23-apnscp --queryformat="%{VERSION}-%{RELEASE}\n"

Yes, it’s installed.
Here is the new error log:

Jun 18 18:52:02 orion dovecot[13840]: cgroup: Error: Error: Failed to locate process under PID 29626
Jun 18 18:52:02 orion dovecot[13840]: cgroup: Error: Error: Failed to locate process under PID 29630
Jun 18 18:52:02 orion dovecot[13840]: cgroup: Error: Error: Failed to locate process under PID 29634
Jun 18 18:52:02 orion dovecot[13840]: cgroup: Error: Error: Failed to locate process under PID 29638
Jun 18 18:52:02 orion dovecot[13840]: cgroup: Error: Error: Failed to locate process under PID 29642
Jun 18 18:52:02 orion dovecot[13840]: cgroup: Error: Error: Failed to locate process under PID 29646
Jun 18 18:52:02 orion dovecot[13840]: cgroup: Error: Error: Failed to locate process under PID 29650
Jun 18 18:52:02 orion dovecot[13840]: cgroup: Error: Error: Failed to locate process under PID 29654
..
Error: Failed to get quota resource STORAGE: quota-fs: errno=19, quotactl(Q_XGETQUOTA, /dev/sda2) failed: No such device
..
[root@orion ~]# rpm -q dovecot23-apnscp --queryformat="%{VERSION}-%{RELEASE}\n"
2.0-3

FYI: I tested on both edge and major (stable).

rpm -qi nss-apnscp

Sounds like you flipped to edge, which introduces significant changes to NSS/PAM services.

cpcmd scope:set cp.update-policy edge-major
upcp
upcp -sb mail/configure-dovecot system/pam system/nss

I’d need to log in to the server to evaluate what’s going on if you’re still having issues after this. That quota message, while interesting, is a red herring.

No, this seems to have made the issue worse, with other errors. Is there no easy way to downgrade from edge to major in this case? I followed the docs for this, but the error persists.

cpcmd scope:set cp.update-policy major
upcp --reset
systemctl restart apiscp
upcp -sb mail/configure-dovecot system/pam system/nss

grep -Ei 'fatal|error' /var/log/maillog | grep dovecot | tail -n25 now shows a flood of errors for all accounts, so I assume mail is down for everyone.

I’ve PM’d you.

The system had opted out of regular package updates and instead only installed package updates for 0-day exploits. IMAP logins worked as expected, and the error in the title above is irrelevant; the specific issue is that mail was not being delivered due to a mismatch between the NSS/PAM packages and maildrop.

# cpcmd scope:get system.update-policy
security-severity

# dnf check-update | wc -l
174

The system was set to disallow bugfix package updates as well as packages delivering new features. An updated maildrop package was released April 8. This package update plans ahead for compatibility with the new PAM/NSS packages coming in the next major release. It’s also backward compatible with v1 PAM/NSS modules.

At no point in the last 60 days were system packages, outside of critical security fixes, applied to the server. As a result, packages delivering critical features or regular bugfixes were skipped as well.

If you opt out of regular package updates, consider periodically performing a full package update to ensure you are properly protected.

Once maildrop was updated, local mail delivery continued as expected.
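For the periodic check suggested above, note that `dnf check-update | wc -l` counts every output line, which may include a metadata line and a blank separator. A rough sketch that counts only lines shaped like `name.arch version repo`. `count_pending_updates` is a hypothetical helper, and the three-column assumption will miscount if dnf wraps long names or prints an obsoletes section:

```shell
# Hypothetical helper: read `dnf check-update` output on stdin and count
# lines with exactly three fields whose first field ends in ".arch".
count_pending_updates() {
    awk 'NF == 3 && $1 ~ /\.[A-Za-z0-9_]+$/ { n++ } END { print n + 0 }'
}

# Usage:
#   dnf check-update | count_pending_updates
# If the number keeps growing under a restrictive update policy, a full
# `dnf upgrade` during a maintenance window is one way to catch up.
```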