We had a server effectively go down this morning. SSH access cut out, and at least temporarily network access went down as well. We were able to log in using out-of-band access and were presented with a screen full of "Init: cannot fork, retry.." messages.
When we tried to log in with a valid userid and a bad password, we got the normal "invalid user/pass" error. However, when we typed in a correct userid and password, we were simply shown the MOTD and then the login prompt again. It looks like the system was no longer able to launch any new processes (a successful login should launch a shell; if it can't, I guess it drops you back at login?).
I found a description of the issue at Red Hat's knowledgebase (https://access.redhat.com/site/solutions/39497), but there is very little supplementary information on the error, just a suggested solution.
What exactly does nproc do? Is it a hard limit on the number of processes the system can have running at any point in time? When nproc is exceeded does it cause impacts like we saw? Is there any way to set it to unlimited? If not, how can we know what a safe or unsafe range is?
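For context, here is a minimal sketch of how we are checking the current value. It assumes the "nproc" in the Red Hat article is the per-user RLIMIT_NPROC resource limit (the one `ulimit -u` reports) and that Python is available on the box. The helper names are my own, purely for illustration.

```python
import resource


def describe_limit(value):
    """Render an rlimit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def nproc_limits():
    """Return the (soft, hard) RLIMIT_NPROC pair for the current process."""
    return resource.getrlimit(resource.RLIMIT_NPROC)


soft, hard = nproc_limits()
print("nproc soft limit:", describe_limit(soft))
print("nproc hard limit:", describe_limit(hard))
```

This only reports the limit inherited by whatever shell or service runs it, so the numbers may differ between users and between login sessions and daemons.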
Any help or guidance would be very much appreciated, since it caused production issues and is now on the plate of several layer-8 folks :(
Edit: Also in /var/log/messages:
May 31 15:26:00 servername udevd[1637]: udev_event_run: fork of child failed: Resource temporarily unavailable
May 31 15:26:00 servername last message repeated 3 times
May 31 15:26:00 servername udevd-event[2461]: run_program: fork of '/lib/udev/udev_run_hotplugd' failed: Resource temporarily unavailable
May 31 15:26:00 servername udevd-event[2461]: run_program: fork of '/lib/udev/udev_run_devd' failed: Resource temporarily unavailable
May 31 15:26:00 servername udevd[1637]: udev_event_run: fork of child failed: Resource temporarily unavailable
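As far as I can tell, "Resource temporarily unavailable" in these messages is the standard text for errno EAGAIN, which fork() returns when a process or thread limit has been reached. Below is a small sketch of how a script could recognise that condition. The function name is hypothetical, and treating every EAGAIN from fork as a limit problem is an assumption on my part.

```python
import errno
import os


def is_fork_limit_error(exc):
    """Return True if exc is an OSError carrying EAGAIN, the errno fork() reports at a process limit."""
    return isinstance(exc, OSError) and exc.errno == errno.EAGAIN


# Build the same error a failed fork would raise, without actually forking.
sample = OSError(errno.EAGAIN, os.strerror(errno.EAGAIN))
print(errno.errorcode[errno.EAGAIN], "->", os.strerror(errno.EAGAIN))
print("looks like a fork limit error:", is_fork_limit_error(sample))
```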