My mother’s Yahrzeit is coming up, and her name will be on the Kaddish list this Shabbat, so perhaps it’s appropriate that I’m making a posting she would have considered complete gibberish.
For the last three weeks, my MacBook Pro has been giving me fits. When I tried to start a program, sometimes it just wouldn’t start. And, when I looked in /var/log/system.log, it was littered with lovely messages like these:
Apr 17 00:45:31 dssmac com.apple.launchd[103] ([0x0-0x2effefd].com.apple.systemevents): fork() failed, will try again in one second: Resource temporarily unavailable
Apr 17 00:45:31 dssmac com.apple.launchd[103] ([0x0-0x2effefd].com.apple.systemevents): Bug: launchd_core_logic.c:6780 (23714):35: jr->p
Apr 17 00:45:36 dssmac /usr/bin/osascript[13552]: spawn_via_launchd() failed, errno=12 label=[0x0-0x2f01eff].com.apple.systemevents path=/System/Library/CoreServices/System Events.app/Contents/MacOS/System Events flags=1
Apr 17 00:45:36 dssmac com.apple.launchd[103] ([0x0-0x2f01eff].com.apple.systemevents): fork() failed, will try again in one second: Resource temporarily unavailable
Apr 17 00:45:36 dssmac com.apple.launchd[103] ([0x0-0x2f01eff].com.apple.systemevents): Bug: launchd_core_logic.c:6780 (23714):35: jr->p
Apr 17 00:45:42 dssmac /usr/bin/osascript[13553]: spawn_via_launchd() failed, errno=12 label=[0x0-0x2f03f01].com.apple.systemevents path=/System/Library/CoreServices/System Events.app/Contents/MacOS/System Events flags=1
Apr 17 00:45:42 dssmac com.apple.launchd[103] ([0x0-0x2f03f01].com.apple.systemevents): fork() failed, will try again in one second: Resource temporarily unavailable
Apr 17 00:45:42 dssmac com.apple.launchd[103] ([0x0-0x2f03f01].com.apple.systemevents): Bug: launchd_core_logic.c:6780 (23714):35: jr->p
with the occasional
Apr 15 14:41:42 dssmac kernel[0]: proc: table is full
thrown in for bad measure.
I couldn’t figure out what was going wrong (Activity Monitor showed only 60 to 80 processes, far fewer than the system limit), so yesterday I reinstalled Mac OS X using the archive-and-install method, and it didn’t help.
I was, needless to say, unhappy. I hadn’t brought my external drive to the office, so I couldn’t do a bare-metal reinstall yet. But I could (and did) tweet about my problem:
Still getting fork(1) failures (“resource not available” — which one, dammit?), so I guess it’s time for a full reinstall. Crud.
This one caught the eye of many people who wanted to help, and I want to mention two in particular:
Ed Costello thought it might be hardware — I ran the hardware diagnostics, which showed nothing.
Rich Berlin (from Sun) made the suggestion which wound up putting me on the right path — he suggested running:
sudo dtrace -n 'syscall::fork*:entry{printf("%s %d",execname,pid);}'
which showed two Eclipse-based processes forking their little hearts out. So I did a “ps” to discover what they were (unsurprisingly, Lotus Notes and Lotus Sametime), but what startled me was how many “(NotesDynConfig)” processes there were in the process table. I wondered how many, so I ran
ps -aA | wc
and was shocked to see a line count of about 160, compared with the 70 processes shown in Activity Monitor. So I stopped Notes and suddenly, I was down to 70 processes via both methods.
It seems that Activity Monitor doesn’t report zombie processes. Neither does the line at the top of “top(1)”, which I’d also used while trying to troubleshoot.
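For anyone who wants to check for the same thing, here is a rough sketch of how the zombies could be counted directly rather than inferred from two mismatched totals. It is not what I actually ran; the helper name count_zombies is made up for illustration, and it assumes your ps accepts -A and -o with the stat, pid, ppid and comm keywords and marks zombies with a state that starts with "Z".

```shell
# count_zombies is a hypothetical helper, not a standard command.
# It reads "STAT PID PPID COMMAND" rows on stdin and prints how many
# rows have a state beginning with "Z" (an exited process that its
# parent has not yet reaped).
count_zombies() {
    awk '$1 ~ /^Z/ { n++ } END { print n + 0 }'
}

# Feed it the full process table. The trailing "=" after each keyword
# suppresses the header line, so only process rows are counted.
ps -A -o stat= -o pid= -o ppid= -o comm= | count_zombies
```

Swapping count_zombies for awk '$1 ~ /^Z/' in that pipeline lists the zombie rows themselves, and the ppid column then points at the parent that isn’t cleaning up after its children.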
Given that discrepancy, I can now understand why the system was running out of processes. I don’t know why Notes is leaving zombies around, but that’s a problem for another day (my next step is to upgrade to the latest beta and see if it helps — I’ve also reported the problem, of course).
And I guess I probably don’t really have to do a full reinstall…though I might, anyway — it’s my Windows training coming to the fore.