Ever run into this one? Under Leopard (OSX 10.5.8), the above message is posted to the system console every time cron launches a process. If you use cron for anything and the console for anything, this can be very, very annoying. Apple’s known about the problem for years, and has done nothing. Since I fall into that class of folks who use both the console and cron, I’ve been more or less quietly steaming about it. I finally decided to do something about it. Be warned; the following is technical and to-the-metal hackery. Don’t try this at home unless you’re very confident in your skill(z).

Now, as this hack is fairly specific to the problem, I don’t think it can hurt anything else (if performed correctly), and it serves my needs perfectly. Your mileage may differ for any number of reasons, and if you’re not a technical person, you should stop reading now and just forget you ever saw this post. Really. Stop now.

No? Still here? Ok, then…

Before you start, make a copy of the unmolested version of launchd, or you may be very, very sorry. I would also recommend that this only be undertaken if you have a second Mac around that you can use to get at the HD of your machine via FireWire if you foul this up and it fails to reboot — because if launchd won’t run, neither will much of anything else. You’ve been warned: this has to be done exactly right, or you may find yourself bringing your machine back up from the stone age, and cursing everything in sight.

Still here? LOL. Ok then, you asked for it:

Searching for the error message, or fragments of it, through the 10.5.8 OSX source code at http://www.opensource.apple.com/ eventually turns up launchd_core_logic.c:



kern_return_t
job_mig_post_fork_ping(job_t j, task_t child_task)
{
    struct machservice *ms;

    if (!launchd_assumes(j != NULL)) {
        return BOOTSTRAP_NO_MEMORY;
    }

    job_log(j, LOG_DEBUG, "Post fork ping.");

    job_setup_exception_port(j, child_task);

    SLIST_FOREACH(ms, &special_ports, special_port_sle) {
        if (j->per_user && (ms->special_port_num != TASK_ACCESS_PORT)) {
            /* The TASK_ACCESS_PORT funny business is to workaround 5325399. */
            continue;
        }

        errno = task_set_special_port(child_task, ms->special_port_num, ms->port);

        if (errno) {
            int desired_log_level = LOG_ERR;

            if (j->anonymous) {
                /* 5338127 */

                desired_log_level = LOG_WARNING;

                if (ms->special_port_num == TASK_SEATBELT_PORT) {
                    desired_log_level = LOG_DEBUG;
                }
            }

            job_log(j, desired_log_level, "Could not setup Mach task special port %u: %s", ms->special_port_num, mach_error_string(errno));
        }
    }

    job_assumes(j, launchd_mport_deallocate(child_task) == KERN_SUCCESS);

    return 0;
}

So basically, as the chunk of launchd source above shows, there’s a single function call that reports this particular error. And pretty much just this particular error.

It has five parameters, one of which is returned from a call to mach_error_string(). Apart from the job handle j, the only parameters that are pointers (carrying the implication that the called routine could somehow get back at the original data) are the two strings, and self-modification of a static format string… nah. That’s not the kind of thing the serious Koolaid drinkers at Apple would ever do. So it is clear that the called procedure isn’t doing anything to those parameters; they are just used to report the error if it occurs, because if the error doesn’t occur, the call isn’t made. So, we don’t need to make the call for this routine to function correctly. Ah-ha. :)

This procedure call to mach_error_string turns out to be a unique signature in the 10.5.8 version of launchd — it only occurs once in the source code. That in turn allowed me to precisely locate the exact binary machine code within the launchd executable on my machine using a disassembler.

Once there, it was pretty obvious what was going on: there were the appropriate number of move instructions for the number of parameters to the call, plus a little bit of indirect parameter retrieval, then the call itself, after which the loop condition is checked and the loop is either exited or re-run. Here’s the relevant portion of the disassembly:



+193    0000fcd9  e87e050100              calll       0x0002025c                    _mach_error_string
+198    0000fcde  89442410                movl        %eax,0x10(%esp)
+202    0000fce2  8b462c                  movl        0x2c(%esi),%eax
+205    0000fce5  c7442408b8b90100        movl        $0x0001b9b8,0x08(%esp)
+213    0000fced  895c2404                movl        %ebx,0x04(%esp)
+217    0000fcf1  893c24                  movl        %edi,(%esp)
+220    0000fcf4  c1e808                  shrl        $0x08,%eax
+223    0000fcf7  25ff030000              andl        $0x000003ff,%eax
+228    0000fcfc  8944240c                movl        %eax,0x0c(%esp)
+232    0000fd00  e857afffff              calll       0x0000ac5c

The binary signature of the call (again, I emphasize, in the 10.5.8 version of launchd on my machine) is: E857AFFFFF. There’s only one instance of this binary string in the entire file, so again, an easy marker.
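If you'd rather not trust your eyes in a hex editor, the "only one instance" check can be sketched in a few lines of Python 3. This is my own illustration, not part of the original procedure: the function names are made up, and the path you pass in is whatever you called your working copy of launchd. Note that the offsets it reports are file offsets, which need not equal the addresses shown in the disassembly.

```python
# Sketch: list every place a byte signature occurs in a binary file.
from pathlib import Path

# The 5-byte calll instruction from the disassembly (10.5.8 launchd only).
CALL_SIGNATURE = bytes.fromhex("E857AFFFFF")


def find_all(data: bytes, pattern: bytes) -> list:
    """Return the offset of every match of pattern in data (overlaps included)."""
    offsets = []
    start = data.find(pattern)
    while start != -1:
        offsets.append(start)
        start = data.find(pattern, start + 1)
    return offsets


def signature_offsets(path: str, pattern: bytes = CALL_SIGNATURE) -> list:
    """Read the whole file at path and list the offsets where pattern occurs."""
    return find_all(Path(path).read_bytes(), pattern)
```

Calling signature_offsets() on a copy of the binary should hand back a list with exactly one entry; if it comes back empty or with several entries, you are not looking at the same build described here, and nothing that follows applies.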

What we need to do here is replace that with something harmless (in context) of the same length. Nearby, there’s an immediate AND instruction for eax; that’ll do. The value in eax isn’t used again as the loop ends after the call. The binary for that AND instruction is: 25FF030000. Even if, worst case, at load time, the code were relinked so that (what was) the calll address was changed, all that would happen is the AND instruction would have a different AND pattern. So no matter what, this should be ok to do.
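To see why the two instructions are interchangeable length-wise, it helps to decode them by hand: each is a one-byte opcode followed by a 32-bit little-endian operand. For the call (opcode E8) the operand is a signed displacement from the address of the next instruction; for the AND (opcode 25) it is simply the mask. Here is a small Python 3 sketch of that decoding; the helper names are mine, and the addresses and bytes are taken straight from the disassembly listing.

```python
# Sketch: decode the two 5-byte x86 instructions discussed in the text.
import struct


def call_target(insn_addr: int, insn_bytes: bytes) -> int:
    """Decode a 5-byte `call rel32` (opcode E8) and return its absolute target."""
    if len(insn_bytes) != 5 or insn_bytes[0] != 0xE8:
        raise ValueError("not a 5-byte E8 call")
    # Signed little-endian displacement following the opcode byte.
    (rel32,) = struct.unpack("<i", insn_bytes[1:])
    # The displacement is relative to the address of the *next* instruction.
    return (insn_addr + 5 + rel32) & 0xFFFFFFFF


def and_eax_mask(insn_bytes: bytes) -> int:
    """Decode a 5-byte `and eax, imm32` (opcode 25) and return the mask."""
    if len(insn_bytes) != 5 or insn_bytes[0] != 0x25:
        raise ValueError("not a 5-byte 25 and-immediate")
    (imm32,) = struct.unpack("<I", insn_bytes[1:])
    return imm32


print(hex(call_target(0xFCD9, bytes.fromhex("e87e050100"))))  # 0x2025c
print(hex(call_target(0xFD00, bytes.fromhex("e857afffff"))))  # 0xac5c
print(hex(and_eax_mask(bytes.fromhex("25ff030000"))))         # 0x3ff
```

The decoded targets match what the disassembler printed (0x0002025c for _mach_error_string, 0x0000ac5c for the logging call), and the mask matches the $0x000003ff in the listing.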

So we replace the E857AFFFFF with 25FF030000, and now what the code does is load up those parameters, AND the eax register to no point at all, and go on with life. No more “Could not setup Mach task special port %u: %s” messages. Ever. And launchd will continue to work just like it always did, because all that has been done here is to excise a call to log an error message.
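For the replacement itself, any hex editor will do. If you prefer something scripted, here is a cautious Python 3 sketch of the same swap. It is an illustration under my own assumptions, not the tool I used: the function names and file paths are hypothetical, it only ever writes to a separate output file, and it refuses to do anything unless the signature appears exactly once.

```python
# Sketch: swap one 5-byte instruction for another in a copy of a binary.
from pathlib import Path

OLD = bytes.fromhex("E857AFFFFF")  # the calll that reaches the logging routine
NEW = bytes.fromhex("25FF030000")  # andl $0x000003ff,%eax (same length)


def patch_once(data: bytes, old: bytes = OLD, new: bytes = NEW) -> bytes:
    """Return data with the single occurrence of old replaced by new."""
    if len(old) != len(new):
        raise ValueError("replacement must be the same length as the original")
    count = data.count(old)  # counts non-overlapping matches
    if count != 1:
        raise ValueError(f"expected exactly one match, found {count}")
    return data.replace(old, new, 1)


def patch_file(src: str, dst: str) -> None:
    """Read src, patch it in memory, and write the result to dst."""
    patched = patch_once(Path(src).read_bytes())
    Path(dst).write_bytes(patched)
```

Something like patch_file("launchd.orig", "launchd.patched") leaves the original untouched. The output file is created with default permissions, so its ownership and mode would still need to match the original before it could stand in for the real thing.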

Now… just a couple closing remarks.

First, although the hack is quite specific, it isn’t quite what I’d call surgically precise; it is possible that there might be lurking somewhere a situation that would legitimately call for this message, or a message emitted using this format string and parameters with a different port number and/or string at the end; and that’s not going to be logged if it happens. So be aware of that. Error handling, such as it is, won’t change, but the logging… that’s now impossible.

Second, this is hacking in the classic sense; for me, it was both fun and very satisfying, serving the purpose of raising my central digit in Apple’s general direction for inconveniencing me with something they could have easily fixed themselves in a matter of seconds; for you, though, if you’re not comfortable with machine code and binary, and/or not very certain you can do this exactly right… and you’re not completely prepared to have to firewire into your machine and replace the hacked launchd with the original again… or you don’t have 10.5.8… don’t even try it. Just write Apple and tell ’em to fix their broken launchd.

So… Apple gives us a bug; drops the ball on fixing it; I entirely lost patience and hacked it out of my face. And there you have it. Pbbffft.