The design of my magic getopt
When I started writing the blog post announcing my magic getopt, I was intending to write about some of the design decisions which went into it. I changed my mind partway through writing: My readers probably cared about the functionality, but not about the ugly implementation details. It turns out that my original plan was the right one, as I've received questions about nearly every design decision I made. Since this clearly is of interest to my readers, here are the reasons behind some of the decisions I made while writing that code.
Why getopt? Why not <insert alternative command-line
processing system here>?
There are two reasons here. First, while there is a wide variety
of systems for parsing command lines, most of them are limited to
"set a flag" or "store a value" options. In the cases where they
can be instructed to execute arbitrary code when an option is found,
the code is inevitably somewhere else, eliminating the
crucial "all the option handling is done in one place" property.
Second, and more importantly: I wanted a drop-in replacement for
getopt, so that existing UNIX software can migrate easily.
So how does it work?
Each GETOPT_* statement produces a case statement based on
the line number in the source file (this is why you can't have
two such statements on the same line). When we first enter the
getopt loop, we hit every line number between the line number of
the GETOPT_SWITCH and the line number of the
GETOPT_DEFAULT (this is why that needs to occur last)
and each statement registers its line number and the option string
it handles. Finally, once the initialization is completed, we
process the command line, with GETOPT extracting options
and GETOPT_SWITCH switching on the line number of the
registered option.
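The two passes described above can be sketched roughly as follows. This is a simplified illustration and not the real implementation: OPT, OPT_END, struct optreg and parse_flags are hypothetical names invented here, only exact option strings without arguments are handled, and a plain loop stands in for the setjmp-based jump discussed further down.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAXOPTS 16

struct optreg {
	const char *name;	/* option string, e.g. "-v" */
	int line;		/* source line of the case that handles it */
};

/*
 * Hypothetical stand-ins for the GETOPT_* macros.  Each expands to a case
 * label numbered by the source line it sits on, and relies on variables
 * named opts, nopts and registering in the enclosing function.  During
 * the registration pass OPT records itself and leaves the switch;
 * afterwards control falls into the handler code which follows it.
 */
#define OPT(s)							\
	case __LINE__:						\
		if (registering) {				\
			assert(nopts < MAXOPTS);		\
			opts[nopts].name = (s);			\
			opts[nopts].line = __LINE__;		\
			nopts++;				\
			break;					\
		}
#define OPT_END							\
	case __LINE__:						\
		registering = 0;				\
		break;

/* Returns 0 on success, or -1 on an unrecognized option. */
int
parse_flags(int argc, char **argv, int *verbose, int *quiet)
{
	struct optreg opts[MAXOPTS];
	size_t nopts = 0, i;
	int registering = 1;
	int arg = 1;
	int line = __LINE__;	/* registration pass starts scanning here */

	*verbose = *quiet = 0;
	for (;;) {
		switch (line) {
		OPT("-v")
			(*verbose)++;
			break;
		OPT("-q")
			*quiet = 1;
			break;
		OPT_END
		default:
			break;	/* a line with no option on it */
		}
		if (registering) {
			line++;		/* visit the next line number */
			continue;
		}

		/* Dispatch pass: map the next argument to a line number. */
		if (arg >= argc)
			return (0);
		line = -1;
		for (i = 0; i < nopts; i++) {
			if (strcmp(argv[arg], opts[i].name) == 0)
				line = opts[i].line;
		}
		if (line == -1)
			return (-1);
		arg++;
	}
}
```

Calling parse_flags with the arguments "-v -q -v" leaves *verbose at 2 and *quiet at 1. Note that OPT_END has to come after every OPT, for the same reason that GETOPT_DEFAULT has to come last: it is what ends the registration pass.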
Why __LINE__? Why not __COUNTER__?
Because __COUNTER__ is not standardized, and thus not
portable. It would be tempting to check if __COUNTER__
is available and use it if present — this would avoid the
"only one option per line" restriction — but in fact this
would be a horrible idea: Code which had multiple options
on the same line would compile without warnings right up until
someone used a compiler which lacked __COUNTER__. Better
to use __LINE__ and avoid encouraging people
to write unportable code.
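To see where the one-option-per-line restriction comes from, here is a tiny sketch. LINE_TAG and the tag_* variables are hypothetical names used only for this illustration; the point is simply that every use of __LINE__ on a given line yields the same number.

```c
/* Hypothetical helper: expands to the number of the line it is used on. */
#define LINE_TAG	__LINE__

static const int tag_a = LINE_TAG;
static const int tag_b = LINE_TAG;
static const int tag_c = LINE_TAG, tag_d = LINE_TAG;	/* one shared line */

/* Tags taken from separate lines are distinct, so they can label cases. */
int
tags_on_separate_lines_differ(void)
{
	return (tag_a != tag_b);
}

/*
 * Tags taken from one line are equal.  Used as case labels in a single
 * switch, they would be rejected by the compiler as duplicate cases.
 */
int
tags_on_one_line_collide(void)
{
	return (tag_c == tag_d);
}
```

Both functions return 1. Because the collision is a compile-time error with every compiler, the restriction is enforced everywhere rather than only on compilers which lack __COUNTER__.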
Isn't iterating through every line quite slow?
Not at all. It takes just a few clock cycles to hit the
default case statement and jump back; even with a getopt
loop 1000 lines long you'd only be wasting a few microseconds. For
large loops the far larger cost is to compare each encountered
option against all of the registered option strings, and this is
the same in every getopt_long implementation I've read.
In the extremely unlikely event that options-parsing becomes a
performance problem, I would recommend switching the option
search to use a binary search and thus far fewer string compares.
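A minimal sketch of that binary search, assuming the registered options are kept in an array which can be sorted once after registration. The names struct optreg, sort_options and find_option are hypothetical, and this version only matches complete option strings; it makes no attempt to handle abbreviations or "--option=value" forms.

```c
#include <stdlib.h>
#include <string.h>

struct optreg {
	const char *name;	/* option string, e.g. "--help" */
	int line;		/* line number registered for this option */
};

/* Orders two registrations by option string. */
static int
cmp_optreg(const void *a, const void *b)
{
	const struct optreg *x = a;
	const struct optreg *y = b;

	return (strcmp(x->name, y->name));
}

/* Sort the table once, after all options have been registered. */
void
sort_options(struct optreg *opts, size_t n)
{
	qsort(opts, n, sizeof(*opts), cmp_optreg);
}

/* Returns the registered line number for arg, or -1 if there is none. */
int
find_option(const struct optreg *opts, size_t n, const char *arg)
{
	struct optreg key;
	const struct optreg *hit;

	key.name = arg;
	key.line = 0;
	hit = bsearch(&key, opts, n, sizeof(*opts), cmp_optreg);
	return (hit != NULL ? hit->line : -1);
}
```

With n registered options, each lookup then costs on the order of log2(n) string compares instead of n.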
Why <setjmp.h>?
It's possible for a program to have multiple getopt loops; this
may be desirable for programs which operate differently depending
on the name with which they are invoked, for example. Unfortunately,
during the initial option-registering loop, it's necessary to jump
back to the top — but we can't use goto because the
identifiers of labeled statements in C have function scope, and
someone might want to put two getopt loops into the same function.
Automatic variables in C, however, have block scope — and so I
use setjmp and longjmp simply to perform a block-scope goto.
(Incidentally, I considered using these in a computed-jump lookup
table; but it turned out that it wouldn't accomplish anything which
I wasn't already getting via the case statement.)
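As a small illustration of the block-scope goto idea (a sketch only; run_two_loops and the variable names are hypothetical and this is not the code from the getopt header), here are two loops in one function, each jumping back to its own "top". Writing the same thing with two top: labels and goto would not compile, since both labels would live in the function's single label namespace.

```c
#include <setjmp.h>

/* Runs two independent n-iteration loops and returns 2 * n (for n >= 0). */
int
run_two_loops(int n)
{
	int total = 0;

	{
		jmp_buf top;		/* block scope: belongs to this loop */
		volatile int i = 0;	/* modified between setjmp and longjmp */

		setjmp(top);		/* plays the role of a "top:" label */
		if (i < n) {
			i++;
			longjmp(top, 1);	/* plays the role of "goto top" */
		}
		total += i;
	}
	{
		jmp_buf top;		/* same name as above, no clash */
		volatile int i = 0;

		setjmp(top);
		if (i < n) {
			i++;
			longjmp(top, 1);
		}
		total += i;
	}
	return (total);
}
```

The counters are declared volatile because the C standard leaves the values of non-volatile automatic variables indeterminate after a longjmp if they were changed since the matching setjmp.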
Why does GETOPT_MISSING_ARG: disable warnings?
Because that's how getopt works. Beyond that? Beats me.