2023-01-20
This commit is contained in:
parent
c55ceaed1f
commit
31a4c35b4a
272
homepage
@@ -525,6 +525,278 @@ pre { /* DRY who? */
|
||||
}
|
||||
|
||||
|
||||
/blah/2024-01-20.html
|
||||
|
||||
: why mm(1)
|
||||
|
||||
I started working on mm(1) probably around 2020-2021, when I was first
|
||||
acquainting myself with the inner workings of UNIX-like operating systems which
|
||||
I had been using for a couple years by then. I can't remember how I noticed it
|
||||
but it bothered me that there was this cat(1p) utility which took multiple
|
||||
input files and streamed them successively to standard output:
|
||||
|
||||
[ input ] [ input ] [ input ]...
     |_______ | _______|
           _|_|_|_
          |       |
          |cat(1p)|
          |_______|
              |
              V
       standard output
|
||||
|
||||
And then this tee(1p) utility which took from standard input and streamed its
|
||||
bytes to multiple outputs:
|
||||
|
||||
       standard input
              V
           ___|___
          |       |
          |tee(1p)|
          |_______|
      ______| | |__________
     |        |            |
[ output ] [ output ] [ output ]...
|
||||
|
||||
And they were separate utilities despite both doing the job of writing input(s)
|
||||
to output(s). I imagined a hypothetical utility mm(1) that does it all:
|
||||
|
||||
[ input ] [ input ] [ input ]...
     |_______ | _______|
           _|_|_|_
          |       |
          | mm(1) |
          |_______|
      ______| | |__________
     |        |            |
[ output ] [ output ] [ output ]...
|
||||
|
||||
I attempted to write this magical "mm" (as in "middleman") utility, which
would act as a middleman for streams, but gave up for lack of C and POSIX API
experience and spent a couple of years practicing on easier programs in UNIX
environments.
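
As a rough illustration of the idea in the last diagram, here is a minimal
sketch in C of streaming several inputs, in order, to several outputs. It is
not mm(1)'s actual source: copy_all() is a hypothetical name, and the sketch
assumes the streams were already opened by the caller and uses buffered stdio
for brevity.

```c
#include <stdio.h>

/* Hypothetical helper, not taken from mm(1): stream every input, in
 * order, to every output. Returns 0 on success and -1 on the first
 * read or write error. */
int copy_all(FILE **in, size_t nin, FILE **out, size_t nout){
	char buf[4096];
	size_t i, o, n;

	for(i = 0; i < nin; ++i){
		/* read a chunk from the current input... */
		while((n = fread(buf, 1, sizeof buf, in[i])) > 0)
			/* ...and write that same chunk to every output */
			for(o = 0; o < nout; ++o)
				if(fwrite(buf, 1, n, out[o]) != n)
					return -1;
		if(ferror(in[i]))
			return -1;
	}
	return 0;
}
```

With one output this behaves like the cat(1p) diagram and with one input like
the tee(1p) diagram; the nested loop is the only thing the combination adds.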
|
||||
|
||||
There are a couple reasons to implement cat(1p) and tee(1p) as separate
|
||||
utilities:
|
||||
|
||||
1) Ease of implementation
|
||||
|
||||
Differentiating input arguments from output arguments would require
either a separator mark (which would be inelegant and would exclude
that mark from being a usable file name) or option parsing.
|
||||
|
||||
Imagine a separator mark in the context of a hypothetical utility
|
||||
insouts(1):
|
||||
|
||||
$ PS1='\n$ '
|
||||
|
||||
$ insouts -h
|
||||
Usage: insouts (input...) "][" (output...)
|
||||
|
||||
$ printf %s\\n hello\ world
|
||||
hello world
|
||||
|
||||
$ printf %s\\n hello\ world >in1
|
||||
|
||||
$ insouts <in1
|
||||
hello world
|
||||
|
||||
$ insouts in1 ][ out1
|
||||
|
||||
$ insouts <out1
|
||||
hello world
|
||||
|
||||
$ insouts <in1 >][
|
||||
|
||||
$ insouts ][ ][ /dev/stdout
|
||||
Usage: insouts (input...) "][" (output...)
|
||||
|
||||
$ insouts ./][ ][ /dev/stdout
|
||||
hello world
|
||||
|
||||
What a mess! The file ][ can no longer easily be used with insouts(1),
|
||||
which may be acceptable (it's not a sensible file name anyway), but
|
||||
it's sacrificed for horrendously ugly syntax featuring stressfully
|
||||
unmatched square brackets.
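
To make the trade-off concrete, here is a minimal sketch in C of how a
separator-mark parser might locate the mark. insouts(1) is hypothetical, and so
is find_separator(); the sketch only finds the first argument equal to the mark
and leaves open what should happen when the mark appears a second time.

```c
#include <string.h>

/* Hypothetical: return the index of the first argument that is exactly
 * the separator mark, or -1 if there is none. Arguments before that
 * index would be inputs and arguments after it outputs. */
int find_separator(int argc, char *argv[], const char *mark){
	int i;

	for(i = 1; i < argc; ++i)
		if(strcmp(argv[i], mark) == 0)
			return i;
	return -1;
}
```

Because the comparison is on the whole argument, "./][" does not match "][",
which is why spelling the file name with a leading "./" gets around the mark.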
|
||||
|
||||
I've written programs that use separator marks for arguments, namely
pscat(1), psrelay(1), and psroute(1) so far. Their particular flavor of
marker comes with a number of additional caveats, and I've been
hesitant about the syntax since I came up with it half a year ago. Best
not to make more things about which to fret.
|
||||
|
||||
Now imagine option parsing:
|
||||
|
||||
$ PS1='\n$ '
|
||||
|
||||
$ insouts
|
||||
Usage: insouts (-i [input])... (-o [output])...
|
||||
|
||||
$ insouts -i in1
|
||||
hello world
|
||||
|
||||
$ insouts -i in1 -i ][ -i out1
|
||||
hello world
|
||||
hello world
|
||||
hello world
|
||||
|
||||
This works for everything and is how mm(1) works. The issue is with
the code itself. Imagine a very basic cat(1) implementation in C:
|
||||
|
||||
#include <stdio.h>
int main(int argc, char *argv[]){
	int c;   /* current byte, or EOF */
	FILE *f; /* current input file */
	int i;   /* index into argv */

	for(i = 1; i < argc; ++i){
		if((f = fopen(argv[i], "r")) == NULL){ /* fopen(3p) needs a mode */
			perror(argv[i]);
			return 1;
		}
		while((c = getc(f)) != EOF)
			putchar(c);
		fclose(f);
	}
}
|
||||
|
||||
This doesn't conform to POSIX (which requires 'cat -u' to be supported)
|
||||
but illustrates the ease of using cat(1)'s arguments: For each
|
||||
argument, open it as a file, write it out, close it, and that's it.
|
||||
|
||||
mm(1)'s option parsing for '-i' and '-o' is, as of writing, 24 lines on
its own, excluding the functions it calls. The above program is 16
lines of code. Some of this weight comes from supporting "-" as
shorthand for /dev/stdin or /dev/stdout (depending on whether it was
given to '-i' or '-o') and from trying to create an output file if it
doesn't exist. Without these two features, neither of which the above
program supports, the code for '-i' and '-o' would be considerably
lighter. The point stands, though: option parsing adds complexity that
can be avoided by simply having two utilities.
|
||||
|
||||
Furthermore, options have drawbacks for users.
|
||||
|
||||
2) Ease of use
|
||||
|
||||
One relatively common use of cat(1p) is to catenate all files matching
|
||||
a glob pattern. Imagine:
|
||||
|
||||
$ PS1='\n$ '
|
||||
|
||||
$ ls
|
||||
in1
|
||||
in2
|
||||
in3
|
||||
|
||||
$ cat <in1
|
||||
hello
|
||||
|
||||
$ cat <in2
|
||||
world
|
||||
|
||||
$ cat <in3
|
||||
!!!
|
||||
|
||||
$ cat in*
|
||||
hello
|
||||
world
|
||||
!!!
|
||||
|
||||
This use becomes much more tedious with option parsing, because the
shell can't put '-i' in front of each file the glob expands to:
|
||||
|
||||
$ for f in in*; do mm -i "$f"; done
|
||||
hello
|
||||
world
|
||||
!!!
|
||||
|
||||
It is harder still with multiple outputs rather than inputs, as with
tee(1p):
|
||||
|
||||
$ ls
|
||||
in1
|
||||
in2
|
||||
in3
|
||||
|
||||
$ touch out1 out2 out3
|
||||
|
||||
$ ls
|
||||
in1
|
||||
in2
|
||||
in3
|
||||
out1
|
||||
out2
|
||||
out3
|
||||
|
||||
$ cat in* | tee out*
|
||||
|
||||
$ cat <out2
|
||||
hello
|
||||
world
|
||||
!!!
|
||||
|
||||
$ for f in out*; do for g in in*; do mm -i "$g"; done >"$f"; done
|
||||
|
||||
$ mm <out2
|
||||
hello
|
||||
world
|
||||
!!!
|
||||
|
||||
3) Separation of concepts
|
||||
|
||||
cat(1p) accepts inputs. tee(1p) accepts outputs. It's possible to pipe
|
||||
cat(1p) to tee(1p) to glean the benefits of multiple inputs and
|
||||
multiple outputs without mm(1).
|
||||
|
||||
So why on earth should cat(1p) and tee(1p) be supported by the same utility?
|
||||
|
||||
Both cat(1p) and tee(1p) must support options according to POSIX, which in
practice means using getopt(3p) from <unistd.h>. While '-i' and '-o' are 24
lines in total, the rest of the options logic is needed for cat(1p) and tee(1p)
anyway and outweighs the '-i' and '-o' options. Much of the '-i' and '-o' logic
is also still necessary in both cat(1p) and tee(1p): supporting "-" and, in
tee(1p)'s case, creating an output if it doesn't exist. Though there is
additional memory juggling due to supporting arbitrary inputs and outputs, in
most uses actual memory use isn't noticeably affected (10 extra bytes for 5
file arguments, or one tenth of the data used by this parenthetical statement).
|
||||
|
||||
It is possible to write implementations of cat(1p) and tee(1p) in POSIX shell
|
||||
script as wrappers on mm(1) and I have done so, so users who want to use globs
|
||||
can simply call cat or tee as usual.
|
||||
|
||||
mm -i input -o output tends to be intuitive for existing shell users once they
|
||||
learn the name "middleman".
|
||||
|
||||
|
||||
/blah/2024-01-17.html
|
||||
|
||||
Read American Psycho (1991). I need a cigarette really, really bad.
|
||||
|
||||
I can't afford to renew my SourceHut account right now so these blog posts are
|
||||
going up on my wobsite in A Bit, whenever I get around to manually building
|
||||
them. I might set up a build server on feeling.murderu.us for small jobs but I
|
||||
don't know. I also want to set up a proper VPS for trinity.moe but $60/year
|
||||
(for Capsul) is a hell of a lot more than $20/year for SourceHut.
|
||||
|
||||
It feels weird to have long fingernails.
|
||||
|
||||
The Japanese Zen monk tradition according to No Recipe (2018) which someone
|
||||
with which I'm staying is reading is to not have animals killed specifically
|
||||
for you but always eat what you are served. I interpret this as well-spirited
|
||||
and not a rule to dance around, having others act as go-betweens, because that
|
||||
would suck. I sort of like this and have been rethinking veganism because it is
|
||||
really inconvenient to have to restrict others' treatment of me; that is, I
|
||||
can't eat meat that was prepared for me by people who don't know I'm vegan.
|
||||
Most people don't have a good conception of what is and isn't vegan and will
|
||||
serve me things that aren't vegan unknowingly.
|
||||
|
||||
I wish everyone was vegan but I don't wish to impose my will on others.
|
||||
|
||||
I feel shame at the notion that I have eaten something that died, except when
|
||||
it comes to humans, at which notion I instead feel powerful, because I'm fucked
|
||||
in the head.
|
||||
|
||||
|
||||
/blah/2024-01-12.html
|
||||
|
||||
Read Finding the Still Point (2007).
|
||||
|