2023-01-20
parent
c55ceaed1f
commit
31a4c35b4a
272
homepage
@@ -525,6 +525,278 @@ pre { /* DRY who? */
|
|||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
|
/blah/2024-01-20.html
|
||||||
|
|
||||||
|
: why mm(1)
|
||||||
|
|
||||||
|
I started working on mm(1) probably around 2020-2021, when I was first
|
||||||
|
acquainting myself with the inner workings of UNIX-like operating systems which
|
||||||
|
I had been using for a couple years by then. I can't remember how I noticed it
|
||||||
|
but it bothered me that there was this cat(1p) utility which took multiple
|
||||||
|
input files and streamed them successively to standard output:
|
||||||
|
|
||||||
|
[ input ] [ input ] [ input ]...
    |_______  |  _______|
           _|_|_|_
          |       |
          |cat(1p)|
          |_______|
              |
              V
       standard output
|
||||||
|
|
||||||
|
And then this tee(1p) utility which took from standard input and streamed its
|
||||||
|
bytes to multiple outputs:
|
||||||
|
|
||||||
|
       standard input
              V
           ___|___
          |       |
          |tee(1p)|
          |_______|
     ______|  |  |__________
    |         |             |
[ output ] [ output ] [ output ]...
|
||||||
|
|
||||||
|
And they were separate utilities despite both doing the job of writing input(s)
|
||||||
|
to output(s). I imagined a hypothetical utility mm(1) that does it all:
|
||||||
|
|
||||||
|
[ input ] [ input ] [ input ]...
    |_______  |  _______|
           _|_|_|_
          |       |
          | mm(1) |
          |_______|
     ______|  |  |__________
    |         |             |
[ output ] [ output ] [ output ]...
|
||||||
|
|
||||||
|
I attempted to write this magical "mm" (as in "middleman") utility, which would
act as a middleman for streams, before giving up (due to lack of C or POSIX API
experience) and spending a couple years practicing on easier programs in UNIX
environments.
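
The combined diagram above boils down to a nested loop: read each input in
order and write every byte to every output. The following is a minimal sketch of
that idea only, not mm(1)'s actual code; the name copy_all and its signature
are hypothetical, and it assumes the streams have already been opened.

```c
#include <stdio.h>

/* Hypothetical helper, not taken from mm(1): streams each input, in order,
 * to every output. Returns 0 on success, -1 on a read or write error. */
int copy_all(FILE *in[], size_t n_in, FILE *out[], size_t n_out){
	int c;
	size_t i, o;

	for(i = 0; i < n_in; ++i){
		while((c = getc(in[i])) != EOF)
			for(o = 0; o < n_out; ++o)
				if(putc(c, out[o]) == EOF)
					return -1;
		if(ferror(in[i]))
			return -1;
	}
	/* Flush so the caller can rely on the data having been written. */
	for(o = 0; o < n_out; ++o)
		if(fflush(out[o]) == EOF)
			return -1;
	return 0;
}
```

With one output this behaves like cat(1p); with one input, like tee(1p) minus
its copy to standard output.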
|
||||||
|
|
||||||
|
There are a couple reasons to implement cat(1p) and tee(1p) as separate
|
||||||
|
utilities:
|
||||||
|
|
||||||
|
1) Ease of implementation
|
||||||
|
|
||||||
|
Differentiating input arguments from output arguments would require
|
||||||
|
either having a separator mark (which would be inelegant and exclude
|
||||||
|
that mark from being a useable file name) or option parsing.
|
||||||
|
|
||||||
|
Imagine a separator mark in the context of a hypothetical utility
|
||||||
|
insouts(1):
|
||||||
|
|
||||||
|
$ PS1='\n$ '
|
||||||
|
|
||||||
|
$ insouts -h
|
||||||
|
Usage: insouts (input...) "][" (output...)
|
||||||
|
|
||||||
|
$ printf %s\\n hello\ world
|
||||||
|
hello world
|
||||||
|
|
||||||
|
$ printf %s\\n hello\ world >in1
|
||||||
|
|
||||||
|
$ insouts <in1
|
||||||
|
hello world
|
||||||
|
|
||||||
|
$ insouts in1 ][ out1
|
||||||
|
|
||||||
|
$ insouts <out1
|
||||||
|
hello world
|
||||||
|
|
||||||
|
$ insouts <in1 >][
|
||||||
|
|
||||||
|
$ insouts ][ ][ /dev/stdout
|
||||||
|
Usage: insouts (input...) "][" (output...)
|
||||||
|
|
||||||
|
$ insouts ./][ ][ /dev/stdout
|
||||||
|
hello world
|
||||||
|
|
||||||
|
What a mess! The file ][ can no longer easily be used with insouts(1),
|
||||||
|
which may be acceptable (it's not a sensible file name anyway), but
|
||||||
|
it's sacrificed for horrendously ugly syntax featuring stressfully
|
||||||
|
unmatched square brackets.
|
||||||
|
|
||||||
|
I've written programs that have used separator marks for arguments,
|
||||||
|
namely pscat(1), psrelay(1), and psroute(1) so far, and there are a
|
||||||
|
number of additional caveats that come with their particular flavor of
|
||||||
|
marker and I've been hesitant about the syntax since I came up with it
|
||||||
|
half a year ago. Best not to make more things about which to fret.
|
||||||
|
|
||||||
|
Now imagine option parsing:
|
||||||
|
|
||||||
|
$ PS1='\n$ '
|
||||||
|
|
||||||
|
$ insouts
|
||||||
|
Usage: insouts (-i [input])... (-o [output])...
|
||||||
|
|
||||||
|
$ insouts -i in1
|
||||||
|
hello world
|
||||||
|
|
||||||
|
$ insouts -i in1 -i ][ -i out1
|
||||||
|
hello world
|
||||||
|
hello world
|
||||||
|
hello world
|
||||||
|
|
||||||
|
This works for everything and is how mm(1) works. The issue is with
|
||||||
|
regards to code itself. Imagine a very basic cat(1) implementation in
|
||||||
|
C:
|
||||||
|
|
||||||
|
#include <stdio.h>
int main(int argc, char *argv[]){
	int c;
	FILE *f;
	int i;

	for(i = 1; i < argc; ++i){
		if((f = fopen(argv[i], "r")) == NULL){
			perror(argv[i]);
			return 1;
		}
		while((c = getc(f)) != EOF)
			putchar(c);
		fclose(f);
	}
}
|
||||||
|
|
||||||
|
This doesn't conform to POSIX (which requires 'cat -u' to be supported)
|
||||||
|
but illustrates the ease of using cat(1)'s arguments: For each
|
||||||
|
argument, open it as a file, write it out, close it, and that's it.
|
||||||
|
|
||||||
|
mm(1)'s option parsing for '-i' and '-o', as of writing, is 24 lines,
excluding the functions it calls. The above program is 16 lines of code.
Some of this weight comes from supporting "-" as shorthand for /dev/stdin
or /dev/stdout (depending on whether it was used for '-i' or '-o') and from
trying to create an output file if it doesn't exist. Without these two
features, which the above program doesn't support, the code for '-i' and
'-o' would be considerably lighter, but the point is that option parsing
adds complexity that can be avoided by simply having two utilities.
|
||||||
|
|
||||||
|
Furthermore, options have drawbacks for users.
|
||||||
|
|
||||||
|
2) Ease of use
|
||||||
|
|
||||||
|
One relatively common use of cat(1p) is to catenate all files matching
|
||||||
|
a glob pattern. Imagine:
|
||||||
|
|
||||||
|
$ PS1='\n$ '
|
||||||
|
|
||||||
|
$ ls
|
||||||
|
in1
|
||||||
|
in2
|
||||||
|
in3
|
||||||
|
|
||||||
|
$ cat <in1
|
||||||
|
hello
|
||||||
|
|
||||||
|
$ cat <in2
|
||||||
|
world
|
||||||
|
|
||||||
|
$ cat <in3
|
||||||
|
!!!
|
||||||
|
|
||||||
|
$ cat in*
|
||||||
|
hello
|
||||||
|
world
|
||||||
|
!!!
|
||||||
|
|
||||||
|
This use becomes much more tedious with option parsing:
|
||||||
|
|
||||||
|
$ for f in in*; do mm -i "$f"; done
|
||||||
|
hello
|
||||||
|
world
|
||||||
|
!!!
|
||||||
|
|
||||||
|
It is harder still when it comes to multiple outputs rather than inputs,
as with tee(1p):
|
||||||
|
|
||||||
|
$ ls
|
||||||
|
in1
|
||||||
|
in2
|
||||||
|
in3
|
||||||
|
|
||||||
|
$ touch out1 out2 out3
|
||||||
|
|
||||||
|
$ ls
|
||||||
|
in1
|
||||||
|
in2
|
||||||
|
in3
|
||||||
|
out1
|
||||||
|
out2
|
||||||
|
out3
|
||||||
|
|
||||||
|
$ cat in* | tee out*
|
||||||
|
|
||||||
|
$ cat <out2
|
||||||
|
hello
|
||||||
|
world
|
||||||
|
!!!
|
||||||
|
|
||||||
|
$ for f in out*; do for g in in*; do mm -i "$g"; done >"$f"; done
|
||||||
|
|
||||||
|
$ mm <out2
|
||||||
|
hello
|
||||||
|
world
|
||||||
|
!!!
|
||||||
|
|
||||||
|
3) Separation of concepts
|
||||||
|
|
||||||
|
cat(1p) accepts inputs. tee(1p) accepts outputs. It's possible to pipe
|
||||||
|
cat(1p) to tee(1p) to glean the benefits of multiple inputs and
|
||||||
|
multiple outputs without mm(1).
|
||||||
|
|
||||||
|
So why on earth should cat(1p) and tee(1p) be supported by the same utility?
|
||||||
|
|
||||||
|
Both cat(1p) and tee(1p) according to POSIX must support options, necessitating
|
||||||
|
the use of getopt(3p) from <unistd.h>. While '-i' and '-o' are 24 lines in
|
||||||
|
total, the rest of the options logic is necessary for cat(1p) and tee(1p) and
|
||||||
|
is unavoidable and outweighs the '-i' and '-o' options, plus much of the '-i'
|
||||||
|
and '-o' logic is still necessary in both cat(1p) and tee(1p) (supporting "-"
|
||||||
|
and, in tee(1p)'s case, creating an output if it doesn't exist). Though there
|
||||||
|
is additional memory juggling due to supporting arbitrary inputs and outputs,
|
||||||
|
in most uses actual memory use isn't noticeably affected (10 extra bytes for 5
|
||||||
|
file arguments, or one tenth of the data used by this parenthetical statement).
|
||||||
|
|
||||||
|
It is possible to write implementations of cat(1p) and tee(1p) in POSIX shell
|
||||||
|
script as wrappers on mm(1) and I have done so, so users who want to use globs
|
||||||
|
can simply call cat or tee as usual.
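
Those wrappers are shell scripts and aren't reproduced here. Purely as an
illustration of the translation they would have to perform, and in C to match
the example above, this sketch expands a list of file names into mm(1)-style
arguments. The function name mm_argv is hypothetical, and it covers only the
file arguments, not the options POSIX requires of cat(1p) and tee(1p).

```c
#include <stdlib.h>

/* Hypothetical helper: turns files into { "mm", flag, file, ..., NULL }.
 * flag would be "-i" for a cat-like wrapper or "-o" for a tee-like one.
 * Returns a NULL-terminated vector the caller must free(), or NULL if
 * memory runs out. */
char **mm_argv(char *flag, int nfiles, char *files[]){
	char **v;
	int i;

	if((v = malloc((2 * (size_t)nfiles + 2) * sizeof *v)) == NULL)
		return NULL;
	v[0] = "mm";
	for(i = 0; i < nfiles; ++i){
		v[2 * i + 1] = flag;
		v[2 * i + 2] = files[i];
	}
	v[2 * nfiles + 1] = NULL;
	return v;
}
```

The shell has already expanded any glob by the time the wrapper runs, so
"cat in*" arrives as separate file names and the resulting vector could be
handed to execvp(3p).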
|
||||||
|
|
||||||
|
mm -i input -o output tends to be intuitive for existing shell users once they
|
||||||
|
learn the name "middleman".
|
||||||
|
|
||||||
|
|
||||||
|
/blah/2024-01-17.html
|
||||||
|
|
||||||
|
Read American Psycho (1991). I need a cigarette really, really bad.
|
||||||
|
|
||||||
|
I can't afford to renew my SourceHut account right now so these blog posts are
|
||||||
|
going up on my wobsite in A Bit, whenever I get around to manually building
|
||||||
|
them. I might set up a build server on feeling.murderu.us for small jobs but I
|
||||||
|
don't know. I also want to set up a proper VPS for trinity.moe but $60/year
|
||||||
|
(for Capsul) is a hell of a lot more than $20/year for SourceHut.
|
||||||
|
|
||||||
|
It feels weird to have long fingernails.
|
||||||
|
|
||||||
|
The Japanese Zen monk tradition, according to No Recipe (2018), which someone
with whom I'm staying is reading, is to not have animals killed specifically
for you but always eat what you are served. I interpret this as well-spirited
|
||||||
|
and not a rule to dance around, having others act as go-betweens, because that
|
||||||
|
would suck. I sort of like this and have been rethinking veganism because it is
|
||||||
|
really inconvenient to have to restrict others' treatment of me; that is, I
|
||||||
|
can't eat meat that was prepared for me by people who don't know I'm vegan.
|
||||||
|
Most people don't have a good conception of what is and isn't vegan and will
|
||||||
|
serve me things that aren't vegan unknowingly.
|
||||||
|
|
||||||
|
I wish everyone was vegan but I don't wish to impose my will on others.
|
||||||
|
|
||||||
|
I feel shame at the notion that I have eaten something that died, except when
|
||||||
|
it comes to humans, at which notion I instead feel powerful, because I'm fucked
|
||||||
|
in the head.
|
||||||
|
|
||||||
|
|
||||||
/blah/2024-01-12.html
|
||||||
|
|
||||||
Read Finding the Still Point (2007).
|
||||||
|