
2023-01-20

This commit is contained in:
dtb 2024-01-20 10:47:18 -07:00
parent c55ceaed1f
commit 31a4c35b4a

homepage

@ -525,6 +525,278 @@ pre { /* DRY who? */
}
/blah/2024-01-20.html
: why mm(1)
I started working on mm(1) probably around 2020-2021, when I was first
acquainting myself with the inner workings of UNIX-like operating systems which
I had been using for a couple years by then. I can't remember how I noticed it
but it bothered me that there was this cat(1p) utility which took multiple
input files and streamed them successively to standard output:
[ input ] [ input ] [ input ]...
    |_______  |  _______|
           _|_|_|_
          |       |
          |cat(1p)|
          |_______|
              |
              V
       standard output
And then this tee(1p) utility which took from standard input and streamed its
bytes to multiple outputs:
       standard input
              V
           ___|___
          |       |
          |tee(1p)|
          |_______|
      ______| | |__________
     |        |            |
[ output ] [ output ] [ output ]...
And they were separate utilities despite both doing the job of writing input(s)
to output(s). I imagined a hypothetical utility mm(1) that does it all:
[ input ] [ input ] [ input ]...
    |_______  |  _______|
           _|_|_|_
          |       |
          | mm(1) |
          |_______|
      ______| | |__________
     |        |            |
[ output ] [ output ] [ output ]...
I attempted to write this magical "mm" (as in "middleman") utility, which
would act as a middleman for streams, before giving up for lack of C and POSIX
API experience and spending a couple of years practicing on easier programs in
UNIX environments.
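
The idea in the mm(1) diagram can be sketched in a few lines of C. This is only
an illustration of the concept and not mm(1)'s actual source: copy_all() is a
hypothetical name, and it assumes the caller has already opened every stream.

```c
#include <stdio.h>

/* Hypothetical helper: copies every input, in order, to every output.
 * Returns 0 on success, -1 on a read or write error. */
int copy_all(FILE *in[], size_t nin, FILE *out[], size_t nout){
	size_t i, j;
	int c;

	for(i = 0; i < nin; ++i){
		while((c = getc(in[i])) != EOF)
			for(j = 0; j < nout; ++j)
				if(putc(c, out[j]) == EOF)
					return -1;
		if(ferror(in[i])) /* EOF came from an error, not end of file */
			return -1;
	}
	return 0;
}
```

With one input and several outputs this behaves like the tee(1p) diagram; with
several inputs and one output it behaves like the cat(1p) diagram.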
There are a couple reasons to implement cat(1p) and tee(1p) as separate
utilities:
1) Ease of implementation
Differentiating input arguments from output arguments would require
either a separator mark (which would be inelegant and would exclude
that mark from being a usable file name) or option parsing.
Imagine a separator mark in the context of a hypothetical utility
insouts(1):
$ PS1='\n$ '
$ insouts -h
Usage: insouts (input...) "][" (output...)
$ printf %s\\n hello\ world
hello world
$ printf %s\\n hello\ world >in1
$ insouts <in1
hello world
$ insouts in1 ][ out1
$ insouts <out1
hello world
$ insouts <in1 >][
$ insouts ][ ][ /dev/stdout
Usage: insouts (input...) "][" (output...)
$ insouts ./][ ][ /dev/stdout
hello world
What a mess! The file ][ can no longer easily be used with insouts(1),
which may be acceptable (it's not a sensible file name anyway), but
it's sacrificed for horrendously ugly syntax featuring stressfully
unmatched square brackets.
I've written programs that use separator marks for arguments, namely
pscat(1), psrelay(1), and psroute(1) so far. A number of additional
caveats come with their particular flavor of marker, and I've been
hesitant about the syntax since I came up with it half a year ago.
Best not to make more things about which to fret.
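
For comparison with the option parsing that follows, here is roughly what
locating a separator mark could look like in C. It is a sketch of the
hypothetical insouts(1) only, not code from pscat(1), psrelay(1), or
psroute(1). The name find_separator() and its return convention are
assumptions, and treating a second mark as an error is just one reading of
the transcript above.

```c
#include <string.h>

/* Hypothetical helper for the imaginary insouts(1).
 * Returns the index of the single argument equal to mark,
 * 0 if no argument matches, or -1 if more than one does.
 * argv[0] (the program name) is never considered. */
int find_separator(int argc, char *argv[], const char *mark){
	int i;
	int found = 0;

	for(i = 1; i < argc; ++i)
		if(strcmp(argv[i], mark) == 0){
			if(found != 0)
				return -1; /* ambiguous: a second mark */
			found = i;
		}
	return found;
}
```

When the result n is positive, inputs are argv[1] through argv[n - 1] and
outputs are argv[n + 1] through argv[argc - 1]. Because the comparison is
exact, ./][ is an ordinary file name, which matches the last command in the
transcript.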
Now imagine option parsing:
$ PS1='\n$ '
$ insouts
Usage: insouts (-i [input])... (-o [output])...
$ insouts -i in1
hello world
$ insouts -i in1 -i ][ -i out1
hello world
hello world
hello world
This works for everything and is how mm(1) works. The issue is with
the code itself. Imagine a very basic cat(1) implementation in C:
#include <stdio.h>

int main(int argc, char *argv[]){
	int c;
	FILE *f;
	int i;
	for(i = 1; i < argc; ++i){
		if((f = fopen(argv[i], "r")) == NULL){ /* open for reading */
			perror(argv[i]);
			return 1;
		}
		while((c = getc(f)) != EOF)
			putchar(c);
		fclose(f);
	}
}
This doesn't conform to POSIX (which requires 'cat -u' to be supported)
but illustrates the ease of using cat(1)'s arguments: For each
argument, open it as a file, write it out, close it, and that's it.
mm(1)'s option parsing for '-i' and '-o', as of writing, is 24 lines,
excluding the functions it calls. The above program is 16 lines of
code. Some of this weight comes from supporting "-" as shorthand for
/dev/stdin or /dev/stdout, depending on whether it was used for '-i' or
'-o', and from trying to create an output file if it doesn't exist.
Without these two features, which the above program doesn't support,
the code for '-i' and '-o' would be considerably lighter, but the point
is that option parsing adds complexity that can be avoided by simply
having two utilities.
Furthermore, options have drawbacks for users.
2) Ease of use
One relatively common use of cat(1p) is to catenate all files matching
a glob pattern. Imagine:
$ PS1='\n$ '
$ ls
in1
in2
in3
$ cat <in1
hello
$ cat <in2
world
$ cat <in3
!!!
$ cat in*
hello
world
!!!
This use becomes much more tedious with option parsing:
$ for f in in*; do mm -i "$f"; done
hello
world
!!!
It is more difficult still when it comes to multiple outputs rather
than inputs, as with tee(1p):
$ ls
in1
in2
in3
$ touch out1 out2 out3
$ ls
in1
in2
in3
out1
out2
out3
$ cat in* | tee out*
$ cat <out2
hello
world
!!!
$ for f in out*; do for g in in*; do mm -i "$g"; done >"$f"; done
$ mm <out2
hello
world
!!!
3) Separation of concepts
cat(1p) accepts inputs. tee(1p) accepts outputs. It's possible to pipe
cat(1p) to tee(1p) to glean the benefits of multiple inputs and
multiple outputs without mm(1).
So why on earth should cat(1p) and tee(1p) be supported by the same utility?
Both cat(1p) and tee(1p), according to POSIX, must support options,
necessitating the use of getopt(3p) from <unistd.h>. While '-i' and '-o' are 24
lines in total, the rest of the options logic is needed by cat(1p) and tee(1p)
anyway and outweighs the '-i' and '-o' handling. Much of the '-i' and '-o'
logic is also still necessary in cat(1p) and tee(1p) themselves (supporting "-"
in cat(1p)'s case and creating an output if it doesn't exist in tee(1p)'s).
Though there is additional memory juggling due to supporting arbitrary inputs
and outputs, in most uses actual memory use isn't noticeably affected (10 extra
bytes for 5 file arguments, or one tenth of the data used by this parenthetical
statement).
It is possible to write implementations of cat(1p) and tee(1p) in POSIX shell
script as wrappers on mm(1) and I have done so, so users who want to use globs
can simply call cat or tee as usual.
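
Those wrappers are shell scripts and are not reproduced here. The argument
rewriting such a wrapper has to do can be sketched in C, to match the earlier
example. mm_argv() is a hypothetical helper, and the only thing it assumes
about mm(1) is that it accepts a repeated '-i' or '-o' option before each file
name, as in the transcripts above.

```c
#include <stdlib.h>

/* Hypothetical helper: builds {"mm", flag, argv[1], flag, argv[2], ..., NULL}
 * from a cat- or tee-style argument list.
 * Returns NULL if allocation fails. The caller frees the array itself,
 * not the strings, which still belong to argv. */
char **mm_argv(int argc, char *argv[], char *flag){
	char **v;
	int i;
	int n = 0;

	/* "mm", two slots per operand, and the terminating NULL */
	v = malloc((2 * (size_t)argc + 2) * sizeof *v);
	if(v == NULL)
		return NULL;
	v[n++] = "mm";
	for(i = 1; i < argc; ++i){
		v[n++] = flag;
		v[n++] = argv[i];
	}
	v[n] = NULL;
	return v;
}
```

A cat-like wrapper could pass the result of mm_argv(argc, argv, "-i") to
execvp(3p). A tee-like one would use "-o" and would presumably also need an
extra "-o -" so that standard output is kept, though that is a guess about
mm(1)'s behavior rather than something stated here.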
mm -i input -o output tends to be intuitive for existing shell users once they
learn the name "middleman".
/blah/2024-01-17.html
Read American Psycho (1991). I need a cigarette really, really bad.
I can't afford to renew my SourceHut account right now so these blog posts are
going up on my wobsite in A Bit, whenever I get around to manually building
them. I might set up a build server on feeling.murderu.us for small jobs but I
don't know. I also want to set up a proper VPS for trinity.moe but $60/year
(for Capsul) is a hell of a lot more than $20/year for SourceHut.
It feels weird to have long fingernails.
The Japanese Zen monk tradition, according to No Recipe (2018), which someone
I'm staying with is reading, is to not have animals killed specifically
for you but always eat what you are served. I interpret this as well-spirited
and not a rule to dance around, having others act as go-betweens, because that
would suck. I sort of like this and have been rethinking veganism because it is
really inconvenient to have to restrict others' treatment of me; that is, I
can't eat meat that was prepared for me by people who don't know I'm vegan.
Most people don't have a good conception of what is and isn't vegan and will
serve me things that aren't vegan unknowingly.
I wish everyone was vegan but I don't wish to impose my will on others.
I feel shame at the notion that I have eaten something that died, except when
it comes to humans, at which notion I instead feel powerful, because I'm fucked
in the head.
/blah/2024-01-12.html
Read Finding the Still Point (2007).