src/wiki/unix/posix.m4
2022-11-24 11:19:54 -05:00


_header(`POSIX')
_bibliography(`
_bentr(`_link(`IEEE Std 1003.1-2017', `https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/contents.html')')
_bentr(`_link(`The origin of the name POSIX.', `https://stallman.org/articles/posix.html')')
_bentr(`_link(`POSIX', `https://en.wikipedia.org/wiki/POSIX') (Wikipedia)')
_bentr(`_link(`POSIX™ 1003.1 Frequently Asked Questions', `https://www.opengroup.org/austin/papers/posix_faq.html')')
')
_subheader(`as(1)')
_bibliography(`
_bentr(`_link(`as', `https://en.wikipedia.org/wiki/As_(Unix)') (Wikipedia)')
_bentr(`_link(`UNIX Assembler Reference Manual', `https://www.tom-yam.or.jp/2238/ref/as.pdf')')
_bentr(`_link(`UNIX Operating System Porting Experiences', `https://www.bell-labs.com/usr/dmr/www/otherports/newp.pdf')')
')
_subsubheader(`GAS')
_bibliography(`
_bentr(`_link(`What I Dislike About GAS', `http://x86asm.net/articles/what-i-dislike-about-gas/')')
')
_subheader(`cat(1)')
_bibliography(`
_bentr(`_link(`4.4BSD-Lite2', `https://en.wikipedia.org/wiki/Berkeley_Software_Distribution')/_link(`usr/src/bin/cat/cat.c', `https://github.com/sergev/4.4BSD-Lite2/blob/master/usr/src/bin/cat/cat.c')')
_bentr(`_link(`busybox', `https://git.busybox.net/busybox/')/_link(`coreutils/cat.c', `https://git.busybox.net/busybox/tree/coreutils/cat.c')')
_bentr(`_link(`cat(1)', `http://man.cat-v.org/unix-1st/1/cat') (UNIX v1)')
_bentr(`_link(`cat(1p)', `https://www.unix.com/man-page/posix/1posix/cat/')')
_bentr(`_link(`UNIX Style, or cat -v Considered Harmful', `http://harmful.cat-v.org/cat-v/')')
_bentr(`_link(`dd(1p)', `https://www.unix.com/man-page/posix/1posix/dd/')')
_bentr(`_link(`FreeBSD', `https://www.freebsd.org/')/_link(`bin/cat/cat.c', `https://github.com/freebsd/freebsd-src/blob/main/bin/cat/cat.c')')
_bentr(`_link(`GNU coreutils', `https://www.gnu.org/software/coreutils/')/_link(`src/cat.c', `https://git.savannah.gnu.org/cgit/coreutils.git/tree/src/cat.c')')
_bentr(`_link(`The history of why cat -v is considered harmful', `https://lyngvaer.no/log/cat-v-history')')
_bentr(`_link(`NetBSD', `https://www.netbsd.org/')/_link(`bin/cat/cat.c', `https://github.com/NetBSD/src/blob/trunk/bin/cat/cat.c')')
_bentr(`_link(`Plan 9 from Bell Labs Fourth Edition', `https://9p.io/plan9/')/_link(`sys/src/cmd/cat.c', `https://github.com/plan9foundation/plan9/blob/main/sys/src/cmd/cat.c')')
_bentr(`_link(`Program Design in the UNIX Environment', `https://harmful.cat-v.org/cat-v/unix_prog_design.pdf')')
_bentr(`_link(`A Research Unix Reader', `https://www.cs.dartmouth.edu/~doug/reader.pdf')')
_bentr(`_link(`UNIX v7', `https://en.wikipedia.org/wiki/Unix')/_link(`usr/src/cmd/cat.c', `https://www.tuhs.org/cgi-bin/utree.pl?file=V7/usr/src/cmd/cat.c')')
_bentr(`Thanks to Miles and WeedSmokingJew for help with content.')
_bentr(`Thanks to adamz01h and wiresToGround for help with the JavaScript that used to accompany this article (to facilitate syntax highlighting in code samples using _link(`highlight.js', `https://highlightjs.org/')).')
_bentr(`Thanks to Ando_Bando, Miles, u/oh5nxo, and WeedSmokingJew for help with the accompanying code samples.')
')
<P>
_man(`cat(1)') is a program that exists to catenate files: to "join" the end of one file to the start of the next.
</P>
<P>
_man(`cat(1)') was introduced in UNIX's first edition to succeed _man(`pr(1)'), which prints the contents of a single file to the screen.
Most use of _man(`cat(1)') is similar; it's often introduced to beginners as a means to print the contents of a file to the screen, which is why many implementations include options that modify output to make it easier to read on a display.
POSIX specifies only one option, _code(`-u'), which guarantees unbuffered output: each byte is written as soon as it is read.
On some systems output is otherwise buffered in 512-byte blocks, which is also the default block size of _man(`dd(1)'), though most current implementations (busybox, GNU coreutils) don't buffer output regardless.
Various implementations include _code(`-s') to squeeze runs of adjacent empty lines into a single empty line (<CODE>awk '$0 != "" || !blank; { blank = ($0 == "") }' "$@"</CODE> would also work),
_code(`-n') to number lines (to which Pike and Kernighan offered <CODE>awk '{ print NR "\t" $0 }' "$@"</CODE> as a replacement),
_code(`-b') to number non-empty lines (_man(`nl(1)') was later made to cover both numbering cases),
and _code(`-v') to make non-printing characters visible.
</P>
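<P>
Apart from _code(`-v'), these extension options are simple line filters.
The following Python sketch shows one way _code(`-n'), _code(`-b') and _code(`-s') can be combined; the function and parameter names are hypothetical, and the six-column, tab-separated line numbers are an assumption modeled on the format printed by GNU and BSD _man(`cat(1)').
</P>
<PRE>
def filter_lines(lines, number=False, number_nonblank=False, squeeze=False):
    """Apply cat-style -n, -b and -s to an iterable of newline-terminated
    lines, yielding the output lines. -b takes precedence over -n."""
    count = 0
    previous_empty = False
    for line in lines:
        empty = line in ("\n", "")
        if squeeze and empty and previous_empty:
            continue  # -s: keep only the first line of a run of empty lines
        previous_empty = empty
        if (number_nonblank and not empty) or (number and not number_nonblank):
            count += 1
            line = "%6d\t%s" % (count, line)
        yield line
</PRE>
<P>
For example, _code(`sys.stdout.writelines(filter_lines(sys.stdin, squeeze=True))') behaves like _code(`cat -s') on standard input. Squeezing happens before numbering, so squeezed-out lines are never counted.
</P>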
<P>
Additions to _man(`cat(1)') are controversial; Rob Pike and Brian Kernighan explain why in _cite(`Program Design in the UNIX Environment'), the paper that accompanied Rob Pike's presentation _cite(`UNIX Style, or cat -v Considered Harmful') at the 1983 USENIX Summer Conference.
</P>
<P>
The following shell script is a POSIX-compliant implementation of _man(`cat(1)'):
</P>
<PRE>
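#!/bin/sh
# dd(1) copies in 512-byte blocks by default; "bs=1" makes it write
# each byte as soon as it has been read, which satisfies -u.
DD=dd
status=0
# Parse options; POSIX utilities take their options before any operands.
while test $# -gt 0; do
    case "$1" in
    --) shift; break ;;
    -u) DD="dd bs=1"; shift ;;
    -?*) printf 'usage: cat [-u] [file...]\n' &gt;&amp;2; exit 1 ;;
    *) break ;;
    esac
done
# With no file operands, copy standard input to standard output.
test $# -gt 0 || set -- -
for f in "$@"; do
    # $DD is deliberately unquoted so "dd bs=1" splits into two words;
    # 2&gt;/dev/null hides the transfer statistics dd prints on standard error.
    if test "$f" = -; then
        $DD 2&gt;/dev/null || status=1
    else
        $DD &lt;"$f" 2&gt;/dev/null || status=1
    fi
done
exit $status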
</PRE>
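<P>
For comparison, the following is a minimal sketch of the same utility in Python, assuming a POSIX system; it is an illustration rather than a reference implementation.
It uses _code(`os.read') and _code(`os.write'), which are thin wrappers around _man(`read(2)') and _man(`write(2)'), so nothing is held in a user-space buffer and _code(`-u') can simply be accepted and ignored.
The function names and the 512-byte read size are arbitrary choices for this sketch.
</P>
<PRE>
import getopt
import os
import sys


def copy_fd(src, dst, bufsize=512):
    """Copy bytes from file descriptor src to dst until end of file."""
    while True:
        chunk = os.read(src, bufsize)
        if not chunk:  # an empty read means end of file
            return
        while chunk:  # write(2) may accept only part of a chunk
            written = os.write(dst, chunk)
            chunk = chunk[written:]


def cat(args, dst=1):
    """Copy each operand in args to file descriptor dst (default: stdout).

    "-" names standard input. Returns 0 on success, or 1 if an option was
    invalid or any operand could not be read; like cat(1), it carries on
    with the remaining operands after an error.
    """
    try:
        # getopt stops at the first operand and consumes a leading "--".
        # -u is the only option, and this version has nothing to do for it.
        options, operands = getopt.getopt(args, "u")
    except getopt.GetoptError as err:
        print("cat: %s" % err.msg, file=sys.stderr)
        return 1
    status = 0
    for name in operands or ["-"]:  # no operands: read standard input
        try:
            if name == "-":
                copy_fd(0, dst)
            else:
                fd = os.open(name, os.O_RDONLY)
                try:
                    copy_fd(fd, dst)
                finally:
                    os.close(fd)
        except OSError as err:
            print("cat: %s: %s" % (name, err.strerror), file=sys.stderr)
            status = 1
    return status
</PRE>
<P>
Run as a script, it would be invoked as _code(`sys.exit(cat(sys.argv[1:]))').
</P>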
_subheader(`echo(1)')
_bibliography(`
_bentr(`_link(`echo', `https://en.wikipedia.org/wiki/Echo_(command)') (Wikipedia)')
_bentr(`_link(`echo(1p)', `https://man7.org/linux/man-pages/man1/echo.1p.html') (man7)')
_bentr(`_link(`NetBSD', `https://www.netbsd.org/')/_link(`bin/echo/echo.c', `https://github.com/NetBSD/src/blob/trunk/bin/echo/echo.c')')
_bentr(`_link(`UNIX v5', `#UNIX')/_link(`usr/source/s1/echo.c', `https://www.tuhs.org/cgi-bin/utree.pl?file=V5/usr/source/s1/echo.c')')
_bentr(`_link(`Variations in echo implementations', `https://www.in-ulm.de/~mascheck/various/echo+printf/')')
')
<P>
Don't use _man(`echo(1)'); use _man(`printf(1)').
_man(`printf(1)') is modeled on the _man(`printf(3)') function from the C standard I/O library and behaves consistently across implementations, whereas the behavior of _man(`echo(1)') varies between vendors (for example, in whether it accepts _code(`-n') or interprets backslash escapes).
</P>
<P>
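_code(`printf "%s" "$*"') does not work as a replacement for _man(`echo(1)') though it's been said to do so (including by this page): it omits the trailing newline that _man(`echo(1)') prints.
<CODE>printf '%s\n' "$*"</CODE> is closer, joining the operands with the first character of _code(`IFS') (a space by default) and ending with a newline.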
</P>
<P>
The following is an implementation of _man(`echo(1)') in the C programming language, using the standard library.
</P>
<PRE>
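#include &lt;stdio.h&gt;

int main(int argc, char *argv[]) {
    int i;
    /* Print each operand, separating operands with a single space. */
    for (i = 1; i &lt; argc; ++i) {
        if (i &gt; 1)
            putchar(' ');
        fputs(argv[i], stdout);
    }
    /* echo(1) always ends its output with a newline, even with no operands. */
    putchar('\n');
    return 0;
}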
</PRE>
<P>
The following is an implementation of _man(`echo(1)') in shell.
</P>
<PRE>
# Print each operand, separated by a single space.
sep=
for arg in "$@"; do
    printf '%s%s' "$sep" "$arg"
    sep=' '
done
# echo(1) always ends its output with a newline, even with no operands.
printf '\n'
</PRE>
_subheader(`ed(1)')
_bibliography(`
_bentr(`_link(`A Tutorial Introduction to the Unix Text Editor', `https://verticalsysadmin.com/vi/a_tutorial_introduction_to_the_unix_text_editor.pdf')')
_bentr(`<A HREF="https://catonmat.net/ftp/ed.text.editor.cheat.sheet.txt">Ed Cheat Sheet</A>')
')
<P>
A particularly shoddy attempt at _man(`ed(1)') is provided by _code(`busybox').
A traditional _man(`ed(1)') implementation is in plan9port.
I'm pretty sure some later UNIX-based OSes doubled the _man(`ed(1)') buffers; there's practically no downside to doing so in the modern era, and if it hasn't already been done it should be very easy to do yourself (just double some of the array sizes near the beginning of _code(`ed.c')).
</P>
_subheader(`find(1)')
_bibliography(`
_bentr(`_link(`find', `https://en.wikipedia.org/wiki/Find_(Unix)') (Wikipedia)')
_bentr(`_link(`"Has this only been added in the last 20 years?"', `https://news.ycombinator.com/item?id=10318841')')
_bentr(`<A HREF="http://doc.cat-v.org/unix/find-history">The History of the Design of Unix&rsquo;s Find Command</A>')
')
_subheader(`m4(1)')
_bibliography(`
_bentr(`_link(`m4', `https://en.wikipedia.org/wiki/`M4_'(computer_language)') (Wikipedia)')
_bentr(`_link(`Notes on the M4 Macro Language', `https://mbreen.com/m4.html')')
')
_subheader(`make(1)')
<P>
_man(`make(1)') in modern times is fragmented into the GNU version <I>gmake</I> and the BSD version <I>bmake</I>.
Complex Makefiles may not be usable with both.
Usually Linux systems have GNU Make as _command(`make') and BSD Make as _command(`bmake'),
and BSD systems have BSD Make as _command(`make') and GNU Make as _command(`gmake');
the native Make is simply _command(`make') and the external Make gets a name designating its source.
</P>
_subheader(`mkfifo(1)')
_bibliography(`
_bentr(`_link(`mkfifo(1)', `https://man.netbsd.org/mkfifo.1') (NetBSD)')
_bentr(`_link(`mkfifo(2)', `https://man.netbsd.org/mkfifo.2') (NetBSD)')
_bentr(`_link(`Use mkfifo to create named pipe', `https://dev.to/0xbf/use-mkfifo-to-create-named-pipe-linux-tips-5bbk')')
_bentr(`_link(`What is the purpose of using a FIFO vs a temporary file or a pipe?', `https://unix.stackexchange.com/questions/433488/what-is-the-purpose-of-using-a-fifo-vs-a-temporary-file-or-a-pipe')')
')
_subheader(`sh(1)')
_bibliography(`
_bentr(`_link(`DASH', `http://gondor.apana.org.au/~herbert/dash/')')
_bentr(`_link(`DASH (cgit)', `https://git.kernel.org/pub/scm/utils/dash/dash.git')')
_bentr(`_link(`Interview with Ken Thompson`,' 9-6-89', `https://tuhs.v6sh.org/UnixArchiveMirror/Documentation/OralHistory/transcripts/thompson.htm')')
')
_passage(`Interview with Ken Thompson, 9-6-89', `<P>We stole a shell out of a MULTICS, the concept of a shell. We stole per process execution. You know create a process -execute the command. From a combination of the two, although, neither of them really did it, MULTICS wanted to do it. But, it was so expensive creating a process that it ended up creating a few processes and then using them and putting them back on the shelf, then picking them up and reinitializing them. So, they never really created a process for command because it was just too expensive. The ION direction and the stuff like that and later in fact streams came from um the IO switch, that we worked on in MULTICS. Having everything work the same and just directing, you know, changing what it really pointed to.</P>')
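<P>
The model Thompson describes, creating a new process for each command and executing the command in it, is essentially how a UNIX shell still runs a program: _man(`fork(2)'), then _man(`execvp(3)') in the child, then _man(`waitpid(2)') in the parent.
The following Python sketch of that sequence assumes a POSIX system; the function name _code(`run') is hypothetical, and the 127 and 128-plus-signal statuses follow common shell convention rather than anything in the interview.
</P>
<PRE>
import os


def run(argv):
    """Run one command the way a shell does and return its exit status."""
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with the requested program.
        try:
            os.execvp(argv[0], argv)
        finally:
            # Only reached if the exec failed; shells report 127 when a
            # command cannot be found.
            os._exit(127)
    # Parent: wait for the child to terminate and decode its status.
    status = os.waitpid(pid, 0)[1]
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return 128 + os.WTERMSIG(status)  # the child was killed by a signal
</PRE>
<P>
For example, _code(`run(["sh", "-c", "exit 3"])') returns 3.
</P>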
_subheader(`true(1)')
_bibliography(`
_bentr(`<A HREF="http://trillian.mit.edu/~jc/;-)/ATT_Copyright_true.html">CHAMBERS John - The /bin/true Command and Copyright</A>')
_bentr(`<A HREF="https://twitter.com/rob_pike/status/966896123548872705">PIKE Rob - "/bin/true used to be an empty file."</A>')
_bentr(`<A HREF="https://www.muppetlabs.com/~breadbox/software/tiny/teensy.html">RAITER Brian - A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux</A>')
_bentr(`<A HREF="https://www.unix.com/man-page/posix/1p/true/">true(1p)</A> (The Open Group, 2003)')
_bentr(`<A HREF="https://www.gnu.org/">GNU</A>/<A HREF="https://www.gnu.org/software/coreutils/">coreutils</A>/<A HREF="https://git.savannah.gnu.org/cgit/coreutils.git/tree/src/true.c">src/true.c</A>')
_bentr(`<A HREF="https://www.netbsd.org/">NetBSD</A>/<A HREF="https://github.com/NetBSD/src/blob/trunk/usr.bin/true/true.sh">usr.bin/true/true.sh</A>')
')
<P>
_man(`true(1)') is a tool that <I>only</I> quits silently with an exit status of 0.
Similarly, _man(`false(1)') is a tool that <I>only</I> quits silently with a non-zero exit status (conventionally 1).
Recognizing arguments, printing to standard output, reading from standard input, or exiting with any status other than 0 is a violation of the POSIX specification for _man(`true(1)').
These utilities find use in shell scripting, which, though extremely relevant to these utilities, is beyond the scope of this article.
</P>
<P>
Because _man(`true(1)')'s required functionality is so simple, a POSIX-compliant implementation is a one-liner in most languages, so long as you're willing to make an exception in your code styling.
For example, in C:
</P>
<PRE>
int main(void) { return 0; }
</PRE>
<P>
Because executing an empty shell script will in most shells do nothing and return an exit status of 0, an empty shell script is technically a POSIX-compliant _man(`true(1)') implementation in 0 bytes.
This was the _man(`true(1)') implementation on early versions of UNIX, including Research UNIX, System V, and Sun's Solaris, according to both Rob Pike and John Chambers.
A more explicit implementation also exists in POSIX shell:
</P>
<PRE>
#!/bin/sh
exit 0
</PRE>
<P>
This happens to be nearly identical in source to the implementation used by NetBSD.
</P>
<P>
As with most shells, Python runs an empty file successfully and exits with status 0, so a 0-byte Python script is also a valid _man(`true(1)').
Here's _man(`false(1)') in Python rather than _man(`true(1)') to demonstrate how exiting with an arbitrary exit status can be done:
</P>
<PRE>
import sys
sys.exit(1)
</PRE>
<P>
In some shells, _man(`true(1)') is a shell built-in command, so running _program(`true') will run the shell author's implementation of _man(`true(1)') rather than the system implementation.
</P>
<P>
GNU _man(`true(1)'), from the GNU coreutils, is well known for being a maximalist implementation - it's eighty lines long and directly includes four C header files.
Their _code(`true.c') is 2.3 kilobytes and recognizes the arguments _code(`--help') and _code(`--version') (only when one of them is the sole argument to the program).
The GNU coreutils implementation of _man(`true(1)') is not POSIX compliant.
</P>
_subheader(`vi(1)')
_bibliography(`
_bentr(`_link(`vi(1p)', `https://man7.org/linux/man-pages/man1/vi.1p.html') (man7)')
')
<P>
Unlike _code(`busybox')'s _man(`ed(1)') implementation, its _man(`vi(1)') is very usable.
_man(`vim(1)') is a popular re-implementation of _man(`vi(1)').
</P>