Bonsai coreutils by Example #70

Open
opened 2024-02-20 13:59:43 +00:00 by trinity · 7 comments
Owner

We should have a guidebook for the coreutils that teaches each utility by example, or put examples in the man pages (cURL does this very well). I also want to have a Bonsai from POSIX guide that covers the full section 1 toolset and how to adapt existing POSIX code into Bonsai code (including examples for both the POSIX and qi(1) shells). I would be happy to write this but want to find out where to put it and any considerations I may have missed.

I'm gonna keep track of ideas in this thread so I don't lose them.

trinity added the enhancement label 2024-02-20 13:59:43 +00:00
trinity self-assigned this 2024-02-20 13:59:43 +00:00
Author
Owner

It may be better to just have a Bonsai system wiki.

Author
Owner

dd(1p)

POSIX dd(1p) is a clone of the Data Definition (DD) statement in IBM's Job Control Language (JCL) [origin of the UNIX dd command], originally implemented for the fifth edition of UNIX in 1974 by (AUTHOR). dd(1p)'s syntax was designed to match JCL's DD in what is commonly thought to be "clearly a prank" [dd], though it could also have been a comfort to UNIX immigrants from JCL. This syntax became ubiquitous and was standardized as such, leaving POSIX with an unwieldy, unconventional interface that tends to confuse new users and frustrate those of intermediate skill. That's an unfortunate interface issue in a command that in the present day is most commonly used for drive management.

Originally, dd(1p) was used to convert data to and from other systems' conventions, as is evident in its feature set: conv=swab swaps each pair of bytes (changing the endianness of 16-bit data); conv=ascii, conv=ebcdic, and conv=ibm convert between encodings; conv=lcase and conv=ucase change letter case (useful when converting from monocase systems); and conv=sync pads each input block to the input block size, which is useful with certain types of block devices. conv=block and conv=unblock "convert variable-length ASCII records to fixed length" [UNIX v8 dd(1)] (and back again, in the case of conv=unblock).
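To make a few of these conversions concrete, here's a small sketch using POSIX dd(1p), where the case conversions are spelled conv=lcase and conv=ucase. The input strings are made up for illustration and the statistics on standard error are discarded. I've only reasoned about this against a conforming dd, so treat it as illustrative:

```sh
# conv=ucase and conv=lcase change letter case.
upper=$(printf 'hello\n' | dd conv=ucase 2>/dev/null)
lower=$(printf 'HELLO\n' | dd conv=lcase 2>/dev/null)
echo "$upper"   # HELLO
echo "$lower"   # hello

# conv=sync pads each short input block with NUL bytes up to ibs=,
# so 3 input bytes become one full 8-byte block.
padded=$(printf 'abc' | dd ibs=8 conv=sync 2>/dev/null | wc -c | tr -d ' ')
echo "$padded"  # 8
```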

conv=block and conv=unblock weren't yet present in sixth edition UNIX [UNIX v6 dd(1)]. The RATIONALE section of the POSIX dd(1p) man page implies they hail from BSD dd(1). They appear in 4.4BSD [BSD 4.4 dd(1)] and in the eighth edition of UNIX, but not in the seventh [UNIX v7 dd(1)].

In the present day, nearly all systems primarily use ASCII or a superset of it, nearly all file formats store data independently of the system processor's endianness, and block devices tend to have no problem storing data contiguously, or at least pretending they do, which makes dd(1p)'s conversion features largely obsolete. It's instead used for its ability to write arbitrary data to arbitrary sections of a file, most often to write data to block devices.
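As a sketch of that "arbitrary data to arbitrary sections of a file" use, here is dd(1p) patching three bytes in the middle of a scratch file. The file comes from mktemp(1) and the contents are invented for the example; nothing here touches a real device:

```sh
tmp=$(mktemp)
printf 'AAAAAAAAAA' >"$tmp"

# bs=1 makes seek= count single bytes, so the write starts at byte 4.
# conv=notrunc keeps the existing bytes after the written region instead
# of truncating the file at the seek offset.
printf 'xyz' | dd of="$tmp" bs=1 seek=4 conv=notrunc 2>/dev/null

patched=$(cat "$tmp")
echo "$patched"  # AAAAxyzAAA
rm -f "$tmp"
```

Writing an image to a block device is the same idea with a device node as of=, which is why that case isn't run here.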

The Bonsai analogue, dj(1), targets this particular use case.

Rather than use archaic JCL-style operands, it uses common, conventional UNIX option flags. The following two utility usages are functionally equivalent except for standard error output:

dd if=input of=output
dj -i input -o output

Options in dj(1), as in every other Bonsai utility, are case-sensitive. This may be especially confusing in dj(1), where capitalized options configure the output and lowercase options configure the input. The following two utility usages are functionally equivalent except for standard error output:

dd if=input skip=1 ibs=2 of=output seek=3 obs=4
dj -i input -s 1 -b 2 -o output -S 3 -B 4
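For anyone unsure what those numbers do: skip= is counted in input blocks and seek= in output blocks. A worked sketch with the dd(1p) form (scratch files under mktemp -d, contents invented for the example; the dj(1) line above should behave the same according to the mapping, but it isn't run here):

```sh
tmpdir=$(mktemp -d)
printf '0123456789' >"$tmpdir/input"

# skip=1 with ibs=2 skips 1 * 2 = 2 input bytes ("01").
# seek=3 with obs=4 starts writing at byte 3 * 4 = 12 of the output.
dd if="$tmpdir/input" skip=1 ibs=2 of="$tmpdir/output" seek=3 obs=4 2>/dev/null

size=$(wc -c <"$tmpdir/output" | tr -d ' ')
tail_bytes=$(tail -c 8 "$tmpdir/output")
echo "$size"        # 20: 12 bytes seeked over plus the 8 copied bytes
echo "$tail_bytes"  # 23456789
rm -rf "$tmpdir"
```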

Unlike dd(1p) with its bs= operand, dj(1) has no single option that sets both the input block size and the output block size at once; each must be given separately:

dd if=input bs=5 of=output
dj -i input -b 5 -o output -B 5

dj(1)'s equivalent to dd(1p)'s count= is -c, and, like dd(1p), dj(1) counts input blocks specifically:

dd ibs=1 count=2
dj -b 1 -c 2
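A quick sketch of what "counts input blocks" means in practice, using dd(1p) on a scratch file (contents invented for the example): the number of bytes copied is the input block size times the count.

```sh
tmp=$(mktemp)
printf 'abcdef' >"$tmp"

# count= is measured in input blocks, so bytes copied = ibs * count.
two_bytes=$(dd if="$tmp" ibs=1 count=2 2>/dev/null)   # 1 * 2 bytes
four_bytes=$(dd if="$tmp" ibs=2 count=2 2>/dev/null)  # 2 * 2 bytes
echo "$two_bytes"   # ab
echo "$four_bytes"  # abcd
rm -f "$tmp"
```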

dd(1p)'s default block size is 512B, though certain standards-breaking implementations use 1024B.

dd
dj -b 512 -B 512
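The 512-byte default can be observed through the record counts dd(1p) prints on standard error. This is a sketch assuming a conforming dd and a system that provides /dev/zero; an implementation that defaults to 1024B would report one record instead of two:

```sh
tmp=$(mktemp)
# Create a 1024-byte scratch file of NUL bytes.
dd if=/dev/zero of="$tmp" bs=1024 count=1 2>/dev/null

# With no block-size operands, dd reads 512-byte blocks, so 1024 bytes
# is two full input records. LC_ALL=C keeps the message untranslated.
records_in=$(LC_ALL=C dd if="$tmp" of=/dev/null 2>&1 | head -n 1)
echo "$records_in"  # 2+0 records in
rm -f "$tmp"
```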

dj(1)'s default block size is 1024B.

dd bs=1024
dj

conv=swab

The "swab" (short for swap bytes) conversion swaps each pair of input bytes, which changes the endianness of 16-bit data. Evidence is circumstantial but, as early UNIX was written in PDP-11 assembly and the authors of UNIX invented dd(1), the name was likely borrowed from the PDP-11 instruction of the same name, though the instruction itself wasn't directly used in the program, which was written in C rather than assembly.

conv=swab is a somewhat uncommonly needed function. swab(1) performs the same function by default, and examples of the transformation are in the swab(1) proposal, #22. Note that swapping the bytes in multi-byte runes, which are used in all Unicode encodings, will mangle the data (this can be undone by passing the data through swab(1) again).

The output of dj(1) can be piped into swab(1) to provide the same functionality as the original:

dd if=input of=output ibs=5 conv=swab
dj -i input -b 5 | swab >output
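To show the transformation itself, here's a sketch using dd(1p)'s conv=swab on a made-up four-byte input. Bonsai's swab(1) is described above as doing the same thing by default, but it isn't exercised here:

```sh
# conv=swab swaps each pair of adjacent bytes.
swapped=$(printf 'abcd' | dd conv=swab 2>/dev/null)
echo "$swapped"   # badc

# Applying the same swap a second time restores the original byte order.
restored=$(printf 'abcd' | dd conv=swab 2>/dev/null | dd conv=swab 2>/dev/null)
echo "$restored"  # abcd
```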

References:
[POSIX dd(1p)] https://pubs.opengroup.org/onlinepubs/9699919799/utilities/dd.html
[origin of the UNIX dd command] https://groups.google.com/g/alt.folklore.computers/c/HAWoZ8g-xYk/m/HDUVxwTVLKAJ
[dd] http://www.catb.org/jargon/html/D/dd.html
[UNIX v6 dd(1)] https://man.cat-v.org/unix-6th/1/dd
[UNIX v7 dd(1)] https://man.cat-v.org/unix_7th/1/dd
[UNIX v8 dd(1)] https://man.cat-v.org/unix_8th/1/dd
[BSD 4.4 dd(1)] https://www.unix.com/man-page/bsd/1/dd/
[UNIX v5 dd.c, the program which was written in C] https://github.com/dspinellis/unix-history-repo/blob/Research-V5-Snapshot-Development/usr/source/s1/dd.c
Owner

I really like this idea.

Author
Owner

Should this go in docs/ or a separate repo?

Owner

I don’t think it should go in the docs dir, but I’m not sure how we should make it yet.


> It may be better to just have a Bonsai system wiki.

How about a coreutil that does this? A general-purpose tool to make wikis, documentation, and even man pages. I think suckless does this.

Author
Owner

> > It may be better to just have a Bonsai system wiki.
>
> How about a coreutil that does this? A general-purpose tool to make wikis, documentation, and even man pages. I think suckless does this.

See #41.

Reference: bonsai/coreutils#70