harakit/docs/dj.1

.\" Copyright (c) 2024 DTB <trinity@trinity.moe>
.\" Copyright (c) 2024 Emma Tebibyte <emma@tebibyte.media>
.\"
.\" This work is licensed under CC BY-SA 4.0. To see a copy of this license,
.\" visit <http://creativecommons.org/licenses/by-sa/4.0/>.
.\"
.TH DJ 1 2024-07-14 "Harakit X.X.X"
.SH NAME
dj \(en disk jockey
.\"
.SH SYNOPSIS

dj
.RB [ -Hn ]
.RB [ -a\ byte ]
.RB [ -c\ count ]

.RB [ -i\ file ]
.RB [ -b\ block_size ]
.RB [ -s\ offset ]

.RB [ -o\ file ]
.RB [ -B\ block_size ]
.RB [ -S\ offset ]
.\"
.SH DESCRIPTION

Perform precise read and write operations on files. This utility is useful for
reading and writing binary data to and from disks.

This manual page uses the terms \(lqskip\(rq and \(lqseek\(rq to refer to moving
to a specified byte by index in the input and output of the program
respectively. This language is inherited from the
.BR dd (1p)
utility and used here to decrease ambiguity.

The offset used when skipping or seeking refers to how many bytes are skipped
or sought. Running
.BR dj (1)
with a skip offset of 1 skips one byte into the input and reads from the second
byte onwards. A programmer may think of a file as a zero-indexed array of
bytes; in this analogy, the offset given is the index of the byte at which to
start reading or writing.
.\"
.SH OPTIONS

.IP \fB-i\fP\ \fIfile\fP
Takes a file path as an argument and opens it for use as an input.
.IP \fB-b\fP\ \fIblock_size\fP
Takes a numeric argument as the size in bytes of the input buffer, the default
being 1024.
.IP \fB-s\fP
Takes a numeric argument as the index of the byte at which reading will
commence; \(lqskips\(rq that number of bytes. If the standard input is used,
bytes read to this point are discarded.
.IP \fB-o\fP
Takes a file path as an argument and opens it for use as an output.
.IP \fB-B\fP\ \fIblock_size\fP
Takes a numeric argument as the size in bytes of the output buffer, the default
being 1024. Note that this option only affects the size of output writes and not
the amount of output data itself. See the CAVEATS section.
.IP \fB-S\fP
Takes a numeric argument as the index of the byte at which writing will
commence; \(lqseeks\(rq that number of bytes. If the standard output is used,
null characters are printed.
.IP \fB-a\fP
Accepts a single literal byte with which the input buffer is padded in the event
of an incomplete read from the input file. If the option argument is empty, the
null byte is used.
.IP \fB-c\fP
Specifies a number of blocks to read. The default is 0, in which case the input
is read until a partial or empty read is made.
.IP \fB-H\fP
Prints diagnostic messages in a human-readable manner as described in the
DIAGNOSTICS section.
.IP \fB-n\fP
Retries failed reads once before exiting.
.\"
.SH STANDARD INPUT

The standard input shall be used as an input if no inputs are specified or if
input file is \(lq-\(rq.
.\"
.SH STANDARD OUTPUT
The standard output shall be used as an output if no inputs are specified or if
the output file is \(lq-\(rq.
.\"
.SH EXAMPLES

The following
.BR sh (1p)
line:

.RS
printf 'Hello, world!\(rsn' | dj -c 1 -b 7 -s 7 2>/dev/null
.RE

Produces the following output:

.RS
world!
.RE

The following
.BR sh (1p)
lines run sequentially:

.RS
tr '\(rs0' 0    </dev/zero | dj -c 1 -b 6 -o hello.txt

tr '\(rs0' H    </dev/zero | dj -c 1 -b 1 -o hello.txt

tr '\(rs0' e    </dev/zero | dj -c 1 -b 1 -o hello.txt -S 1

tr '\(rs0' l    </dev/zero | dj -c 1 -b 2 -o hello.txt -S 2

tr '\(rs0' o    </dev/zero | dj -c 1 -b 1 -o hello.txt -S 4

tr '\(rs0' '\(rsn' </dev/zero | dj -c 1 -b 1 -o hello.txt -S 5

dj -i hello.txt
.RE

Produce the following output:

.RS
Hello
.RE

It may be particularly illuminating to print the contents of the example
.B hello.txt
after each
.BR dj (1)
invocation.
.\"
.SH DIAGNOSTICS

On a partial or empty read, a diagnostic message is printed. Then, the program
exits unless the
.B -n
option is specified.

By default, statistics are printed for input and output to the standard error in
the following format:

.RS
{records read} {ASCII unit separator} {partial records read}
{ASCII record separator} {records written} {ASCII unit separator}
{partial records written} {ASCII group separator} {bytes read}
{ASCII record separator} {bytes written} {ASCII file separator}
.RE

This format for diagnostic output is designed to be machine-parseable for
convenience. For a more human-readable format, the
.B -H
option may be specified. In this event, the following format is used instead:

.RS
{records read} '+' {partial records read} '>' {records written}
'+' {partial records written} ';' {bytes read} '>' {bytes written}
{ASCII line feed}
.RE

In non-recoverable errors that don\(cqt pertain to the read-write cycle, a
diagnostic message is printed and the program exits with the appropriate
.BR sysexits.h (3)
status.
.\"
.SH BUGS

If
.B -n
is specified along with the
.B -c
option and a count, actual byte output is the product of the count and the input
block size and therefore may be lower than expected. If the
.B -a
option is specified, this could make written data nonsensical.
.\"
.SH CAVEATS

Existing files are not truncated on ouput and are instead overwritten.

Option variants that have lowercase and uppercase forms could be confused for
each other. The former affects input and the latter affects output.

The
.B -B
option could be mistaken for the count in bytes of data written to the output.
This conception is intuitive but incorrect, as the
.B -c
option controls the number of blocks to read and the
.B -b
option sets the size of the blocks. The
.B -B
option is similar to the latter but sets the size of blocks to be written,
regardless of the amount of data that will actually be written. In practice,
this means the input buffer should be very large to make use of modern hardware
input and output speeds.

The skipped or sought bytes while processing irregular files, such as streams,
are reported in the diagnostic output, because they were actually read or
written. This is as opposed to bytes skipped while processing regular files,
which are not reported.
.\"
.SH RATIONALE

This program was based on the
.BR dd (1p)
utility as specified in POSIX. While character conversion may have been the
original intent of
.BR dd (1p),
it is irrelevant to its modern use. Because of this, this program eschews
character conversion and adds typical option formatting, allowing seeks to be
specified in bytes rather than in blocks, allowing arbitrary bytes as padding,
and printing in a format that\(cqs easy for machines to parse.
.\"
.SH COPYRIGHT

Copyright \(co 2023 DTB. License AGPLv3+: GNU AGPL version 3 or later
<https://gnu.org/licenses/agpl.html>.
.\"
.SH SEE ALSO
.BR dd (1p)
.BR lseek (3p)
.BR mm (1)