str(1): design regrets #79

Open · opened 2024-03-03 23:21:03 +00:00 by trinity (Owner) · 1 comment

`str isdigit "$var"` looks nice and all, but applying the tests to every character in a string, rather than to a single character, warps the meaning of ctype.h(0p) (https://www.man7.org/linux/man-pages/man0/ctype.h.0p.html) a little. The usage is also inconsistent with scrut(1) and every other utility. I sort of wanna use getopt(3p) options instead.
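To make the complaint concrete, here is a minimal Rust sketch of the per-string semantics. It assumes `str isdigit "$var"` succeeds only when every byte of `$var` passes isdigit(3p). The `str_test` name is hypothetical and this is not the existing str(1) source. One consequence of lifting a per-character test to a whole string this way is that, in a sketch like this, the empty string passes every test, because `Iterator::all` is vacuously true.

```rust
/// Hypothetical model of `str isdigit "$var"`: a ctype-style predicate
/// defined for one character is lifted to a string by requiring it to
/// hold for every byte.
fn str_test(pred: fn(&u8) -> bool, s: &str) -> bool {
    // `Iterator::all` returns true for an empty iterator, so "" passes.
    s.bytes().all(|b| pred(&b))
}

fn main() {
    for arg in ["12345", "12a45", ""] {
        // u8::is_ascii_digit behaves like isdigit(3p) in the C locale.
        println!("{:?} -> {}", arg, str_test(u8::is_ascii_digit, arg));
    }
}
```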

ctype.h(0p) has a number of issues. Localizing messages is fine, but having the is<property>() functions (isdigit(3p), isupper(3p), and so on) return different booleans depending on the locale is tricky. Setting LANG=C and the related locale environment variables on invocation makes this avoidable, though, so perhaps this is useful behavior.
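If the locale dependence turns out not to be wanted, one option is to hard-wire the classification that LANG=C would give. Below is a minimal sketch that assumes the tool uses Rust's standard ASCII predicates rather than calling the C library's isalpha(3p). `u8::is_ascii_alphabetic` accepts only A-Z and a-z regardless of LANG, LC_ALL, or LC_CTYPE, which matches isalpha(3p) in the POSIX locale. The function name `is_alpha_c_locale` is hypothetical.

```rust
/// Hypothetical helper: alphabetic test with fixed C/POSIX-locale behavior.
fn is_alpha_c_locale(b: u8) -> bool {
    // Only A-Z and a-z, whatever LANG, LC_ALL, or LC_CTYPE say.
    b.is_ascii_alphabetic()
}

fn main() {
    // 0xE9 is 'é' in ISO-8859-1. A Latin-1 locale may classify it as
    // alphabetic via isalpha(3p), but the C locale (and this function) does not.
    let bytes = [b'a', b'Z', 0xE9u8, b'1'];
    for b in bytes {
        println!("0x{:02X} alphabetic: {}", b, is_alpha_c_locale(b));
    }
}
```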

This utility was at one point called stris(1), and that might be a better name; str(1) is brief to the point of confusion.

If we used options, the following tests would be necessary:

  • (sectional) isascii(3p), iscntrl(3p), isblank(3p)
  • (printing) isdigit(3p), isupper(3p), islower(3p)

I think these options should be `-7`, `-c`, `-b`, `-d`, `-u`, and `-l`, respectively.
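A minimal Rust sketch of that option-to-test mapping follows, using the standard library's ASCII predicates as stand-ins for the C-locale ctype.h(0p) functions. `class_for`, `passes`, and the rule that multiple options combine as a union (a byte passes if it is in any selected class) are assumptions, since the proposal doesn't say how options combine. Rust has no `is_ascii_blank`, so isblank(3p) is written out as space or tab, which is the blank class in the POSIX locale.

```rust
/// A character-class test over single bytes.
type Class = fn(&u8) -> bool;

/// isblank(3p) in the POSIX locale: space and tab only.
fn is_blank(b: &u8) -> bool {
    *b == b' ' || *b == b'\t'
}

/// Hypothetical mapping from proposed stris(1) option letters to tests.
fn class_for(opt: char) -> Option<Class> {
    let f: Class = match opt {
        '7' => u8::is_ascii,           // isascii(3p)
        'c' => u8::is_ascii_control,   // iscntrl(3p): 0x00-0x1F and 0x7F
        'b' => is_blank,               // isblank(3p)
        'd' => u8::is_ascii_digit,     // isdigit(3p)
        'u' => u8::is_ascii_uppercase, // isupper(3p)
        'l' => u8::is_ascii_lowercase, // islower(3p)
        _ => return None,              // not part of the proposal
    };
    Some(f)
}

/// Assumption: a byte passes if it belongs to ANY selected class.
fn passes(classes: &[Class], b: u8) -> bool {
    classes.iter().any(|f| f(&b))
}

fn main() {
    // Roughly `-d -u`: digits or uppercase letters.
    let classes: Vec<Class> = "du".chars().filter_map(class_for).collect();
    println!("{}", "A1B2".bytes().all(|b| passes(&classes, b)));
    println!("{}", "a1".bytes().all(|b| passes(&classes, b)));
}
```

A union fits the `-p` idea, which extends the accepted set rather than narrowing it. An intersection would make a combination like `-u -l` impossible to satisfy.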

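A `-p [permitted]` option could allow additional glyphs beyond the selected classes, and a `-i` option could read the glyphs to test from standard input, so that for lots of data stris(1) could exit early at the first glyph that fails instead of needing all of the data up front.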
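Here is a rough Rust sketch of `-p` and `-i` together. It reads standard input a byte at a time through a buffered lock and returns at the first byte that is neither in the selected class nor in the permitted set, so the rest of the input is never consumed. The function name `all_match`, the exit statuses (0 for a match, 1 for a mismatch, 2 for an I/O error), and treating `-p` as a set of bytes rather than UTF-8 glyphs are all assumptions.

```rust
use std::io::{self, Read};
use std::process::ExitCode;

/// Hypothetical core of `-i`: test bytes from `input` one at a time and stop
/// at the first byte that is neither in `class` nor in the `-p` set.
fn all_match<R: Read>(input: R, class: fn(&u8) -> bool, permitted: &[u8]) -> io::Result<bool> {
    for byte in input.bytes() {
        let b = byte?;
        if !class(&b) && !permitted.contains(&b) {
            return Ok(false); // early exit: remaining input is never read
        }
    }
    Ok(true)
}

fn main() -> ExitCode {
    // Roughly "digits, plus the permitted bytes '.' and '-', read from stdin".
    // StdinLock is buffered, so bytes() doesn't do one read(2) per byte.
    match all_match(io::stdin().lock(), u8::is_ascii_digit, b".-") {
        Ok(true) => ExitCode::SUCCESS,  // assumed status: every byte matched
        Ok(false) => ExitCode::FAILURE, // assumed status: a byte failed
        Err(e) => {
            eprintln!("stris: {e}");
            ExitCode::from(2) // assumed status for I/O errors
        }
    }
}
```

Because `all_match` takes any `Read`, it can be exercised against an in-memory `std::io::Cursor`, whose position shows how far it read before giving up.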

These changes also make UTF-8 supportable, whereas the current str(1) is inherently incompatible with any non-ASCII character encoding. I'd also like to write this in Rust, though the implementation language is irrelevant.
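The UTF-8 point can be sketched in Rust. A byte-at-a-time test sees the two bytes of 'É' (0xC3 0x89) separately and rejects both, while decoding to `char`s first lets a test like `char::is_uppercase` apply the Unicode property to the whole glyph. Both function names are hypothetical, and the sketch assumes input that is already valid UTF-8.

```rust
/// Byte-wise test, in the spirit of the current str(1): each byte of a
/// multi-byte UTF-8 sequence is checked on its own.
fn bytewise_upper(s: &str) -> bool {
    s.bytes().all(|b| b.is_ascii_uppercase())
}

/// Char-wise test: decode to Unicode scalar values first, then apply the
/// Unicode Uppercase property.
fn charwise_upper(s: &str) -> bool {
    s.chars().all(char::is_uppercase)
}

fn main() {
    for s in ["ECOLE", "ÉCOLE", "école"] {
        println!("{s}: bytewise={} charwise={}", bytewise_upper(s), charwise_upper(s));
    }
}
```

One caveat for a Rust implementation: `std::env::args` panics if an argument isn't valid Unicode, so a tool that must cope with arbitrary bytes would likely need `std::env::args_os` and an explicit decoding policy.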
trinity added the enhancement and question labels 2024-03-03 23:21:03 +00:00
trinity self-assigned this 2024-03-03 23:21:03 +00:00
Comment by trinity (Author, Owner):

I implemented stris(1) (https://git.tebibyte.media/bonsai/coreutils/src/branch/stris/docs/stris.1) based on this issue. I think it's a cleaner and more elegant solution to the problem.
Reference: bonsai/coreutils#79