qi(1): Syntax #152

Open
opened 2024-08-04 09:37:47 -06:00 by emma · 9 comments
Owner

Taking some inspiration from [Drew DeVault’s `rc(1)`](https://drewdevault.com/2023/07/31/The-rc-shell-and-whitespace.html) seems like an interesting way to go here, as it shares many of the properties we wanted in #8.
emma added the enhancement, help wanted, and question labels 2024-08-04 09:37:47 -06:00
emma added this to the `qi(1)` project 2024-08-04 09:38:04 -06:00
Author
Owner

I've been asked to submit my grumblings about POSIX shell quoting to this issue. I am lacking sufficient energy to write eloquently (or at all) about this so here's an example.

# normal, makes sense
$ echo "foo"
foo

# normal, makes sense
$ echo "\"foo\""
"foo"

# normal, makes sense
$ echo 'foo'
foo

# what the fuck
$ echo '\'foo\''
# invalid syntax.
# string 1: \
# string 2: foo
# string 3: '
# error bc unclosed '

# solved by bashism. this should not have to be necessary
$ echo $'\'foo\''
'foo'
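
For comparison, a minimal sketch of what portable POSIX quoting has to do when `$'...'` is not available. It assumes Python's shlex module is an acceptable stand-in for sh(1p) quoting rules (it follows them closely but is not a shell): the single-quoted string has to be closed, a quoted `'` added, and the string reopened.

```python
import shlex

# The literal text we want echo to receive, single quotes included.
wanted = "'foo'"

# shlex.quote wraps the text in single quotes and swaps every embedded
# single quote for the sequence '"'"' (close, double-quoted quote, reopen).
quoted = shlex.quote(wanted)
print(quoted)

# Parsing the quoted form with POSIX rules gives the original text back.
assert shlex.split("echo " + quoted) == ["echo", wanted]
```

The printed form is hard to read at a glance, which is the complaint above in a nutshell.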

Originally posted by @silt in https://git.tebibyte.media/bonsai/harakit/issues/8#issuecomment-2990
Author
Owner

Plan 9's rc(1) doubles quote runes inside a quoted word to escape them (rc only quotes with '):

; echo 'foo'
foo
; echo '''foo'''
'foo'

Which I've always felt is quite nice.
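
A minimal sketch of that doubling rule, assuming the rule from the rc(1) manual (inside a quoted word, a pair of quotes stands for one literal quote). The function names `rc_quote` and `rc_unquote` are made up for illustration, and the `quote` parameter is only there to show that the same rule would work unchanged for `"`.

```python
def rc_quote(text: str, quote: str = "'") -> str:
    """Wrap text in quotes, doubling every quote rune it contains."""
    return quote + text.replace(quote, quote * 2) + quote


def rc_unquote(word: str, quote: str = "'") -> str:
    """Decode one quoted word produced by the doubling rule."""
    if len(word) < 2 or word[0] != quote or word[-1] != quote:
        raise ValueError("word is not wrapped in quotes")
    body = word[1:-1]
    out = []
    i = 0
    while i < len(body):
        if body[i] == quote:
            # A lone quote inside the body would have ended the word early.
            if i + 1 >= len(body) or body[i + 1] != quote:
                raise ValueError("unescaped quote inside quoted word")
            out.append(quote)
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)


print(rc_quote("'foo'"))        # '''foo'''
print(rc_unquote("'''foo'''"))  # 'foo'
```

There is exactly one escape to remember, and a backslash is just a backslash.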

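The reason sh(1p)'s quotes are fucked is that the backslash pulls double duty: the shell uses it to escape quotes in "-wrapped strings, and utilities like echo(1p) may then interpret whatever backslashes are left over as regular escape sequences. What echo does with a backslash is implementation-defined, so the same line gives different byte counts in different shells. For example, with an XSI-style echo such as dash's:

$ echo "test"  | wc -c # ['t','e','s','t','\n']
5
$ echo "test\n" | wc -c # the shell passes test\n through and echo expands it: ['t','e','s','t','\n','\n']
6
$ echo 'test\n' | wc -c # same argument, same result; bash's builtin echo prints 7 for both: ['t','e','s','t','\\','n','\n']
6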
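
A quick way to see what the shell's quoting rules alone do with these backslashes, separate from whatever echo does afterwards. This is a sketch that assumes Python's shlex module as an approximation of POSIX word parsing.

```python
import shlex

# Raw strings keep the backslash exactly as it would be typed at a prompt.
double_quoted = shlex.split(r'echo "test\n"')
single_quoted = shlex.split(r"echo 'test\n'")

# In both cases the argument still holds a backslash followed by 'n'
# (6 characters); nothing has been turned into a newline yet.
print(double_quoted, single_quoted)
assert double_quoted == single_quoted == ["echo", "test\\n"]

# A backslash before a double quote, however, is consumed by the quoting rules.
assert shlex.split(r'echo "\"foo\""') == ["echo", '"foo"']
```

Any newline that shows up in the output is therefore the utility's doing, which is an argument for keeping escape sequences in one formatting tool only.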

POSIX shell quoting is abhorrent and the worst part about shell scripting by far. Escape sequences shouldn't be supported except by a theoretical format(1) (printf(1p) but Bonsai and improved).

Drew DeVault's rc(1) bastardization (it is probably nice in practice) seems consistent but still as overcomplicated as shell is.

I'd like Plan 9 rc(1) quotes/escapes without a difference between "- and '-wrapped strings and no escape sequences besides "" or '', but I think this is better rather than good. I'm not sure what good would look like. Probably something to do with ASV.

Originally posted by @trinity in https://git.tebibyte.media/bonsai/harakit/issues/8#issuecomment-2992
Author
Owner

The way POSIX shell does variable assignment is awful.

#!/bin/env -i /bin/sh
# ^^ don't inherit an existing environment

echo "$x" # unassigned (vars are "" by default), so it'll echo ['\n']

#x = hello # syntax error
# no spaces can be between the name, '=', and the value

x=hello # infix (subject verb object)
echo "$x" # prefix (verb subject)

Shell variable assignment isn't consistent with most syntax and is really finicky about whitespace. I propose a better way:

set x hello

set would need to be a shell builtin to change the local environment; it would have the usage set [variables...] [value] to facilitate setting multiple variables at the same time:

set x y z hello
echo "$x $y $z" # hello hello hello

POSIX shell already has a set builtin for modifying "$@" and its components (as well as configuring the behavior of the shell itself); we should rethink the functionality offered by POSIX set (reserved shell variables for configuration, for example).
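
A minimal sketch of the proposed usage `set [variables...] [value]`, modelling the shell's local variables as a plain dictionary. `set_builtin` and `variables` are hypothetical names for illustration, not anything qi(1) defines.

```python
def set_builtin(variables: dict, args: list) -> None:
    """Hypothetical `set [variables...] [value]`: the last operand is the value."""
    if len(args) < 2:
        raise ValueError("usage: set [variables...] [value]")
    *names, value = args
    for name in names:
        variables[name] = value


variables = {}
set_builtin(variables, ["x", "hello"])            # set x hello
set_builtin(variables, ["x", "y", "z", "hello"])  # set x y z hello
print("{x} {y} {z}".format(**variables))          # hello hello hello
```

Because the value is always the final operand, no `=` token is needed and whitespace only ever separates operands.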

Originally posted by @trinity in https://git.tebibyte.media/bonsai/harakit/issues/8#issuecomment-3440
Author
Owner

> The way POSIX shell does variable assignment is awful.
>
> #!/bin/env -i /bin/sh
> # ^^ don't inherit an existing environment
>
> echo "$x" # unassigned (vars are "" by default), so it'll echo ['\n']
>
> #x = hello # syntax error
> # no spaces can be between the name, '=', and the value
>
> x=hello # infix (subject verb object)
> echo "$x" # prefix (verb subject)
>
> Shell variable assignment isn't consistent with most syntax and is really finicky about whitespace. I propose a better way:
>
> set x hello
>
> set would need to be a shell builtin to change the local environment; it would have the usage set [variables...] [value] to facilitate setting multiple variables at the same time:
>
> set x y z hello
> echo "$x $y $z" # hello hello hello
>
> POSIX shell already has a set builtin for modifying "$@" and its components (as well as configuring the behavior of the shell itself); we should rethink the functionality offered by POSIX set (reserved shell variables for configuration, for example).

I like this, but could we use let instead?

Originally posted by @emma in https://git.tebibyte.media/bonsai/harakit/issues/8#issuecomment-3452
Author
Owner

I have further thoughts on shell quoting.

When I think of program execution I think of the exec function family in C's <unistd.h> (https://www.man7.org/linux/man-pages/man3/execvp.3p.html):

#include <unistd.h>

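static char *args[] = {
    (char []){ "mm" },
    (char []){ "-i" },
    (char []){ "a b" },
    NULL /* execvp(3) requires a null-terminated argument vector */
};

int main(void){
    execvp("mm", args);
    return 1; /* only reached if execvp(3) failed */
}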

Or subprocess.run in Python (https://docs.python.org/3/library/subprocess.html#subprocess.run):

#!/usr/bin/env python3

import subprocess

subprocess.run(["mm", "-i", "a b"])

Or Rust's std::process::Command (https://doc.rust-lang.org/std/process/struct.Command.html):

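use std::process::Command;

// I don't know Rust well but I think this is valid
fn main() {
    // each argument reaches mm as-is, with no word splitting
    let output = Command::new("mm")
        .args(["-i", "a b"])
        .output()
        .expect("failed to run mm");
    print!("{}", String::from_utf8_lossy(&output.stdout));
}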

What these all have in common is that they have a clear distinction between arguments, and if one wanted to use a variable as an argument it would be easy:

import subprocess
var="a b"
subprocess.run(["mm", "-i", var])

Meanwhile POSIX shell wants you to die:

#!/bin/sh

var="a b"

var2=$var # assignments are not field-split, so var2 is `a b` here
# typing `var2=a b` literally, though, runs `b` with `var2`
# equivalent to `a` in the child's environment

mm -i $var
# expands to `mm -i a b` which is invalid usage

No wonder people are desperate to use any interpreted programming language as a shell, asking if Python is a good fit (https://softwareengineering.stackexchange.com/questions/182077/is-it-possible-to-use-python-as-a-shell-replacement) and actually using Common Lisp (https://clisp.sourceforge.io/clash.html). That being said, quoting every shell argument is at best inconvenient, with the example "mm" "-i" "a b" being 4 extra keypresses to type and up to 8 including the shift key.

I think we should start by mandating some useful rules that are already often followed by cautious scribes:

  • Always quote strings that contain whitespace; do not escape whitespace.

I don't think this will be very controversial. While escapes are convenient (an easy way to avoid navigating back to the beginning of the line, adding a quote, and then going back to the end just for one or two spaces) they're easy to mess up catastrophically:

#!/bin/sh

# removes one file
rm -f "A Super Duper Story (Draft).tex"

# removes one file
rm -f A\ Super\ Duper\ Story\ \(Draft\).tex

# removes two files (spot the typo!)
rm -f A\ Super\ Duper\ Story \(Draft\).tex

  • Always quote values in variable assignments.

#!/bin/sh

# again as in the example earlier, runs `b` with `var` set to
# `a` in the child environment
var=a b

# sets `var` to `a b`
var="a b"

# this is functionally equivalent to line 5 but makes
# it clear that the intention is that behavior rather
# than line 8's
var="a" b

This also seems non-controversial. I have more to say but am out of time to write so will comment this right now.
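
To make the rm example concrete, here is a minimal sketch that parses the three command lines with POSIX word-splitting rules and compares the operands rm would receive. It assumes Python's shlex module as an approximation of the shell's parser; `operands` is a helper named here for illustration.

```python
import shlex

quoted = 'rm -f "A Super Duper Story (Draft).tex"'
escaped = r"rm -f A\ Super\ Duper\ Story\ \(Draft\).tex"
typo = r"rm -f A\ Super\ Duper\ Story \(Draft\).tex"


def operands(command_line: str) -> list:
    """Return the file operands rm would receive (everything after -f)."""
    return shlex.split(command_line)[2:]


print(operands(quoted))
print(operands(escaped))
print(operands(typo))

# Quoting and careful escaping agree; the one missing backslash does not.
assert operands(quoted) == operands(escaped)
assert len(operands(typo)) == 2
```

A missing backslash silently changes the operand count, while a missing quote would at least leave the string unterminated.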

Originally posted by @trinity in https://git.tebibyte.media/bonsai/harakit/issues/8#issuecomment-3504
Author
Owner

Alright, this is the continuation of my last comment.

The behavior of the traditional POSIX shell with regard to unquoted variable expansion is useful, sometimes, but usually unwanted and a pain to deal with. In Python if I wanted that behavior I'd use str.split (https://docs.python.org/3/library/stdtypes.html#str.split):

import subprocess

var="a b"

subprocess.run(
    # ["mm", "-i", "a", "b"]
    ["mm", "-i"] + var.split()
)

The C standard library has no direct equivalent (strtok(3) can tokenize a string in place, so the functionality offered by str.split could be replicated) and Rust is as of now beyond me.

Originally posted by @trinity in https://git.tebibyte.media/bonsai/harakit/issues/8#issuecomment-3505
Author
Owner

alright i haven't slept in a few days and have important things to be doing rn, what better idea than to propose shell syntax. this is poorly thought-out and full of holes. have fun deciphering and feel free to harass me if that takes too long

i haven't read through like any of the posts here so i might repeat or redefine things that've been discussed already

i will be using words here. maybe (read: probably) even misusing words. here's a best-effort explanation of my nonsense:

  • term: an evaluable substring of a term (yeah it's a recursive definition, weep). terms may evaluate to their literal text, or evaluate to the result of their contained sub-terms if present. dw if this doesn't make sense, i'll explain it more below.

editor's note: i simplified this quite a bit, only one word remains standing. you're welcome.

emma and i were discussing variable assignment and the potential usage of a let term, which is used to define terms. some pseudoishcode snippets from the conversation to elucidate on that a bit:

; let a meow  # the term `a` evaluates to `meow`
; let b "hru 2 3 -"  # the term `b` evaluates to `hru 2 3 -` which itself evaluates to the results of the command
; a
/bin/qi: meow: not found
; b
-1
; let a meowzers
# a <- meowzers
; let meowzers b
; let a meowzers
# a <- b
; let 'the bomb dot com' meow
; 'the bomb dot com'
/bin/qi: meow: not found
; let a b c meowzers
# a, b, and c all equal meowzers
;let a let
;a b c
# b <- c

some important observations from this that i've already made on your behalf:

  • a term is any valid rust string. keywords can be redefined. let itself can be redefined. also, spaces and symbols are valid. # can be redefined.
  • a term may take arguments from its right hand side. let does this. hru (we have an hru?) also does this. let's call these argument-taking terms "operator terms", or maybe just "operators", and their arguments "operands".
  • one could describe # as an operator that takes all subsequent terms as its operands and does nothing
  • looking at let's operands, they're just terms. they may be taken literally, but they may also be operator terms.
  • this all unlocks some very interesting metaprogramming capabilities
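
a minimal sketch of one possible reading of the snippets above, not a spec: terms live in a dictionary, the leading word of a line is expanded for as long as it names a defined term, and a `let` definition that is itself a defined term is resolved immediately (the `# a <- b` case). `run_line` and `max_depth` are hypothetical names, words are split on whitespace only, so quoted terms like 'the bomb dot com' and redefining # are out of scope here.

```python
def run_line(line: str, terms: dict, max_depth: int = 16):
    """Run one line; return the command words left to execute, or None."""
    words = line.split()
    depth = 0
    # Expand the leading word for as long as it names a defined term.
    while words and words[0] in terms:
        depth += 1
        if depth > max_depth:
            raise RecursionError("term expansion did not settle")
        words = terms[words[0]].split() + words[1:]
    if words and words[0] == "let":
        if len(words) < 3:
            raise ValueError("usage: let [terms...] [definition]")
        *names, definition = words[1:]
        # A definition that is itself a defined term is resolved right away.
        definition = terms.get(definition, definition)
        for name in names:
            terms[name] = definition
        return None
    return words


terms = {}
run_line("let a meow", terms)
print(run_line("a", terms))  # ['meow'], the command the shell would look up
run_line("let a let", terms)
run_line("a b c", terms)     # behaves like `let b c`
print(terms["b"])            # c
```

the depth limit is there because nothing stops a term from being defined in terms of itself.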

so we have operators, which operate on and consume the terms to their right. these terms may be operators themselves, but they may also just be literals. do you smell it yet? i smell it. it's the smell of polish notation. alright so what if we did polish notation in more places.

suppose there was a pn(1) utility, serving as the prefix version of rpn(1):

# math in the shell can still be concise!
let + 'pn +'
let - 'pn-ng -'  # you could even swap out implementations for various operations if desirable
+ 3 - 2 1  # `pn + 3 pn-ng - 2 1` -> `pn + 3 1` -> `4`

as the reader it is now your job to come up with a more exciting example than that bc that's as far as my thoughts are willing to go at this time of night.
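
a minimal sketch of how a prefix evaluator gets through `+ 3 - 2 1`, under the assumption that every operator takes exactly two operands. `eval_prefix` and `OPERATORS` are hypothetical names for illustration; this is not how pn(1) or rpn(1) are implemented.

```python
import operator

# Every operator here takes exactly two operands; that fixed arity is the
# assumption that lets the evaluator know where each operand list ends.
OPERATORS = {"+": operator.add, "-": operator.sub}


def eval_prefix(terms: list) -> int:
    """Evaluate Polish-notation terms, consuming them left to right."""
    remaining = list(terms)

    def next_value() -> int:
        term = remaining.pop(0)
        if term in OPERATORS:
            left = next_value()
            right = next_value()
            return OPERATORS[term](left, right)
        return int(term)

    result = next_value()
    if remaining:
        raise ValueError("unused terms: " + " ".join(remaining))
    return result


print(eval_prefix("+ 3 - 2 1".split()))  # 4
```

the whole thing hinges on the `OPERATORS` table knowing each operator's arity ahead of time.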

there are issues with this. the main one that jumps out is that in your typical pn language, the interpreter is aware of how many terms will be consumed as operands by a given operator. however, when an operator can be an arbitrary executable, the number of terms consumed is completely ambiguous. there is a fix, which is to let the programmer define those bounds themselves:

(let + (pn +))
(let - (pn-ng -)) (# is it just me or does this syntax seem kinda familiar)
(+ 3 (- 2 1))

oops (http://lambda-the-ultimate.org/node/2352). we should probably not do this.

realized that about halfway through typing out this textwall but was told to post it anyways so here ya go

Originally posted by @silt in https://git.tebibyte.media/bonsai/harakit/issues/8#issuecomment-3516
Author
Owner

The rest of the comments in #8 discuss a system of variable expansion that does not distinguish between plaintext and variable names. @trinity and I have discussed this syntax verbally and agreed that it is ill-advised.
Owner

Some thoughts on dereferencing.

  • Prefixing with a symbol may be the only good option.
    • nasm's [var] (https://www.nasm.us/doc/nasmdoc2.html#section-2.2.2) would be simple but ugly. I have the same opinion of any paired symbols for this purpose.
    • A suffixed operator would be weird and probably less readable.
  • C uses the unary operator *, to which I'm partial, but this might interfere with user expectations for globbing.
  • sh(1p) uses $, which has accumulated a fair bit of precedent. I haven't made up my mind about this.
  • Maybe % would work?
Reference: bonsai/harakit#152