ASV in Bonsai #19

Open
opened 2023-12-29 20:46:29 +00:00 by emma · 17 comments
Owner

> I think it would be better to have ASCII horizontal tabs as the delimiter if standard output is a tty so output is readable in all fonts and on all terminals. At least npc(1) exists.
>
> Originally posted by @trinity in /bonsai/coreutils/pulls/18#issuecomment-2722

Currently the idea is that the coreutils will speak ASV natively. Should that be represented differently in the output of commands if stdout is a terminal?
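
For concreteness, the behavior being asked about could look roughly like the sketch below. This only illustrates the proposal and is not code from the Bonsai tree: `pick_field_delimiter` and `stdout_field_delimiter` are hypothetical names, and `ASCII_US` is defined locally rather than taken from any existing header. `isatty(3)` is the standard POSIX call.

```c
#include <unistd.h> /* isatty, STDOUT_FILENO (POSIX) */

#define ASCII_US '\x1f' /* unit separator, defined locally for this sketch */

/* Hypothetical helper: pick the field delimiter depending on whether
 * the output is a terminal. Tab for a tty, US for everything else. */
char pick_field_delimiter(int is_tty)
{
    return is_tty ? '\t' : ASCII_US;
}

/* Hypothetical wrapper that asks the OS about standard output. */
char stdout_field_delimiter(void)
{
    return pick_field_delimiter(isatty(STDOUT_FILENO));
}
```

The cost of this design is that the same command produces different bytes depending on where its output goes.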

emma added the question label 2023-12-29 20:46:40 +00:00
emma self-assigned this 2023-12-29 20:46:46 +00:00
trinity was assigned by emma 2023-12-29 20:46:46 +00:00
silt was assigned by emma 2023-12-29 20:46:46 +00:00
Owner

I don't think the representation should be changed just because of where stdout is going. This unfortunately commonplace behavior already causes a frustrating level of unpredictability with certain tools. [ripgrep](https://github.com/BurntSushi/ripgrep), for as much as I sing its praises, is one of the worst examples that comes to mind, significantly changing output formatting depending on whether stdout is a TTY or not.

I believe it's helpful in this case to think about what the user would expect to happen in a reasonably-written program, and what existing utilities are already doing as a result. Most people aren't accustomed to working with ASV delimiters, but there is a usually-unprintable delimiter that comes up pretty frequently: null! While a tool such as npc (or the much-maligned cat -v) may render a null byte as something like ^@, under normal circumstances, null bytes are simply dumped into the terminal as-is. A lot of programs already support using it as a delimiter in normal output specifically because of the limitations of newline or comma delimitation, and I've yet to see any calls to special-case their output formatting for TTYs.

Author
Owner

Another concern that has been brought up a number of times is that some fonts do not print the ASCII field separator character; however, I see this as an issue with the fonts and not an issue for our utilities to solve. Bending over backward to solve issues caused by other software on users’ systems isn’t, in my view, what we should be doing.

Owner

I had a number of qualms about ASCII separated values that I've since answered for myself.

> Another concern that has been brought up a number of times is that some fonts do not print the ASCII field separator character; however, I see this as an issue with the fonts and not an issue for our utilities to solve. Bending over backward to solve issues caused by other software on users’ systems isn’t, in my view, what we should be doing.

Having to change swathes of a system to accommodate Bonsai will make potential users hesitant. My Linux framebuffer doesn't display the field separator, and I'm pretty sure neither rxvt nor unscii, nor the combination of the two, does either. I can cope because npc(1) is in the tree, and ASCII FS, GS, RS, and US display as ^\, ^], ^^, and ^_ respectively when piped through it.
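
The ^\, ^], ^^, and ^_ forms are ordinary caret notation: the control byte XORed with 0x40 gives the printable character shown after the caret. A minimal sketch of that mapping follows. It is not npc's actual source, and `caret_notation` is a hypothetical name. Unlike most real tools, this sketch also rewrites tab and newline.

```c
#include <stddef.h>

/* Write the caret-notation form of byte c into out (at least 3 bytes)
 * and return the number of characters written, not counting the NUL.
 * Control bytes (0x00-0x1f) and DEL (0x7f) become '^' plus c ^ 0x40;
 * every other byte is copied through unchanged. */
size_t caret_notation(unsigned char c, char out[3])
{
    if (c < 0x20 || c == 0x7f) {
        out[0] = '^';
        out[1] = (char)(c ^ 0x40); /* 0x1c -> '\\', 0x00 -> '@', 0x7f -> '?' */
        out[2] = '\0';
        return 2;
    }
    out[0] = (char)c;
    out[1] = '\0';
    return 1;
}
```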

My concern was that, for the sake of beginners, it may be better to have visually comprehensible output when writing to a tty, and a tab is a visual, intuitive separator. However, I think that even if an ASCII separator isn't displayed in any form, the nearly-correct output would lead an informed beginner to read the utility's man page and learn about ASCII separators and why we use them.

> I don't think the representation should be changed just because of where stdout is going. This unfortunately commonplace behavior already causes a frustrating level of unpredictability with certain tools. ripgrep, for as much as I sing its praises, is one of the worst examples that comes to mind, significantly changing output formatting depending on whether stdout is a TTY or not.

I agree now that this would cause more confusion than it's worth. I initially proposed this as a compromise with Emma; fae wanted ASV by default and I wanted TSV by default with an ASV option.

> I believe it's helpful in this case to think about what the user would expect to happen in a reasonably-written program, and what existing utilities are already doing as a result. Most people aren't accustomed to working with ASV delimiters, but there is a usually-unprintable delimiter that comes up pretty frequently: null! While a tool such as npc (or the much-maligned cat -v) may render a null byte as something like ^@, under normal circumstances, null bytes are simply dumped into the terminal as-is. A lot of programs already support using it as a delimiter in normal output specifically because of the limitations of newline or comma delimitation, and I've yet to see any calls to special-case their output formatting for TTYs.

Can you name a certain tool? All the tools I know have -0 as a special case to support this; I don't think I know of any that use nul as a delimiter by default.

As an aside, nul as a delimiter is interesting because the only time nul is really used is in binary data; it's disallowed in filenames and practically never used in text data (as it's the string terminator in the C standard library). ASCII separators are also practically never used in text data, but there's no reason they couldn't be: they can be used in C standard library strings and filenames. Will they be flatly disallowed, or, if not, how will they be quoted?

I'd like to clarify that I agree with the use of ASV, as an unconditional default, for program output in Bonsai. Emma and I discussed this tonight and I came to that conclusion based on what I've mentioned in this comment.

Author
Owner

i’m going to close this but feel free to continue discussing

emma closed this issue 2023-12-30 18:00:06 +00:00
emma added the wontfix label 2023-12-30 18:00:19 +00:00
Owner

Damn, I replied to this last night via email but it looks like that never got posted. Pasting this from my sent folder and hoping the formatting isn't broken:

> My Linux framebuffer doesn't display the field separator, I'm pretty sure neither does rxvt or unscii or the combination of the two.

It seems to be hardcoded in most terminals that control characters (0x00 to 0x1F) should not even be allocated a cell, denying fonts, and users, the ability to render glyphs in their place. This behavior is consistent across at least foot, alacritty, Konsole, u?rxvt, and the Linux VT. I believe this to be a deficiency in the design of terminals—one that has only been allowed to persist for so long due to the neglect shown towards the information separators. I'm not naïve enough to even gesture at this project changing that, but I am spiteful enough to continue pushing for ASV in the face of it, and I have been considering looking into what'd be required to patch foot to allow the printing of certain control codes. I do acknowledge this as a problem, but not enough of one to change my stance.

> I wanted TSV by default with an ASV option.

I'd like to see a program that takes ASV output and formats it in a nice little TSV table. Perhaps this would be better suited as a feature of betta, I don't know.
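
The core of such a converter could be as small as the sketch below, which rewrites a buffer in place so that units become tab-separated and records newline-separated. `asv_to_tsv` is a hypothetical name, not an existing Bonsai or betta function, and the macros are defined locally. It makes no attempt to handle tabs or newlines that already occur inside fields, ignores GS and FS, and does no column alignment.

```c
#include <stddef.h>

#define ASCII_RS '\x1e' /* record separator */
#define ASCII_US '\x1f' /* unit separator */

/* Rewrite len bytes of ASV in place: every US becomes a tab and every
 * RS becomes a newline. All other bytes are left alone.
 * Returns buf for convenience. */
char *asv_to_tsv(char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == ASCII_US)
            buf[i] = '\t';
        else if (buf[i] == ASCII_RS)
            buf[i] = '\n';
    }
    return buf;
}
```

A real tool would have to decide what to do with fields that themselves contain a tab or newline, which is exactly the ambiguity ASV is meant to avoid.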

> Can you name a certain tool? All the tools I know have -0 as a special case to support this, I don't think I know any that use nul as a delimiter as a default.

Minor miscommunication: what I meant was that there are tools that do support switches like -0 and will happily dump nulls into a TTY without any special reformatting.

> As an aside, nul as a delimiter is interesting because the only time nul is really used is in binary data; it's disallowed in filenames and practically never used in text data (as it's the string terminator in the C standard library). ASCII separators are also practically never used in text data but there's no reason they couldn't be, they can be used in C standard library strings and filenames. Will this be flatly disallowed, or how will they be quoted?

Yeah, this is an issue. Even some kernel people have expressed a desire to disallow control characters in filenames as there's genuinely never a legitimate reason for them to be there in the first place, but this is unfortunately the userspace we're stuck with now. Insert vague grumbling about Torvalds. While I don't agree with everything being said in it, https://dwheeler.com/essays/fixing-unix-linux-filenames.html goes into just how horribly borked most software already is when it comes to handling filenames with control characters and escape sequences. This is an already-existing problem, and there's no great solution to it apart from letting users figure out that doing stupid things to their filenames is, in fact, stupid.

Author
Owner

@silt the worst part of Gitea and the reason I want to make Mintee is that Gitea doesn’t do e-mail thread replies.

Author
Owner

> It seems to be hardcoded in most terminals that control characters (0x00 to 0x1F) should not even be allocated a cell, denying fonts, and users, the ability to render glyphs in their place. This behavior is consistent across at least foot, alacritty, Konsole, u?rxvt, and the Linux VT. I believe this to be a deficiency in the design of terminals—one that has only been allowed to persist for so long due to the neglect shown towards the information separators. I'm not naïve enough to even gesture at this project changing that, but I am spiteful enough to continue pushing for ASV in the face of it, and I have been considering looking into what'd be required to patch foot to allow the printing of certain control codes. I do acknowledge this as a problem, but not enough of one to change my stance.

My kitty install prints the record separator character.

Owner

> I have been considering looking into what'd be required to patch foot to allow the printing of certain control codes.

foot's codebase is delightfully readable and this ended up being quite easy to do. I've instructed it to, upon encountering an information separator, print the appropriate glyph from the Control Pictures block. Haven't noticed any serious issues and I don't particularly expect to. The fact that they are entirely indistinguishable from the actual Unicode glyphs used to indicate their presence is... not great in my opinion. However, at least for my usage, I think this is better than simply ignoring them. One thing to note is that copy-pasting this from the terminal window will not copy the control codes, but rather the Unicode glyphs. I don't love this behavior, but it is again Good Enough For Me.

Here's the patch for completeness's sake:

diff --git a/vt.c b/vt.c
index 0f7bfe63..715614a9 100644
--- a/vt.c
+++ b/vt.c
@@ -1,4 +1,5 @@
 #include "vt.h"
+#include "terminal.h"
 
 #include <stdlib.h>
 #include <string.h>
@@ -250,6 +251,26 @@ action_execute(struct terminal *term, uint8_t c)
         term_update_ascii_printer(term);
         break;
 
+    case '\x1c':
+        /* FS - \x1c - file separator */
+        term_print(term, U'␜', 1);
+        break;
+
+    case '\x1d':
+        /* GS - \x1d - group separator */
+        term_print(term, U'␝', 1);
+        break;
+
+    case '\x1e':
+        /* RS - \x1e - record separator */
+        term_print(term, U'␞', 1);
+        break;
+
+    case '\x1f':
+        /* US - \x1f - unit separator */
+        term_print(term, U'␟', 1);
+        break;
+
         /*
          * 8-bit C1 control characters
          *
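
The four added cases follow one pattern: the Control Pictures block places the symbol for C0 control code c at U+2400 + c, so FS (0x1c) maps to U+241C and US (0x1f) to U+241F. A sketch of that mapping as a standalone function is below. It is not part of the patch or of foot, and `control_picture` is a hypothetical name.

```c
#include <stdint.h>

/* Map an information separator (FS, GS, RS, US: 0x1c-0x1f) to its
 * Control Pictures code point (U+241C-U+241F). Any other byte is
 * returned unchanged, so tab, newline, etc. keep their usual meaning. */
uint32_t control_picture(uint8_t c)
{
    if (c >= 0x1c && c <= 0x1f)
        return 0x2400u + c;
    return c;
}
```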
Owner

I think we ought to reopen this issue as an ongoing discussion regarding ASV handling in Bonsai.

Author
Owner

agreed

emma reopened this issue 2023-12-31 22:56:59 +00:00
emma pinned this 2023-12-31 22:57:54 +00:00
Owner

I was planning on creating a new issue for this, but I guess I'll put it here if this is turning into the ASV discussion thread.

While we have agreed to "use ASV", I have yet to see any discussion of what that practically means. How will ASV be used? [USAS X3.4-1968](https://ia800800.us.archive.org/35/items/enf-ascii-1968-1970/Image070917151315.pdf) was already very loose in its description of proper IS usage (see page 10), only asking that their hierarchical relationship be preserved. A look at a more modern version of the ASCII spec shows only further loosening in this regard, with [INCITS 4-1986[R2007]](http://sliderule.mraiow.com/w/images/7/73/ASCII.pdf) dropping the hierarchical requirement. While the proper usage is defined in several places within the document, "4.1.5 Information Separators" most completely explains the current situation. While I was unable to find a more recent revision of the standard, I think it's safe to assume that things haven't gotten any stricter since then. All this is to say that there is no standard for us to follow here, and the precise ways in which Bonsai utilities should/do/will interact with ASV (and IS characters in general) need to be formally specified.

Author
Owner

@trinity has some good input on how to do this

Author
Owner

we can write a man(7) page on our asv usage

emma removed their assignment 2024-01-01 06:37:07 +00:00
trinity was unassigned by emma 2024-01-01 06:37:07 +00:00
Owner

My rough interpretation of ASV dating back some years (this is why I wrote [ascii.h](https://git.sr.ht/~trinity/src/tree/main/item/ascii/ascii.h)) was the following:

  • ASCII_US is the unit separator. This is for cells on a spreadsheet.
  • ASCII_RS is the record separator. This is for rows on a spreadsheet.
  • ASCII_GS is the group separator. This is for sheets in a spreadsheet document.
  • ASCII_FS is the file separator. This is for terminating files, and means proper ASV files can be cat(1p)ed together without loss of content.
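
To make that hierarchy concrete, here is a minimal sketch of serializing one sheet under this interpretation. It assumes that US and RS go between cells and rows (separators) while FS ends the file (terminator); the thread has not settled that, and GS is left out because only one sheet is written. `asv_emit_sheet` is a hypothetical name, and the macros are defined locally rather than taken from the ascii.h mentioned above.

```c
#include <stddef.h>
#include <string.h>

#define ASCII_FS '\x1c' /* file separator */
#define ASCII_RS '\x1e' /* record separator */
#define ASCII_US '\x1f' /* unit separator */

/* Serialize a rows x cols sheet of NUL-terminated cells (row-major)
 * into dst. US goes between cells, RS between rows, and a single FS
 * ends the file. dst must be large enough for the result.
 * Returns the number of bytes written. */
size_t asv_emit_sheet(char *dst, const char *const *cells,
                      size_t rows, size_t cols)
{
    size_t n = 0;
    for (size_t r = 0; r < rows; r++) {
        if (r > 0)
            dst[n++] = ASCII_RS;
        for (size_t c = 0; c < cols; c++) {
            const char *cell = cells[r * cols + c];
            size_t len = strlen(cell);
            if (c > 0)
                dst[n++] = ASCII_US;
            memcpy(dst + n, cell, len);
            n += len;
        }
    }
    dst[n++] = ASCII_FS;
    return n;
}
```

Because every file ends in FS, two outputs of this function written back to back still have an unambiguous boundary between them, which is the cat(1p) property described in the last bullet.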

[Coded Character Sets, History and Development](https://textfiles.meulie.net/bitsaved/Books/Mackenzie_CodedCharSets.pdf) may have historical information but I'm looking into some other stuff right now.

emma changed title from ASV terminal output/representation to ASV in Bonsai 2024-01-01 07:15:43 +00:00
emma added enhancement and removed wontfix labels 2024-01-01 07:16:54 +00:00
Owner

> While we have agreed to "use ASV", I have yet to see any discussion of what that practically means. How will ASV be used? USAS X3.4-1968 was already very loose in its description of proper IS usage (see page 10), only asking that their hierarchical relationship be preserved. A look at a more modern version of the ASCII spec shows only further loosening in this regard, with INCITS 4-1986[R2007] dropping the hierarchical requirement. While the proper usage is defined in several places within the document, "4.1.5 Information Separators" most completely explains the current situation. While I was unable to find a more recent revision of the standard, I think it's safe to assume that things haven't gotten any stricter since then. All this is to say that there is no standard for us to follow here, and the precise ways in which Bonsai utilities should/do/will interact with ASV (and IS characters in general) needs to be formally specified.

I've been reading the terrific Coded Character Sets, History and Development, which was written by someone who was involved with the creation of ASCII itself. It seems the earliest use of encoded field separators was the Hollerith Card Code (which predates the identically-named [Hollerith Card Code](https://dl.acm.org/doi/pdf/10.1145/362991.363052) used for representing ASCII on punched cards(? - I skimmed the article only long enough to realize it wasn't what I wanted)). It is named for [Herman Hollerith](https://en.m.wikipedia.org/wiki/Herman_Hollerith), who designed the code for [his tabulation machine, itself designed for the 1890 census](https://www.computerhistory.org/collections/catalog/X193.83), and who wrote the paper [An Electric Tabulating System](https://web.archive.org/web/20231117234140/http://www.columbia.edu/cu/computinghistory/hh/index.html) in 1889 about this work. The [company Hollerith founded](https://en.m.wikipedia.org/wiki/Computing-Tabulating-Recording_Company) eventually became IBM. The information I can find on the Hollerith encoding is scarce and mainly comes from the mentioned book Coded Character Sets.

I haven't looked thoroughly enough into the text encodings that followed (or the preceding ones, though Hollerith's may be the first).

Owner

Relevant: https://github.com/SixArm/usv

They seem to believe they have monopolized Unicode Separated Values:

> The USV project aims to become a free open source IANA standard, much like the IANA standard for CSV.
>
> Until the standardization happens, the terms "USV" and "Unicode Separated Values" are trademarks of this project, and this repository is copyright 2022-2024. When IANA approves the standard, then the trademarks and copyrights become public domain.

Alrighty... it would be more egregious if they weren't giving the trademarks to the public domain eventually™.

The thing that really grinds my gears is that they aren't using the Unicode control characters themselves but the runes that graphically represent them. Yuck! They did find some [issues with using the control characters](https://github.com/SixArm/usv/blob/main/doc/objections.md) which compelled them to switch to the displayed runes, but they're working on a data format while we're working on an ecosystem, so we can actually solve these issues (for ourselves). And we've already run into most of these caveats.

They have a "history of ASV" document which pilfers from Wikipedia and this blog post: https://www.lammertbies.nl/comm/info/ascii-characters which is probably interesting, but I haven't dug into it yet.

I find their use of the graphical representation for ESC to be interesting... Maybe we should use ASCII ESC to escape literal field separators?
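
As a sketch of what that could mean in practice: a writer would prefix every literal separator inside a field, and ESC itself, with an ESC byte. Nothing here is decided in this thread. `asv_escape_field` and `is_info_separator` are hypothetical names, and escaping ESC itself is an extra assumption made so that the encoding stays reversible.

```c
#include <stddef.h>

#define ASCII_ESC '\x1b' /* escape */

/* Is c one of FS, GS, RS, US (0x1c-0x1f)? */
static int is_info_separator(char c)
{
    return c >= '\x1c' && c <= '\x1f';
}

/* Copy len bytes of one field from in to out, writing an ESC before
 * every literal information separator and before every literal ESC.
 * out needs room for 2 * len bytes in the worst case.
 * Returns the number of bytes written. */
size_t asv_escape_field(const char *in, size_t len, char *out)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (is_info_separator(in[i]) || in[i] == ASCII_ESC)
            out[n++] = ASCII_ESC;
        out[n++] = in[i];
    }
    return n;
}
```

One caveat: ESC also introduces terminal escape sequences, so escaped output dumped straight to a tty may have side effects of its own.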

emma added a new dependency 2024-02-24 21:48:02 +00:00
Owner

More discussion involving the proposed USV: https://news.ycombinator.com/item?id=39679378.

Blocks
#74 `ls(1p)` analogue
bonsai/coreutils
Reference: bonsai/coreutils#19