Clarify reasoning for STYLE #29 #170

Open
opened 2025-04-04 14:03:46 -06:00 by silt · 6 comments
Owner

[STYLE Line 188 in 8eece4cf84](https://git.tebibyte.media/bonsai/harakit/src/commit/8eece4cf84cac2b2bad0cff939afc219b9aea738/STYLE#L188)
29. C struct bitfields in unions, to access certain bits of bigger data types,

in what way is this poorly specified? i'm not an expert in this area, but i've used bitfields within unions before and didn't find any ambiguity apart from the standards deferring to the target platform's abi for bitfield ordering. if this is the concern, either mention it or include a relevant source.
trinity was assigned by silt 2025-04-04 14:03:46 -06:00
Owner

It is because of ordering.


Long-winded explanation:

From [the C89 draft: 3.5.2.1 Structure and union specifiers](http://jfxpt.com/library/c89-draft.html#3.5.2.1):

> An implementation may allocate any addressable storage unit large enough to hold a bit-field. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit. If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined. The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined. The alignment of the addressable storage unit is unspecified.

> the standards deferring to the target platform's abi for bitfield ordering

The standards do not defer except to the compiler writer, who is (knowing the type of person that works on compilers) probably a chainsaw-wielding lunatic.
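To see which order a particular compiler picked, a small probe along these lines can help. This is a minimal sketch and not part of harakit: `struct probe` and `probe_position_of_a` are hypothetical names, and it assumes that writing one field leaves the zeroed padding bits alone, which holds in practice but is not promised by the standard.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Hypothetical probe: four 1-bit fields that share one storage unit. */
struct probe {
   unsigned int a: 1;
   unsigned int b: 1;
   unsigned int c: 1;
   unsigned int d: 1;
};

/* Sets only field `a`, then scans the object representation.
 * Returns byte index * CHAR_BIT + bit index (bit 0 = least significant)
 * of the first set bit, or -1 if no bit is set. */
int probe_position_of_a(void)
{
   struct probe p;
   unsigned char bytes[sizeof p];
   size_t i;
   int bit;

   memset(&p, 0, sizeof p);
   p.a = 1;
   assert(p.a == 1); /* access by name is always well defined */
   memcpy(bytes, &p, sizeof p);

   for (i = 0; i < sizeof bytes; ++i)
      for (bit = 0; bit < CHAR_BIT; ++bit)
         if (bytes[i] & (1u << bit))
            return (int)(i * CHAR_BIT) + bit;
   return -1;
}
```

The returned position is the implementation's choice, so two conforming compilers may report different numbers for the same source.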

There are some portable uses of bitfields, e.g.

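#include <stdbool.h> /* bool */
#include <stddef.h>  /* size_t */

struct {
   char *s;
   size_t l: 4; /* length will never exceed 15 */
   size_t a: 4; /* allocation will never exceed 15 */
   char: 0;     /* zero width: the next bit-field starts a new unit */
   bool is_neat: 1;
} str;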
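What makes this kind of use portable is that the fields are only ever read and written by name, so nothing depends on where the bits land. A minimal sketch of that, with hypothetical names (`struct small_str`, `store_length`): it swaps `size_t` for `unsigned int`, because C99 and later only guarantee `_Bool`, `signed int`, and `unsigned int` as bit-field types and leave the rest implementation-defined.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical variant of the struct above, using guaranteed types only. */
struct small_str {
   char *s;
   unsigned int l: 4; /* length, 0..15 */
   unsigned int a: 4; /* allocation, 0..15 */
   bool is_neat: 1;
};

/* Stores len in the 4-bit length field and returns what was stored.
 * Conversion to an unsigned bit-field is reduced modulo 2^4, so values
 * above 15 wrap around instead of invoking undefined behavior. */
unsigned int store_length(struct small_str *str, unsigned int len)
{
   assert(str != 0);
   str->l = len;
   return str->l;
}
```

Storing 15 reads back as 15 on every conforming implementation, and storing 16 reads back as 0, regardless of how the compiler packed the fields.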

or even

struct {
   union {
      struct {
         char *s;
         size_t l: 4;
         size_t a: 4;
      } d;
      char *s;
   } str;
   bool is_tracked: 1;
} str;

to have two possible layouts for keeping track of strings or whatever. But this:

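union Ieee754 {
   struct {
      /* Listed low-order first, as GCC and Clang allocate on x86-64.
       * The 52-bit fraction is split because a bit-field cannot be
       * wider than its type (32-bit unsigned int here). */
      unsigned int fraction_lo: 32;
      unsigned int fraction_hi: 20;
      unsigned int exponent: 11;
      unsigned int sign: 1;
   } fields;
   double literal;
};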

while probably working on x64, only works if

  • the CPU designers didn't deviate from IEEE 754 (likely, but less likely than you'd think)
  • double is 64 bits wide
  • the compiler allocated the bitfields in the order the code expects (the order within a unit is the compiler's choice)

and while the first two are sloppy assumptions (especially the second, which seems obvious to me, perhaps because I have used multiple C compilers), the third one is subtler. That means this code

#define ui unsigned int
enum { X_A = 0x1, X_B = 0x2, X_C = 0x4, X_D = 0x8 };
union { struct { ui a: 1; ui b: 1; ui c: 1; ui d: 1; } fields; ui literal: 4; } x;
int main(void) { x.literal = X_A; return x.fields.a; }

could return 0 or 1, depending on how the implementation orders the bitfields.
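For flags like these, plain masks on an unsigned integer avoid the question entirely, since shifts and bitwise operators work on values rather than on storage layout. A minimal sketch, with hypothetical helper names (`flag_is_set`, `set_flag`):

```c
#include <assert.h>

enum { X_A = 0x1, X_B = 0x2, X_C = 0x4, X_D = 0x8 };

/* Returns 1 if `flag` is set in `flags`, else 0. */
int flag_is_set(unsigned int flags, unsigned int flag)
{
   assert(flag != 0); /* an empty mask would always report "not set" */
   return (flags & flag) != 0;
}

/* Returns `flags` with `flag` added. */
unsigned int set_flag(unsigned int flags, unsigned int flag)
{
   return flags | flag;
}
```

Here `flag_is_set(X_A, X_A)` is 1 on every conforming implementation, which is the guarantee the union version lacks.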

So bitfields within unions can theoretically be used just fine. The problem is that their typical use (chopping a variable into fields) results in very subtly buggy code, and their correct use is not very practical.
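If the goal really is to chop a double into fields, copying its bytes into a fixed-width integer and shifting removes the dependence on bit-field order. This is only a sketch with hypothetical names (`struct ieee754_parts`, `split_double`), and it still assumes a 64-bit IEEE 754 double whose byte order matches that of integers on the target; it removes the third assumption, not the first two.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical holder for the three binary64 fields. */
struct ieee754_parts {
   unsigned int sign;     /* 1 bit */
   unsigned int exponent; /* 11 bits, biased by 1023 */
   uint64_t fraction;     /* 52 bits */
};

/* Splits d into its fields with shifts and masks on the raw bits. */
struct ieee754_parts split_double(double d)
{
   struct ieee754_parts p;
   uint64_t bits;

   assert(sizeof bits == sizeof d); /* assumption: double is 64 bits */
   memcpy(&bits, &d, sizeof bits);  /* well-defined way to reinterpret */

   p.sign = (unsigned int)(bits >> 63);
   p.exponent = (unsigned int)((bits >> 52) & 0x7FF);
   p.fraction = bits & UINT64_C(0xFFFFFFFFFFFFF);
   return p;
}
```

Under those assumptions, `split_double(1.0)` yields sign 0, exponent 1023, and fraction 0.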


Author
Owner

got it, thanks for clarifying. any thoughts on moving it to the "Avoid" section and adding a reference to [cppreference's bit-field page](https://en.cppreference.com/w/c/language/bit_field)?
cc @emma

(sorry for the premature comment+close misclick)
silt closed this issue 2025-04-06 01:39:36 -06:00
silt reopened this issue 2025-04-06 01:41:25 -06:00
Owner

I'm fine with "avoid" rather than a strict ban, though I would prefer we cite a standard or draft directly.
Author
Owner

i agree that citing an actual standard would be preferable. i only mentioned cppreference because the standard bonsai actually targets has never been entirely clear to me, and linking cppreference saves us the effort of tracking down the right spot again if the target ever does change.
Owner

Just give me the standards hyperlink and I will make it happen.
Owner

I wondered if the targeted standard matters, but the newest draft has roughly the same language:

> An implementation may allocate any addressable storage unit large enough to hold a bit-field. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit. If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined. The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined. The alignment of the addressable storage unit is unspecified.

([C standard draft N3220](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3220.pdf), section 6.7.3.2, paragraph 13)

I don't remember what C standard we target. I've made a note to get back to this and decide on a hyperlink to use.
Reference: bonsai/harakit#170