Pulling Arduino data apart

Jack, W8TEE
 

I've been going through some of the BITX/µBITX code and see a number of places where bit shifting and/or masking is being done to get rid of "unwanted" bits of data. Doing this means you have solved the "Endian" problem for the microcontroller being used. For example, on some processors, the number 5 stored as an int data type appears in memory as: 00000000 00000101. On other processors, it is stored as: 00000101 00000000. If you are bit shifting or masking, you need to know the "Endian" order for the bytes.

On an Arduino, you are given two functions, lowByte() and highByte(), that let you pull the two bytes out of an int so you can see how the data are organized. Knowing the byte order can be important, for example when transferring binary data from one place to another over a serial link. But what if you are working with a long data type? The lowByte() and highByte() functions aren't enough, since a long is 4 bytes.

The solution is to use a C construct called a union. Think of the union as a buffer: a small chunk of memory. I think of it as a bucket, the size of which is determined by the biggest piece of data that will be stored in the union. For example:

union {
  byte array[4];
  byte b;
  char c;
  int i;
  long L;
  float f;
} myUnion;

The long and the float data types are the biggest in the union, so the C compiler will allocate 4 bytes to the union named myUnion. It's a 4-byte bucket. If you want to, you can pour those four bytes into another long variable, or a float variable, or you could use a 1-byte dipper and spoon the 4 bytes into a 4-byte array. Your choice.

Suppose, for some reason, the long in the union (i.e., the union member named L) needs to hold an RGB value for a color display. Further suppose you need to know which byte isn't used as a color value. The following short program will show you how a union works. Note how I can fill the union with any data type, but extract it as a byte array. This makes it easy to observe the byte order of the data. Unions are a great C construct to understand, as they give you a portable way to determine the byte order of the data for a given processor.


union {
  byte array[4];
  byte b;
  char c;
  int i;
  long L;
  float f;
} myUnion;      // Define a union

void setup() {
  byte b = 255;      // create a list of variables...
  char c = 'A';
  int i = 5;
  long L = 10000000L;
  float f = 3.14;

  Serial.begin(9600);

  Serial.print("low byte = ");          // the non-portable way to see the byte order of  variable i
  Serial.print(lowByte(i));
  Serial.print("   high byte = ");
  Serial.println(highByte(i));

  myUnion.i = i;                              // Stuff the int into the union, then look at it
  Serial.print("Union:  low byte = ");
  Serial.print(myUnion.array[0]);
  Serial.print("   high byte = ");
  Serial.println(myUnion.array[1]);

  myUnion.b = b;                        // same for a byte, in DEC and HEX
  Serial.print("Union: byte = ");
  Serial.println(myUnion.array[0]);
  Serial.print("  byte (hex) = ");
  Serial.println(myUnion.array[0], HEX);

  myUnion.L = L;                           // same for a long
  Serial.print("Union: long[0] = ");
  Serial.print(myUnion.array[0]);
  Serial.print("    long[1] = ");
  Serial.print(myUnion.array[1]);
  Serial.print("    long[2] = ");
  Serial.print(myUnion.array[2]);
  Serial.print("  long[3] = ");
  Serial.println(myUnion.array[3]);
                                                         // Do some others on your own...
}

void loop() {
}
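
The same union trick can be tried off the Arduino as well. What follows is only a minimal sketch in plain C, assuming uint32_t as a stand-in for the Arduino's 4-byte long; the names probe, bytes_in_memory() and is_little_endian() are hypothetical and not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; uint32_t stands in for the Arduino's 4-byte long. */
union probe {
    uint8_t  bytes[4];
    uint32_t value;
};

/* Copy the in-memory bytes of v, lowest address first, into out[0..3]. */
void bytes_in_memory(uint32_t v, uint8_t out[4])
{
    union probe p;
    size_t k;

    assert(out != NULL);
    p.value = v;                  /* pour the value into the bucket... */
    for (k = 0; k < 4; k++)
        out[k] = p.bytes[k];      /* ...and dip it out one byte at a time */
}

/* Returns 1 when the least significant byte sits at the lowest address. */
int is_little_endian(void)
{
    uint8_t b[4];

    bytes_in_memory(0x04030201UL, b);
    return b[0] == 0x01;
}
```

Calling is_little_endian() once at startup is enough to learn the byte order of the machine the code was compiled for.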

Jerry Gaffke
 

You definitely need to be endian-aware when coding in assembly language.
But I can code all day in C without worrying about big vs little endian.

If you have a 32 bit integer and want to send the 8 msb's over a serial link, do something like this:
    sendbyte(data32>>24);
I'm assuming sendbyte() accepts an 8 bit argument, so no need to explicitly strip off unused bits.
This might make it more obvious:
    sendbyte((data32>>24)&0xff);
That code is bulletproof in C, should work the same on a PDP-8 with 12 bit hardware 
as it does on the latest stuff from Intel.  Unions will undoubtedly get packed differently
on a PDP-8, in case anybody still cares these days.  C does not specify that unions get packed
in any particular manner, that's true even on an 8/16/32/64 bit machine.
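
A minimal sketch of the shift-and-mask idea in plain C, for anyone who wants to try it on a desktop compiler. The helper names byte_n() and top_byte() are hypothetical, and uint32_t is assumed for the 32 bit integer.

```c
#include <assert.h>
#include <stdint.h>

/* Byte n of the VALUE, where n = 0 is the least significant byte.
   The shift works on the number itself, not on its layout in memory. */
uint8_t byte_n(uint32_t data32, unsigned n)
{
    assert(n < 4);
    return (uint8_t)((data32 >> (8u * n)) & 0xffu);
}

/* The 8 msb's, i.e. the same thing as (data32>>24)&0xff. */
uint8_t top_byte(uint32_t data32)
{
    return byte_n(data32, 3);
}
```

With data32 set to 0x04030201, top_byte() returns 4 and byte_n(data32, 0) returns 1 on any machine, whatever its byte order.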

Unions almost always work, might be a good idea on a minimalist machine like the ATMega328P.
Shifting a 32 bit integer is relatively painful with an instruction set that can only shift an 8 bit word
left or right by one bit at a time.  However, while it might take 100 clock ticks to execute, the code to
shift a 32 bit word is instantiated only once as a function call, and 100 ticks happens much faster
than sending a single byte out through the I2C interface.  (100 ticks is a wild guess.)

If 100 ticks is too painful, then it's time to sack the Nano and move on to the $2 STM32 Blue Pill.

Take a look at the si5351 routines in Allard's code and on the uBitx.
Totally endian agnostic.
I'm looking forward to trying them out on a PDP-8 someday.

Jerry, KE7ER



On Thu, Mar 8, 2018 at 07:51 am, Jack Purdum wrote:
.....  If you are bit shifting or masking, you need to know the "Endian" order for the bytes. On an Arduino, you are given two functions: lowByte() and highByte() to allow you to extract the order to determine how the data are organized in an int. Knowing the byte order can be important, such as transferring binary data from one place to another over a serial link. But what if you are working with a long data type? The lowByte() and highByte() functions don't work since a long is 4 bytes. The solution is to use a C structure called a union. Think of the union as a buffer; a small chunk of memory. I think of it as a bucket, the size of which is determined by the biggest piece of data that will be stored in the union. For example: .....

Jerry Gaffke
 

It would be interesting to see what the Arduino compiler does with this code:
    sendbyte(data32>>24);

That sort of thing is going to happen a lot in the code it sees, at least if I am writing it.
With optimization turned on, the compiler should recognize that everything remains
on byte boundaries and implement it as something very much like Jack's union trick. 
Should do this properly for big or little endian machines without me thinking about it.
No bit shifts.

Jerry




Jack, W8TEE
 

OK, so what happens if you send an int from Allard's code to a 64-bit Intel i7? Compiler vendors are completely free to decide the byte order of all of their data types. My software company used to produce C programming tools (compilers, editors, assemblers, linkers) for both 8 bit and 16 bit machines. We made sure our Endians were the same, simply from a marketing standpoint. However, sending binary data from an 8 bit compiler to someone else's 16 bit compiler has no guarantee of working. Data structure packing and endian use is totally up to the compiler vendor. Indeed, there was one 8-bit MSDOS compiler vendor who chose to use -1 for NULL. The old X3J11 C standards committee made no restrictions on such things and they are defined as "implementation dependent". That's why you should use NULL instead of 0 when checking string lengths. Now you could send the data as ASCII, but then you slow the transmission because values 0 through 255 take only 1 binary byte, but up to 3 ASCII bytes.

Your statement that "I can code all day in C without worrying about the big vs little endian" issue is only true at the source code level. If you are sending binary data, which is what I said in my post, you very definitely need to worry about the endian problem. As to 100 clock ticks, that seems high. An ldi assembler instruction takes 3 clock cycles, or 12 for a 32 bit long. Each rotate left (or right) is a single clock cycle, so I get 42 clock cycles to rotate a long off the map, and that includes the time to load it. So 0.000002625 of a second seems pretty quick. Still, that's neither here nor there.
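
The timing arithmetic can be checked with a one-line helper. This is a minimal sketch, assuming a 16 MHz clock (the rate at which 42 cycles works out to the 0.000002625 second figure); cycles_to_seconds() is a hypothetical name.

```c
#include <assert.h>

/* Time taken by a number of CPU cycles at a given clock frequency in Hz. */
double cycles_to_seconds(unsigned long cycles, double clock_hz)
{
    assert(clock_hz > 0.0);
    return (double)cycles / clock_hz;
}
```

At 16 MHz, cycles_to_seconds(42, 16000000.0) comes to about 2.625 microseconds, and the 100 tick guess comes to about 6.25 microseconds.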

Your sendbyte() example, the sendbyte(data32>>24) leaves the high byte for sending. If you don't know the endian order, how do you know you didn't just rotate the data of interest onto the floor? The second example is no different. Indeed, since the shift right operator "backfills" with 0's and has higher precedence than the bitwise AND operator, your example always sends 0 to the function. Why bother?

Nope, there are times when you need to know the endian order and you can use a union to find it out. It can also be used to send binary data over a serial connection to a totally different platform and still have it work. Knowing how to use a union is a good thing.
Jack, W8TEE



Jack, W8TEE
 

Easy enough to do:

union {
  byte array[4];
  byte b;
  char c;
  int i;
  long L;
  float f;
} myUnion;

void setup() {
  int i;
  long val;

  myUnion.array[0] = 1;
  myUnion.array[1] = 2;
  myUnion.array[2] = 3;
  myUnion.array[3] = 4;

  Serial.begin(9600);

  for (i = 0; i < 4; i++) {
    Serial.print("array[");
    Serial.print(i);
    Serial.print("] = ");
    Serial.println(myUnion.array[i], HEX);
  }
  Serial.print("in HEX, L = ");
  Serial.print(myUnion.L, HEX);
  Serial.print(" or in decimal, L = ");
  Serial.println(myUnion.L);

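  val = myUnion.L;                 // copy the long out of the union before shifting
  sendByte(val >> 24);             // the 8 most significant bits of the value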
}

void sendByte(byte num)
{
  Serial.print("In sendByte() num = ");
  Serial.println(num, HEX);
}

void loop() {
}

The output is:

array[0] = 1
array[1] = 2
array[2] = 3
array[3] = 4
in HEX, L = 04030201 or in decimal, L = 67305985
In sendByte() num = 4

This shows that a long is stored in little endian format (i.e., least significant byte first): array[0], the lowest address, holds 01, which is the least significant byte of 04030201.
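
To see which byte order a given array-to-long mapping corresponds to, the two possible readings can be computed with shifts alone. A minimal sketch in plain C; read_as_little_endian() and read_as_big_endian() are hypothetical helper names, and uint32_t stands in for the 4-byte long.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Value obtained when b[0] (lowest address) is the LEAST significant byte. */
uint32_t read_as_little_endian(const uint8_t b[4])
{
    assert(b != NULL);
    return  (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}

/* Value obtained when b[0] (lowest address) is the MOST significant byte. */
uint32_t read_as_big_endian(const uint8_t b[4])
{
    assert(b != NULL);
    return ((uint32_t)b[0] << 24)
         | ((uint32_t)b[1] << 16)
         | ((uint32_t)b[2] << 8)
         |  (uint32_t)b[3];
}
```

For the bytes 1, 2, 3, 4 used above, the little endian reading gives 0x04030201 (67305985 decimal) and the big endian reading gives 0x01020304.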

Jack, W8TEE



Jerry Gaffke
 

Consider a serial link, perhaps a UART, we wish to send 32 bit integers over that link.
The spec for the serial link should define if those integers go over as big or little endian.
Let's assume the serial link spec says it is little endian, and that each character has 8 bits.

Here's C code for machine A to send a 32 bit integer as a sequence of four bytes in little endian order:
    sendbyte(data32);  sendbyte(data32>>8);  sendbyte(data32>>16);  sendbyte(data32>>24);
And code for machine B to receive that 32 bit integer (assumes getbyte() returns an unsigned 8 bit integer,
with casts so the shifts are done at 32 bits even where int is only 16 bits, as on the Nano):
    data32=getbyte();  data32|=(uint32_t)getbyte()<<8;  data32|=(uint32_t)getbyte()<<16;  data32|=(uint32_t)getbyte()<<24;

This C code doesn't care if the machine it is on is big endian or little endian.
However the C code on both ends must be aware of the integer size it is dealing with, be it 8,16,32 bits.
So the serial link spec may need to fully define the format of the data stream, not just say whether
it is big or little endian.
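
The send and receive lines above can be exercised without any hardware. The following is a minimal sketch in plain C in which a small in-memory FIFO stands in for the serial link; wire, send_u32() and recv_u32() are hypothetical names, and sendbyte()/getbyte() here are toy stand-ins, not the routines of any real driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the serial link: a small in-memory FIFO. */
#define WIRE_SIZE 16
static uint8_t wire[WIRE_SIZE];
static size_t wire_head = 0, wire_tail = 0;

static void sendbyte(uint8_t b)
{
    assert(wire_head < WIRE_SIZE);   /* toy link: fixed capacity */
    wire[wire_head++] = b;
}

static uint8_t getbyte(void)
{
    assert(wire_tail < wire_head);   /* nothing to read is a caller bug */
    return wire[wire_tail++];
}

/* Machine A: least significant byte goes over the link first. */
void send_u32(uint32_t data32)
{
    sendbyte((uint8_t)(data32 & 0xffu));
    sendbyte((uint8_t)((data32 >> 8) & 0xffu));
    sendbyte((uint8_t)((data32 >> 16) & 0xffu));
    sendbyte((uint8_t)((data32 >> 24) & 0xffu));
}

/* Machine B: rebuild the value; the casts keep every shift at 32 bits. */
uint32_t recv_u32(void)
{
    uint32_t data32;

    data32  = getbyte();
    data32 |= (uint32_t)getbyte() << 8;
    data32 |= (uint32_t)getbyte() << 16;
    data32 |= (uint32_t)getbyte() << 24;
    return data32;
}
```

Neither function looks at how the integer is laid out in memory, so the same source should behave the same on a big endian or a little endian host.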

Plenty of C code out there that is not endian agnostic like that, and I'm fine with it.
Those 24 bit shifts are expensive if your compiler is turned down to dumb,
a typecast of an int32 pointer to an array of bytes may look like a more efficient way to code.
Most machines these days are little endian with 8/16/32/64 bit word sizes, and I'm fine
with code that assumes this is the case.  (There are some big endian machines though.)

But if you are trying to code for a machine that could be either big or little endian
or might have some weird word length in hardware, I'm of the opinion that the above
is the best way to do it.  If nothing else, it's very easy to read.
 
Endian-ness has even more repercussions when creating hardware.
I always found that working with the big-endian VME bus was a PITA,
the extra shifts were rather expensive back in the days of TTL.

It seems obvious at first glance, big-endian means we send over the most significant byte first,
and little endian means we send over the least significant byte first.
But implementation of this in a mixed environment can become a real head scratcher.
Especially if the implementation is not thoroughly thought out before coding starts.


>  Your sendbyte() example, the sendbyte(data32>>24) leaves the high byte for sending.
>  If you don't know the endian order, how do you know you didn't just rotate the data of interest onto the floor?

I don't quite follow.
sendbyte(data32>>24)   will always send the 8 msb's of that 32 bit word, regardless of what machine you are on.
I know I didn't rotate the data of interest onto the floor because I know that my data was in the 8 msb's of the 32 bit word. 

> The second example is no different. Indeed, since the shift right operator "backfills" with 0's
> and has higher precedence that the bitwise AND operator, you example always sends 0 to the function. Why bother? 

 
Hmm.  This second example?
    sendbyte((data32>>24)&0xff);
Only difference from the previous is that it makes it clear to the reader
that we are only interested in sending 8 bits.  And might save us from 
some weird bug if sendbyte() was not defined as an unsigned 8 bit int.
Looks fine to me.

Jerry



Jerry Gaffke
 

The question was, is the compiler smart enough to emit assembly language 
that just grabs the appropriate bytes without messing with any bit shifts.
It should.
If you have the correct compiler optimization flags set.



Neil Martinsen-Burrell
 

On Thu, Mar 8, 2018 at 1:42 PM, Jerry Gaffke via Groups.Io <jgaffke@...> wrote:
Here's C code for machine A to send a 32 bit integer as a sequence of four bytes in little endian order:
    sendbyte(data32);  sendbyte(data32>>8);  sendbyte(data32>>16);  sendbyte(data32>>24);
And code for machine B to receive that 32 bit integer (assumes getbyte() returns an unsigned 8 bit integer):
    data32=getbyte();  data32|=(uint32_t)getbyte()<<8;  data32|=(uint32_t)getbyte()<<16;  data32|=(uint32_t)getbyte()<<24;

This C code doesn't care if the machine it is on is big endian or little endian.
However the C code on both ends must be aware of the integer size it is dealing with, be it 8,16,32 bits.
So the serial link spec may need to fully define the format of the data stream, not just say whether
it is big or little endian.

>  Your sendbyte() example, the sendbyte(data32>>24) leaves the high byte for sending.
>  If you don't know the endian order, how do you know you didn't just rotate the data of interest onto the floor?

I don't quite follow.
sendbyte(data32>>24)   will always send the 8 msb's of that 32 bit word, regardless of what machine you are on.
I know I didn't rotate the data of interest onto the floor because I know that my data was in the 8 msb's of the 32 bit word.

It seems to me like Jerry is trying to point out that the left- and right-shift operators are endian-agnostic. They work in terms of the mathematically more-significant and less-significant directions without concern for the byte order that an architecture uses to store its integers. So 256>>8 never gives a result of 0 on any architecture, even if the internal representation is 0x00 0x01.
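
The difference between the value and its representation can be sketched in plain C as follows. This is only an illustration; first_byte_in_memory(), high_byte_of_value() and check_views() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The byte stored at the lowest address of a 16 bit value.
   This one DOES depend on the machine's byte order. */
uint8_t first_byte_in_memory(uint16_t v)
{
    uint8_t raw[sizeof v];

    memcpy(raw, &v, sizeof v);
    return raw[0];
}

/* The high byte of the VALUE, found arithmetically.
   This one is the same on every machine. */
uint8_t high_byte_of_value(uint16_t v)
{
    return (uint8_t)(v >> 8);
}

/* Whatever the machine, the first byte in memory must be either the
   low byte or the high byte of the value. */
void check_views(uint16_t v)
{
    uint8_t first = first_byte_in_memory(v);

    assert(first == (uint8_t)(v & 0xffu) || first == high_byte_of_value(v));
}
```

For 256, high_byte_of_value() returns 1 everywhere, while first_byte_in_memory() returns 0 on a little endian machine and 1 on a big endian one.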

-Neil N0FN

Jack, W8TEE
 

I did provide an example that you can use to investigate it. If you're interested, just take the intermediate assembler file and take a look at it.

Jack, W8TEE




Jack, W8TEE
 

> The spec for the serial link should define if those integers go over as big or little endian.
> ...
> So the serial link spec may need to fully define the format of the data stream, not just say whether
> it is big or little endian.

Absolutely agree, which was what I was saying from the very start. Allard's code can be endian agnostic because it runs in a single known environment. I haven't looked at his code for some time, so I don't know if there is anyplace in the code where he needs to break apart a basic data type.

My comment about putting bits on the floor meant that you had to know something about the byte order; otherwise, why are you interested only in the high byte? Your code:

    sendbyte((data32>>24)&0xff);

to send a byte works great if the data is big endian:

        01010101 00000000 00000000 00000000         // 01010101 is the byte of interest

However, if you don't know the byte order and it is:

        00000000 00000000 00000000 01010101

Your code would throw the relevant data on the floor. Your code is only safe if you know the order. A union is a simple way to determine that order.

Jack, W8TEE








Jerry Gaffke
 

The example I would look at is 
    sendbyte(data32>>24);

I'm pretty sure that in most production environments, there would be no bit shifts in the assembly code.
With the default Arduino IDE, all bets are off.
Regardless, even if it did blindly do all those shifts, the way we handle the i2c interface
is an order of magnitude worse with regard to execution time.

Neil N0FN wrote:
> It seems to me like Jerry is trying to point out that the left- and right-shift operators are endian-agnostic

Indeed.
And a whole lot easier to read.
At least for me.

Jerry



 

Jerry Gaffke
 

See my comments inline below.

On Thu, Mar 8, 2018 at 12:13 pm, Jack Purdum wrote:
> Absolutely agree, which was what I was saying from the very start. Allard's code can be endian agnostic because it runs in a single known environment. I haven't looked at his code for some time, but I don't know if there is anyplace in the code where he needs to break apart a basic data type.

If we are talking about the si5351bx routines I previously referenced that are in Allard's Bitx40 code,
they bust up some large integers and write them to specific bitfields of arbitrary size
in the i2c register set of the Si5351.  That's worse than just big/little endian, as those
fields are often not an even 8 bits.  I know, as I wrote the code.
    https://groups.io/g/BITX20/message/28977 

Your statements below regarding how bit shifts work on a 32 bit integer
look wrong to me.  Integer operations like that are endian agnostic,
they don't care how that 32 bit register might get stored in main memory.


 
 
> My comment about putting bits on the floor meant that you had to know something about the byte order, otherwise why are you interested only in the high byte. Your code:
>
>     sendbyte((data32>>24)&0xff);
>
> to send a byte works great if the data is big endian:
>
>         01010101 00000000 00000000 00000000         // 01010101 is the byte of interest
>
> However, if you don't know the byte order and it is:
>
>         00000000 00000000 00000000 01010101
>
> Your code would throw the relevant data on the floor. Your code is only safe if you know the order. A union is a simple way to determine that order.
>
> Jack, W8TEE

Jack, W8TEE
 

They are agnostic on the host machine. That's correct. I was talking about a binary transfer between machines where the two may have different sized integers or ordering or both. The example I showed was from a big endian to a little endian machine. When binary data comes into a program from some other source, the receiving machine doesn't necessarily know the order. So I could send a binary long from the arduino that looks like this:

 01010101 00000000 00000000 00000000
but the host machine must figure out what those 4 bytes mean. If it uses a little endian long, it would need to reverse the byte order for the long to be properly represented on the receiving end.
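
Reversing the byte order on the receiving end can be sketched like this in plain C. It is a minimal illustration only: swap32() and to_host_order() are hypothetical names, and it assumes the receiver already knows (from the link spec or otherwise) which order the sender used.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the order of the four bytes in a 32 bit value. */
uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000ffUL) << 24)
         | ((v & 0x0000ff00UL) << 8)
         | ((v & 0x00ff0000UL) >> 8)
         | ((v & 0xff000000UL) >> 24);
}

/* raw is what you get by copying the four received bytes straight into
   a 32 bit variable. The flags are 1 for little endian, 0 for big endian. */
uint32_t to_host_order(uint32_t raw, int sender_is_little, int host_is_little)
{
    assert(sender_is_little == 0 || sender_is_little == 1);
    assert(host_is_little == 0 || host_is_little == 1);
    return (sender_is_little == host_is_little) ? raw : swap32(raw);
}
```

When both ends agree, the value passes through untouched; when they differ, 0x55000000 becomes 0x00000055 and vice versa.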

The code that Allard wrote or your Si5351 code don't need to worry about it because all of the code is processed by the same GCC compiler using the same code generator. However, send an Arduino long in binary format to a Desmet 8080 MSDOS compiler and I can guarantee you the data won't work.

We've wasted enough bandwidth on this. I think unions are a great way to learn how data are organized for a given compiler and are well-worth knowing about. Anyone who doesn't think so can easily ignore them.

Jack, W8TEE


From: Jerry Gaffke via Groups.Io <jgaffke@...>
To: BITX20@groups.io
Sent: Thursday, March 8, 2018 3:27 PM
Subject: Re: [BITX20] Pulling Arduino data apart

See my comments inline below.

On Thu, Mar 8, 2018 at 12:13 pm, Jack Purdum wrote:
Absolutely agree, which was what I was saying from the very start. Allard's code can be endian agnostic because it runs in a single known environment. I haven't looked at his code for some time, but I don't know if there is any place in the code where he needs to break apart a basic data type.
If we are talking about the si5351bx routines I previously referenced that are in Allard's Bitx40 code,
they bust up some large integers and write them to specific bitfields of arbitrary size
in the i2c register set of the Si5351.  That's worse than just big/little endian, as those
fields are often not an even 8 bits.  I know, as I wrote the code.
    https://groups.io/g/BITX20/message/28977 
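
To give a feel for what "bitfields of arbitrary size" means in practice, here is a small sketch that packs one 18-bit field and two 20-bit fields into eight register bytes using only shifts and masks. The function name pack_params and the exact layout in the comment are illustrative assumptions, not taken from the si5351bx code or the datasheet, so check the real register map before borrowing any of it.

```c
#include <stdint.h>

/* Pack an 18-bit field p1 and two 20-bit fields p2, p3 into 8 bytes.
   Layout (an illustrative assumption, check the datasheet before use):
     out[0] = p3[15:8]    out[1] = p3[7:0]
     out[2] = p1[17:16]   out[3] = p1[15:8]   out[4] = p1[7:0]
     out[5] = p3[19:16] in the high nibble, p2[19:16] in the low nibble
     out[6] = p2[15:8]    out[7] = p2[7:0] */
void pack_params(uint8_t out[8], uint32_t p1, uint32_t p2, uint32_t p3) {
  out[0] = (uint8_t)(p3 >> 8);
  out[1] = (uint8_t)(p3);
  out[2] = (uint8_t)((p1 >> 16) & 0x03);
  out[3] = (uint8_t)(p1 >> 8);
  out[4] = (uint8_t)(p1);
  out[5] = (uint8_t)(((p3 >> 12) & 0xF0) | ((p2 >> 16) & 0x0F));
  out[6] = (uint8_t)(p2 >> 8);
  out[7] = (uint8_t)(p2);
}
```

Because every byte is carved out of the integer value with shifts, the result does not depend on how the compiler stores a uint32_t in memory.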

Your statements below regarding how bit shifts work on a 32 bit integer
look wrong to me.  Integer operations like that are endian agnostic,
they don't care how that 32 bit register might get stored in main memory.


 
 
My comment about putting bits on the floor meant that you had to know something about the byte order, otherwise why are you interested only in the high byte. Your code:
 
    sendbyte((data32>>24)&0xff);
 
to send a byte works great if the data is big endian:
 
        01010101 00000000 00000000 00000000         // The 01010101 byte is the byte of interest
 
However, if you don't know the byte order and it is:
 
        00000000 00000000 00000000 01010101
 
Your code would throw the relevant data on the floor. Your code is only safe if you know the order. A union is a simple way to determine that order.
 
Jack, W8TEE


pat griffin
 

Guys, I for one have enjoyed reading this exchange.  Good stuff even though I had to brush away some cobwebs from the synapses.  Bits on the floor, assemblers, disassemblers. Those were the days. 

Three years ago we bought IBM Power8 running Red Hat and, IBM being IBM, it is big endian unlike the rest of the Red Hat world.  We had to scramble with a few pieces of code and I hadn't thought those thoughts in years.


Thanks


Pat  AA4PG
http://www.cahabatechnology.com




Jerry Gaffke
 

Agreed, we've wasted too much time on something not at the top of our priorities here.
Anyone wishing to continue this discussion is welcome to send me a private message.

In parting, I believe that once the data is in the CPU, in this case stored as a 32 bit integer
in a register, endian-ness is not a factor.  This code looks correct to me:

Here's C code for machine A to send a 32 bit integer as a sequence of four bytes in little endian order:
    sendbyte(data32);  sendbyte(data32>>8);  sendbyte(data32>>16);  sendbyte(data32>>24);
And code for machine B to receive that 32 bit integer (assumes getbyte() returns an unsigned 8 bit integer, which is widened to 32 bits before shifting so the shift can't overflow a 16 bit int):
    data32=getbyte();  data32|=(uint32_t)getbyte()<<8;  data32|=(uint32_t)getbyte()<<16;  data32|=(uint32_t)getbyte()<<24;
This C code doesn't care if the machine it is on is big endian or little endian.
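
The same idea can be tried without a serial link. In the sketch below a 4-byte buffer stands in for the wire, and the names put_u32_le and get_u32_le are hypothetical stand-ins for the send and receive sides. It is a sketch under those assumptions, not anyone's production code.

```c
#include <stdint.h>

/* Write a 32-bit value into buf as four bytes, least significant first. */
void put_u32_le(uint8_t *buf, uint32_t data32) {
  buf[0] = (uint8_t)(data32);
  buf[1] = (uint8_t)(data32 >> 8);
  buf[2] = (uint8_t)(data32 >> 16);
  buf[3] = (uint8_t)(data32 >> 24);
}

/* Rebuild the value from four bytes that arrived least significant first.
   Each byte is widened to 32 bits before shifting, so the shift cannot
   overflow a 16-bit int on an 8-bit AVR. */
uint32_t get_u32_le(const uint8_t *buf) {
  return  (uint32_t)buf[0]
       | ((uint32_t)buf[1] << 8)
       | ((uint32_t)buf[2] << 16)
       | ((uint32_t)buf[3] << 24);
}
```

The order of bytes in the buffer is fixed by the code itself, so both ends agree on it regardless of how either machine stores a uint32_t in its own memory.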

 
And as I understand it, Jack disagrees. Here's his argument:

My comment about putting bits on the floor meant that you had to know something about
the byte order, otherwise why are you interested only in the high byte. Your code:
    sendbyte((data32>>24)&0xff);
to send a byte works great if the data is big endian: 
        01010101 00000000 00000000 00000000         // The 01010101 byte is the byte of interest
However, if you don't know the byte order and it is: 
        00000000 00000000 00000000 01010101
Your code would throw the relevant data on the floor. Your code is only safe if you know
the order. A union is a simple way to determine that order.

Jerry, KE7ER


On Thu, Mar 8, 2018 at 12:53 pm, Jack Purdum wrote:
We've wasted enough bandwidth on this. I think unions are a great way to learn how data are organized for a given compiler and are well-worth knowing about. Anyone who doesn't think so can easily ignore them.
 

Dr Fred Hambrecht
 

Anytime knowledge is imparted it cannot be viewed as “wasted bandwidth”. I for one enjoyed the conversation.

 

v/r

Fred W4JLE

 


 

Jerry Gaffke
 

Here's a starting point on web resources regarding this big/little endian stuff in case you're curious.
    https://stackoverflow.com/questions/13994674/how-to-write-endian-agnostic-c-c-code

But for most of us this is a non-issue, and you needn't worry about it.
And you certainly don't have to suffer anybody arguing about it.

Code on the Nano (and likely in most any Arduino environment) is little endian.
Though on an 8 bit machine like the Nano, endian-ness is mostly a matter of what the compiler wants to do.
    https://www.avrfreaks.net/forum/endian-issue
    https://www.avrfreaks.net/forum/big-endian-or-little-endian-0
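
If you would rather check than take anyone's word for it, here is a small sketch with two ways of asking the question. The function names are made up for illustration. The first relies on the __BYTE_ORDER__ and __ORDER_LITTLE_ENDIAN__ macros that GCC and Clang predefine, so it is only meaningful with those compilers. The second looks at the first byte of a known value at run time.

```c
#include <stdint.h>
#include <string.h>

/* Compile-time answer, using macros predefined by GCC and Clang.
   Returns 0 when the macros are missing, so read 0 as
   "not known to be little endian". */
int is_little_endian_compile_time(void) {
#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
  return 1;
#else
  return 0;
#endif
}

/* Run-time answer: copy out the byte stored at the lowest address. */
int is_little_endian_run_time(void) {
  const uint16_t probe = 0x0001;
  uint8_t first;
  memcpy(&first, &probe, 1);
  return first == 0x01;
}
```

With a GCC-based toolchain the two functions should agree with each other.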

Jerry, KE7ER


On Fri, Mar 9, 2018 at 11:46 am, Dr Fred Hambrecht wrote:

Anytime knowledge is imparted it cannot be viewed as “wasted bandwidth”. I for one enjoyed the conversation.

 

Tom Christian
 

Great discussion from my perspective.  No wasted bandwidth here.  Thanks Jack & Jerry!
Tom
AB7WT

Michael Hagen
 

All this talk about injun's puts Mikey's brain on warpath!

 


On 3/9/2018 10:10 PM, Tom Christian wrote:
Great discussion from my perspective.  No wasted bandwidth here.  Thanks Jack & Jerry!
Tom
AB7WT

-- 
Mike Hagen, WA6ISP

Jerry Gaffke
 

The Wikipedia entry is a much better starting point than my previous stackoverflow conversation:
    https://en.wikipedia.org/wiki/Endianness
Also this, do a search for "endian":
    https://en.wikipedia.org/wiki/Lilliput_and_Blefuscu

Jerry, KE7ER



On Fri, Mar 9, 2018 at 01:02 pm, Jerry Gaffke wrote:

Here's a starting point on web resources regarding this big/little endian stuff in case you're curious.