Endianness Confusion
Note: This is from a question posted in the Computer Organization class forum for the final class project, a single-cycle mini MIPS datapath simulator.
Michael G. Writes:
Hello everyone. The project description says that the memory is to be stored in big endian format. I am assuming this corresponds to “unsigned *Mem” in the program.
My machine is little endian, and so when writing to *memdata from Mem[index] in rw_memory(), I would need to convert the word at Mem[index] to little endian for it to be interpreted correctly, right …?
But Olympus is SPARC, and SPARCs are big endian, right? So when writing to *memdata from Mem[index], converting to little endian would be wrong, since it has to stay in big endian. I would need some way of knowing whether the real machine I'm running on is little endian or big endian… wouldn't I?
Also, when writing to Mem[index] in rw_memory() it needs to be in big endian. But unsigned data2 could be in little endian or big endian depending on whether it's running on my PC or on Olympus. I would need to know the endianness of data2 to know whether I need to convert it to big endian from little endian or if I should leave it as it is.
I am very confused, please set me straight. Help! :)
Terence Writes:
I'm going to speak confidently, but don't rely completely on my assessment; double-check your work, because I won't take any liability. :-o
You have two options.
1. Byte swap LE→BE if you're on LE
2. Build a BE number byte by byte.
While byte swapping is fairly straightforward, you do not need to do it if the platform already has the same endianness. So you would first need to check the platform's endianness. Something like this will do:
char isLE() //using char in lieu of bool
{
    unsigned x = 0x11; //LE: 0x11000000, BE: 0x00000011
    //note 0x11 is just an arbitrary number
    return (((char*)&x)[0] == 0x11) ? 1 : 0;
}
I'll leave the byte swap portion up to you. One problem with this method (for this project) is that it is effectively slower: you control neither the initialization nor the build, so there is no global “isLE” variable that can be set just once, either by main or by a build macro.
If you were to write this in MIPS asm, you're looking at (let $t9 be the return value and 0 be the address of x; this code is untested):
        addi $t0, $zero, 17     # t0 = 0x11
        sw   $t0, 0($zero)      # x = 0x00000011
        lb   $t1, 0($zero)      # t1 = ((char*)&x)[0] (load byte)
        beq  $t1, $t0, true     # if t1 == 0x11 t9=true else t9=false
false:  addi $t9, $zero, 0
        j    ret
true:   addi $t9, $zero, 1
ret:    nop
And remember, then you still have to do the byte swap, which is more code generated, including a second branch based on the return value.
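For reference, the check-then-swap option could be sketched in C roughly as follows. This is only one possible shape, not the project's required code: the names is_little_endian, swap32, and to_big_endian are hypothetical, and the sketch assumes a 32-bit word, expressed here as uint32_t rather than plain unsigned.

```c
#include <stdint.h>

/* Returns 1 on a little-endian host, 0 otherwise (same idea as isLE). */
static int is_little_endian(void)
{
    uint32_t x = 0x11;
    return ((unsigned char *)&x)[0] == 0x11;
}

/* Reverses the four bytes of a 32-bit value: 0xaabbccdd -> 0xddccbbaa. */
static uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000ffu) << 24) |
           ((v & 0x0000ff00u) << 8)  |
           ((v & 0x00ff0000u) >> 8)  |
           ((v & 0xff000000u) >> 24);
}

/* Native -> big endian: swap only when the host is little endian.
 * This is the second branch mentioned in the text. */
static uint32_t to_big_endian(uint32_t v)
{
    return is_little_endian() ? swap32(v) : v;
}
```

Because a byte swap is its own inverse, the same to_big_endian call also converts a big-endian word back to native order.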
Your alternative (and a more elegant solution for this project) is to simply “build the BE byte by byte”. In C (and in the MIPS ISA, for instance) you work with a number's value, written the way you would read it (most significant byte first, like big endian), no matter how its bytes are laid out in memory. Thus all you need to do is mask and shift each portion into its proper place (and similarly for reading them back).
((char*)&dest)[0] = (val & 0xff000000) >> 24;
((char*)&dest)[1] = (val & 0x00ff0000) >> 16;
…and so on for the rest. Look for 0xaabbccdd in project.pdf and you'll see that it works.
If you're still not convinced: it works because in C, math operations aren't endian-dependent. This is true even at the ISA level. The bitmask constant above is likewise held in the native byte order, so the mask and shift act on values, not on the memory layout. So essentially the two lines (and the rest, which are left as an exercise) convert from “native to big” regardless of what native actually is. (Again, contrast this with the byte swapping option.)
Essentially in MIPS, let v0 be val and 0 be the address of dest (this code is untested):
lui  $t0, 0xff00
and  $t1, $v0, $t0      # t1 = val & 0xff000000
srl  $t1, $t1, 24       # t1 >>= 24
sb   $t1, 0($zero)      # ((char*)&dest)[0] = t1 (store byte)
lui  $t0, 0x00ff
and  $t1, $v0, $t0      # t1 = val & 0x00ff0000
srl  $t1, $t1, 16       # t1 >>= 16
sb   $t1, 1($zero)      # ((char*)&dest)[1] = t1 (store byte)
And a few more times (note you can use andi for the rest). There is no need to check the endianness, which means no branch, and the value is stored in BE format no matter what.
Also, this is not to say that byte swapping isn't actually preferred; but again, you have no real initializer, so (in my opinion) the latter is the better solution.
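Written out in full, the byte-by-byte approach might look like the sketch below, including the read direction mentioned above. The helper names store_be32 and load_be32 are hypothetical (they are not from the project), and a 32-bit word is assumed via uint32_t. Neither function checks the host's endianness.

```c
#include <stdint.h>

/* Writes val into dest[0..3] in big-endian order,
 * most significant byte first, on any host. */
static void store_be32(unsigned char *dest, uint32_t val)
{
    dest[0] = (unsigned char)((val & 0xff000000u) >> 24);
    dest[1] = (unsigned char)((val & 0x00ff0000u) >> 16);
    dest[2] = (unsigned char)((val & 0x0000ff00u) >> 8);
    dest[3] = (unsigned char)(val & 0x000000ffu);
}

/* Rebuilds the native value from four big-endian bytes. */
static uint32_t load_be32(const unsigned char *src)
{
    return ((uint32_t)src[0] << 24) |
           ((uint32_t)src[1] << 16) |
           ((uint32_t)src[2] << 8)  |
            (uint32_t)src[3];
}
```

Storing 0xaabbccdd this way leaves the bytes aa bb cc dd in memory on both a little-endian PC and a big-endian SPARC, and load_be32 returns 0xaabbccdd on either.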
Q: “So when writing to *memdata from Mem[index], converting to little endian would be wrong, since it has to stay in big endian.”
Right, so you either check whether you're on LE and swap if so, or you don't swap at all and instead build the big-endian representation in memory byte by byte.
Q: “I would need some way of knowing whether the real machine I'm running on is little endian or big endian… wouldn't I?”
You don't need it, but you could do it that way.
Q: “I would need to know the endianness of data2 to know whether I need to convert it to big endian from little endian or if I should leave it as it is.”
Right, if you plan on doing a swap; but if you build it byte by byte you don't need to.
Write a test app or two before you commit to whatever method you intend on going with; again I make no guarantee of correctness, so use my advice at your own risk.
Good luck!
Michael G. Writes:
Thanks for the reply, Terence. I remember you were telling me about this this morning, but what I didn't understand was that the C >> and & operators don't care about endianness, i.e. even if something is in little endian format internally, these operators treat it as if it was in big endian or human readable format. Is that right? I wrote a test program, and it seems to verify this. I want to make sure I'm correct, though, so let me know if something seems wrong. Here is the program I wrote:
#include <stdio.h>

int main(void)
{
    unsigned big_endian_x; // used later

    // this gets stored in the native format; for little endian it is 67 45 23 01
    unsigned x = 0x01234567;

    // this line verifies that
    printf("%u: %02X %02X %02X %02X\n", x,
           ((unsigned char*)&x)[0],
           ((unsigned char*)&x)[1],
           ((unsigned char*)&x)[2],
           ((unsigned char*)&x)[3]);

    // it seems that C math operators still treat it as if it was in human
    // readable or big endian format. This prints the number in the "correct"
    // order even though on my machine it is stored backwards.
    printf("%02X\n", (x & 0xff000000) >> 24);
    printf("%02X\n", (x & 0x00ff0000) >> 16);
    printf("%02X\n", (x & 0x0000ff00) >> 8);
    printf("%02X\n", (x & 0x000000ff));
    printf("%X\n", (x));

    /*
     * Because the C math operators treat numbers like this as if they were
     * in human readable or big endian format, it is possible to, as Terence
     * said, "build the BE byte by byte"
     *
     * By taking the 4 bytes that were printed above and forcing them into the
     * correct byte positions... i.e., data[0] gets the first byte, data[1] gets
     * the second, data[2] the third, data[3] the fourth, in the order above.
     * This will actually store the number in big endian rather than native
     * format.
     */
    ((unsigned char*) &big_endian_x)[0] = (x & 0xff000000) >> 24;
    ((unsigned char*) &big_endian_x)[1] = (x & 0x00ff0000) >> 16;
    ((unsigned char*) &big_endian_x)[2] = (x & 0x0000ff00) >> 8;
    ((unsigned char*) &big_endian_x)[3] = x & 0x000000ff;

    /*
     * Let's verify that this is really in big endian format
     */
    printf("%u: %02X %02X %02X %02X\n", x,
           ((unsigned char*)&big_endian_x)[0],
           ((unsigned char*)&big_endian_x)[1],
           ((unsigned char*)&big_endian_x)[2],
           ((unsigned char*)&big_endian_x)[3]);

    /* It is! Interestingly, if we try to print this like normal on a little
     * endian machine, it puts it backwards again :)
     *
     * But at least it's in big endian format, and this method works on both
     * little endian and big endian machines. If you run this on a big endian
     * machine, all that we really would have done is go from big endian to big
     * endian... :)
     */
    printf("%X\n", big_endian_x);

    return 0;
}
The output of this program on a little endian machine is:
$ ./endian
19088743: 67 45 23 01
01
23
45
67
1234567
19088743: 01 23 45 67
67452301
On a big endian machine it is:
>./endian
19088743: 01 23 45 67
01
23
45
67
1234567
19088743: 01 23 45 67
1234567
The output doesn't make any sense without looking at the code at the same time, though. Thanks so much for helping. Again, if something seems wrong please let me know. :)
Terence Writes:
Q: “Thanks for the reply, Terence. I remember you were telling me about this this morning, but what I didn't understand was that the C >> and & operators don't care about endianness, i.e. even if something is in little endian format internally, these operators treat it as if it was in big endian or human readable format. Is that right?”
Yeah, you got it.
Q: “I wrote a test program, and it seems to verify this. I want to make sure I'm correct, though, so let me know if something seems wrong. Here is the program I wrote:”
Yeah, that's the output you should expect; not so tricky after all!