ARM immediate value encoding ... for dummies

11-11-2021

1 Introduction
2 Background
- 2.1 Encoding positive numbers
- 2.2 Encoding negative numbers
3 Interactive walkthrough

Introduction

During the first term of my M.S. in Computer Science program at Johns Hopkins, I took Foundations of Computer Architecture with Dr. Kann. He had a list of final projects that we could do, one of which was to develop an interactive tool to teach students about the immediate Operand2 encoding that are on the ARM chips (I also did a Raspberry Pi-based project here). This blog is intended to be a companion to Alisdair McDiarmid's excellent blog post on the same topic. The only difference being that the interactive tool presented here goes into much more detail on the steps taken to encode a number and it can also handle negative numbers.

Background

Encoding positive numbers

Before you begin the exercise, you should read Alisdair McDiarmid's blog post for a refresher on machine code and ARM instructions. For convenience, I reproduce the most important information below.

ARM uses bits 0-11 of an instruction to represent an immediate, but it does not represent the full number in 12 bits. In fact, the bottom 8 bits are used to represent the number while the top 4 bits are used to capture a rotation amount. Since we can only represent 16 numbers with 4 bits, the rotation amounts are always multiplied by 2. This way, the 4 rotation bits can capture all the even numbers from zero to 30, which is much more useful than all numbers from zero to 15. Hence, 1011 = 11 below actually equals a rotation amount of 11 x 2 = 22.

Therefore, the binary string 1011 11111111 actually represents the instruction 11111111 ROR 1011, or 255 ROR 22. If you rotated 255 to the right 22 times, you will get 261120, which is the number that the binary string 1011 11111111 actually encodes. But how does ARM even know to encode 261120 as 255 ROR 22 in the first place? The main idea behind the encoding is to shift the most significant parts of the original number into the bottom 8 bits (turning 261120 into 255) and then use the 4-bit rotation amount to reconstruct the number by rotating the 8-bit number that amount to the right until we get the original number. Informally, significance denotes the longest span of bits that starts and ends with a 1.

For example, consider the number 261120. Its 32-bit binary representation is 00000000000000111111110000000000. If I only preserved the string of 1s starting at bit 10 and ending at bit 17 (8 bits in total), I can still reconstruct the number if I knew the original position of this 8-bit string. Thus, an alternate way to encode this is to say that 261120 is 11111111, where the first 1 is at bit 10 and the last 1 is at bit 17. ARM immediate encoding is very similar in that it also captures the significant bits but instead of encoding the raw position of the 8 bits in the original number, it encodes that information in a rotation amount. Observe that the 8-bit number 11111111 extended to be 32 bits 00000000000000000000000011111111 and then rotated 22 times to the right gets you back to 00000000000000111111110000000000, the original number. Does this process apply to negative numbers?

Encoding negative numbers

Since a negative number has the majority of its 32 bits filled with ones, we will be utilizing the help of a command known as MVN. The MVN instruction moves the inverse of a bit string into a register. For example, the instruction MVN r0, 0b00000000000000111111110000000000 will cause the register r0 to hold the value 11111111111111000000001111111111.

It is best to illustrate the encoding of a negative number with an example. Let us consider -265, which is 11111111 11111111 11111110 11110111 in two's complement form. Clearly, this number is longer than the 12-bit encoding that is alloted for immediates. However, we can apply the same tricks that we used to encode positive numbers, namely, we can capture the most significant bits of this number, try to fit it into 8 bits, and then specify a rotation amount with some bit inversion. It should be obvious that the ones in the number is redundant and therefore is not significant (recall that every negative number has a bunch of ones floating around). Thus, the most significant part of this number has to be the longest substring that starts and ends with zero, namely 11111111 11111111 11111110 11110111. An alternate way to encode this is to say that -265 is 011110, where the first zero is at bit 3 and the last zero is at bit 8, and the rest of the bits are all ones.

Then, can we just fit 011110 into the bottom eight bits and then do some sort of rotation to get the bits back to the original position? If we extend 011110 into a 32-bit number, getting 00000000 00000000 00000000 00011110, rotated it 29 times to the right (allowing odd rotations for the sake of demonstration), we will get 00000000 00000000 00000000 11110000. This is quite close to our original number, except all the bits at positions 0-2 and 9-31 have to be ones. We could try inverting this bit string, which would flip all the zeros at bits 0-2 and 9-31 into ones, but it would also mess up all the bits at positions 3-8, turning the number into 11111111 11111111 11111111 00001111.

Therefore, a clever way to solve this dilemma is to first invert all the bits in the significant string (the longest string starting and ending with zero) so that when we rotate it back to its original position and invert the entire 32-bit string, the significant bits will be inverted back to their correct values. Recall that our significant bits are 011110. Let's first invert it, getting 100001. Then, extend it into a 32-bit number to get 00000000 00000000 00000000 00100001. Now, rotate it 29 times to the right (once again, allow odd rotations for the sake of demonstration) to get 00000000 00000000 00000001 00001000. Invert the entire string, and voila, we get 11111111 11111111 11111110 11110111, which is -265. In reality, since the number of significant bits is less than 8, we would also get the ones to the left and to the right of our significant bits, namely 11111111 11111111 11111110 11110111. You should convince yourself that if we rotated 10111101 to the right 30 times and then inverted the entire 32-bit string we will get -265.

Note that the same considerations that plagued us when we tried to encode positive numbers should also apply here, namely if the longest substring the starts and ends with zero is longer than 8 bits then you will not be able to encode the number.

Interactive walkthrough

The goal of this interactive walkthrough is to go through the steps of encoding a number into the 12 bits that is used to represent an immediate in an ARM instruction. This demo works differently for positive and negative values. You can watch demos for the tool here: positive numbers, negative numbers.

To get started, please enter a number you want to encode into the input box below. The binary representation of your number should then populate in the 32 bits below the input box. You can try a base-10 number or a hex number prepended with 0x.

Example numbers to try: 261120, -265, -36865, 0x102, 0xFF0000FF, 0xC0000034