bit-manipulation

Extracting bits with a single multiplication

I saw an interesting technique used in an answer to another question, and would like to understand it a little better.

We're given an unsigned 64-bit integer, and we are interested in the following bits:

1.......2.......3.......4.......5.......6.......7.......8.......

Specifically, we'd like to move them to the top eight positions, like so:

12345678........................................................

We don't care about the value of the bits indicated by ., and they don't have to be preserved.

The solution was to mask out the unwanted bits, and multiply the result by 0x2040810204081. This, as it turns out, does the trick.

How general is this method? Can this technique be used to extract any subset of bits? If not, how does one figure out whether or not the method works for a particular set of bits?

Finally, how would one go about finding the (a?) correct multiplier to extract the given bits?

What is the fastest/most efficient way to find the highest set bit (msb) in an integer in C?

If I have some integer n, and I want to know the position of the most significant bit (that is, if the least significant bit is on the right, I want to know the position of the farthest left bit that is a 1), what is the quickest/most efficient method of finding out?

I know that POSIX supports a ffs() method in <strings.h> to find the first set bit, but there doesn't seem to be a corresponding fls() method.

Is there some really obvious way of doing this that I'm missing?

What about in cases where you can't use POSIX functions for portability?

EDIT: What about a solution that works on both 32- and 64-bit architectures (many of the code listings seem like they'd only work on 32-bit integers).

Position of least significant bit that is set

I am looking for an efficient way to determine the position of the least significant bit that is set in an integer, e.g. for 0x0FF0 it would be 4.

A trivial implementation is this:

unsigned GetLowestBitPos(unsigned value)
{
   assert(value != 0); // handled separately

   unsigned pos = 0;
   while (!(value & 1))
   {
      value >>= 1;
      ++pos;
   }
   return pos;
}

Any ideas how to squeeze some cycles out of it?

(Note: this question is for people that enjoy such things, not for people to tell me xyzoptimization is evil.)

[edit] Thanks everyone for the ideas! I've learnt a few other things, too. Cool!

C# int to byte[]

I need to convert an int to a byte[] one way of doing it is to use BitConverter.GetBytes(). But im unsure if that matches the following specification:

An XDR signed integer is a 32-bit datum that encodes an integer in the range [-2147483648,2147483647]. The integer is represented in two's complement notation. The most and least significant bytes are 0 and 3, respectively. Integers are declared as follows:

Source: RFC1014 3.2

How could i do a int to byte transformation that would satisfy the above specification?

Are the shift operators (<<, >>) arithmetic or logical in C?

In C, are the shift operators (<<, >>) arithmetic or logical?

C/C++ check if one bit is set in, i.e. int variable
int temp = 0x5E; // in binary 0b1011110.

Is there such a way to check if bit 3 in temp is 1 or 0 without bit shifting and masking.

Just want to know if there is some built in function for this, or am I forced to write one myself.

Efficient Algorithm for Bit Reversal (from MSB->LSB to LSB->MSB) in C

What is the most efficient algorithm to achieve the following:

0010 0000 => 0000 0100

The conversion is from MSB->LSB to LSB->MSB. All bits must be reversed; that is, this is not endianness-swapping.

What is the meaning of double tilde (~~) in Java?

When browsing the source code of Guava, I came across the following piece of code (part of the implementation of hashCode for the inner class CartesianSet):

int adjust = size() - 1;
for (int i = 0; i < axes.size(); i++) {
    adjust *= 31;
    adjust = ~~adjust;
    // in GWT, we have to deal with integer overflow carefully
}
int hash = 1;
for (Set<E> axis : axes) {
    hash = 31 * hash + (size() / axis.size() * axis.hashCode());

    hash = ~~hash;
}
hash += adjust;
return ~~hash;

Both of adjust and hash are ints. From what I know about Java, ~ means bitwise negation, so adjust = ~~adjust and hash = ~~hash should leave the variables unchanged. Running the small test (with assertions enabled, of course),

for (int i = Integer.MIN_VALUE; i < Integer.MAX_VALUE; i++) {
    assert i == ~~i;
}

confirms this. Assuming that the Guava guys know what they are doing, there must be a reason for them to do this. The question is what?

EDIT As pointed out in the comments, the test above doesn't include the case where i equals Integer.MAX_VALUE. Since i <= Integer.MAX_VALUE is always true, we will need to check that case outside the loop to prevent it from looping forever. However, the line

assert Integer.MAX_VALUE == ~~Integer.MAX_VALUE;

yields the compiler warning "Comparing identical expressions", which pretty much nails it.

How can I remove a flag in C?

There is a variable that holds some flags and I want to remove one of them. But I don't know how to remove it.

Here is how I set the flag.

my.emask |= ENABLE_SHOOT;
How do I get bit-by-bit data from an integer value in C?

I want to extract bits of a decimal number.

For example, 7 is binary 0111, and I want to get 0 1 1 1 all bits stored in bool. How can I do so?

OK, a loop is not a good option, can I do something else for this?

Why is XOR the default way to combine hashes?

Say you have two hashes H(A) and H(B) and you want to combine them. I've read that a good way to combine two hashes is to XOR them, e.g. XOR( H(A), H(B) ).

The best explanation I've found is touched briefly here on these hash function guidelines:

XORing two numbers with roughly random distribution results in another number still with roughly random distribution*, but which now depends on the two values.
...
* At each bit of the two numbers to combine, a 0 is output if the two bits are equal, else a 1. In other words, in 50% of the combinations, a 1 will be output. So if the two input bits each have a roughly 50-50 chance of being 0 or 1, then so too will the output bit.

Can you explain the intuition and/or mathematics behind why XOR should be the default operation for combining hash functions (rather than OR or AND etc.)?

What is bit masking?

I am fairly new to C programming, and I encountered bit masking. What is the general concept and function of bit masking?

Examples are much appreciated.

Most common C# bitwise operations on enums

For the life of me, I can't remember how to set, delete, toggle or test a bit in a bitfield. Either I'm unsure or I mix them up because I rarely need these. So a "bit-cheat-sheet" would be nice to have.

For example:

flags = flags | FlagsEnum.Bit4;  // Set bit 4.

or

if ((flags & FlagsEnum.Bit4)) == FlagsEnum.Bit4) // Is there a less verbose way?

Can you give examples of all the other common operations, preferably in C# syntax using a [Flags] enum?

Best practices for circular shift (rotate) operations in C++

Left and right shift operators (<< and >>) are already available in C++. However, I couldn't find out how I could perform circular shift or rotate operations.

How can operations like "Rotate Left" and "Rotate Right" be performed?

Rotating right twice here

Initial --> 1000 0011 0100 0010

should result in:

Final   --> 1010 0000 1101 0000

An example would be helpful.

(editor's note: Many common ways of expressing rotates in C suffer from undefined behaviour if the rotate count is zero, or compile to more than just a single rotate machine instruction. This question's answer should document best practices.)

What is “two's complement”?

I'm in a computer systems course and have been struggling, in part, with two's complement. I want to understand it, but everything I've read hasn't brought the picture together for me. I've read the Wikipedia article and various other articles, including my text book.

What is two's complement, how can we use it and how can it affect numbers during operations like casts (from signed to unsigned and vice versa), bit-wise operations and bit-shift operations?

In C/C++ what's the simplest way to reverse the order of bits in a byte?

While there are multiple ways to reverse bit order in a byte, I'm curious as to what is the "simplest" for a developer to implement. And by reversing I mean:

1110 -> 0111
0010 -> 0100

This is similar to, but not a duplicate of this PHP question.

This is similar to, but not a duplicate of this C question. This question is asking for the easiest method to implement by a developer. The "Best Algorithm" is concerned with memory and cpu performance.

What does (x ^ 0x1) != 0 mean?

I came across the following code snippet

if( 0 != ( x ^ 0x1 ) )
     encode( x, m );

What does x ^ 0x1 mean? Is this some standard technique?

Should I use #define, enum or const?

In a C++ project I'm working on, I have a flag kind of value which can have four values. Those four flags can be combined. Flags describe the records in database and can be:

  • new record
  • deleted record
  • modified record
  • existing record

Now, for each record I wish to keep this attribute, so I could use an enum:

enum { xNew, xDeleted, xModified, xExisting }

However, in other places in code, I need to select which records are to be visible to the user, so I'd like to be able to pass that as a single parameter, like:

showRecords(xNew | xDeleted);

So, it seems I have three possible appoaches:

#define X_NEW      0x01
#define X_DELETED  0x02
#define X_MODIFIED 0x04
#define X_EXISTING 0x08

or

typedef enum { xNew = 1, xDeleted, xModified = 4, xExisting = 8 } RecordType;

or

namespace RecordType {
    static const uint8 xNew = 1;
    static const uint8 xDeleted = 2;
    static const uint8 xModified = 4;
    static const uint8 xExisting = 8;
}

Space requirements are important (byte vs int) but not crucial. With defines I lose type safety, and with enum I lose some space (integers) and probably have to cast when I want to do a bitwise operation. With const I think I also lose type safety since a random uint8 could get in by mistake.

Is there some other cleaner way?

If not, what would you use and why?

P.S. The rest of the code is rather clean modern C++ without #defines, and I have used namespaces and templates in few spaces, so those aren't out of question either.

Explain the use of a bit vector for determining if all characters are unique

I am confused about how a bit vector would work to do this (not too familiar with bit vectors). Here is the code given. Could someone please walk me through this?

public static boolean isUniqueChars(String str) {
    int checker = 0;
    for (int i = 0; i < str.length(); ++i) {
        int val = str.charAt(i) - 'a';
        if ((checker & (1 << val)) > 0) return false;
        checker |= (1 << val);
    }
    return true;
}

Particularly, what is the checker doing?

What is the idea behind ^= 32, that converts lowercase letters to upper and vice versa?

I was solving some problem on codeforces. Normally I first check if the character is upper or lower English letter then subtract or add 32 to convert it to the corresponding letter. But I found someone do ^= 32 to do the same thing. Here it is:

char foo = 'a';
foo ^= 32;
char bar = 'A';
bar ^= 32;
cout << foo << ' ' << bar << '\n'; // foo is A, and bar is a

I have searched for an explanation for this and didn't find out. So why this works?

Using bitwise OR 0 to floor a number

A colleague of mine stumbled upon a method to floor float numbers using a bitwise or:

var a = 13.6 | 0; //a == 13

We were talking about it and wondering a few things.

  • How does it work? Our theory was that using such an operator casts the number to an integer, thus removing the fractional part
  • Does it have any advantages over doing Math.floor? Maybe it's a bit faster? (pun not intended)
  • Does it have any disadvantages? Maybe it doesn't work in some cases? Clarity is an obvious one, since we had to figure it out, and well, I'm writting this question.

Thanks.

What does a tilde do when it precedes an expression?
var attr = ~'input,textarea'.indexOf( target.tagName.toLowerCase() )
           ? 'value'
           : 'innerHTML'

I saw it in an answer, and I've never seen it before.

What does it mean?

Rounding up to next power of 2

I want to write a function that returns the nearest next power of 2 number. For example if my input is 789, the output should be 1024. Is there any way of achieving this without using any loops but just using some bitwise operators?


Related: Algorithm for finding the smallest power of two that's greater or equal to a given value is a C++ question. C++20 introduced std:bit_ceil which lets the compiler do whatever's optimal for the target system, but nothing equivalent is yet available in portable ISO C for bit-scan, popcount or other common bit operations that most CPUs have. Portable C code has to be less efficient and/or more complicated.

Given an integer, how do I find the next largest power of two using bit-twiddling? is a language-agnostic version of the question with some C++11 and 17 constexpr using GNU extensions.

Answers to this question don't need to be portable; fast versions for various platforms are useful.

Divide a number by 3 without using *, /, +, -, % operators

How would you divide a number by 3 without using *, /, +, -, %, operators?

The number may be signed or unsigned.

~x + ~y == ~(x + y) is always false?

Does this code always evaluate to false? Both variables are two's complement signed ints.

~x + ~y == ~(x + y)

I feel like there should be some number that satisfies the conditions. I tried testing the numbers between -5000 and 5000 but never achieved equality. Is there a way to set up an equation to find the solutions to the condition?

Will swapping one for the other cause an insidious bug in my program?

Count the number of set bits in a 32-bit integer

8 bits representing the number 7 look like this:

00000111

Three bits are set.

What are the algorithms to determine the number of set bits in a 32-bit integer?

Why does this random value have a 25/75 distribution instead of 50/50?

Edit: So basically what I'm trying to write is a 1 bit hash for double.

I want to map a double to true or false with a 50/50 chance. For that I wrote code that picks some random numbers (just as an example, I want to use this on data with regularities and still get a 50/50 result), checks their last bit and increments y if it is 1, or n if it is 0.

However, this code constantly results in 25% y and 75% n. Why is it not 50/50? And why such a weird, but straight-forward (1/3) distribution?

public class DoubleToBoolean {
    @Test
    public void test() {

        int y = 0;
        int n = 0;
        Random r = new Random();
        for (int i = 0; i < 1000000; i++) {
            double randomValue = r.nextDouble();
            long lastBit = Double.doubleToLongBits(randomValue) & 1;
            if (lastBit == 1) {
                y++;
            } else {
                n++;
            }
        }
        System.out.println(y + " " + n);
    }
}

Example output:

250167 749833
'and' (boolean) vs '&' (bitwise) - Why difference in behavior with lists vs numpy arrays?

What explains the difference in behavior of boolean and bitwise operations on lists vs NumPy arrays?

I'm confused about the appropriate use of & vs and in Python, illustrated in the following examples.

mylist1 = [True,  True,  True, False,  True]
mylist2 = [False, True, False,  True, False]

>>> len(mylist1) == len(mylist2)
True

# ---- Example 1 ----
>>> mylist1 and mylist2
[False, True, False, True, False]
# I would have expected [False, True, False, False, False]

# ---- Example 2 ----
>>> mylist1 & mylist2
TypeError: unsupported operand type(s) for &: 'list' and 'list'
# Why not just like example 1?

>>> import numpy as np

# ---- Example 3 ----
>>> np.array(mylist1) and np.array(mylist2)
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
# Why not just like Example 4?

# ---- Example 4 ----
>>> np.array(mylist1) & np.array(mylist2)
array([False,  True, False, False, False], dtype=bool)
# This is the output I was expecting!

This answer and this answer helped me understand that and is a boolean operation but & is a bitwise operation.

I read about bitwise operations to better understand the concept, but I am struggling to use that information to make sense of my above 4 examples.

Example 4 led me to my desired output, so that is fine, but I am still confused about when/how/why I should use and vs &. Why do lists and NumPy arrays behave differently with these operators?

Can anyone help me understand the difference between boolean and bitwise operations to explain why they handle lists and NumPy arrays differently?

What are bitwise shift (bit-shift) operators and how do they work?

I've been attempting to learn C in my spare time, and other languages (C#, Java, etc.) have the same concept (and often the same operators)...

At a core level, what does bit-shifting (<<, >>, >>>) do, what problems can it help solve, and what gotchas lurk around the bend? In other words, an absolute beginner's guide to bit shifting in all its goodness.

How to set, clear, and toggle a single bit?

How to set, clear, and toggle a bit?

bit-manipulation