So as Johan pointed out I made a pretty big mistake in my previous post because rand(10000) does not generate 10000 random numbers, it generates a 10000×10000 matrix of random numbers. Here’s the corrected code:

tic
a=binornd(1,0.9,10000,1);
toc
tic
a=rand(10000,1)<0.9;
toc
tic
for i=1:10000
a=binornd(1,0.9);
end
toc
tic
for i=1:10000
a=rand<0.9;
end
toc

And here are the corrected times:

Elapsed time is 0.002682 seconds.
Elapsed time is 0.000368 seconds.
Elapsed time is 0.462683 seconds.
Elapsed time is 0.000401 seconds.

This does seem a lot more reasonable. The extra overhead in calling binornd presumably explains why (2) takes a lot longer than (1). (4) is almost as fast as (2) because (4) is automatically vectorised. However, it appear the compiler does not manage to vectorise (3) because it is a load slower than (2).

Although reasonable, there still seems to be a lot of inefficiency in how the random number generator’s output is used in binornd. rand generates a random number between [0,1]. If it generated a true real from the uniform distribution on [0,1], we could construct an infinite sequence of independent binary random variables just by taking it’s binary representation. Let’s assume that it does generate a true draw from the uniform distribution up to the numerical accuracy defined by the 64-bit internal representation of a double: we then have 52 random bits (since out of the 64bits 52 define the fractional part), 52bits of information. The distribution I actually want to draw from (0 with probability 0.9 and 1 with probability 0.1) has information bits. When we do the rand<0.9 operation, we throw away 52-0.469 bits, which is pretty wasteful in any situation where the random number generation is critical, such as Monte Carlo simulation.