*The complete question was:* **The other day I was trying to explain the difference between “impossible” and “with probability zero” to a friend. I remarked, “if you pick an integer, and I pick an integer, the mathematical probability that we picked the same number is zero, but it’s certainly not impossible.” A seemingly harmless statement, but only later did I think: what does it even mean to “pick ANY integer with EQUAL probability”? Is there any meaning to a random variable that can take on an infinite number of values with equal probability? It would be like a uniform distribution that has one or both bounds stretched to infinity. Does this kind of object have a name? What kind of mathematical properties would this even have? Oh dear, this is blowing my mind just thinking about it. Please help!**

**Mathematician:** Developing a consistent theory of probability for sets with an infinite number of elements (like the set of all integers, the set of all real numbers, or the set of real numbers between 0 and 1) requires dealing with a handful of subtle and tricky problems. In fact, to determine whether it is possible to choose one object from an infinite number of objects such that each object has the same probability of being chosen, we must delve into what we really mean by “possible”. Various interpretations of this question are:

1. Can we work with probabilities (as they are defined in a formal, mathematical way) on infinite sets of items, and if so, can we assign equal probabilities to each item?

2. Can a (formal, mathematical) procedure be devised to sample uniformly at random from an infinite set?

3. Can an algorithm be designed that can carry out this sampling procedure on a real computer such that the algorithm will terminate in a finite amount of time?

Each of these questions, which I’ll address one by one, raises interesting considerations.

Can a formal theory of probability be developed for infinite sets of items? Absolutely. For example, the Gaussian distribution (often known as the bell curve or normal distribution) is defined on the set of real numbers, so when you draw a sample according to this distribution, you get some real number. Strictly speaking, it assigns a probability of zero to each individual real number, but assigns a non-zero probability to subsets of real numbers. For example, if we sample from a Gaussian distribution, there is a formula that tells us how likely the sampled number is to be less than any particular value X (so the set of all numbers less than X is assigned a positive probability). Why does the Gaussian distribution not assign non-zero probabilities to the individual numbers themselves? The reason (loosely speaking) is that the probability of getting ANY number when we sample from a distribution must be 1 (which just means that some number must always occur). On the other hand, the set of real numbers contains so many numbers that, if each of them had a non-zero probability of occurring, it would not be possible for the total probability (which is, essentially, just the sum or integral of all of the numbers’ probabilities) to be 1.

Another, more intuitive way to think about this is to consider a dart board. If we throw a dart at the board and are willing to assume that matter is infinitely divisible (sorry, Physicist), then we will always hit some point on the board (assuming our aim isn’t too terrible). But, at the same time, the chance of hitting any particular point is negligibly small (zero, in fact) since there are so many possible points. So clearly, to describe the probability of hitting this board’s points, it is not sufficient to consider only the probability of each individual point being hit; rather, we have to consider how likely we are to hit various regions of the board, such as the region constituting the bullseye. Even though each particular point has a zero probability of being hit, *some* point is always hit in practice, and the set of points that make up the bullseye together have a positive probability of being hit with each throw.
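
This point-versus-set distinction is easy to see numerically. Here is a minimal Monte Carlo sketch (the variable names are mine, not from the article): any single exact value is essentially never hit, but the set of samples below a threshold has a stable positive frequency.

```python
import random

random.seed(0)

# For a standard Gaussian, any single exact value has probability zero, but
# the *set* {x : x < 1} has positive probability (true value is about 0.8413).
N = 100_000
samples = [random.gauss(0.0, 1.0) for _ in range(N)]
p_less_than_1 = sum(1 for x in samples if x < 1.0) / N
print(round(p_less_than_1, 3))  # close to the true value 0.8413
```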

Okay, so we *can* define probabilities on an infinite set, though as the Gaussian distribution case shows, we may have to actually let the probabilities be assigned to subsets of our original set, rather than to every object in the set itself. But can we do this in such a way that every object in our set is sampled with equal likelihood? The answer is again yes, though with some caveats. For instance, if we limit ourselves to the real numbers that are between 0 and 1, we can assign a uniform distribution to these numbers which will give them each an equal probability. The uniform distribution basically says that the probability that a given sampled number will be between a and b (for 0<a<b<1) is equal to b-a. This fact implies that all numbers are equally likely to be sampled from this distribution.
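
The b − a rule above can be checked empirically; this is a quick simulation sketch with the illustrative choice a = 0.2, b = 0.5 (my own example values):

```python
import random

random.seed(1)

# Empirical check of P(a < X < b) = b - a for X uniform on (0, 1).
a, b = 0.2, 0.5
N = 100_000
hits = sum(1 for _ in range(N) if a < random.random() < b)
print(round(hits / N, 2))  # close to b - a = 0.3
```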

Fine, but can we define a probability distribution on the set of integers (rather than the real numbers between 0 and 1) such that they each occur with equal probability (i.e. does a uniform distribution on the integers exist)? The answer, sadly, is no. A probability mass function (which is the kind of probability distribution we need in this case) is defined to be a non-negative function whose values sum to 1. But any function that assigns an equal value to each integer must have probabilities that sum to either infinity (if that value is positive) or zero (if it is zero), so the desired distribution is impossible to construct. As a technical side note, though, people sometimes try to get around this issue in Bayesian analysis by applying what are known as improper priors. Attempting to define a uniform distribution on the full set of real numbers also fails, for much the same reason it doesn’t work on the integers: it cannot be the case that each equally sized interval of real numbers has the same probability while the probability density function integrates to 1.
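
A tiny calculation makes the divergence concrete. Suppose a hypothetical uniform mass function gave every integer the same probability p > 0; then the running total of probabilities passes 1 after finitely many integers, so it cannot sum to 1 over all of them (the function name below is mine):

```python
# With any fixed p > 0, the running total of equal probabilities passes 1
# after finitely many integers (and heads off to infinity); with p = 0 it
# stays at 0 forever.  Either way, the total can never equal 1.
def terms_until_total_exceeds_one(p):
    """Count equal terms of size p needed for the running sum to pass 1."""
    total, count = 0.0, 0
    while total <= 1.0:
        total += p
        count += 1
    return count

# With p = 1/1024 (exactly representable in binary floating point), the sum
# passes 1 at the 1025th term -- only finitely many integers fit under 1.
print(terms_until_total_exceeds_one(1 / 1024))  # 1025
```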

On to our second question: is it possible to come up with a formal mathematical procedure for sampling from infinite sets? The answer is yes, if we have an unlimited amount of time to spare. For real numbers between 0 and 1, we can use the following sampling procedure:

i. Start with the number 0.0, and set n=1

ii. Set the nth digit of our number after the decimal point to a random number from 0 to 9.

iii. Increase n by 1 and return to step ii.

If this procedure were iterated forever, it would produce a single random number between 0 and 1, and all real numbers between 0 and 1 are equally likely to be generated. Essentially, we are just constructing a number by choosing each of its decimal digits randomly.
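
A truncated version of the steps above can be sketched in a few lines (the helper name `sample_unit_interval` is mine): stopped after n digits, it gives a uniform sample on a grid of 10^n equally spaced points in [0, 1); only running it forever would give a genuinely uniform real number.

```python
import random

random.seed(2)

# Truncated digit-by-digit sampler: choose each decimal digit uniformly
# (step ii of the procedure) and stop after n_digits iterations.
def sample_unit_interval(n_digits):
    digits = (str(random.randrange(10)) for _ in range(n_digits))  # step ii
    return "0." + "".join(digits)                                  # 0.d1d2...dn

x = sample_unit_interval(20)
print(x)  # a 20-digit decimal string in [0, 1); the digits depend on the seed
```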

But what if we wanted to carry out this procedure for the set of all integers? Here things get stranger. The natural choice is the following procedure:

i. Start with the number 0.0, and set n=1

ii. Set the nth digit of our number BEFORE the decimal point to a random number from 0 to 9.

iii. Increase n by 1 and return to step ii.

If this procedure were carried out forever, it would seem as though it would produce an integer, and that this integer would have an equal probability of being any integer. This is true in the sense that all integers the algorithm produces have an equal likelihood of being produced (i.e. when it DOES produce integers, those integers are each equally likely). But the algorithm does not actually do what we would like. We begin to see the problem when we pick any integer X and ask, “what is the probability that this procedure would produce a number greater than X?” The answer is that there is a probability of 1 (i.e. a 100% chance), NO MATTER WHAT X IS. This makes sense, given that the integers stretch off to infinity, and that the number of integers close to 0 will always be dwarfed by the number of them far from it. But how can this procedure (when run forever) produce one single integer, while at the same time having a probability of 1 of producing a number bigger than any particular number we choose? Well, each integer can be thought of as having an infinite sequence of zero digits to the left of its first non-zero digit. This algorithm has a probability of 0 of producing a number whose digits end in an infinite run of zeros, and therefore has a zero probability of producing an integer! In other words, there is a probability of 1 that each number it produces will be (in a certain sense) “infinite”, so it does not serve the purpose we hoped.
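
The "probability 1 of exceeding any X" claim can be watched happening in a finite simulation sketch (the function name and parameter choices are mine): with n digits each drawn uniformly from 0–9, the result is uniform on 0 to 10^n − 1, so the chance of landing at or below a fixed X is (X + 1)/10^n, which collapses toward 0 as n grows.

```python
import random

random.seed(3)

# Estimate the probability that an n-digit uniform draw is <= X.
def prob_at_most(X, n_digits, trials=100_000):
    hits = sum(1 for _ in range(trials)
               if random.randrange(10 ** n_digits) <= X)
    return hits / trials

X = 999
print(prob_at_most(X, 3))   # exactly 1.0: every 3-digit draw is <= 999
print(prob_at_most(X, 6))   # about 0.001
print(prob_at_most(X, 12))  # essentially 0
```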

This brings us to our final question, regarding whether a terminating algorithm (that can be run on a real computer) can be created that will sample uniformly at random from an infinite set of numbers. The answer to this question is no, but with the footnote that this does not matter too much in practice (for reasons to be discussed). One way to understand why this is impossible is to consider how many bits it would take to transmit the number produced by such a sampling procedure. We would have to transmit some kind of code (agreed upon in advance) that represents the number we got from sampling, but since there are an infinite number of possible outcomes, and since we have no knowledge (in advance) of which number will occur, we would need an infinite number of codes to represent all the outcomes. Hence, if we think of the codes as numbers and choose some number N, then some of the codes must contain more than N digits (since N digits is only enough to describe a finite number of different codes). But since this holds for all N, no matter how large, it would on average take literally forever to transmit one of these codes. And if the number cannot be transmitted, no computer could ever make a copy of it, which implies that no computer could ever generate such a number. What this convoluted argument is getting at is the fact that a uniform distribution on an infinite set of items (if it existed) would have infinite entropy, so the numbers sampled from such a procedure could never be transmitted or stored (doing so would, on average, require an infinite number of bits), and hence there is no way that such an algorithm could be used in real life. One way to see that the entropy of such a distribution would be infinite is to note that if we define a uniform distribution on n items, the entropy of that distribution approaches infinity as n goes to infinity.
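
The entropy observation at the end of this argument is a one-line formula; here is a small sketch (function name mine) showing the entropy of a uniform distribution on n items growing without bound:

```python
import math

# The entropy of a uniform distribution on n items is log2(n) bits -- the
# average number of bits needed to transmit one sample.  It grows without
# bound as n does, which is why no finite code book can cover a uniform
# choice from an infinite set.
def uniform_entropy_bits(n):
    p = 1.0 / n
    return -n * p * math.log2(p)   # n identical terms of -p*log2(p)

for n in (2, 1024, 10 ** 6):
    print(n, round(uniform_entropy_bits(n), 4))  # 1.0, 10.0, ~19.93 bits
```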

Despite these problems, some infinite sets have a nice property that the procedure of sampling (with equal probability) from them can be nicely approximated on a computer. For example, if we want to sample a real number between 0 and 1, we can approximate this procedure by limiting ourselves only to numbers with at most 40 digits after the decimal point, and then sampling uniformly at random from this restricted set. While this procedure is not perfect, it will produce numbers that (for most purposes) look like those we would get if we truly sampled from all real numbers between 0 and 1. On the other hand, sampling uniformly at random from the set of all integers cannot be approximated in any nice way (and hence, is in some sense an inaccessible procedure). The problem here is that, as noted, if you fix any number X that you like, 100% of integers are greater than that X, no matter what X is. Since real computers are limited in the size of the numbers they can store, any attempt to approximate the procedure of sampling from all integers will be limited to sampling from integers less than some number X, despite the fact that 100% of integers truly are above X. If we try to sample uniformly at random from the set of all integers (or the set of real numbers, for that matter) we are doomed to complete failure.
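
The 40-digit approximation described above is a one-liner in a language with arbitrary-precision integers; this sketch (names mine) draws uniformly from the 10^40 numbers with at most 40 digits after the decimal point:

```python
import random

random.seed(4)

# Restrict to numbers with at most 40 decimal digits after the point, then
# sample uniformly from that finite set.  Python's arbitrary-precision
# integers make the draw itself exact.
N_DIGITS = 40
numerator = random.randrange(10 ** N_DIGITS)  # uniform on 0 .. 10^40 - 1
sample = f"0.{numerator:0{N_DIGITS}d}"        # zero-padded to 40 digits
print(sample)
```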


Another interesting tidbit related to this is that if you pick any digit, 100% of all integers will contain this digit at least once.

Infinity is a process that does not end, not a number, which means there has never been and can never be such an experiment.



“i. Start with the number 0.0, and set n=1

ii. Set the nth digit of our number BEFORE the decimal point to a random number from 0 to 9, or to the symbol #; if #, halt.

iii. Increase n by 1 and return to step ii.”

Would this work?

I think it would produce numbers closer to zero than larger numbers, though.

Yes, that procedure makes numbers with a small number of non-zero digits much more likely than ones with many non-zero digits.
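
This exchange can be checked by simulation; here is a sketch (the function name is mine) of the modified procedure in which each step picks one of eleven symbols, 0–9 or '#', uniformly, halting on '#'. It terminates with probability 1, but the number of digits produced is geometric, so short numbers dominate:

```python
import random
from collections import Counter

random.seed(5)

# Simulate the halting variant: 0-9 append a digit, '#' (encoded as 10) halts.
def sample_with_halt():
    digits = []
    while True:
        symbol = random.randrange(11)   # 0-9 are digits, 10 plays the role of '#'
        if symbol == 10:
            return digits               # halt: return the digits chosen so far
        digits.append(symbol)

counts = Counter(len(sample_with_halt()) for _ in range(100_000))
for n in range(4):
    print(n, counts[n])  # frequencies fall off by a factor of ~10/11 per extra digit
```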


Great article. I just wanted to note a way to come up with random numbers between 0 and 1 that wasn’t mentioned here, but is cool and important so maybe it will benefit other readers.

Assign 1 to heads and 0 to tails. Flip a coin repeatedly, recording the nth outcome as Dn. Construct a binary number from the outcomes of the form 0.D1D2D3…Dn. If you let this continue infinitely, you magically get a random number between 0 and 1. And this has cool uses in probability theory. For more info, look up “Borel’s Normal Number Theorem” or check out Billingsley’s *Probability and Measure*.
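
A truncated sketch of this coin-flip construction (function name mine): each flip sets one binary digit, and 53 flips already fill the full precision of a double-precision float.

```python
import random

random.seed(6)

# Heads contributes a 1 and tails a 0 to the binary expansion 0.D1D2D3...Dn.
# Letting n grow without bound yields a uniform random real in [0, 1].
def coin_flip_number(n_flips):
    x = 0.0
    for i in range(1, n_flips + 1):
        bit = random.randrange(2)   # one fair coin flip: 1 = heads, 0 = tails
        x += bit * 2.0 ** (-i)      # place it as the i-th binary digit
    return x

x = coin_flip_number(53)  # 53 flips fill the precision of a Python float
print(0.0 <= x < 1.0)     # True: the result always lies in [0, 1)
```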

Hello friends,

I want to know about a probability distribution that can be applied to a moving object.

If an object is moving, how can we find its next (future) location?

It is possible to define a uniform probability distribution on the rational numbers in [0,1]. There are various ways of putting that set of rational numbers in 1-to-1 correspondence with the positive integers. If you can take a sample from the uniform distribution of rational numbers in [0,1], you can take a sample from the positive integers by finding which integer corresponds to the rational number that you picked. Hence you can define a probability distribution on the positive integers, where each integer has the same probability of being selected. It’s just that this probability is zero!
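
The 1-to-1 correspondence mentioned here is easy to exhibit concretely. One sketch of such a listing (the generator name is mine): walk through denominators q = 1, 2, 3, … and, for each q, the numerators p with gcd(p, q) = 1, so every rational in [0, 1] appears exactly once.

```python
import math
from fractions import Fraction

# Enumerate the rationals in [0, 1] without repetition, putting them in
# 1-to-1 correspondence with the positive integers (their position in the
# list).
def rationals_in_unit_interval():
    yield Fraction(0, 1)
    yield Fraction(1, 1)
    q = 2
    while True:
        for p in range(1, q):
            if math.gcd(p, q) == 1:   # skip fractions not in lowest terms
                yield Fraction(p, q)
        q += 1

gen = rationals_in_unit_interval()
first_eight = [next(gen) for _ in range(8)]
print(first_eight)  # 0, 1, 1/2, 1/3, 2/3, 1/4, 3/4, 1/5
```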

It is not clear whether sampling from finite uniform discrete distributions on the unit interval actually approximates sampling from a uniform continuous distribution on the unit interval. It isn’t clear because the definition of what it means to “approximate” isn’t stated – and I can’t find any sources that have bothered to study this topic.

Some properties of a probability distribution cannot be approximated by sampling it, exactly or approximately. For example, a property of a given Gaussian distribution is whether its mean is a rational number. I don’t know of any definition of “approximation” that would let us conclude that this property can be approximated by sampling the distribution.

There might be properties of a continuous distribution that cannot be “approximated” by sampling discrete distributions. It would be interesting to try out various definitions of “an approximable distribution” and see where they lead.

This is really getting into the heart of measure theory!
However, the 1-1 map only preserves uniformity with respect to the point measure, not the Lebesgue measure, and the unit interval has an infinite number of points, and thus no definable uniform probability distribution with respect to that point measure.

There are two “measures” in play here: the Lebesgue measure (the length of an interval) and the point measure (how many points). A uniform probability can only be defined on a set of finite measure, but the real numbers have both an infinite Lebesgue measure and an infinite point measure (infinite length and infinitely many points).

On the other hand, the unit interval has a finite Lebesgue measure and an infinite point measure, so a uniform distribution on the unit interval, with respect to the Lebesgue measure, is certainly possible. That measure has density p(x) = 1 for 0 ≤ x ≤ 1.

Tricky stuff!

I agree that if Lebesgue measure is used, any measurable subset of the rational numbers in [0,1] would have measure 0. However, we can look for non-Lebesgue measures that are “uniform” in the generic sense that they don’t treat any rational number as special.

See if this works. I think the cardinality of the set of 1-1 mappings of the non-negative integers onto themselves is aleph_1, and this is the same as the cardinality of the real numbers in [0,1]. So there is some 1-1 mapping F from the real numbers in [0,1] to the 1-1 mappings of the integers onto themselves. Consider a two-step sampling process. Let G be a discrete distribution defined on the non-negative integers (such as a Poisson distribution). Draw an integer k from G. Draw a real number x from the uniform distribution on [0,1] (the usual uniform distribution, which uses Lebesgue measure). Find the mapping of the integers to themselves defined by f = F(x). Let f(k) be the value of the sample taken by the two-step process.

Is that process in violation of any of the axioms of measure theory?

I pick n (finite) numbers at random from an infinite set (e.g., the integers)… the chance of a match is clearly possible, yet its probability must be zero… please help me with this paradox… keith

keith,

Apparently you didn’t read the blog. You can’t pick an integer uniformly at random from the set of all integers.

I think there is a mistake in the continuous uniform distribution case. The probability of sampling a number between a and b is 1/(b-a) not b-a as stated in the article.

@Costas Argyris

Thanks for checking! After looking at it carefully, it should stand. If the probability were 1/(b-a), then the probability of the sample being between 0.8 and 0.9 would be 1/(0.9-0.8) = 10, which is impossible.

For Stephen Tashiro: your construction doesn’t work, because you are not realizing that the concept of “pick a random number in [0,1]” in standard set theory is not producing an actual number, it is simply a second-class sort of number inside the real number line. Random numbers in ZFC don’t have a value, they only have a measure. That means you are simply not allowed to speak about the map from omega to omega corresponding to this “number” you chose in [0,1], because “random numbers” in ZFC simply never can be consistently assumed to have values.

The way you hear this said in mathematician-talk is that the sets involved are non-measurable. That means simply that you can’t talk about the probability of producing numbers inside those sets consistently, so that the concept of picking a random number can’t be reconciled with the existence of such sets.

For your specific case, the inverse image of the collection of maps from omega-to-omega whose value is “n” at the integer you chose at the first step is simply not measurable, so you can’t talk about the probability of producing an integer n from your pseudo-algorithm. The non-algorithmic part is the map from [0,1] to the map from omega to omega, which you made assuming the continuum hypothesis, which can’t correspond to any construction. The proof that it can’t correspond to any construction is simply the fact that any such construction would conflict with the random picking construction, and lead to a paradox.

This is extremely important to understand— modern mathematics does NOT allow you to pretend that “picking a random number” with a definite value, and then talking about this value as you do intuitively, using all the available things on the real numbers, is a consistent notion, because this conflicts with the axiom of choice.

I don’t think this is a bearable situation. It must be fixed, so that naive arguments like the ones you made above don’t fail.

The simplest paradox: given a random real in [0,1] what is the probability that it lands in a Vitali set? (this doesn’t make sense)

More elaborate paradoxes are more interesting, because they show you why specific constructions that are claimed to exist in standard mathematical theory simply can’t exist when you can pick numbers at random (i.e. in the real mathematical universe). These are the “non-constructive constructions” of classical mathematics.

1. The continuum hypothesis is incompatible with random picking: consider ordering the reals in [0,1] with order-type aleph-1, pick two random reals x,y and ask whether x<y or y<x. If you pick x first, the set of y<x is countable, and has measure 0. If you pick y first, it goes the other way. That's a paradox associated with Hausdorff, popularized by Freiling. This shows that the well-ordering of the reals with order type aleph-1 can't be exhibited in any algorithmic or even predicative way, as it is incompatible with the picking of two random numbers in [0,1], something which we definitely _can_ do with an algorithm.

2. There is no basis for R as a vector space over Q: pick two Gaussian random reals x and y with mean 0 and width 1, and consider z=(3x + 4y)/5. Then z is again Gaussian with mean 0 and width 1. But if you decompose x and y into a basis over Q, x is composed of n1 basis elements, y of n2 basis elements, and z of n1+n2 basis elements, but they all have to have the same distribution. It is impossible for the sum of two draws from a probability distribution on the positive integers to have the same distribution again (this is obvious if you think about the first place where the distribution is nonzero).

You can do similar things to show that a nonprincipal ultrafilter on the integers is incompatible with this, and likewise the Banach–Tarski paradox (obviously); in short, all the nonsense results of the past century are omitted in universes where you can pick real numbers at random. The method of showing that adjoining random reals is consistent was worked out by Solovay, following Paul Cohen, and is covered in an obtuse, incomprehensible way in most modern logic books. It's obtuse and incomprehensible only because these books refuse to bite the bullet and say that picking random numbers is a consistent notion.