Calculating Probability of N Points on a Line Being Within Given Distance

In summary, there is an analytical solution to deal with any number of points, but it requires a bit of work.
  • #1
CoNiss
1
0
TL;DR Summary
Probability of any random n points on a line being within a given distance
Probability of any random n points on a line being within a given distance

Hi,

I am a software engineer trying to solve the following problem analytically

given a line segment in cm and n random points on it
what is the probability that the distance between any 2 consecutive points on the line is less than a given minimum distance?

For Example:
n = 10 points
lineSegment = 1000 cm
minimumDistance = 2 cm

Running a Montecarlo simulation I took the following steps:
1. generate n random points
2. sort the points by order of smaller first
3. calculate the distances between consecutive points
4. count how many distances are smaller or equal to the minimumDistance.

Link to Python Montecarlo simulation on replit:
https://replit.com/@NissimCohen/MonteCarlo1#main.py

Is there an analytical solution to deal with any n number of points?

Thanks...
 
Physics news on Phys.org
  • #2
Hello @CoNiss ,

:welcome: ##\qquad## !​

Not familiar with Python, and no statistics expert -- just curious. And intrigued by the 5/6 probability of zero instances (see below).

My reasoning:
With one randomly distributed point on a 1000 cm line, the probability that a second randomly distributed point is NOT within 2 cm is 0.996.
For 10 points there are 45 pairs and I think that the probability for zero instances is 0.99645=0.835, which is pretty close to 5/6.

For (exactly) one instance I would propose a probability of 0.99644 times 0.004 times 45 = 0.151 (the 45 for the number of pairs in 10 points)

This 0.151 is in the neighborhood of the 0.155 a run of your code found:

Code:
================
Monte Carlo Simulation: ( sample size  =  1000000 )

Results:
(10 points on a 1000 cm line segment)

0 instances:  833286
1 instances:  154916
2 instances:  11352
3 instances:  436
4 instances:  10
5 instances:  0
6 instances:  0
7 instances:  0
8 instances:  0

Probability of all distances (per sample of 10  points) being smaller than 2 cm:
1 - ( 833286  /  1000000 ) =  0.16671400000000003
================

But for two instances I haven't found a reasonable probability expression so far.

Interesting exercise, for sure !

##\ ##
 
  • #3
CoNiss said:
TL;DR Summary: Probability of any random n points on a line being within a given distance

Probability of any random n points on a line being within a given distance

Hi,

I am a software engineer trying to solve the following problem analytically

given a line segment in cm and n random points on it
what is the probability that the distance between any 2 consecutive points on the line is less than a given minimum distance?

For Example:
n = 10 points
lineSegment = 1000 cm
minimumDistance = 2 cm

Running a Montecarlo simulation I took the following steps:
1. generate n random points
2. sort the points by order of smaller first
3. calculate the distances between consecutive points
4. count how many distances are smaller or equal to the minimumDistance.

Link to Python Montecarlo simulation on replit:
https://replit.com/@NissimCohen/MonteCarlo1#main.py

Is there an analytical solution to deal with any n number of points?

Thanks...
An idea: You could make the problem discrete by dividing the line segment into a large number of "slots", for example 10 000 or 100 000 of them.
Then you work out in how many ways you can place n points in these slots.
You have to decide whether the points are distinguishable or not and whether several points can be in the same slot.
Then you work out the same number of arrangements for a smaller number of slots to take into account that you don't want two points within a certain distance of each other. Here you also have to exclude the possibility that two points are in the same slot.
Division of the two results should give you 1 minus the probability you are looking for.
Does that sound right?
 
  • #4
CoNiss said:
TL;DR Summary: Probability of any random n points on a line being within a given distance

Probability of any random n points on a line being within a given distance

Hi,

I am a software engineer trying to solve the following problem analytically

given a line segment in cm and n random points on it
what is the probability that the distance between any 2 consecutive points on the line is less than a given minimum distance?

For Example:
n = 10 points
lineSegment = 1000 cm
minimumDistance = 2 cm

Running a Montecarlo simulation I took the following steps:
1. generate n random points
2. sort the points by order of smaller first
3. calculate the distances between consecutive points
4. count how many distances are smaller or equal to the minimumDistance.

Link to Python Montecarlo simulation on replit:
https://replit.com/@NissimCohen/MonteCarlo1#main.py

Is there an analytical solution to deal with any n number of points?

Thanks...
There's a homework problem on here to calculate the expected value of the shortest segment for ##n## random points. And you can find analytic solutions elsewhere online.

You might be able to modify some of Techniques involved to solve your problem.
 
  • #5
I think carrying out the plan indicated by @Philip Koeck requires finding a convenient expression for the total number of iterations of some nested loops where the loop variables have a simple dependence.

To elaborate:
Approximate a continuous segment by dividing it into L equal intervals ("bins"). Define a slection of a point to be a selection of an entire interval, The point's location is given by its interval number, numbering from left to right on the x-axis of a graph. So the smallest possible location of a "point" is 1 and the largest possible location is L.

Assume that we use a sufficiently large number of intervals L so that the probability of two points being chosen in the same interval is approximately zero.

Let the distance between two points at locations j < k be k - j. So for example, adjacent intervals are a distance of 1 apart and a point located at k is a distance M from the point located at k+M.

With those assumptions, the number of possible arrangements of N points in L bins is given by the binomial coefficient ##\binom{N}{L} = \frac{N!}{L! (N-L)!}##. Each arrangement has equal probability by the assumptions of the problem.

Counting the number of these arrangements that do not have two points smaller than the (integer) distance M between them can (in theory) be done by writing a program with nested loops.

For example for N = 3 points, pseudo code for the program looks like:

(a3 is the location of the rightmost point and a1 is the location of the leftmost point )

Count=0
For a3 = 1+2*M to a3= L
For a1 = 1 to a3-2*M,
For a2 = a1+M to a2 = a3-M
Count = Count+1
End for a2
End For a1
End for a3

It wouldn't surprise me if people who study computation complexity or combinatorics have developed formulas for finding the Count.

To solve the whole problem,we need to take the limit of the ratio of Count to the total number of possible arrangements as the number of bins approaches infinity. One hopes that Sterlings approximation to the factorial function will allow that. However, the error between K! and its Sterlings approximation approaches infinity as K approaches infinity. So the expression we are dealing with needs to involve ratios of factorials. If the expression for Count involves sums of factorials, things could get tricky.
 
  • Like
Likes Philip Koeck
  • #6
(I now realize) the analysis implied by the previous post doesn't require factorials.

The proposed solution is ##\lim_{L \rightarrow \infty} \frac {Count} {\binom{L}{N}}##

The denominator ##\binom{L}{N}= (L)(L-1)...(L-N+1)/N! ## is a polynomial of degree ##N## with the highest order term ##1L^N/N!##.

The numerator, Count, can be computed as nested summations. These evaluate to a polynomial of degree ##N## in ##L##.

The limit is the ratio of the coefficients of ##L^N## in the two polynomials.

To illustrate this in the case N=3 of the previous post.

A program loop "from A to B" is evaluated (B-A+1) times. So the loop on a2 increments Count by ## (a_3-M)- (a_1+M) + 1 = a_3-a_1- 2M+1 ##

The final value of Count is given by the nested summations:
##\sum_{a_3=2M}^{a_3=L} ( \sum_{a_1=1}^{a_1=a3-2M} (a_3 - a_1 - 2M + 1))##

To retain the approximation of a length m by a number of intervals M, we also must let M approach infinity as L does. We keep the ratio M/L as close as possible to the ratio of the minimum distance to the length of the line segment. Let this ratio be ##r##. So, in the limit, we can replace M by Lr. With that substitution, the polynomial expression for Count will involve only the variable L and constants.

For example the innermost sum above is
##\sum_{a_1=1}^{a_1=a_3-2M} (a_3-a_1-2M+1)##

## = (a_3-2M-1+1)(a_3-2M+1) - \sum_{a_1=1}^{a_1=a3-2M} a_1 ##

(apply the summation formula ## \sum_{k=1}^n k = (k)(k+1)/2## )

##= (a_3-2M)(a_3-2M + 1) - (a_3-2M)(a_3-2M+1)/2) ##

##= (1/2) (a_3-2M)(a_3-2M+1) ##

The summation on ##a_3## is thus
## \sum_{a_3=2M}^{a_3=L} ((1/2)(a_3-2M)(a_3-2M+1))##

##= (1/2) \sum_{a_3=2M}^{a_3=L} ((a_3)^2 - 4a_3M + 4M^2 + a_3 - 2M)##

## = (1/2) (L -2M+1) (4M^2 - 2M) ##
##\ \ + (1/2)\sum_{a3=2M}^{a_3=L} (a_3)^2 ##
##\ \ + (1/2)(1 - 4M) \sum_{a_3=2M}^{a3=L} a_3 ##

At the moment, I don't have the energy to work this further! However, I hope this illustrates how the formula for Count will have its highest order term a multiple of ##L^3## after applying the summation formula for ##\sum K^2## and substituting ##M = rL##
 
Last edited:
  • #7
Isn't the same as "a stick is broken into N random pieces"? Don't textbooks have this for small N (exact) and large N (approximations, and possibly exact)?
 
  • #8
It turns out that the continuous approach leads to simpler algebra than the approximation by bins. Let the line segment be ##[0,L] ## and the desired spacing be ##m##. For example, take the case of n = 3 points.

The joint probability density of 3 points ##(x,y,z)## in the line segment is ##f(x,y,z) = \frac{1}{L^3}##

Integrating this density over the set where ##(x,y,z)## are at least ##m## apart requires doing integrals over sets such as ## \{ (x,y,z)| 2m \le z \le L,\ 0 \le x \le z-2m,\ x+m \le y \le z-m\} ## or ##\{ (x,y,z)| 2m \le y \le L,\ 0 \le x \le y-2m,\ x+m \le z \le y-m\}##
etc.

There are 3! = 6 such mutually exclusive sets.

Let ##a_1< a_2<a_3## be one ordering of the variables ##(x,y,z)##.

##\int_{a_3=2m}^{a_3=L} \int_{a_1=0}^{a_1=a_3-2m} \int_{a_2=a_1 + m}^{a_2=a_3-m} \frac {1}{L^3} \ da_2 \ da_1\ da_3##

## =\frac{1}{L^3} \int_{a_3=2m}^{a_3=L} \int_{a_1=0}^{a_1=a_3-2m} (a_3 - a_1 - 2m)\ da_1\ da_3 ##

## = \frac{1}{L^3} \int_{a_3=2m}^{a_3=L}( \ |_{a_1=0}^{a_1=a_3-2m} ( a_3 - 2m)a_1 - \frac{a_1^2}{2} )\ da_3 ##

## = \frac{1}{L^3} \int_{a_3=2m}^{a_3=L}( (a_3 - 2m)^2 - \frac{ (a^3 - 2m)^2}{2} ) \ da_3 ##

## = \frac{1}{L^3} \int_{a_3=2m}^{a_3=L} \frac{ (a^3 - 2m)^2}{2} \ da_3 ##

## = \frac{1}{L^3} \int_{a_3=2m}^{a_3=L} \frac{ (a_3^2 - 4ma_3 + 4m^2)}{2} \ da_3 ##

## = \frac{1}{L^3} |_{a_3=2m}^{a_3=L} \frac {a_3^3}{6} - ma_3^2 + 2 m^2 a_3 ) ##

## = \frac{1}{L^3} ( \frac{ L^3}{6} - mL^2 + 2 m^2 L - \frac{8m^3}{6} + 4m^3 - 4m^3)##

## = \frac{1}{6} - \frac{m}{L} + 2 ( \frac{m}{L})^2 - \frac{8 (\frac{m}{L})^3}{6} ##

Let ##\frac{m}{L} = r#### = \frac{1}{6} - r + 2r^2 - \frac{8r^3}{6}##

Since there are 6 = 3! orderings of the three variables ##x,y,z##, the total probability is 6 times the above result which is ##1 - 6r + 12r^2 -8 r^3 = T(r) ##

Checks:

As ## r \rightarrow 0 ##, ## T(r) \rightarrow 1##.

As ## r \rightarrow 1/2 ##, ##T(r) \rightarrow 0 ##. (A segment of length L containing 3 equally spaced points with distance m between them must be at least 2m long )
 

FAQ: Calculating Probability of N Points on a Line Being Within Given Distance

What is the basic concept behind calculating the probability of N points on a line being within a given distance?

The basic concept involves determining the likelihood that any two of N randomly placed points on a line segment will be within a specified distance of each other. This typically requires understanding the distribution of points and applying principles of probability and geometry.

How do you set up the problem mathematically?

Mathematically, you define the length of the line segment (usually normalized to 1 for simplicity) and the distance threshold. You then consider the positions of the N points as random variables uniformly distributed along the segment. The problem often involves integrating over the possible configurations of these points to find the probability that the distance between any two points is less than the given threshold.

What methods are commonly used to calculate this probability?

Common methods include analytical approaches using combinatorics and integration, as well as numerical simulations. Analytical methods might involve calculating the expected number of pairs of points within the given distance, while numerical methods could involve Monte Carlo simulations to estimate the probability empirically.

How does the number of points N affect the probability?

As the number of points N increases, the probability that at least one pair of points will be within the given distance generally increases. This is because there are more pairs of points to consider, increasing the likelihood that at least one pair meets the distance criterion.

Are there any practical applications of this probability calculation?

Yes, there are several practical applications, including telecommunications (e.g., signal strength analysis), materials science (e.g., distribution of particles within a material), and ecology (e.g., distribution of species within a habitat). Understanding these probabilities can help in optimizing designs and making informed decisions in various fields.

Back
Top