Maximum gap between samples drawn without replacement from a discrete uniform distribution

This problem is related to my lab's research on robotic coverage:

Randomly draw $n$ numbers from the set $\{1,2,\ldots,m\}$ without replacement, where $1\le n\le m$, and sort the numbers in ascending order.

From this sorted list of numbers $\{a_{(1)},a_{(2)},\ldots,a_{(n)}\}$, generate the differences between consecutive numbers and the boundaries: $g = \{a_{(1)},a_{(2)}-a_{(1)},\ldots,a_{(n)}-a_{(n-1)},m+1-a_{(n)}\}$. This gives $n+1$ gaps.

What is the distribution of the maximum gap?

$P(\max(g) = k) = P(k;m,n) = ?$

This can be framed using order statistics: $P(g_{(n+1)} = k) = P(k;m,n) = ?$

See the link for the distribution of the gaps, but this question asks for the distribution of the maximum gap.

I would be satisfied with the mean value, $\mathbb{E}[g_{(n+1)}]$.

If $n=m$, all gaps are of size 1. If $n+1 = m$, there is one gap of size $2$ and $n+1$ possible locations for it. The maximum gap size is $m-n+1$, and this gap can be placed before or after any of the $n$ numbers, for a total of $n+1$ possible positions. The smallest maximum gap size is $\lceil\frac{m-n}{n+1}\rceil$. Define the probability of any given combination as $T= {m \choose n}^{-1}$.
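For small $m$, the claims above can be checked by brute force. The following is a minimal sketch, not a general solution: it enumerates every $n$-subset and tabulates the largest of the $n+1$ gaps exactly. The function name `max_gap_pmf` is hypothetical, introduced only for this illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from math import comb


def max_gap_pmf(m, n):
    """Exact pmf of the largest of the n+1 gaps, by enumerating all C(m, n) subsets."""
    counts = Counter()
    for subset in combinations(range(1, m + 1), n):  # tuples come out sorted
        points = (0,) + subset + (m + 1,)             # add both boundaries
        counts[max(b - a for a, b in zip(points, points[1:]))] += 1
    total = comb(m, n)
    return {k: Fraction(c, total) for k, c in sorted(counts.items())}


pmf = max_gap_pmf(6, 2)
for k, p in pmf.items():
    print(k, p)
```

For $m=6$, $n=2$ the largest possible gap is $k=m-n+1=5$, and it receives probability $3/15 = T(n+1)$, in agreement with the count of $n+1$ positions. The enumeration visits $\binom{m}{n}$ subsets, so it is only practical for small cases.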

I have partially solved the probability mass function as $P(g_{(n+1)} = k) = P(k;m,n) = \begin{cases} 0 & k < \lceil\frac{m-n}{n+1}\rceil\\ 1 & k = \frac{m-n}{n+1} \\ 1 & k = 1 \text{ (occurs when $m=n$)} \\ T(n+1)& k = 2 \text{ (occurs when $m=n+1$)} \\ T(n+1)& k = \frac{m-(n-1)}{n} \\ ? & \frac{m-(n-1)}{n} \le k \le m-n+1 \\ T(n+1)& k = m-n+1\\ 0 & k > m-n+1 \end{cases} \tag{1}$

Work in progress (1): The equation for the first gap, $a_{(1)}$, is straightforward:

$P(a_{(1)} = k) = P(k;m,n) = \frac{1}{{m \choose n}} \sum_{k=1}^{m-n+1} {m-k-1 \choose n-1}$

The expected value has a simple form:

$\mathbb{E}[P(a_{(1)})] = \frac{1}{ {m \choose n}} \sum_{k=1}^{m-n+1} {m-k-1 \choose n-1} k = \frac{m-n}{1+n}$

By symmetry, I expect all $n$ gaps to have this distribution. Perhaps the solution could be found by drawing from this distribution $n$ times.

Work in progress (2): it is easy to run Monte Carlo simulations.

(* One trial: largest gap among n sorted draws from 1..m, with 0 and m+1 as boundaries *)
simMaxGap[m_, n_] := Max[Differences[Sort[Join[RandomSample[Range[m], n], {0, m+1}]]]];
m = 1000; n = 1; trials = 100000;
(* Smoothed histogram of the simulated maximum gaps *)
SmoothHistogram[Table[simMaxGap[m, n], {trials}], Filling -> Axis,
Frame -> {True, True, False, False},
FrameLabel -> {"k (Max gap)", "Probability"},
PlotLabel -> StringForm["m=``,n=``,smooth histogram of maximum gap for `` trials", m, n, trials]]
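
The same simulation can be sketched in Python using only the standard library. This is an illustrative port, not the original code: the name `sim_max_gap`, the fixed seed, and the smaller trial count are assumptions made here so that runs are quick and repeatable.

```python
import random
from statistics import mean


def sim_max_gap(m, n, rng=random):
    """Largest gap for one sample of n numbers from {1..m}, including both boundaries."""
    points = [0] + sorted(rng.sample(range(1, m + 1), n)) + [m + 1]
    return max(b - a for a, b in zip(points, points[1:]))


rng = random.Random(0)  # fixed seed, chosen only so that runs are repeatable
m, n, trials = 1000, 1, 10_000
draws = [sim_max_gap(m, n, rng) for _ in range(trials)]
print(mean(draws))  # Monte Carlo estimate of the expected maximum gap
```

With $n=1$ the two gaps are $a_{(1)}$ and $m+1-a_{(1)}$, so every simulated value lies between $501$ and $1000$, which gives a simple sanity check on the output.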

— AaronBecker

Under these conditions, you must have n <= m. I think you want g = {a_(1), a_(2)-a_(1), ..., a_(n)-a_(n-1)}. Does selecting at random mean selecting each number with probability 1/m on the first draw? Since you are not replacing, the probability would be 1/(m-1) on the second draw, and so on down to 1 on the m-th draw if n = m. If n < m, this would stop earlier, with the n-th and last draw having probability 1/(m-(n-1)).

— Michael R. Chernick

Your original description of $g$ made no sense, because (I believe) you transposed two of the subscripts. Please verify that my edit conforms with your intention: in particular, please confirm that you mean for there to be $n$ gaps, of which $a_{(1)}$ is the first.

— whuber

@gung I think this is research, rather than self-study

— Glen_b -Reinstate Monica

I think your minimum and maximum gap sizes should be $1$ and $m-n+1$. The minimum gap size occurs when consecutive integers are chosen, and the maximum gap size occurs when you select $m$ and the $n-1$ first integers $1,\dots,n-1$ (or $1$ and $m-n+2,\dots,m$).

— probabilityislogic

Thank you Michael Chernick and probabilityislogic, your corrections have been made. Thank you @whuber for making the correction!

— AaronBecker

Let $f(g;n,m)$ be the chance that the minimum, $a_{(1)}$, equals $g$: that is, the sample consists of $g$ and an $n-1$-subset of $\{g+1,g+2,\ldots,m\}$. There are $\binom{m-g}{n-1}$ such subsets out of the $\binom{m}{n}$ equally likely subsets, whence

$\Pr(a_{(1)}=g) = f(g;n,m) = \frac{\binom{m-g}{n-1}}{\binom{m}{n}}.$

Adding $f(k;n,m)$ for all possible values of $k$ greater than $g$ yields the survival function

$\Pr(a_{(1)} \gt g) = Q(g;n,m)= \frac{(m-g)\binom{m-g-1}{n-1}}{n \binom{m}{n}}.$
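
As a quick numerical check, the closed form for $Q$ can be compared against the tail sum of $f$ using exact rational arithmetic. This is only a sketch: the Python function names `f` and `Q` mirror the symbols in the text, and the values $n=3$, $m=10$ are arbitrary choices for illustration.

```python
from fractions import Fraction
from math import comb


def f(g, n, m):
    """Pr(a_(1) = g): g is chosen together with n-1 of the m-g larger values."""
    return Fraction(comb(m - g, n - 1), comb(m, n))


def Q(g, n, m):
    """Pr(a_(1) > g), using the closed form quoted above."""
    if g >= m:  # no value can exceed m
        return Fraction(0)
    return Fraction((m - g) * comb(m - g - 1, n - 1), n * comb(m, n))


n, m = 3, 10
for g in range(0, m + 1):
    tail = sum(f(k, n, m) for k in range(g + 1, m + 1))
    assert tail == Q(g, n, m)  # tail sum of f agrees with the closed form
print(Q(2, n, m))  # 7/15
```

The printed value agrees with the direct count $\binom{8}{3}/\binom{10}{3} = 56/120 = 7/15$, the chance that all three sampled values exceed $2$.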

Let $G_{n,m}$ be the random variable given by the largest gap:

$G_{n,m} = \max\left(a_{(1)}, a_{(2)}-a_{(1)}, \ldots, a_{(n)}-a_{(n-1)}\right).$

(This responds to the question as originally framed, before it was modified to include a gap between $a_{(n)}$ and $m$.) We will compute its survival function

$P(g;n,m)=\Pr(G_{n,m}\gt g),$

from which the entire distribution of $G_{n,m}$ is readily derived. The method is a dynamic program beginning with $n=1$, for which it is obvious that

$P(g;1,m) = \Pr(G_{1,m} \gt g) = \frac{m-g}{m},\ g=0, 1, \ldots, m.\tag{1}$

For larger $n\gt 1$, note that the event $G_{n,m}\gt g$ is the disjoint union of the event

$a_{(1)} \gt g,$

for which the very first gap exceeds $g$, and the $g$ separate events

$a_{(1)}=k\text{ and } G_{n-1,m-k} \gt g, \ k=1, 2, \ldots, g$

for which the first gap equals $k$ and a gap greater than $g$ occurs later in the sample. The Law of Total Probability asserts the probabilities of these events add, whence

$P(g;n,m) = Q(g;n,m) + \sum_{k=1}^g f(k;n,m) P(g;n-1,m-k).\tag{2}$

Fixing $g$ and laying out a two-way array indexed by $i=1,2,\ldots,n$ and $j=1,2,\ldots,m$ , we may compute $P(g;n,m)$ by using $(1)$ to fill in its first row and $(2)$ to fill in each successive row using $O(gm)$ operations per row. Consequently the table can be completed in $O(gmn)$ operations and all tables for $g=1$ through $g=m-n+1$ can be constructed in $O(m^3n)$ operations.
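
The table-filling procedure can be sketched in Python as follows. This is one possible reading of the description, not the code used to produce the plots: the names `survival` and `brute_force` are hypothetical, $Q$ is written in the equivalent form $\binom{j-g}{i}/\binom{j}{i}$, and exact fractions are used so that the result can be compared with direct enumeration for small $m$.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def f(k, i, j):
    """Pr(a_(1) = k) for an i-subset of {1..j}; zero when no such subset exists."""
    if j < i or k < 1 or k > j:
        return Fraction(0)
    return Fraction(comb(j - k, i - 1), comb(j, i))


def Q(g, i, j):
    """Pr(a_(1) > g) for an i-subset of {1..j}, as C(j-g, i) / C(j, i)."""
    if j < i or g >= j:
        return Fraction(0)
    return Fraction(comb(j - g, i), comb(j, i))


def survival(g, n, m):
    """P(g; n, m) = Pr(G_{n,m} > g), filled in row by row for a fixed g."""
    # table[i][j] holds P(g; i, j); entries with j < i stay 0 and carry zero weight
    table = [[Fraction(0)] * (m + 1) for _ in range(n + 1)]
    for j in range(1, m + 1):
        table[1][j] = Fraction(max(j - g, 0), j)       # equation (1)
    for i in range(2, n + 1):
        for j in range(i, m + 1):
            total = Q(g, i, j)
            for k in range(1, min(g, j - 1) + 1):      # equation (2)
                total += f(k, i, j) * table[i - 1][j - k]
            table[i][j] = total
    return table[n][m]


def brute_force(g, n, m):
    """Same probability by enumerating every n-subset (only feasible for small m)."""
    hits = 0
    for s in combinations(range(1, m + 1), n):
        points = (0,) + s      # G_{n,m} as defined here omits the gap after a_(n)
        hits += max(b - a for a, b in zip(points, points[1:])) > g
    return Fraction(hits, comb(m, n))


assert all(survival(g, n, 8) == brute_force(g, n, 8)
           for n in range(1, 9) for g in range(0, 9))
print(survival(3, 2, 8))  # 19/28
```

The value $19/28$ can be confirmed by hand: of the $28$ two-element subsets of $\{1,\ldots,8\}$, exactly $9$ have both $a_{(1)}\le 3$ and $a_{(2)}-a_{(1)}\le 3$, leaving $19$ with a gap exceeding $3$.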

These graphs show the survival function $g\to P(g;n,64)$ for $n=1,2,4,8,16,32,64$ . As $n$ increases, the graph moves to the left, corresponding to the decreasing chances of large gaps.

Closed formulas for $P(g;n,m)$ can be obtained in many special cases, especially for large $n$ , but I have not been able to obtain a closed formula that applies to all $g,n,m$ . Good approximations are readily available by replacing this problem with the analogous problem for continuous uniform variables.

Finally, the expectation of $G_{n,m}$ is obtained by summing its survival function starting at $g=0$ :

$\mathbb{E}(G_{n,m}) = \sum_{g=0}^{m-n+1} P(g;n,m).$
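
As a small worked check of this sum, take $n=1$, where the upper limit is $m-n+1=m$ and equation $(1)$ gives every term directly:

$\mathbb{E}(G_{1,m}) = \sum_{g=0}^{m} \frac{m-g}{m} = \frac{1}{m}\cdot\frac{m(m+1)}{2} = \frac{m+1}{2}.$

This is the mean of a single uniform draw from $\{1,2,\ldots,m\}$, as it should be, since $G_{1,m}=a_{(1)}$ when only one number is drawn.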

This contour plot of the expectation shows contours at $2, 4, 6, \ldots, 32$ , graduating from dark to light.

— whuber

Suggestion: in the line "Let $G_{n,m}$ be the random variable given by the largest gap:", please add the last gap of $m+1-a_{n}$. Your expectation plot matches my Monte Carlo simulation.

— AaronBecker