Efficiency is an asymptotic criterion. Asymptotic notation for program execution times

To describe asymptotic estimates there is a notation system:

§ They say that f(n)= O(g(n)), if there is a constant c>0 and a number n0 such that the condition 0≤f(n)≤c*g(n) is satisfied for all n≥n0. More formally:

(()) { () | 0, } 0 0 O g n= fn$c> $n"n> n£ fn£ cg n

O(g(n)) is used to indicate functions that are no more than a constant number of times greater than g(n), this variant is used to describe upper bounds (in the sense of “no worse than”). When we are talking about a specific algorithm for solving a specific problem, the goal of analyzing the time complexity of this algorithm is to obtain an estimate for the time at worst or on average, usually an asymptotic estimate from above O(g(n)), and, if possible, an asymptotically lower estimate for W(g(n)), and even better, an asymptotically exact estimate for Q(g(n)).

But the question remains: could there be even better solution algorithms for this problem? This question poses the problem of finding a lower estimate of the time complexity for the problem itself (for all possible algorithms for solving it, and not for one of the known algorithms for solving it). The issue of obtaining nontrivial lower bounds is very difficult. To date, there are not many such results, but non-trivial lower bounds have been proven for some limited computer models, and some of them play an important role in practical programming. One of the problems for which a lower bound for time complexity is known is the sorting problem:

§ Given a sequence of n elements a1,a2,... an, selected from the set on which the linear order is specified.

§ It is required to find a permutation p of these n elements that will map the given sequence into a non-decreasing sequence ap(1),ap(2),... ap(n), i.e. ap(i)≤ap(i+1) for 1≤i mixing method . Let us have two problems A and B, which are related in such a way that problem A can be solved as follows:

1) The source data for task A is converted into the corresponding source data

data for task B.

2) Problem B is being solved.

3) The result of solving problem B is converted into the correct solution to problem A.__ In this case we say that task A reducible to the problem B. If steps (1) and (3) above can be completed in time O(t(n)), where, as usual, n is the 25 “volume” of task A, then we say that A t (n)-reducible to B, and write it like this: A μt (n) B. Generally speaking, reducibility is not a symmetric relation; in the special case when A and B are mutually reducible, we call them equivalent. The following two self-evident statements characterize the power of the reduction method under the assumption that this reduction preserves the order of the “scope” of the problem.

"O" big And "o" small( and ) - mathematical notations for comparing the asymptotic behavior of functions. They are used in various branches of mathematics, but most actively in mathematical analysis, number theory and combinatorics, as well as in computer science and the theory of algorithms.

, « O small of " means "infinitesimal relative to " [, a negligible quantity when considered. The meaning of the term “O big” depends on its field of application, but always grows no faster than, “ O large from "(exact definitions are given below).

In particular:

Continued 7

the phrase “the complexity of the algorithm is” means that with an increase in the parameter characterizing the amount of input information of the algorithm, the operating time of the algorithm cannot be limited to a value that grows more slowly than n!;

the phrase “the function is “about” small of the function in the neighborhood of the point” means that as k approaches it decreases faster than (the ratio tends to zero).

Sum Rule: Let a finite set M be divided into two disjoint subsets M 1 and M 2 (in union giving the entire set M). Then the power |M| = |M 1 | + |M 2 |.

Product rule: Let object a in a certain set be selected in n ways, and after that (that is, after choosing object a) object b can be selected in m ways. Then the object ab can be selected in n*m ways.

Comment: Both rules allow inductive generalization. If a finite set M admits a partition into r pairwise disjoint subsets M 1 , M 2 ,…,M r , then the cardinality |M| = |M 1 |+|M 2 |+…+|M r |. If object A 1 can be selected in k 1 ways, then (after object A 1 is selected) object A 2 can be selected in k 2 ways, and so on and finally, object AR can be selected in k ways, then object A 1 A 2 ... And r can be chosen in k 1 k 2 …k r ways.

As noted in the previous section, the study of classical algorithms in many cases can be carried out using asymptotic methods of mathematical statistics, in particular using CLT and methods of inheritance of convergence. The separation of classical mathematical statistics from the needs of applied research is manifested, in particular, in the fact that widespread monographs lack the mathematical apparatus necessary, in particular, for the study of two-sample statistics. The point is that you have to go to the limit not by one parameter, but by two - the volumes of two samples. We had to develop an appropriate theory - the theory of inheritance of convergence, set out in our monograph.

However, the results of such a study will have to be applied to finite sample sizes. A whole bunch of problems arise associated with such a transition. Some of them were discussed in connection with the study of the properties of statistics constructed from samples from specific distributions.

However, when discussing the impact of deviations from initial assumptions on the properties of statistical procedures, additional problems arise. What deviations are considered typical? Should we focus on the most “harmful” deviations that most distort the properties of algorithms, or should we focus on “typical” deviations?

With the first approach, we get a guaranteed result, but the “price” of this result may be too high. As an example, let us point out the universal Berry-Esseen inequality for the error in the CLT. A.A. absolutely rightly emphasizes. Borovkov that “the speed of convergence in real problems, as a rule, turns out to be better.”

With the second approach, the question arises of which deviations are considered “typical”. You can try to answer this question by analyzing large amounts of real data. It is quite natural that the answers of different research groups will differ, as can be seen, for example, from the results given in the article.

One of the false ideas is to use only a specific parametric family when analyzing possible deviations - the Weibull-Gnedenko distributions, the three-parameter family of gamma distributions, etc. Back in 1927, Acad. USSR Academy of Sciences S.N. Bernstein discussed the methodological error of reducing all empirical distributions to the four-parameter Pearson family. However, parametric methods of statistics are still very popular, especially among applied scientists, and the blame for this misconception lies primarily with teachers of statistical methods (see below, as well as the article).

15. Selecting one of many criteria to test a specific hypothesis

In many cases, many methods have been developed to solve a specific practical problem, and a specialist in mathematical research methods is faced with the problem: which one should be offered to the applied scientist for analyzing specific data?

As an example, consider the problem of testing the homogeneity of two independent samples. As you know, to solve it, you can offer a lot of criteria: Student, Cramer-Welch, Lord, chi-square, Wilcoxon (Mann-Whitney), Van der Waerden, Savage, N.V. Smirnov, omega-square type (Lehman -Rozenblatt), G.V. Martynov, etc. Which one to choose?

The idea of ​​“voting” naturally comes to mind: to check against many criteria and then make a decision “by majority vote”. From the point of view of statistical theory, such a procedure simply leads to the construction of another criterion, which is a priori no better than the previous ones, but more difficult to study. On the other hand, if the solutions coincide according to all considered statistical criteria based on different principles, then, in accordance with the concept of stability, this increases confidence in the resulting general solution.

There is a widespread, especially among mathematicians, false and harmful opinion about the need to search for optimal methods, solutions, etc. The fact is that optimality usually disappears when you deviate from the initial premises. Thus, the arithmetic mean as an estimate of the mathematical expectation is optimal only when the initial distribution is normal, while it is always a valid estimate, as long as the mathematical expectation exists. On the other hand, for any arbitrarily chosen method of estimation or testing of hypotheses, it is usually possible to formulate the concept of optimality in such a way that the method in question becomes optimal - from this specially chosen point of view. Let's take, for example, the sample median as an estimate of the mathematical expectation. It is, of course, optimal, although in a different sense than the arithmetic mean (optimal for a normal distribution). Namely, for the Laplace distribution, the sample median is the maximum likelihood estimate, and therefore optimal (in the sense specified in the monograph).

The homogeneity criteria were analyzed in the monograph. There are several natural approaches to comparing criteria - based on asymptotic relative efficiency according to Bahadur, Hodges-Lehman, Pitman. And it turned out that each criterion is optimal given the corresponding alternative or suitable distribution on the set of alternatives. In this case, mathematical calculations usually use the shift alternative, which is relatively rare in the practice of analyzing real statistical data (in connection with the Wilcoxon test, this alternative was discussed and criticized by us in). The result is sad - the brilliant mathematical technique demonstrated in does not allow us to give recommendations for choosing a criterion for testing homogeneity when analyzing real data. In other words, from the point of view of the application worker’s work, i.e. analysis of specific data, the monograph is useless. The brilliant mastery of mathematics and the enormous diligence demonstrated by the author of this monograph, alas, brought nothing to practice.

Of course, every practically working statistician, in one way or another, solves for himself the problem of choosing a statistical criterion. Based on a number of methodological considerations, we chose the omega-square (Lehman-Rosenblatt) criterion, which is consistent with any alternative. However, there remains a feeling of dissatisfaction due to the lack of justification for this choice.

The asymptotic behavior (or asymptotics) of a function in the vicinity of a certain point a (finite or infinite) is understood as the nature of the change of the function as its argument x tends to this point. They usually try to represent this behavior using another, simpler and studied function, which in the vicinity of point a with sufficient accuracy describes the change in the function we are interested in or evaluates its behavior from one side or another. In this regard, the problem arises of comparing the nature of the change of two functions in the vicinity of point a, associated with the consideration of their quotient. Of particular interest are cases when, for x a, both functions are either infinitesimal (infinitesimal) or infinitely large (infinitely large). 10.1. Comparison of infinitesimal functions The main purpose of comparing b.m. functions consists in comparing the nature of their approach to zero at x a, or the speed of their approach to zero. Let b.m. for x a the functions a(i) and P(x) are non-zero in some punctured neighborhood (a) of the point a, and at the point a they are equal to zero or are not defined. Definition 10.1. The functions a(x) and 0(x) are called b.m. of the same order for a and write og(a:) = in O (/?(«)) (the symbol O is read “O big”), if at x a there is a non-zero finite limit of the ratio a(x)//?( i), i.e. Obviously, then, according to (7.24), Βi € R\(0), and the notation X^a0[a(x) is valid).The symbol O has the property of transitivity, i.e. if - in fact, taking into account Definition 10.1 and the property of the product of functions (see (7.23)) having finite (in this case non-zero) limits, we obtain ASYMPTOTIC BEHAVIOR OF FUNCTIONS. Comparison of infinitesimal functions. Definition 10.2. Function a(x) call a bm of a higher order of smallness compared to (3(x) (or relative to /3(x)) for x a and write) (the symbol o is read io small if the limit of the ratio a exists and is equal to zero. In this case also a function is said to be of a lower order of smallness compared to a(x) for x a, and the word smallness is usually omitted (as in the case of a higher order in Definition 10.2). This means that if lim (then the function /)(x) is, according to Definition 10.2, b.m. higher order compared to a(x) for x a and a(i) are b.m. lower order compared to /3(x) for x a, because in this case lijTi (fi(x)/ot(x)) . So we can write According to Theorem 7.3 about the connection between a function, its limit and b.m. functions from (10.3) it follows that ot) is a function, b.m. at. Hence a(x), i.e. values ​​|a(z)| for x close to a, the values ​​of \0(x)\ are much smaller. In other words, the function a(x) tends to zero faster than the function /?(x). Theorem 10.1. The product of any b.m. for x a the functions a(x) and P(x)) are different from zero in some punctured neighborhood of the point a, there are for x-¥a b.m. a function of higher order compared to each of the factors. Indeed, according to the definition of 10.2 b.m. higher order (taking into account Definition 7.10 b.m. functions), equalities mean the validity of the theorem. Equalities containing the symbols O and o are sometimes called asymptotic estimates. Definition 10.3. The functions ot(x) and /3(x) are called incomparable b.m. for x -¥ a, if there is neither a finite nor an infinite limit to their ratio, i.e. if $ lim a(x)/0(x) (p £ as well as $ lim 0(x)/a(x)). Example 10.1. A. The functions a(x) = x and /?(x) = sin2ar by definition 10.1 - b.m. of the same order at x 0, since taking into account (b. The function a(x) = 1 -coss, by definition 10.2, is b.m. of a higher order compared to 0(x) = x at x 0, since with taking into account c. The function a(zz) = \/x is a lower order in comparison with fl(x) = x for x 0, since g. The functions a(s) = = x according to Definition 10.3 are incomparable b.m. at x 0, since the limit ASYMPTOTIC BEHAVIOR OF FUNCTIONS Comparison of infinitesimal functions does not exist (neither finite nor infinite - see example 7.5).The power function x11 with exponent n 6 N, n > 1, is at x and b.m. of a higher order compared to xn~1) i.e. yapa = ao(a:n"*1), since lim (xL/xn"1) = If a more accurate comparative description of the behavior of b.m. is necessary. functions for x - and one of them is chosen as a kind of standard and called the main one. Of course, the choice of the main b.m. to a certain extent arbitrary (they try to choose a simpler one: x for x -*0; x-1 for x -41; 1/x for x ->oo, etc.). From the degrees 0k(x) the main b.m. functions /)(x) with different exponents k > 0 (for k ^ 0 0k(x) is not a b.m.) form a comparison tie for estimating a more complex b.m. functions a(z). Definition 10.4. The function a(z) is called b.m. kth order of smallness relative to (3(x) for x a, and the number k is of order of smallness if the functions a(z) and /Zk(x) are of the same order for x a) i.e. if the word “smallness” is usually omitted in this case. Note: 1) the order k of one b.m. function relative to another can be any positive number; 2) if the order of the function a(x) relative to /3(x) is equal to k, then the order of the function P(x) with respect to a(x) is equal to 1/k; 3) not always for the bm function a(x), even comparable to all powers of /? *(x), you can specify a specific order k. Example 10.2. A. Function cosx, according to definition 10.4, - b.m. order k = 2 relative to 0(x) = x for x 0, since taking into account b. Let's look at the functions. Let us show that for any Indeed, according to (7.32). Thus, b.m. for x -»+0 the function a1/1 is comparable to xk for any k > 0, but for this function it is not possible to indicate the order of smallness with respect to x. # Determine the order of one b.m. functions relative to another is not always easy. We can recommend the following procedure: 1) write the relation a(x)/0k(x) under the limit sign; 2) analyze the written relation and try to simplify it; 3) based on known results, make an assumption about the possible value of k) at which a non-zero finite limit will exist; 4) check the assumption by calculating the limit. Example 10.3. Let us determine the order of b.m. functions tgx - sin x relative to x for x -» 0, i.e. Let's find a number k > O such that we have ASYMPTOTIC BEHAVIOR OF THE FUNCTIONS. Comparison of infinitesimal functions. At this stage, knowing that for x 0, according to (7.35) and (7.36), (sinx)/x 1 and cosx -> 1, and taking into account (7.23) and (7.33), we can determine that condition (10.7) will be fulfilled at k = 3. Indeed, direct calculation of the limit at k = 3 gives the value A = 1/2: Note that for k > 3 we obtain an infinite limit, and at the limit it will be equal to zero.

1 Entropy and information distance

1.1 Basic definitions and notations.

1.2 Entropy of discrete distributions with limited mathematical expectation.

1.3 Logarithmic generalized metric on a set of discrete distributions.

1.4 Compactness of functions with a countable set of arguments

1.5 Continuity of information distance Kullback - Leibler - Sanov

1.6 Conclusions.

2 Probabilities of large deviations

2.1 Probabilities of large deviations of functions from the number of cells with a given filling.

2.1.1 Local limit theorem.

2.1.2 Integral limit theorem.

2.1.3 Information distance and probabilities of large deviations of separable statistics

2.2 Probabilities of large deviations of separable statistics that do not satisfy the Cramer condition.

2.3 Conclusions.

3 Asymptotic properties of goodness-of-fit criteria

3.1 Consent criteria for selection without return scheme

3.2 Asymptotic relative efficiency of goodness-of-fit criteria.

3.3 Criteria based on the number of cells in general layouts.

3.4 Conclusions.

Recommended list of dissertations

  • Asymptotic efficiency of goodness-of-fit tests based on characterization properties of distributions 2011, Candidate of Physical and Mathematical Sciences Volkova, Ksenia Yurievna

  • Large deviations and limit theorems for some random walk functionals 2011, candidate of physical and mathematical sciences Shklyaev, Alexander Viktorovich

  • Limit theorems and large deviations for random walk increments 2004, candidate of physical and mathematical sciences Kozlov, Andrey Mikhailovich

  • On the rate of convergence of statistics of goodness-of-fit tests with power measures of divergence to the chi-square distribution 2010, candidate of physical and mathematical sciences Zubov, Vasily Nikolaevich

  • Probabilities of large deviations of asymptotically homogeneous ergodic Markov chains in space 2004, Doctor of Physical and Mathematical Sciences Korshunov, Dmitry Alekseevich

Introduction of the dissertation (part of the abstract) on the topic “Asymptotic properties of goodness-of-fit criteria for testing hypotheses in a selection scheme without returning, based on filling cells in a generalized placement scheme”

Object of research and relevance of the topic. In the theory of statistical analysis of discrete sequences, a special place is occupied by goodness-of-fit tests for testing a possibly complex null hypothesis, which is that for a random sequence such that

Xi e hi,i = 1, ,n, where hi = (0,1,. ,M), for any i = 1,.,n, and for any k £ 1m the probability of the event

Xi = k) does not depend on r. This means that the sequence is in some sense stationary.

In a number of applied problems, the sequence (Xr-)™ = 1 is considered to be a sequence of colors of balls when choosing without returning until exhaustion from an urn containing n - 1 > 0 balls of color k, k € 1m - We will denote the set of such selections O(n0 - 1, .,pm - 1). Let there be a total of n - 1 balls in the urn, m k=0

Let us denote by r(k) (fc) Jk) rw - Г! , . . . , sequence of numbers of balls of color A; in the sample. Consider the sequence where k)

Kk-p-GPk1.

The sequence h^ is defined using the distances between the locations of adjacent balls of color k in such a way that

Pk Kf = p. 1>=1

The set of sequences h(fc) for all k £ 1m uniquely determines the sequence. Sequences hk for different k are dependent on each other. In particular, any one of them is uniquely determined by all the others. If the cardinality of the set 1m is 2, then the sequence of colors of balls is uniquely determined by the sequence of distances between the places of neighboring balls of the same fixed color. Let there be N - 1 balls of color 0 in an urn containing n - 1 balls of two different colors. We can establish a one-to-one correspondence between the set ffl(N- l,n - N) and the set 9\n,N vectors h(n, N ) = (hi,., hjf) with positive integer components such that K = P. (0.1)

The set 9П)дг corresponds to the set of all different partitions of a positive integer n into N ordered terms.

Having specified a certain probability distribution on the set of vectors £Hn,dr, we obtain the corresponding probability distribution on the set Wl(N - 1,n - N). A set is a subset of a set of vectors with non-negative integer components satisfying (0.1). Distributions of the form will be considered as probability distributions on a set of vectors in the dissertation work

P(%,N) = (n,.,rN)) = P(£„ = ru,v = l,.,N\jr^ = n), (0.2) where. ,£dr - independent non-negative integer random variables.

Distributions of the form (0.2) in /24/ are called generalized schemes for placing n particles in N cells. In particular, if the random variables £b. ,£лг in (0.2) are distributed according to Poisson’s laws with parameters Ai,., Лдг respectively, then the vector h(n,N) has a polynomial distribution with the probabilities of outcomes

Ri = . , L" ,V = \,.,N.

L\ + . . . + AN

If the random variables £ь >&v in (0-2) are identically distributed according to the geometric law where p is any in the interval 0< р < 1, то, как отмечено в /25/,/26/, получающаяся обобщенная схема размещения соответствует равномерному распределению на множестве В силу взаимнооднозначного соответствия между множеством dft(N - 1 ,п - N) и множеством tRn,N получаем равномерное распределение на множестве выборов без возвращения. При этом, вектору расстояний между местами шаров одного цвета взаимно однозначно соответствует вектор частот в обобщенной схеме размещения, и, соответственно, числу расстояний длины г - число ячеек, содержащих ровно г частиц. Для проверки по единственной последовательности гипотезы о том, что она получена как результат выбора без возвращения, и каждая такая выборка имеет одну и ту же вероятность можно проверить гипотезу о том, что вектор расстояний между местами шаров цвета 0 распределен как вектор частот в соответствующей обобщенной схеме размещения п частиц по N ячейкам.

As noted in /14/, /38/, a special place in testing hypotheses about the distribution of frequency vectors h(n, N) = (hi,., /gdr) in generalized schemes for placing n particles in N cells is occupied by criteria based on based on statistics of the form 1 m(N -l,n-N)\ N

LN(h(n,N))=Zfv(hv)

Фн = Ф(-Т7, flQ Hi II-

0.4) where fu, v = 1,2,. and φ - some real-valued functions, N

Mr = E = r), r = 0.1,. 1/=1

The quantities in /27/ were called the number of cells containing exactly g particles.

Statistics of the form (0.3) in /30/ are called separable (additively separable) statistics. If the functions /„ in (0.3) do not depend on u, then such statistics were called in /31/ symmetric separable statistics.

For any r the statistic /xr is a symmetric separable statistic. From equality

E DM = E DFg (0.5) it follows that the class of symmetric separable statistics of hv coincides with the class of linear functions of fir. Moreover, the class of functions of the form (0.4) is wider than the class of symmetric separable statistics.

But = (#o(n, N)) is a sequence of simple null hypotheses that the distribution of the vector h(n, N) is (0.2), where the random variables are,. in (0.2) are identically distributed and k) = pk,k = 0,1,2,., the parameters n, N change in the central region.

Consider some P £ (0,1) and a sequence of, generally speaking, complex alternatives

H = (H(n, N)) such that exists - the maximum number for which, for any simple hypothesis H\ € H(n, N), the inequality holds

РШ > an,N(P)) > Р

We will reject the hypothesis Hq(ti,N) if fm > asm((3). If there is a limit

Шп ~1пР(0н > an,N(P))=u(p,Н), where the probability for each N is calculated under the hypothesis Нк(п, N), then the value ^(/З, Н) is named in /38/ index of the criterion φ at the point (j3, H). The last limit may, generally speaking, not exist. Therefore, in the dissertation work, in addition to the criterion index, the value is considered

Ish (~1pP(fm > al(/?)))

JV->oo N-ooo mean, respectively, the lower and upper limits of the sequence (odg) for N -> oo,

If a criterion index exists, then the criterion's subscript coincides with it. The lower index of the criterion always exists. The higher the value of the criterion index (subscript of the criterion), the better the statistical criterion in this sense. In /38/, the problem of constructing goodness-of-fit criteria for generalized layouts with the highest value of the criterion index in the class of criteria that reject the hypothesis Ho(n,N) at /MO Ml Mt MS iV" iV""""" ~yv" " was solved ^ "where m > 0 is some fixed number, the sequence of constant edg is selected based on the given value of the power of the criterion for the sequence of alternatives, ft is a real function of m + 1 arguments.

The criterion indices are determined by the probabilities of large deviations. As was shown in /38/, the rough (up to logarithmic equivalence) asymptotics of the probabilities of large deviations of separable statistics when the Cramer condition is satisfied for the random variable /(ξ) is determined by the corresponding Kull-Bak-Leibler-Sanov information distance (the random variable rj satisfies the condition Cramer, if for some R > 0 the generating function of the moments Metr] is finite in the interval \t\< Н /28/).

The question of the probabilities of large deviations of statistics from an unlimited number of fir, as well as arbitrary separable statistics that do not satisfy the Cramer condition, remained open. This did not allow us to finally solve the problem of constructing criteria for testing hypotheses in generalized placement schemes with the highest rate of tending to zero of the probability of an error of the first kind with non-approaching alternatives in the class of criteria based on statistics of the form (0.4). The relevance of the dissertation research is determined by the need to complete the solution to the specified problem.

The purpose of the dissertation work is to construct goodness-of-fit criteria with the highest value of the criterion index (subscript of the criterion) for testing hypotheses in a selection scheme without return in the class of criteria that reject the hypothesis U(n, N) for $.<>,■ ■)><*. (0-7) где ф - функция от счетного количества аргументов, и параметры п, N изменяются в центральной области.

In accordance with the purpose of the study, the following tasks were set:

Investigate the properties of entropy and information distance Kull-Bak - Leibler - Sanov for discrete distributions with a countable number of outcomes;

Investigate the probabilities of large deviations of statistics of the form (0.4);

Investigate the probabilities of large deviations of symmetric separable statistics (0.3) that do not satisfy the Cramer condition;

Find a statistic such that the goodness-of-fit criterion constructed on its basis for testing hypotheses in generalized placement schemes has the highest index value in the class of criteria of the form (0.7).

Scientific novelty:

Scientific and practical value. The work solves a number of questions about the behavior of the probabilities of large deviations in generalized placement schemes. The results obtained can be used in the educational process in the specialties of mathematical statistics and information theory, in the study of statistical procedures for the analysis of discrete sequences, and were used in /3/, /21/ to justify the security of one class of information systems. Provisions for defense:

Reducing the problem of testing for a single sequence of colors of balls the hypothesis that this sequence is obtained as a result of a choice without returning until the balls are exhausted from an urn containing balls of two colors, and each such choice has the same probability, to the construction of goodness-of-fit criteria for testing hypotheses in the corresponding generalized layout;

Continuity of the entropy and Kullback-Leibler-Sanov information distance functions on an infinite-dimensional simplex with the introduced logarithmic generalized metric;

A theorem on the rough (up to logarithmic equivalence) asymptotics of the probabilities of large deviations of symmetric separable statistics that do not satisfy the Cramer condition in the generalized placement scheme in the semi-exponential case;

Theorem on rough (up to logarithmic equivalence) asymptotics of the probabilities of large deviations for statistics of the form (0.4);

Construction of a goodness-of-fit criterion for testing hypotheses in generalized layouts with the highest index value in the class of criteria of the form (0.7).

Approbation of work. The results were presented at seminars of the Department of Discrete Mathematics of the Mathematical Institute named after. V. A. Steklov RAS, information security department of ITM&VT named after. S. A. Lebedev RAS and at:

Fifth All-Russian Symposium on Applied and Industrial Mathematics. Spring session, Kislovodsk, May 2 - 8, 2004;

Sixth International Petrozavodsk Conference "Probabilistic methods in discrete mathematics" June 10 - 16, 2004;

Second International Conference "Information Systems and Technologies (IST" 2004)", Minsk, November 8-10, 2004;

International conference "Modern Problems and new Trends in Probability Theory", Chernivtsi, Ukraine, June 19 - 26, 2005.

The main results of the work were used in the research work "Apology", carried out by ITMiVT RAS. S. A. Lebedev in the interests of the Federal Service for Technical and Export Control of the Russian Federation, and were included in the report on the implementation of the research stage /21/. Some results of the dissertation were included in the research report "Development of mathematical problems of cryptography" of the Academy of Cryptography of the Russian Federation for 2004 /22/.

The author expresses deep gratitude to the scientific supervisor, Doctor of Physical and Mathematical Sciences A. F. Ronzhin and the scientific consultant, Doctor of Physical and Mathematical Sciences, Senior Researcher A. V. Knyazev. The author expresses gratitude to Doctor of Physical and Mathematical Sciences, Professor A. M. Zubkov and Candidate of Physical and Mathematical Sciences Mathematical Sciences I. A. Kruglov for his attention to the work and a number of valuable comments.

Structure and content of the work.

The first chapter examines the properties of entropy and information distance for distributions on the set of non-negative integers.

In the first paragraph of the first chapter, notations are introduced and the necessary definitions are given. In particular, the following notation is used: x = (xq,x\, . ) - an infinite-dimensional vector with a countable number of components;

H(x) - -Ex^oXvlnx,-, truncm(x) = (x0,x1,.,xm,0,0,.)] f2* = (x, xi > 0, zy = 0.1,. , Oh"< 1}; Q = {х, х, >0,u = 0.1,., o xv = 1); = (x G O, ££L0 = 7);

Ml = o Ue>1|5 € o< Ml - 7МГ1 < 00}. Понятно, что множество £1 соответствует семейству вероятностных распределений на множестве неотрицательных целых чисел, П7 - семейству вероятностных распределений на множестве неотрицательных целых чисел с математическим ожиданием 7.

If y 6E P, then for e > 0 the set will be denoted by Oe(y)

Oe(y) - (x^< уие£ для всех v = 0,1,.}.

In the second paragraph of the first chapter, a theorem on the boundedness of the entropy of discrete distributions with limited mathematical expectation is proved.

Theorem 1. On the boundedness of the entropy of discrete distributions with bounded mathematical expectation.

For any 6 P7

H(x)

If x € fly corresponds to a geometric distribution with a mathematical definition of 7, that is, 7 x„ = (1- р)р\ v = 0.1,., where р = --,

1 + 7 then the equality holds

H(x) = F(<7).

The statement of the theorem can be viewed as the result of a formal application of the Lagrange method of conditional multipliers in the case of an infinite number of variables. The theorem that the only distribution on the set (k, k + 1, k + 2,.) with a given mathematical expectation and maximum entropy is a geometric distribution with a given mathematical expectation is given (without proof) in /47/. The author, however, has given strict proof.

The third paragraph of the first chapter gives the definition of a generalized metric - a metric that allows infinite values.

For x,y € Q the function p(x,y) is defined as the minimal e > O with the property yie~£<хи< уиее для всех и = 0,1,. Если такого е не существует, то полагается, что р(х,у) = оо.

It is proved that the function p(x,y) is a generalized metric on the family of distributions on the set of non-negative integers, as well as on the entire set Cl*. Instead of e in the definition of the metric p(x,y), you can use any other positive number other than 1. The resulting metrics will differ by a multiplicative constant. Let us denote by J(x, y) the information distance

00 £ J(x,y) = E In-.

Here and below it is assumed that 0 In 0 = 0.0 In jj = 0. The information distance is defined for such x, y that x„ = 0 for all and such that y = 0. If this condition is not met, then we will assume J(x,ij) = oo. Let L SP. Then we will denote

J (A Y) = |nf J(x,y).

The fourth paragraph of the first chapter gives the definition of compactness of functions defined on the set Q*. The compactness of a function with a countable number of arguments means that with any degree of accuracy the value of the function can be approximated by the values ​​of this function at points where only a finite number of arguments are non-zero. The compactness of the entropy and information distance functions is proved.

1. For any 0< 7 < оо функция Н(х) компактна на

2. If for some 0< 70 < оо

R e then for any 0<7<оо,г>0 the function x) = J(x,p) is compact on the set

The fifth paragraph of the first chapter discusses the properties of the information distance defined on an infinite-dimensional space. Compared to the finite-dimensional case, the situation with the continuity of the information distance function changes qualitatively. It is shown that the information distance function is not continuous on the set in any of the metrics

Pl&V) = E\Xi~Y»\, u=0

E (xv - Ui)2 v=Q

Рз(х,у) = 8Up\xu-yv\. v

The validity of the following inequalities is proved for the entropy functions H(x) and information distance J(x,p):

1. For any x, x" € fi

N(x) - N(x")\< - 1){Н{х) + Н{х")).

2. If for some x,p e P there exists e > 0 such that x 6 0 £(p), then for any x" £ Q J(x,p) - J(x",p)|< (е"М - 1){Н{х) + Н{х") + ееН(р)).

From these inequalities, taking into account Theorem 1, it follows that the entropy and information distance functions are uniformly continuous on the corresponding subsets of Q in the metric p(x,y)t, namely,

1. For any 7 such that 0< 7 < оо, функция Н(х) равномерно непрерывна на Г2 в метрике р(ж,у);

2. If for some 70, 0< 70 < оо

TO for any 0<7<оои£>0 function

L p(x) = J(x,p) is uniformly continuous on the set Π Oe(p) in the metric p(x,y).

A definition of non-extremal function is given. The non-extremal condition means that the function does not have local extrema, or the function takes the same values ​​at local minima (local maxima). The non-extrema condition weakens the requirement of the absence of local extrema. For example, the function sin x on the set of real numbers has local extrema, but satisfies the non-extremal condition.

Let for some 7 > 0, the region A is given by the condition

A = (x € VLv4>(x) > a), (0.9) where φ(x) is a real-valued function, a is some real constant, inf φ(x)< а < inf ф(х).

The question was studied under what conditions on the function φ when changing the parameters n,N in the central region, ^ -; 7, for all sufficiently large values ​​there are non-negative integers ko, k\,., kn such that k0 + ki + . + kn = N, k\ + 2k2. + control panel - N and

F(ko k\ kp

-£,0,0 ,.)>a.

It is proved that for this it is enough to require that the function φ be non-extremal, compact and continuous in the metric p(x,y), and also that for at least one point x satisfying (0.9), for some e > 0 there exists a finite moment degrees 1 + e and x„ > 0 for any v = 0.1.

In the second chapter, we study the rough (up to logarithmic equivalence) asymptotics of the probability of large deviations of functions from D = (^0) ■ ) Ts "n, 0, .) - the number of cells with a given filling in the central region of change of parameters N, n. Rough The asymptotics of the probabilities of large deviations are sufficient to study the indices of the agreement criteria.

Let the random variables ^ in (0.2) be identically distributed and

P(z) - generating function of a random variable - converges in a circle of radius 1< R < оо. Следуя /38/, для 0 < z < R обозначим через £(z) случайную величину такую, что

Ml+£ = £ i1+ex„< 00.

0.10) k] = Pk, k = 0.1,.

Let's denote

If there is a solution to the equation m Z(z) = ъ then it is unique /38/. Throughout what follows we will assume that pk > O,A; = 0.1,.

The first paragraph of the first paragraph of the second chapter contains the asymptotics of logarithms of probabilities of the form

1pP(/x0 = ko,.,tsp = kp).

The following theorem is proved.

Theorem 2. Rough local theorem on the probabilities of large deviations. Let n, N -» oo so that jj ->7.0<7 < оо, существует z7 - корень уравнения M£(z) = 7, с. в. £(г7) имеет положительную дисперсию. Тогда для любого k G Cl(n,N)

1nP(D = k) = JftpK)) + O(^lniV).

The statement of the theorem follows directly from the formula for the joint distribution fii,. fin in /26/ and the following estimate: if non-negative integer values ​​, Нп satisfy the condition

Hi + 2d2 + + PNn = n, then the number of non-zero values ​​among them is 0(l/n). This is a rough estimate and does not claim to be new. The number of non-zero CGs in generalized layout schemes does not exceed the value of the maximum filling of cells, which in the central region, with a probability tending to 1, does not exceed the value O(lnn) /25/, /27/. Nevertheless, the resulting estimate 0(y/n) is satisfied with probability 1 and is sufficient to obtain rough asymptotics.

In the second paragraph of the first paragraph of the second chapter, the value of the limit is found where adg is a sequence of real numbers converging to some a G R, φ(x) is a real-valued function. The following theorem is proved.

Theorem 3. Rough integral theorem on the probabilities of large deviations. Let the conditions of Theorem 2 be satisfied, for some r > 0, C > 0 the real function φ(x) is compact, uniformly continuous in the metric p on the set

A = 0r+<;(p(z7)) П Ц7+с] и удовлетворяет условию неэкстремальности на множестве fly. Если для некоторой константы а такой, что inf ф(х) < а < sup ф(х). xeily существует вектор ра € fi7 П 0r(p(z7)); такой, что

Ф(ra) > a and j(( (x) >a,xe P7),p(2;7)) = 7(pa,p(*y)) mo for any sequence a^ converging to a,

Jim -vbPW%%,.)>aN) = J(pa,p(2h)). (0.11)

With additional restrictions on the function φ(x), the information distance J(pa,p(z7)) in (2.3) can be calculated more specifically. Namely, the following theorem is true. Theorem 4. On information distance. Let for some 0< 7 < оо для некоторвх г >0, C > 0, the real function φ(x) and its first-order partial derivatives are compact and uniformly continuous in the generalized metric p(x, y) on the set p G

A = Og(p) P %+c] there exist T > 0, R > 0 such that for all \t\<Т,0 < z < R,x е А

E^exp^-f(x))< оо,

0(a;)exp(t-< со, i/=o oxv 0X1/ для некоторого е >O oo Q pvv1+£zu exp(t-ph(x))< оо, (0.13) и существует единственный вектор x(z,t), удовлетворяющий системе уравнений xv(z, t) = pvzv ехр {Ь-ф(х(г, t))}, v = 0,1,. функция ф(х) удовлетворяет на множестве А условию неэкстремальности, а - некоторая константа, ф(р) < а < sup ф(:x)(z,t),

0

00 vpv(za,ta) = 7, 1/=0

0(p(*aL)) = a, where

Then p(za, ta) € and

J((x e А,ф(х) = а),р) = J(p(za, ta),p)

00 d 00 d = l\nza + taYl ir- (x(za,ta)) - In E^r/exp(ta-z- (p(zatta))). j/=0 C^i/ t^=0

If the function f(x) is a linear function, and the function f(x) is defined using equality (0.5), then condition (0.12) turns into Cramer’s condition for the random variable f(£(z)). Condition (0.13) is a form of condition (0.10) and is used to prove the presence in domains of the form (x G f(x) > a) of at least one point from 0(n, N) for all sufficiently large n, N.

Let ^)(n, N) = (hi,., /gdr) be the frequency vector in the generalized placement scheme (0.2). As a corollary of Theorems 3 and 4, the following theorem is formulated.

Theorem 5. Rough integral theorem on the probabilities of large deviations of symmetric separable statistics in a generalized placement scheme.

Let n, N -» oo so that ^ - 7, 0< 7 < оо, существует z1 - корень уравнения М£(,г) = 7, с. в. £(27) имеет положительную дисперсию и максимальный шаг распределения 1, а - некоторая константа, f(x) - действительная функция, а < Mf(^(z1)), существуют Т >0,R > 0 such that for all |t|<Т,0 < z < R,

00 oo, u=0 there are such ta\

E vVi/("01 ta) = b where f(v)p"(za,ta) = a, 1/=0

Then for any sequence adg converging to a,

Jim - - InF»(- £ f(h„) > aN) = J(p(za,ta),p(z7))

00 7 In 2a + taa - In £ p^/e^M i/=0

This theorem was first proven by A.F. Ronzhin in /38/ using the saddle point method.

In the second paragraph of the second chapter, the probabilities of large deviations of separable statistics in generalized cxj^iax placements are studied in the case of failure to satisfy the Cramer condition for the random variable f(€(z)). Cramer's condition for the random variable f(£(z)) is not satisfied, in particular, if £(z) is a Poisson random variable and f(x) is x2. Note that Cramer's condition for the separable statistics themselves in generalized allocation schemes is always satisfied, since for any fixed n, N the number of possible outcomes in these schemes is finite.

As noted in /2/, if the Cramer condition is not satisfied, then to find the asymptotics of the probabilities of large deviations of sums of identically distributed random variables, additional ones are required. f

V and. . I conditions of correct change on the distribution of the term. In progress j

O, 5 the case corresponding to the fulfillment of condition (3) in /2/ is considered, that is, the seven-exponential case. Let P(£i = k) > 0 for all k = 0,1. and the function p(k) = -\nP(^ = k), can be extended to a function of continuous argument - a regularly varying function of order p, 0< р < со /45/, то есть положительной функции такой, что при t ->oo p(tx) xr.

Let the function f(x) for sufficiently large values ​​of the argument be a positive strictly increasing, regularly varying function of order. Let us define the function cp(x) by setting for sufficiently large x φ) = p(Γ\x)).

On the rest of the numerical axis, ip(x) can be specified in an arbitrary limited measurable way.

Then s. V. /(£i) has moments of any order and does not satisfy the Cramer condition, p(x) = o(x) as x -> ω, and the following Theorem 6 is valid. Let the function ip(x) be monotonically nondecreasing for sufficiently large x, fg^ction does not increase monotonically, n, N -> oo so that jj - A, 0< Л < оо; гд - единственный корень уравнения M^i(^) = Л, тогда для любого с >b(z\), where b(z) = M/(£i(.z)), there is a limit CN) = -(c - b(z\))4.

It follows from Theorem b that if Cramer’s condition is not met, the limit lim 1 InP(LN(h(n, N)) > cN) = 0, ^ ^ iv-too iv which proves the validity of the hypothesis stated in /39/. Thus, the value of the index of the agreement criterion in generalized placement schemes and failure to fulfill Cramer’s condition is always equal to zero. In this case, in the class of criteria, when Cramer’s condition is satisfied, criteria with a non-zero index value are constructed. From this we can conclude that using criteria whose statistics do not satisfy the Cramer condition, for example, the chi-square test in a polynomial scheme, to construct goodness-of-fit tests for testing hypotheses for non-converging alternatives in the indicated sense is asymptotically ineffective. A similar conclusion was made in /54/ based on the results of a comparison of chi-square and maximum likelihood ratio statistics in a polynomial scheme.

The third chapter solves the problem of constructing goodness-of-fit criteria with the largest value of the criterion index (the largest value of the subscript of the criterion) to test hypotheses in generalized placement schemes. Based on the results of the first and second chapters on the properties of the entropy functions, information distance and probabilities of large deviations, in the third chapter a function of the form (0.4) is found such that the goodness-of-fit criterion constructed on its basis has the largest value of the exact subscript in the class of criteria under consideration. The following theorem is proved.

Theorem 7. On the existence of an index. Let the conditions of Theorem 3 be satisfied: 0< /3 < 1, Н = Hp(i),Hp(2>,. is a sequence of alternative distributions, а,ф((3, N) is the maximum number for which, under the hypothesis Нр<ло выполнено неравенство существует предел lim^-оо о>φ(P, N) - a. Then at the point (/3, H) there is a criterion index φ

Zff, H) = 3((φ(x) > a, x £ ^.PW).

Shy)<ШН)>where w/fo fh h v^l ^

The Conclusion sets out the results obtained in their relationship with the general goal and specific tasks posed in the dissertation, formulates conclusions based on the results of the dissertation research, indicates the scientific novelty, theoretical and practical value of the work, as well as specific scientific tasks identified by the author and the solution of which seems relevant .

Brief review of the literature on the research topic. The thesis examines the problem of constructing agreement criteria in generalized placement schemes with the highest value of the criterion index in the class of functions of the form (0.4) with non-converging alternatives.

Generalized layout schemes were introduced by V.F. Kolchin in /24/. The quantities in the polynomial scheme were called the number of cells with g pellets and were studied in detail in the monograph by V. F. Kolchin, B. A. Sevastyanov, V. P. Chistyakov /27/. The values ​​of fir in generalized layouts were studied by V.F. Kolchin in /25/, /26/. Statistics of the form (0.3) were first considered by Yu. I. Medvedev in /30/ and were called separable (additively separable) statistics. If the functions /„ in (0.3) do not depend on u, such statistics were called in /31/ symmetric separable statistics. The asymptotic behavior of the moments of separable statistics in generalized allocation schemes was obtained by G. I. Ivchenko in /9/. Limit theorems for a generalized layout scheme were also considered in /23/. Reviews of the results of limit theorems and agreement criteria in discrete probabilistic schemes of type (0.2) were given by V. A. Ivanov, G. I. Ivchenko, Yu. I. Medvedev in /8/ and G. I. Ivchenko, Yu. I. Medvedev , A.F. Ronzhin in /14/. Agreement criteria for generalized layouts were considered by A.F. Ronzhin in /38/.

A comparison of the properties of statistical criteria in these works was carried out from the point of view of relative asymptotic efficiency. The case of converging (contigual) hypotheses was considered - efficiency in the sense of Pitman and non-converging hypotheses - efficiency in the sense of Bahadur, Hodges - Lehman and Chernov. The relationship between different types of relative performance statistical tests is discussed, for example, in /49/. As follows from the results of 10. I. Medvedev in /31/ on the distribution of separable statistics in a polynomial scheme, the greatest asymptotic power under convergent hypotheses in the class of separable statistics on the frequencies of outcomes in a polynomial scheme has a criterion based on the chi-square statistic. This result was generalized by A.F. Ronzhin for circuits of type (0.2) in /38/. I. I. Viktorova and V. P. Chistyakov in /4/ constructed an optimal criterion for a polynomial scheme in the class of linear functions of /xr. A.F. Ronzhin in /38/ constructed a criterion that, given a sequence of alternatives that are not close to the null hypothesis, minimizes the logarithmic rate at which the probability of an error of the first kind tends to zero, in the class of statistics of the form (0.6). A comparison of the relative performance of the chi-square and maximum likelihood ratio statistics under approaching and non-approximating hypotheses was carried out in /54/.

The thesis considered the case of non-converging hypotheses. Studying the relative statistical effectiveness of criteria under non-converging hypotheses requires studying the probabilities of extremely large deviations - of the order of 0(i/n). For the first time, such a problem for a polynomial distribution with a fixed number of outcomes was solved by I. N. Sanov in /40/. The asymptotic optimality of goodness-of-fit tests for testing simple and complex hypotheses for a multinomial distribution in the case of a finite number of outcomes with non-converging alternatives was considered in /48/. The properties of information distance were previously considered by Kullback, Leibler /29/,/53/ and I. II. Sanov /40/, as well as Hoeffding /48/. In these works, the continuity of information distance was considered on finite-dimensional spaces in the Euclidean metric. A number of authors considered a sequence of spaces with increasing dimension, for example, in the work of Yu. V. Prokhorov /37/ or in the work of V. I. Bogachev, A. V. Kolesnikov /1/. Rough (up to logarithmic equivalence) theorems on the probabilities of large deviations of separable statistics in generalized allocation schemes under the Cramer condition were obtained by A.F. Ronzhin in /38/. A. N. Timashev in /42/,/43/ obtained exact (up to equivalence) multidimensional integral and local limit theorems on the probabilities of large deviations of the vector fir^n, N),., iir.(n,N), where s, r\,., rs - fixed integers,

ABOUT<П < .

The study of the probabilities of large deviations when the Cramer condition is not met for the case of independent random variables was carried out in the works of A. V. Nagaev /35/. The method of conjugate distributions is described by Feller /45/.

Statistical problems of testing hypotheses and estimating parameters in a selection scheme without return in a slightly different formulation were considered by G. I. Ivchenko, V. V. Levin, E. E. Timonina /10/, /15/, where estimation problems were solved for a finite population, when the number of its elements is an unknown quantity, the asymptotic normality of multivariate S - statistics from s independent samples in a selection scheme without reversion was proved. The problem of studying random variables associated with repetitions in sequences of independent trials was studied by A. M. Zubkov, V. G. Mikhailov, A. M. Shoitov in /6/, /7/, /32/, /33/, /34/ . An analysis of the main statistical problems of estimation and testing of hypotheses within the framework of the general Markov-Pólya model was carried out by G. I. Ivchenko, Yu. I. Medvedev in /13/, a probabilistic analysis of which was given in /11/. A method for specifying non-uniform probability measures on a set of combinatorial objects, which is not reducible to the generalized placement scheme (0.2), was described in G. I. Ivchenko, Yu. I. Medvedev /12/. A number of problems in probability theory, in which the answer can be obtained as a result of calculations using recurrent formulas, were indicated by A. M. Zubkov in /5/.

Inequalities for the entropy of discrete distributions were obtained in /50/ (cited from the abstract of A. M. Zubkov in RZhMat). If (pn)^Lo is the probability distribution, oo

Рп = Е Рк, к=тг

A = supp^Pn+i< оо (0.14) п>0 and

F(x) = (x + 1) In (x + 1) - x In x, then for the entropy I of this probability distribution

00 i = - 5Z Рк^Рк к=0 the inequalities are valid -L 1 00 00 Р

I + (In -f-) £ (Arn - Rn+1)< F(А) < Я + £ (АРп - P„+i)(ln

L D p=P -t p.4-1 and inequalities turn into equalities if

Рп= (xf1)n+vn>Q. (0.15)

Note that the extremal distribution (0.15) is a geometric distribution with mathematical expectation A, and the function F(A) of the parameter (0.14) coincides with the function of the mathematical expectation in Theorem 1.

Similar dissertations in the specialty "Probability Theory and Mathematical Statistics", 01/01/05 code VAK

  • Asymptotic efficiency of scale-parameter-free exponential tests 2005, Candidate of Physical and Mathematical Sciences Chirina, Anna Vladimirovna

  • Some problems in probability theory and mathematical statistics related to the Laplace distribution 2010, Candidate of Physical and Mathematical Sciences Lyamin, Oleg Olegovich

  • Limit theorems in problems of dense embedding and dense series in discrete random sequences 2009, Candidate of Physical and Mathematical Sciences Mezhennaya, Natalya Mikhailovna

  • Limit theorems for the number of intersections of a strip by random walk trajectories 2006, candidate of physical and mathematical sciences Orlova, Nina Gennadievna

  • Optimization of the structure of moment estimates of the accuracy of normal approximation for distributions of sums of independent random variables 2013, Doctor of Physical and Mathematical Sciences Shevtsova, Irina Gennadievna

Conclusion of the dissertation on the topic “Probability Theory and Mathematical Statistics”, Kolodzei, Alexander Vladimirovich

3.4. conclusions

In this chapter, based on the results of previous chapters, it was possible to construct a goodness-of-fit criterion for testing hypotheses in generalized placement schemes with the highest logarithmic rate of tending to zero probabilities of errors of the first kind, with a fixed probability of errors of the first kind and non-converging alternatives. ~"

Conclusion

The purpose of the dissertation work was to construct goodness-of-fit criteria for testing hypotheses in a selection scheme without returning from an urn containing balls of 2 colors. The author decided to study statistics based on the frequencies of distances between balls of the same color. In this formulation, the problem was reduced to the task of testing hypotheses in a suitable generalized layout.

The dissertation work included

The properties of entropy and information distance of discrete distributions with an unlimited number of outcomes and limited mathematical expectation have been studied;

A rough (up to logarithmic equivalence) asymptotic behavior of the probabilities of large deviations of a wide class of statistics in a generalized placement scheme is obtained;

Based on the results obtained, a criterion function with the highest logarithmic rate of tending to zero of the probability of an error of the first type with a fixed probability of an error of the second type and non-approaching alternatives was constructed;

It has been proven that statistics that do not satisfy the Cramer condition have a lower rate of convergence to zero of the probabilities of large deviations compared to statistics that satisfy this condition.

The scientific novelty of the work is as follows.

The concept of a generalized metric is given - a function that admits infinite values ​​and satisfies the axioms of identity, symmetry and triangle inequality. A generalized metric is found and sets are indicated on which the entropy and information distance functions, defined on a family of discrete distributions with a countable number of outcomes, are continuous in this metric;

In a generalized placement scheme, a rough (up to logarithmic equivalence) asymptotics was found for the probabilities of large deviations of statistics of the form (0.4) satisfying the corresponding form of Cramer’s condition;

In a generalized placement scheme, a rough (up to logarithmic equivalence) asymptotics is found for the probabilities of large deviations of symmetric separable statistics that do not satisfy the Cramer condition;

In the class of criteria of the form (0.7), a criterion with the highest value of the criterion index is constructed.

The work solves a number of questions about the behavior of the probabilities of large deviations in generalized placement schemes. The results obtained can be used in the educational process in the specialties of mathematical statistics and information theory, in the study of statistical procedures for the analysis of discrete sequences, and were used in /3/, /21/ to justify the security of one class of information systems.

However, a number of questions remain open. The author limited himself to considering the central zone of changes in parameters n, N of generalized schemes for placing n particles in N cells. If the carrier of the distribution of random variables generating the generalized arrangement scheme (0.2) is not a set of the form r, r + 1, r + 2,., then when proving the continuity of the information distance function and studying the probabilities of large deviations, it is necessary to take into account the arithmetic structure of such a carrier that was not considered in the author's work. For the practical application of criteria built on the basis of the proposed function with the maximum index value, it is necessary to study its distribution both under the null hypothesis and under alternatives, including converging ones. It is also of interest to transfer the developed methods and generalize the results obtained to other probabilistic schemes other than generalized placement schemes.

If - frequencies of distances between outcome numbers 0 in a binomial scheme with probabilities of outcomes po> 1 - Po, then it can be shown that in this case

Pb = kh.t fin = kn) = I(± iki = n)(kl + --, (3.3) v=\ K\ \ . Kn\ where

O* = Po~1(1 ~Po),v =

From the analysis of the formula for the joint distribution of values ​​of cg in a generalized arrangement scheme, proven in /26/, it follows that distribution (3.3), generally speaking, cannot be represented in the general case as a joint distribution of values ​​of cg in any generalized arrangement of particles by cells. This distribution is a special case of distributions on the set of combinatorial objects introduced in /12/. It seems an urgent task to transfer the results of the dissertation work for generalized placement schemes to this case, which was discussed in /52/.

If the number of outcomes in a choice-without-return or polynomial allocation scheme is greater than two, then the joint frequency distribution of distances between adjacent identical outcomes can no longer be represented in such a simple way. So far it is only possible to calculate the mathematical expectation and dispersion of the number of such distances /51/.

List of references for dissertation research Candidate of Physical and Mathematical Sciences Kolodzei, Alexander Vladimirovich, 2006

1. Bogachev V.I., Kolesnikov A.V. Nonlinear transformations of convex measures and entropy of Radon-Nikodym densities // Reports of the Academy of Sciences. - 2004. - T. 207. - 2. - P. 155 - 159.

2. Vidyakin V.V., Kolodzei A.V. Statistical detection of covert channels in data transmission networks // Proc. report II International conf. "Information systems and technologies IST" 2004" (Minsk, October 8-10, 2004) Minsk: BSU, 2004. - Part 1. - pp. 116 - 117.

3. Viktorova I. I., Chistyakov V. P. Some generalizations of the empty box criterion // Theory Probab. and its applications. - 1966. - T. XI. - 2. P. 306-313.

4. Zubkov A. M. Recurrent formulas for calculating functionals of ods of discrete random variables // Review of Appl. and industrial math. 1996. - T. 3. - 4. - P. 567 - 573.

5. G. Zubkov A. M., Mikhailov V. G. Limit distributions of random variables associated with long repetitions in a sequence of independent tests // Theory Probab. and its applications. - 1974. - T. XIX. 1. - pp. 173 - 181.

6. Zubkov A. M., Mikhailov V. G. On repetitions of s - chains in a sequence of independent quantities // Theory Probab. and its application. - 1979. T. XXIV. - 2. - P. 267 - 273.

7. Ivanov V. A., Ivchenko G. I., Medvedev Yu. I. Discrete problems in probability theory // Results of Science and Technology. Ser. theory of probability, mathematics. stat., theor. cybern. T. 23. - M.: VINITI, 1984. P. 3 -60.

8. Ivchenko G. I. On moments of separable statistics in a generalized allocation scheme // Mat. notes. 1986. - T. 39. - 2. - P. 284 - 293.

9. Ivchenko G. I., Levin V. V. Asymptotic normality in a selection scheme without return // Theory Probab. and it is applied. - 1978.- T. XXIII. 1. - pp. 97 - 108.

10. Ivchenko G.I., Medvedev Yu.I. On the Markov-Polya urn scheme: from 1917 to the present day // Review applied. and industrial math. - 1996.- T. 3. 4. - P. 484-511.

11. Ivchenko G.I., Medvedev Yu.I. Random combinatorial objects // Reports of the Academy of Sciences. 2004. - T. 396. - 2. - P. 151 - 154.

12. Ivchenko G. I., Medvedev Yu. I. Statistical problems associated with the organization of control over the processes of generating discrete random sequences // Diskretn. math. - 2000. - T. 12. - 2. S. 3 - 24.

13. Ivchenko G. I., Medvedev Yu. I., Ronzhin A. F. Separable statistics and goodness-of-fit criteria for polynomial samples // Proceedings of Mathematics. Institute of the USSR Academy of Sciences. 1986. - T. 177. - P. 60 - 74.

14. Ivchenko G. I., Timonina E. E. On estimation when choosing from a finite population // Mat. notes. - 1980. - T. 28. - 4. - P. 623 - 633.

15. Kolodzei A. V. Theorem on the probabilities of large deviations for separable statistics that do not satisfy the Cramer condition // Diskretn. math. 2005. - T. 17. - 2. - P. 87 - 94.

16. Kolodzei A. V. Entropy of discrete distributions and the probability of large deviations of functions from filling cells in generalized layouts // Review of Appl. and industrial math. - 2005. - T. 12. 2. - P. 248 - 252.

17. Kolodzey A. V. Statistical criteria for identifying hidden channels based on changing the order of messages // Research work "Apology": Report / FSTEC of the Russian Federation, Head A. V. Knyazev. Inv. 7 chipboards - M., 2004. - P. 96 - 128.

18. Kolodzei A.V., Ronzhin A.F. About some statistics related to checking the homogeneity of random discrete sequences // Research work "Development of mathematical problems of cryptography" N 4 2004.: Report / AK RF, - M., 2004 .

19. Kolchin A. V. Limit theorems for a generalized layout scheme // Diskretn. math. 2003. - T. 15. - 4. - P. 148 - 157.

20. Kolchin V.F. One class of limit theorems for conditional distributions // Lit. math. Sat. - 1968. - T. 8. - 1. - P. 111 - 126.

21. Kolchin V. F. Random graphs. 2nd ed. - M.: FIZMATLIT, 2004. - 256 p.

22. Kolchin V. F. Random mappings. - M.: Nauka, 1984. - 208 p.

23. Kolchin V.F., Sevastyanov B.A., Chistyakov V.P. Random placements. M.: Nauka, 1976. - 223 p.

24. Kramer G. // Uspekhi Matem. Sciences. - 1944. - high. 10. - pp. 166 - 178.

25. Kulbak S. Information theory and statistics. - M.: Nauka, 1967. - 408 p.

26. Medvedev Yu. I. Some theorems on the asymptotic distribution of the chi-square statistic // Dokl. Academy of Sciences of the USSR. - 1970. - T. 192. 5. - P. 997 - 989.

27. Medvedev Yu. I. Separable statistics in a polynomial scheme I; II. // Theory Prob. and its use. - 1977. - T. 22. - 1. - P. 3 - 17; 1977. T. 22. - 3. - P. 623 - 631.

28. Mikhailov V. G. Limit distributions of random variables associated with multiple long repetitions in a sequence of independent tests // Theory Probab. and its applications. - 1974. T. 19. - 1. - P. 182 - 187.

29. Mikhailov V. G. Central limit theorem for the number of incomplete long repetitions // Theory Probab. and its applications. - 1975. - T. 20. 4. - P. 880 - 884.

30. Mikhailov V. G., Shoitov A. M. Structural equivalence of s - chains in random discrete sequences // Discrete. math. 2003. - T. 15, - 4. - P. 7 - 34.

31. Nagaev A.V. Integral limit theorems taking into account probabilities of large deviations. I. // Theory Probab. and it is applied. -1969. T. 14. 1. - pp. 51 - 63.

32. Petrov V. V. Sums of independent random variables. - M.: Nauka, 1972. 416 p.

33. Prokhorov Yu. V. Limit theorems for sums of random vectors whose dimension tends to infinity // Theory Probab. and its applications. 1990. - T. 35. - 4. - P. 751 - 753.

34. Ronzhin A.F. Criteria for generalized particle placement schemes // Theory Probab. and its applications. - 1988. - T. 33. - 1. - P. 94 - 104.

35. Ronzhin A.F. Theorem on the probabilities of large deviations for separable statistics and its statistical application // Mat. notes. 1984. - T. 36. - 4. - P. 610 - 615.

36. Sanov I. N. On the probabilities of large deviations of random variables // Mat. Sat. 1957. - T. 42. - 1 (84). - S.I - 44.

37. Seneta E. Correctly changing functions. M.: Nauka, 1985. - 144 p.

38. Timashev A. N. Multidimensional integral theorem on large deviations in an equiprobable placement scheme // Diskret, Mat. - 1992. T. 4. - 4. - P. 74 - 81.

39. Timashev A. N. Multidimensional local theorem on large deviations in an equiprobable placement scheme // Diskretn. math. - 1990. T. 2. - 2. - P. 143 - 149.

40. Fedoryuk M.V. Pass method. M.: Nauka, 1977. 368 p.

41. Feller V. Introduction to probability theory and its applications. T. 2. - M.: Mir, 1984. 738 p.

42. Shannon K. Mathematical theory of communication // Works on information theory and cybernetics: Trans. from English / M., IL, 1963, p. 243 - 332.

43. Conrad K. Probability Distribution and Maximum Entropy // http://www.math.uconn.edu/~kconrad/blurbs/entropypost.pdf

44. Hoeffding W. Asymptotically optimal tests for multinomial distribution // Ann. Math. Statist. 1965. - T. 36. - pp. 369 - 408.

45. Inglot T,. Rallenberg W. S. M., Ledwina T. Vanishing shortcoming and asymptotic relative efficiency // Ann. Statist. - 2000. - T. 28. - P. 215 238.

46. ​​Jurdas C., Pecaric J., Roki R., Sarapa N., On an inequality for the entropy of probability distribution // Math. Inequal. and Appl. - 2001. T. 4. - 2. - P. 209 - 214. (RZhMat. - 2005. - 05.07-13B.16).

47. Kolodzey A. V., Ronzhin A. F., Goodness of Fit Tests for Random Combinatoric Objects // Proc. report intl. conf. Modern Problems and new Trends in Probability Theory, (Chernivtsi, June 19 - 26, 2005) - Kyiv: Institute of Mathematics, 2005. Part 1. P. 122.

48. Kullback S. and Leibler R. A. On information and sufficiency // Ann. Math. Statist. 1951. - T. 22. - pp. 79 - 86.

49. Quine M.P., Robinson J. Efficiency of chi-square and likelihood ratio goodness of fit tests // Ann. Statist. 1985. - T. 13. - 2. - P. 727 -742.

Please note that the scientific texts presented above are posted for informational purposes only and were obtained through original dissertation text recognition (OCR). Therefore, they may contain errors associated with imperfect recognition algorithms. There are no such errors in the PDF files of dissertations and abstracts that we deliver.

In modern conditions, interest in data analysis is constantly and intensively growing in completely different fields, such as biology, linguistics, economics, and, of course, IT. The basis of this analysis is statistical methods, and every self-respecting data mining specialist needs to understand them.

Unfortunately, truly good literature, the kind that can provide both mathematically rigorous proofs and clear intuitive explanations, is not very common. And these lectures, in my opinion, are unusually good for mathematicians who understand probability theory precisely for this reason. They are taught to masters at the German Christian-Albrecht University in the Mathematics and Financial Mathematics programs. And for those who are interested in how this subject is taught abroad, I translated these lectures. It took me several months to translate, I diluted the lectures with illustrations, exercises and footnotes on some theorems. I note that I am not a professional translator, but simply an altruist and amateur in this field, so I will accept any criticism if it is constructive.

In short, this is what the lectures are about:


Conditional mathematical expectation

This chapter does not directly relate to statistics, however, it is ideal for starting to study it. Conditional expectation is the best choice for predicting a random outcome based on information already available. And this is also a random variable. Here we consider its various properties, such as linearity, monotonicity, monotonic convergence and others.

Point Estimation Basics

How to estimate the distribution parameter? What criterion should I choose for this? What methods should I use? This chapter helps answer all these questions. Here we introduce the concepts of unbiased estimator and uniformly unbiased minimum variance estimator. Explains where the chi-square and t-distributions come from and why they are important in estimating the parameters of a normal distribution. Explains what the Rao-Kramer inequality and Fisher information are. The concept of an exponential family is also introduced, which greatly facilitates obtaining a good estimate.

Bayesian and minimax parameter estimation

A different philosophical approach to evaluation is described here. In this case, the parameter is considered unknown because it is a realization of a certain random variable with a known (a priori) distribution. By observing the result of the experiment, we calculate the so-called posterior distribution of the parameter. Based on this, we can obtain a Bayesian estimator, where the criterion is the minimum loss on average, or a minimax estimator, which minimizes the maximum possible loss.

Sufficiency and completeness

This chapter has serious practical significance. A sufficient statistic is a function of the sample such that it is sufficient to store only the result of this function in order to estimate the parameter. There are many such functions, and among them are the so-called minimum sufficient statistics. For example, to estimate the median of a normal distribution, it is enough to store only one number - the arithmetic mean over the entire sample. Does this also work for other distributions, such as the Cauchy distribution? How do sufficient statistics help in choosing estimates? Here you can find answers to these questions.

Asymptotic properties of estimates

Perhaps the most important and necessary property of an assessment is its consistency, that is, the tendency towards a true parameter as the sample size increases. This chapter describes what properties the estimates we know, obtained by the statistical methods described in previous chapters, have. The concepts of asymptotic unbiasedness, asymptotic efficiency and Kullback-Leibler distance are introduced.

Testing Basics

In addition to the question of how to estimate a parameter unknown to us, we must somehow check whether it satisfies the required properties. For example, an experiment is being conducted to test a new drug. How do you know if the likelihood of recovery is higher with it than with using old medications? This chapter explains how such tests are constructed. You will learn what the uniformly most powerful test is, the Neyman-Pearson test, the significance level, the confidence interval, and where the well-known Gaussian test and t-test come from.

Asymptotic properties of criteria

Like assessments, criteria must satisfy certain asymptotic properties. Sometimes situations may arise when it is impossible to construct the required criterion, however, using the well-known central limit theorem, we construct a criterion that asymptotically tends to the necessary one. Here you will learn what the asymptotic significance level is, the likelihood ratio method, and how the Bartlett test and the chi-square test of independence are constructed.

Linear model

This chapter can be seen as a complement, namely the application of statistics in the case of linear regression. You will understand what grades are good and under what conditions. You will learn where the least squares method came from, how to construct tests, and why the F-distribution is needed.