The searching algorithm for detecting a Markovian target based on maximizing the discounted effort reward search

This paper presents a search algorithm for detecting a Markovian target that moves randomly among M cells. Our algorithm is based on maximizing the discounted effort reward search. At each of a fixed number of time intervals, the search effort is a random variable with a normal distribution. Rather than only minimizing the non-detection probability of the targets at time interval i, we seek the optimal distribution of the search effort by maximizing the discounted effort reward. We present some special cases of one Markovian and hidden target. Experimental results for a Markovian, hidden target are obtained and compared for the cases with and without applying the discounted effort reward search.


Introduction
The search problem for missing targets dates back to the 1950s. Scientists have presented different types of search plans that fit the nature of the search area. The targets are sometimes located in difficult terrain on the surface of the ground or in the depths of the sea. In order to increase the probability of detection or minimize the search effort, specialists in this field divided the areas to be searched into sets of identical or different states. The search area is divided into cells of different forms. Hong et al. [1,2] divided the area into hexagonal cells. They proposed an approximation algorithm for the optimal search path. This algorithm optimizes an approximate path to compute the detection probability, by using the conditional probabilities and then finding the maximum probability of detection along this search path. Song and Teneketzis [3] determined the optimal search strategies with multiple sensors that maximize the total probability of successful search, where the target is hidden in one of a finite set of different cells. Teamah et al. [4] divided the search region into square cells. They minimized the non-detection probability and the searching effort (which is bounded by a normally distributed random variable).

Problem formulation
In this section, we present the same model studied by El-Hadidy [5], but without using fuzzy logic. The model uses the same discrete approach as in El-Hadidy [5], where the targets move on a discrete state space (M cells) with discrete-time Markovian motion.

The searching technique
The searcher can move freely over the M cells (the searcher may jump from any cell to any other). The searcher first detects the primary target and then its related target, which may lie in one of the primary target's neighboring cells. Since the searcher aims to find the optimal distribution of the search effort that minimizes the search cost, we use all the previous hypotheses to formulate a very interesting and difficult optimization problem. El-Hadidy [5] showed that the probability that the primary target is in cell $j$ at time interval $i$ is denoted by $P_{ij}$, $i = 1, 2, \ldots, N$, $j = 1, 2, \ldots, M$; consequently, the probability of the other target is one of the probabilities $P_{i(j-h-1)}, P_{i(j-h)}, P_{i(j-h+1)}, P_{i(j-1)}, P_{i(j+1)}, P_{i(j+h-1)}, P_{i(j+h)}, P_{i(j+h+1)}$, see Fig. 1.
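For illustration, the following is a minimal Python sketch of how the eight neighboring cell indices of cell $j$ can be enumerated, assuming (as the index pattern above suggests) that the M cells are laid out row by row with $h$ cells per row; the function name and the grid layout are illustrative and are not taken from El-Hadidy [5].

```python
def neighbor_cells(j, h, M):
    """Return the indices of the (up to) eight cells surrounding cell j.

    Assumes cells 1..M are laid out row by row with h cells per row, so the
    neighbors of cell j follow the pattern j-h-1, j-h, j-h+1, j-1, j+1,
    j+h-1, j+h, j+h+1 used in the text (boundary cells have fewer neighbors).
    """
    offsets = [-h - 1, -h, -h + 1, -1, 1, h - 1, h, h + 1]
    row, col = divmod(j - 1, h)          # 0-based row/column of cell j
    neighbors = []
    for d in offsets:
        k = j + d
        if not (1 <= k <= M):
            continue
        r, c = divmod(k - 1, h)
        # keep only true grid neighbors (avoid wrapping across row edges)
        if abs(r - row) <= 1 and abs(c - col) <= 1:
            neighbors.append(k)
    return neighbors

# Example: the neighbors of cell 5 in a 3x3 grid (h = 3, M = 9)
print(neighbor_cells(5, h=3, M=9))   # -> [1, 2, 3, 4, 6, 7, 8, 9]
```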

The searching effort
We let the effort be randomly distributed; the effort to be distributed among the cells is $L(R)$, and its value is bounded by a random variable $X$ (i.e., $0 \le L(R) \le X$). Here, the probability of detection depends on the total amount of effort $Z_{ij}$, $i = 1, 2, \ldots, N$, $j = 1, 2, \ldots, M$, applied there by the searcher and not on the way the effort is applied. We assume that searches at distinct time intervals are independent and that the motion of the target is independent of the sensors' actions. The searcher visits cell $j$ through one of its adjacent cells, as in the cases in Fig. 2.

The probability of detection
We consider that the conditional probability of detecting the target at time interval $i$ with $Z_{ij}$ amount of effort, given that the target is located in cell $j$, is given by the detection function $b(i, j, Z_{ij})$. El-Hadidy [5] showed that the probability of not detecting the first target in cell $j$ at time interval $i$ is $P_{ij}\bigl[1 - b(i, j, Z_{ij})\bigr]$, where $Z_{ij}$ is the amount of effort applied there. The number of cells surrounding the cell in which the first target is detected at time interval $i$ is 8, so the other target will be detected in one of these cells at the same time. Note that the searcher entered one of these eight cells before detecting the first target; therefore, seven cells remain, and the probability of the other target is distributed over them, see Hong et al. [1]. The searcher does not re-enter cells already visited during time interval $i$. Then the searcher enters one of the seven cells, leaving only 6 cells over which the other target is distributed. The probability of detecting the other target follows accordingly; for further clarification, see El-Hadidy [5]. Here, we deal with the probability of not detecting the two targets in cell $j$ at time interval $i$ and, consequently, the probability of not detecting the two targets over the whole time, together with the total effort of detecting the two targets in cell $j$ and the neighboring cells of the other target.

The exponential detection function
In physics, signal detectors are often based on an exponential function because the exponential detection function has much lower computational complexity than alternatives such as the Gaussian kernelized energy detector, see Luo et al. [37]. Thus, to model the effort here, we use an exponential detection function, that is, $b(i, j, Z_{ij}) = 1 - e^{-Z_{ij}/T_j}$ and $b(i, j+\ell, Z_{i(j+\ell)}) = 1 - e^{-Z_{i(j+\ell)}/T_{j+\ell}}$, where $T_j$ and $T_{j+\ell}$ are factors due to the searching process (depending on the nature of the cells and their dimensions) in cell $j$ and its neighbors, respectively. Then, the probability of not detecting the targets over the whole time follows by substituting this detection function into the expressions above.
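As a concrete illustration of this exponential detection model, the following sketch evaluates the detection probability $b = 1 - e^{-Z/T}$ and the corresponding miss probability $e^{-Z/T}$; the effort and cell-factor values are illustrative only.

```python
import math

def detection_prob(Z, T):
    """Exponential detection function b(i, j, Z) = 1 - exp(-Z / T).

    Z is the search effort applied in the cell and T is the factor that
    reflects the nature and dimensions of the cell (larger T means the
    same effort yields a lower detection probability).
    """
    return 1.0 - math.exp(-Z / T)

def non_detection_factor(Z, T):
    """Probability of missing the target given it is in the cell: exp(-Z / T)."""
    return math.exp(-Z / T)

# Illustrative values: effort 0.5 in a cell with factor T = 1
# and effort 0.3 in a neighboring cell with factor T = 2.
print(detection_prob(0.5, 1.0))        # ~0.3935
print(non_detection_factor(0.3, 2.0))  # ~0.8607
```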

Optimization problem with discounted effort reward
As in El-Hadidy [5] and Blum et al. [38], we use an exponential function $w_j(i) = \lambda_j^i$, $0 < \lambda_j < 1$, that reduces the possible rewards at time interval $i$. The tuning parameter $\lambda_j$ allows us to decide indirectly how fast we want to find the targets or, in other words, how important the actions that the searcher will take in the future are. Since we need to minimize the non-detection probability, we use the complement of $w_j(i)$, that is, $1 - \lambda_j^i$. The cost function (3) is combined with the discounted effort function to develop the final discounted effort reward function, in which the neighbor-cell term is $\sum_{\ell=1}^{6}\bigl(1-\lambda_{j+\ell}^{i}\bigr) P_{i(j+\ell)}\, e^{-Z_{i(j+\ell)}/T_{j+\ell}}$ and the unrestricted effort is modified accordingly. Let $X$ be a random variable with a normal distribution, probability density function $f(x)$, and distribution function $F(x)$. The purpose here is to minimize $Z_{ij}$, $Z_{i(j+\ell)}$, $\lambda_j$, and $\lambda_{j+\ell}$; thus, we have different types of decision variables and parameters in the objective function. This leads us to consider our problem as a multi-objective nonlinear programming problem that aims to minimize $H(Z; \lambda)$ subject to the constraint $L(Z; \lambda) \le X$, where $Z$ is a function of $X$. Since the detection function is exponential, the problem becomes a convex nonlinear programming problem (NLP), where $\mathbb{R}^{NM}$ is the feasible set of constrained decisions. The unique solution is guaranteed by the convexity of $H(Z; \lambda)$ and $\mathcal{Z}(X)$. Since we have two kinds of probabilities, (1) the probability of the target being in each cell and (2) the probability of detecting the target, covering the search space ($M$ different states) with the greatest possible probability $\le 1$ saves time and effort. Hence, the detection probability (objective function) is affected by the constraint $\sum_{j=1}^{M}\bigl(P_j + P_{j+\ell}\bigr) = 1$. In addition, the targets jump between the cells according to a Markov transition matrix (stochastic matrix). Thus, at each time interval $i$, there exists a transition probability from state $j$ (or $j+\ell$) to another state, that is, $P_{ij}$ or $P_{i(j+\ell)}$; this probability is computed from the stochastic matrix (see the "Application" section). This leads us to consider $P_{ij}$ or $P_{i(j+\ell)}$ not as a given parameter but as a constraint whose maximum and minimum values directly affect $Z_{ij}$, $Z_{i(j+\ell)}$, $\lambda_j$, and $\lambda_{j+\ell}$. Since this probability is used in the formulation of the objective function, we call our problem a nonlinear stochastic programming problem. One might think that $Z_{ij}$ and $Z_{i(j+\ell)}$ are the same type of decision variable although they are applied in different cells. However, each cell has a different nature from the others, so the searching methods (search devices used, etc.) differ from cell to cell. Besides that, we consider that the probability of detection in state $j$ (or $j+\ell$) at time interval $i$ depends only on the total amount of effort applied there by the searcher and not on the way the effort is applied. Thus, we treat $Z_{ij}$ and $Z_{i(j+\ell)}$ as different effort variables. A sketch of the structure of this discounted objective appears below.
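To make the structure of the discounted objective concrete, the following minimal sketch evaluates, for one time interval, the discounted non-detection contribution $(1 - \lambda_j^i) P_{ij} e^{-Z_{ij}/T_j}$ of the primary cell plus the analogous terms of its six remaining neighbor cells; the function name and the exact grouping of terms are assumptions, since the full expression of $H(Z; \lambda)$ is given in El-Hadidy [5].

```python
import math

def discounted_undetection_term(i, P_j, Z_j, T_j, lam_j,
                                P_nb, Z_nb, T_nb, lam_nb):
    """Discounted non-detection contribution of cell j and its six
    remaining neighbor cells at time interval i.

    Primary cell term:  (1 - lam_j**i) * P_j * exp(-Z_j / T_j)
    Neighbor terms:     sum over the six neighbor cells of
                        (1 - lam**i) * P * exp(-Z / T)
    """
    term = (1.0 - lam_j ** i) * P_j * math.exp(-Z_j / T_j)
    for P, Z, T, lam in zip(P_nb, Z_nb, T_nb, lam_nb):
        term += (1.0 - lam ** i) * P * math.exp(-Z / T)
    return term

# Illustrative numbers for one cell and six neighbors at time interval i = 2.
val = discounted_undetection_term(
    i=2, P_j=0.4, Z_j=0.6, T_j=1.0, lam_j=0.4,
    P_nb=[0.1] * 6, Z_nb=[0.2] * 6, T_nb=[1.0] * 6, lam_nb=[0.8] * 6)
print(val)
```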

Definition 1. $\bar{Z} \in \mathcal{Z}(X)$ is said to be an optimal solution for problem (NLP) if there does not exist $Z \in \mathcal{Z}(X)$ such that $H(Z; \lambda) \le H(\bar{Z}; \bar{\lambda})$ with at least one strict inequality.

Now, we have the corresponding nonlinear stochastic programming problem (NLSP). The constraint $P\bigl(L_i(Z; \lambda) \le X_i\bigr) \ge 1 - \beta$ has to be satisfied with probability at least $(1 - \beta)$ and can be restated in deterministic form. Here, we consider that $X$ has a normal distribution because one of the important advantages of the normal distribution is its sensitivity to shifts in the searching effort at any time interval $i$. For the complement probability, $\dfrac{X_i - E(X_i)}{\sqrt{\mathrm{Var}(X_i)}}$ is a standard normal random variable. If $K_p$ represents the value of the standard normal random variable at which $\Phi(K_p) = \beta$, then this constraint can be expressed as $\dfrac{L_i(Z; \lambda) - E(X_i)}{\sqrt{\mathrm{Var}(X_i)}} \le K_p$. This inequality will be satisfied only if $L_i(Z; \lambda) \le E(X_i) + K_p \sqrt{\mathrm{Var}(X_i)}$. Thus, the NLSP is equivalent to the nonlinear stochastic programming problem NLSP(1) with this deterministic constraint, as sketched below.
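The reduction of the chance constraint to its deterministic equivalent can be sketched as follows, assuming as above that $X_i$ is normally distributed and that $K_p$ is the standard normal quantile with $\Phi(K_p) = \beta$; the value $\beta = 0.05$ used in the example is an assumption.

```python
from statistics import NormalDist

def deterministic_effort_bound(mean_X, var_X, beta):
    """Deterministic equivalent of P(L_i(Z; lambda) <= X_i) >= 1 - beta
    when X_i ~ Normal(mean_X, var_X): with K_p the standard normal quantile
    satisfying Phi(K_p) = beta, the constraint becomes
        L_i(Z; lambda) <= mean_X + K_p * sqrt(var_X).
    """
    K_p = NormalDist().inv_cdf(beta)       # standard normal quantile
    return mean_X + K_p * var_X ** 0.5

# Illustrative numbers: E(X_i) = 0.82, Var(X_i) = 0.04, beta = 0.05 (assumed).
print(round(deterministic_effort_bound(0.82, 0.04, 0.05), 4))   # ~0.491
```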

Maximum probability of detection with minimum effort
Since $H(Z; \lambda)$ is an exponential function, it is easy to prove that $H(Z; \lambda)$ is convex, and then the necessary Kuhn-Tucker conditions are obtained as in Mangasarian [39]. This implies the following system for $i = 1, 2, \ldots, N$, $\sigma = i$, and $\theta = 1, 2, \ldots, M$.
If $U > 0$, then we find that $Z_{\sigma\theta} = -P_{\sigma\theta}$; this is impossible because $Z_{\sigma\theta} > 0$ and $0 \le P_{\sigma\theta} \le 1$. Thus, if $U = 0$, subtracting (8) from (6) and simplifying, and since the probability of the first target being in cell $j$ is greater than zero and $T_j$ is a factor due to the search in cell $j$ and its dimensions (a given value determined by the nature of the searching process), we obtain the optimal value of $\lambda_j^*$ at time step $i$ from (11). Similarly, by subtracting (9) from (7), we obtain the optimal value of $\lambda_{j+\ell}^*$ at time step $i$ by solving the corresponding equation. Let at least one of these boundaries satisfy the stated condition; also, from (12), we conclude that at least one of these boundaries satisfies the analogous condition. From (15) and (16), and substituting $\lambda_j^*$ and $\lambda_{j+\ell}^*$, we get (17). If we know the optimal effort $Z_{i(j+\ell)}^*$, then substituting (17) into (15) gives $Z_{ij}^*$; also, if we know the optimal effort $Z_{ij}^*$, we can obtain $Z_{i(j+\ell)}^*$ by solving the corresponding equation (19). Knowing the minimum values $\lambda_j^*$, $\lambda_{j+\ell}^*$, $Z_{ij}^*$, and $Z_{i(j+\ell)}^*$, we can obtain the minimum value of $H(Z; \lambda)$. These minimum values maximize the probability of detecting the targets with minimum cost.

An algorithm
We use the following dynamic programming algorithm to solve larger instances of our problem and obtain the minimum search effort. The steps of the algorithm can be summarized as follows (a sketch of the loop structure is given after the list).
Step 1. Insert the total number of time intervals $N$ and the total number of cells $M$, $E(X_i)$, $\mathrm{Var}(X_i)$, $K_p$, the initial state probability of the first target $P_0$, and the one-step transition probability matrix $P$.
Step 2. At time interval $i$, use $P$ and $P_0$ to generate $\bar{P}_{ij} = P_{ij} + P_{i(j+\ell)}$, the transition probability of the two targets. Based on some recent information about the expected location of the other target, one can obtain the value of $\bar{P}_{ij}$.
Step 3. Calculate the values of $\lambda_j$ and $\lambda_{j+\ell}$ from Eqs. (11) and (12), respectively.
Step 4. Compute $Z_{ij}$ and $Z_{i(j+\ell)}$ from (18) and (19), respectively.
Step 5. If all unknown values are now known, go to Step 6; else, end the process.
Step 6. Compute the value of $H(Z)$ and repeat the above steps for all time intervals and all cells as long as the conditions $j \le M$ and $i \le N$ are satisfied.
Step 7. Output the total value of $H(Z)$ and end the process.
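The loop structure of these steps can be sketched as follows. Since Eqs. (11), (12), (18), and (19) are not reproduced here, they appear only as placeholder callables; the sketch shows how the steps are organized over time intervals and cells, not the actual formulas.

```python
def run_search_algorithm(N, M, P0, P, E_X, Var_X, K_p,
                         solve_lambda, solve_effort, discounted_cost):
    """Sketch of the loop structure of the algorithm above (Steps 1-7).

    P0 is the initial state distribution and P the one-step transition
    matrix (Step 1 inputs).  The callables solve_lambda, solve_effort and
    discounted_cost stand in for Eqs. (11)-(12), (18)-(19) and the
    objective H(Z; lambda), which are not reproduced here.
    """
    total_H = 0.0
    P_i = list(P0)
    for i in range(1, N + 1):                    # loop while i <= N
        # Step 2: propagate the Markov chain one step to get the cell probabilities.
        P_i = [sum(P_i[k] * P[k][j] for k in range(M)) for j in range(M)]
        for j in range(M):                       # loop while j <= M
            lam = solve_lambda(i, j, P_i)                      # Step 3
            Z = solve_effort(i, j, P_i, lam, E_X, Var_X, K_p)  # Step 4
            total_H += discounted_cost(i, j, P_i, lam, Z)      # Step 6
    return total_H                               # Step 7: total value of H(Z)
```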

One Markovian target
In this section, we will consider two cases for one Markovian target as follows.

Applying discount effort case
In the case of one target, the above DNLSP is equivalent to the nonlinear stochastic programming problem NLSP(2). Then, from (6), (8), and (10), if $U > 0$ we find that $Z_{\sigma\theta} = -T_\theta P_{\sigma\theta}$; this is impossible because $Z_{\sigma\theta}, T_\theta > 0$ and $0 \le P_{\sigma\theta} \le 1$. Thus, if $U = 0$, subtracting (21) from (20) gives the same result as in (13) (this gives $\lambda_j^*$). In addition, to obtain $Z_{ij}^*$, we find that at least one of the boundaries for (21), (22), and (23) (where $U = 0$) is equal to 0. Also, (26) can be obtained from (17) after substitution; thus, one can obtain $Z_{ij}^*$, and the optimal value of the non-detection probability function follows.

Without applying discount effort case
Here, we do not use the discount effort function; that is, we put $\lambda_j = 0$ in the above NLSP(2), so we need to minimize the searching effort $Z_{ij}$ only. The above NLSP(2) then takes the form of problem NLSP(3). Applying the Kuhn-Tucker conditions and using (27), we obtain $Z_{ij}^*$, and the optimal value of the non-detection probability function follows.

Randomly located target
Let the probability of the target being in cell $j$, $j = 1, 2, \ldots, M$, be $\pi_j$. After cell $j$ has been searched, the searcher may either continue to search the same cell or switch without any delay to another cell. The searching process in each cell is conducted independently of previous searches and takes one unit of time. Thus, if the target located in cell $j$ is detected on a given search with probability $\xi_j$, where $0 < \xi_j < 1$, Song and Teneketzis [3] showed that the probability of detecting the target in the $i$th time interval is $P_{ij} = \pi_j \xi_j (1 - \xi_j)^{i-1}$, $i = 1, 2, \ldots, N$; $j = 1, 2, \ldots, M$ (a small sketch of this computation follows). Consequently, in the case of applying the discount effort function (the applying discount effort case) as in NLSP(2), we get the equivalent optimization problem NLSP(4) for $i = 1, 2, \ldots, N$ and $j = 1, 2, \ldots, M$.
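The detection probabilities $P_{ij} = \pi_j \xi_j (1 - \xi_j)^{i-1}$ for the randomly located target can be computed directly, as in the following sketch; the numeric values mirror those used in the Application section.

```python
def randomly_located_P(i, pi_j, xi_j):
    """P_ij = pi_j * xi_j * (1 - xi_j)**(i - 1): probability of detecting a
    randomly located target in cell j at the i-th time interval
    (Song and Teneketzis [3])."""
    return pi_j * xi_j * (1.0 - xi_j) ** (i - 1)

# Values used later in the Application section: pi = (0.2, 0.8), xi = (0.4, 0.6).
for i in (1, 2, 3):
    print(i, randomly_located_P(i, 0.2, 0.4), randomly_located_P(i, 0.8, 0.6))
```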
As in the applying discount effort case, we obtain the optimal values of $Z_{ij}$, and the optimal value $H(Z^*; \lambda^*)$ follows (Table 3 lists the optimal values of $Z_{ij}$, $i = 1, 2, 3$, $j = 1, 2$, when we use the discount effort reward function for a Markovian target). In addition, if we do not apply the discount effort in NLSP(4), then we get the optimization problem NLSP(5), and the optimal value of $\lambda_j^*$ at time step $i$ is obtained by solving equation (13), (23), or (34). Also, the optimal values of $Z_{ij}^*$ and $H(Z^*)$ follow.

Application
We will apply the above dynamic programming algorithm to the above cases and compare them to show the effectiveness of our model. Consider a Markovian target that moves between two states with a given transition matrix, initial probabilities $P_{01} = \frac{3}{5}$, $P_{02} = \frac{2}{5}$, and $T_j = j$, $j = 1, 2$, $i = 1, 2, 3$. The probabilities $P_{i1}$ and $P_{i2}$ are $\frac{2}{3} - \frac{(0.4)^{i-1}}{15}$ and $\frac{1}{3} + \frac{(0.4)^{i-1}}{15}$ for $i = 1, 2, 3$, respectively (see Bhat [40]). In addition, let $X_i$ have a normal distribution with mean $E(X_i) = 0.82$ and variance $\mathrm{Var}(X_i) = 0.04$, and assume that the standard normal random variable $K_p$ takes a given value. When we use the discount effort reward function, the optimal values of $Z_{ij}$, $i = 1, 2, 3$, $j = 1, 2$, and $H(Z; \lambda)$ are calculated from (28); see Table 1. These probabilities and the effort bound implied by the chance constraint can be reproduced as in the sketch below.
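A minimal sketch reproducing the cell probabilities and the effort bound for this example follows; the transition matrix itself is not shown in the text, so only the closed-form probabilities from Bhat [40] and the given moments of $X_i$ are used, and $\beta$ (hence $K_p$) is an assumed value.

```python
from statistics import NormalDist

# Closed-form state probabilities for the two-state Markovian target:
# P_i1 = 2/3 - (0.4)**(i-1)/15 and P_i2 = 1/3 + (0.4)**(i-1)/15, i = 1, 2, 3.
for i in (1, 2, 3):
    P_i1 = 2 / 3 - 0.4 ** (i - 1) / 15
    P_i2 = 1 / 3 + 0.4 ** (i - 1) / 15
    print(i, round(P_i1, 4), round(P_i2, 4))      # each pair sums to 1

# Effort bound implied by the chance constraint with E(X_i) = 0.82 and
# Var(X_i) = 0.04; beta = 0.05 is an assumed value, since K_p is not stated.
beta = 0.05
bound = 0.82 + NormalDist().inv_cdf(beta) * 0.04 ** 0.5
print(round(bound, 4))                            # ~0.491
```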
When we do not use the discount effort reward function, under the same assumptions, we get the optimal values of $Z_{ij}$, $i = 1, 2, 3$, $j = 1, 2$ (from (32)) and $H(Z)$ (from (33)), as in Table 2.
From the numerical calculations, we find that the value of $H(Z; \lambda)$ (see Table 1) is much smaller than the value of $H(Z)$ (see Table 2). This shows the effectiveness of our model, even though the values of $Z_{ij}$, $i = 1, 2, 3$, $j = 1, 2$, in Table 2 are greater than those in Table 1. Indeed, when we use the discount effort reward function, the optimal values of $Z_{ij}$ are calculated from $\bigl(1 - \lambda_j^{*\,i}\bigr) Z_{ij}^*$, as in Table 3, where $\lambda_1 = 0.4$ and $\lambda_2 = 0.8$.
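A minimal sketch of this scaling, with $\lambda_1 = 0.4$, $\lambda_2 = 0.8$ and placeholder $Z_{ij}^*$ values (the actual Table 1 entries are not reproduced here), is:

```python
# Discounted optimal effort (1 - lambda_j**i) * Z*_ij with lambda_1 = 0.4 and
# lambda_2 = 0.8.  The Z*_ij values below are placeholders, not the Table 1 entries.
lam = {1: 0.4, 2: 0.8}
Z_star = {(i, j): 0.5 for i in (1, 2, 3) for j in (1, 2)}    # placeholder efforts

for (i, j), Z in sorted(Z_star.items()):
    print(i, j, round((1 - lam[j] ** i) * Z, 4))
```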
This shows that the values of $Z_{ij}^*$, $i = 1, 2, 3$, $j = 1, 2$, in the case of using the discount effort reward function are smaller than those in the other case.
On the other hand, let the probability of the target being in cell $j$, $j = 1, 2$, be $\pi_1 = 0.2$ and $\pi_2 = 0.8$, respectively, and let the target in cell $j$ be detected on a given search with probability $\xi_1 = 0.4$ and $\xi_2 = 0.6$. When we use the discount effort reward function, the optimal values of $Z_{ij}$, $i = 1, 2, 3$, $j = 1, 2$, are calculated from (35) and $H(Z; \lambda)$ from (36); see Table 4.
Also, we see that the value of $H(Z; \lambda)$ in Table 4 is much smaller than the value of $H(Z)$ in Table 5. From Table 4, the optimal values of $Z_{ij}$, $i = 1, 2, 3$, $j = 1, 2$, are greater than those in Table 5. Thus, the optimal values of $Z_{ij}$ are calculated from $\bigl(1 - \lambda_j^{*\,i}\bigr) Z_{ij}^*$, as in Table 6, where $\lambda_1 = 0.4$ and $\lambda_2 = 0.8$.
As in Table 6, the values of $Z_{ij}^*$, $i = 1, 2, 3$, $j = 1, 2$, in the case of using the discount effort reward function are smaller than those in the other case.

Conclusion and future research
A new method has been presented to give the maximum discounted effort reward and the minimum possible cost for detecting two related targets (i.e., targets whose movements are related). This method differs from the one presented in El-Hadidy [5]. We minimize the values of the search effort $Z_{ij}$, the tuning parameter $\lambda_j$, and the non-detection probability $P_{ij}$, $i = 1, 2, \ldots, N$ and $j = 1, 2, \ldots, M$, at the same time. We present some special cases of one Markovian and hidden target. The experimental results are obtained for detecting two targets: one of them moves according to a Markov process, and the other is randomly located. We also compare these results in two cases, considering and ignoring the discount effort reward.
In future work, we will investigate and analyze the stability of NLSP(1), NLSP(2), NLSP(3), NLSP(4), and NLSP(5) by characterizing the set of feasible discounted effort reward parameters. We can also study the dual problems associated with these problems. Moreover, this model can be extended to the case of multiple searchers by considering the combined movements of multiple targets.