Monte Carlo simulations based on \(\mathcal {EGW}\) and \(\mathcal {EW}\) models
Computational experiments play an important role in probability and statistics, since they can test a hypothesis, assess the performance of a new method, or illustrate a known result. In this section, we present maximum likelihood estimates of the parameters of the \(\mathcal {EGW}\) and \(\mathcal {EW}\) models. They were obtained via BFGS, SANN, and Nelder–Mead, as implemented in the R optim function [24]. To automate the simulations, we implemented two further functions: fitDist and getSimulation. The pseudo-codes of those algorithms, as well as these functions, can be found in the “Appendix.” With currently available computational resources, such as multi-core and multi-process parallelism, it is possible to speed up computational simulations. We therefore ran the simulations in parallel processes to reduce the runtime. The results of the simulations and their execution times were obtained on a notebook with an Intel® Core\(^{{\mathrm{TM}}}\) i5-7200U CPU at 2.50 GHz (2712 MHz, 2 cores, 4 logical processors), 8.00 GB of RAM, Microsoft® Windows 10 Home Single Language (x64), R\(^{\copyright }\) version 3.6.1, and RStudio\(^{\copyright }\) version 1.2.5001.
Simulation for the \(\mathcal {EGW}\) distribution
Samples of size 50, 100, 500, and 1000 were obtained using the \(\mathcal {EGW}\) qf given by
$$\begin{aligned} Q_{\mathcal {EGW}}\left( q\right) =\left\{ \log \left[ 1-q^{\frac{1}{b}}\right] ^{-\left( \frac{1}{a \alpha ^\beta }\right) } \right\} ^{\frac{1}{\beta }}, \end{aligned}$$
(7)
where q is drawn at random from a \(U\left( 0,1\right)\) distribution, with \(a=2\), \(b=3\), \(\alpha =4\), and \(\beta =5\). The estimates were obtained by the maximum likelihood method via BFGS, SANN, and Nelder–Mead.
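The sampling step can be sketched with the inverse transform method. The following is a minimal Python illustration, not the authors' R code: the function names `egw_quantile`, `egw_cdf`, and `egw_sample` are hypothetical, the seed is arbitrary, and the cdf is the one implied by inverting (7), used here only as a round-trip check.

```python
import math
import random

def egw_quantile(q, a, b, alpha, beta):
    """Quantile function (7): x = {-log(1 - q^(1/b)) / (a * alpha^beta)}^(1/beta)."""
    rate = a * alpha ** beta
    return (-math.log1p(-q ** (1.0 / b)) / rate) ** (1.0 / beta)

def egw_cdf(x, a, b, alpha, beta):
    """Cdf implied by inverting (7): F(x) = [1 - exp(-a * alpha^beta * x^beta)]^b."""
    return (-math.expm1(-a * alpha ** beta * x ** beta)) ** b

def egw_sample(n, a, b, alpha, beta, seed=None):
    """Draw n values by applying the quantile function to U(0, 1) draws."""
    rng = random.Random(seed)
    return [egw_quantile(rng.random(), a, b, alpha, beta) for _ in range(n)]

# Parameter values and sample sizes used in the simulation study.
samples = {n: egw_sample(n, a=2, b=3, alpha=4, beta=5, seed=1)
           for n in (50, 100, 500, 1000)}
```

Setting `a=1` and `alpha=c` in the same functions gives draws from the \(\mathcal {EW}\) quantile function.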
Figures 1 and 2 display the histograms of the data simulated from the \(\mathcal {EGW}\) distribution, together with the \(\mathcal {EGW}\) density and the empirical distribution, for sample sizes of 50, 100, 500, and 1000. The data were generated using the qf of the \(\mathcal {EGW}\) distribution, and the estimates were obtained by maximum likelihood via the BFGS, SANN, and Nelder–Mead algorithms.
Next, we present the results of the parameter estimation for the \(\mathcal {EGW}\) distribution. The BFGS method proved inefficient for estimating parameter a, even as the sample size increased. For parameter b, the estimates were reasonable for sample sizes 500 and 1000. However, the method was not satisfactory for the \(\alpha\) parameter. Finally, a reasonable result was obtained for the \(\beta\) parameter only at sample size 1000.
Regarding the SANN method, the estimation was inefficient for the parameters a and \(\alpha\). The estimates for parameter b were reasonable only from sample size 500 onwards. For the \(\beta\) parameter, a reasonable estimate was obtained only at sample size 1000.
The Nelder–Mead method did not give satisfactory results for the estimation of parameters a and \(\alpha\). However, it gave a reasonable estimate for parameter b from sample size 500 onwards, and for the \(\beta\) parameter only at sample size 1000.
In the simulations concerning the estimation of the parameters of the \(\mathcal {EGW}\) distribution (3 methods, 4 parameters, and 4 sample sizes, i.e., 48 estimates), we obtained 81.25% (39/48) inefficient estimates, 18.75% (9/48) reasonable estimates, and no satisfactory ones.
The graphs of all methods showed equivalent fits; more details are available in the “Appendix.” Table 2 reports the estimates together with the standard error (SE) and the mean squared error (MSE); see also Figs. 1 and 2.
Simulation for the \(\mathcal {EW}\) distribution
Although it is a well-known model and numerous other models generalize it, to our knowledge, simulation studies have not been carried out with the \(\mathcal {EW}\) distribution. Samples of size 50, 100, 500, and 1000 were obtained using the qf of the \(\mathcal {EW}\) distribution. The results of the simulations are presented in Table 3. The \(\mathcal {EW}\) qf is given by
$$\begin{aligned} Q_{\mathcal {EW}}\left( q\right) =\left\{ \log \left[ 1-q^{\frac{1}{b}}\right] ^ {-\left( \frac{1}{c^\beta }\right) } \right\} ^{\frac{1}{\beta }}, \end{aligned}$$
(8)
where q is drawn at random from a \(U\left( 0,1\right)\) distribution, with \(b=3\), \(c=4\), and \(\beta =5\). The simulated values of the \(\mathcal {EW}\) distribution are obtained from (8).
Figures 3 and 4 present the histograms of the data simulated from the \(\mathcal {EW}\) distribution, together with the \(\mathcal {EW}\) density and the empirical distribution, for sample sizes of 50, 100, 500, and 1000. The data were generated using the \(\mathcal {EW}\) qf, and the estimates were obtained by maximum likelihood via BFGS, SANN, and Nelder–Mead.
The estimation of the parameters of the \(\mathcal {EW}\) distribution presented the following results.
For the BFGS method, the estimate of parameter b was reasonable only at sample size 1000. For parameter c, the estimate was reasonable at sample size 500 and satisfactory at sample size 1000. For the \(\beta\) parameter, the estimates were reasonable only from sample size 500 onwards.
With respect to the SANN method, the estimates for parameter b were reasonable only at sample size 1000. For parameter c, the estimate was reasonable at sample size 500 and satisfactory at sample size 1000. The \(\beta\) parameter estimates were reasonable from sample size 500 onwards.
Finally, for the Nelder–Mead method, the estimation of parameter b was reasonable only at sample size 1000. The estimates for parameter c were reasonable and satisfactory at sample sizes 500 and 1000, respectively. The estimates for the \(\beta\) parameter were reasonable from sample size 500 onwards.
For the simulations generated for the \(\mathcal {EW}\) distribution, we obtained 58.33% (21/36) inefficient estimates, 33.33% (12/36) reasonable estimates, and 8.33% (3/36) satisfactory estimates.
Thus, we can observe that the identifiability (reparameterization) of the \(\mathcal {EW}\) distribution provided better results in the simulations, as it decreased the share of inefficient estimates (81.25% \(\rightarrow\) 58.33%) and increased the shares of reasonable estimates (18.75% \(\rightarrow\) 33.33%) and satisfactory estimates (0% \(\rightarrow\) 8.33%).
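The role of the reparameterization can be read directly from (7) and (8). In (7), the parameters a and \(\alpha\) enter only through the product \(a\alpha ^\beta\), so that (7) reduces to (8) under the substitution below. This is a short worked step added for illustration; the numerical example uses the simulation values of a, \(\alpha\), and \(\beta\).

$$\begin{aligned} c^{\beta }=a\,\alpha ^{\beta } \;\Longleftrightarrow \; c=a^{\frac{1}{\beta }}\,\alpha \quad \Longrightarrow \quad Q_{\mathcal {EGW}}\left( q\right) =Q_{\mathcal {EW}}\left( q\right) . \end{aligned}$$

For example, \(\left( a,\alpha ,\beta \right) =\left( 2,4,5\right)\) and \(\left( a,\alpha ,\beta \right) =\left( 1,4\cdot 2^{1/5},5\right)\) both give \(a\alpha ^\beta =2048\), and hence the same \(\mathcal {EGW}\) distribution for any b, whereas each value of c in (8) corresponds to a single distribution for given b and \(\beta\).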
The ratios between the execution times (in seconds) of the simulations of the \(\mathcal {EGW}\) and \(\mathcal {EW}\) distributions were as follows: 61,052/31,845 (1.92), 164,702/55,106 (2.99), 397,079/231,317 (1.72), and 590,006/390,454 (1.51). Thus, the \(\mathcal {EGW}\) simulations took between 1.5 and 3 times as long as the corresponding \(\mathcal {EW}\) simulations. The identifiability of the \(\mathcal {EW}\) distribution therefore has the additional advantage of reducing the running time of the computer simulations.
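The reported ratios can be reproduced from the execution times with a few lines of Python. This is only an arithmetic check; the variable names are illustrative.

```python
# Execution times (in seconds) as reported in the text.
egw_seconds = [61052, 164702, 397079, 590006]
ew_seconds = [31845, 55106, 231317, 390454]

# Ratio EGW / EW for each pair, rounded to two decimals.
ratios = [round(g / w, 2) for g, w in zip(egw_seconds, ew_seconds)]
print(ratios)  # [1.92, 2.99, 1.72, 1.51]
```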
Application with the \(\mathcal {EGW}\) distribution and the \(\mathcal {EW}\) distribution
In this section, we analyze a real data set of Nelore cattle [25] using the \(\mathcal {EGW}\) and \(\mathcal {EW}\) distributions. The maximum likelihood estimates were computed with the BFGS, SANN, and Nelder–Mead algorithms. Commercial beef production in Brazil, which relies mostly on the Nelore breed, seeks to optimize the time that calves take to reach a specific weight between birth and weaning. The data consist of 69 Nelore bulls, and the variable of interest is the time (in days) until the animals reached a weight of 160 kg during the period from birth to weaning.
Figure 5 and the parameter estimation table (Table 4 in the “Appendix”) present the results obtained for the \(\mathcal {EGW}\) distribution. One can note that the BFGS method provided a better fit to the empirical function and to the histogram than the other methods considered in this article.
Analyzing the plots in Fig. 6 and the results in Table 5 (in the “Appendix”), we observe that the Nelder–Mead method fitted the \(\mathcal {EW}\) distribution to the histogram and the empirical function better than the other methods. However, the estimation by the Nelder–Mead method did not produce SEs for parameters b and c. Since the BFGS method gave the second-best fit and also produced SEs for the parameters b, c, and \(\beta\), one can consider that the BFGS method provided the most suitable fit of the \(\mathcal {EW}\) distribution to these data.
Table 4 (in the “Appendix”) shows that the Nelder–Mead method was able to estimate the parameters of the \(\mathcal {EGW}\) distribution but failed to report the SEs, since the resulting Hessian contained NaN (Not a Number) in its first row and first column, which correspond to the parameter a.
This suggests that the solution found by the Nelder–Mead method is not reliable in this case and, consequently, that the model fitted with these parameter estimates is not suitable for these data. This may be related to the lack of identifiability of the \(\mathcal {EGW}\) distribution.
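One way to see how a lack of identifiability could affect the Hessian is to evaluate the likelihood at two different parameter vectors that share the same value of \(a\alpha ^\beta\). The sketch below is in Python rather than the authors' R code and rests on stated assumptions: the density is the one implied by inverting (7), the function name `egw_neg_loglik` is hypothetical, and the data are arbitrary positive numbers generated for illustration, not the Nelore data.

```python
import math
import random

def egw_neg_loglik(params, data):
    """Negative log-likelihood for the density implied by inverting (7):
    f(x) = b * r * beta * x^(beta-1) * exp(-r x^beta) * [1 - exp(-r x^beta)]^(b-1),
    with r = a * alpha^beta."""
    a, b, alpha, beta = params
    rate = a * alpha ** beta
    total = 0.0
    for x in data:
        t = rate * x ** beta
        total += (math.log(b) + math.log(rate) + math.log(beta)
                  + (beta - 1.0) * math.log(x) - t
                  + (b - 1.0) * math.log(-math.expm1(-t)))
    return -total

# Illustrative positive values only (hypothetical, NOT the Nelore data set).
rng = random.Random(0)
data = [rng.uniform(0.1, 0.4) for _ in range(69)]

# Two distinct parameter vectors with the same product a * alpha^beta = 2048.
theta_1 = (2.0, 3.0, 4.0, 5.0)
theta_2 = (1.0, 3.0, 4.0 * 2.0 ** (1.0 / 5.0), 5.0)

nll_1 = egw_neg_loglik(theta_1, data)
nll_2 = egw_neg_loglik(theta_2, data)
print(math.isclose(nll_1, nll_2, rel_tol=1e-9, abs_tol=1e-8))
```

Because the two values agree up to rounding, the likelihood is constant along the curve \(a\alpha ^\beta =\text{const}\). Under these assumptions the Hessian is singular in that direction, so SEs based on its inverse may be undefined or numerically unstable.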