Analyzing and solving the identifiability problem in the exponentiated generalized Weibull distribution

Abstract

The well-known Weibull distribution can be used to model the decreasing and unimodal failure rates that are quite standard in reliability and biological studies. It is also commonly adopted as a baseline to generate new distributions from generalized classes. In this paper, we investigate the identifiability of the exponentiated generalized class of distributions and, in particular, of the exponentiated generalized Weibull distribution. We also develop conditions under which the model becomes identifiable. To further illustrate the identifiability issue, we carry out a simulation study, and an application is presented to illustrate the potential of the model with the new parameterization.

Introduction

Lately, many authors have proposed new classes of distributions, which are modifications of the cumulative distribution functions (cdf) that provide hazard rate functions (hrf) taking various shapes. We can cite the exponentiated Weibull (\(\mathcal {EW}\)) distribution [1, 21, 22], which has an upside-down bathtub (unimodal) hrf form [2]. Carrasco et al. [3] introduced a four-parameter distribution, the generalized modified Weibull distribution, whose hrf exhibits non-monotonic shapes such as bathtub and upside-down bathtub; Gusmão et al. [4] introduced and studied the three-parameter generalized inverse Weibull distribution, whose failure rate can be unimodal, increasing or decreasing.

Several families proposed in the literature comprise a source of probability distributions for modeling lifetime data, since, in general, the resulting distribution and the baseline have the same support. Cordeiro et al. [5] proposed a new family, the exponentiated generalized (\(\mathcal {EG}\)) class of distributions, to generalize other distributions. Considering that a random variable T has distribution G, they suggest applying the new class of distributions to generalize any distribution G by

$$\begin{aligned} F_G(t; a, b)= \left\{ 1 - \left[ 1 - G(t)\right] ^a\right\} ^b, \end{aligned}$$
(1)

where \(a > 0\) and \(b > 0\) are two additional shape parameters. The authors point out that the new class of distributions is simpler and more tractable than the generalized beta family [6]. The quantile function (qf) of the new class has closed form. It entails that simulations regarding (1) are easier to perform.
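The closed-form quantile function mentioned above can be sketched as follows. This is a minimal illustration, not the authors' code: solving Eq. (1) for \(G(t)\) gives \(G(t) = 1 - (1 - q^{1/b})^{1/a}\), so the quantile of the class is the baseline quantile evaluated at that value. The function names and the unit exponential baseline are assumptions made only for this example.

```python
import math

def eg_cdf(t, a, b, base_cdf):
    """Eq. (1): F_G(t; a, b) = {1 - [1 - G(t)]^a}^b."""
    return (1.0 - (1.0 - base_cdf(t)) ** a) ** b

def eg_quantile(q, a, b, base_quantile):
    """Invert Eq. (1): G(t) = 1 - (1 - q^(1/b))^(1/a), then apply G^{-1}."""
    u = 1.0 - (1.0 - q ** (1.0 / b)) ** (1.0 / a)
    return base_quantile(u)

# Illustrative baseline (an assumption of this sketch): unit exponential.
def exp_cdf(t):
    return 1.0 - math.exp(-t)

def exp_quantile(u):
    return -math.log(1.0 - u)

# Round trip: the cdf evaluated at the q-quantile should return q.
t = eg_quantile(0.3, a=2.0, b=3.0, base_quantile=exp_quantile)
round_trip = eg_cdf(t, a=2.0, b=3.0, base_cdf=exp_cdf)
print(t, round_trip)
```

Because the inversion only needs the baseline quantile, any baseline with a closed-form \(G^{-1}\) can be simulated by the inverse transform method in the same way.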

The following well-known baseline distributions have been discussed in recent works for the exponentiated generalized class [5] (this list is not exhaustive): Birnbaum–Saunders distribution [7], generalized gamma distribution [8], Gumbel distribution [9], Dagum distribution [10], Weibull distribution [11], extended exponential distribution [12], arcsine distribution [13], standardized half-logistic distribution [14], extended Pareto distribution [15], standardized Gumbel distribution [16] and extended Gompertz distribution [17].

It is well-known that the addition of parameters to distribution classes can lead to identifiability problems and consequently bring complications to the estimation of parameters in the proposed model. According to [18], a parameter \(\varvec{\theta }\) for a family of distributions \(\left\{ f\left( x, \varvec{\theta } \right) : \varvec{\theta } \in \varvec{\varTheta } \right\}\) is identifiable if different values of \(\varvec{\theta }\) correspond to different probability density functions (pdf) or probability mass functions. That is, if \(\varvec{\theta } \ne \varvec{\theta '}\), then \(f\left( x, \varvec{\theta } \right) \ne f\left( x, \varvec{\theta '} \right)\).

Jones et al. [19] define identifiability as follows: Consider a stack of probabilities \(p_{1},...,p_{n}\), \(n \in {\mathbb {N}}\), within a single vector \(\varvec{\psi }\) with dimensions \(q \times 1\) and the parametric model with a vector \(\varvec{\gamma }\) with dimensions \(r \times 1\). The presented model implicitly specifies a function F that determines how \(\varvec{\psi }\) is calculated from \(\varvec{\gamma }\),

$$\begin{aligned} \varvec{\psi }=F\left( \varvec{\gamma }\right) . \end{aligned}$$

Hence, the model will be identifiable if F is an invertible function; it follows that there is a one-to-one correspondence between \(\varvec{\gamma }\) and \(\varvec{\psi }\). If \(\varvec{\gamma }_{1}\ne \varvec{\gamma }_{2}\) and \(F\left( \varvec{\gamma }_{1}\right) = F\left( \varvec{\gamma }_{2}\right)\), the model will have identifiability problems. Nevertheless, Jones et al. [19] state that the model will be locally identifiable in a particular \(\varvec{\gamma }\) if F is an invertible function in the vicinity of \(\varvec{\gamma }\).

In a review paper on statistical identifiability, Paulino and Pereira [20] studied issues like parallelism between parametric identifiability and sample sufficiency. They also discussed how identifiability, measures of sample information and inferential estimation concepts are related. Additionally, classic and Bayesian methods were considered as strategies for making inferences on models with parametric identification problems.

Based on the aforementioned ideas and considering the relation between the parameters of the exponentiated generalized class of distributions and the baseline function, we used the Weibull distribution as a candidate for G. Using Eq. (1) and performing some mathematical manipulations, we obtain a parameterization for the exponentiated generalized Weibull (\(\mathcal {EGW}\)) distribution that was introduced by [11]. It was also studied by [1, 21, 22]. This paper aims to study the similarities that evince the problem of identifiability of the \(\mathcal {EGW}\) distribution.

Methods

The \(\mathcal {EGW}\) distribution and a study on identifiability

The Weibull distribution has received considerable attention in the statistical literature. Many authors have studied the shapes of the density and failure rate functions for the basic model of the Weibull distribution. Let T be a random variable with Weibull distribution, then its cdf can be written as:

$$\begin{aligned} G(t) =1- \exp \left[ - \left( \alpha t \right) ^{\beta }\right] , \quad t > 0, \end{aligned}$$
(2)

where \(\alpha > 0\), \(\beta > 0\).

Replacing G(t) in Eq. (1) by (2), we have

$$\begin{aligned} F_{\mathcal {EGW}}(t; a, b, \alpha , \beta ) = \left\{ 1- \exp \left[ - a \left( \alpha t \right) ^\beta \right] \right\} ^b \end{aligned}$$
(3)

where \(F_{\mathcal {EGW}}(\cdot )\) is the \(\mathcal {EGW}\) cdf. The pdf is given by

$$\begin{aligned} f_{\mathcal {EGW}}(t; a, b, \alpha , \beta ) = a\, b\, \beta \, \alpha ^{\beta }\, t^{\beta -1} \exp \left[ - a \left( \alpha t \right) ^\beta \right] \left\{ 1- \exp \left[ - a \left( \alpha t \right) ^\beta \right] \right\} ^{b-1}, \end{aligned}$$
(4)

where \(\varvec{\theta }=\left( a, b, \alpha , \beta \right)\) is the vector of parameters of \(F_{\mathcal {EGW}}\left( t; a, b, \alpha , \beta \right)\).

Consider that \(\varvec{\varTheta }_{\mathcal {EGW}}\) is the parametric space of the \(\mathcal {EGW}\) distribution, \(\varGamma\) is a specific set of indices and \(\varvec{\theta }_{i}=\left( a_{i}, b_{i}, \alpha _{i}, \beta _{i} \right) \in \varvec{\varTheta }_{\mathcal {EGW}}\) where \(a_{i}, b_{i}, \alpha _{i}, \beta _{i}>0\) for all \(i \in \varGamma\). Let \(F_{\varvec{\varTheta }_{\mathcal {EGW}}}=\left\{ F_{\mathcal {EGW}}\left( t; \varvec{\theta }_{i}\right) : \varvec{\theta }_{i} \in \varvec{\varTheta }_{\mathcal {EGW}}, \forall i \in \varGamma \right\}\) be a family of cdfs of the \(\mathcal {EGW}\) distribution. If there exist \(i,j \in \varGamma\) with \(\varvec{\theta }_{i} \ne \varvec{\theta }_{j}\) such that \(F_{\mathcal {EGW}}\left( t; \varvec{\theta }_{i}\right) = F_{\mathcal {EGW}}\left( t; \varvec{\theta }_{j}\right)\) for all \(t > 0\), we say that \(\varvec{\varTheta }_{\mathcal {EGW}}\) is not identifiable.

Let \(\varvec{\theta }_{i}\) and \(\varvec{\theta }_{j}\) be such that \(\varvec{\theta }_{i} \ne \varvec{\theta }_{j}\) with \(a_{i} \ne a_{j}\), \(b_{i} = b_{j}=b\), \(\alpha _{i} \ne \alpha _{j}\) and \(\beta _{i} = \beta _{j}=\beta\). Then, by hypothesis, we have that

$$\begin{aligned} \alpha _{i} \ne \alpha _{j} \Rightarrow \left( \alpha _{i} t\right) ^{\beta } \ne \left( \alpha _{j} t\right) ^{\beta }. \end{aligned}$$

Take \(a_{i}=\frac{a_{j} \alpha _{j}^{\beta }}{\alpha _{i}^{\beta }}\), then

$$\begin{aligned} \exists \quad a_{i} \ne a_{j} : a_{i} \left( \alpha _{i} t\right) ^{\beta } = a_{j} \left( \alpha _{j} t\right) ^{\beta } \Rightarrow F_{\mathcal {EGW}}\left( t; \varvec{\theta }_{i}\right) = F_{\mathcal {EGW}}\left( t; \varvec{\theta }_{j}\right) . \end{aligned}$$

Therefore, the \(\varvec{\varTheta }_{\mathcal {EGW}}\) is not identifiable.
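The argument above can be checked numerically. The sketch below is an illustration only; the values of \(a_j\), \(\alpha_j\), \(b\), \(\beta\) and \(\alpha_i\) are arbitrary choices, and the function name is hypothetical. It picks \(a_i = a_j \alpha_j^{\beta} / \alpha_i^{\beta}\) as in the proof and compares the two cdfs of Eq. (3) on a grid.

```python
import math

def egw_cdf(t, a, b, alpha, beta):
    """Eq. (3): {1 - exp[-a (alpha t)^beta]}^b."""
    return (1.0 - math.exp(-a * (alpha * t) ** beta)) ** b

b, beta = 3.0, 5.0
a_j, alpha_j = 2.0, 4.0
alpha_i = 2.0                                   # any positive value different from alpha_j
a_i = a_j * alpha_j ** beta / alpha_i ** beta   # choice used in the proof

# Two different parameter vectors, compared on a grid of t values.
grid = [0.05 * k for k in range(1, 21)]
max_gap = max(
    abs(egw_cdf(t, a_i, b, alpha_i, beta) - egw_cdf(t, a_j, b, alpha_j, beta))
    for t in grid
)
print(a_i, max_gap)
```

Here \(\left( a_i, \alpha_i \right) = (64, 2)\) and \(\left( a_j, \alpha_j \right) = (2, 4)\) differ, yet the two cdfs agree up to floating-point rounding, since both give \(a \alpha^{\beta} = 2048\).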

The \(\mathcal {EW}\) distribution and a study on identifiability

The reparameterization \(c=\alpha a^{\frac{1}{\beta }}\) solves the problem of identifiability (see the work of [23]), where a is the parameter introduced by the \(\mathcal {EG}\) class. Without this reparameterization, infinitely many pairs of values of a and \(\alpha\) satisfy the relation \(c^{\beta }=a \alpha ^{\beta }\) for a fixed value of c. With this relation, it is possible to rewrite Eq. (3), obtaining the \(\mathcal {EW}\) cdf:

$$\begin{aligned} F_{\mathcal {EW}}(t; b, c, \beta ) = \left\{ 1-\exp \left[ -(c t)^{\beta } \right] \right\} ^b, \end{aligned}$$
(5)

wherein \(b > 0\) and \(\beta > 0\) are shape parameters, and \(c > 0\) is the scale parameter. Hence, the \(\mathcal {EW}\) distribution has three parameters, and its pdf is given by

$$\begin{aligned} f_{\mathcal {EW}}(t; b, c, \beta ) = \beta \, b\, c^\beta \, t^{\left( \beta -1\right) } \exp \left[ -(c t)^{\beta } \right] \left\{ 1-\exp \left[ -(c t)^{\beta } \right] \right\} ^{b-1}. \end{aligned}$$
(6)

Consider that \(\varvec{\varTheta }_{\mathcal {EW}}\) is the parametric space of the \(\mathcal {EW}\) distribution, \(\varGamma\) is a specific set of indices and \(\varvec{\theta }_{i}=\left( b_{i}, c_{i}, \beta _{i} \right) \in \varvec{\varTheta }_{\mathcal {EW}}\) where \(b_{i}, c_{i}, \beta _{i}>0\) for all \(i \in \varGamma\). Let \(F_{\varvec{\varTheta }_{\mathcal {EW}}}=\left\{ F_{\mathcal {EW}}\left( t; \varvec{\theta }_{i}\right) : \varvec{\theta }_{i} \in \varvec{\varTheta }_{\mathcal {EW}}, \forall i \in \varGamma \right\}\) be a family of cdfs of the \(\mathcal {EW}\) distribution. If there exist \(i,j \in \varGamma\) with \(\varvec{\theta }_{i} \ne \varvec{\theta }_{j}\) such that \(F_{\mathcal {EW}}\left( t; \varvec{\theta }_{i}\right) = F_{\mathcal {EW}}\left( t; \varvec{\theta }_{j}\right)\) for all \(t > 0\), we say that \(\varvec{\varTheta }_{\mathcal {EW}}\) is not identifiable; otherwise, it is identifiable.

The vector \(\varvec{\theta }_{i}\) can differ from \(\varvec{\theta }_{j}\) in seven ways, one for each nonempty subset of the three parameters. Consider first Case 1. Let \(\varvec{\theta }_{i}\) and \(\varvec{\theta }_{j}\) be such that \(\varvec{\theta }_{i} \ne \varvec{\theta }_{j}\) with \(b_{i} \ne b_{j}\), \(c_{i} = c_{j} = c\) and \(\beta _{i} = \beta _{j} = \beta\). Then, from this hypothesis, we have the following chain of implications:

$$\begin{aligned}&b_{i} \ne b_{j} \Rightarrow \left\{ 1-\exp \left[ -(c t)^{\beta } \right] \right\} ^{b_{i}} \ne \left\{ 1-\exp \left[ -(c t)^{\beta } \right] \right\} ^{b_{j}} \\&\qquad \Rightarrow F_{\mathcal {EW}}\left( t; \varvec{\theta }_{i}\right) \ne F_{\mathcal {EW}}\left( t; \varvec{\theta }_{j}\right) . \end{aligned}$$

Table 1 summarizes the proof of identifiability for each of the other cases from the hypothesis, and also displays its appropriate implications.

Table 1 Proof that \(\varvec{\varTheta }_{\mathcal {EW}}\) is identifiable

Therefore, the \(\varvec{\varTheta }_{\mathcal {EW}}\) is identifiable.

Note that \(F_{\mathcal {EGW}}\) and \(F_{\mathcal {EW}}\) define the same function of t: they have the same domain and image set and, as shown next, take the same values. However, because \(F_{\mathcal {EW}}\) is identifiable, its parameters can be estimated reliably, which is not the case for \(F_{\mathcal {EGW}}\). Consider \(F_{\mathcal {EGW}}\left( t; \varvec{\theta }\right)\) for all \(t > 0\) and \(\varvec{\theta }=\left( a, b, \alpha , \beta \right)\). Then,

$$\begin{aligned} F_{\mathcal {EGW}}\left( t; \varvec{\theta }\right) =\left\{ 1-\exp \left[ -a(\alpha t)^{\beta } \right] \right\} ^b =\left\{ 1-\exp \left[ -a \alpha ^{\beta } t^{\beta } \right] \right\} ^b. \end{aligned}$$

Let \(c^{\beta }=a \alpha ^{\beta }\) where \(c > 0\), hence we have that

$$\begin{aligned} F_{\mathcal {EGW}}\left( t; \varvec{\theta }\right) =\left\{ 1-\exp \left[ -c^{\beta } t^{\beta } \right] \right\} ^b =\left\{ 1-\exp \left[ -\left( c t\right) ^{\beta } \right] \right\} ^b =F_{\mathcal {EW}}\left( t; \varvec{\theta }' \right) \end{aligned}$$

where \(\varvec{\theta }'=\left( b, c, \beta \right)\).

Therefore, \(F_{\mathcal {EGW}}\left( t; \varvec{\theta }\right) =F_{\mathcal {EW}}\left( t; \varvec{\theta }' \right)\) for all \(t > 0\).
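As a small worked check of this equality, the sketch below maps \(\varvec{\theta } = (a, b, \alpha , \beta )\) to \(\varvec{\theta }' = (b, c, \beta )\) with \(c = \alpha a^{1/\beta }\) and compares Eq. (3) with Eq. (5). The helper name `egw_to_ew` is hypothetical, and the parameter values are the ones adopted later in the simulation study for the \(\mathcal {EGW}\) model.

```python
import math

def egw_cdf(t, a, b, alpha, beta):
    """Eq. (3)."""
    return (1.0 - math.exp(-a * (alpha * t) ** beta)) ** b

def ew_cdf(t, b, c, beta):
    """Eq. (5)."""
    return (1.0 - math.exp(-((c * t) ** beta))) ** b

def egw_to_ew(a, b, alpha, beta):
    """Map theta = (a, b, alpha, beta) to theta' = (b, c, beta), c^beta = a alpha^beta."""
    c = alpha * a ** (1.0 / beta)
    return b, c, beta

theta = (2.0, 3.0, 4.0, 5.0)          # (a, b, alpha, beta)
theta_prime = egw_to_ew(*theta)        # (b, c, beta)

grid = [0.05 * k for k in range(1, 21)]
max_gap = max(abs(egw_cdf(t, *theta) - ew_cdf(t, *theta_prime)) for t in grid)
print(theta_prime, max_gap)
```

For these values \(c = 4 \cdot 2^{1/5} \approx 4.59\), and the four-parameter and three-parameter cdfs coincide up to rounding.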

Results and discussion

Monte Carlo simulations based on \(\mathcal {EGW}\) and \(\mathcal {EW}\) models

Computational experiments play an important role in probability and statistics since they can verify the validity of a hypothesis, examine the performance of something new or demonstrate a known truth. In this section, we present the estimates of the parameters under the maximum likelihood method for the \(\mathcal {EGW}\) and \(\mathcal {EW}\) models. They were obtained via BFGS, SANN, and Nelder–Mead, as implemented in the R optim function [24]. For this, we implemented two other functions to automate the simulations: fitDist and getSimulation. The pseudo-codes of those algorithms as well as these functions can be seen in “Appendix.” Nowadays, with the available computational resources, such as parallel processing on many cores and multiple processes, it is possible to speed up computational simulations. Therefore, we ran the simulations on parallel processes to exploit high-performance computing and runtime optimization. The results of the simulations as well as their execution times were gathered on a notebook with an Intel® Core\(^{{\mathrm{TM}}}\) i5-7200U CPU, 2.50 GHz, 2712 MHz, 2 cores, 4 logical processors, 8.00 GB RAM, Microsoft® Windows 10 Home Single Language, x64 system, R\(^{\copyright }\) version 3.6.1, and RStudio\(^{\copyright }\) version 1.2.5001.

Simulation for the \(\mathcal {EGW}\) distribution

Samples of size \(50,\, 100,\, 500\,\,\, \text{ and } \,\,\,1000\) were obtained using the \(\mathcal {EGW}\) qf given by

$$\begin{aligned} Q_{\mathcal {EGW}}\left( q\right) =\left\{ -\frac{1}{a \alpha ^\beta } \log \left[ 1-q^{\frac{1}{b}}\right] \right\} ^{\frac{1}{\beta }}, \end{aligned}$$
(7)

where q takes random values from a \(U\left( 0,1\right)\), adopting \(a=2\), \(b=3\), \(\alpha =4\) and \(\beta =5\). The estimates were acquired by the maximum likelihood method via BFGS, SANN, and Nelder–Mead.
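The sampling step can be sketched with the inverse transform method. This is a minimal Python illustration, not the authors' R implementation: Eq. (7) is evaluated with the exponent pulled out of the logarithm, i.e., as \(\left\{ -\log \left[ 1-q^{1/b}\right] / (a \alpha ^{\beta }) \right\} ^{1/\beta }\), and the function names and the seed are assumptions of this example.

```python
import math
import random

def egw_cdf(t, a, b, alpha, beta):
    """Eq. (3)."""
    return (1.0 - math.exp(-a * (alpha * t) ** beta)) ** b

def egw_quantile(q, a, b, alpha, beta):
    """Eq. (7) for q strictly inside (0, 1)."""
    return (-math.log(1.0 - q ** (1.0 / b)) / (a * alpha ** beta)) ** (1.0 / beta)

def egw_sample(n, a, b, alpha, beta, seed=2021):
    """Draw n values by plugging U(0, 1) draws into the quantile function."""
    rng = random.Random(seed)
    return [egw_quantile(rng.random(), a, b, alpha, beta) for _ in range(n)]

params = dict(a=2.0, b=3.0, alpha=4.0, beta=5.0)   # values adopted in the text
sample = egw_sample(1000, **params)

# Sanity checks: the cdf at the theoretical median is 0.5, and roughly
# half of the simulated values fall below that median.
median = egw_quantile(0.5, **params)
frac_below = sum(x <= median for x in sample) / len(sample)
print(median, egw_cdf(median, **params), frac_below)
```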

Figures 1 and 2 display the histogram from simulated data of the \(\mathcal {EGW}\) distribution with density for the \(\mathcal {EGW}\) distribution and the empirical distribution for data set size of \(50,\, 100,\, 500\,\,\, \text{ and }\,\,\, 1000\). The histogram was obtained using the qf of the \(\mathcal {EGW}\) distribution, and the algorithms BFGS, SANN, and Nelder–Mead obtained estimates via MLE.

Fig. 1 Estimated densities for \(\mathcal {EGW}\) distribution and the distribution of the empirical values for the sets of simulated data of sizes 50 and 100

Fig. 2 Estimated densities for \(\mathcal {EGW}\) distribution and the distribution of the empirical values for the sets of simulated data of sizes 500 and 1000

Next, we present the results of the parameter estimation using the \(\mathcal {EGW}\) distribution. The BFGS method for estimating parameter a proved to be inefficient, even with the increase in the number of simulated data. For parameter b, the estimates showed reasonable results for 500 and 1000 simulated data. However, the method was not satisfactory regarding the \(\alpha\) parameter. Finally, a reasonable result was obtained for the \(\beta\) parameter only for 1000 simulated data.

Regarding the SANN method, the estimation was inefficient for the parameters a and \(\alpha\). The estimates for parameter b were reasonable only from 500 simulated data. For the \(\beta\) parameter, there was a reasonable estimate only when 1000 simulated data was reached.

The Nelder–Mead method did not give satisfactory results for the estimation of parameters a and \(\alpha\). However, it presented a reasonable estimate for parameter b from 500 simulated data, as well as for the \(\beta\) parameter, but only for 1000 simulated data.

In the simulations concerning the estimation of the parameters of the \(\mathcal {EGW}\) distribution, we obtained 81.25% (39/48) of inefficient estimates, 18.75% (9/48) of reasonable estimates and none satisfactory.

The graphs of all methods showed equivalent adjustments; more details are available in “Appendix.” See Table 2 including the standard error (SE) and the mean squared error (MSE) and Figs. 1, 2.

Simulation for \(\mathcal {EW}\) distribution

Although it is a well-known model and numerous other models generalize it, to our knowledge, simulation studies have not been carried out with the \(\mathcal {EW}\) distribution. Samples of size 50, 100, 500, and 1000 were obtained using the qf of the \(\mathcal {EW}\) distribution. The results of the simulations are presented in Table 3. The \(\mathcal {EW}\) qf is given by

$$\begin{aligned} Q_{\mathcal {EW}}\left( q\right) =\left\{ -\frac{1}{c^\beta } \log \left[ 1-q^{\frac{1}{b}}\right] \right\} ^{\frac{1}{\beta }}, \end{aligned}$$
(8)

where q takes random values from a \(U\left( 0,1\right)\), adopting \(b=3\), \(c=4\), and \(\beta =5\). We obtain points of the \(\mathcal {EW}\) distribution given by (8).

Figures 3 and 4 present the histogram from simulated data of the \(\mathcal {EW}\) distribution density and the empirical distribution for data size of 50, 100, 500, and 1000 using the \(\mathcal {EW}\) qf, and BFGS, SANN, and Nelder–Mead performed the estimates via MLE.

Fig. 3 Estimated densities for \(\mathcal {EW}\) distribution and the distribution of the empirical values for the sets of simulated data of sizes 50 and 100

Fig. 4 Estimated densities for \(\mathcal {EW}\) distribution and the distribution of the empirical values for the sets of simulated data of sizes 500 and 1000

The estimation of the parameters of the \(\mathcal {EW}\) distribution presented the following results.

For the BFGS method, with only 1000 simulated data, there was a reasonable result in estimating parameter b. Regarding parameter c, with 500 simulated data, we observed a reasonable estimate. However, for 1000 observations, the BFGS method had a satisfactory result. Regarding the \(\beta\) parameter, the estimates were reasonable only from 500 simulated data.

With respect to the SANN method, the estimates for parameter b were reasonable only for 1000 simulated data. For parameter c, there was a reasonable estimate for 500 simulated data. However, for 1000 simulated data, the estimation was satisfactory. For 500 simulated data onwards, the \(\beta\) parameter estimates were reasonable.

Finally, for the Nelder–Mead method, the estimation of parameter b was reasonable only for 1000 simulated data. The estimates for parameter c were reasonable and satisfactory, for 500 and 1000 simulated data, respectively. From 500 simulated data, the estimates for the \(\beta\) parameter were reasonable.

For the simulations generated for the \(\mathcal {EW}\) distribution, we obtained 58.33% (21/36) inefficient estimates, 33.33% (12/36) reasonable and 8.33% (3/36) satisfactory.

Thus, we can observe that the identifiable parameterization of the \(\mathcal {EW}\) distribution provided better results in the simulations, as it decreased the proportion of inefficient estimates (81.25% \(\rightarrow\) 58.33%) and increased the proportions of reasonable estimates (18.75% \(\rightarrow\) 33.33%) and satisfactory estimates (0% \(\rightarrow\) 8.33%).

The ratios between the execution times (in seconds) of the simulations of the \(\mathcal {EGW}\) and \(\mathcal {EW}\) distributions were as follows: 61,052/31,845 (1.92), 164,702/55,106 (2.99), 397,079/231,317 (1.72), and 590,006/390,454 (1.51). These results show that the \(\mathcal {EW}\) distribution requires a shorter execution time, between 1.5 and 3 times shorter in these runs. Thus, the identifiability of the \(\mathcal {EW}\) distribution has the additional advantage of reducing the time needed to run computer simulations.

Application with the \(\mathcal {EGW}\) distribution and the \(\mathcal {EW}\) distribution

In this section, we analyze a real data set of Nelore cattle [25] using the \(\mathcal {EGW}\) distribution and the \(\mathcal {EW}\) distribution. The maximum likelihood estimates were computed with the BFGS, SANN, and Nelder–Mead algorithms. The commercial production of beef in Brazil, which mostly originates from the Nelore breed, seeks to optimize the time the calves take to reach a specific weight in the period from birth to weaning. The data comprise 69 Nelore bulls and record the time (in days) until the animals achieved the weight of 160 kg in the period from birth to weaning.

Fig. 5 (a) Estimated density for \(\mathcal {EGW}\) distribution and the empirical distribution for the set of Nelore data. (b) Estimated survival function for \(\mathcal {EGW}\) distribution and the Kaplan–Meier distribution for the set of Nelore data with a 95% confidence interval

Fig. 6 (a) Estimated density for \(\mathcal {EW}\) distribution and the empirical distribution for the set of Nelore data. (b) Estimated survival function for \(\mathcal {EW}\) distribution and the Kaplan–Meier distribution for the set of Nelore data with a 95% confidence interval

Figure 5 exhibits the fitted \(\mathcal {EGW}\) model, and the parameter estimates are given in Table 4 (in “Appendix”). One can note that the BFGS method gave a better fit to the empirical function and to the histogram than the other methods considered in this article.

Analyzing the plots in Fig. 6 and the results in Table 5 (in “Appendix”), it is observed that the Nelder–Mead method fitted the \(\mathcal {EW}\) distribution to the histogram and the empirical function better than the other methods. Notwithstanding, the estimation of the parameters by the Nelder–Mead method did not produce results for the SE of parameters b and c. Hence, as the estimation of the parameters by the BFGS method gave the second-best fit and also produced the SE for the parameters b, c and \(\beta\), one can consider that the BFGS method provided the most suitable fit for the data via the \(\mathcal {EW}\) distribution.

Table 4 (in “Appendix”) shows that the Nelder–Mead method was able to estimate the parameters of the \(\mathcal {EGW}\) distribution but failed to report the SE, since the Hessian it produced contained NaN (Not a Number) in the first row and the first column, which correspond to the parameter a.

This suggests that the solution found by the Nelder–Mead method is not reliable, in this case, and consequently, that the model adjusted by the estimates of the parameters found is not suitable for these data. This fact may be related to the lack of identifiability of the \(\mathcal {EGW}\) distribution.
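One way to see how the lack of identifiability can affect the Hessian is to evaluate the \(\mathcal {EGW}\) log-likelihood along a curve where \(a \alpha ^{\beta }\) is held fixed. The sketch below is an illustration under stated assumptions and not a reproduction of the authors' analysis: the data values are made up (they are not the Nelore data), the function names are hypothetical, and the log-density is taken from Eq. (4).

```python
import math

def egw_logpdf(t, a, b, alpha, beta):
    """Logarithm of the pdf in Eq. (4)."""
    z = a * (alpha * t) ** beta
    return (math.log(a) + math.log(b) + math.log(beta) + beta * math.log(alpha)
            + (beta - 1.0) * math.log(t) - z
            + (b - 1.0) * math.log(-math.expm1(-z)))   # log(1 - exp(-z))

def egw_loglik(data, a, b, alpha, beta):
    return sum(egw_logpdf(t, a, b, alpha, beta) for t in data)

data = [0.21, 0.25, 0.28, 0.30, 0.33]   # made-up values, for illustration only
b, beta = 3.0, 5.0
k = 2048.0                               # fixed value of a * alpha^beta

# Move along the curve a * alpha^beta = k and record the log-likelihood.
ridge = []
for alpha in (1.0, 2.0, 4.0, 8.0):
    a = k / alpha ** beta
    ridge.append(egw_loglik(data, a, b, alpha, beta))

spread = max(ridge) - min(ridge)
print(ridge, spread)
```

The log-likelihood does not change along this curve, so it has no curvature in that direction. The Hessian is therefore singular at any maximizer, which is consistent with a standard error for a that cannot be computed.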

Conclusions

In this study, we presented a technique to reduce the number of parameters of the exponentiated generalized Weibull distribution \(\mathcal {(EGW)}\). Additionally, we identified that the exponentiated Weibull distribution \(\mathcal {(EW)}\) is more parsimonious than the \(\mathcal {EGW}\) and, unlike it, identifiable in its parameters. The performances of the two distributions were analyzed using simulated data and a real dataset; the \(\mathcal {EW}\) performed slightly better with simulated data and slightly worse with real data.

Availability of data and materials

All data generated or analyzed during this study are included in this published article.

Abbreviations

cdf: cumulative distribution function

hrf: hazard rate function

\(\mathcal {EW}\): exponentiated Weibull

\(\mathcal {EGW}\): exponentiated generalized Weibull

qf: quantile function

pdf: probability density function

BFGS: Broyden–Fletcher–Goldfarb–Shanno

SANN: simulated annealing

SE: standard error

MSE: mean squared error

References

1. Mudholkar, G.S., Srivastava, D.K., Freimer, M.: The exponentiated Weibull family: a reanalysis of the bus-motor-failure data. Technometrics 37(4), 436–445 (1995)

2. Xie, M., Lai, C.D.: Reliability analysis using an additive Weibull model with bathtub-shaped failure rate function. Reliab. Eng. Syst. Saf. 52(1), 87–93 (1996)

3. Carrasco, J.F., Ortega, E.M.M., Cordeiro, G.M.: A generalized modified Weibull distribution for lifetime modeling. Comput. Stat. Data Anal. 53(2), 450–462 (2008)

4. Gusmão, F.R.S., Ortega, E.M.M., Cordeiro, G.M.: The generalized inverse Weibull distribution. Stat. Pap. 52, 591–619 (2011)

5. Cordeiro, G.M., Ortega, E.M.M., Cunha, D.C.: The exponentiated generalized class of distributions. J. Data Sci. 11, 1–27 (2013)

6. Eugene, N., Lee, C., Famoye, F.: Beta-normal distribution and its applications. Commun. Stat. Theory Methods 31, 497–512 (2002)

7. Cordeiro, G.M., Lemonte, A.J.: The exponentiated generalized Birnbaum–Saunders distribution. Appl. Math. Comput. 247, 762–779 (2014)

8. Silva, R., Gomes-Silva, F., Ramos, M., Cordeiro, G.M.: A new extended gamma generalized model. Int. J. Pure Appl. Math. 100(2), 309–335 (2015)

9. Andrade, T., Rodrigues, H., Bourguignon, M., Cordeiro, G.M.: The exponentiated generalized Gumbel distribution. Rev. Colomb. Estad. 38(1), 123–143 (2015)

10. Gomes-Silva, F., Silva, R., Percontini, A., Ramos, M., Cordeiro, G.M.: An extended Dagum distribution: properties and applications. Int. J. Appl. Math. Stat. 56, 35–56 (2017)

11. Oguntunde, P.E., Odetunmibi, O.A., Adejumo, A.O.: On the exponentiated generalized Weibull distribution: a generalization of the Weibull distribution. Indian J. Sci. Technol. 8(35), 67611 (2015)

12. Andrade, T.A.N., Bourguignon, M., Cordeiro, G.M.: The exponentiated generalized extended exponential distribution. J. Data Sci. 14, 393–414 (2016)

13. Cordeiro, G.M., Lemonte, A., Campelo, A.K.: Extended arcsine distribution to proportional data: properties and applications. Stud. Sci. Math. Hung. 53, 440–466 (2017)

14. Cordeiro, G.M., Andrade, T.A.N., Bourguignon, M., Gomes-Silva, F.: The exponentiated generalized standardized half-logistic distribution. Int. J. Stat. Probab. 6, 24–42 (2017)

15. Andrade, T., Zea, L.: The exponentiated generalized extended Pareto distribution. J. Data Sci. 16, 781–800 (2018)

16. Andrade, T., Gomes-Silva, F., Zea, L.: Mathematical properties, application and simulation for the exponentiated generalized standardized Gumbel distribution. Acta Sci. Technol. 41, 1807–8664 (2019)

17. Andrade, T., Chakraborty, S., Handique, L., Gomes-Silva, F.: The exponentiated generalized extended Gompertz distribution. J. Data Sci. 17, 299–330 (2019)

18. Casella, G., Berger, R.L.: Statistical Inference. Duxbury Press, Belmont (2001)

19. Jones, G., Johnson, W.O., Hanson, T.E., Christensen, R.: Identifiability of models for multiple diagnostic testing in the absence of a gold standard. Biometrics 66(3), 855–863 (2010)

20. Paulino, C.D., Pereira, C.A.B.: On identifiability of parametric statistical models. Stat. Methods Appl. J. Ital. Stat. Soc. 3, 125–151 (1994)

21. Mudholkar, G.S., Srivastava, D.K.: Exponentiated Weibull family for analyzing bathtub failure-rate data. IEEE Trans. Reliab. 42(2), 299–302 (1993)

22. Jiang, R., Murthy, D.N.P.: The exponentiated Weibull family: a graphical approach. IEEE Trans. Reliab. 48(1), 68–72 (1999)

23. Gusmão, F.R.S., Ortega, E.M.M., Cordeiro, G.M.: Reply to the Letter to the Editor of M. C. Jones. Stat. Pap. 53, 252–254 (2012)

24. R Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna (2012)

25. Colosimo, E.A., Giolo, S.R.: Análise de sobrevivência aplicada. Edgar Blucher, São Paulo (2006)

26. Broyden, C.G.: The convergence of a class of double-rank minimization algorithms 1. General considerations. IMA J. Appl. Math. 6(1), 76–90 (1970)

27. Fletcher, R.: A new approach to variable metric algorithms. Comput. J. 13(3), 317–322 (1970)

28. Goldfarb, D.: A family of variable metric updates derived by variational means. Math. Comput. 24(109), 23–26 (1970)

29. Shanno, D.F.: Conditioning of quasi-Newton methods for function minimization. Math. Comput. 24(111), 647–656 (1970)

30. Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., Teller, E.: Equation of state calculations by fast computing machines. J. Chem. Phys. 21(6), 1087–1092 (1953)

31. Kirkpatrick, S., Gelatt, C.D., Vecchi, M.P.: Optimization by simulated annealing. Science 220(4598), 671–680 (1983)

32. Bélisle, C.J.P.: Convergence theorems for a class of simulated annealing algorithms on \({\cal{R}}^d\). J. Appl. Probab. 29(4), 885–895 (1992)

33. Cortez, P.: Modern Optimization with R. Springer, Berlin (2014)

34. Nelder, J.A., Mead, R.: A simplex method for function minimization. Comput. J. 7(4), 308–313 (1965)

35. Givens, G.H., Hoeting, J.A.: Computational Statistics, 2nd edn. Wiley, London (2013)

Acknowledgements

Not applicable.

Funding

No funding was received.

Author information

Contributions

FRSG, a major contributor in writing the manuscript, was involved in writing (original draft), methodology and investigation; FGS was involved in writing (review and editing), supervision and validation; CCRB was involved in visualization, supervision and validation; FVJS was involved in methodology and visualization; JSJ was involved in methodology and software; SFAXJ was involved in writing (review and editing) and visualization; PRDM was involved in methodology and software. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Frank Gomes-Silva.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

See Tables 2, 3, 4 and 5.

Table 2 MLE estimates for the parameters of \(\mathcal {EGW}\) distribution with simulated data from \(\mathcal {EGW}\) distribution via BFGS, SANN, and Nelder–Mead algorithms
Table 3 MLE estimates for the parameters of \(\mathcal {EW}\) distribution with simulated data from \(\mathcal {EW}\) distribution via BFGS, SANN, and Nelder–Mead algorithms
Table 4 MLE estimates for the parameters of \(\mathcal {EGW}\) distribution with Nelore data via BFGS, SANN, and Nelder–Mead algorithms
Table 5 MLE estimates for the parameters of \(\mathcal {EW}\) distribution with Nelore data via BFGS, SANN, and Nelder–Mead algorithms

BFGS algorithm

Henceforward, the following notation is used: p is the number of parameters to be estimated, \(\varvec{\theta }=(\theta _1, \ldots , \theta _p)^{\top }\in \varTheta\) is the vector of unknown parameters, \(\varvec{\theta _0}=(\theta _{01}, \ldots , \theta _{0p})^{\top }\in \varTheta\) the initial guess solution, f the objective function (minimization by default) representing the log-likelihood function \(\ell (\varvec{\theta }\vert x)\), and x the dataset. The BFGS is a Quasi-Newton second-derivative line search method used to solve unconstrained optimization problems. Algorithm 1 shows the pseudo-code of the BFGS algorithm [26,27,28,29].
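A compact sketch of a BFGS iteration is given below for orientation. It is not Algorithm 1 nor the routine used by R's optim; it is a generic textbook variant under stated assumptions: the inverse Hessian approximation starts at the identity, the step length comes from a simple backtracking (Armijo) line search, and the objective is a toy quadratic chosen only for this example. In the setting of this paper, the function to minimize would be the negative log-likelihood.

```python
def bfgs(f, grad, x0, tol=1e-8, max_iter=200):
    """Minimize f with a basic BFGS update of the inverse Hessian approximation H."""
    n = len(x0)
    x = list(x0)
    H = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    g = grad(x)
    for _ in range(max_iter):
        if max(abs(gi) for gi in g) < tol:
            break
        # Quasi-Newton search direction p = -H g.
        p = [-sum(H[i][j] * g[j] for j in range(n)) for i in range(n)]
        # Backtracking line search with the Armijo sufficient-decrease test.
        fx = f(x)
        slope = sum(gi * pi for gi, pi in zip(g, p))
        step = 1.0
        while (f([xi + step * pi for xi, pi in zip(x, p)]) > fx + 1e-4 * step * slope
               and step > 1e-12):
            step *= 0.5
        x_new = [xi + step * pi for xi, pi in zip(x, p)]
        g_new = grad(x_new)
        s = [u - v for u, v in zip(x_new, x)]
        y = [u - v for u, v in zip(g_new, g)]
        sy = sum(si * yi for si, yi in zip(s, y))
        if sy > 1e-12:  # curvature condition; otherwise keep H unchanged
            rho = 1.0 / sy
            Hy = [sum(H[i][j] * y[j] for j in range(n)) for i in range(n)]
            yHy = sum(yi * hyi for yi, hyi in zip(y, Hy))
            # H <- (I - rho s y^T) H (I - rho y s^T) + rho s s^T, expanded for symmetric H.
            H = [[H[i][j] - rho * (s[i] * Hy[j] + Hy[i] * s[j])
                  + rho * (1.0 + rho * yHy) * s[i] * s[j]
                  for j in range(n)] for i in range(n)]
        x, g = x_new, g_new
    return x

# Toy objective (an assumption of this sketch), minimized at (1, -2).
def toy(x):
    return (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2

def toy_grad(x):
    return [2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)]

solution = bfgs(toy, toy_grad, [0.0, 0.0])
print(solution)
```

The method only uses gradients; second-derivative information enters through the successive updates of H.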

SANN algorithm

Annealing is a physical process in which metals are heated to high temperatures and then cooled slowly, producing a homogeneous material. The simulated annealing (SA) algorithm was originally proposed by [30] and was later developed by [31] in the context of optimization problems. The SANN is a variant of SA given in [32], and its pseudo-code is presented in Algorithm 2, adapted from [33].
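The idea can be sketched in a few lines. The code below is a generic simulated annealing loop and not a transcription of Algorithm 2 or of R's implementation. It assumes a logarithmic cooling schedule and a Gaussian proposal whose spread is proportional to the temperature, which is how the R optim documentation describes SANN as far as we understand it. The function names, default settings and the toy objective are assumptions of this example.

```python
import math
import random

def simulated_annealing(f, x0, n_iter=20000, temp=1.0, tmax=10, seed=0):
    """Minimize f by simulated annealing; returns the best point visited."""
    rng = random.Random(seed)
    x, fx = list(x0), f(x0)
    best, f_best = list(x), fx
    for it in range(1, n_iter + 1):
        # Logarithmic cooling, held fixed over blocks of tmax iterations.
        t = temp / math.log(((it - 1) // tmax) * tmax + math.e)
        # Gaussian proposal whose spread shrinks with the temperature.
        cand = [xi + rng.gauss(0.0, t) for xi in x]
        f_cand = f(cand)
        # Metropolis rule: always accept improvements, sometimes accept worse points.
        if f_cand <= fx or rng.random() < math.exp(-(f_cand - fx) / t):
            x, fx = cand, f_cand
            if fx < f_best:
                best, f_best = list(x), fx
    return best, f_best

# Toy objective (an assumption of this sketch), minimized at (1, -2).
def toy_objective(x):
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2

start = [5.0, 5.0]
best, f_best = simulated_annealing(toy_objective, start)
print(best, f_best)
```

Unlike BFGS, the method needs no derivatives, at the cost of many more function evaluations, which may partly explain long run times when it is used for likelihood maximization.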

Nelder–Mead algorithm

The Nelder–Mead simplex method [34] is an unconstrained optimization algorithm that belongs to the more general class of direct search methods, whose objective is to find the minimum of a function f. Algorithm 3 shows the pseudo-code of the Nelder–Mead algorithm [35].
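A minimal sketch of the simplex iteration follows. It uses the standard reflection, expansion, contraction and shrink coefficients (1, 2, 0.5 and 0.5) and is a textbook variant rather than a copy of Algorithm 3 or of R's implementation. The stopping rule, the initial simplex and the toy objective are assumptions made for this example.

```python
def nelder_mead(f, x0, step=0.5, tol=1e-10, max_iter=2000):
    """Minimize f by the Nelder-Mead simplex method; returns (point, value)."""
    n = len(x0)
    # Initial simplex: x0 plus one vertex displaced along each coordinate axis.
    simplex = [list(x0)]
    for i in range(n):
        v = list(x0)
        v[i] += step
        simplex.append(v)
    values = [f(v) for v in simplex]

    for _ in range(max_iter):
        # Order vertices from best (lowest f) to worst.
        order = sorted(range(n + 1), key=lambda k: values[k])
        simplex = [simplex[k] for k in order]
        values = [values[k] for k in order]
        if abs(values[-1] - values[0]) < tol:
            break
        worst = simplex[-1]
        # Centroid of all vertices except the worst one.
        centroid = [sum(v[i] for v in simplex[:-1]) / n for i in range(n)]

        def along(coef):
            # Point centroid + coef * (centroid - worst).
            return [c + coef * (c - w) for c, w in zip(centroid, worst)]

        xr = along(1.0)                      # reflection
        fr = f(xr)
        if fr < values[0]:
            xe = along(2.0)                  # expansion
            fe = f(xe)
            if fe < fr:
                simplex[-1], values[-1] = xe, fe
            else:
                simplex[-1], values[-1] = xr, fr
        elif fr < values[-2]:
            simplex[-1], values[-1] = xr, fr
        else:
            # Outside contraction if the reflected point beats the worst, else inside.
            xc = along(0.5 if fr < values[-1] else -0.5)
            fc = f(xc)
            if fc < min(fr, values[-1]):
                simplex[-1], values[-1] = xc, fc
            else:
                # Shrink every vertex halfway toward the best one.
                best = simplex[0]
                simplex = [best] + [[b + 0.5 * (vi - b) for b, vi in zip(best, v)]
                                    for v in simplex[1:]]
                values = [values[0]] + [f(v) for v in simplex[1:]]

    k = min(range(n + 1), key=lambda idx: values[idx])
    return simplex[k], values[k]

# Toy objective (an assumption of this sketch), minimized at (1, -2).
def toy_objective(x):
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2

x_min, f_min = nelder_mead(toy_objective, [0.0, 0.0])
print(x_min, f_min)
```

Because the method only compares function values, it returns a point estimate but no curvature information; standard errors then depend on a separately computed Hessian.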

fitDist function

Algorithm 4 shows the pseudo-code of the fitDist function. This function is used to obtain the parameter estimates as well as their log-likelihood, variance, confidence interval and MSE.

getSimulation function

Algorithm 5 shows the pseudo-code of the getSimulation function. This is the main routine for generating the simulations of the distributions.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Gusmão, F.R.S.d., Gomes-Silva, F., Brito, C.C.R.d. et al. Analyzing and solving the identifiability problem in the exponentiated generalized Weibull distribution. J Egypt Math Soc 29, 21 (2021). https://doi.org/10.1186/s42787-021-00130-x

Keywords

  • Exponentiated
  • Generalized
  • Weibull
  • Identifiability
  • Simulation
  • Application

Mathematics Subject Classification

  • 62F10
  • 97K50
  • 97K60