Title: | Maximum Likelihood and Bayesian Estimation of Univariate Probability Distributions |
---|---|
Description: | Estimate parameters of univariate probability distributions with maximum likelihood and Bayesian methods. |
Authors: | Francois Aucoin |
Maintainer: | Thomas Petzoldt <[email protected]> |
License: | GPL-2 |
Version: | 1.3.7 |
Built: | 2024-11-10 04:45:02 UTC |
Source: | https://github.com/tpetzoldt/famle |
This package contains a series of functions that might be useful in carrying out maximum likelihood and Bayesian estimations of univariate probability distributions.
Francois Aucoin (author and original maintainer), Thomas Petzoldt (actual maintainer, applied formal changes to pass CRANs package check). Many thanks to the original author for his agreement.
This function allows the user to obtain draws from the (parametric) bootstrap distribution of the fitted model's parameters.
boot.mle(model, B = 200, seed = NULL, start = NULL, method = "Nelder-Mead")
boot.mle(model, B = 200, seed = NULL, start = NULL, method = "Nelder-Mead")
model |
|
B |
Requested number of bootstrap samples. |
seed |
A seed may be specified (see |
start |
Starting values for the optimization algorithm (if |
method |
Parametric bootstrap – see References.
model |
|
B |
Requested number of bootstrap samples. |
seed |
The specified seed (see |
par.star |
Array containing realized values from the bootstrap distribution of the maximum likelihood parameter estimators. |
gof |
The bootstrap distributions of two goodness-of-fit statistics: Anderson-Darling statistic and Pearson's correlation coefficient for the pair ("observed quantiles","fitted quantiles"). |
p.value |
Bootstrap p-values for the two goodness-of-fit statistics. |
failure.rate |
The proportion of bootstrap samples for which optimization failed using the specified starting values. |
total.time |
The total amount of time required to generate |
Davison, A.C., and Hinkley, D.V. (1997). Bootstrap methods and their application. Cambridge University Press.
data(yarns) x <- yarns$x fit.x <- mle(x,'weibull',c(.1,.1)) boot.x <- boot.mle(fit.x,B=10) boot.x$par.star boot.x$p.value
data(yarns) x <- yarns$x fit.x <- mle(x,'weibull',c(.1,.1)) boot.x <- boot.mle(fit.x,B=10) boot.x$par.star boot.x$p.value
This dataset is taken from Coles (2001) (also see references therein), and consists of 64 sea level (in meters) yearly maxima for the time period 1923-1987.
data(ColesData)
data(ColesData)
A data.frame
containing two columns named year
and sea.level
(in meters).
Coles (2001), page 4 (also see references therein).
Coles, S. (2001). An introduction to statistical modeling of extreme values. Springer.
This function can be used to call any of the 4 functions specific to a given probability distribution available in R.
distr(x, dist, param, type = "d", model = NULL, ...)
distr(x, dist, param, type = "d", model = NULL, ...)
x |
Vector (or array) of quantiles, vector (or array) of probabilities, or number of observations. |
dist |
Distribution name. |
param |
Vector (or array) of parameters. |
type |
Type of function to be called ( |
model |
Object from the class |
... |
Additional arguments |
For each distribution available in R, 4 functions can be called. For example, for the normal distribution, the following 4 functions are available: dnorm
, pnorm
, qnorm
, and rnorm
. For the normal distribution, based on the argument type
, distr
may be used to call any one of the previous four functions.
Returns the density, the distribution function, the quantile function, or random variates.
Most functions in FAmle
rely upon distr
.
## Example 1 dnorm(-4:4,0,1,log=TRUE) distr(-4:4,'norm',c(0,1),type='d',log=TRUE) ## Example 2 mu.vec <- c(1,100,100) sigma.vec <- c(1,11,111) n <- 3 set.seed(123) rnorm(n,mu.vec,sigma.vec) set.seed(123) distr(n,'norm',cbind(mu.vec,sigma.vec),'r') ## Example 3 qnorm(.9,mu.vec,sigma.vec) distr(.9,'norm',cbind(mu.vec,sigma.vec),'q')
## Example 1 dnorm(-4:4,0,1,log=TRUE) distr(-4:4,'norm',c(0,1),type='d',log=TRUE) ## Example 2 mu.vec <- c(1,100,100) sigma.vec <- c(1,11,111) n <- 3 set.seed(123) rnorm(n,mu.vec,sigma.vec) set.seed(123) distr(n,'norm',cbind(mu.vec,sigma.vec),'r') ## Example 3 qnorm(.9,mu.vec,sigma.vec) distr(.9,'norm',cbind(mu.vec,sigma.vec),'q')
Internal functions in the FAmle package .
cdf.plot(z) delta.Q(p, model, ln = FALSE) delta.QQ(model, alpha = 0.1, ln = FALSE) Diff.1(x, f, h = 1e-04) Diff.2(k, i, model, p, ln = FALSE) Diff.3(i, model, p, ln = FALSE) ## S3 method for class 'metropolis' hist(x, density = TRUE, ...) ## S3 method for class 'plot' hist(x,...) Plot.post.pred(x, ...) post.pred(z, fun = NULL) Quantile.plot(z, ci = FALSE, alpha = 0.05) Return.plot(model, ci = FALSE, alpha = 0.05) Carlin(x)
cdf.plot(z) delta.Q(p, model, ln = FALSE) delta.QQ(model, alpha = 0.1, ln = FALSE) Diff.1(x, f, h = 1e-04) Diff.2(k, i, model, p, ln = FALSE) Diff.3(i, model, p, ln = FALSE) ## S3 method for class 'metropolis' hist(x, density = TRUE, ...) ## S3 method for class 'plot' hist(x,...) Plot.post.pred(x, ...) post.pred(z, fun = NULL) Quantile.plot(z, ci = FALSE, alpha = 0.05) Return.plot(model, ci = FALSE, alpha = 0.05) Carlin(x)
z |
A |
p |
A vector of probabilities. |
model |
A |
ln |
Whether or not (TRUE or FALSE) computations should be carried out on the natural logarithmic scale. |
alpha |
The significance level. |
x |
Value at which the numerical derivative should be evaluated. For the |
f |
A function to be differentiated. |
h |
Small number representing a small change in |
k |
Parameter value at which the first derivative should be evaluated. |
i |
Position of the parameter, within a vector of parameters, with respect to which differentiation should be carried out. |
density |
Whether or not (TRUE or FALSE) a Kernel density should be added to the histogram - see |
... |
Additional arguments pertaining to |
fun |
optional argument that may be used to modify the scale on which the histogram will be plotted. |
ci |
Whether or not (TRUE or FALSE) approximated 100*(1- |
floodsNB
is a list
object containing the hydrometric stations considered for analysis... Each element from the list
corresponds to an hydrometric station located in the Canadian province of New Brunswick, for which the flow is unregulated. For each station, the following information is available:
data
: Maximum annual daily mean discharge (in );
peak
: Maximum annual daily peak discharge (in );
ln.drain
: Natural logarithm of the drainage area (in );
coor
: Coordinates (in latitude and longitude) of the hydrometric station;
status
: Station's status - Active
or Inactive
;
Aucoin.2001
: Whether or not (TRUE
or FALSE
) the station is retained for analysis in ....
data(floodsNB)
data(floodsNB)
A list
object whose elements correspond to distinct hydrometric stations.
HYDAT database.
Environment and Climate Change Canada Historical Hydrometric Data web site, https://wateroffice.ec.gc.ca/mainmenu/historical_data_index_e.html
For a given dataset, this function serves to approximate (using a Metropolis algorithm) the posterior distribution of the parameters for some specified parametric probability distribution.
metropolis(model, iter = 1000, tun = 2, trans.list = NULL, start = NULL, variance = NULL, prior = NULL, burn = 0, uniroot.interval = c(-100, 100),pass.down.to.C=FALSE)
metropolis(model, iter = 1000, tun = 2, trans.list = NULL, start = NULL, variance = NULL, prior = NULL, burn = 0, uniroot.interval = c(-100, 100),pass.down.to.C=FALSE)
model |
A |
iter |
The requested number of iterations - the Markov Chain's length. |
tun |
A tuning constant; value by which the covariance matrix of the multivariate normal proposal will be multiplied - see References. |
trans.list |
A |
start |
A vector of starting values for the algorithm. If
|
variance |
Covariance matrix of the multivariate normal proposal
distribution. If |
prior |
A function that corresponds to the joint prior distribution (see Example). Note that the prior distribution will be evaluated on the transformed parameter space(s). |
burn |
Burn-in period (see References). |
uniroot.interval |
Default is c(-100,100). This interval is used
by |
pass.down.to.C |
If |
This function uses a single block Metropolis algorithm with
multivariate normal proposal. For this function to work properly, all
parameters should be defined on the real line - parameter
transformation(s) might be required. If trans.list
is not
specified, the function will assume that the parameter distributions
are all defined on the real line (i.e., function(x) x
will be
used for each parameter). If no prior distribution is provided, an
improper prior distribution - uniform on the interval )-Inf,+Inf( -
will be used for all parameters (i.e., prior distribution proportional
to 1 - function(x) 1).
In order to minimize the number of arguments for metropolis
,
the function automatically computes the inverse of trans.list
:
this suppresses the need for the user to provide both the
"inverse transformation" and the "transformation". However, problems
may occur, and it is why the user is allowed to alter
uniroot.interval
. Depending on the number of errors reported,
future versions of this package may end up requesting that a list for
both the "inverse transformation" and the "transformation" be provided
by the user.
A nice list of references is provided below for more information on topics such as: MCMC algorithms, tuning of Metropolis-Hastings algorithms, MCMC convergence diagnostics, the Bayesian paradigm ...
rate |
MCMC acceptance rate. This value is computed before
applying the burn-in; i.e., it is computed for |
total.time |
Total computation time. |
sims.all |
Array containing all iterations. |
sims |
Array containing iterations after burn-in. |
input |
Inputted |
iter |
Number of iterations. |
prior |
Prior distribution. |
burn |
Integer corresponding to the number of iterations to be discarded - burn-in period. |
M |
Parameter vector whose elements correspond to the parameter
values (on the scales specified by |
V |
Covariance matrix computed using, after removing the burn-in
period, the joint posterior distribution of the parameters (on the
scales specified by |
Gelman, A., Carlin, J.B., Stern, H.S., and Rubin, D.B. (2004). Bayesian data analysis, 2nd edition, Chapman & Hall/CRC.
Carlin, B.P, and Louis, T.A. (2009). Bayesian methods for data analysis. Chapman & Hall/CRC.
Gamerman, D., and Lopes H.F. (2006). Markov Chain Monte Carlo: Stochastic simulation for Bayesian inference. 2nd edition, Chapman & Hall/CRC.
Gilks, W.R., Richardson, S., and Spiegelhalter, D.J. (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall.
### These examples should be re-run with, e.g., iter > 2000. data(yarns) x <- yarns$x fit.x <- mle(x,'gamma',c(.1,.1)) bayes.x.no.prior <- metropolis(model=fit.x,iter=150, trans.list=list(function(x) x,function(x) exp(x))) plot(bayes.x.no.prior) # examples of prior distributions (note that these prior distribution # are specified for the transformated parameters; # i.e., in this case, 'meanlog' -> 'meanlog' and 'sdlog' -> 'ln.sdlog') # for the scale parameter only prior.1 <- function(x) dnorm(x[2],.8,.1) # for both parameters (joint but independent in this case) prior.2 <- function(x) dunif(x[1],3.4,3.6)*dnorm(x[2],1,1) bayes.x.prior.2 <- metropolis(model=fit.x,iter=150, trans.list=list(function(x) x,function(x) exp(x)),prior=prior.2) plot(bayes.x.prior.2) # Example where 'model' is not from the class 'mle'; i.e. # both 'start' and 'variance' need to be specified! #x <- rweibull(5,2,1) x <- c(0.9303492,1.0894917,0.9628029,0.6145032,0.4756699) # Here 'fit.x <- mle(x,'weibull',c(.1,.1))' is not used, model.x <- list(x=x,dist='weibull') # and an informative prior distribution is considered to ensure a proper posterior distribution prior.x <- function(x) dnorm(x[1],log(2),.1)*dnorm(x[2],log(1),.1) trans.list.x <- list(function(x) exp(x), function(x) exp(x)) bayes.x <- metropolis(model=model.x,iter=150,prior=prior.x,trans.list=trans.list.x, pass.down.to.C=TRUE,start=c(0,0),variance=diag(.1,2,2))
### These examples should be re-run with, e.g., iter > 2000. data(yarns) x <- yarns$x fit.x <- mle(x,'gamma',c(.1,.1)) bayes.x.no.prior <- metropolis(model=fit.x,iter=150, trans.list=list(function(x) x,function(x) exp(x))) plot(bayes.x.no.prior) # examples of prior distributions (note that these prior distribution # are specified for the transformated parameters; # i.e., in this case, 'meanlog' -> 'meanlog' and 'sdlog' -> 'ln.sdlog') # for the scale parameter only prior.1 <- function(x) dnorm(x[2],.8,.1) # for both parameters (joint but independent in this case) prior.2 <- function(x) dunif(x[1],3.4,3.6)*dnorm(x[2],1,1) bayes.x.prior.2 <- metropolis(model=fit.x,iter=150, trans.list=list(function(x) x,function(x) exp(x)),prior=prior.2) plot(bayes.x.prior.2) # Example where 'model' is not from the class 'mle'; i.e. # both 'start' and 'variance' need to be specified! #x <- rweibull(5,2,1) x <- c(0.9303492,1.0894917,0.9628029,0.6145032,0.4756699) # Here 'fit.x <- mle(x,'weibull',c(.1,.1))' is not used, model.x <- list(x=x,dist='weibull') # and an informative prior distribution is considered to ensure a proper posterior distribution prior.x <- function(x) dnorm(x[1],log(2),.1)*dnorm(x[2],log(1),.1) trans.list.x <- list(function(x) exp(x), function(x) exp(x)) bayes.x <- metropolis(model=model.x,iter=150,prior=prior.x,trans.list=trans.list.x, pass.down.to.C=TRUE,start=c(0,0),variance=diag(.1,2,2))
For a given dataset, this function serves to find maximum likelihood parameter estimates for some specified parametric probability distribution.
mle(x, dist, start = NULL, method = "Nelder-Mead")
mle(x, dist, start = NULL, method = "Nelder-Mead")
x |
A univariate dataset (a vector). |
dist |
Distribution to be fitted to |
start |
Starting parameter values for the optimization algorithm (see |
method |
The optimization method to be used (see |
fit |
|
x.info |
Array that contains the following columns:
|
dist |
Distribution fitted to |
par.hat |
Vector of estiamted parameters. |
cov.hat |
Observed Fisher's information matrix. |
k |
Number of parameters |
n |
Number of observations (i.e., |
log.like |
Log-likelihood value evaluated at the estimated parameter (i.e. |
aic |
Akaike information criterion computed as |
ad |
Anderson Darling statistic evaluated at the estimated parameter values. |
data.name |
Name for |
rho |
Pearson's correlation coefficient computed as |
optim
, distr
, boot.mle
, metropolis
, Q.conf.int
data(yarns) x <- yarns$x fit.x <- mle(x,'weibull',c(.1,.1)) fit.x names(fit.x) #plot(fit.x) #plot(fit.x,TRUE,alpha=.01) p <- c(.9,.95,.99) distr(p,model=fit.x,type='q') Q.conf.int(p,fit.x,.01) Q.conf.int(p,fit.x,.01,TRUE)
data(yarns) x <- yarns$x fit.x <- mle(x,'weibull',c(.1,.1)) fit.x names(fit.x) #plot(fit.x) #plot(fit.x,TRUE,alpha=.01) p <- c(.9,.95,.99) distr(p,model=fit.x,type='q') Q.conf.int(p,fit.x,.01) Q.conf.int(p,fit.x,.01,TRUE)
This function allows to user to call different plots for visual assessment of the posterior distribution(s).
## S3 method for class 'metropolis' plot(x, plot.type = "carlin", pos = 1:x$iter, ...)
## S3 method for class 'metropolis' plot(x, plot.type = "carlin", pos = 1:x$iter, ...)
x |
|
plot.type |
The user may choose betweew:
|
pos |
May be used by the user to plot a subset (i.e. a random subset, |
... |
Additional arguments pertaining to function |
See list of references for metropolis
.
data(yarns) x <- yarns$x fit.x <- mle(x,'gamma',c(.1,.1)) bayes.x <- metropolis(model=fit.x,iter=100, trans.list=list(function(x) exp(x),function(x) exp(x))) plot(bayes.x) plot(bayes.x,'hist',col='cyan') plot(bayes.x,'pairs',cex=.1,pch=19) plot(bayes.x,'pairs',pos=sample(1:bayes.x$iter,20),col='red') plot(bayes.x,'post.pred',col='green')
data(yarns) x <- yarns$x fit.x <- mle(x,'gamma',c(.1,.1)) bayes.x <- metropolis(model=fit.x,iter=100, trans.list=list(function(x) exp(x),function(x) exp(x))) plot(bayes.x) plot(bayes.x,'hist',col='cyan') plot(bayes.x,'pairs',cex=.1,pch=19) plot(bayes.x,'pairs',pos=sample(1:bayes.x$iter,20),col='red') plot(bayes.x,'post.pred',col='green')
This function returns diagnotic plots for a mle
object.
## S3 method for class 'mle' plot(x, ci = FALSE, alpha = 0.05,...)
## S3 method for class 'mle' plot(x, ci = FALSE, alpha = 0.05,...)
x |
|
ci |
Whether or not approximate confidence intervals should be added to the return period and quantile plots. |
alpha |
|
... |
none... |
data(yarns) x <- yarns$x fit.1 <- mle(x,'weibull',c(.1,.1)) fit.2 <- mle(x,'logis',c(.1,.1)) plot(fit.1,TRUE,.05) dev.new();plot(fit.2,TRUE,.05)
data(yarns) x <- yarns$x fit.1 <- mle(x,'weibull',c(.1,.1)) fit.2 <- mle(x,'logis',c(.1,.1)) plot(fit.1,TRUE,.05) dev.new();plot(fit.2,TRUE,.05)
See metropolis
.
## S3 method for class 'metropolis' print(x, stats.fun = NULL,...)
## S3 method for class 'metropolis' print(x, stats.fun = NULL,...)
x |
|
stats.fun |
An optional function that may be provided by the user in order to obtain a posterior summary (see Example). |
... |
none... |
data(yarns) x <- yarns$x fit.x <- mle(x,'gamma',c(.1,.1)) bayes.x <- metropolis(fit.x,50,trans.list= list(function(x) exp(x), function(x) exp(x))) print(bayes.x) print(bayes.x,stats.fun=function(x) c(mean=mean(x),CV=sd(x)/mean(x)))
data(yarns) x <- yarns$x fit.x <- mle(x,'gamma',c(.1,.1)) bayes.x <- metropolis(fit.x,50,trans.list= list(function(x) exp(x), function(x) exp(x))) print(bayes.x) print(bayes.x,stats.fun=function(x) c(mean=mean(x),CV=sd(x)/mean(x)))
See mle
.
## S3 method for class 'mle' print(x,...)
## S3 method for class 'mle' print(x,...)
x |
|
... |
none... |
data(yarns) x <- yarns$x fit.x <- mle(x,'gamma',c(.1,.1)) print(fit.x) print.mle(fit.x)
data(yarns) x <- yarns$x fit.x <- mle(x,'gamma',c(.1,.1)) print(fit.x) print.mle(fit.x)
This function can be used to derive parametric bootstrap confidence intervals for the p
-th quantile of the fitted distribution (see mle
).
Q.boot.ci(p,boot,alpha=.1)
Q.boot.ci(p,boot,alpha=.1)
p |
Vector of probabilities. |
boot |
An object obtained using |
alpha |
|
This functions returns two types of bootstrap confidence intervals for the p
-th quantile - one is based on the "percentile" method, while the other corresponds to the basis bootstrap interval or "reflexion" (see References).
See References for other means of deriving bootstrap intervals.
Davison, A.C., and Hinkley, D.V. (1997) Bootstrap methods and their application. Cambridge University Press.
data(yarns) x <- yarns$x fit.x <- mle(x,'gamma',c(.1,.1)) Q.conf.int(p=c(.5,.9,.95,.99),model=fit.x,alpha=.01,ln=FALSE) # should be run again with B = 1000, for example... boot.x <- boot.mle(model=fit.x,B=50) Q.boot.ci(p=c(.5,.9,.95,.99),boot=boot.x,alpha=.01)
data(yarns) x <- yarns$x fit.x <- mle(x,'gamma',c(.1,.1)) Q.conf.int(p=c(.5,.9,.95,.99),model=fit.x,alpha=.01,ln=FALSE) # should be run again with B = 1000, for example... boot.x <- boot.mle(model=fit.x,B=50) Q.boot.ci(p=c(.5,.9,.95,.99),boot=boot.x,alpha=.01)
This function can be used to derive approximate confidence intervals for the p
-th quantile of the fitted distribution (see mle
).
Q.conf.int(p, model, alpha = 0.1, ln = FALSE)
Q.conf.int(p, model, alpha = 0.1, ln = FALSE)
p |
Vector of probabilities. |
model |
|
alpha |
|
ln |
whether or not the confidence interval of the |
The p-th quantile confidence interval is derived using the observed Fisher's information matrix in conjuction with the well-known delta method. Here, Q.conf.int
allows the user to chose between two types of confidence intervals: one that is computed on the original scale and one that is computed on the quantile's natural logarithmic scale.
The function returns a 3-by-length(p)
array containing, for each value of p
, the confidence interval's lower and upper bounds, as well as the quantile point estimate (maximum likelihood).
Rice, J.A. (2006) Mathematical statistics and data analysis. Duxbury Press, 3rd edition (regarding the Delta method).
data(yarns) x <- yarns$x fit.x <- mle(x,'gamma',c(.1,.1)) Q.conf.int(p=c(.5,.9,.95,.99),model=fit.x,alpha=.01,ln=FALSE) Q.conf.int(p=c(.5,.9,.95,.99),model=fit.x,alpha=.01,ln=TRUE)
data(yarns) x <- yarns$x fit.x <- mle(x,'gamma',c(.1,.1)) Q.conf.int(p=c(.5,.9,.95,.99),model=fit.x,alpha=.01,ln=FALSE) Q.conf.int(p=c(.5,.9,.95,.99),model=fit.x,alpha=.01,ln=TRUE)
This dataset is taken from the HYDAT database, and corresponds to realized values of annual maximum daily mean flows (in ).
data(station01AJ010)
data(station01AJ010)
A vector of observations.
Hydrometric station 01AJ010
Environment Canada: https://wateroffice.ec.gc.ca/
This dataset is taken from Gamerman and Lopes (2006) (also see references therein), and consists of 100 cycles-to-failure times for airplane yarns.
data(yarns)
data(yarns)
A data.frame
object - one column of 100 observations.
Gamerman and Lopes (2006), page 255 (also see references therein).
Gamerman, D., and Lopes H.F. (2006). Markov Chain Monte Carlo: Stochastic simulation for Bayesian inference. 2nd edition, Chapman & Hall/CRC.