using Pkg Pkg.add("Extremes") Pkg.add("Dates") Pkg.add("DataFrames") Pkg.add("Distributions") Pkg.add("Gadfly") using Extremes, Dates, DataFrames, Distributions, Gadfly # Derived from blockMaxima.md ## Load the data Loading the annual maximum sea-levels at Port Pirie: ```@example portpirie data = Extremes.dataset("portpirie") first(data,5) ``` The annual maxima can be shown as a function of the year with the Gadfly package: ```@example portpirie set_default_plot_size(12cm, 8cm) plot(data, x=:Year, y=:SeaLevel, Geom.line) ``` ## Maximum likelihood inference ### GEV parameter estimation The GEV parameter estimation with maximum likelihood is performed with the [`gevfit`](@ref) function: ```@repl portpirie fm = gevfit(data, :SeaLevel) ``` The approximate covariance matrix of the parameter estimates can be obtained with the [`parametervar`](@ref) function: ```@repl portpirie parametervar(fm) ``` Confidence intervals for the parameters are obtained with the [`cint`](@ref) function: ```@repl portpirie cint(fm, 0.95) ``` For instance, the shape parameter 95% confidence interval is as follows: ```@repl portpirie cint(fm, 0.95)[3] ``` ### Diagnostic plots Several diagnostic plots for assessing the accuracy of the GEV model fitted to the Port Pirie data are can be shown with the [`diagnosticplots`](@ref) function: ```@example portpirie set_default_plot_size(21cm ,16cm) diagnosticplots(fm) ``` The diagnostic plots consist in the probability plot (upper left panel), the quantile plot (upper right panel), the density plot (lower left panel) and the return level plot (lower right panel). These plots can be displayed separately using respectively the [`probplot`](@ref), [`qqplot`](@ref), [`histplot`](@ref) and [`returnlevelplot`](@ref) functions. ### Return level estimation *T*-year return level estimate can be obtained using the [`returnlevel`](@ref) function. For example, the 100-year return level for the Port Pirie block maxima model is computed as follows: ```@repl portpirie r = returnlevel(fm, 100) ``` The return level can be accessed as follows: ```@repl portpirie r.value ``` The corresponding confidence interval can be computed with the [`cint`](@ref) function: ```@repl portpirie c = cint(r, 0.95) ``` To get the scalar return level in the stationary case, the following command can be used: ```@repl portpirie r.value[] ``` To get the scalar confidence interval in the stationary case, the following command can be used: ```@repl portpirie c[] ``` # derived from blockmaxima_ns.md #### The location parameter varying as a linear function of the year ```@repl portpirie fm₁ = gevfit(data, :SeaLevel, locationcovid = [:Year]) ``` ### Diagnostic plots Several diagnostic plots for assessing the accuracy of the GEV model fitted to the Port Pirie data can be shown with the [`diagnosticplots`](@ref) function: ```@example portpirie set_default_plot_size(21cm ,16cm) diagnosticplots(fm₁) ``` ### Return level estimation Since the model parameters vary in time, the quantiles also vary in time. Therefore, a *T*-year return level can be estimated for each year. This set of return levels are referred to as *effective return levels* as proposed by Katz *et al.* (2002). The 100-year effective return levels for the `fm₁` model can be computed using the [`returnlevel`](@ref) function: ```@repl portpirie r = returnlevel(fm₁, 100) ``` The return level vector can be accessed as follows: ```@repl portpirie r.value ``` The corresponding confidence interval can be computed with the [`cint`](@ref) function: ```@repl portpirie c = cint(r, 0.95) ``` The effective return levels along with their confidence intervals can be plotted as follows: ```@example portpirie rmin = [c[i][1] for i in eachindex(c)] rmax = [c[i][2] for i in eachindex(c)] df = DataFrame(Year = data[:,:Year], r = r.value, rmin = rmin, rmax = rmax) nothing # hide ``` ```@example portpirie set_default_plot_size(12cm, 9cm) plot(df, x=:Year, y=:r, ymin=:rmin, ymax=rmax, Geom.line, Geom.ribbon, Coord.cartesian(xmin=1920, xmax=2000), Guide.xticks(ticks=1920:20:2000), Guide.ylabel("100-year Effective Return Level")) ``` # derived from TresholdExceedence.md ## Threshold Exceedance Model The stationary [`ThresholdExceedance`](@ref) model is illustrated using the daily rainfall accumulations at a location in south-west England from 1914 to 1962. This dataset was studied by Coles (2001) in Chapter 4. The daily rainfall are assume **independent and identically distributed**. ```@setup rain using Extremes, Dates, DataFrames, Distributions, Gadfly ``` ## Load the data Loading the daily precipitations: ```@example rain data = Extremes.dataset("rain") first(data,5) ``` Plotting the data using the Gadfly package: ```@example rain set_default_plot_size(14cm ,8cm) # hide plot(data, x=:Date, y=:Rainfall, Geom.point, Theme(discrete_highlight_color=c->nothing)) ``` ## Threshold selection A suitable threshold for the Peaks-Over-Threshold model can be chosen by examining the mean residual life plot. The mean residual life is expected to be a linear function of the threshold when the latter is high enough. The mean residual life plot can be plotted with the [`mrlplot`](@ref) function: ```@example rain set_default_plot_size(14cm ,8cm) # hide mrlplot(data[:,:Rainfall]) ``` As concluded by Coles (2001, Chapter 4), a reasonable threshold is 30 *mm*. ```@example rain threshold = 30.0 nothing # hide ``` ## Exceedances extraction Parameter estimation of the Generalized Pareto distribution in *Extremes.jl* is performed using the threshold exceedances previously extracted. The support of the exceedances given in the fit function is therefore ``(0,∞)``. For the *Rainfall* example, let's extract the threshold exceedances. Identify first the threshold exceedances: ```@example rain df_exceedance = filter(row -> row.Rainfall > threshold, data) first(df_exceedance, 5) ``` Retrieve the exceedances above the threshold: ```@example rain df_exceedance.Exceedance = df_exceedance.Rainfall .- threshold select!(df_exceedance, :Date, :Exceedance) first(df_exceedance, 5) ``` Alternative, as in TresholdExceedence.md: ```@example rain df = filter(row -> row.Rainfall > threshold, data) df[!,:Rainfall] = df[!,:Rainfall] .- threshold rename!(df, :Rainfall => :Exceedance) first(df, 5) ``` ## Bayesian Inference Most functions described in the previous sections also work in the Bayesian context. To reproduce exactly the results, the seed should be fixed as follows: ```@example rain import Random Random.seed!(4786) nothing #hide ``` ### GP parameter estimation The Bayesian GP parameter estimation is performed with the [`gpfitbayes`](@ref) function: ```@repl rain fm = gpfitbayes(df_exceedance, :Exceedance) ``` The MCMC sample of the scale and shape parameters can be extracted respectively with the functions Extremes.scale and shape. The marginal chains for those parameters can be plotted as follows: ```@example rain set_default_plot_size(12cm ,10cm) #hide p1 = plot(y = Extremes.scale(fm), Geom.line, Guide.xlabel("Iteration"), Guide.ylabel("σ")) p2 = plot(y = Extremes.shape(fm), Geom.line, Guide.xlabel("Iteration"), Guide.ylabel("ξ")) vstack(p1, p2) ``` Several diagnostic plots for assessing the accuracy of the fitted GP distribution to the rainfall data are can be shown with the [`diagnosticplots`](@ref) function: ```@example rain set_default_plot_size(21cm ,16cm) diagnosticplots(fm) ``` The empirical covariance matrix of the parameters can be obtained with [`parametervar`](@ref) as follows: ```@repl rain parametervar(fm) ``` Credible intervals for the parameters are obtained with the [`cint`](@ref) function: ```@repl rain cint(fm, 0.95) ``` # derived from thresholdexceedance_ns.md ```@setup rainfall using Extremes, DataFrames, Dates, Distributions, Gadfly ``` ### Load the data Loading the daily rainfall accumulations: ```@example rainfall data = Extremes.dataset("rain") first(data,5) ``` Extract the exceedances over the threshold of 30 *mm*: ```@example rainfall threshold = 30.0 df = filter(row -> row.Rainfall > threshold, data) df[!,:Exceedance] = df[:,:Rainfall] .- threshold df[!,:Year] = year.(df[:,:Date]) set_default_plot_size(12cm, 8cm) # hide plot(df, x=:Date, y=:Exceedance, Geom.point) ``` ### Return level estimation Since the model parameters vary in time, the quantiles also vary in time. Therefore, a *T*-year return level can be estimated for each year. This set of return levels are referred to as *effective return levels* as proposed by Katz *et al.* (2002). The 100-year effective return levels for the `fm₁` model can be computed using the [`returnlevel`](@ref) function: ```@repl rainfall nobs = size(data,1) nobsperblock = 365 r = returnlevel(fm, threshold, nobs, nobsperblock, 100) ``` The effective return levels can be accessed as follows: ```@repl rainfall r.value ``` The corresponding confidence interval can be computed with the [`cint`](@ref) function: ```@repl rainfall c = cint(r, 0.95) ``` # derived from Declustering.md ```@example wooster data = Extremes.dataset("wooster") set_default_plot_size(12cm, 8cm) # hide plot(data, x=:Date, y=:Temperature, Geom.point) ``` ```@example wooster df = copy(data) df[!,:Temperature] = -data[:,:Temperature] filter!(row -> month(row.Date) ∈ (1,2,11,12), df) plot(df, x=:Date, y=:Temperature, Geom.point) ``` ## The runs method Two cluster definitions are used in *Extremes.jl*. The first one is based on the *runs method*: exceedances separated by fewer than ``r`` non-exceedances are assumed to belong to the same cluster (see Chapter 10, Beirlant *et al.*, 2004). The parameter ``r`` is generally referred to as the *runlength* parameter. When ``r = 0``, each exceedance forms a separate cluster. When using the threshold of -10°F and the runlength of 4, 17 clusters are idenfied: ```@example wooster threshold = -10.0 cluster = getcluster(df[:,:Temperature], threshold, runlength=4) nothing # hide ``` The first cluster can be retrieved as follows: ```@example wooster first(cluster) ``` The GP distribution can be fitted on the cluster maximal exceedances (the cluster maximum can be computed with the `maximum` method applied to the Cluster type): ```@repl wooster z = maximum.(cluster) ``` ## The two thresholds method With the two thresholds method, a cluster of threshold exceedances is defined as a streak of data higher than a second threshold ``u₂`` whose at least one data is higher than a first threshold ``u₁``, where ``u₁ ≥ u₂``. When using the ``u₁ = 0°F`` as the first threshold and ``u₂ = -10°F`` as the second threshold, 11 clusters are idenfied: ```@example wooster u₁ = -10 u₂ = -15 cluster = getcluster(df[:,:Temperature], u₁, u₂) nothing # hide ``` The first cluster can be retrieved as follows: ```@example wooster first(cluster) ``` The GP distribution can be fitted on the cluster maximal exceedances: ```@repl wooster z = maximum.(cluster) ``` # Derived from blockmaxima_ns.md ### Load the data Loading the annual maximum sea-levels at Fremantle: ```@example fremantle data = Extremes.dataset("fremantle") first(data,5) ``` The annual maxima can be plotted as function of the year: ```@example fremantle set_default_plot_size(12cm, 8cm) # hide plot(data, x=:Year, y=:SeaLevel, Geom.line, Coord.cartesian(xmin=1895, xmax=1990), Guide.xticks(ticks=1895:10:1990)) ``` and as function of the Southern Oscillation Index: ```@example fremantle set_default_plot_size(12cm, 8cm) # hide plot(data, x=:SOI, y=:SeaLevel, Geom.point) ``` ## Maximum likelihood inference ### GEV parameter estimation The GEV parameter estimation with maximum likelihood is performed with the [`gevfit`](@ref) function. The parameter estimate vector ``\mathbf{θ̂} = (\mathbf{β̂₁},\, \mathbf{β̂₂},\, \mathbf{β̂₃})^\top`` is contained in the field `θ̂` of the returned structure. Several non-stationary model can be fitted. #### The stationary model ```@repl fremantle fm₀ = gevfit(data, :SeaLevel) ``` #### The location parameter varying as a linear function of the year ```@repl fremantle fm₁ = gevfit(data, :SeaLevel, locationcovid = [:Year]) ``` #### The location parameter varying as a linear function of the year and the SOI ```@repl fremantle fm₂ = gevfit(data, :SeaLevel, locationcovid = [:Year, :SOI]) ``` #### Both the location and logscale parameters varying as a linear function of the year and the SOI ```@repl fremantle fm₃ = gevfit(data, :SeaLevel, locationcovid = [:Year, :SOI], logscalecovid = [:Year, :SOI]) ``` As show by Coles (2001), the best model is the one where the location parameter varies as a linear function of the year and the SOI, the `fm₂` in the present section. The vector of the parameter estimates (location scale and shape) can be extracted with the function [`params`](@ref): ```@repl fremantle params(fm₂) ``` The approximate covariance matrix of the parameter estimates for this model can be obtained with the [`parametervar`](@ref) function: ```@repl fremantle parametervar(fm₂) ``` Confidence intervals for the parameters are obtained with the [`cint`](@ref) function: ```@repl fremantle cint(fm₂, 0.95) ``` ### Diagnostic plots Several diagnostic plots for assessing the accuracy of the GEV model fitted to the Fremantle data can be shown with the [`diagnosticplots`](@ref) function: ```@example fremantle set_default_plot_size(21cm ,16cm) diagnosticplots(fm₂) ``` The diagnostic plots consist in the residual probability plot (upper left panel), the residual quantile plot (upper right panel) and the residual density plot (lower left panel) of the standardized data (see Chapter 6 of Coles, 2001). These plots can be displayed separately using respectively the [`probplot`](@ref), [`qqplot`](@ref), [`histplot`](@ref) and [`returnlevelplot`](@ref) functions. ### Return level estimation Since the model parameters vary in time, the quantiles also vary in time. Therefore, a *T*-year return level can be estimated for each year. This set of return levels are referred to as *effective return levels* as proposed by Katz *et al.* (2002). The 100-year effective return levels for the `fm₂` model can be computed using the [`returnlevel`](@ref) function: ```@repl fremantle r = returnlevel(fm₂, 100) ``` The effective return levels can be accessed as follows: ```@repl fremantle r.value ``` The corresponding confidence interval can be computed with the [`cint`](@ref) function: ```@repl fremantle c = cint(r, 0.95) ``` The effective return levels along with their confidence intervals can be plotted as follows: ```@example fremantle rmin = [c[i][1] for i in eachindex(c)] rmax = [c[i][2] for i in eachindex(c)] df = DataFrame(Year = data[:,:Year], r = r.value, rmin = rmin, rmax = rmax) nothing # hide ``` ```@example fremantle set_default_plot_size(12cm, 9cm) plot(df, x=:Year, y=:r, ymin=:rmin, ymax=rmax, Geom.line, Geom.ribbon, Coord.cartesian(xmin = 1895, xmax = 1990), Guide.xticks(ticks = 1895:10:1990), Guide.ylabel("100-year Effective Return Level")) ``` # Derived from thresholdexceedance_ns.md ### Load the data Loading the daily rainfall accumulations: ```@example rainfall data = Extremes.dataset("rain") first(data,5) ``` Extract the exceedances over the threshold of 30 *mm*: ```@example rainfall threshold = 30.0 df = filter(row -> row.Rainfall > threshold, data) df[!,:Exceedance] = df[:,:Rainfall] .- threshold df[!,:Year] = year.(df[:,:Date]) ``` ## Bayesian Inference Non-stationary parameter estimation can also be performed by the Bayesian approach. To reproduce exactly the results, the seed should be fixed as follows: ```@example fremantle import Random Random.seed!(4786) nothing #hide ``` ### GP parameter estimation The Bayesian GP parameter estimation is performed with the [`gpfitbayes`](@ref) function: ```@repl rainfall fm = gpfitbayes(df, :Exceedance, logscalecovid = [:Year]) ``` The empirical covariance matrix of the parameters and the credible intervals can be obtained with the functions [`parametervar`](@ref) and [`cint`](@ref), respectively: ```@repl rainfall parametervar(fm) ``` ```@repl rainfall cint(fm, 0.95) ``` ### Return level estimation The 100-year effective return level estimates can be obtained using the function [`returnlevel`](@ref): ```@repl rainfall nobs = size(data,1) nobsperblock = 365 r = returnlevel(fm, threshold, nobs, nobsperblock, 100) ``` The corresponding 95% credible interval can be computed using the function [`cint`](@ref): ```@repl rainfall c = cint(r, 0.95) ``` The 100-year effective return levels along with their 95% credible intervals can be illustrated as follows: ```@example rainfall df_plot = DataFrame(Year=df.Year, ReturnLevel = vec(mean(r.value, dims=1)), LowerBound = first.(c), UpperBound=last.(c)) set_default_plot_size(12cm, 8cm) plot(df_plot, x=:Year, y=:ReturnLevel, Geom.line, ymin=:LowerBound, ymax=:UpperBound, Geom.ribbon, Guide.ylabel("100-year effective return level")) ```