What I won’t talk about:
Way back in 2006, Guszcza and Lommele presented a model to develop reserves based on individual claim data (Guszcza and Lommele 2006).
Regression based on individual claims looks pretty good; note that the axes are on a log scale.
However, things look different when we differentiate based on credit grouping.
We would have been fine.
Not surprising. The Poisson likelihood function aggregates naturally.
\[L=\prod_{i=1}^{N}\dfrac{\lambda^{x_i}e^{-\lambda}}{x_i!}\]
So we can split our data into subsets and get results just about as good as fitting individual data, without quite so much fuss.
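To see why, factor the likelihood; this is the standard sufficiency argument, sketched here rather than taken from the paper:

\[L=\prod_{i=1}^{N}\dfrac{\lambda^{x_i}e^{-\lambda}}{x_i!}=\dfrac{\lambda^{\sum_i x_i}\,e^{-N\lambda}}{\prod_i x_i!}\qquad\Rightarrow\qquad\hat{\lambda}=\dfrac{1}{N}\sum_{i=1}^{N}x_i\]

The data enter the likelihood only through the total \(\sum_i x_i\), and a sum of independent Poissons is itself Poisson with mean \(N\lambda\), so fitting the aggregate recovers the same \(\hat{\lambda}\) as fitting the individual observations.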
What if we don’t want to look at all of our data at the same time?
What if we don’t have data?
Bayesian estimation allows us to replace data with prior judgment.
| | Hierarchical models | Bayesian |
|---|---|---|
| Fit method | Maximum likelihood | Closed form if you’re lucky, numerical methods (like MCMC) if you’re not |
| Complementary data | Part of the fitting process | Use a prior distribution |
| Objectivity | Objective | Subjective when the prior swamps the data |
Imagine I’ve flipped a coin ten times and come up with two heads.
I need the following:

- the number of flips in my sample
- the outcome of each flip
- the number of future flips I’d like to predict
- the parameters of my beta prior
All of this information is stored in a data block for Stan. Gather the data with R, pass it to Stan, and let it go to work. I store my Stan model in a separate file.
data {
  int<lower=0> sampleN;                  // number of observed flips
  int<lower=0, upper=1> heads[sampleN];  // outcome of each flip
  int<lower=0> predN;                    // number of flips to predict
  int<lower=0> betaA;                    // beta prior parameters
  int<lower=0> betaB;
}
parameters {
  real<lower=0,upper=1> theta;           // probability of heads
}
model {
  theta ~ beta(betaA, betaB);            // prior
  heads ~ bernoulli(theta);              // likelihood
}
In this simple example, there’s just the one parameter: \(\theta\), the probability of heads. The beta distribution describes our uncertainty about it.
generated quantities {
  int<lower=0> heads_pred;               // simulated number of heads in predN new flips
  heads_pred <- binomial_rng(predN, theta);
}
The sample size doesn’t need to match the number of predictions. I could use five years of data to predict the next two, or whatever.
library(rstan)

sampleN <- 10
heads <- c(1, 1, 0, 0, 0, 0, 0, 0, 0, 0)
predN <- 5
fit1 <- stan(file = './stan/bernoulli.stan'
             , data = list(sampleN = sampleN
                           , heads = heads
                           , predN = predN
                           , betaA = 1
                           , betaB = 1)
             , iter = 1000
             , seed = 1234)
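Once it has run, the posterior draws are easy to get at. Here is one way to inspect them; this is my own sketch using rstan’s extract(), not part of the original code:

theta_draws <- extract(fit1)$theta
mean(theta_draws)                        # posterior mean of theta
quantile(theta_draws, c(0.025, 0.975))   # 95% credible interval

pred_draws <- extract(fit1)$heads_pred
table(pred_draws) / length(pred_draws)   # posterior predictive distribution of heads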
Notice that the mean of the generated thetas always lands between the sample mean of 0.2 and our prior mean of 0.5. With a flat beta(1,1) prior it comes out at 0.25; with a more informative beta(10,10) prior it moves to 0.4.
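That’s beta-binomial conjugacy at work. With a \(\text{Beta}(a,b)\) prior and \(k\) heads in \(n\) flips, the posterior is available in closed form:

\[\theta \mid k \sim \text{Beta}(a+k,\;b+n-k),\qquad E[\theta\mid k]=\dfrac{a+k}{a+b+n}\]

Plugging in \(k=2\) and \(n=10\): \(a=b=1\) gives \(3/12=0.25\), while \(a=b=10\) gives \(12/30=0.4\).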
Is this a contrived scenario?
Consider:
This model is based on an example first described in Gelman and Hill (2006). The Stan model code may be found here: https://github.com/stan-dev/example-models/blob/master/ARM/Ch.8/roaches.stan
data {
  // sample data
  int<lower=0> numClaims;
  vector[numClaims] Prior;
  int<lower=0, upper=1> BadCredit[numClaims];
  int<lower=0> Current[numClaims];
  // parameters of the gamma prior on the link ratio
  real<lower=0> shape;
  real<lower=0> rate;
  // new predicted quantities
  int<lower=0> numNewClaims;
  vector[numNewClaims] NewPrior;
  int<lower=0, upper=1> NewBadCredit[numNewClaims];
}
transformed data {
  vector[numClaims] logPrior;
  vector[numNewClaims] logNewPrior;
  logPrior <- log(Prior);
  logNewPrior <- log(NewPrior);
}
Because this is a Poisson GLM with a log link, we take the log of the prior losses up front so they can enter the linear predictor as an offset.
parameters {
  real credit;              // bad-credit effect on the log scale
  real<lower=0> linkRatio;  // loss development factor; must be positive for the gamma prior and log transform
}
transformed parameters {
  real logLink;
  logLink <- log(linkRatio);
}
We’re using a Poisson with an offset, so the parameters have to live on the log scale; the link ratio enters the linear predictor as its log.
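Written out with the variable names from the Stan code, the model for each claim is:

\[\text{Current}_i \sim \text{Poisson}(\mu_i),\qquad \log\mu_i=\log(\text{Prior}_i)+\log(\text{linkRatio})+\text{credit}\cdot\text{BadCredit}_i\]

On the original scale that’s \(\mu_i=\text{Prior}_i\times\text{linkRatio}\times e^{\text{credit}\cdot\text{BadCredit}_i}\): a development factor applied to the prior losses, with a multiplicative adjustment for bad credit.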
model {
  // the prior on the link ratio belongs outside the loop;
  // placing it inside would apply it once per claim
  linkRatio ~ gamma(shape, rate);
  for (i in 1:numClaims) {
    Current[i] ~ poisson_log(logPrior[i]
                             + logLink + credit * BadCredit[i]);
  }
}
generated quantities {
  int newCurrent[numNewClaims];
  for (i in 1:numNewClaims) {
    newCurrent[i] <- poisson_log_rng(logNewPrior[i]
                                     + logLink + credit * NewBadCredit[i]);
  }
}
I’m free to do pretty much whatever I want with the model.
That’s a feature, not a bug!
Here, I’ll assume that the link ratio is something like 1.5 and I have a modest confidence interval around that guess.
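A gamma with shape 36 and rate 24 does the job: its mean is 36/24 = 1.5 and its standard deviation is sqrt(36)/24 = 0.25. Here is a sketch of the whole fit; the claim data and the file path are hypothetical, chosen only to illustrate the call:

library(rstan)

# hypothetical claim-level data, for illustration only
numClaims <- 4
Prior     <- c(1000, 1500, 800, 1200)      # prior evaluation losses
BadCredit <- c(0L, 1L, 0L, 1L)
Current   <- c(1450L, 2600L, 1180L, 2050L) # current evaluation losses

# gamma prior centered at a link ratio of 1.5
shape <- 36
rate  <- 24

# two new claims to predict
numNewClaims <- 2
NewPrior     <- c(900, 1100)
NewBadCredit <- c(0L, 1L)

fit2 <- stan(file = './stan/claims.stan'   # hypothetical path
             , data = list(numClaims = numClaims
                           , Prior = Prior
                           , BadCredit = BadCredit
                           , Current = Current
                           , shape = shape
                           , rate = rate
                           , numNewClaims = numNewClaims
                           , NewPrior = NewPrior
                           , NewBadCredit = NewBadCredit)
             , iter = 1000
             , seed = 1234)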
“Example Models.” 2015. https://github.com/stan-dev/example-models/wiki.
Gelman, Andrew, and Jennifer Hill. 2006. Data Analysis Using Regression and Multilevel/Hierarchical Models. http://www.stat.columbia.edu/~gelman/arm/.
Guszcza, James. 2008. “Hierarchical Growth Curve Models for Loss Reserving.” Forum Fall 2008. https://www.casact.org/pubs/forum/08fforum/7Guszcza.pdf.
Guszcza, James, and Jan Lommele. 2006. “Loss Reserving Using Claim-Level Data.” Forum Spring 2006. https://www.casact.org/pubs/forum/06fforum/115.pdf.
Stan Development Team. 2015. Stan Modeling Language User’s Guide and Reference Manual, Version 2.10.0. http://mc-stan.org/.