Visualizing the History of Epidemics

I really like National Geographic. Their magazine is great, their television documentaries are done well and they helped give me a lifelong love of maps. They generate very good information and help shed light on the world we all share. So why is this graphic so awful?

Let's have a look:
National Geographic image

We'll start off by saying that no one will mistake me for Edward Tufte or Stephen Few or Nathan Yau, though I love their stuff, have read it and have tried to adopt as many of their more sensible recommendations as I can. That understood, I think I'm on solid footing when I say that at a minimum, all graphical elements should fit within the display surface. The first three quantities are so massive that they can't be contained. How big are they? Well, we have the numbers within the circles, but beyond that, who knows? The plague of Justinian looks like it could be Jupiter to the Black Plague's Saturn, with modern epidemics having more of an Earthly size.

Speaking of circles, I try to avoid them. If those three aforementioned experts have taught me anything, it's that the human brain cannot easily compare the areas of round objects. Quick: without looking at the numbers, tell me how much bigger HIV is than Ebola.
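To put a number on why circles are hard to read, here is a rough sketch. It assumes the circles are drawn with area proportional to the death count, which is my assumption about the graphic rather than something I have confirmed. The two counts are the ones printed in the circles.

```r
# Death counts quoted in the graphic
hivDeaths <- 39000000
ebolaDeaths <- 4877

# If area is proportional to count, the radius grows only with the square root
areaRatio <- hivDeaths / ebolaDeaths   # how many times larger the HIV circle's area is
radiusRatio <- sqrt(areaRatio)         # how many times larger its radius is

round(c(area = areaRatio, radius = radiusRatio), 1)
```

Under that assumption the HIV circle has roughly 8,000 times the area of the Ebola circle, but it is only about 89 times as wide. The eye tends to latch onto the width, which is why the comparison is so hard to make.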

Did you have to scroll to look at both objects? I did. Not only do the largest epidemics spill over the display area, they make it difficult to view a large number of data points at the same time. As we scroll down, we eventually land on a display which has Asian flu at the top and the great plague of London at the bottom. Justinian, the black death and medieval history are erased from our thoughts.

And what's with the x-axis? The circles move from one side to the other, but this dimension conveys no meaning whatsoever.

As an aside, although I love having the years shown, it would have been good to use that to augment the graphic with something that conveys how epidemics have changed over time. Population has changed, medicine has changed and the character of human disease has changed. As I look at the graphic, what I tend to extrapolate from this is that surely the plague of Justinian wiped out most of southern Europe, Anatolia and Mesopotamia. In contrast, SARS likely appeared during a slow news cycle.

It would be disingenuous of me to criticize a display without proposing one of my own. So, here goes.

dfEpidemic = data.frame(Outbreak = c("Plague of Justinian", "Black Plague"
                                     , "HIV/AIDS", "1918 Flu", "Modern Plague"
                                     , "Asian Flu", "6th Cholera Pandemic"
                                     , "Russian Flu", "Hong Kong Flu"
                                     , "5th Cholera Pandemic", "4th Cholera Pandemic"
                                     , "7th Cholera Pandemic", "Swine Flu"
                                     , "2nd Cholera Pandemic", "First Cholera Pandemic"
                                     , "Great Plague of London", "Typhus Epidemic of 1847"
                                     , "Haiti Cholera Epidemic", "Ebola"
                                     , "Congo Measles Epidemic", "West African Meningitis"
                                     , "SARS")
                        , Count = c(100000000, 50000000, 39000000, 20000000
                                    , 10000000, 2000000, 1500000, 1000000
                                    , 1000000, 981899, 704596, 570000, 284000
                                    , 200000, 110000, 100000, 20000, 6631
                                    , 4877, 4555, 1210, 774)
                        , FirstYear = c(541, 1346, 1960, 1918, 1894, 1957, 1899, 1889
                                        , 1968, 1881, 1863, 1961, 2009, 1829, 1817
                                        , 1665, 1847, 2011, 2014, 2011, 2009, 2002))
dfEpidemic$Outbreak = factor(dfEpidemic$Outbreak
                             , levels=dfEpidemic$Outbreak[order(dfEpidemic$FirstYear
                                                                , decreasing=TRUE)])
library(ggplot2)
library(scales)
plt = ggplot(data = dfEpidemic, aes(x=Outbreak, y=Count)) + geom_bar(stat="identity") + coord_flip()
plt = plt + scale_y_continuous(labels=comma)
plt

plot of chunk GetDataFrame

I'm showing that data as a bar chart, so everything fits within the display and the relative size is easy to recognize. I also order the bars by starting year so that we can convey an additional item of information. Are diseases getting more extreme? Nope. Quite the reverse. The 1918 flu and HIV have been significant health issues, but they pale in comparison to the plague of Justinian or the Black Death. HIV is significant, but we've been living with that disease for longer than I've been alive. If we want to convey a fourth dimension, we could shade the bars by the number of deaths per year, which accounts for how long each outbreak lasted.

dfEpidemic$LastYear = c(542, 1350, 2014, 1920, 1903, 1958, 1923, 1890, 1969, 1896, 1879
                        , 2014, 2009, 1849, 1823, 1666, 1847, 2014, 2014, 2014, 2010, 2003)
dfEpidemic$Duration = with(dfEpidemic, LastYear - FirstYear + 1)
dfEpidemic$Rate = with(dfEpidemic, Count / Duration)

plt = ggplot(data = dfEpidemic, aes(x=Outbreak, y=Count, fill=Rate)) + geom_bar(stat="identity")
plt = plt + coord_flip() + scale_y_continuous(labels=comma)
plt

plot of chunk AddDuration

The plague of Justinian dwarfs everything. We'll have one last look with this observation removed. I'll also take out the Black Death so that we're a bit more focused on modern epidemics.

dfEpidemic2 = dfEpidemic[-(1:2), ]
plt = ggplot(data = dfEpidemic2, aes(x=Outbreak, y=Count, fill=Rate)) + geom_bar(stat="identity")
plt = plt + coord_flip() + scale_y_continuous(labels=comma)
plt

plot of chunk SansJustinian

HIV/AIDS now stands out as having the most victims, though the 1918 flu pandemic caused people to succumb more quickly.
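The "more quickly" claim is just the Rate column at work. Here is a minimal sketch that pulls out the two outbreaks in question, using the same counts and years as the data frame above. The name `outbreaks` is only for illustration.

```r
# The two outbreaks being compared, with the counts and years used earlier
outbreaks <- data.frame(Outbreak = c("HIV/AIDS", "1918 Flu"),
                        Count = c(39000000, 20000000),
                        FirstYear = c(1960, 1918),
                        LastYear = c(2014, 1920))

# Duration counts both the first and the last year, as before
outbreaks$Duration <- with(outbreaks, LastYear - FirstYear + 1)
outbreaks$Rate <- with(outbreaks, Count / Duration)

# Sort so the faster killer comes first
outbreaks[order(outbreaks$Rate, decreasing = TRUE), c("Outbreak", "Duration", "Rate")]
```

HIV/AIDS has nearly twice the total, but it is spread over 55 years against 3 for the 1918 flu, so the flu's deaths per year are far higher.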

These bar charts are hardly the last word in data visualization. Still, I think they convey more information, more objectively than the National Geographic's exhibit. I'd love to see further comments and refinements.

Session info:

## R version 3.1.1 (2014-07-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] knitr_1.6        RWordPress_0.2-3 scales_0.2.4     ggplot2_1.0.0   
## 
## loaded via a namespace (and not attached):
##  [1] colorspace_1.2-4 digest_0.6.4     evaluate_0.5.5   formatR_0.10    
##  [5] grid_3.1.1       gtable_0.1.2     htmltools_0.2.4  labeling_0.2    
##  [9] MASS_7.3-34      munsell_0.4.2    plyr_1.8.1       proto_0.3-10    
## [13] Rcpp_0.11.2      RCurl_1.95-4.1   reshape2_1.4     rmarkdown_0.2.50
## [17] stringr_0.6.2    tools_3.1.1      XML_3.98-1.1     XMLRPC_0.3-0    
## [21] yaml_2.1.13

An Idiot Learns Bayesian Analysis: Part 3

A week or so ago, the grand Magus over at lamages.blogspot.com/ published a great, quick thought exercise taken from Daniel Kahneman’s book Thinking, Fast and Slow. Here are the particulars of the problem: you’re in a community with two different color vehicles; 85% are green and 15% are blue. A vehicle was involved in a hit and run accident. A witness says the car was blue. We can establish that the witness may correctly identify the color of a car 80% of the time. Given all that, what is the probability that the car is blue?

In an uncharacteristic fit of industriousness, I didn’t just read through the explanation, but tried to work it out myself. My natural inclination was to assume a table as follows:

                      Car is green   Car is blue   Marginal
Witness is correct    68%            12%           80%
Witness is incorrect  17%            3%            20%
Marginal              85%            15%           100%

The interior probabilities can be worked out by multiplying the marginals, which is a great thing for lazy people like me. However, structuring things in this way can make the problem a bit harder to work out. This configuration doesn’t directly address our question. If we want to know the chance that the car was blue- given that the witness says that it was blue- we have to pluck out the scenarios wherein the witness will state that she saw a blue car. These are: 1) the witness is correct and the car is blue and 2) the witness is incorrect and the car is green. The double negative does my head in. Notwithstanding that, if we add those two probabilities together, we get the 29% chance that the witness says blue and we can then normalize the 12% (car is blue) to get the 41% chance that the car is actually blue.

The 2×2 table suggested by the Mage’s approach is as follows. Note that he had to work out the marginals, as explained in his post.

                    Car is green   Car is blue   Marginal
Witness says green  68%            3%            71%
Witness says blue   17%            12%           29%
Marginal            85%            15%           100%

With this setup, the probability of a blue car is easy to isolate as the scenario now takes up a single row. Just divide the 12% by 29% to normalize the row and we arrive at the posterior probability of 41%.
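The row normalization is easy to check in a few lines of R. This is a minimal sketch using only the numbers from the problem. The variable names are mine.

```r
priorBlue <- 0.15   # share of blue cars in the community
accuracy <- 0.80    # chance the witness identifies a colour correctly

# Joint probabilities for the "witness says blue" row
saysBlueIsBlue <- accuracy * priorBlue                # witness correct, car blue
saysBlueIsGreen <- (1 - accuracy) * (1 - priorBlue)   # witness wrong, car green

# Normalize the row to get the posterior probability of a blue car
posteriorBlue <- saysBlueIsBlue / (saysBlueIsBlue + saysBlueIsGreen)
round(posteriorBlue, 3)
```

The two joint probabilities are the 12% and 17% from the table, and the posterior is 12/29, or about 41%.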

It’s a subtle thing, but meaningful. The other nice thing about this approach is that it’s consistent with how we might look at a binary classification problem. It’s worth taking a quick moment to identify some of the salient attributes of the table. Positive predictive value is the ratio of true positives divided by total number of positive forecasts. This is the likelihood that the witness will be correct, when she says that a car is green. (“Positive” in this context may be regarded as a green car.) In this case, that number is 96%. This is very high. The negative predictive value, which is the likelihood the witness will be correct when she says blue is 41%.

This is what the Mage’s table looks like with the probabilities replaced by the terms we use to describe them. (Again, “positive” in this case is arbitrarily defined to be a green car.)

                    Car is green                     Car is blue
Witness says green  Positive Predictive Value (PPV)  False Discovery Rate (FDR)
Witness says blue   False Omission Rate (FOR)        Negative Predictive Value (NPV)
Marginal            Prevalence                       1-Prevalence

There’s a very important point that isn’t emphasized in the example. The witness’ Accuracy is less than the Prevalence. The witness could achieve an accuracy of 85% by simply guessing “green” for every car that she sees. This is what gives us the counterintuitive result that a witness who is accurate 80% of the time has less than a 50/50 chance of having seen a blue car. How accurate would they need to be to get at least 50%?

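The quantity we’re interested in is the probability in the NPV cell divided by the sum of the probabilities in the NPV and FOR cells, that is, the share of the “witness says blue” row in which the car really is blue. I don’t have a letter for this, so I’m calling it q. (I’d love to hear that this thing has a name.) In what follows, A is the witness’ accuracy and P is the prevalence of green cars.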

q=\frac{A(1-P)}{A(1-P)+(1-A)P}=\frac{A-AP}{A-2AP+P}

Solving for A, we get:

A=\frac{qP}{2qP-q+1-P}

For the case where q=0.5, this simplifies to the Prevalence. If the witness’ accuracy is equal to the percentage of green cars, we can be 50% confident that she really saw a blue car. (Note that we can’t get to accuracy= Prevalence by always guessing a green car. If this were the case, we’d never have a situation where the witness said she saw a blue car.)
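As a sanity check on the algebra, here is a small sketch of both formulas as R functions. The function names are made up for this example. A is accuracy, P is the prevalence of green cars and q is the chance the car is blue given that the witness says blue.

```r
# q as a function of accuracy A and prevalence P
posteriorBlue <- function(A, P) {
  A * (1 - P) / (A * (1 - P) + (1 - A) * P)
}

# The same relationship solved for A
requiredAccuracy <- function(q, P) {
  q * P / (2 * q * P - q + 1 - P)
}

posteriorBlue(0.80, 0.85)      # the original problem, about 0.41
requiredAccuracy(0.5, 0.85)    # accuracy needed for even odds: the prevalence
requiredAccuracy(0.9, 0.85)    # accuracy needed to be 90% sure of a blue car
```

Plugging the result of `requiredAccuracy` back into `posteriorBlue` should return the q we started with, which is a quick way to confirm that the two expressions are inverses of one another.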

How does our table look now?

                    Car is green   Car is blue   Marginal
Witness says green  72.25%         2.25%         74.5%
Witness says blue   12.75%         12.75%        25.5%
Marginal            85%            15%           100%

We can see that when the witness says the car is blue, the odds are even that the car is blue. And if she says green? That’s 97%.

There is a bit of contrivance at work here in that we know the population of vehicles with certainty. We also presume that the population has no other defining characteristics that could improve our accuracy. Perhaps blue cars are driven at particular hours of the day or more in certain locations? However, there is a bit of a lesson. If you’re trying to identify the rare case, you must be able to generate an accuracy that’s higher than the prevalence of the baseline case. A rare disease needs a very precise diagnostic test. Or, in my field, if you’re trying to identify the one liability claim in 1,000 that will produce a massive jury award, your predictive model must be very, very good.

Session info:

## R version 3.1.1 (2014-07-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] knitr_1.6        RWordPress_0.2-3
## 
## loaded via a namespace (and not attached):
##  [1] digest_0.6.4     evaluate_0.5.5   formatR_0.10     htmltools_0.2.4 
##  [5] markdown_0.7.2   RCurl_1.95-4.1   rmarkdown_0.2.50 stringr_0.6.2   
##  [9] tools_3.1.1      XML_3.98-1.1     XMLRPC_0.3-0     yaml_2.1.13

Stuff I’ve gotten horribly wrong

I'm the first (I hope) to admit when I've gotten something wrong. I like to think I'm humble enough to realize that there are limits to my knowledge. Actually, humility doesn't enter into it. Every day I'm confronted with things that I don't know or understand. Those same limits can often blind me to being sage enough to recognize when I've gone off the rails. With time, however, knowledge begins to seep in. So, here it is, stuff I've gotten wrong:

  1. Using a list to store complicated data types in S4 objects is absurd and unnecessary.
    There's a lengthy explanation here, but suffice it to say that it's absolutely possible to vectorize individual elements of your S4 object. I've done it and it's a gas. Don't get me wrong, it's not a walk in the park, but it allows you to build up very complicated objects. So long as accessor functions are coded cleanly, things will work out. Using a list to store complicated elements is a bad idea on a number of levels.

  2. It's totally possible to extract the contents of a data frame without fear of R returning a vector.
    This is really embarrassing. All you need to do is set the parameter drop=FALSE.

  3. Computed columns might be a good idea. My thoughts on how to implement them and my response to alternate suggestions were moronic. I use reshape2 and plyr all the time. I'm still not happy that I can't simply define a computed column like I can in SQL, but I've not developed a better alternative.
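For item 2, a tiny example shows the difference. The data frame here is invented purely for illustration.

```r
df <- data.frame(a = 1:3, b = letters[1:3])

# Selecting a single column drops down to a plain vector by default
class(df[, "a"])

# drop = FALSE keeps the result as a one-column data frame
class(df[, "a", drop = FALSE])
dim(df[, "a", drop = FALSE])
```

The first call returns an integer vector. The second returns a data frame with three rows and one column, which is what you want when downstream code expects a data frame.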

I'm sure there are others. My initial epiphany about mapply and its relation to nested loops has faded. This is mostly the result of my having gained deeper experience with the vectorization of the language. I still use mapply in this way, so I'm not yet ready to concede that this approach is “wrong”, per se.

A few weeks ago, I was in Africa as part of a team of instructors demonstrating how to use R. I sat with one of the students for two hours going over some basic coding. At one point, I could tell that he was reluctant to execute a command after he'd typed it. I told him, “Learning R means making many, many mistakes. Go ahead and get started and don't worry.” His code ran fine.

Recursive assignment

Here’s yet another example where I just need to read the help files. Before I go on, I should add my own notion as to why that’s not always easy to do. On loads of message boards, you’ll see people say- correctly- that the documentation is very clear on XYZ. True. But that’s only relevant if you read the bit of the documentation that actually matters to you and you have all of the context you need to understand the terse (though accurate!) descriptions there. It’s a bit like a bus schedule in Samarkand. Absolutely clear and useful if you’re in central Asia and know where you are and where you need to go and when you need to get there. If you’ve been walking the Silk Road for weeks and can’t tell Samarkand from Tashqent, that bus schedule may not do you as much good. So it is with R documentation. Sometimes you’ll have to dust off your shoes, get patient and ask a stranger for help.

So what I had wanted to do was to understand something fairly basic. How is the following statement processed:

myObject$MyColumn[2] = "New value"

This is a typical method to manipulate individual cells in a data frame and a very natural way to structure custom R objects. So, when creating my own objects, how do I implement it? If there is customization, where does it take place? Do I access the element in the $ or the [] first? What assignment operator is being used?

To investigate, I created a very simple object with easy properties that I could assign.

setClass("Person", representation(FirstName = "character", LastName = "character", 
    Birthday = "Date"))

I then created two easy access and set methods. For reasons that will become clear in a moment, I also added a statement to indicate when the methods had been called.

# Accessor: announce the call, then return the named slot
setMethod("$", signature(x = "Person"), function(x, name) {
    print("Just called $ accessor")
    slot(x, name)
})
# Replacement method: announce the call, update the slot and return the modified object
setMethod("$<-", signature(x = "Person"), function(x, name, value) {
    print("Just called $ assignment")
    slot(x, name) = value
    x
})

And I created a new object.

objPeople = new("Person", FirstName = c("Ambrose", "Victor", "Jules"), LastName = c("Bierce", 
    "Hugo", "Verne"), Birthday = seq(as.Date("2001/01/01"), as.Date("2003/12/31"), 
    by = "1 year"))

So, I can access the properties and my methods will tell me when they've been accessed. I can also assign to the member and I’ll be told when that happens as well.

objPeople$FirstName
## [1] "Just called $ accessor"
## [1] "Ambrose" "Victor"  "Jules"
objPeople$FirstName = "Joe"
## [1] "Just called $ assignment"

Now here’s the interesting bit. (Interesting if you’ve just gotten to the train station in Samarkand and are trying to find your hotel. Not so interesting if you’ve been in Uzbekistan for a few weeks.)

objPeople$FirstName[2] = "Joe"
## [1] "Just called $ accessor"
## [1] "Just called $ assignment"

The assignment produced a call to the accessor function? Why? The answer may be found in one of two places. One is the very clear, concise and speedy answer that I got to a question I posed on StackOverflow, which may be read here. Two is the R documentation, which may be found here.

This will tell us that the following two sets of statements are equivalent. (For the rest of the post, I’m suppressing output, so the messages about when the ‘$’ operators are called will not appear.)

objPeople$FirstName[2] = "Joe"

`*tmp*` <- objPeople
objPeople <- `$<-`(`*tmp*`, name = "FirstName", value = `[<-`(`*tmp*`$FirstName, 
    2, value = "Joe"))

So what’s happening? When I want to assign to a subset, three things take place. First, I use my accessor to sort out precisely which value I’m extracting from. Next, I use bracket assignment to alter the elements of a subset of that vector. Finally, I assign the whole vector back to the component of my object. This is a bit easier to see, if we take the steps one at a time.

gonzo = objPeople$FirstName
mojo = `[<-`(gonzo, 2, value = "Joe")
objPeople = `$<-`(objPeople, "FirstName", mojo)

This is why the accessor is not called if there is no subset in the assignment. In that case, the equivalent expression is simply the following:

objPeople = `$<-`(objPeople, "FirstName", "Joe")
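The same expansion can be checked on a plain list, with no S4 class involved. This is a minimal sketch with names of my own choosing. It compares the one-line subset assignment against the three explicit steps.

```r
people <- list(FirstName = c("Ambrose", "Victor", "Jules"))

# The ordinary one-line subset assignment
direct <- people
direct$FirstName[2] <- "Joe"

# The same thing, spelled out one step at a time
stepwise <- people
firstNames <- stepwise$FirstName                       # 1. extract the whole vector
firstNames <- `[<-`(firstNames, 2, value = "Joe")      # 2. replace one element
stepwise <- `$<-`(stepwise, "FirstName", firstNames)   # 3. assign the vector back

identical(direct, stepwise)
```

If the two routes really are equivalent, `identical` returns TRUE. The only difference for the Person class is that steps 1 and 3 dispatch to the custom methods, which is why both messages get printed.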

Welcome to Uzbekistan. Please enjoy our fine network of buses.

Watching Africa from a plane

I wrote this 8 or 9 days ago, while on a plane and am just now getting around to posting it.

It's either 7:12 PM Friday or 2:12 AM Saturday. I'm somewhere over the Mediterranean, having just passed over Tunisia. Sunrise will happen too late for me to see the Sahara. It's a mass of beige on the tiny map; a green dotted line treks doggedly forward over a baked wasteland about the size of the entirety of the US east of the Mississippi. It's wrong to talk about Africa without talking about the enormity of it. Once that's cleared, we'll be in Addis Ababa, Ethiopia. I doubt there will be any on offer- and it won't be at all an appropriate time to drink it- but I'd love to try some Tej. This is likely moot as I have no Ethiopian currency. I'll be glad for a coffee and the chance to buy my kids some postcards.

I'm not sure who else is on this plane. Quite a few Africans, of course, but more white Americans than I was expecting. At least one is wearing a purple t-shirt with letters arranged in the shape of a cross. Are they all missionaries? Well, not all, obviously. I'm not and neither is the woman sitting next to me. She works with the university in Addis. And me? I've got this laptop and my brain and I'm going to try to share the contents of both with some students in Rwanda. I'll have help, of course. I remain intellectually embarrassed to be involved at all. It was only my eagerness to travel, to experience and to learn that got me here. That and, I expect, a surfeit of volunteers.

Still, the question remains: just what are we all doing in Africa? I can only answer the question for myself and there are two bits of it. The first is the easy bit. I love to travel. This trip has enough altruism that I don't feel too guilty leaving my family for 10 days on another of my crazy whims. The second is different. If the trip had been to Peru, Slovakia, or Sri Lanka it would not have caught me the same way. Africa. The continent which is too big to fail, but for which everyone has such bleak hopes. Africa. Origin of humanity. Eden. Africa, source of cheap natural resources, from oil to uranium to diamonds to its most devalued commodity: free human labor. Africa the home of failed states, dictatorships, foreign-drawn borders, heart of darkness, punishing sun, steamy jungles and parched sand. Africa. The place I'd chosen to ignore for the first 40 years of my life. The place that draws me in the same way other places have, with the whispered voice telling me, “There must be more than this. Everyone else surely has it wrong. The only way you'll find out is to go there.”

This won't be an exhaustive experience, mind. It's really just 9 days. Nowhere near enough for insight, answers or truth. Yet more than I had when I woke up this morning. Before I dragged myself from my home, bleary-eyed, drove through the darkness to fly against the sun and compressed one day and half a world while sitting on a plane. Tomorrow, I’ll rise again and dust my eyes to greet the African dawn.

Triangle Open Data Day 2014

A rare live blog post today. I'm writing this from Triangle Open Data Day 2014. This will basically be a page of links that I'll try to get around to later.

GIS resources:

Open data resources:

Cloud development resources:

MongoDB presentation is about to start. Will likely update this post.

Another skewed normal distribution

At the CLRS last year, Glenn Meyers talked about something very near to my heart: a skewed normal distribution. In loss reserving (and I'm sure, many other contexts) standard linear regression is less than ideal as it presumes that deviations from the mean are equally distributed. We rarely expect this assumption to hold (though we should always test it!). Application of a log transform is one way to address this, but that option isn't available for negative observations. Negative incremental reported losses are very common and even negative payments which arise from salvage, subrogation or other factors happen often enough that (in my view) the log transform isn't an attractive option.

Meyers gave a talk where he described (among other things) the lognormal-normal mixture. That presentation, Stochastic Loss Reserving with Bayesian MCMC Models, is worth any actuary's time. The idea is simplicity itself. Z is lognormally distributed, with parameters mu and sigma. X is normally distributed with parameters Z and delta.

Let's have a look at this distribution. Well, actually that's easier said than done. Here are the equations:

Z \sim Lognormal(\mu,\sigma)

X \sim Normal(Z,\delta)

So Z is easy, it's just a lognormal. In fact, here it is:

sigma = 0.6
mu = 2
x = seq(-10, 60, length.out = 500)
Z = dlnorm(x, mu, sigma)
plot(x, Z, type = "l")

plot of chunk unnamed-chunk-1

X for the expected value of Z is also easy. Here it is:

expZ = exp(mu + sigma^2/2)
delta = 3
pdfX = dnorm(x, expZ, delta)
plot(x, pdfX, type = "l")

plot of chunk unnamed-chunk-2

Here, we've produced a normal centered around the expected value of the original lognormal distribution. Not skewed and not all that interesting. What we want is a distribution wherein the mean of the normal is itself a random variable. To get that, we have three options: one lazy, one easy and one hard. I'll show the lazy one first.

The lazy one is to randomly sample from Z and then feed that to X. We end up with a histogram which approximates a density function.

samples = 10000
Z = rlnorm(samples, mu, sigma)
X = rnorm(samples, Z, delta)
hist(X)

plot of chunk unnamed-chunk-3
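To put a number on what the histogram shows, here is a short sketch that repeats the sampling and computes the sample skewness by hand. The seed and the moment-based skewness formula are my own additions. The parameters are the ones used above.

```r
set.seed(1234)   # arbitrary seed, only so the numbers are repeatable

mu <- 2
sigma <- 0.6
delta <- 3
samples <- 10000

Z <- rlnorm(samples, mu, sigma)   # the random mean
X <- rnorm(samples, Z, delta)     # a normal draw around each mean

# Moment-based sample skewness: zero for a symmetric distribution
sampleSkewness <- mean((X - mean(X))^3) / sd(X)^3

c(mean = mean(X), skewness = sampleSkewness)
```

The sample mean should land close to the expected value of Z, exp(mu + sigma^2/2), which is about 8.85 for these parameters. A skewness well above zero confirms the long right tail.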

That's undoubtedly skewed and might even correspond to Glenn's graph on slide 36. But that was lazy. The easy way is to repeat a procedure similar to what I did a week ago when demonstrating a Bayesian model which combined a lognormal and an exponential. Here, we just calculate the joint density over a subspace of the probability domain, normalize it and then compute the marginal.

plotLength = 250
Z = seq(0.001, 40, length.out = plotLength)
X = seq(-10, 40, length.out = plotLength)

dfJoint = expand.grid(Z = Z, X = X)

dfJoint$Zprob = dlnorm(dfJoint$Z, mu, sigma)
dfJoint$Xprob = dnorm(dfJoint$X, dfJoint$Z, delta)
dfJoint$JointProb = with(dfJoint, Zprob * Xprob)
dfJoint$JointProb = dfJoint$JointProb/sum(dfJoint$JointProb)

jointProb = matrix(dfJoint$JointProb, plotLength, plotLength, byrow = TRUE)

filled.contour(x = X, y = Z, z = jointProb, color.palette = heat.colors, xlab = "X", 
    ylab = "Z")

plot of chunk unnamed-chunk-4

Groovy. X can be anything, but higher values of Z will pull it to the right. Here's the marginal distribution of X.

library(plyr)
marginalX = ddply(dfJoint[, c("X", "JointProb")], .variables = "X", summarize, 
    marginalProb = sum(JointProb))
plot(marginalX$X, marginalX$marginalProb, type = "l", xlab = "X", ylab = "Marginal probability")

plot of chunk unnamed-chunk-5

The hard way is to sit down with pen and paper and work this out algebraically. I tried. I worked through a few ugly integrations, did some research on the interweb and have concluded that if there is a closed form solution, it's not something that people spend a great deal of time talking about. I will point to this paper and this website as material that I'd like to get better acquainted with. It would appear that this comes up in financial and time series analysis. No surprises there, I think there are similar reasons to need this sort of distribution.

For the record, here's what that integrand looks like.

f_{X}(X|Z)f_{Z}(Z) = \frac{1}{Z \delta \sigma 2\pi}e^{\frac{-(X-Z)^2}{2\delta^2}+\frac{-(ln(Z)-\mu)^2}{2\sigma^2}}

If you know how to integrate that over Z, please let me know.
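Short of a closed form, the integral over Z can at least be done numerically. What follows is a sketch rather than a definitive treatment. It uses base R's integrate on the integrand above. The function name marginalDensityX is invented, and as a shortcut of my own it truncates the range of Z to within 8 standard deviations of X, where the normal term is effectively zero elsewhere.

```r
mu <- 2
sigma <- 0.6
delta <- 3

# Marginal density of X at a single point, integrating the joint density over Z
marginalDensityX <- function(x) {
  integrand <- function(z) dlnorm(z, mu, sigma) * dnorm(x, z, delta)
  # Assumption: outside x +/- 8 * delta the normal term contributes nothing of note
  lower <- max(0, x - 8 * delta)
  upper <- x + 8 * delta
  if (upper <= 0) return(0)
  integrate(integrand, lower, upper)$value
}

xGrid <- seq(-15, 60, by = 0.25)
densityX <- sapply(xGrid, marginalDensityX)

plot(xGrid, densityX, type = "l", xlab = "X", ylab = "Density")
```

Unlike the grid approach, this gives a proper density rather than normalized cell probabilities, so summing `densityX` times the grid spacing should come out very close to 1.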

If you've made it this far, odds are good that you're an actuary, a stats nerd or both. Whatever you are, take a moment to thank heaven for Glenn Meyers, who's both. He's made tremendous contributions to actuarial literature and we're all the better for it.

Session info:

## R version 3.0.2 (2013-09-25)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] knitr_1.5        RWordPress_0.2-3 plyr_1.8        
## 
## loaded via a namespace (and not attached):
## [1] evaluate_0.5.1 formatR_0.10   RCurl_1.95-4.1 stringr_0.6.2 
## [5] tools_3.0.2    XML_3.98-1.1   XMLRPC_0.3-0