24 Days of R: Day 16

Yesterday I said that I'd carry on with the Monte Carlo simulation of insurance data. I'm not going to, as I don't think I've got enough time and mental energy to do it justice. I'm sure tens of people are disappointed to learn this.

Instead, I'm going to have a look at the recently released PISA study, which assesses student performance in many countries around the world. This is always a subject of great (if fleeting) interest here in the US as we tend to punch well below our weight. There are many, many issues around education reform in the US and I couldn't possibly treat them here. Suffice it to say that the PISA study prompts many questions about why one of the world's wealthiest nations (seemingly) can't educate its children as well as other countries with fewer resources. News outlets will have their say, pundits will have theirs and wingnuts will also air their views.

But this isn't a site devoted to conjecture. I'd rather devote my time to the objective assessment of data. Noting that this is a nearly impossible goal (my biases will invariably surface), let's talk about data.

First, the data isn't easy to get or interpret. Some digging on the PISA site leads us to the spot where we can get data. All that may be found here. (Thanks, Australia!) After a few minutes sifting through various options, I opted to look at the schools questionnaire file. This is a fixed-width file, so I get my first experience using something other than read.csv. After a cursory look at the codebook, I'm going to focus on just a few columns of information. I'm not clever enough to wrap my head around how to use read.fwf to pull just the columns I want, so I'm going to write a helper function.

filename = "./Data/INT_SCQ12_DEC03/INT_SCQ12_DEC03.txt"
filewidth = 1271

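# Read a single fixed-width column from a file.
# Negative widths tell read.fwf to skip that many characters.
ReadColumn = function(filename, start, width, filewidth) {
    # Characters left on the line after the column of interest
    trailing = filewidth - (start + width - 1)
    widths = width
    if (start > 1) widths = c(-(start - 1), widths)
    if (trailing > 0) widths = c(widths, -trailing)
    read.fwf(filename, widths)
}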

And I quickly find that the columns in the file don't line up with the specifications in the codebook. At this late hour, I can't spend any more time trying to decode every column, but I can identify whether or not a school is in an OECD country and whether or not it's private.
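One quick way to check a position against the codebook is to tabulate the characters that actually sit there. What follows is only a rough sketch on an invented toy file, not the PISA layout, and `peek_column` is a hypothetical helper made up for illustration.

```r
# Hypothetical helper: tabulate the text found at a given position
# in the first n lines of a fixed-width file.
peek_column <- function(path, start, width, n = 1000) {
    lines <- readLines(path, n = n)
    table(substr(lines, start, start + width - 1), useNA = "ifany")
}

# Toy file with an invented layout (NOT the PISA layout):
# position 3 holds a one-character flag.
toy_file <- tempfile(fileext = ".txt")
writeLines(c("AA1BBB10", "CC2DDD25", "EE1FFF37"), toy_file)

# If position 3 really holds a 1/2 flag, only "1" and "2" should appear
counts <- peek_column(toy_file, start = 3, width = 1)
print(counts)
```

If letters or blanks turn up where the codebook promises a numeric code, the position is probably off.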

public = ReadColumn(filename, 32, 1, filewidth)
OECD = ReadColumn(filename, 18, 1, filewidth)
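Each of the two calls above reads the whole file once. As an aside, read.fwf can also pull several columns in a single pass, since negative widths skip characters. Here is a minimal sketch, assuming the same two positions (18 and 32) but run against an invented toy file rather than the PISA data; `make_line` is a hypothetical helper that only builds the toy lines.

```r
# Toy file: 32-character lines with a flag at position 18 and another at
# position 32, mirroring the two positions used above. The contents are
# invented; this is not PISA data.
make_line <- function(oecd, public) {
    paste0(strrep("x", 17), oecd, strrep("x", 13), public)
}
toy_file <- tempfile(fileext = ".txt")
writeLines(c(make_line(1, 1), make_line(0, 2), make_line(1, 2)), toy_file)

# Negative widths skip characters, positive widths become columns:
# skip 17, read 1 (position 18), skip 13, read 1 (position 32).
both <- read.fwf(toy_file,
                 widths = c(-17, 1, -13, 1),
                 col.names = c("OECD", "Public"))
print(both)
```

The result is already a two-column data frame, so there is nothing left to bind together afterwards.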

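# One row per school, with a constant column of 1s to sum over
df = cbind(public, OECD)
colnames(df) = c("Public", "OECD")
library(reshape2)
df$variable = 1

# Count schools by school type (rows) and OECD flag (columns)
pivot = dcast(df, "Public ~ OECD", sum, value.var = "variable")
# Keep only the school types coded 1 or 2
pivot = pivot[pivot$Public <= 2, ]

# Share of the first school type within each OECD column
public.oecd = pivot[1, 2]/sum(pivot[, 2])
public.other = pivot[1, 3]/sum(pivot[, 3])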

The fraction of schools that are public in OECD countries is 0.8107 as compared to 0.8066 in other countries. That was a long walk to learn very little. There's undoubtedly loads of great information in here. It's a shame that the file specification isn't clearer. No wonder the pundits spend little time looking at the data.
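The same kind of within-group share can also be computed in base R with table() and prop.table(). This is a sketch on invented data, and the codings (1 = public, 2 = private; 1 = OECD, 0 = other) are assumptions for illustration, not something confirmed from the codebook.

```r
# Invented toy data. Assumed codings (not verified against the codebook):
# Public: 1 = public, 2 = private; OECD: 1 = OECD, 0 = other.
toy <- data.frame(
    Public = c(1, 1, 2, 1, 2, 2),
    OECD   = c(1, 1, 1, 0, 0, 0)
)

# Counts of schools by type (rows) and OECD flag (columns)
counts <- table(Public = toy$Public, OECD = toy$OECD)

# margin = 2 makes each column sum to 1, giving shares within each group
shares <- prop.table(counts, margin = 2)
print(shares)

# Look the shares up by code rather than by row and column position
public.oecd  <- shares["1", "1"]
public.other <- shares["1", "0"]
```

Indexing by code rather than by position makes it harder to mix up which column is which.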

sessionInfo()
## R version 3.0.2 (2013-09-25)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] knitr_1.5        RWordPress_0.2-3 reshape2_1.2.2   plyr_1.8        
## 
## loaded via a namespace (and not attached):
## [1] evaluate_0.5.1 formatR_0.10   markdown_0.6.3 RCurl_1.95-4.1
## [5] stringr_0.6.2  tools_3.0.2    XML_3.98-1.1   XMLRPC_0.3-0
2 Responses to 24 Days of R: Day 16

  1. You might want to take a look at the code at asdfree for analyzing PISA data with R http://www.asdfree.com/2013/12/analyze-program-for-international.html

    • PirateGrunt says:

      That’s a fantastic site and I look forward to playing with it. Honestly, if the documentation had been clearer, I wouldn’t have wasted quite so much time. I was easily an hour into staring at fixed width columns that didn’t resemble their descriptions when I finally threw in the towel. I’ll have another go at them, but will explore monetdb first.
