Maddison Project database in R

17 Sep 2013

Economic data in R

There is a growing trend of new methods for assessing economic data in R. Packages as WDI: World Development indicators, Quandl, pwt: Penn World Table. Also the data sets -section in CRAN Task View: Computational Econometrics is worth looking.

Maddison Project database

One which is seemingly missing is the mighty Maddison Project database that is often used especially by economic historians. Data includes estimates of economic growth in the world economic between AD 1 and 2010. A major update was done in January 2013 and a detailed documentation of that update can be read from Bolt and Van Zanden (2013). The First Update of the Maddison Project; Re-Estimating Growth Before 1820, Maddison Project Working Paper 4.. The actual data as .xlsx-file is here: mpd_2013-01.xlsx

The Exercise

Below is a stepwise explanation on how to visualize the data.

Load the data

url <- ""
dat <- read.xls(url, header=TRUE, skip=1, stringsAsFactors = FALSE)

Manipulating the data

Excel-sheets usually require some processing in R. Below I’m removing row not needed and melting the data in long format, and convert in to numerical.

column.names <- as.character(dat[1,])
column.names[1] <- "year"
df.mad <- dat[-1,]
names(df.mad) <- column.names
df.mad.l <- melt(df.mad, id.vars="year")
df.mad.l$value <- as.numeric(df.mad.l$value)
df.mad.l <- df.mad.l[!$value), ]

Subset the data

My research emphasises in post-socialist space and therefore I will subset the data. In addition to post-socialist countries I’m including some Western countries, too.

cntry.list <- c("Czech Republic","Estonia","Hungary",
               "F. USSR","USA","Japan","Finland","Sweden")
df.mad.l$variable <-str_trim(df.mad.l$variable, side = "both")
df.mad.l2 <- df.mad.l[df.mad.l$variable %in% cntry.list, ]

Then I will group the countries in one sensible way.

df.mad.l2$group[df.mad.l2$variable %in% c("Czech Republic","Estonia","Hungary",
                                          "Slovakia","Slovenia","Romania","Bulgaria")] <- "CEE"
df.mad.l2$group[df.mad.l2$variable %in% c("Albania","Bosnia","Croatia","Macedonia",
                                          "Montenegro","Kosovo","Serbia")] <- "Balkan"
df.mad.l2$group[df.mad.l2$variable %in% c("Armenia","Azerbaijan","Belarus","Georgia","Kazakhstan",
                                          "Tajikistan","Ukraine","Uzbekistan")] <- "CIS"
df.mad.l2$group[df.mad.l2$variable %in% c("USA","Japan","Finland","Sweden")] <- "WEST"
df.mad.l2$group[df.mad.l2$variable %in% c("F. USSR")] <- "USSR"

Plotting the data from 1850 to 2010

As I mentioned in the beginning the data extends for over 2000 years. My interest is more of contemporary nature, so I will subset data to cover only the last 160 years from 1850 to 2010. Also, I will add some major historical events during that period that have had an effect on economic growth, too.

ggplot(data=df.mad.l2[df.mad.l2$year > 1850, ], 
       aes(x=year,y=value,group=variable,color=group)) + 
  geom_line() + geom_point() + 
            aes(x=year,y=value,label=variable)) +
  annotate("rect", xmin = 1914, xmax = 1918, ymin = 5, ymax = 20000,
  alpha = .2) +
  annotate("text", x = 1916, y = 21000, label = "WW1") +
  annotate("rect", xmin = 1939, xmax = 1945, ymin = 5, ymax = 20000,
  alpha = .2) +
  annotate("text", x = 1942, y = 21000, label = "WW2")  +
  annotate("segment", x = 1991, xend = 1991, y = 0, yend = 30000,
  colour = "blue") +
  annotate("text", x = 1991, y = 31000, label = "Dissolution of \n the Soviet Union") +
  theme_minimal() +
comments powered by Disqus