rustfare-package and using Rosstat regional statistics

08 Aug 2013

I have been working with Russian statistical data a lot this year. However, accessing many of the open data sources has proven cumbersome. One example is the extensive regional data resource by Rosstat. To make this and other data sources easier to access, I have begun writing an R package called rustfare. Below you will find some key characteristics of rustfare as it stands today. It will be improved and extended continuously, so make sure you have the latest version installed and that you follow the up-to-date instructions. I will announce major updates through this blog and on Twitter.

rustfare-package

Installation

library(devtools)
# the separate username argument is deprecated in newer devtools versions,
# so the repository is given as "user/repo"
install_github("muuankarski/rustfare")
library(rustfare)

Examples: Rosstat regional statistics

Rosstat regional statistics include indicator values at three levels:

  1. federal level
  2. federal district level
  3. regional level

To download the data, use the GetRosstat() function, which requires two arguments:

  1. indicator (from the listing above),
  2. level (federal/federal_district/region)
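Since the level argument accepts only the three values listed above, it can be handy to check it before starting a download. The following is a minimal sketch in base R; check_level() is a hypothetical helper of my own naming and not part of rustfare, and I am assuming GetRosstat() expects the level strings exactly as written above.

```r
# Allowed values for the `level` argument, as listed in this post.
rosstat_levels <- c("federal", "federal_district", "region")

# Hypothetical helper (not part of rustfare): validate `level`
# before any download is attempted.
check_level <- function(level) {
  match.arg(level, rosstat_levels)
}

check_level("region")      # returns "region"
# check_level("oblast")    # would stop with an error listing the valid choices

# With rustfare installed, the validated value can be passed straight on:
# dat <- GetRosstat("infant_mortality_rate", check_level("federal_district"))
```

Note that match.arg() also accepts unambiguous abbreviations, so check_level("reg") would return "region" as well.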

The code below returns a dataset at federal district level on infant mortality and plots a line graph over time.

library(rustfare) # load rustfare for obtaining the data
library(ggplot2) # load ggplot2 for plotting
dat <- GetRosstat("infant_mortality_rate",
                   "federal_district")
head(dat, 3) # print the first 3 rows of the data.frame
ggplot(dat, aes(x=year,y=value,color=region_en)) +
  geom_point() + 
  geom_line() +
  geom_text(data = subset(dat, year == 2010), 
            aes(x=year,y=value,
                color=region_en,label=region_en),
            size=3, hjust=1) +
  theme(legend.position="none")

The next chunk of code extracts the same indicator at the regional level:

library(rustfare) # load rustfare for obtaining the data
library(ggplot2) # load ggplot2 for plotting
dat <- GetRosstat("infant_mortality_rate",
                   "region")
head(dat, 3) # print the first 3 rows of the data.frame
ggplot(dat, aes(x=year,y=value,color=region_en)) +
  geom_point() + 
  geom_line() +
  geom_text(data = subset(dat, year == 2010), 
            aes(x=year,y=value,
                color=region_en,label=region_en),
            size=3, hjust=1) +
  theme(legend.position="none")
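
At the regional level the plot contains one line and one label per region, which can get crowded. One possible way around this is to keep only a few regions and to label each line at the last year present in the data rather than at a hard-coded year. The sketch below uses a small made-up data frame with the same column names as in the examples above (region_en, year, value); the region names and values are purely illustrative and do not come from Rosstat.

```r
# Toy stand-in for the output of GetRosstat(); names and values are made up.
dat <- data.frame(
  region_en = rep(c("Region A", "Region B", "Region C"), each = 3),
  year      = rep(2008:2010, times = 3),
  value     = c(9.1, 8.7, 8.2, 7.5, 7.4, 7.0, 11.3, 10.8, 10.1)
)

# Keep only the regions of interest (hypothetical selection).
keep <- c("Region A", "Region C")
dat_sel <- subset(dat, region_en %in% keep)

# Label each line at the last year present in the data
# instead of hard-coding 2010.
labels <- subset(dat_sel, year == max(year))

# Plot only if ggplot2 is available.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(dat_sel, aes(x = year, y = value, color = region_en)) +
    geom_point() +
    geom_line() +
    geom_text(data = labels, aes(label = region_en), size = 3, hjust = 1) +
    theme(legend.position = "none")
  print(p)
}
```

With real data, dat would come from GetRosstat() and keep would hold the region_en values you want to compare.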