Thursday, October 2, 2014

Big data in R

Approaches and packages to handle big data in R
Credit for this post goes largely to this blog.


Data Storage I/O

  • http://blog.revolutionanalytics.com/2009/12/r-tip-save-time-and-space-by-compressing-data-files.html
  • fread (the fast file reader in the data.table package)
  • data.table
  • http://stackoverflow.com/questions/1727772/quickly-reading-very-large-tables-as-dataframes-in-r
  • http://davetang.org/muse/2013/09/03/handling-big-data-in-r/
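
As a rough illustration of the ideas in the links above, here is a minimal sketch that writes a gzip-compressed CSV, reads it back with column types declared up front, and round-trips the same table through saveRDS. The data frame and temporary file names are made up for the example, and the fread call runs only if data.table happens to be installed.

```r
# Small made-up table standing in for a much larger data set.
set.seed(1)
df <- data.frame(id = 1:1000,
                 x  = rnorm(1000),
                 g  = sample(letters, 1000, replace = TRUE),
                 stringsAsFactors = FALSE)

# Write a gzip-compressed CSV through a gzfile connection.
csv_gz <- tempfile(fileext = ".csv.gz")
con <- gzfile(csv_gz, "w")
write.csv(df, con, row.names = FALSE)
close(con)

# read.csv decompresses transparently; declaring colClasses
# spares it from guessing the type of every column.
df_back <- read.csv(csv_gz,
                    colClasses = c("integer", "numeric", "character"))

# saveRDS compresses by default and keeps column types exactly.
rds <- tempfile(fileext = ".rds")
saveRDS(df, rds)
df_rds <- readRDS(rds)

# fread is usually much faster on large plain-text files
# (assumption: the data.table package is installed).
if (requireNamespace("data.table", quietly = TRUE)) {
  csv <- tempfile(fileext = ".csv")
  write.csv(df, csv, row.names = FALSE)
  dt <- data.table::fread(csv)
}
```

On a table this small the differences are negligible; the point is only the shape of the calls.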


Data Manipulation

  • dplyr (a separate package, the successor to plyr)
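
A minimal sketch of a grouped summary with dplyr, assuming the package is installed. It uses the built-in mtcars data only as a stand-in for a larger table; the column names mean_mpg and n_cars are chosen for the example.

```r
library(dplyr)

# Group by cylinder count, then compute one summary row per group.
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg),  # average fuel economy per group
            n_cars   = n())        # number of rows per group

print(by_cyl)
```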


Data Visualization

  • bigvis
  • ggplot2
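
With very many points, one common approach is to condense the data into bins first and plot only the summary. The sketch below does the binning in base R with hist() and hands the small result to ggplot2. It does not use the bigvis API; the simulated data and variable names are assumptions for the example.

```r
library(ggplot2)

# Simulated stand-in for a large numeric column.
set.seed(42)
n <- 1e5
x <- rnorm(n)

# Condense: count observations per bin without drawing anything.
h <- hist(x, breaks = 100, plot = FALSE)
binned <- data.frame(mid = h$mids, count = h$counts)

# Plot the summary (one bar per bin), not the raw points.
p <- ggplot(binned, aes(x = mid, y = count)) +
  geom_col() +
  labs(x = "x (bin midpoint)", y = "count")
```

Printing p draws the chart; the plotted data frame has one row per bin regardless of how large n is.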

Memory

  • ffbase
  • http://www.slideshare.net/EdwindeJonge1/ffbase
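
The sketch below is not the ffbase API. It is a base R stand-in for the same underlying idea of not holding the whole file in memory, reading a CSV in fixed-size chunks and keeping only running totals. The file, the column name x, and the chunk size are made up for the example.

```r
# Made-up CSV standing in for a file too large to load at once.
csv <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1:10000), csv, row.names = FALSE)

chunk_size <- 1000
total <- 0
rows  <- 0

con <- file(csv, "r")
header <- readLines(con, n = 1)          # consume the header line

repeat {
  lines <- readLines(con, n = chunk_size)  # next block of raw lines
  if (length(lines) == 0) break            # end of file reached
  chunk <- read.csv(text = lines, header = FALSE, col.names = "x")
  total <- total + sum(chunk$x)            # keep only the running summary
  rows  <- rows + nrow(chunk)
}
close(con)
```

Only one chunk is in memory at a time, so peak memory use depends on chunk_size rather than on the file size.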

Ensemble

  • http://www.r-bloggers.com/improve-predictive-performance-in-r-with-bagging/
  • http://vikparuchuri.com/blog/parallel-r-loops-for-windows-and-linux/
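
A minimal sketch of bagging in base R: fit the same model on bootstrap resamples of the training data and average the predictions. The simulated data, the choice of lm as the base learner, and the number of bags are assumptions for the example, not taken from the linked posts.

```r
set.seed(123)

# Simulated training and test data (true relationship: y = 2x + noise).
n <- 200
train <- data.frame(x = runif(n, 0, 10))
train$y <- 2 * train$x + rnorm(n)
test <- data.frame(x = seq(0, 10, length.out = 50))

n_bags <- 25

# One column of test-set predictions per bootstrap model.
preds <- sapply(seq_len(n_bags), function(i) {
  idx <- sample(n, replace = TRUE)          # bootstrap resample of rows
  fit <- lm(y ~ x, data = train[idx, ])     # fit the base learner
  predict(fit, newdata = test)
})

# The bagged prediction is the average across models.
bagged <- rowMeans(preds)
```

Each iteration is independent of the others, which is why this kind of loop is a natural candidate for a parallel loop.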


Parallelization

  • http://adv-r.had.co.nz/Profiling.html#parallelise
  • http://notjustmath.wordpress.com/2012/01/22/parallel-computing-with-r/
  • http://stackoverflow.com/questions/24335569/in-r-how-to-predict-with-svm-model-in-parallel-using-foreach-snow
  • http://topepo.github.io/caret/parallel.html
  • http://stackoverflow.com/questions/7782501/how-to-interpret-predict-result-of-svm-in-r?rq=1
  • http://www.r-bloggers.com/parallel-r-model-prediction-building-and-analytics/
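
Several of the links above deal with running model prediction in parallel. A minimal sketch using the parallel package that ships with R: split the new data into contiguous chunks, predict each chunk on a worker, and stitch the results back together. The lm model on mtcars and the two-worker cluster are assumptions for the example.

```r
library(parallel)

# A small model and an enlarged data set to predict on.
fit <- lm(mpg ~ wt + hp, data = mtcars)
newdata <- mtcars[rep(seq_len(nrow(mtcars)), 100), ]

# Split rows into contiguous chunks, one per worker, so order is preserved.
n_workers <- 2
idx <- cut(seq_len(nrow(newdata)), n_workers, labels = FALSE)
chunks <- split(newdata, idx)

cl <- makeCluster(n_workers)
par_pred <- tryCatch({
  # The model is passed as an extra argument so each worker receives it.
  res <- parLapply(cl, chunks,
                   function(d, model) predict(model, newdata = d),
                   model = fit)
  unlist(res, use.names = FALSE)
}, finally = stopCluster(cl))  # always shut the workers down

# Serial result for comparison.
serial_pred <- unname(predict(fit, newdata = newdata))
```

For a model this cheap the cluster start-up cost outweighs any gain; the pattern pays off when predicting each chunk is expensive.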