Research Update: Week of December 15 — R Programming

An update on my progress in learning R and data science.

After one week of research and trial and error, I have some progress to report. I started the week by getting back into the R programming language. I am using a short book from Packt Publishing, Social Media Mining with R, as a primer. I also have R for Dummies (2012 edition) and Learning R (2013), but for now I am focusing on the short book for three reasons: a) R is not the primary language I will be using; b) it is shorter and directly relevant; and c) it is more recent than the other two.

The first two chapters are my self-imposed assignment for this book this week. Chapter one level-sets on data science, big data, and statistical analysis. Chapter two covers the fundamentals of R: basic arithmetic, functions, vectors, variables, importing data, and basic usage.
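To make those fundamentals concrete, here is a minimal sketch of my own (not taken from the book) covering arithmetic, a variable, a vector, and a small function. The names x and square are purely illustrative.

```r
# Basic arithmetic follows the usual operator precedence
print(2 + 3 * 4)    # [1] 14

# c() builds a vector; arithmetic is applied element by element
x <- c(1, 2, 3, 4)
print(x * 2)        # [1] 2 4 6 8

# A small user-defined function (illustrative name), applied to the whole vector
square <- function(v) v^2
print(square(x))    # [1]  1  4  9 16
print(mean(x))      # [1] 2.5
```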

For example, to install a package in R, issue a command like this:

install.packages("<package_name>", dependencies = TRUE)
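install.packages() downloads a package every time it runs, so a common pattern is to install only when the package is missing and then load it. The sketch below wraps that pattern in a small helper. ensure_package is a hypothetical name of my own and is not part of base R or the book.

```r
# Hypothetical helper (not from base R or the book): install a package only if
# it is missing, then attach it so its functions can be used.
ensure_package <- function(pkg) {
  # requireNamespace() returns FALSE instead of raising an error when pkg is absent
  if (!requireNamespace(pkg, quietly = TRUE)) {
    # Downloads from CRAN, so this branch needs a network connection
    install.packages(pkg, dependencies = TRUE)
  }
  # character.only = TRUE tells library() that pkg holds the name as a string
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# "stats" ships with R, so this call loads it without installing anything
ensure_package("stats")
```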

This is similar to Python’s pip utility. To get pip on a Debian- or Ubuntu-based Linux host, issue one (or both) of the commands below, depending on which version of Python you are using. These normally need root privileges, for example via sudo:

apt-get install python-pip
apt-get install python3-pip

Then, to install a Python package from the Python Package Index (PyPI), issue one of these commands. Use pip for Python 2 and pip3 for Python 3, which matters on systems that have both installed:

pip install <package_name>
pip3 install <package_name>

Back to R. Setting a working directory can save a lot of headaches, because R resolves relative file paths, such as the ones passed to read.csv, against it.

setwd("<full/path/to/directory>")

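So, to set it to an R_Working_Directory folder inside Joe’s home directory, we would issue one of the following commands:

Windows: setwd("C:/Users/Joe/R_Working_Directory")
Linux: setwd("/home/Joe/R_Working_Directory")

R treats a backslash inside a string as an escape character, so Windows paths need forward slashes or doubled backslashes (as in "C:\\Users\\Joe").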
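Here is a minimal sketch of the working-directory workflow. It assumes a scratch folder under R's temporary directory as a stand-in for R_Working_Directory, so it runs on any machine. setwd() invisibly returns the previous directory, which makes it easy to switch back afterwards.

```r
# Build the path with file.path() so the separators work on Windows and Linux.
# The temporary folder stands in for Joe's R_Working_Directory.
work_dir <- file.path(tempdir(), "R_Working_Directory")
dir.create(work_dir, showWarnings = FALSE)

# setwd() changes the working directory and invisibly returns the old one
old_dir <- setwd(work_dir)
print(getwd())

# Relative paths are now resolved inside work_dir
writeLines("hello", "note.txt")
print(file.exists(file.path(work_dir, "note.txt")))  # [1] TRUE

# Restore the original working directory when done
setwd(old_dir)
```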

Let’s load a sample CSV file from the computer:

myfile <- read.csv("path/to/file")

To load a file from the internet instead, replace the path with the full URL (including http:// or https://).
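I do not have a specific file to share here. The sketch below writes a tiny made-up data frame to a temporary CSV and reads it back with read.csv(), so it runs without any external file or network access. The user and followers columns are hypothetical.

```r
# Write a small, made-up data set to a temporary CSV file
csv_path <- tempfile(fileext = ".csv")
sample_data <- data.frame(
  user = c("alice", "bob", "carol"),
  followers = c(120, 45, 300)
)
write.csv(sample_data, csv_path, row.names = FALSE)

# Load it back; read.csv() treats the first row as a header by default
myfile <- read.csv(csv_path)
str(myfile)  # a data.frame with 3 obs. of 2 variables
```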

Once we have the file loaded in R, we can analyze it.
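Here are a few base R functions that give a first look at loaded data. They are applied to the built-in mtcars data set only because it ships with R, so no file is needed. The same calls work on a data frame returned by read.csv().

```r
# mtcars is a data frame bundled with R: 32 cars, 11 numeric columns
data(mtcars)
print(dim(mtcars))           # number of rows and columns
print(head(mtcars, 3))       # first three rows
print(summary(mtcars$mpg))   # min, quartiles, mean, and max of fuel economy

# Average miles per gallon, grouped by number of cylinders
print(aggregate(mpg ~ cyl, data = mtcars, FUN = mean))
```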

Something notable about R:

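By convention, R code uses <- rather than = to assign variables. At the top level, mydata = data also works, but mydata <- data is the idiomatic style, and = is normally reserved for naming arguments inside function calls.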
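Here is a minimal sketch of the difference, using a hypothetical function called double_it. At the top level, both operators assign. Inside a function call, = only names an argument, while <- would also create a variable in the workspace.

```r
# At the top level, both operators create a variable
mydata <- c(3, 1, 2)
mydata2 = c(3, 1, 2)
print(identical(mydata, mydata2))  # [1] TRUE

# Hypothetical function used only to show argument matching
double_it <- function(n_items) n_items * 2

# Here = matches the argument n_items; no variable is created
print(double_it(n_items = 5))   # [1] 10
print(exists("n_items"))        # [1] FALSE

# Here <- assigns n_items in the workspace and passes its value in
print(double_it(n_items <- 5))  # [1] 10
print(exists("n_items"))        # [1] TRUE
```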

That’s it for what I have learned so far. Next week, I will be covering Mining Twitter with R and the pitfalls of social media.