Research Update: Week of December 15 — R Programming
An update on my progress in learning R and data science.
After one week of research and trial and error, I have some progress to report. I started the week by getting back into the R programming language. I am using a short book from Packt Publishing, Social Media Mining with R, as a primer. I also have R for Dummies (2012 edition) and Learning R (2013), but I am focusing on the small book for now for three reasons: a) R is not the primary language I will be using; b) it is shorter and a relevant primer; and c) it is more recent than the other two books.
The first two chapters (my self-imposed assignment for this book this week) begin with an overview of data science, big data, and statistical analysis in the first chapter, then move on to the fundamentals of R in the second. The fundamentals covered basic arithmetic, functions, vectors, variables, importing data, and basic use.
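To give a feel for those fundamentals, here is a minimal sketch of arithmetic, variables, vectors, and functions in R. The variable names and the rescale function are my own illustrations, not examples taken from the book.

```r
# Basic arithmetic: the usual operator precedence applies.
total <- 2 + 3 * 4

# Variables and vectors: c() combines values into a vector.
scores <- c(10, 20, 30, 40)
doubled <- scores * 2        # arithmetic is applied element by element

# Built-in functions operate on whole vectors.
average <- mean(scores)

# A user-defined function (hypothetical name) that maps values onto 0..1.
rescale <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

print(total)
print(doubled)
print(average)
print(rescale(scores))
```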
For example, to install a package in R, issue a command like this:
install.packages("<package_name>", dependencies = TRUE)
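In practice it can be handy to install a package only when it is missing, then load it. The sketch below assumes a helper named ensure_package, which is my own invention and not part of R. It uses "stats" as the example because that package ships with R, so nothing is downloaded.

```r
# Hypothetical helper: install a package only when it is not already available.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, dependencies = TRUE)
  }
  # Report (invisibly) whether the package can now be loaded.
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" is bundled with R, so this call does not trigger a download.
ensure_package("stats")

# Attach the package so its functions can be called directly.
library(stats)
```

Installing makes a package available on disk; library() is still needed in each session before its functions can be used by name.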
This is similar to Python’s pip utility. To get pip, issue one (or both) of the commands below on a Debian or Ubuntu-based Linux host (based on which version of Python you are using):
apt-get install python-pip
apt-get install python3-pip
Then, to use pip to install a package for Python from the Python Package Index (PyPI), issue one of these commands (showing both pip and pip3 for cases where people may be running Python 2 and Python 3):
pip install <package_name>
pip3 install <package_name>
Back to R. Setting a working directory can save a lot of headaches.
setwd("<full/path/to/directory>")
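So to set it to Joe’s Windows or Linux home directory, we would issue one of the following commands. Note that in an R string, Windows backslashes must be doubled (or replaced with forward slashes):
Windows: setwd("C:\\Users\\Joe\\R_Working_Directory")
Linux: setwd("/home/Joe/R_Working_Directory")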
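Here is a small sketch of why the working directory matters: relative file names are resolved against it. The directory is created under tempdir() purely so the example runs on any machine, and note.txt is a made-up file name.

```r
# Remember where we started so we can switch back afterwards.
original_dir <- getwd()

# file.path() joins path pieces with "/", which R accepts on Windows and Linux.
# tempdir() is used here only so the example runs anywhere.
work_dir <- file.path(tempdir(), "R_Working_Directory")
dir.create(work_dir, showWarnings = FALSE)

setwd(work_dir)
print(getwd())

# Relative paths are resolved against the working directory,
# so this file is created inside work_dir.
writeLines("hello", "note.txt")
print(file.exists(file.path(work_dir, "note.txt")))

# Restore the previous working directory.
setwd(original_dir)
```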
Let’s load a sample CSV file from the computer.
myfile <- read.csv("path/to/file")
If we wanted to load a file from the internet, substitute the path with the full URL (including http:// or https://).
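To try this without hunting for a data file, the sketch below writes a tiny CSV to a temporary location and reads it back. The column names and values are made up for illustration.

```r
# Build a small CSV file so the example does not depend on any existing data.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,followers", "alice,120", "bob,45", "carol,300"), csv_path)

# header = TRUE (the default) treats the first row as column names.
# stringsAsFactors = FALSE keeps text columns as plain character vectors.
myfile <- read.csv(csv_path, header = TRUE, stringsAsFactors = FALSE)

print(myfile)
```

The result is a data frame, R's table-like structure with one named column per CSV field.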
Once we have the file loaded in R, we can analyze it.
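As a rough idea of what a first look at the data might involve, here is a sketch using a few base R functions. The data frame below is hypothetical and stands in for whatever read.csv returned.

```r
# Hypothetical data standing in for a loaded CSV file.
myfile <- data.frame(
  name = c("alice", "bob", "carol"),
  followers = c(120, 45, 300),
  stringsAsFactors = FALSE
)

# Column types and a preview of the values.
str(myfile)

# Minimum, quartiles, mean, and maximum of one column.
print(summary(myfile$followers))

# A single statistic, and a subset of rows matching a condition.
mean_followers <- mean(myfile$followers)
popular <- myfile[myfile$followers > 100, ]

print(mean_followers)
print(popular$name)
```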
Something notable about R:
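The <- operator is the conventional way to assign variables. The = operator also works for assignment at the top level, but most R style guides prefer <-.
mydata = data runs, but mydata <- data is the preferred syntax in R.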
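A short sketch of how the two operators behave, as far as I understand it: both assign at the top level in current versions of R, and the real difference shows up inside function calls. The variable names are arbitrary.

```r
# Both operators assign at the top level.
a <- 5
b = 5
print(identical(a, b))

# Inside a function call, "=" names an argument instead of assigning.
# Here x is only the name of mean()'s first argument.
m1 <- mean(x = c(1, 2, 3))
print(m1)

# With "<-" inside a call, the value is assigned to y AND passed to mean().
m2 <- mean(y <- c(10, 20, 30))
print(m2)
print(y)
```

Sticking with <- for assignment and = for function arguments avoids this ambiguity.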
That’s it for what I have learned so far. Next week, I will be covering Mining Twitter with R and the pitfalls of social media.