R Programming
Data Types
Basic Data Types
R has five basic or “atomic” classes of objects:
- character
- numeric (real numbers)
  * Numbers in R are treated as numeric objects by default (double-precision real numbers).
  * Inf represents infinity.
  * NA represents a missing value and NaN represents an undefined numeric result ("not a number"). is.na() and is.nan() are used to test objects for them.
  * NA values have a class too, so there are integer NA, character NA, etc.
  * A NaN value is also NA, but the converse is not true.
- integer
  * 1 is a numeric object; 1L is an integer.
- complex
- logical (TRUE/FALSE)

Attributes of an object, such as length and other metadata, can be accessed using the attributes() function.
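The atomic classes and special values described above can be checked interactively. The following is a minimal sketch; the variable names (pos_inf, not_num, m) are only illustrative.

```r
# Default numeric (double) versus explicit integer, plus the other classes
class(1)        # "numeric"
class(1L)       # "integer"
class("a")      # "character"
class(TRUE)     # "logical"
class(1 + 2i)   # "complex"

# Inf, NaN and NA
pos_inf <- 1 / 0       # Inf
not_num <- 0 / 0       # NaN
is.na(not_num)         # TRUE: a NaN value is also NA
is.nan(NA)             # FALSE: NA is not NaN
class(NA_integer_)     # "integer"
class(NA_character_)   # "character"

# Attributes of an object (here, the dimensions of a matrix)
m <- matrix(1:6, nrow = 2)
attributes(m)          # $dim: 2 3
```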
Complex Data Types
Vectors
The most basic object is a vector.
* A vector can only contain objects of the same class.
* BUT: The one exception is a list, which is represented as a vector but can contain objects of different classes (indeed, that’s usually why we use them).
* Empty vectors can be created with the vector() function.
* Vector examples
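As a small illustration of creating vectors, here is a sketch using c() and vector(); the variable names are arbitrary.

```r
x <- c(0.5, 0.6)          # numeric
y <- c(TRUE, FALSE)       # logical
z <- c("a", "b", "c")     # character
n <- 9:12                 # integer sequence 9, 10, 11, 12
w <- c(1 + 0i, 2 + 4i)    # complex

# vector() creates an "empty" vector of a given class and length
v <- vector("numeric", length = 3)
v                         # 0 0 0
```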
- Lists are a special type of vector that can contain elements of different classes.
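A list holding elements of different classes might look like the sketch below; the names mixed and named are only for illustration.

```r
# Each element keeps its own class
mixed <- list(1, "a", TRUE, 1 + 4i)
class(mixed[[1]])   # "numeric"
class(mixed[[2]])   # "character"

# Single brackets return a list; double brackets return the element itself
class(mixed[1])     # "list"
mixed[[3]]          # TRUE

# Elements can also be named and accessed with $
named <- list(id = 1L, name = "a")
named$name          # "a"
```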
- Mixing Objects
When different objects are mixed in a vector, coercion occurs so that every element in the vector is of the same class.
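The implicit coercion described above can be seen in a few one-liners. This is a minimal sketch with illustrative variable names.

```r
chr_num <- c(1.7, "a")    # number becomes a string: "1.7" "a"
num_lgl <- c(TRUE, 2)     # TRUE becomes 1: 1 2
chr_lgl <- c("a", TRUE)   # TRUE becomes a string: "a" "TRUE"

class(chr_num)            # "character"
class(num_lgl)            # "numeric"
class(chr_lgl)            # "character"
```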
- Explicit Coercion
Objects can be explicitly coerced from one class to another using the as.*() functions, if available.
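A short sketch of explicit coercion with some of the as.*() functions, starting from an integer vector:

```r
x <- 0:6
class(x)           # "integer"
as.numeric(x)      # 0 1 2 3 4 5 6
as.logical(x)      # FALSE TRUE TRUE TRUE TRUE TRUE TRUE (only 0 is FALSE)
as.character(x)    # "0" "1" "2" "3" "4" "5" "6"
as.integer("42")   # 42
```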
- Nonsensical coercion results in NAs
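For example, coercing letters to numbers or logicals has no sensible result, so R returns NA. In this sketch the warning from as.numeric() is silenced with suppressWarnings() only to keep the output clean.

```r
x <- c("a", "b", "c")

# as.numeric() cannot parse these strings, so every element becomes NA
# (R also emits an "NAs introduced by coercion" warning, silenced here)
num <- suppressWarnings(as.numeric(x))
num    # NA NA NA

# as.logical() only understands strings such as "TRUE", "false" or "T"
lgl <- as.logical(x)
lgl    # NA NA NA
```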
- Matrices
Matrices are vectors with a dimension attribute. The dimension attribute is itself an integer vector of length 2 (nrow, ncol).
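The sketch below builds a matrix with matrix() and, alternatively, by assigning a dimension attribute to a plain vector. Note that R fills matrices column by column.

```r
m <- matrix(1:6, nrow = 2, ncol = 3)
m
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
dim(m)          # 2 3
attributes(m)   # $dim: 2 3

# Adding a dim attribute turns a vector into a matrix
v <- 1:10
dim(v) <- c(2, 5)
is.matrix(v)    # TRUE
```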
- cbind-ing and rbind-ing
Matrices can be created by column-binding or row-binding with cbind() and rbind().
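A minimal sketch of both binding directions, using two short integer vectors:

```r
x <- 1:3
y <- 10:12

by_col <- cbind(x, y)   # each vector becomes a column
by_col
#      x  y
# [1,] 1 10
# [2,] 2 11
# [3,] 3 12

by_row <- rbind(x, y)   # each vector becomes a row
by_row
#   [,1] [,2] [,3]
# x    1    2    3
# y   10   11   12
```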
Factors
- Factors are enumeration-like data types used to represent categorical data.
- They can be ordered or unordered.
- A factor can be thought of as an integer vector where each integer has a label.
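The following sketch shows the "integer vector with labels" view of a factor, along with explicit level ordering. The example data (yes/no answers, low/medium/high sizes) is made up for illustration.

```r
f <- factor(c("yes", "yes", "no", "yes", "no"))
levels(f)        # "no" "yes" (sorted alphabetically by default)
table(f)         # no: 2, yes: 3
as.integer(f)    # 2 2 1 2 1 (the underlying integer codes)

# The level order can be set explicitly
f2 <- factor(c("yes", "yes", "no", "yes", "no"), levels = c("yes", "no"))
as.integer(f2)   # 1 1 2 1 2

# Ordered factors support comparisons between levels
size <- factor(c("low", "high", "medium"),
               levels = c("low", "medium", "high"), ordered = TRUE)
size[1] < size[2]   # TRUE: "low" comes before "high"
```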
Data Frames
- A special type of list in which every element (column) has the same length.
- All columns in a data frame must have names.
- Unlike matrices, data frames can store different classes of objects in each column.
- To create: read.table() or read.csv()
- To convert to a matrix: data.matrix()
data.frame()  # create a data frame or table, e.g., test.data.frame <- data.frame(id = c(1, 2, 3, 4, 5), name = c("a", "b", "c", "d", "e"))
edit(framename)  # edit the content of the table
str(frame)  # prints the structure and data types of the data frame
names(framename)  # prints the column names
dataframe[index.ROW, index.COLUMN]
dataframe[[1]]  # returns the first column as a vector
dataframe[1]  # returns the first column wrapped in a data frame
dataframe[c(1, 3)]  # returns the 1st and 3rd columns wrapped in a data frame
dataframe[1:3, ]  # displays all columns, but only rows 1 to 3
is.data.frame(framename)  # checks if an object is a data frame
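Putting several of these commands together, a minimal sketch on a small data frame might look as follows. The name test_df is illustrative, and stringsAsFactors = FALSE is set explicitly so the name column stays character regardless of the R version.

```r
test_df <- data.frame(id = c(1, 2, 3, 4, 5),
                      name = c("a", "b", "c", "d", "e"),
                      stringsAsFactors = FALSE)

str(test_df)                   # structure and column types
names(test_df)                 # "id" "name"
first_vec <- test_df[[1]]      # first column as a vector: 1 2 3 4 5
first_df  <- test_df[1]        # first column, still a data frame
top_rows  <- test_df[1:3, ]    # rows 1 to 3, all columns
test_df[2, "name"]             # "b"

as_mat <- data.matrix(test_df) # numeric matrix version of the data frame
is.data.frame(test_df)         # TRUE
```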
Basic Statistical Functions
mean(x)  # x is a vector
median(x)
sd(x)
var(x)
cor(x, y)
cov(x, y)
lapply(dataframe, function)  # apply a function such as mean or median over a list or data frame
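To see these functions on concrete numbers, here is a small sketch. The data is made up, and y is deliberately an exact linear function of x so that the correlation is 1.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
y <- 2 * x + 1

mean(x)      # 5
median(x)    # 4.5
var(x)       # 32 / 7, about 4.57 (sample variance)
sd(x)        # square root of var(x), about 2.14
cor(x, y)    # 1, because y is a linear function of x
cov(x, y)    # 2 * var(x)

# lapply() applies a function to each column and returns a list
col_means <- lapply(data.frame(a = x, b = y), mean)
col_means$a  # 5
col_means$b  # 11
```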
Graphs
plot(x, y)  # Scatter plot. Only numeric vectors or data frames are allowed.
barplot()  # Bar chart.
boxplot()  # Box plot. Provides a quick visual summary of a dataset. The thick line in the middle is the median. The box identifies the 1st (bottom) and 3rd (top) quartiles.
hist(x)  # Histogram. Groups data into bins.
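A quick sketch of the four plot types on made-up data follows. When run non-interactively, R typically writes the plots to a file such as Rplots.pdf rather than opening a window.

```r
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 5, 9)
y <- x^2

plot(x, y)          # scatter plot of y against x
barplot(table(x))   # bar chart of how often each value occurs
boxplot(x)          # box plot; the thick line marks the median
h <- hist(x)        # histogram; the returned object holds the bins

h$breaks            # bin boundaries chosen by hist()
sum(h$counts)       # 10: every observation falls into exactly one bin
median(x)           # 3, the value drawn as the thick line in the box plot
```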
Appendix
Command Reference
ls() or ls(all.names = TRUE)  # lists all variables/objects defined in the session
setwd("c:/xyz")  # sets the working directory
getwd()  # gets the working directory
runif(8)  # generates 8 uniform random numbers between 0 and 1
x <- 9  # assigns 9 to object x in the workspace
x  # prints the value of x
rm(x)  # removes the object x
rm(list = ls())  # removes all objects in the workspace
- Save & Load (Binary)
save.image()  # saves all objects to the default file .RData (binary format); the objects still exist in memory
save(obj1, obj2, file = "filename")  # saves only the named objects
load("filename")  # loads from file to memory
- Save & Load (Text)
write.table(obj1, file = "filename")  # only one object at a time
read.table("filename")  # reads the table back into a data frame
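A round trip through both formats could be sketched as follows. The object names are illustrative, and tempfile() is used here only so the example does not write into the working directory.

```r
obj1 <- 1:3
obj2 <- data.frame(id = 1:2, name = c("a", "b"))

# Binary: save() can store several objects, load() restores them by name
bin_file <- tempfile(fileext = ".RData")
save(obj1, obj2, file = bin_file)
rm(obj1)
load(bin_file)
obj1                 # 1 2 3, restored from the file

# Text: write.table() stores one object, read.table() reads it back
txt_file <- tempfile(fileext = ".txt")
write.table(obj2, file = txt_file)
obj2_copy <- read.table(txt_file, header = TRUE)
obj2_copy$id         # 1 2
```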
Packages
install.packages(c("ggplot2", "devtools", "KernSmooth"))  # installs the listed packages from CRAN
library()  # lists all installed packages
library(package)  # loads the package into memory
require(package)  # loads the package into memory. Used in scripts. Returns the loading status as a logical.
detach(package:name)  # unloads the package from memory
Help
?func  # opens the help page for function 'func'
help(func)  # same as above
apropos("foo")  # lists all functions whose names contain the string foo
example(foo)  # shows an example of function foo
vignette()  # shows available vignettes for installed packages
vignette("foo")  # shows a specific vignette