R Programming
Data Types
Basic Data Types
R has five basic or “atomic” classes of objects:
* character
* numeric (real numbers)
  * Numbers in R are treated as numeric objects by default (double-precision real numbers).
  * Inf represents infinity.
  * NA represents a missing value and NaN represents an undefined numeric result ("not a number"). is.na() and is.nan() are used to test objects for them.
  * NA values have a class also, so there are integer NA, character NA, etc.
  * A NaN value is also NA, but the converse is not true.
  * Attributes of an object, such as names, dimensions, and class, can be accessed using the attributes() function; the length is obtained with length().
* integer
  * 1 is a numeric object. 1L is an integer.
* complex
* logical (TRUE/FALSE)
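The points above can be checked in a short session. This is a minimal sketch; the variable names are arbitrary and chosen only for illustration.

```r
# One value of each atomic class
ch  <- "hello"   # character
num <- 1         # numeric (double) by default
int <- 1L        # the L suffix makes it an integer
cpx <- 2+3i      # complex
lgl <- TRUE      # logical

classes <- c(class(ch), class(num), class(int), class(cpx), class(lgl))
print(classes)

# Inf and NaN arise from ordinary arithmetic
print(1 / 0)     # Inf
print(0 / 0)     # NaN

# is.na() treats NaN as missing, but is.nan() does not treat NA as NaN
x <- c(1, NA, NaN)
print(is.na(x))
print(is.nan(x))

# NA values carry a class
print(class(NA_integer_))
print(class(NA_character_))

# attributes() returns metadata such as names
v <- c(a = 1, b = 2)
print(attributes(v))
```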
Complex Data Types
Vectors
The most basic object is a vector.
* A vector can only contain objects of the same class.
* BUT: The one exception is a list, which is represented as a vector but can contain objects of
different classes (indeed, that’s usually why we use them)
* Empty vectors can be created with the vector() function.
* Vector examples
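A few vectors of different classes, built with c(), the colon operator, and vector(). The variable names are illustrative only.

```r
x <- c(0.5, 0.6)        # numeric
y <- c(TRUE, FALSE)     # logical
z <- c("a", "b", "c")   # character
w <- 9:29               # integer sequence
u <- c(1+0i, 2+4i)      # complex

# vector() creates an "empty" vector of a given class and length;
# for numeric vectors the elements are initialised to 0
e <- vector("numeric", length = 10)

print(class(w))
print(e)
```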
- Lists are a special type of vector that can contain elements of different classes.
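A small sketch of a list holding elements of four different classes (the name l is arbitrary):

```r
l <- list(1, "a", TRUE, 1+4i)

# Each element keeps its own class
element_classes <- sapply(l, class)
print(element_classes)

# Double brackets extract a single element
second <- l[[2]]
print(second)
```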
- Mixing Objects
When different objects are mixed in a vector, coercion occurs so that every element in the vector is of the same class.
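For instance, implicit coercion picks the most general class among the elements. The names y1, y2 and y3 below are illustrative.

```r
y1 <- c(1.7, "a")    # number and string -> character
y2 <- c(TRUE, 2)     # logical and number -> numeric (TRUE becomes 1)
y3 <- c("a", TRUE)   # string and logical -> character

print(class(y1))
print(y2)
print(y3)
```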
- Explicit Coercion
Objects can be explicitly coerced from one class to another using the as.*() functions, if available.
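A brief sketch of explicit coercion applied to an integer vector:

```r
x <- 0:6
print(class(x))           # "integer"

as_num <- as.numeric(x)   # double-precision values 0 to 6
as_lgl <- as.logical(x)   # 0 becomes FALSE, every other value TRUE
as_chr <- as.character(x) # "0", "1", ..., "6"
as_cpx <- as.complex(x)   # imaginary part is 0

print(as_lgl)
print(as_chr)
```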
- Nonsensical coercion results in NAs
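As a small illustration, letters cannot be read as numbers or as logical values, so every element becomes NA:

```r
x <- c("a", "b", "c")

n <- as.numeric(x)   # NA NA NA; R also emits a coercion warning here
l <- as.logical(x)   # NA NA NA

print(n)
print(l)
```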
- Matrices
Matrices are vectors with a dimension attribute. The dimension attribute is itself an integer vector of length 2 (nrow, ncol).
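A minimal sketch showing the dimension attribute, the column-wise fill order, and a matrix made by assigning dim() to a plain vector (names are illustrative):

```r
m <- matrix(1:6, nrow = 2, ncol = 3)
print(dim(m))          # 2 3
print(attributes(m))   # the only attribute is $dim

# Matrices are filled column by column, so row 1, column 2 holds 3
print(m[1, 2])

# Adding a dim attribute to an ordinary vector turns it into a matrix
v <- 1:10
dim(v) <- c(2, 5)
print(v)
```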
- cbind-ing and rbind-ing
Matrices can be created by column-binding or row-binding with cbind() and rbind().
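For example, binding two vectors of equal length (x and y are illustrative names):

```r
x <- 1:3
y <- 10:12

cb <- cbind(x, y)   # x and y become the columns: 3 rows, 2 columns
rb <- rbind(x, y)   # x and y become the rows: 2 rows, 3 columns

print(cb)
print(rb)
```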
Factors
- Factors are enumeration data types used to represent categorical data.
- They can be ordered or unordered.
- A factor can be thought of as an integer vector where each integer has a label.
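A short sketch of unordered and ordered factors. The example values ("yes"/"no", "low"/"medium"/"high") are made up for illustration.

```r
f <- factor(c("yes", "yes", "no", "yes", "no"))
print(levels(f))    # levels default to alphabetical order: "no" "yes"
print(table(f))     # counts per level
print(unclass(f))   # the underlying integer codes

# The level order can be set explicitly
f2 <- factor(c("yes", "yes", "no", "yes", "no"), levels = c("yes", "no"))

# Ordered factors support comparisons between levels
sizes <- factor(c("low", "high", "medium"),
                levels = c("low", "medium", "high"), ordered = TRUE)
print(sizes[1] < sizes[2])
```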
Data Frames
- A special type of list where every element (column) has the same length.
- All columns in a data frame must have names.
- Unlike matrices, data frames can store different classes of objects in each column.
- To create: read.table() or read.csv()
- To convert to a matrix: data.matrix()
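A minimal sketch of reading a data frame and converting it to a matrix. To stay self-contained it reads from an inline string through the text argument of read.csv() instead of a file; the column names and values are made up.

```r
csv_text <- "id,name,score
1,a,9.5
2,b,7.25
3,c,8"

# stringsAsFactors = FALSE keeps the name column as character on every R version
df <- read.csv(text = csv_text, stringsAsFactors = FALSE)
print(sapply(df, class))   # each column has its own class

# data.matrix() converts every column to numbers;
# character columns are turned into factor codes first
m <- data.matrix(df)
print(m)
```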
- data.frame() # creates a data frame (table), e.g., test.data.frame <- data.frame(id=c(1,2,3,4,5), name=c("a","b","c","d","e"))
- edit(framename) # edits the content of the table
- str(frame) # prints the structure and data types of the data frame
- names(framename) # prints the column names
- dataframe[index.ROW, index.COLUMN] # returns the element(s) at the given row and column indices
- dataframe[[1]] # returns the first column as a vector
- dataframe[1] # returns the first column wrapped in a data frame
- dataframe[c(1,3)] # returns the 1st and 3rd columns wrapped in a data frame
- dataframe[1:3,] # returns rows 1 to 3 with all columns
- is.data.frame(framename) # checks if an object is a data frame
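The indexing forms above can be tried on the test.data.frame example. This is a sketch; the result variable names are illustrative, and stringsAsFactors = FALSE is added so the name column stays character on older R versions.

```r
test.data.frame <- data.frame(id = c(1, 2, 3, 4, 5),
                              name = c("a", "b", "c", "d", "e"),
                              stringsAsFactors = FALSE)

str(test.data.frame)            # structure and column types
print(names(test.data.frame))   # column names

first_col_vec <- test.data.frame[[1]]        # first column as a plain vector
first_col_df  <- test.data.frame[1]          # first column, still a data frame
cell          <- test.data.frame[2, "name"]  # row 2 of the name column
top3          <- test.data.frame[1:3, ]      # rows 1 to 3, all columns

print(cell)
print(top3)
print(is.data.frame(first_col_df))
```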
Basic Statistical Functions
- mean(x) # x is a numeric vector
- median(x)
- sd(x)
- var(x)
- cor(x,y)
- cov(x,y)
- lapply(dataframe, function) # applies a function such as mean or median over each element of a list or each column of a data frame
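A quick sketch of these functions on two small vectors; the data values are made up for illustration.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
y <- c(1, 3, 2, 5, 4, 6, 8, 9)

print(mean(x))     # 5
print(median(x))   # 4.5
print(var(x))      # sample variance (divides by n - 1)
print(sd(x))       # square root of var(x)
print(cor(x, y))   # Pearson correlation
print(cov(x, y))   # sample covariance

# lapply() applies a function to every column and returns a list
df <- data.frame(x = x, y = y)
col_means <- lapply(df, mean)
print(col_means)
```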
Graphs
- plot(x,y) # Scatter plot. Only numeric vectors or data frames are allowed.
- barplot() # Bar chart.
- boxplot() # Box plot. Provides a quick visual summary of a dataset. The thick line in the middle is the median. The box identifies the 1st (bottom) and 3rd (top) quartiles.
- hist(x) # Histogram. Groups data into bins.
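A minimal sketch that draws each plot type on simulated data. The data and the counts vector are invented for illustration. When run non-interactively with Rscript, the plots are written to the default Rplots.pdf file.

```r
set.seed(1)                 # make the simulated data reproducible
x <- rnorm(100)
y <- 2 * x + rnorm(100)

plot(x, y)                  # scatter plot of y against x

counts <- c(a = 3, b = 7, c = 5)
barplot(counts)             # one bar per named value

b <- boxplot(x)             # also returns the summary statistics it drew
h <- hist(x)                # also returns the bins and their counts

print(b$stats)              # whisker ends, box edges and median
print(h$counts)             # number of observations in each bin
```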
Appendix
Command Reference
- ls() or ls(all.names = TRUE) # lists all variables/objects defined in the session
- setwd("c:/xyz") # sets the working directory
- getwd() # gets the working directory
- runif(8) # generates 8 uniform random numbers between 0 and 1
- x <- 9 # assigns 9 to object x in the workspace
- x # prints the value of x
- rm(x) # removes the object x
- rm(list=ls()) # removes all objects in the workspace
- Save & Load (Binary)
  - save.image() # saves all objects to the default file .RData (binary format). Objects still exist in memory.
  - save(obj1, obj2, file="filename") # saves only the named objects to the file
  - load("filename") # loads from file to memory
- Save & Load (Text)
  - write.table(obj1, file="filename") # only 1 object at a time
  - read.table("filename") # reads the table back into a data frame
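A small round trip through both formats. This is a sketch that writes to temporary files created with tempfile() so that nothing in the working directory is touched; obj1 and obj2 are example objects.

```r
obj1 <- 1:5
obj2 <- data.frame(id = 1:3, name = c("a", "b", "c"), stringsAsFactors = FALSE)

# Binary: save() can store several objects, load() restores them by name
bin_file <- tempfile(fileext = ".RData")
save(obj1, obj2, file = bin_file)
rm(obj1, obj2)
load(bin_file)
print(ls())

# Text: write.table() stores one object, read.table() reads it back
txt_file <- tempfile(fileext = ".txt")
write.table(obj2, file = txt_file)
obj2_back <- read.table(txt_file, header = TRUE, stringsAsFactors = FALSE)
print(obj2_back)
```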
Packages
- install.packages(c("ggplot2", "devtools", "KernSmooth")) # installs the listed packages from CRAN
- library() # lists all installed packages
- library(package) # loads the package into memory
- require(package) # loads the package into memory. Used in scripts. Returns the loading status as a logical value.
- detach(package:name) # detaches (unloads) the package from the search path
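A short sketch of loading and detaching. To avoid installing anything, it uses tools and stats, two packages that ship with R; the choice of packages is an assumption made only for the example.

```r
library(tools)                                  # attach a package
attached_before <- "package:tools" %in% search()

ok <- require(stats)                            # TRUE if the package could be loaded

detach("package:tools")                         # remove it from the search path
attached_after <- "package:tools" %in% search()

print(c(attached_before, ok, attached_after))
```

Because require() returns FALSE instead of stopping with an error, scripts can test its result and react when a package is missing.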
Help
- ?func # opens the help page on function 'func'
- help(func) # same as above
- apropos("foo") # lists all functions containing the string foo
- example(foo) # shows an example of function foo
- vignette() # shows available vignettes for installed packages
- vignette("foo") # shows a specific vignette