Time to gently introduce R into MKTR. In Session 3, I attempt to demonstrate sampling-related concepts using R. This does not require you as students to do anything other than watch, listen, discuss and learn. However, I encourage you to try at home, in R, the same things I demo in class. All you have to do is copy-paste the R code below (in the grey shaded boxes) onto the R console. Since there is no grading or assignment pressure involved, consider this a gentle intro to R. Later, when the graded homework assignments come around, expect the same arrangement: I will post R code here on the blog, and you have the option of using it to solve your homework directly.
1. To download and install R:
Go to the following link:
CRAN project: R download links
Download the installer for your operating system and follow its instructions.
After that, I expect that within 20 minutes (on a good internet connection) you will have top-class computational firepower ready on your machine.
Open R and, well, look around. The GUI probably won't look like much, but appearances can be deceiving (as we'll see later in the course, in Session 6: Qualitative Research and Session 7: Experimentation).
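Once R is open, one simple way to confirm the installation works is to type a couple of commands at the console. The snippet below is only an illustrative check, not part of the course material; the vector `x` is made up.

```r
# Illustrative first commands to confirm R is working (not from the course material)
print(R.version.string)   # shows which version of R is installed
x = c(2, 4, 6)            # a tiny made-up numeric vector
print(mean(x))            # arithmetic mean of x
```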
2. Read the Session 3 Dataset into R
The data are available for copying from this Google spreadsheet.
Please copy the data into a .txt file on your computer and save it (preferably on the desktop). Use the code below to read it into R.
- The '#' symbol is a comment character: any text following it on a line is not executed.
- The code below reads the dataset into an R object called 'mydata'. You can give the dataset any name you want.
mydata = read.table(file.choose(), header = TRUE)  # header = TRUE only if the first row holds column names
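If you are unsure what the saved .txt file should look like, the sketch below mimics it with a few made-up rows: one header row, then values separated by tabs (which is what pasting from a spreadsheet typically produces). It assumes the two columns are named `height` and `weight`, since the sampling code later in this post uses those names. The `text =` argument lets `read.table()` read from a string instead of a file.

```r
# Mimic the expected .txt layout with made-up numbers (not the Session 3 data):
# a header row, then tab-separated values, one person per line.
txt = "height\tweight
170\t65
158\t52
182\t80"
toy = read.table(text = txt, header = TRUE)  # text= reads from a string instead of a file
print(toy)                                   # 3 rows, columns height and weight
```

If some cells contain spaces, adding `sep = "\t"` to the `read.table()` call keeps each cell in a single column, because the default separator splits on any whitespace.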
- After reading in *any* dataset, it's always good practice to look at data summaries, to eyeball whether all is well.
- If you want to understand the code, copy-paste the text in the grey boxes line by line. If you don't particularly care for coding in R, just copy-paste the entire block of code in the grey boxes onto the R console.
dim(mydata)       # show dataset's dimensions
summary(mydata)   # show variables' descriptive summaries
mydata[1:5,]      # show first 5 rows of mydata
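Beyond `dim()` and `summary()`, two more quick checks often catch problems early: the column types (`str()`) and the count of missing values per column. The sketch runs them on a small made-up data frame with one deliberately missing value; on your own data you would pass `mydata` instead of `toy`.

```r
# Extra sanity checks, shown on a small made-up data frame (not the Session 3 data)
toy = data.frame(height = c(170, 158, 182, 165),
                 weight = c(65, NA, 80, 59))   # one weight left missing on purpose
str(toy)                                  # column types at a glance
na.counts = colSums(is.na(toy))           # number of missing values per column
print(na.counts)
print(all(sapply(toy, is.numeric)))       # TRUE only if every column is numeric
```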
# What are the data like? Visualizing.
attach(mydata)                      # enables calling columns by name (height, weight)
par(mfcol=c(2,1))                   # stack 2 plots on a single page
for (i in 1:2){                     # loop over the 2 columns: height, weight
  hist(mydata[,i], breaks=30,       # histogram with 30 breaks
       main=dimnames(mydata)[[2]][i],   # give plot title
       col="gray")                  # shade the hist grey
  abline(v=mean(mydata[,i]),        # draw vertical line at mean
         lwd=3, lty=2, col="Red")   # dashed, of color red & width 3
}                                   # loop ends
Those wanting to see what the code does should copy-paste line-by-line. Others can copy-paste the entire block of code in grey.
# Randomly sample 10 values & estimate mean height, weight.
k = 10                              # set sample size 'k' to 10
ht = sample(height,k); ht           # sample k height values randomly
wt = sample(weight,k); wt           # same for weight
mean(ht)                            # show mean of the height sample
mean(wt)                            # show mean of the weight sample
error.ht = mean(height)-mean(ht)    # calculate sampling error
error.ht                            # show sampling error in height
error.wt = mean(weight)-mean(wt)    # same for weight
error.wt                            # show sampling error in weight
# Repeat with a larger sample: 40 values.
k = 40
ht = sample(height,k); ht
wt = sample(weight,k); wt
mean(ht); mean(wt)                  # sample means
mean(height); mean(weight)          # population means, for comparison
error.ht = mean(height)-mean(ht); error.ht
error.wt = mean(weight)-mean(wt); error.wt
par(mfrow=c(2,2))                   # draws plots in 2x2 pattern
hist(mydata[,1], breaks=30,         # same hist() function
     main="Population Height", xlim=c(140,200), col="gray")
hist(mydata[,2], breaks=30, main="Population Weight", xlim=c(40,100), col="gray")
hist(ht, breaks=20, main="Sample size k=40", xlim=c(140,200), col="beige")
hist(wt, breaks=20, main="Sample size k=40", xlim=c(40,100), col="beige")
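A single draw of size k can land near or far from the population mean by luck, so comparing one k = 10 sample with one k = 40 sample may not always show the larger sample doing better. The sketch below repeats the draw many times and averages the absolute sampling error. It assumes a simulated population of 1,000 heights (mean 165, sd 10) rather than the class dataset; the helper `sampling.error()` is hypothetical, introduced only for this illustration.

```r
# Average size of the sampling error for several sample sizes k.
# Assumption: a simulated population, not the Session 3 data.
set.seed(42)                                   # reproducible random draws
pop = rnorm(1000, mean = 165, sd = 10)         # hypothetical population of heights (cm)

# Hypothetical helper: population mean minus the mean of one random sample of size k
sampling.error = function(pop, k) mean(pop) - mean(sample(pop, k))

ks = c(10, 40, 160)                            # sample sizes to compare
avg.abs.err = sapply(ks, function(k) {
  mean(abs(replicate(500, sampling.error(pop, k))))   # 500 repetitions per k
})
names(avg.abs.err) = paste0("k=", ks)
print(round(avg.abs.err, 2))                   # typically shrinks as k grows
```

With `attach(mydata)` in effect, you could pass `height` (or `weight`) as `pop` to run the same experiment on the class data.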
outp = matrix(0, nrow=1000, ncol=2)   # build empty output matrix
k = 10                                # set sample size
for (i in 1:1000){                    # open loop: draw 1000 samples
  outp[i,1] = mean(sample(height,k))  # save sample mean of height
  outp[i,2] = mean(sample(weight,k))  # save sample mean of weight
}                                     # close loop
par(mfrow=c(2,2))                     # 2x2 pattern plots
for (i in 1:ncol(outp)){              # open plotting loop again
  hist(outp[,i], breaks=10, main=paste("sample size =", k),
       xlab=dimnames(mydata)[[2]][i], xlim=range(mydata[,i]))
}                                     # close plotting loop
outp = matrix(0, nrow=1000, ncol=2)
k = 40                                # set sample size
for (i in 1:1000){
  outp[i,1] = mean(sample(height,k))
  outp[i,2] = mean(sample(weight,k))
}
# par(mfrow=c(2,2))                   # left commented out so these plots fill the bottom row of the same 2x2 page
for (i in 1:ncol(outp)){
  hist(outp[,i], breaks=10, main=paste("sample size =", k),
       xlab=dimnames(mydata)[[2]][i], xlim=range(mydata[,i]))
}
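The two loops above build the sampling distribution of the mean by brute force. A more compact sketch uses `replicate()` and compares the spread of the simulated means with the textbook standard error, sd/sqrt(k). Again, the population is simulated (an assumption, not the class data) and `sim.means()` is a hypothetical helper.

```r
# Sampling distribution of the mean via replicate(), compared with sd/sqrt(k).
# Assumption: a simulated population, not the Session 3 data.
set.seed(1)
pop = rnorm(1000, mean = 165, sd = 10)   # hypothetical population of heights (cm)

# Hypothetical helper: 'reps' sample means, each from a random sample of size k
sim.means = function(pop, k, reps = 1000) replicate(reps, mean(sample(pop, k)))

ks = c(10, 40)
simulated = sapply(ks, function(k) sd(sim.means(pop, k)))  # spread of the sample means
theory = sd(pop) / sqrt(ks)                                 # standard error formula
result = rbind(simulated, theory)
colnames(result) = paste0("k=", ks)
print(round(result, 3))
```

Because `sample()` draws without replacement from a finite population, the simulated values are expected to run slightly below the formula (the finite population correction); with 1,000 units and k = 40 the gap is small.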
P.S.
I also tried the flipped-classroom approach and have put up two YouTube videos on how to read data into R and write data out of R. Here are the links:
5 Steps to Read data into R
4 Steps to Save data from R
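For the writing-out direction, here is a minimal sketch with base R's `write.table()`: it saves a small made-up data frame as a tab-separated .txt file in R's temporary folder (the file name `toy.txt` is hypothetical) and reads it back. This is not necessarily the procedure shown in the videos.

```r
# Save a data frame as tab-separated text, then read it back (made-up data)
toy = data.frame(height = c(170, 158, 182), weight = c(65, 52, 80))
out.file = file.path(tempdir(), "toy.txt")     # hypothetical file in a temp folder
write.table(toy, file = out.file, sep = "\t",
            row.names = FALSE, quote = FALSE)  # header row + tab-separated values
back = read.table(out.file, header = TRUE)     # read the file back in
print(all(back == toy))                        # TRUE if the values survived the round trip
```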