
Friday, November 8, 2013

Session 7 Updates

Hi all,

Session 7 was about the Experimentation approach in MKTR. Some big picture insights:

  • Experiments are a very powerful confirmatory tool that can be applied in a variety of business situations.
  • Experiments are gaining widespread acceptability in business as the cost of conducting them drops and the benefits derived pile up (i.e., their ROI keeps rising).
  • The colloquial usage of the term 'experiment' often confuses people. In MKTR, and in research in general, 'true' experiments test a treatment group against an equivalent control group to cancel out extraneous effects.
  • Experiments rely on logical hypotheses and measurable outcomes.
  • While web-based and services firms were the first to leverage this powerful tool, product based firms are finally getting into the act - testing innovations big and small is now commonplace in even FMCG and engineering goods firms.
  • Many firms go with pseudo- or quasi- experiments when the exacting conditions required for a true experiment may not be justified under the cost/benefit calculations.

Admittedly, this year I am in general quite happy with my own time-management in class - most classes have had a fair amount of in-class Q&A time and have tended to end on time. However, Session 7 was quite a bit rushed towards the second half, and I felt the conjoint portion for once could certainly have done with more time.

To make up, here's a lengthy, colloquial blog-post on how you might use metric conjoint analysis for your project.

Suppose your project R.O. says:

R.O.: Find customers' attribute preferences for Breakfast Noodles

Since you're asked to find attribute preference (or attribute importance) in a bundle of attributes, this is a clear cut case for conjoint analysis application.

You determine through qualitative study that the Breakfast Noodles product has five key attributes along which people tend to evaluate it: (i) Price, (ii) PackSize, (iii) Brand, (iv) whether there are special flavors or not, and (v) whether it is 'vitamin fortified' or not. You can make the attributes and attribute-levels table in Excel for MEXL analysis thus:

Notice that while the attributes are clear cut, the use of "High", "medium/low" in the attribute levels is imprecise. Who knows how different respondents may view or understand it? Hence, if your product development and competition benchmarking is fairly advanced, then you should ideally put in hard numbers there as far as possible. For instance, see below:

Clearly, there are two vertically differentiated bundles (assuming people will generally tend to prefer a well-known international brand like Nestle over a well-known Indian one like Parle, and either over an unknown local one like 'Desi').

Now prepare a set of product bundles for respondents to evaluate. As far as possible, avoid including the vertically differentiated bundles, in order to better force trade-offs in the purchase decision. Say you choose 8 bundles to show:
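As an aside, if you'd like to generate the candidate bundles in R rather than in Excel, here is a minimal sketch. The attribute names are the ones above, but the specific price, pack-size and flavor levels written into the code are made-up placeholders (the actual levels are in the tables shown), and picking 8 bundles by random sampling is purely for illustration - in practice you would use a proper fractional-factorial design (as MEXL does for you).

# --- illustrative sketch: generating candidate bundles in R --- #
# attribute levels below are placeholders, not the actual study levels
attribs = list(
 Price = c("Rs 15", "Rs 25", "Rs 35"),
 PackSize = c("100g", "200g", "400g"),
 Brand = c("Nestle", "Parle", "Desi"),
 Flavors = c("Special flavors", "No special flavors"),
 Fortified = c("Vitamin fortified", "Not fortified"))

full.design = expand.grid(attribs) # all possible bundles
dim(full.design) # 3 x 3 x 3 x 2 x 2 = 108 bundles

# pick 8 bundles to show (random pick for illustration only;
# use a fractional-factorial design in practice)
set.seed(1)
bundles = full.design[sample(nrow(full.design), 8), ]
bundles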

The hard 'design' part is over; it's time to program the Qs you have into a web survey. The bundle rating questions may look like this in Qualtrics:

Ensure the bundles are presented to respondents in randomized order to avoid (or rather, to average out) any order effects.

Like I mentioned in class, metric conjoint is practically obsolete now. Firms have moved on to choice-based conjoint (or CBC), in which you present a bouquet of bundle options in each 'choice task' (e.g., like the one below) and the respondent makes a single choice - picks the one bundle most preferred in the bouquet. See below for how a CBC choice task might look when programmed into a web survey:

In the above figure, I couldn't figure out how to get Qualtrics to have both the image and the multiple choice question together in one question, so I did them as separate questions and grouped them together.

Chalo, I hope that helps clarify the conjoint part of what is going on. Metric conjoint interpretation is fairly straightforward and MEXL help is quite good, I hear. Should you require specific assistance in interpreting conjoint results for your projects etc, pls let me know and we can go over it individually.
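P.S. In case you'd like to peek under the hood of what MEXL does, here is a minimal, illustrative sketch of metric conjoint estimation in R: the part-worths are simply the coefficients of a dummy-variable regression of bundle ratings on attribute levels. The small data frame below is made up purely for illustration; your own ratings would come from your survey export.

# --- illustrative sketch: metric conjoint part-worths via lm() --- #
# made-up ratings: one row per rated bundle (a real study would have
# one such block per respondent)
ratings.df = data.frame(
 rating = c(8, 6, 7, 3, 9, 4, 5, 2),
 Price = factor(c("low", "med", "low", "high", "low", "high", "med", "high")),
 Brand = factor(c("Nestle", "Parle", "Desi", "Desi", "Nestle", "Parle", "Nestle", "Desi")),
 Fortified = factor(c("yes", "no", "yes", "no", "yes", "no", "no", "yes")))

# dummy-variable regression; coefficients are part-worths relative to
# the baseline level of each attribute
fit = lm(rating ~ Price + Brand + Fortified, data = ratings.df)
summary(fit)
# an attribute's importance ~ range of its part-worths relative to
# the total range across all attributes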

Sudhir

Sunday, December 16, 2012

Session 7 - Hypothesis Testing R code

Hi all,

The previous blog-post introduced unstructured text analysis in Session 7. This one is a separate module - hypothesis testing. In this blog-post, I give the R code required for the classwork examples of two common types of hypothesis tests - tests of differences and tests of association.

Module 1. Hypothesis Testing

  • t-tests for differences
Use the notepad titled 'Nike dataset.txt' for this.
## hypothesis testing
## Nike example

# read-in data
nike = read.table(file.choose(), header=TRUE)
dim(nike); summary(nike); nike[1:5,] # some summaries
attach(nike) # Call indiv columns by name

# Q is: "Is awareness for Nike (X, say) different from 3?" We first delete the rows that have a '0' in them, as these were non-responses. Then we proceed with t-tests as usual.

# remove rows with zeros #
x1 = Awareness[(Awareness != 0)]
# test if x1's mean *significantly* differs from mu
t.test(x1, mu=3) # reset 'mu=' value as required.
This is the result I got:
To interpret, first recall the hypothesis. The null said: "Mean awareness is no different from 3." However, the p-value of the t-test is well below 0.01. Thus, we reject the null with over 99% confidence and accept the alternative (H1: mean awareness is significantly different from 3). Happily, pls notice that R states the alternative hypothesis in plain words as part of its output.

The second Q we had was:

# “Q: Does awareness for Nike exceed 4?”
# change 'mu=3' to 'mu=4' now
t.test(x1, mu=4, alternative=c("greater")) # one-sided test
The p-value I get is 0.2627. We can no longer reject the Null even at the 90% confidence level. Thus, we infer that the mean awareness of about 4.18 does *not* significantly exceed 4.

Next Q was: "Does awareness for Nike in females (Xf, say) exceed that in males (Xm)?”

# first make the two vectors
# (subset Gender the same way as Awareness, so the two stay aligned)
xm = Awareness[(Gender == 1) & (Awareness != 0)]
xf = Awareness[(Gender == 2) & (Awareness != 0)]
# Alternative Hypothesis says xf '>' xm
t.test(xf, xm, alternative="greater")
In the code above, we specify for 'alternative=' whatever the alternative hypothesis says. In this case it said greater than, so we used "greater". Else we would have used "less".
The p-value is slightly above 0.05, so strictly speaking we cannot reject the Null at the 95% level. However, it is very nearly there, and the confidence cut-offs we use are essentially arbitrary. In such circumstances we could choose to accept the alternative that "xf at 5.18 significantly exceeds xm at 3.73". It is a judgment call.

  • chi-square tests for Association
The data for this example is on the same Google doc, starting at cell K5. Code for the first classwork example, on Gender versus Internet Usage, follows. First, let me state the hypotheses:
  • H0: "There is no systematic association between Gender and Internet Usage." In other words, the distribution of people we see in the 4 cells is a purely random phenomenon.
  • H1: "There is systematic association between Gender and Internet Usage."
# read in data WITHOUT headers
a1=read.table(file.choose()) # 2x2 matrix
a1 # view the data once
chisq.test(a1) # voila. Done!
This is the result I get:
Clearly, with a p-value of 0.1441, we cannot reject the Null even at the 90% level. So we cannot infer any significant association between Gender and Internet Usage levels. The entire sample we had was only 30 people. Suppose we had a much bigger sample but a similar distribution across cells. Would the inference change? Let's find out.

Suppose we scaled up the previous matrix by 10. We will then have 300 people and a corresponding distribution in the four cells. But now, at this sample size, random variation can no longer explain the huge differences we will see between the different cells and the inference will change.

a1*10 # view the new dataset
chisq.test(a1*10)
The results are as expected. While a 5 person difference can be attributed to random variation in a 30 person dataset, a 50 person variation cannot be so attributed in a 300 person dataset.

Our last example on tests of association uses the Nike dataset. Does Nike Usage vary systematically with Gender? Suppose we had reason to believe so. Then our Null would be: Usage does not vary systematically with Gender, random variation can explain the pattern of variation. Let us test this in R:

# build cross tab
mytable=table(Usage,Gender)
mytable #view crosstab

chisq.test(mytable)
The example above is also meant to showcase the cross-tab capabilities of R using the table() function. R peacefully does 3-way cross-tabs and more as well. Anyway, here's the result: it seems we can reject the Null at the 95% level.

Dassit from me for now. See you all in Class

Sudhir

Session 7 R code for Basic Text Analysis

Hi all,

Session 7 has the following modules:

  • Continuing Qualitative MKTR from Session 6, we cover Focus Group Discussions (or FGDs) in the first sub-module. A short video 'reading' will be shown.
  • The second sub-module in Qualitative MKTR - Analysis of Unstructured text - is covered next. The R code below deals with elementary text mining in R.
  • The third Module deals with building and Testing Hypotheses in MKTR. We'll use R's statistical abilities to run quick tests on two major classes of Hypotheses.
So without any further ado, here goes sub-module 1.

1. Elementary Text Mining in R

First install the required libraries as shown below and read in the data (in 'Q25.txt' in the Session 7 folder on LMS).

library(tm)
library(Snowball)
library(wordcloud)
library(sentiment)

###
### --- code for basic text mining ---
###

# first, read-in data from 'Q25.txt'
x = readLines(file.choose())
x1 = Corpus(VectorSource(x))

Now, run the following code to process the unstructured text and obtain from it a term-frequency document matrix. The output obtained is shown below.

# standardize the text - remove blanks, uppercase #
# punctuation, English connectors etc. #
x1 = tm_map(x1, stripWhitespace)
x1 = tm_map(x1, tolower)
x1 = tm_map(x1, removePunctuation)
x1 = tm_map(x1, removeNumbers)
# add our own stopwords ('ice', 'cream') to the standard English list,
# so they get removed from the corpus

myStopwords <- c(stopwords('english'), "ice", "cream")

x1 = tm_map(x1, removeWords, myStopwords)
x1 = tm_map(x1, stemDocument)

# make the doc-term matrix #
x1mat = DocumentTermMatrix(x1)
# --- sort the TermDoc matrix --- #
# removes sparse entries
mydata = removeSparseTerms(x1mat,0.998)
dim(mydata.df <- as.data.frame(inspect(mydata))); mydata.df[1:10,]
mydata.df1 = mydata.df[ ,order(-colSums(mydata.df))]
dim(mydata.df1)
mydata.df1[1:10, 1:10] # view 10 rows & 10 cols

# view frequencies of the top few terms
colSums(mydata.df1) # term name & freq listed
'Stopwords' in the above code are words we do not want analyzed. The list of stopwords can be arbitrarily long.
Notice the document corpus x1 and the term-frequency document matrix (first 10 rows & cols only). The image below gives the total frequency of occurrence of each term in the entire corpus.

2. Making Wordclouds

Wordclouds are useful ways to visualize relative frequencies of words in the corpus (size is proportional to frequency). The colors of the words are random, though.

# make wordcloud to visualize word frequencies
wordcloud(colnames(mydata.df1), colSums(mydata.df1)*10, scale=c(4, 0.5), colors=1:10)
So, what can we say from a wordcloud? The above one seems to suggest that 'chocolate' is the flavor most on the minds of people, followed by vanilla. The relative importance of other words can be assessed similarly. But we find precious little else.

It would be more useful to actually see which words are used together most often in a document. For example, 'butter' could mean anything - from 'butter scotch' to 'butter pecan' to 'peanut butter'. To obtain which pairs of words occur most commonly together, I tweaked the R code a bit; the result is a 'collocation dendrogram':

# --- making dendrograms to #
# visualize word-collocations --- #
min1 = min(mydata$ncol, 25) # consider at most the top 25 terms
test = matrix(0,min1,min1)
test1 = test
for(i1 in 1:(min1-1)){ for(i2 in i1:min1){
test = mydata.df1[ ,i1]*mydata.df1[ ,i2]
test1[i1,i2] = sum(test); test1[i2, i1] = test1[i1, i2] }}

# make dissimilarity matrix out of the freq one
test2 = (max(test1)+1) - test1
rownames(test2) <- colnames(mydata.df1)[1:min1]
# now plot collocation dendrogram
d <- dist(test2, method = "euclidean") # distance matrix
fit <- hclust(d, method="ward")
plot(fit) # display dendrogram

Click on the image for larger size. Can you look at the image and say what flavors seem to go best with 'coffee'? With 'Vanilla'? With 'Strawberry'?

It is important not to lose sight of the weaknesses and problems that dog current text mining capabilities.

  • For instance, whether someone writes 'not good' or just 'good', the text miner picks up 'good' in both cases. So 'meaning' per se is lost on the miner.
  • The text miner is notoriously poor at picking up wit, sarcasm, exaggerations and the like. Again, the 'meaning' part is lost on the miner.
  • Typos etc can also play havoc. Synonyms can cause trouble too, in some contexts. E.g., Some say 'complex', others may say 'complicated' to refer to the same attribute. These will appear as separate terms in the analysis.
  • So text mining is more useful as an exploratory tool, to point out qualitatively what topics and words appear to weigh most on respondent minds. It helps downstream analysis by providing inputs for hypothesis building, for more in-depth investigation later and so on.
  • It *is* important that the more interesting comments, opinions etc be manually checked before arriving at any conclusions.
That concludes our small, elementary text-mining foray using R. R's capabilities in the text-mining arena are quite advanced, extensible and evolving; we ventured in just to get a simple example done. This example, however, scales up easily to larger and more complex datasets of unstructured text. Onwards now to the next sub-module, wherein we combine two important elements of text analysis for MKTR - social media chatter and sentiment mining.

3. Sentiment Mining of twitter data

The twitteR package in R showcases well the social media capabilities of R. It allows you to search for particular keywords in particular geographic areas (cities, for example). Thus, you could compare the response to the movie 'Talaash' in Delhi versus, say, Hyderabad.

For our classwork exercise, I am using twitter data on #skyfall from London, from the weekend following the movie's release. Read the data in and run the R code for text mining as we did above. Only after that, implement basic sentiment analysis on the tweets as follows:

#######################################
### --- sentiment mining block ---- ###
#######################################

# After doing text analysis, run this
### --- sentiment analysis --- ###
# read-in positive-words.txt
pos=scan(file.choose(), what="character", comment.char=";")
# read-in negative-words.txt
neg=scan(file.choose(), what="character", comment.char=";")
# add our own positive words to the existing list
pos.words=c(pos,"wow", "kudos", "hurray")
neg.words = c(neg)

# match() returns the position of the matched term or NA
pos.matches = match(colnames(mydata.df1), pos.words)
pos.matches = !is.na(pos.matches)
b1 = colSums(mydata.df1)[pos.matches]
b1 = as.data.frame(b1)
colnames(b1) = c("freq")
wordcloud(rownames(b1), b1[,1], scale=c(5, 1), colors=1:10)
neg.matches = match(colnames(mydata.df1), neg.words)
neg.matches = !is.na(neg.matches)
b2 = colSums(mydata.df1)[neg.matches]
b2 = as.data.frame(b2)
colnames(b2) = c("freq")
wordcloud(rownames(b2), b2[,1], scale=c(5, 1), colors=1:10)
Two wordclouds will appear - one only for words that have positive sentiment or emotional content, the other for negative ones.

4. Determining Sentiment Polarities

Can we measure 'how much' emotional content or intensity etc a tweet or comment may contain? Well, at least at the ordinal level, perhaps. The package 'sentiment' offers a way to measure sentiment polarities in terms of log-likelihoods of comments being of one polarity versus another. This can be a useful first step in basic sentiment analysis.

#######################################
### --- sentiment mining block II ---- ###
#######################################

### --- inspect only those tweets #
# which got a clear sentiment orientation ---
a1=classify_emotion(x1)
a2=x[(!is.na(a1[,7]))] # 447 of the 1566 tweets had clear polarity
#a3=PlainTextDocument(a2)
a2[1:10]
# what is the polarity of each tweet? #
# that is, what's the ratio of pos to neg content? #
b1=classify_polarity(x1)
dim(b1)
b1[1:5,] # view polarities table

5. Determining Sentiment Dimensions

Can we do more than just sentiment polarities? Can we get more specific about which primary emotion dominates a particular tweet, opinion or comment? Turns out the sentiment package in R does provide one way out. How well established or usable this is in a given context is, however, a case of caveat emptor.

a1a=data.matrix(as.numeric(a1))
a1b=matrix(a1a,nrow(a1),ncol(a1))
# build sentiment type-score matrix
a1b[1:4,] # view few rows

# recover and remove the mode values
mode1 <- function(x){names(sort(-table(x)))[1]}
for (i1 in 1:6){ # for the 6 primary emotion dimensions
mode11=as.numeric(mode1(a1b[,i1]))
a1b[,i1] = a1b[,i1]-mode11}

summary(a1b)
a1c = a1b[,1:6]
colnames(a1c) <- c("Anger", "Disgust", "fear", "joy", "sadness", "surprise")
a1c[1:10,]
## -- see top 20 tweets in "Joy" (for example) ---
a1c=as.data.frame(a1c);attach(a1c)
test = x[(joy != 0)]; test[1:10]
# for the top few tweets in "Anger" ---
test = x[(Anger != 0)]; test[1:10]
test = x[(sadness != 0)]; test[1:10]
test = x[(Disgust != 0)]; test[1:10]
test = x[(fear != 0)]; test[1:10]
That does it for text and sentiment analysis in R. Again, these were just exploratory forays. The serious manager can choose to invest in R for delivering these analytical capabilities quickly and economically. Your exposure to this area now enables you to take that call as a manager tomorrow.

This is it for now. Will put up the hypothesis-testing related R code in a separate blog-post.

Sudhir