Sorry folks, I typed in the wrong URL by mistake. Those looking for the sessions 8 and 9 pre-reads, pls go here.
The following are pre-reads for session 7:
1. AI Meets the C-Suite (McKinsey Quarterly), and
2. Track Customer Attitudes to Predict Their Behaviors (HBR).
The course-pack reading for session 7 is optional; these two are mandatory.
Sudhir
-----------------------------
Hi all,
You might want to look up the list of shiny apps in the session 5 updates blog post here, before we start.
The homework has three parts; only two of the three need to be done and submitted.
Homework part 1: (group submission, mandatory)
Choose any non-obscure product or service on Flipkart or Amazon (or any other review aggregation source).
Your research objectives (R.O.s) are:
(1) Find the top few things people like about the product.
(2) Find the top few things people dislike about the product.
(3) Suggest a (re-)positioning strategy for the product based on the above.
Pull 100+ reviews of the product.
Note: A Flipkart shinyapp is already available; just follow the instructions on its first page.
We're working on an Amazon shinyapp as well; watch this space for updates.
Update: It turns out Amazon pages are now dynamic (they were static pages till last year), so no Amazon shinyapp is possible.
Text analyze the corpus for insights.
Not everything we can do is up on shiny. It would help massively if at least one member per group runs the classwork R code successfully on their machine.
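If you end up pulling reviews by hand (e.g., saving the product page from your browser), here is a minimal base-R sketch of regex-based extraction. The class name "review-text" and the sample snippets are made-up placeholders; real Flipkart or Amazon markup will differ, and regex HTML parsing is fragile, so treat this as a quick-and-dirty fallback only, not the shinyapp's method:

```r
# Toy saved page; in practice, read your saved HTML with readLines() and
# paste() it into one string. The "review-text" class is a placeholder.
page <- '<div class="review-text">Great battery life</div>
<div class="review-text">Camera is mediocre</div>'

# Grab each review div, then strip the tags. (.*?) is a lazy match,
# so each div is captured separately; perl=TRUE enables that syntax.
m <- gregexpr('<div class="review-text">(.*?)</div>', page, perl = TRUE)
hits <- regmatches(page, m)[[1]]
reviews <- gsub('<div class="review-text">|</div>', "", hits)
reviews   # a character vector, one review per element
```

Once you have 100+ such strings, write them to a text file and feed that to the text-analysis code as usual.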
Homework part 2: (Individual submission - option 1)
Use tm.plugin.webmining to pull data from any of the following news aggregators. Pick any product/ firm/ brand/ celebrity that has been in the news lately.
Pull the last 100+ news articles wherein this entity was mentioned in the article title.
Recall the classroom example wherein we did this for Zara:
install.packages("tm")                   # if using for the first time
install.packages("tm.plugin.webmining")  # ditto
library(tm)
library(tm.plugin.webmining)
# Note: run the below in base R, not RStudio
zara <- WebCorpus(GoogleNewsSource("Zara"))
x1 = zara                          # save the corpus in a local object
x1 = unlist(lapply(x1, content))   # strip relevant content from x1
x1 = gsub("\n", "", x1)            # remove newline chars
x1[1:5]                            # view content
write.table(x1, file.choose(), row.names=F, col.names=F)  # save file as 'zara_news.txt'
Alternately, try running this shiny app for Google News pulls. It's not very stable, but it will do for now.
Text-analyze the corpus for sentiment.
Note: Do you see how the corpus thus obtained can potentially help you mine, measure and score some notion of "PR buzz" for the entity?
Your task: ID the two most positive and two most negative articles.
In a PPT slide or two, write what you found about the reasons for positive and negative sentiment.
Update: Pls insert the following lines of code after you run the older sentiment analysis code.
This is to obtain the most positive and most negative documents.
head(pol$all[order(pol$all[,3], decreasing=T),])   # top positive-polarity documents
head(pol$all[order(pol$all[,3], decreasing=F),])   # top negative-polarity documents
Homework part 2: (Individual submission - option 2)
Alternately, instead of option 1 above, you could do the following.
Take any long (as in 10+ pages) soft-copy article that you know and have read.
Use the textsplit shiny app to split it into uniform-length parts (of, say, 25-50 words each).
Now, text-analyze the split document for topics using the shinyapp for topic mining.
In a PPT, paste the wordclouds for each topic and write your interpretation of what that topic means (a few descriptive words is all).
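For intuition, the splitting step can be sketched in a few lines of base R. This only illustrates the idea (fixed-size word chunks); it is not the textsplit app's actual code:

```r
# Split a long text into chunks of roughly n words each
split_text <- function(txt, n = 40) {
  words  <- unlist(strsplit(txt, "\\s+"))        # tokenize on whitespace
  groups <- ceiling(seq_along(words) / n)        # chunk index per word
  tapply(words, groups, paste, collapse = " ")   # re-join words chunk-wise
}

article <- paste(rep("lorem ipsum dolor sit amet", 20), collapse = " ")  # 100 words
chunks  <- split_text(article, n = 40)
length(chunks)   # 100 words in chunks of 40 -> 3 chunks
```

Each chunk then becomes one "document" for the topic-mining shinyapp.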
Deliverables and Deadlines:
The deadline for this session's HWs is a week from now: next Friday (26-Sept), midnight.
Separate drop boxes will be up for session 5 HW part 1 and HW part 2.
For both homework parts, pls submit a zipped folder containing (a) the text dataset you used, and (b) the PPT you made.
Pls remember to write your (group) name and PGID on the title slide, and name the PPT name_HWnumber.pptx.
Added later: The PPT should be <10 slides in length. Feel free to add more slides in an annexure, if required.
The HWs are all HCC level 0. Feel free to take any help from anybody as required.
Any queries etc, contact me.
Ciao.
Sudhir
Dear Sir,
Could you kindly explain how document frequency is computed? As per my understanding, it is the number of times a word occurs per 100 words in the document. Kindly correct me if I am wrong.
Regards,
Nithya
Hi Nithya,
Are you referring to the TFIDF weighting scheme? Well, in the classroom example, my corpus had 100 docs, hence I divided the term frequency TF by 100. Otherwise, we divide by the no. of docs in the corpus.
In any case, there exist many schemes to compute TFIDFs, and we can always come up with our own, besides.
So, for now, don't worry about it and use R's internal TFIDF scheme. Hope that helps.
Sudhir
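To see the mechanics, here is a toy TF-IDF computation in base R on a made-up three-document corpus. This uses the common TF × log(N/DF) weighting, which is one of many schemes and not necessarily the exact one tm uses internally:

```r
# Toy corpus: three documents, already tokenized
docs  <- list(c("price", "quality", "price"),
              c("quality", "service"),
              c("price", "delivery"))
terms <- sort(unique(unlist(docs)))

# Term-frequency matrix: rows = docs, cols = terms
tf <- t(sapply(docs, function(d) table(factor(d, levels = terms))))

# Document frequency: in how many docs does each term appear?
df <- colSums(tf > 0)

# TF-IDF = TF * log(N / DF), where N = number of docs
tfidf <- sweep(tf, 2, log(length(docs) / df), "*")
round(tfidf, 3)
```

Note that a term appearing in every document gets IDF = log(N/N) = 0, i.e. it is weighted down to nothing, which is the whole point of the scheme.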
Hello Prof,
While executing the command
zara <- WebCorpus(GoogleNewsSource("Zara"))
I get the following error:
Error in function (type, msg, asError = TRUE) : couldn't connect to host
How can I fix this?
Hi Anand,
Use base R and not RStudio. Also, I have updated the code in the post above for newline characters. Check now and see.
Sudhir
Hi Sir,
I am facing this error while running base R.
Try this:
https://wordcloud.shinyapps.io/googlenews/
Hello Prof. In homework part 2 option 1, you wrote "Your task: ID the two most positive and two most negative articles." How do we ID articles? Did you mean topics?
Hi Sharath,
No, it would be articles. Imagine the pulled articles are documents and you have terms as columns. Upon sentiment analysis (like we did for the Iron Man reviews), you get polarities for each document. Hope that helps.
Sudhir
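As a toy base-R illustration of the idea (the document names and polarity scores here are entirely made up), ranking documents by a polarity column works like this:

```r
# Hypothetical per-document polarity scores, as produced by sentiment analysis
pol_scores <- data.frame(doc      = c("article1", "article2", "article3", "article4"),
                         polarity = c(0.42, -0.31, 0.87, -0.65))

# Two most positive documents
head(pol_scores[order(pol_scores$polarity, decreasing = TRUE), ], 2)

# Two most negative documents
head(pol_scores[order(pol_scores$polarity, decreasing = FALSE), ], 2)
```

In the homework, those top-ranked rows point you at the articles to read and explain in your PPT.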
Hi Professor,
I tried running the R code as well as the shiny app for the Google News pull. Both seem to be timing out while trying to establish the connection.
Please help!
Error as seen in R:
Error in function (type, msg, asError = TRUE) : connect() timed out!
Thanks
Sonam
Hi Sonam,
It would be good to attend the R tutorial today and pose the Q there. Aashish Pandey, who built the shiny app, will be conducting the tutorial.
Sudhir
Sir, in the shiny app for text analysis, I'm getting the following error: NA indices not allowed
Request your help.
Hi Aditi,
Pls reach out to Aashish Pandey for shiny queries. Did you attend today's R tutorial? In any case, I shall postpone the session 5 HW deadline by 24 hrs; too many folks are facing issues with the text analytics pieces.
Sudhir
Hi Professor,
While topic mining, should we go with the number of topics suggested by the log Bayes factor, or can we input the number of topics ourselves? I see that the Narendra Modi I-Day speech had 6 topics.
Regards,
Rohit
You can override the machine. Topic interpretability should be our main concern; model fit etc. can come later, I guess.
Sudhir
Thanks Prof.
Hi Professor,
The shiny app for basic text analysis appears to be down. I've been trying to access it for quite some time. Please help!
Thanks
Sonam