Friday, September 19, 2014

Session 5 Updates

Hi all,

Session 5, on text analytics, was R-heavy and heavy in general.

Of course, we barely scratched the surface, but even that was quite a lot for a 110-minute session.

Session Summary:

To recap, the main things covered were:

  • Some understanding of the bag-of-words formulation, of elementary pre-processing, and of Document-Term Matrices (DTMs).
  • Basic descriptive text analysis and wordclouds, given a clean, well-structured text dataset in Excel format. (Level '0')
  • Grouping structure in top terms - using qgraph() to plot and see which terms co-occur in documents more often than at random. (Level '1')
  • Grouping structure in documents, which typically map one-to-one to respondents. In other words, respondent segmentation possibilities using simple k-means. (Level '2')
  • Basic sentiment analysis using qdap: finding sentiment-laden words, understanding document valence, and measuring sentiment polarities. (Level '3')
  • Topic mining of text data using Latent Dirichlet Allocation (LDA) models, for both corpora and single documents split into parts. (Level '4')
  • Web extraction of text data from structured pages, and from select sources using tm.plugin.webmining.
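For anyone who wants to see the bag-of-words idea stripped to its bones, here is a minimal sketch of elementary pre-processing, a document-term matrix, and a count of which terms co-occur in the same documents (the raw material behind a term network like the qgraph plots). Note it's in Python with only the standard library, not the class R code (which used packages such as tm and qgraph). The toy documents, the tiny stopword list, and the function names preprocess, build_dtm and cooccurrence are all made up for illustration.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical, deliberately tiny stopword list; real pipelines use much longer lists.
STOPWORDS = {"the", "a", "an", "and", "is", "it", "of", "to", "but", "was", "too"}

def preprocess(doc):
    """Elementary pre-processing: lowercase, keep letters only, drop stopwords."""
    tokens = re.findall(r"[a-z]+", doc.lower())
    return [t for t in tokens if t not in STOPWORDS]

def build_dtm(docs):
    """Bag of words: one row per document, one column per term, cells hold raw counts."""
    counts = [Counter(preprocess(d)) for d in docs]
    vocab = sorted(set().union(*counts))  # union of all terms, alphabetical
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix

def cooccurrence(vocab, matrix):
    """For each pair of terms, count the number of documents containing both."""
    pairs = Counter()
    for row in matrix:
        present = [vocab[j] for j, n in enumerate(row) if n > 0]
        # present follows vocab order, so every pair comes out as (a, b) with a < b
        for a, b in combinations(present, 2):
            pairs[(a, b)] += 1
    return pairs

if __name__ == "__main__":
    docs = [  # toy "survey responses", invented for illustration
        "The phone battery is great, battery life is great!",
        "Camera is poor but the battery is good.",
        "Great camera, great battery.",
    ]
    vocab, dtm = build_dtm(docs)
    print(vocab)
    for row in dtm:
        print(row)
    print(cooccurrence(vocab, dtm).most_common(3))
```

These are raw counts, with no adjustment for how often two terms would co-occur by chance, and a real DTM is usually stored sparse and may be weighted (e.g. tf-idf) rather than left as raw counts.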
Now, since MKTR is a hands-on course, it's time to apply Session 5 learnings by doing it yourself. There'll be two avenues for this:

(a) Replicating classwork examples at home. The final PPT slide deck and the classwork R code should be up on LMS by now.

(b) Doing the homework diligently, either individually or in groups. The homework assignments are coming up in the next blog post.

About R and Shiny:

Shiny is a web-app framework for R: a Shiny app wraps R code so that it runs on a server in the cloud and can be used from any browser, anywhere in the world, without needing R installed on your local machine.

We're trying to Shiny-fy as much of the analysis we do in class as we can. The list of Shiny apps up to Session 5:

Shiny app for factor analysis

Shiny app for cluster analysis (all 3 types: hclust, mclust and kmeans)

Shiny app for basic, i.e. descriptive, text analysis and wordclouds

Shiny app to split a single long article into multiple parts of uniform length

Shiny app for Flipkart data extraction

Shiny app for foundational topic mining using the textir and maptpx packages

Shiny app for JSMs

The landing page of each Shiny app has basic instructions and details for running it.

Should things still be unclear, please email aashish_pandey@isb.edu, with a copy to me, and let us know.

For the record, I encourage you to use them as a backup for your homework, should regular R code run into glitches.

About the R code now on LMS:

We double-checked the R code we've now put up on LMS, which is one reason for the delay in its release.

Sure, during the class itself, there were missteps and unexpected glitches here and there, what with trying to run so much disparate code on an unfamiliar machine (i.e. the classroom laptop).

Chances are you'll run into a few glitches yourself while trying to run the code.

If so, first ask around among your peers if someone has faced and solved such an issue.

If the issue remains unresolved, drop a comment on the relevant session blog post. Then contact my RA, Mr Aashish Pandey (aashish_pandey@isb.edu).

Finally, if the issue is still unresolved, drop by my office with your machine.

P.S. Sorry about the delay in releasing this blog post. Ideally, it should've come right after Tuesday.

Dassit for now. Ciao.

Sudhir
