Dear Professor,
We have identified three distinct factors in the car data and have divided the data into four clusters.
To characterize the clusters, we have been going through the data manually and eyeballing the differences between clusters, but this is proving difficult, even with pivot tables.
We were wondering if there are any statistical procedures that will help us identify the differences between the four clusters.
N
My response:
Hi N,
The point of cluster analysis is segmentation. The segments should have some coherence, some identity.
A natural way to characterize clusters is thus to find the means of the clustering variables for each segment and see how much they differ from those of the other segments. IMO, this is what the book calls ‘cluster centroids’.
Also, the means of variables that were not used to identify the clusters, say demographics, can be used to profile a segment. Find the segment means along these variables as well and see whether they could serve as demographic markers for reaching that segment.
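To make the profiling step concrete, here is a minimal sketch in plain Python. The function name `cluster_profiles`, the column names and the toy rows are all made up for illustration; they are not from the car data. It simply averages each variable within each cluster, which gives the centroids for the clustering variables and a profile for any extra variables such as age.

```python
from collections import defaultdict
from statistics import mean


def cluster_profiles(rows, cluster_key, variables):
    """Return {cluster: {variable: mean}} for the listed variables."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[cluster_key]].append(row)
    return {
        cluster: {v: mean(r[v] for r in members) for v in variables}
        for cluster, members in grouped.items()
    }


# Hypothetical respondents: 'price' and 'safety' stand in for clustering
# variables, 'age' for a demographic that was not used in the clustering.
rows = [
    {"cluster": 1, "price": 6.0, "safety": 3.0, "age": 30},
    {"cluster": 1, "price": 7.0, "safety": 2.0, "age": 34},
    {"cluster": 2, "price": 2.0, "safety": 6.0, "age": 50},
    {"cluster": 2, "price": 3.0, "safety": 7.0, "age": 46},
]

profiles = cluster_profiles(rows, "cluster", ["price", "safety", "age"])
for cluster, profile in sorted(profiles.items()):
    print(cluster, profile)
```

Reading the output row by row is usually enough to give each segment a working label.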
As for statistical tests, a two-sample t-test can be used to test whether clusters 1 and 2 (say) differ in their emphasis on Price (say). The mean, standard deviation and number of respondents are available for both segments, after all.
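Since only the mean, standard deviation and count per segment are needed, the test statistic can be computed by hand. The sketch below uses Welch's version of the two-sample t-test (it does not assume equal variances), which is my choice here and only one of several reasonable variants. The function name `welch_t` and the numbers are hypothetical. It returns the t statistic and the approximate degrees of freedom, which you would then look up in a t table or pass to a stats package for a p-value.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's two-sample t statistic and degrees of freedom
    computed from summary statistics only."""
    v1 = sd1 ** 2 / n1  # squared standard error of mean 1
    v2 = sd2 ** 2 / n2  # squared standard error of mean 2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical summary numbers for 'emphasis on Price' in clusters 1 and 2
t_stat, df = welch_t(mean1=5.0, sd1=1.0, n1=25, mean2=4.0, sd2=1.0, n2=25)
print(f"t = {t_stat:.3f}, df = {df:.1f}")
```

With four clusters there are six possible pairs per variable, so it is worth being cautious about reading too much into a single significant pair.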
Another option is to plot each cluster's centroid on the 2 most important variables on a two-dimensional map to see which clusters diverge most on those variables.
Hey, just throwing ideas out there. You don't have to spend a lot of time on this. Just try to zero in on the few variables that are most actionable from the client's POV (how to target the segment, how to reach the segment, segment size, etc.) and that also adequately differentiate one cluster from the next. And you are good to go.
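A numeric companion to the map, if you want one: compute the straight-line distance between every pair of centroids on those two variables and sort the pairs. This is only a rough sketch; the centroid values are invented, and it assumes both variables are on comparable scales (otherwise standardize them first).

```python
import math
from itertools import combinations

# Hypothetical centroids: cluster -> (variable 1 mean, variable 2 mean)
centroids = {
    1: (6.5, 2.5),
    2: (2.5, 6.5),
    3: (6.0, 3.0),
    4: (4.0, 4.0),
}

# Euclidean distance between each pair of centroids, largest first
ranked = sorted(
    (
        ((a, b), math.dist(centroids[a], centroids[b]))
        for a, b in combinations(sorted(centroids), 2)
    ),
    key=lambda item: item[1],
    reverse=True,
)

for pair, distance in ranked:
    print(pair, round(distance, 2))
```

The pairs at the top of the list are the ones that sit farthest apart on the map; the pairs at the bottom may be hard to tell apart on these two variables alone.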
Hope that helped.
Sudhir