Guide
Methodology
Notes on the concept, how indicators are calculated, and some general advice on how to interpret indicators.
Notes on interpreting the data
When using indicator data, it's a good idea to keep the following points in mind:
- Indicators that include comments tend to have lower values than those that exclude comments. This is because commenters tend to make more direct and emotional statements, while online publications use more moderate language. It can therefore be misleading to compare the values of indicators that exclude comments with those of indicators that include them.
- When judging how representative the indicator values are of the mood in a particular discussion, consider the number of data points (posts analyzed), the median, and the standard deviation of the data points. A high number of data points combined with a low standard deviation and a sentiment value close to the median indicates a clear trend. Under these circumstances, the indicators are likely to give a representative picture of sentiment around a keyword and a selection of subreddits.
- What counts as a large number of data points? With the current settings, the maximum is 250: the maximum number of subreddits (10) multiplied by the maximum number of posts per subreddit (25). For example, if I query subreddits for the keyword "food" and posts for the keyword "banana", a value of 250 data points would mean that 10 subreddits related to "food" were found and that each of them yielded the maximum of 25 top posts containing "banana" in their title or description. Considering this, a good rule of thumb seems to be that 20 data points is already a relatively high number and should produce reasonably representative indicators.
- It should be noted that the analysis and the resulting indicators don't directly measure sentiment towards a certain topic, group, person, product or keyword in general. The underlying assumption is that if texts are filtered and sorted according to certain criteria, their sentiment will to some extent reflect the sentiment towards these keywords. This assumption has clear limitations, especially when interpreting single data points. For example, if I filter subreddits for the keyword "music" and posts for the keyword "metallica", and the indicator points in a negative direction, that doesn't necessarily mean that people hate Metallica. Maybe a concert was canceled and people didn't get a refund for their tickets; in that case, the negative sentiment would be directed at the promoter rather than at the band itself. However, if the time series for the same indicator points in a negative direction over a long period, it becomes much more plausible that the sentiment is negative towards the band itself.
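The representativeness checks described in the points on data points, median and standard deviation can be sketched in Python as follows. Only the limit of 250 data points (10 subreddits × 25 posts) and the rule of thumb of 20 data points come from this guide. Everything else is an assumption for illustration: per-post sentiment scores on a -1 to 1 scale, the indicator value being the mean of those scores, and the hypothetical function `summarize_indicator` with its cutoffs (standard deviation at most 0.3, mean within 0.1 of the median). None of this is the tool's actual implementation.

```python
import statistics
from dataclasses import dataclass

# Limits stated in the guide: up to 10 subreddits x 25 posts each.
MAX_SUBREDDITS = 10
MAX_POSTS_PER_SUBREDDIT = 25
MAX_DATA_POINTS = MAX_SUBREDDITS * MAX_POSTS_PER_SUBREDDIT  # 250

# Rule of thumb from the guide: about 20 data points is already relatively high.
MIN_DATA_POINTS = 20

# Hypothetical cutoffs (assumptions, not part of the tool) for scores in [-1, 1].
MAX_STDEV = 0.3
MAX_MEAN_MEDIAN_GAP = 0.1


@dataclass
class IndicatorSummary:
    n: int
    mean: float
    median: float
    stdev: float
    clear_trend: bool


def summarize_indicator(scores: list[float]) -> IndicatorSummary:
    """Summarize per-post sentiment scores and flag whether they show a clear trend."""
    n = len(scores)
    if n > MAX_DATA_POINTS:
        raise ValueError(f"expected at most {MAX_DATA_POINTS} data points, got {n}")
    mean = statistics.fmean(scores) if n else float("nan")
    median = statistics.median(scores) if n else float("nan")
    # The sample standard deviation needs at least two values.
    stdev = statistics.stdev(scores) if n >= 2 else float("nan")
    # Clear trend: enough data points, low spread, and mean close to the median.
    clear_trend = (
        n >= MIN_DATA_POINTS
        and stdev <= MAX_STDEV
        and abs(mean - median) <= MAX_MEAN_MEDIAN_GAP
    )
    return IndicatorSummary(n, mean, median, stdev, clear_trend)


# Usage: 20 mildly negative posts with little spread.
summary = summarize_indicator([-0.4] * 10 + [-0.5] * 10)
print(summary.n, summary.clear_trend)  # 20 True
```

Comparing the mean with the median is one simple way to notice when a few extreme posts pull the indicator away from what most posts say; the cutoffs would need tuning against real data.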
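The difference between a single negative data point and a negative direction over a long period, as in the Metallica example, can be illustrated with a rolling mean. This is a minimal sketch under stated assumptions: the time series is a plain list of indicator values, one per period (for example per day), oldest first; negative values mean negative sentiment; and the hypothetical helpers `rolling_mean` and `is_persistently_negative`, along with the default window of 7 periods, are illustrative choices rather than how the tool evaluates time series.

```python
import statistics


def rolling_mean(values: list[float], window: int) -> list[float]:
    """Mean of each full sliding window over the series (oldest first)."""
    if window < 1:
        raise ValueError("window must be >= 1")
    # Returns an empty list if the series is shorter than one window.
    return [
        statistics.fmean(values[i:i + window])
        for i in range(len(values) - window + 1)
    ]


def is_persistently_negative(values: list[float], window: int = 7) -> bool:
    """True if the series covers at least one full window and every window mean is below zero."""
    means = rolling_mean(values, window)
    return bool(means) and all(m < 0 for m in means)


# One very negative period (e.g. a canceled concert) among otherwise positive ones.
one_bad_day = [0.2, 0.1, 0.3, -0.9, 0.2, 0.1, 0.2, 0.3, 0.1, 0.2]
# Negative values throughout the whole period.
sustained = [-0.2, -0.3, -0.1, -0.4, -0.2, -0.3, -0.1, -0.2, -0.3, -0.2]

print(is_persistently_negative(one_bad_day), is_persistently_negative(sustained))  # False True
```

Smoothing over a window keeps a single outlier period from dominating the picture, while a series whose window means all stay negative matches the guide's notion of a negative direction over a long period.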