Guide

Methodology

Notes on the concept, how the indicators are calculated, and some general advice on how to interpret them.

Indicator Construction

The algorithm for calculating sentiment values consists of the following steps:

  1. Initiation: If a database entry for the same request was created less than 60 seconds ago (RENEW_DATAPOINTS_AFTER_MS), no new analysis is conducted. Based on the User-Agent string, the request is classified as coming from a bot or a human visitor (bot).
  2. Subreddit selection: A list is created of subreddits whose title or description contains a certain keyword (subreddit_filter). The list is sorted (subreddit_sort) according to one of Reddit's sorting criteria, and the first 10 items (SUBREDDIT_SEARCH_LIMIT) are selected for the next step.
  3. Post selection: For each selected subreddit, the first 25 posts (POST_SEARCH_LIMIT) are filtered for a specific keyword (postings_filter), which can appear in the title or body of the post. The list is sorted (postings_sort) according to one of Reddit's categories.
  4. Text selection: Depending on the query type (comments), a post's text and its comments are included, excluded, or merged into a single text string. Text fragments containing phrases defined in the blacklist (ANALYZE_EXCLUDE_TERMS), e.g. "I am a bot", are excluded. Sticky posts are excluded as well.
  5. Weighting: During step 4, after each analyzed post the current weight is divided by ANALYZE_B_WEIGHT_DIVIDER and saved in a secondary temporary array. Text merged from the first analyzed post therefore carries more weight than the second, and so forth.
  6. Data points: Sentiment analysis is conducted on each merged text string, and the resulting value is stored in a temporary array, producing one data point per iteration of this step. The result is an array of sentiment values, with the first post of the first subreddit at the top and the last post of the last subreddit at the bottom. Which subreddit and post comes first or last depends on the sorting criteria.
  7. Statistical analysis: Based on the data points collected in the temporary array, the following values are determined for Sentiment A: arithmetic mean (sa_v), standard deviation (sa_d), and median (sa_m).
    For Sentiment B, the same metrics are calculated in weighted form (sb_v, sb_d, sb_m), using the weights collected in the second temporary array.
    The number of analyzed posts is counted (dp). One data point is one analyzed Reddit post (with or without its comments, depending on configuration); posts removed by filtering are not counted. In this context, this number represents the sample size of the measurement.
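The selection steps (2 and 3) can be sketched as plain filter-sort-limit operations. The following is a minimal sketch on in-memory sample data; the data structures, field names, and sort key are illustrative assumptions, not the actual implementation.

```python
# Sketch of steps 2-3; the dict layout and sort key are assumptions.
SUBREDDIT_SEARCH_LIMIT = 10
POST_SEARCH_LIMIT = 25

def select_subreddits(subreddits, subreddit_filter, sort_key):
    """Keep subreddits whose title or description contains the keyword,
    sort them, and take the first SUBREDDIT_SEARCH_LIMIT items."""
    matches = [s for s in subreddits
               if subreddit_filter.lower()
               in (s["title"] + " " + s["description"]).lower()]
    matches.sort(key=sort_key, reverse=True)
    return matches[:SUBREDDIT_SEARCH_LIMIT]

def select_posts(posts, postings_filter):
    """From the first POST_SEARCH_LIMIT posts, keep those whose title or
    body contains the keyword; sticky posts are skipped (see step 4)."""
    return [p for p in posts[:POST_SEARCH_LIMIT]
            if not p.get("sticky")
            and postings_filter.lower()
            in (p["title"] + " " + p["text"]).lower()]
```

In the real pipeline the lists would come from Reddit's search endpoints, with `subreddit_sort` and `postings_sort` passed through as API parameters rather than applied locally.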
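Steps 5 through 7 amount to a geometric weight decay followed by (weighted) summary statistics. A minimal sketch, assuming a starting weight of 1.0 and sentiment scores in [-1, 1] (both assumptions not stated in the text):

```python
import statistics

ANALYZE_B_WEIGHT_DIVIDER = 1.618  # each post weighs ~62% of the previous one

def weights_for(n, divider=ANALYZE_B_WEIGHT_DIVIDER, start=1.0):
    """Weight sequence of step 5: the weight is divided by the divider
    after every analyzed post (assumed starting weight: 1.0)."""
    w, out = start, []
    for _ in range(n):
        out.append(w)
        w /= divider
    return out

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

scores = [0.6, 0.2, -0.4]          # hypothetical data points from step 6
w = weights_for(len(scores))

sa_v = statistics.mean(scores)     # Sentiment A: unweighted metrics
sa_d = statistics.stdev(scores)
sa_m = statistics.median(scores)
sb_v = weighted_mean(scores, w)    # Sentiment B: weighted mean
dp = len(scores)                   # sample size
```

Because earlier (higher-sorted) posts carry more weight, `sb_v` is pulled toward the sentiment of the top-ranked posts, while `sa_v` treats all posts equally.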


Variables
I/O     Name                        Type       Value                                   Description
Input   RENEW_DATAPOINTS_AFTER_MS   integer    60000                                   milliseconds
        SUBREDDIT_SEARCH_LIMIT      integer    10                                      items
        POST_SEARCH_LIMIT           integer    25                                      items
        ANALYZE_B_WEIGHT_DIVIDER    float      1.618                                   divisor controlling the weight decay for Sentiment B
        ANALYZE_EXCLUDE_TERMS       array      ['I am a bot']                          blacklisted phrases
        subreddit_filter            string     user defined                            subreddit search filter
        subreddit_sort              enum       relevance, popular, activity            subreddit sort criteria
        postings_sort               enum       new, hot, rising, top, controversial    posting sort criteria
        postings_filter             string     user defined                            posting search filter
        comments                    enum       commentsonly, withcomments, nocomments  include / exclude comments
Output  time                        timestamp  UNIX epoch                              milliseconds
        sa_v                        float      mean                                    arithmetic mean of Sentiment A scores
        sa_d                        float      standard deviation                      standard deviation of Sentiment A scores
        sa_m                        float      median                                  median of Sentiment A scores
        sb_v                        float      mean                                    weighted arithmetic mean of Sentiment B scores
        sb_d                        float      standard deviation                      weighted standard deviation of Sentiment B scores
        sb_m                        float      median                                  weighted median of Sentiment B scores
        dp                          integer    sum                                     number of analyzed posts (sample size)
        bot                         boolean    true, false                             user type derived from the request
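The `comments` enum and the blacklist interact in step 4 when the analyzed text string is built. A minimal sketch of that merging logic; the shape of the `post` dict is an assumption:

```python
ANALYZE_EXCLUDE_TERMS = ['I am a bot']

def keep(fragment):
    """Drop text fragments containing a blacklisted phrase (step 4)."""
    return not any(term in fragment for term in ANALYZE_EXCLUDE_TERMS)

def merge_text(post, comments_mode):
    """Build the text string analyzed for one post, according to the
    `comments` enum; the structure of `post` is an assumption."""
    comments = [c for c in post["comments"] if keep(c)]
    if comments_mode == "nocomments":
        return post["text"]
    if comments_mode == "commentsonly":
        return " ".join(comments)
    return " ".join([post["text"]] + comments)  # withcomments
```

For example, a comment like "I am a bot, beep boop" is dropped before merging in every mode, so automated replies do not influence the sentiment score.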