Guide

Methodology

Notes on the concept, how the indicators are calculated, and some general advice on how to interpret them.

Indicator Construction

The algorithm for calculating sentiment values consists of the following steps:

  1. Initiation: If a database entry for the same request was created less than 60 seconds ago (RENEW_DATAPOINTS_AFTER_MS), no new analysis is conducted. Based on the User-Agent string, the request is classified as coming from a bot or a human visitor (bot).
  2. Subreddit selection: A list is created of subreddits whose title or description contains a certain keyword (subreddit_filter). The list is sorted (subreddit_sort) according to one of Reddit's sorting criteria, and the first 10 items (SUBREDDIT_SEARCH_LIMIT) are selected for the next step.
  3. Post selection: For each selected subreddit, the first 25 posts (POST_SEARCH_LIMIT) are filtered for a specific keyword (postings_filter), which can appear in the title or body of the post. The list is sorted (postings_sort) according to one of Reddit's categories.
  4. Text selection: Depending on the query type (comments), a post's text and its comments are included, excluded, or merged into a single text string. Text fragments containing phrases defined in the blacklist (ANALYZE_EXCLUDE_TERMS), e.g. "I am a bot", are excluded. Sticky posts are excluded as well.
  5. Weighting: During step 4, after each analyzed post the current weight is divided by ANALYZE_B_WEIGHT_DIVIDER and saved in a secondary temporary array. Text merged from the first analyzed post therefore carries more weight than the second, and so forth.
  6. Data points: Sentiment analysis is conducted on each merged text string, and the resulting value is stored in a temporary array, producing one data point per iteration of this step. The result is an array of sentiment values, with the first post of the first subreddit at the top and the last post of the last subreddit at the bottom. Which subreddit and post comes first or last depends on the sorting criteria.
  7. Statistical analysis: Based on the data points collected in the temporary array, the following values are determined for Sentiment A: arithmetic mean (sa_v), standard deviation (sa_d), and median (sa_m).
    For Sentiment B, the same metrics are calculated in weighted form (sb_v, sb_d, sb_m), using the weights collected in the second temporary array.
    The number of analyzed posts is counted (dp). One data point is one analyzed Reddit post (with or without its comments, depending on configuration); posts removed by filtering are not counted. In this context, this number represents the sample size of the measurement.
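The selection steps (2 and 3) can be sketched as plain filter-sort-limit operations. The following is a minimal sketch on in-memory sample data; the data structures, field names, and sort key are illustrative assumptions, not the actual implementation.

```python
# Sketch of steps 2-3; the dict layout and sort key are assumptions.
SUBREDDIT_SEARCH_LIMIT = 10
POST_SEARCH_LIMIT = 25

def select_subreddits(subreddits, subreddit_filter, sort_key):
    """Keep subreddits whose title or description contains the keyword,
    sort them, and take the first SUBREDDIT_SEARCH_LIMIT items."""
    matches = [s for s in subreddits
               if subreddit_filter.lower()
               in (s["title"] + " " + s["description"]).lower()]
    matches.sort(key=sort_key, reverse=True)
    return matches[:SUBREDDIT_SEARCH_LIMIT]

def select_posts(posts, postings_filter):
    """From the first POST_SEARCH_LIMIT posts, keep those whose title or
    body contains the keyword; sticky posts are skipped (see step 4)."""
    return [p for p in posts[:POST_SEARCH_LIMIT]
            if not p.get("sticky")
            and postings_filter.lower()
            in (p["title"] + " " + p["text"]).lower()]
```

In the real pipeline the lists would come from Reddit's search endpoints, with `subreddit_sort` and `postings_sort` passed through as API parameters rather than applied locally.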
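Steps 5 through 7 amount to a geometric weight decay followed by (weighted) summary statistics. A minimal sketch, assuming a starting weight of 1.0 and sentiment scores in [-1, 1] (both assumptions not stated in the text):

```python
import statistics

ANALYZE_B_WEIGHT_DIVIDER = 1.618  # each post weighs ~62% of the previous one

def weights_for(n, divider=ANALYZE_B_WEIGHT_DIVIDER, start=1.0):
    """Weight sequence of step 5: the weight is divided by the divider
    after every analyzed post (assumed starting weight: 1.0)."""
    w, out = start, []
    for _ in range(n):
        out.append(w)
        w /= divider
    return out

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

scores = [0.6, 0.2, -0.4]          # hypothetical data points from step 6
w = weights_for(len(scores))

sa_v = statistics.mean(scores)     # Sentiment A: unweighted metrics
sa_d = statistics.stdev(scores)
sa_m = statistics.median(scores)
sb_v = weighted_mean(scores, w)    # Sentiment B: weighted mean
dp = len(scores)                   # sample size
```

Because earlier (higher-sorted) posts carry more weight, `sb_v` is pulled toward the sentiment of the top-ranked posts, while `sa_v` treats all posts equally.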


Variables
I/O     Name                        Type       Value                                   Description
Input   RENEW_DATAPOINTS_AFTER_MS   integer    60000                                   milliseconds
        SUBREDDIT_SEARCH_LIMIT      integer    10                                      items
        POST_SEARCH_LIMIT           integer    25                                      items
        ANALYZE_B_WEIGHT_DIVIDER    float      1.618                                   divisor controlling the weight decay for Sentiment B
        ANALYZE_EXCLUDE_TERMS       array      ['I am a bot']                          blacklisted phrases
        subreddit_filter            string     user defined                            subreddit search filter
        subreddit_sort              enum       relevance, popular, activity            subreddit sort criteria
        postings_sort               enum       new, hot, rising, top, controversial    posting sort criteria
        postings_filter             string     user defined                            posting search filter
        comments                    enum       commentsonly, withcomments, nocomments  include / exclude comments
Output  time                        timestamp  UNIX epoch                              milliseconds
        sa_v                        float      mean                                    arithmetic mean of Sentiment A scores
        sa_d                        float      standard deviation                      standard deviation of Sentiment A scores
        sa_m                        float      median                                  median of Sentiment A scores
        sb_v                        float      mean                                    weighted arithmetic mean of Sentiment B scores
        sb_d                        float      standard deviation                      weighted standard deviation of Sentiment B scores
        sb_m                        float      median                                  weighted median of Sentiment B scores
        dp                          integer    sum                                     number of analyzed posts (sample size)
        bot                         boolean    true, false                             user type derived from the request
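The `comments` enum and the blacklist interact in step 4 when the analyzed text string is built. A minimal sketch of that merging logic; the shape of the `post` dict is an assumption:

```python
ANALYZE_EXCLUDE_TERMS = ['I am a bot']

def keep(fragment):
    """Drop text fragments containing a blacklisted phrase (step 4)."""
    return not any(term in fragment for term in ANALYZE_EXCLUDE_TERMS)

def merge_text(post, comments_mode):
    """Build the text string analyzed for one post, according to the
    `comments` enum; the structure of `post` is an assumption."""
    comments = [c for c in post["comments"] if keep(c)]
    if comments_mode == "nocomments":
        return post["text"]
    if comments_mode == "commentsonly":
        return " ".join(comments)
    return " ".join([post["text"]] + comments)  # withcomments
```

For example, a comment like "I am a bot, beep boop" is dropped before merging in every mode, so automated replies do not influence the sentiment score.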