News Sentiment
Two types of news datasets are available: one ticker-matched and one theme-matched.
The dataset covers 2,000+ tickers, with data available from 2017-01-01 onwards.
Tutorials are the best documentation: News Sentiment Analysis Tutorial
Input Datasets
News Scrapers, Public Event Data
Models Used
Fuzzy Matching
Model Outputs
Sentiment Scores
Description
This dataset provides comprehensive news sentiment analysis, offering ticker-matched and theme-matched data on various aspects of news coverage.
It includes metrics on sentiment, tone, polarity, and article count, enabling investors and analysts to gauge public perception and potential market impacts of news.
Data Access
Sentiment Data - All Data
import sovai as sov
sov.data("news/sentiment", full_history=True)
Sentiment Data - Latest Data
import sovai as sov
sov.data("news/sentiment")
Sentiment Data - Filtered Dataset
import sovai as sov
df_news = sov.data("news/sentiment", start_date="2017-03-30", tickers=["MSFT","TSLA"])
As you have done for sentiment above, you can do the same for news tone, polarity, activeness, etc.
All Variations
import sovai as sov
# Sentiment Dataset
df_sentiment = sov.data("news/sentiment")
# Provides sentiment scores for news articles, helping gauge the overall emotional tone of news coverage.
# Tone Dataset
df_tone = sov.data("news/tone")
# Offers insights into the overall tone of news articles, differentiating between neutral, positive, or negative coverage.
# Positive Sentiment Dataset
df_positive = sov.data("news/positive")
# Focuses specifically on positive sentiments expressed in news articles.
# Negative Sentiment Dataset
df_negative = sov.data("news/negative")
# Provides information on negative sentiments in news articles, valuable for risk assessment.
# Polarity Dataset
df_polarity = sov.data("news/polarity")
# Measures how polarizing news coverage is, indicating how divisive or controversial certain topics or entities are.
# Match Quality Dataset
df_match = sov.data("news/match_quality")
# Assesses the quality of matches between news articles and specific entities or topics.
# Pronouns Dataset
df_pronouns = sov.data("news/pronouns")
# Analyzes the use of pronouns in news articles.
# Activeness Dataset
df_activeness = sov.data("news/activeness")
# Measures the level of activity or dynamism in news coverage.
# Associated People Dataset
df_associated_people = sov.data("news/associated_people")
# Tracks individuals mentioned in association with specific entities or topics.
# Article Count Dataset
df_article_count = sov.data("news/article_count")
# Provides data on the volume of articles related to specific topics or entities.
# Associated Companies Dataset
df_associated_companies = sov.data("news/associated_companies")
# Tracks companies mentioned in association with specific entities or topics in news articles.
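When several of these endpoints are needed at once, the repeated calls can be wrapped in a small helper. The sketch below is a hypothetical convenience function, not part of the sovai package: `load_news_datasets` and `NEWS_ENDPOINTS` are names introduced here for illustration. It takes the loader as an argument so it can be tried without network access, and it assumes every endpoint accepts the same keyword filters shown for `news/sentiment`.

```python
from typing import Any, Callable, Dict, Iterable

# Endpoint suffixes copied from the listing above.
NEWS_ENDPOINTS = (
    "sentiment", "tone", "positive", "negative", "polarity",
    "match_quality", "pronouns", "activeness",
    "associated_people", "article_count", "associated_companies",
)


def load_news_datasets(
    loader: Callable[..., Any],
    names: Iterable[str] = NEWS_ENDPOINTS,
    **filters: Any,
) -> Dict[str, Any]:
    """Call loader("news/<name>", **filters) for each name and collect the results.

    `loader` is expected to behave like `sov.data`. This helper is a
    hypothetical illustration, not a sovai function.
    """
    results: Dict[str, Any] = {}
    for name in names:
        if name not in NEWS_ENDPOINTS:
            raise ValueError(f"unknown news endpoint: {name!r}")
        results[name] = loader(f"news/{name}", **filters)
    return results
```

With the real client this would be called as `load_news_datasets(sov.data, ["tone", "polarity"], tickers=["MSFT", "TSLA"], start_date="2017-03-30")`, returning a dictionary keyed by endpoint name.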
Themed Sentiment
df_sentiment_score = sov.data("news/sentiment_score")
Measures emotional tone of news articles. Positive scores: favorable news; Negative scores: unfavorable news.
import sovai as sov
df_sentiment_score = sov.data("news/sentiment_score")

df_polarity_score = sov.data("news/polarity_score")
Gauges opinion intensity in news. Higher scores: stronger opinions; Lower scores: more neutral reporting.
import sovai as sov
df_polarity_score = sov.data("news/polarity_score")
df_topic = sov.data("news/topic_probability")
Indicates topic prevalence in news. Higher values: more frequently discussed topics.
All three themed datasets report statistical measures (mean, median, etc.) across financial and economic topics over time.
import sovai as sov
df_topic = sov.data("news/topic_probability")
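One way to read topic prevalence is to average the probability per topic over a window and rank the topics. The sketch below works on plain dictionaries so it runs without the API. The field names `topic` and `probability` and the sample rows are assumptions made up for illustration; the actual layout of `news/topic_probability` is not described here and may differ.

```python
from collections import defaultdict
from statistics import mean


def rank_topics(rows):
    """Return (topic, mean probability) pairs, most prevalent topic first."""
    by_topic = defaultdict(list)
    for row in rows:
        by_topic[row["topic"]].append(row["probability"])
    averages = {topic: mean(values) for topic, values in by_topic.items()}
    return sorted(averages.items(), key=lambda pair: pair[1], reverse=True)


# Made-up illustrative rows, not real output from the dataset.
sample = [
    {"date": "2024-01-01", "topic": "inflation", "probability": 0.30},
    {"date": "2024-01-02", "topic": "inflation", "probability": 0.50},
    {"date": "2024-01-01", "topic": "energy", "probability": 0.10},
    {"date": "2024-01-02", "topic": "energy", "probability": 0.20},
]
ranking = rank_topics(sample)  # "inflation" ranks ahead of "energy"
```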
Visualizations
Strategy
import sovai as sov
sov.plot("news", chart_type="strategy", ticker='NVDA')

Econometrics
import sovai as sov
sov.report("news", report_type="econometric")

Analysis
import sovai as sov
sov.plot("news", chart_type="analysis")

Data Dictionary
| Name | Description | Type | Example |
|---|---|---|---|
| match_quality | Quality score of the match between the article and the entity, indicating the relevance and accuracy of the match. | float | 99.75 |
| within_article | Number of mentions of the entity within the article, indicating the focus on the entity in the article's content. | int | 2 |
| relevance | The average salience of the entity across the articles, indicating the importance or prominence of the entity. | float | 0.022049 |
| magnitude | A measure of the intensity or strength of the sentiment expressed in the article. | float | 18.203125 |
| sentiment | A score representing the overall sentiment (positive or negative) of the article. | float | 0.054504 |
| article_count | The total number of articles associated with the entity, indicating the level of media attention or coverage. | int | 1666 |
| associated_people | Count of unique people mentioned in the context of the entity, reflecting its association with various individuals. | int | 143 |
| associated_companies | Count of unique companies mentioned in relation to the entity, indicating its business connections. | int | 287 |
| tone | The overall tone of the article, derived from a textual analysis of its content. | float | 0.237061 |
| positive | The score quantifying the positive sentiments expressed in the article. | float | 2.828125 |
| negative | The score quantifying the negative sentiments expressed in the article. | float | 2.591797 |
| polarity | The degree of polarity in the sentiment, indicating the extent of opinionated content. | float | 5.421875 |
| activeness | A measure of the dynamism in the language used, possibly indicating the urgency of the article. | float | 22.031250 |
| pronouns | The count of pronouns used in the article, indicative of the narrative style or subject focus. | float | 0.995117 |
| word_count | The total number of words in the article, giving an indication of its length or detail. | int | 1084 |
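The `match_quality` field comes from fuzzy matching between article mentions and entities, but the matching algorithm itself is not described here. As a rough intuition only, the sketch below uses the standard library's `difflib.SequenceMatcher` as a stand-in to score a mention against candidate company names on a 0 to 100 scale. The functions `match_score` and `best_ticker` and the company list are hypothetical, and the scores are not comparable to the dataset's values.

```python
from difflib import SequenceMatcher


def match_score(mention: str, candidate: str) -> float:
    """Case-insensitive similarity between two names, scaled to 0-100."""
    ratio = SequenceMatcher(None, mention.lower(), candidate.lower()).ratio()
    return round(100 * ratio, 2)


def best_ticker(mention: str, companies: dict) -> tuple:
    """Return (ticker, score) for the company name closest to the mention."""
    return max(
        ((ticker, match_score(mention, name)) for ticker, name in companies.items()),
        key=lambda pair: pair[1],
    )


# Illustrative lookup table, an assumption for this sketch.
companies = {
    "MSFT": "Microsoft Corporation",
    "TSLA": "Tesla Inc",
    "NVDA": "NVIDIA Corporation",
}
ticker, score = best_ticker("Microsoft Corp", companies)
```

A production matcher would typically also normalize suffixes such as "Inc" or "Corp" and apply a minimum score threshold before accepting a match.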
Use Case
This dataset provides a comprehensive analysis of various entities (such as companies and individuals) based on their media coverage and associated articles. It's designed to assist investors in understanding the market sentiment, media focus, and the overall perception of entities in which they might be interested. The data is extracted and processed from a wide range of articles, ensuring a broad and in-depth view of each entity.
This dataset is an invaluable resource for investors seeking to gauge public perception, media sentiment, and the prominence of entities in the news. It can be used for:
Sentiment analysis to understand the market mood.
Identifying trends in media coverage related to specific entities.
Assessing the impact of news on stock performance.
Conducting peer comparison based on media presence and sentiment.
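As a minimal sketch of the peer-comparison use case, the code below computes an article-count-weighted mean sentiment per ticker and ranks the tickers. It operates on plain dictionaries using the `sentiment` and `article_count` fields from the data dictionary; the `ticker` key, the function name `weighted_sentiment`, and the sample numbers are assumptions made up for illustration, not real dataset output.

```python
from collections import defaultdict


def weighted_sentiment(rows):
    """Article-count-weighted mean sentiment per ticker."""
    # ticker -> [sum of sentiment * article_count, sum of article_count]
    totals = defaultdict(lambda: [0.0, 0])
    for row in rows:
        acc = totals[row["ticker"]]
        acc[0] += row["sentiment"] * row["article_count"]
        acc[1] += row["article_count"]
    # Skip tickers with no articles to avoid dividing by zero.
    return {t: s / n for t, (s, n) in totals.items() if n > 0}


# Made-up illustrative rows.
rows = [
    {"ticker": "MSFT", "sentiment": 0.10, "article_count": 100},
    {"ticker": "MSFT", "sentiment": -0.20, "article_count": 300},
    {"ticker": "TSLA", "sentiment": 0.05, "article_count": 200},
]
scores = weighted_sentiment(rows)
peers = sorted(scores, key=scores.get, reverse=True)  # most positive first
```

Weighting by article count keeps a day with a handful of articles from dominating the average; an unweighted mean is a reasonable alternative when coverage volume should not matter.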