News Sentiment

Two types of news datasets are available: one is ticker-matched and the other is theme-matched.

Data is updated quarterly, as new data arrives after the US market close (Eastern Time).

Tutorials are the best documentation: News Sentiment Analysis Tutorial

Input Datasets

News Scrapers, Public Event Data

Models Used

Fuzzy Matching

Model Outputs

Sentiment Scores

Description

This dataset provides comprehensive news sentiment analysis, offering ticker-matched and theme-matched data on various aspects of news coverage.

It includes metrics on sentiment, tone, polarity, and article count, enabling investors and analysts to gauge public perception and potential market impacts of news.
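Fuzzy matching is listed above as the model used to link news to tickers. The following is a minimal sketch of the general idea using Python's standard `difflib`, not the pipeline's actual implementation. The `COMPANY_NAMES` lookup, the helper names, and the 0-100 scaling (chosen to resemble the `match_quality` example value) are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from typing import Tuple

# Hypothetical reference list; the real entity universe is not described here.
COMPANY_NAMES = {
    "MSFT": "Microsoft Corporation",
    "TSLA": "Tesla Inc",
    "NVDA": "NVIDIA Corporation",
}


def match_score(mention: str, name: str) -> float:
    """Return a 0-100 similarity score between a news mention and a company name."""
    return 100.0 * SequenceMatcher(None, mention.lower(), name.lower()).ratio()


def best_match(mention: str) -> Tuple[str, float]:
    """Return the ticker whose company name is most similar to the mention."""
    scores = {t: match_score(mention, n) for t, n in COMPANY_NAMES.items()}
    ticker = max(scores, key=scores.get)
    return ticker, scores[ticker]


ticker, score = best_match("Microsoft Corp")
print(ticker, round(score, 2))
```

A production matcher would typically also normalise legal suffixes and apply a minimum score threshold before accepting a match.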

Data Access

Full Dataset

The entire dataset is around 1 GB to download.

from sovai import sov
df_news = sov.data("news/daily")

Filtered Dataset

from sovai import sov
df_news = sov.data("news/daily", start_date="2017-03-30", tickers=["MSFT","TSLA"])

Ticker Level Full-Files

from sovai import sov

# Sentiment Dataset
df_sentiment = sov.data("news/sentiment")
# Provides sentiment scores for news articles, helping gauge the overall emotional tone of news coverage.

# Tone Dataset
df_tone = sov.data("news/tone")
# Offers insights into the overall tone of news articles, differentiating between neutral, positive, or negative coverage.

# Positive Sentiment Dataset
df_positive = sov.data("news/positive")
# Focuses specifically on positive sentiments expressed in news articles.

# Negative Sentiment Dataset
df_negative = sov.data("news/negative")
# Provides information on negative sentiments in news articles, valuable for risk assessment.

# Polarity Dataset
df_polarity = sov.data("news/polarity")
# Measures how polarizing news coverage is, indicating how divisive or controversial certain topics or entities are.

# Match Quality Dataset
df_match = sov.data("news/match_quality")
# Assesses the quality of matches between news articles and specific entities or topics.

# Pronouns Dataset
df_pronouns = sov.data("news/pronouns")
# Analyzes the use of pronouns in news articles.

# Activeness Dataset
df_activeness = sov.data("news/activeness")
# Measures the level of activity or dynamism in news coverage.

# Associated People Dataset
df_associated_people = sov.data("news/associated_people")
# Tracks individuals mentioned in association with specific entities or topics.

# Article Count Dataset
df_article_count = sov.data("news/article_count")
# Provides data on the volume of articles related to specific topics or entities.

# Associated Companies Dataset
df_associated_companies = sov.data("news/associated_companies")
# Tracks companies mentioned in association with specific entities or topics in news articles.

Themed Sentiment

Sentiment score (news/sentiment_score): measures the emotional tone of news articles. Positive scores indicate favorable news; negative scores indicate unfavorable news.

from sovai import sov
df_sentiment_score = sov.data("news/sentiment_score")

Polarity score (news/polarity_score): gauges opinion intensity in news. Higher scores indicate stronger opinions; lower scores indicate more neutral reporting.

from sovai import sov
df_polarity_score = sov.data("news/polarity_score")

Topic probability (news/topic_probability): indicates topic prevalence in news. Higher values indicate more frequently discussed topics.

All three themed datasets report various statistical measures (mean, median, etc.) across financial and economic topics over time.

from sovai import sov
df_topic = sov.data("news/topic_probability")
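To make the "statistical measures across topics over time" idea concrete, here is a small sketch that groups article-level scores by date and topic and summarises each group with a mean and a median. The records, topic names, and field names are synthetic and purely illustrative; the actual columns returned by the themed datasets may differ.

```python
from collections import defaultdict
from statistics import mean, median

# Synthetic article-level records; dates, topics, and scores are made up.
articles = [
    {"date": "2024-01-02", "topic": "inflation", "sentiment": -0.30},
    {"date": "2024-01-02", "topic": "inflation", "sentiment": -0.10},
    {"date": "2024-01-02", "topic": "inflation", "sentiment": 0.10},
    {"date": "2024-01-02", "topic": "earnings", "sentiment": 0.40},
    {"date": "2024-01-03", "topic": "earnings", "sentiment": 0.20},
    {"date": "2024-01-03", "topic": "earnings", "sentiment": 0.60},
]

# Collect the sentiment scores for each (date, topic) pair.
groups = defaultdict(list)
for row in articles:
    groups[(row["date"], row["topic"])].append(row["sentiment"])

# Summarise each group with a mean, a median, and an article count.
summary = {
    key: {"mean": mean(scores), "median": median(scores), "count": len(scores)}
    for key, scores in sorted(groups.items())
}

for (date, topic), stats in summary.items():
    print(date, topic, stats)
```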

Visualizations

Strategy

from sovai import sov
sov.plot("news", chart_type="strategy", ticker='NVDA')

Econometrics

from sovai import sov
sov.report("news", report_type="econometric")

Analysis

from sovai import sov

sov.plot("news", chart_type="analysis")

Data Dictionary

| Feature Name | Description | Type | Example |
| --- | --- | --- | --- |
| match_quality | Quality score of the match between the article and the entity, indicating the relevance and accuracy of the match. | float | 99.75 |
| within_article | Number of mentions of the entity within the article, indicating the focus on the entity in the article's content. | int | 2 |
| relevance | The average salience of the entity across the articles, indicating the importance or prominence of the entity. | float | 0.022049 |
| magnitude | A measure of the intensity or strength of the sentiment expressed in the article. | float | 18.203125 |
| sentiment | A score representing the overall sentiment (positive or negative) of the article. | float | 0.054504 |
| article_count | The total number of articles associated with the entity, indicating the level of media attention or coverage. | int | 1666 |
| associated_people | Count of unique people mentioned in the context of the entity, reflecting its association with various individuals. | int | 143 |
| associated_companies | Count of unique companies mentioned in relation to the entity, indicating its business connections. | int | 287 |
| tone | The overall tone of the article, derived from a textual analysis of its content. | float | 0.237061 |
| positive | The score quantifying the positive sentiments expressed in the article. | float | 2.828125 |
| negative | The score quantifying the negative sentiments expressed in the article. | float | 2.591797 |
| polarity | The degree of polarity in the sentiment, indicating the extent of opinionated content. | float | 5.421875 |
| activeness | A measure of the dynamism in the language used, possibly indicating the urgency of the article. | float | 22.031250 |
| pronouns | The count of pronouns used in the article, indicative of the narrative style or subject focus. | float | 0.995117 |
| word_count | The total number of words in the article, giving an indication of its length or detail. | int | 1084 |
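The example values in the data dictionary look consistent with two simple relationships: tone is close to positive minus negative, and polarity is close to positive plus negative. This is an observation from the single example row only, not a documented definition, so treat it as an assumption to verify on real data. The small gaps may come from rounding or reduced-precision storage.

```python
# Example values copied from the data dictionary.
example = {
    "tone": 0.237061,
    "positive": 2.828125,
    "negative": 2.591797,
    "polarity": 5.421875,
}

# Assumed relationships (inferred from the example row, not confirmed by the source).
net_tone = example["positive"] - example["negative"]
total_charge = example["positive"] + example["negative"]

# Absolute differences between the derived and the reported values.
tone_gap = abs(net_tone - example["tone"])
polarity_gap = abs(total_charge - example["polarity"])

print(f"positive - negative = {net_tone:.6f} (tone: {example['tone']})")
print(f"positive + negative = {total_charge:.6f} (polarity: {example['polarity']})")
```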

Use Case

This dataset provides a comprehensive analysis of various entities (such as companies and individuals) based on their media coverage and associated articles. It's designed to assist investors in understanding the market sentiment, media focus, and the overall perception of entities in which they might be interested. The data is extracted and processed from a wide range of articles, ensuring a broad and in-depth view of each entity.

Investors can use the dataset to gauge public perception, media sentiment, and the prominence of entities in the news. Typical uses include:

  • Sentiment analysis to understand the market mood.

  • Identifying trends in media coverage related to specific entities.

  • Assessing the impact of news on stock performance.

  • Conducting peer comparison based on media presence and sentiment.
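As one illustration of the peer-comparison use case, the sketch below ranks tickers by an article-count-weighted average sentiment. The rows are synthetic, and weighting by `article_count` is a design choice made for this example rather than a method prescribed by the dataset.

```python
# Synthetic per-ticker rows; all values are invented for illustration.
rows = [
    {"ticker": "MSFT", "sentiment": 0.10, "article_count": 200},
    {"ticker": "MSFT", "sentiment": 0.30, "article_count": 600},
    {"ticker": "TSLA", "sentiment": -0.20, "article_count": 500},
    {"ticker": "TSLA", "sentiment": 0.20, "article_count": 500},
    {"ticker": "NVDA", "sentiment": 0.40, "article_count": 100},
]


def weighted_sentiment(rows):
    """Average sentiment per ticker, weighting each row by its article count."""
    totals = {}
    for row in rows:
        weighted, count = totals.get(row["ticker"], (0.0, 0))
        totals[row["ticker"]] = (
            weighted + row["sentiment"] * row["article_count"],
            count + row["article_count"],
        )
    return {t: weighted / count for t, (weighted, count) in totals.items()}


scores = weighted_sentiment(rows)

# Rank peers from most to least favorable weighted sentiment.
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Weighting by article count keeps a handful of low-coverage days from dominating the comparison; an unweighted mean or a median would be reasonable alternatives.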

