Methodology · EGC Journal

Scoring Methodology

How relevance scores and signal rankings are computed across the platform.

📊

Polymarket Relevance Score v1 · Mar 2026

Two-stage keyword filter that scores prediction markets 0–1 for energy & macro relevance.
Markets scoring 0.0 are discarded entirely and never stored.

Stage 1 — Hard Exclusion

Before any scoring, markets are dropped if they match a hard exclusion pattern. This eliminates noise from categories that are never relevant to an energy fund.

NFL / NBA / NHL / MLB · Oscar / Emmy / Grammy · Bitcoin / ETH / crypto · Celebrity / reality TV · Video games / esports · Meme coins · Sports championships
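
The exclusion step can be sketched as a regex pre-filter that runs before any scoring. The pattern list below is illustrative only, with one rough pattern per category listed above; the production patterns are not given in this document, and `is_hard_excluded` is a hypothetical name.

```python
import re

# Illustrative patterns only (assumption): the real exclusion list is not
# published here. Word boundaries keep "eth" from matching "ethanol".
EXCLUSION_PATTERNS = [
    r"\b(nfl|nba|nhl|mlb)\b",
    r"\b(oscars?|emmys?|grammys?)\b",
    r"\b(bitcoin|eth|crypto)\b",
    r"\breality tv\b",
    r"\besports\b",
    r"\bmeme coins?\b",
]
_EXCLUDE_RE = re.compile("|".join(EXCLUSION_PATTERNS), re.IGNORECASE)


def is_hard_excluded(title: str) -> bool:
    """Return True if the market title matches any hard exclusion pattern."""
    return _EXCLUDE_RE.search(title) is not None
```

A market for which `is_hard_excluded` returns True would be dropped before Stage 2 is evaluated.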

Stage 2 — Tiered Keyword Scoring

Tier	Topic	Score range
T1	Core energy & commodities	0.70 – 1.00
T2	Energy-adjacent & broader commodities	0.45 – 0.75
T3	Macro & geopolitical signals	0.20 – 0.40

Scoring is additive within each tier up to its cap. Only the highest matching tier is used — a T1 match short-circuits T2/T3 evaluation.

Tier 1 — Core Energy Keywords

crude oil wti brent oil price natural gas lng lpg opec opec+ petroleum refinery refining gasoline diesel pipeline shale fracking permian offshore drilling oil output oil production oil reserves oil supply energy crisis straits of hormuz strait of malacca

Tier 2 — Energy-Adjacent Keywords

energy power grid electricity utility utilities nuclear solar wind power renewable clean energy carbon emissions co2 net zero climate coal mining metals copper aluminum lithium commodities gulf of mexico north sea caspian

Tier 3 — Macro & Geopolitical Signals

fed federal reserve interest rate inflation cpi pce recession gdp us dollar dxy treasury rate cut rate hike iran russia ukraine china middle east saudi arabia venezuela opec meeting g7 g20 tariff sanctions war conflict

score = 0.0
if T1_matches ≥ 1:
    score = min(1.00, 0.70 + T1_matches × 0.15)
elif T2_matches ≥ 1:
    score = min(0.75, 0.45 + T2_matches × 0.10)
elif T3_matches ≥ 1:
    score = min(0.40, 0.20 + T3_matches × 0.05)
if score == 0.0:
    discard market (never stored)
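
A minimal Python sketch of the formula above, taking per-tier match counts as inputs. The function names are hypothetical, and rounding to two decimals is an assumption added to keep floating-point results tidy. Read literally, the formula gives 0.85 for a single T1 match (0.55 for T2, 0.25 for T3), so the lower end of each tier's stated range is not reached when at least one keyword matches.

```python
def relevance_score(t1_matches: int, t2_matches: int, t3_matches: int) -> float:
    """Tiered relevance score; only the highest matching tier contributes."""
    if t1_matches >= 1:
        score = min(1.00, 0.70 + t1_matches * 0.15)
    elif t2_matches >= 1:
        score = min(0.75, 0.45 + t2_matches * 0.10)
    elif t3_matches >= 1:
        score = min(0.40, 0.20 + t3_matches * 0.05)
    else:
        score = 0.0
    # Rounding is an assumption, not part of the documented formula.
    return round(score, 2)


def should_store(score: float) -> bool:
    """Markets scoring 0.0 are discarded and never stored."""
    return score > 0.0
```

For example, `relevance_score(1, 4, 4)` returns 0.85 because the T1 match short-circuits the lower tiers, while `relevance_score(0, 5, 3)` is capped at 0.75.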

Known limitations: Pure keyword matching — no semantic understanding. A market like "Will Trump tweet about oil?" might score 0 despite being energy-relevant. Future improvement: semantic embedding scoring (OpenAI embeddings) as an optional second pass.

🐦

X / Twitter Handle Relevance Score v1 · Mar 2026

Scores each tracked handle 0–1 based on bio keyword matches, follower reach, and verified status.
Used to sort handles on the Twitter/X Intelligence page and prioritise signal weight.

Bio Keyword Scoring

Tier	Topic	Per match	Cap
T1	Energy & commodities	+0.35	0.50
T2	Finance & macro	+0.15	0.25
T3	Geopolitical	+0.05	0.10

Reach & Credibility Bonuses

Signal	Bonus
Followers ≥ 10,000	+0.05
Followers ≥ 100,000	+0.10
Followers ≥ 1,000,000	+0.15
Verified account (✓)	+0.05

Follower bonuses are non-cumulative — only the highest tier applies. Total score is capped at 1.0.

bio_score = min(0.50, T1_matches × 0.35)
          + min(0.25, T2_matches × 0.15)
          + min(0.10, T3_matches × 0.05)
follower_bonus = 0.15 if followers ≥ 1M
            else 0.10 if followers ≥ 100k
            else 0.05 if followers ≥ 10k
            else 0.00
verified_bonus = 0.05 if verified else 0.00
score = min(1.0, bio_score + follower_bonus + verified_bonus)
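
The same calculation as a small Python function, as a sketch rather than the platform's actual code. `handle_score` is a hypothetical name, and the final rounding to two decimals is an assumption.

```python
def handle_score(t1_matches: int, t2_matches: int, t3_matches: int,
                 followers: int, verified: bool) -> float:
    """Handle relevance score: capped bio tiers plus reach and verified bonuses."""
    bio_score = (min(0.50, t1_matches * 0.35)
                 + min(0.25, t2_matches * 0.15)
                 + min(0.10, t3_matches * 0.05))

    # Follower bonuses are non-cumulative: only the highest tier applies.
    if followers >= 1_000_000:
        follower_bonus = 0.15
    elif followers >= 100_000:
        follower_bonus = 0.10
    elif followers >= 10_000:
        follower_bonus = 0.05
    else:
        follower_bonus = 0.0

    verified_bonus = 0.05 if verified else 0.0
    # Rounding is an assumption, not part of the documented formula.
    return round(min(1.0, bio_score + follower_bonus + verified_bonus), 2)
```

A verified handle with one T1 match, one T2 match, and 25,000 followers scores 0.35 + 0.15 + 0.05 + 0.05 = 0.60. The uncapped maximum is 0.85 + 0.15 + 0.05 = 1.05, which is why the final cap at 1.0 matters.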

Tier 1 Bio Keywords (Energy)

oil crude energy gas lng opec petroleum refin commodit brent wti ngl shale offshore pipeline

Substring match — e.g. "refin" matches "refining", "refinery".
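
A sketch of how substring counting might work for the Tier 1 list above. It assumes matching is case-insensitive and that each keyword counts at most once per bio; neither detail is stated in this document, and `count_bio_matches` is a hypothetical helper.

```python
T1_BIO_KEYWORDS = [
    "oil", "crude", "energy", "gas", "lng", "opec", "petroleum", "refin",
    "commodit", "brent", "wti", "ngl", "shale", "offshore", "pipeline",
]


def count_bio_matches(bio: str, keywords: list[str]) -> int:
    """Count keywords that occur anywhere in the bio (case-insensitive)."""
    text = bio.lower()
    # Assumption: a keyword is counted once, however often it appears.
    return sum(1 for kw in keywords if kw in text)
```

With this approach "Refinery analyst covering Brent and WTI" yields three matches ("refin", "brent", "wti"). Substring matching also produces false positives: "Boiler engineer" matches "oil" inside "boiler".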

Tier 2 Bio Keywords (Finance / Macro)

macro market trade invest equity fund asset analyst research economics finance portfolio hedge commodities mining metals copper lithium

Tier 3 Bio Keywords (Geopolitical)

geopolit sanctions iran russia saudi opec climate renewable transition carbon esg risk inflation central bank fed

Score Interpretation

Band	Score	Percent
High relevance	≥ 0.60	≥ 60%
Medium	0.30 – 0.59	30–59%
Low	< 0.30	< 30%
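
The bands can be expressed as a simple threshold lookup. This sketch assumes any score below 0.60 that is at least 0.30 falls in the Medium band, including values between 0.59 and 0.60; `relevance_band` is a hypothetical name.

```python
def relevance_band(score: float) -> str:
    """Map a 0-1 relevance score to its interpretation band."""
    if score >= 0.60:
        return "High"
    if score >= 0.30:
        return "Medium"
    return "Low"
```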

Known limitations: Bio scoring uses substring matching against a manually curated keyword list. Handles with sparse or non-English bios may score lower than their actual relevance warrants. Profile data is refreshed on demand (👤 button) or by a weekly scheduled task, so scores are not live.

🧮

MSCI Style Factor Definitions EFMGEMTR · June 2022

Definitions for all 17 style factors tracked in the EGC risk platform, as defined in the MSCI Global Equity Factor Trading Model (EFMGEMTR) Empirical Notes, Section 3.4. Factors are standardised to a cap-weighted mean of zero and an equal-weighted standard deviation of one.
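
The standardisation described above (cap-weighted mean of zero, equal-weighted standard deviation of one) can be illustrated as follows. This is a sketch under assumptions: the standard deviation is taken about the equal-weighted mean using the population formula, and no outlier treatment is applied. The function name is hypothetical and the model's exact procedure is not reproduced here.

```python
from math import sqrt


def standardise(raw: list[float], caps: list[float]) -> list[float]:
    """Centre on the cap-weighted mean, scale by the equal-weighted std."""
    cap_mean = sum(x * c for x, c in zip(raw, caps)) / sum(caps)
    eq_mean = sum(raw) / len(raw)
    # Assumption: population standard deviation about the equal-weighted mean.
    eq_std = sqrt(sum((x - eq_mean) ** 2 for x in raw) / len(raw))
    if eq_std == 0:
        raise ValueError("all raw exposures are identical; cannot standardise")
    return [(x - cap_mean) / eq_std for x in raw]
```

Because the centring uses cap weights, a broad cap-weighted portfolio has roughly zero exposure to each style factor, while the equal-weighted scaling keeps small-cap names from being compressed.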

The factors are grouped below by their Level 1 type as defined in Appendix E of the model handbook. Each exposure is the portfolio's net weighted-average exposure to that factor: positive values indicate a long tilt, negative values a short tilt.
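
As an illustration of a net weighted-average exposure, the sketch below sums position weight times stock-level exposure. It assumes weights are signed fractions of portfolio value (negative for shorts); the weighting convention the platform actually uses is not stated here, and `net_exposure` is a hypothetical name.

```python
def net_exposure(weights: list[float], exposures: list[float]) -> float:
    """Portfolio exposure to one factor: sum of weight times stock exposure."""
    if len(weights) != len(exposures):
        raise ValueError("weights and exposures must have the same length")
    return sum(w * x for w, x in zip(weights, exposures))
```

For weights of 0.5, 0.5, and -0.5 on stocks with exposures 1.0, -0.5, and 0.25, the net exposure is 0.5 - 0.25 - 0.125 = 0.125, a modest long tilt.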

Volatility

Factor	Definition (EFMGEMTR §3.4)	Key Descriptors
Beta	Captures market risk that cannot be explained by the Market factor. Computed by a time-series regression of excess stock returns against the cap-weighted estimation universe. Typically the strongest style factor by volatility.	HBETA (Historical Beta)
Residual Volatility	Captures volatility in stock returns not explained by the Beta factor. Orthogonalised to the Beta, Liquidity, and Size factors.	HSIGMA · DSTD · CMRA
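
A historical beta from a time-series regression can be sketched as the ordinary least squares slope of stock excess returns on market excess returns. This is a plain unweighted regression for illustration; the estimation window, any exponential weighting, and other details of the HBETA descriptor are not specified in this document.

```python
def historical_beta(stock_excess: list[float], market_excess: list[float]) -> float:
    """OLS slope of stock excess returns on market excess returns."""
    n = len(stock_excess)
    if n != len(market_excess) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_s = sum(stock_excess) / n
    mean_m = sum(market_excess) / n
    cov = sum((s - mean_s) * (m - mean_m)
              for s, m in zip(stock_excess, market_excess))
    var = sum((m - mean_m) ** 2 for m in market_excess)
    if var == 0:
        raise ValueError("market returns have zero variance")
    return cov / var
```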

Size

Factor	Definition (EFMGEMTR §3.4)	Key Descriptors
Size	A strong source of equity return covariance. Captures return differences between large-cap and small-cap stocks. Measured by the log of market capitalisation.	LNCAP (Log Mkt Cap)

Value

Factor	Definition (EFMGEMTR §3.4)	Key Descriptors
Value	Captures the extent to which a company is mispriced using Book-to-Price and Sales-to-Price ratios. Book-to-Price is the most important descriptor.	BTOP · STOP
Earnings Yield	Captures return differences due to various forms of earnings-to-price ratios, including analyst-predicted E/P, historical E/P, cash E/P, and enterprise multiple (EBIT/EV).	ETOPF · ETOP · CETOP · EBITTOEV
Dividend Yield	Captures return differences due to companies' historical and analyst-predicted dividend-to-price ratios.	DTOP · DPIBS
LT Reversal	Captures return differences due to the stocks' long-term (four years lagged by 13 months) relative performance. Orthogonalised to the Momentum factor.	LTRSTR · LTHALPHA

Momentum

Factor	Definition (EFMGEMTR §3.4)	Key Descriptors
Momentum	Captures return differences due to stocks' recent 12-month performance, excluding the most recent 11 days to avoid short-term reversal contamination. Second strongest style factor by volatility after Beta.	RSTR · HALPHA
ST Reversal	Captures how stocks under- or over-performed the market in the recent past, as this effect is expected to reverse in the near future.	STREV

Quality

Factor	Definition (EFMGEMTR §3.4)	Key Descriptors
Profitability	Combines profitability measures characterising efficiency of a firm's operations and total activities: asset turnover, gross profitability, gross profit margin, return on assets, and return on equity.	ROA · GP · GPM · ATO
Leverage	Captures return differences between high- and low-leverage stocks. Descriptors include market leverage, book leverage, and liabilities-to-assets ratio.	MLEV · BLEV · DTOA
Earnings Quality	Captures return differences due to companies' cash-earnings-to-earnings ratio and accrual components of earnings.	ABS · ACF · CETOE

Liquidity

Factor	Definition (EFMGEMTR §3.4)	Key Descriptors
Liquidity	Captures return differences due to relative trading activity, measured by the fraction of total shares outstanding traded over monthly, quarterly, and annual trailing windows.	STOM · STOQ · STOA · ATVR

Growth

Factor	Definition (EFMGEMTR §3.4)	Key Descriptors
Growth	Captures return differences due to analyst-predicted earnings growth and historical sales and earnings growth. Analyst-predicted earnings growth (mid-term) is the most important descriptor.	EGRMF · EGRLF · EGRO · SGRO

Sentiment

Factor	Definition (EFMGEMTR §3.4)	Key Descriptors
Short Interest	Captures the extent to which a stock is sold short. Three descriptors: short utilisation rate (shares short ÷ shares available to borrow), borrow rate charged by prime brokers, and days-to-cover (shares on loan ÷ average daily volume).	SHORTUTIL · BORROWRATE · DAYSCOVER
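
The two ratio descriptors in the Short Interest definition reduce to simple divisions. The sketch below follows the formulas quoted in the table; the function names are hypothetical, and rejecting non-positive denominators is an assumption about how missing data might be handled.

```python
def short_utilisation(shares_short: float, shares_available: float) -> float:
    """Shares short divided by shares available to borrow."""
    if shares_available <= 0:
        raise ValueError("shares_available must be positive")
    return shares_short / shares_available


def days_to_cover(shares_on_loan: float, avg_daily_volume: float) -> float:
    """Shares on loan divided by average daily volume."""
    if avg_daily_volume <= 0:
        raise ValueError("avg_daily_volume must be positive")
    return shares_on_loan / avg_daily_volume
```

With 2,000,000 shares short against 8,000,000 available to borrow, utilisation is 0.25; with 2,000,000 shares on loan and an average daily volume of 500,000, days-to-cover is 4.0.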

Machine Learning

Factor	Definition (EFMGEMTR §3.4)	Key Descriptors
ML Factor	Captures non-linearities and interactions in the relationship between factor exposures and stock returns using machine learning to identify those relationships. Uses neural networks with 24-, 48-, and 72-month lookback windows.	ML_GEMTR_NN_24M · _48M · _72M

Crowding

Factor	Definition (EFMGEMTR §3.4)	Key Descriptors
Stock Crowding	Measures crowdedness of a stock based on deviations of its current factor exposures from their historical medians. Six descriptors drawn from Value, Earnings Yield, Short Interest, Liquidity, Momentum, and Residual Volatility factors — the first three are most important.	CROWD_VALUE · CROWD_EARNYILD · CROWD_SHORTINT · CROWD_LIQUIDITY · CROWD_MOMENTUM · CROWD_RESVOL
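
The core idea behind each crowding descriptor, a deviation of the current exposure from its historical median, can be sketched as below. The lookback length, any scaling of the deviation, and how the six descriptors are combined are not described in this document, so this shows only the raw difference; `crowding_deviation` is a hypothetical name.

```python
from statistics import median


def crowding_deviation(current_exposure: float,
                       exposure_history: list[float]) -> float:
    """Current factor exposure minus the median of its own history."""
    if not exposure_history:
        raise ValueError("exposure_history must not be empty")
    return current_exposure - median(exposure_history)
```

A stock whose exposure history has a median of 0.5 and whose current exposure is 1.5 has a deviation of 1.0 on that factor.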

Source: MSCI Global Equity Factor Trading Model (EFMGEMTR) Empirical Notes, George Bonne et al., June 2022, Section 3.4 — Style Factors. Definitions reproduced for internal reference only.

➕

More methodology docs coming

Planned: semantic search scoring, portfolio beta methodology, risk limit calibration, synthesis prompt design.