Methodology Report
StockFlow US Short-Term Forecast Evaluation
Published on February 21, 2026. This benchmark compares three signal systems on the same universe, date split, thresholds, and turnover cost assumptions.
One-line conclusion
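The compatibility hybrid (stockflow_hybrid_v2) is better calibrated (Brier 0.2552 vs 0.2591) and more defensive (max drawdown -0.2669 vs -0.3097), while baseline StockFlow (stockflow_v1) keeps the highest directional accuracy (0.5159 vs 0.5115).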
Calibration
Brier -0.0039
Hybrid vs StockFlow baseline on test split; a lower Brier score is better.
Trading Discipline
Action Rate -13.21 pp
Fewer forced trades in noisy zones.
Downside
Max Drawdown +4.28 pp
Hybrid drawdown is shallower than baseline.
Risk-Adjusted
Sharpe +0.091
Hybrid improves risk-adjusted profile on test split.
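The four headline figures above are plain differences (hybrid minus baseline) of the values in the Test Split Metrics table. A minimal sketch of that arithmetic follows; the dictionary keys and the function name `headline_deltas` are illustrative and not taken from the evaluation script.

```python
# Test-split values copied from the Test Split Metrics table.
# Key names are hypothetical labels chosen for this sketch.
BASELINE = {"brier": 0.2591, "action_rate": 0.6601,
            "sharpe": -0.0549, "max_drawdown": -0.3097}
HYBRID = {"brier": 0.2552, "action_rate": 0.5280,
          "sharpe": 0.0362, "max_drawdown": -0.2669}


def headline_deltas(hybrid, baseline):
    """Return hybrid minus baseline for every metric, rounded to 4 decimals."""
    return {name: round(hybrid[name] - baseline[name], 4) for name in baseline}


deltas = headline_deltas(HYBRID, BASELINE)
for name, value in deltas.items():
    print(f"{name}: {value:+.4f}")
```

Action rate and max drawdown are stored as fractions, so their differences of -0.1321 and +0.0428 correspond to the -13.21 and +4.28 percentage-point figures on the cards.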
Evaluation Setup
- Universe: 16 tickers (AAPL, MSFT, NVDA, AMZN, GOOGL, META, TSLA, AVGO, AMD, NFLX, JPM, XOM, LLY, SPY, QQQ, IWM)
- Data range: 2021-01-01 to 2026-02-21 (Yahoo Finance adjusted daily bars)
- Split: train before 2025-01-01, test from 2025-01-01
- Decision rule: BUY if p_up >= 0.56, SELL if p_up <= 0.44, else HOLD
- Cost model: 5 bps per position change
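The decision rule and cost model can be sketched as follows. The thresholds and the 5 bps charge come from the setup above; everything else is an assumption made for illustration: that BUY maps to a long position, SELL to a short one, and HOLD to flat, that the book starts flat, and that the cost is charged once per change regardless of size. The names `decide`, `POSITION`, and `net_returns` are hypothetical.

```python
BUY_THRESHOLD = 0.56      # from the report's decision rule
SELL_THRESHOLD = 0.44     # from the report's decision rule
COST_PER_CHANGE = 0.0005  # 5 bps per position change

# Assumption: SELL is treated as a short position and HOLD as flat.
POSITION = {"BUY": 1, "SELL": -1, "HOLD": 0}


def decide(p_up):
    """Map a probability of an up move to BUY, SELL, or HOLD."""
    if p_up >= BUY_THRESHOLD:
        return "BUY"
    if p_up <= SELL_THRESHOLD:
        return "SELL"
    return "HOLD"


def net_returns(p_ups, next_returns):
    """Per-period strategy returns after costs, starting from a flat book."""
    previous = 0
    result = []
    for p_up, asset_return in zip(p_ups, next_returns):
        position = POSITION[decide(p_up)]
        # Charge the cost only when the position differs from the prior one.
        cost = COST_PER_CHANGE if position != previous else 0.0
        result.append(position * asset_return - cost)
        previous = position
    return result
```

Under these assumptions, a signal that stays on the same side of a threshold for several days pays the cost only once, on entry.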
Test Split Metrics
| Model | Accuracy | Brier | Buy Precision | Action Rate | Total Return | Sharpe | Max Drawdown |
|---|---|---|---|---|---|---|---|
| kimi_rule | 0.5104 | 0.2571 | 0.5258 | 0.6979 | -0.0202 | 0.0169 | -0.2804 |
| stockflow_v1 | 0.5159 | 0.2591 | 0.5321 | 0.6601 | -0.0397 | -0.0549 | -0.3097 |
| stockflow_hybrid_v2 | 0.5115 | 0.2552 | 0.5384 | 0.5280 | -0.0192 | 0.0362 | -0.2669 |
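For readers checking the table against the CSV files, the sketch below shows one plausible way to compute four of the columns. The Brier score follows its standard definition; the action rate, buy precision, and drawdown definitions are assumptions about how the evaluation script works, and all function names are hypothetical.

```python
BUY_THRESHOLD, SELL_THRESHOLD = 0.56, 0.44  # thresholds from the decision rule


def brier_score(p_ups, went_up):
    """Mean squared error between forecast probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(p_ups, went_up)) / len(p_ups)


def action_rate(p_ups):
    """Share of forecasts that trigger BUY or SELL (assumed definition)."""
    acted = sum(1 for p in p_ups if p >= BUY_THRESHOLD or p <= SELL_THRESHOLD)
    return acted / len(p_ups)


def buy_precision(p_ups, went_up):
    """Share of BUY signals followed by an up move (assumed definition)."""
    buys = [y for p, y in zip(p_ups, went_up) if p >= BUY_THRESHOLD]
    return sum(buys) / len(buys) if buys else float("nan")


def max_drawdown(returns):
    """Largest peak-to-trough loss of the compounded equity curve (<= 0).

    Assumption: returns compound; an additive curve would differ slightly.
    """
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1.0)
    return worst
```

As a reference point, a constant forecast of 0.5 scores exactly 0.25 under this Brier definition, whatever the outcomes are.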
Compatibility Upgrade Plan
1. Keep StockFlow factor backbone as the stable core and layer Kimi short-horizon extremes as tactical signals.
2. Add calibration (Platt/Isotonic) and regime-specific thresholds to reduce probability distortion.
3. Integrate point-in-time news/options snapshots before scoring those factors in live production.
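To illustrate the isotonic option in the calibration step, here is a minimal pure-Python sketch of the pool-adjacent-violators algorithm. It is not the report's implementation: the names `fit_isotonic` and `calibrate` are hypothetical, the result is a simple step function, and tied raw scores are not pooled, which a production version would need to handle. Fitting on the training split only is assumed, so that test-split metrics stay out of sample.

```python
def fit_isotonic(raw_scores, went_up):
    """Pool-adjacent-violators fit of a non-decreasing step function.

    Returns a list of (upper_score, calibrated_probability) blocks sorted
    by score. Assumes distinct raw scores; ties are not pooled here.
    """
    blocks = []  # each block holds [sum_of_outcomes, count, highest_score]
    for score, outcome in sorted(zip(raw_scores, went_up)):
        blocks.append([float(outcome), 1, score])
        # Merge backwards while the previous block's mean exceeds this one's.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count, top = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = top
    return [(top, total / count) for total, count, top in blocks]


def calibrate(p_raw, fitted):
    """Look up the calibrated probability for one raw score."""
    for top, probability in fitted:
        if p_raw <= top:
            return probability
    return fitted[-1][1]  # scores above the fitted range use the last block


# Toy data for illustration only, not taken from the benchmark.
fitted = fit_isotonic([0.1, 0.2, 0.3, 0.4], [0, 1, 0, 1])
print(calibrate(0.25, fitted))
```

In this toy example the out-of-order pair at 0.2 and 0.3 is pooled into a single block with probability 0.5, which is how the method removes non-monotonic distortions in the raw scores.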
Download Data
The CSV files below are generated by the reproducible evaluation script and can be used for independent checks.
Research and education only. Not investment advice.