
Getting Started

WormsCanned edited this page Oct 26, 2025 · 1 revision


This guide will help you set up and run the GEX LLM Patterns validation framework.


Prerequisites

System Requirements

  • Python: 3.9 or higher
  • OS: Linux, macOS, or Windows (WSL recommended for Windows)
  • Memory: 4GB RAM minimum
  • Storage: 2GB for code + cache

Required Accounts

  1. OpenAI API Key: For LLM calls to OpenAI models (GPT-4 family; the default model is gpt-4o-mini)

  2. Options Data Source (Optional):

    • Currently uses yfinance (free, limited historical data)
    • For production: Consider HistoricalOptionData.com, OptionMetrics, etc.

Installation

Step 1: Clone Repository

git clone https://github.com/iAmGiG/gex-llm-patterns.git
cd gex-llm-patterns

Step 2: Install Dependencies

# Using pip
pip install -r requirements.txt

# Or using conda
conda create -n gex-llm python=3.9
conda activate gex-llm
pip install -r requirements.txt

Key Dependencies:

  • openai - LLM API client
  • pandas - Data manipulation
  • numpy - Numerical computing
  • yfinance - Options data fetching (free tier)
  • pyyaml - Validation report generation

Step 3: Set Up Environment Variables

# Set Python path (required for imports)
export PYTHONPATH=$(pwd):$PYTHONPATH

# Set OpenAI API key
export OPENAI_API_KEY="sk-your-key-here"

# Optional: Configure LLM model
export LLM_MODEL="gpt-4o-mini"  # Default: gpt-4o-mini (cheap)
# export LLM_MODEL="gpt-4"      # More accurate but expensive

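Tip: Add these to your ~/.bashrc or ~/.zshrc for persistence. In a startup file, replace $(pwd) with the absolute path to your gex-llm-patterns clone; otherwise PYTHONPATH will point at whatever directory the shell starts in.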

Step 4: Verify Installation

# Check imports work
python -c "from src.agents.market_mechanics_agent import MarketMechanicsAgent; print('✅ Imports OK')"

# Check API key configured
python -c "import os; print('✅ API key set' if os.getenv('OPENAI_API_KEY') else '❌ No API key')"
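The two one-liners above can be folded into a single preflight check. The sketch below is a hypothetical helper (it does not ship with the repository): it checks the Python version against the 3.9 minimum, confirms that the key dependencies from Step 2 are importable (pyyaml imports as `yaml`), and verifies that OPENAI_API_KEY is set. It only detects problems; it does not install anything.

```python
"""Hypothetical preflight check for gex-llm-patterns (not part of the repo)."""
import importlib.util
import os
import sys

# Import names of the key dependencies listed in Step 2 (pyyaml imports as "yaml").
REQUIRED_MODULES = ("openai", "pandas", "numpy", "yfinance", "yaml")


def preflight(env=None, version=None, find_spec=importlib.util.find_spec):
    """Return a list of problems; an empty list means the environment looks ready."""
    env = os.environ if env is None else env
    version = sys.version_info[:2] if version is None else version
    problems = []
    if tuple(version) < (3, 9):
        problems.append(f"Python 3.9+ required, found {version[0]}.{version[1]}")
    for name in REQUIRED_MODULES:
        # find_spec returns None when a top-level module cannot be imported.
        if find_spec(name) is None:
            problems.append(f"missing dependency: {name}")
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    return problems
```

Saved as, say, preflight.py in the repo root, `python -c "from preflight import preflight; print(preflight() or 'OK')"` prints either the list of problems or OK.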

Quick Start: Run a Single Pattern Validation

Option 1: Validate Gamma Positioning (Q1 2024)

python scripts/validation/validate_pattern_taxonomy.py \
  --pattern gamma_positioning \
  --symbol SPY \
  --start-date 2024-01-02 \
  --end-date 2024-03-29 \
  --confidence 60.0

Expected Output:

  • Progress bar: Processing dates: 100%|████████████| 53/53
  • Validation report: reports/validation/pattern_taxonomy/gamma_positioning_SPY_2024Q1.yaml
  • Summary: Detection rate, predictive accuracy, net alpha

Time: ~5-10 minutes for 53 days (with GPT-4o-mini)
Cost: ~$1-2 in API calls

Option 2: Validate All Patterns (Batch)

python scripts/validation/validate_all_patterns.py \
  --patterns gamma_positioning stock_pinning 0dte_hedging \
  --start-date 2024-01-02 \
  --end-date 2024-03-29 \
  --skip-completed

Expected Output:

  • 3 YAML reports (one per pattern)
  • Summary table comparing detection rates

Time: ~15-30 minutes
Cost: ~$3-6 in API calls
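The report name in Option 1 appears to follow a <pattern>_<symbol>_<year>Q<quarter>.yaml convention. Assuming that convention holds (it is inferred from the single example path above, not from the scripts' source), the hypothetical helpers below rebuild the expected path and mimic what a flag like --skip-completed plausibly does: skip any pattern whose report already exists. They illustrate the idea; they are not the script's actual implementation.

```python
from datetime import date
from pathlib import Path

# Assumed report directory, taken from the example path in Option 1.
REPORT_DIR = Path("reports/validation/pattern_taxonomy")


def report_path(pattern: str, symbol: str, start: date, root: Path = REPORT_DIR) -> Path:
    """Build e.g. gamma_positioning_SPY_2024Q1.yaml from the run's start date."""
    quarter = (start.month - 1) // 3 + 1
    return root / f"{pattern}_{symbol}_{start.year}Q{quarter}.yaml"


def pending_patterns(patterns, symbol: str, start: date, root: Path = REPORT_DIR):
    """Return only the patterns whose report file does not exist yet."""
    return [p for p in patterns if not report_path(p, symbol, start, root).exists()]
```

For example, `pending_patterns(["gamma_positioning", "stock_pinning", "0dte_hedging"], "SPY", date(2024, 1, 2))` lists the patterns that still need a Q1 2024 run.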


Understanding the Output

Validation Report Structure (YAML)

# reports/validation/pattern_taxonomy/gamma_positioning_SPY_2024Q1.yaml

pattern_name: gamma_positioning
symbol: SPY
date_range: 2024-01-02 to 2024-03-29
total_days: 53

# Aggregate Results
detection_rate_pct: 100.0        # LLM detected constraint on 100% of days
predictive_accuracy_pct: 96.2    # 96.2% of predictions materialized
avg_return_pct: 0.26             # Average daily return
net_alpha_pct: 0.21              # Return above risk-free rate
sample_size: 53                  # Number of test days

# Per-Day Results
results:
  - test_date: 2024-01-02
    obfuscated_date: "Day T+0"
    detected: true                # LLM detected constraint
    confidence: 85.0              # LLM confidence (0-100)
    predicted_direction: "UP"     # LLM prediction
    forward_return_t1: 0.45       # Actual T+1 return (%)
    prediction_correct: true      # Did prediction materialize?
    net_gex_usd: -8950000000.0   # -$8.95B (negative gamma)
    spot_price: 474.60

  - test_date: 2024-01-03
    # ... (52 more days)
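To sanity-check a report, the aggregate fields can be recomputed from the per-day results list. The sketch below is hypothetical and rests on two assumptions: detection rate is the share of all days with detected: true, and predictive accuracy is the share of detected days with prediction_correct: true. With 100% detection, as in the example, both denominators are 53, and 96.2% corresponds to 51 of 53 days. It takes the already-loaded list so it needs only the standard library.

```python
def summarize(results):
    """Recompute detection rate and predictive accuracy from per-day result dicts."""
    total = len(results)
    detected = [r for r in results if r.get("detected")]
    correct = sum(1 for r in detected if r.get("prediction_correct"))
    return {
        "sample_size": total,
        # Share of all test days where the LLM flagged the constraint.
        "detection_rate_pct": round(100 * len(detected) / total, 1) if total else 0.0,
        # Assumption: accuracy is measured over detected days only.
        "predictive_accuracy_pct": round(100 * correct / len(detected), 1) if detected else 0.0,
    }


# Usage with PyYAML (pip install pyyaml):
#   import yaml
#   with open("reports/validation/pattern_taxonomy/gamma_positioning_SPY_2024Q1.yaml") as f:
#       print(summarize(yaml.safe_load(f)["results"]))
```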

Interpreting Results

Detection Rate:

  • 100%: The LLM detected the constraint on every test day
  • 60-80%: Strong detection (the pattern is mechanical)
  • <60%: Weak detection (the pattern may be narrative)

Predictive Accuracy:

  • 96%: LLM predictions materialized 96% of the time
  • High accuracy suggests the LLM understands the causal mechanism
  • Low accuracy means the pattern is detected but doesn't drive price

Net Alpha:

  • +0.21%: The strategy outperformed the risk-free rate by 21 bps/day
  • Note: Q1 2024 was profitable, but alpha declined to near zero in Q3/Q4
  • Detection remains stable despite the alpha decline (key finding!)
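The rules of thumb above can be encoded in a small helper for scripted comparisons. This is a hypothetical sketch: the detection bands above leave 80-99% unassigned, so it assumes any rate of 60% or more counts as strong detection. It also converts net alpha from percent to basis points (1% = 100 bps, so +0.21% is 21 bps).

```python
def interpret(detection_rate_pct: float, net_alpha_pct: float) -> dict:
    """Label a report's headline numbers using the rules of thumb above."""
    if detection_rate_pct >= 60.0:  # assumption: 80-99% is also treated as strong
        detection = "strong (pattern is mechanical)"
    else:
        detection = "weak (pattern may be narrative)"
    return {
        "detection": detection,
        # 1 percentage point = 100 basis points.
        "net_alpha_bps_per_day": round(net_alpha_pct * 100, 2),
    }
```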

Common Use Cases

1. Reproduce Paper #1 Results

Run full 2024 validation (Q1, Q3, Q4) for all 3 patterns:

# Q1 2024 (Jan-Mar)
python scripts/validation/validate_all_patterns.py \
  --patterns gamma_positioning stock_pinning 0dte_hedging \
  --start-date 2024-01-02 \
  --end-date 2024-03-29

# Q3 2024 (Jul-Sep)
python scripts/validation/validate_all_patterns.py \
  --patterns gamma_positioning stock_pinning 0dte_hedging \
  --start-date 2024-07-01 \
  --end-date 2024-09-30

# Q4 2024 (Oct-Dec)
python scripts/validation/validate_all_patterns.py \
  --patterns gamma_positioning stock_pinning 0dte_hedging \
  --start-date 2024-10-01 \
  --end-date 2024-12-31

Total: 9 validation reports (3 patterns × 3 quarters), matching the Paper #1 results
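The three runs differ only in their date range, so they can be scripted. The sketch below rebuilds exactly the three validate_all_patterns.py command lines shown above (same flags, same dates) and runs them one after another with subprocess.run. It is a convenience wrapper, assumed to be run from the repository root with PYTHONPATH set; it is not a script that ships with the project.

```python
import subprocess
import sys

PATTERNS = ["gamma_positioning", "stock_pinning", "0dte_hedging"]

# Date ranges for Paper #1 (Q1, Q3, Q4 2024), copied from the commands above.
QUARTERS = {
    "2024Q1": ("2024-01-02", "2024-03-29"),
    "2024Q3": ("2024-07-01", "2024-09-30"),
    "2024Q4": ("2024-10-01", "2024-12-31"),
}


def build_commands(python=sys.executable):
    """Return one argv list per quarter for validate_all_patterns.py."""
    commands = []
    for start, end in QUARTERS.values():
        commands.append([
            python, "scripts/validation/validate_all_patterns.py",
            "--patterns", *PATTERNS,
            "--start-date", start,
            "--end-date", end,
        ])
    return commands


def run_all(dry_run=True):
    """Print each command; execute it only when dry_run is False (stops on first failure)."""
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Call `run_all(dry_run=False)` to execute. Appending "--skip-completed" to each command would let an interrupted batch resume without redoing finished patterns.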

2. Test New Pattern

Define a new pattern in src/validation/pattern_taxonomy.py:

PATTERNS = {
    # ... existing patterns ...

    "my_new_pattern": {
        "name": "My New Pattern",
        "status": "MECHANICAL",
        "description": "Clear description of constraint",
        "who": "Market participants",
        "whom": "Who is forced?",
        "what": "What are they forced to do?",
        "constraint_mechanism": "Why can't they avoid it?",
        "academic_basis": "Published research citation"
    }
}
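Before launching a paid validation run, it helps to check that a new entry has every field the template uses. A minimal sketch, assuming PATTERNS is a plain dict shaped like the example above; the required-field list is copied from that template, and the check_pattern helper itself is hypothetical.

```python
# Field names copied from the PATTERNS template above.
REQUIRED_FIELDS = (
    "name", "status", "description", "who", "whom",
    "what", "constraint_mechanism", "academic_basis",
)


def check_pattern(patterns: dict, key: str) -> list:
    """Return problems with patterns[key]; an empty list means the entry looks complete."""
    if key not in patterns:
        # The dict key is what you pass to --pattern on the command line.
        return [f"unknown pattern key: {key!r} (must match --pattern)"]
    entry = patterns[key]
    return [
        f"missing or empty field: {field}"
        for field in REQUIRED_FIELDS
        if not str(entry.get(field, "")).strip()
    ]
```

Usage: `from src.validation.pattern_taxonomy import PATTERNS` followed by `check_pattern(PATTERNS, "my_new_pattern")`.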

Run validation:

python scripts/validation/validate_pattern_taxonomy.py \
  --pattern my_new_pattern \
  --symbol SPY \
  --start-date 2024-01-02 \
  --end-date 2024-03-29

3. Test on Different Asset

# Validate gamma positioning on QQQ instead of SPY
python scripts/validation/validate_pattern_taxonomy.py \
  --pattern gamma_positioning \
  --symbol QQQ \
  --start-date 2024-01-02 \
  --end-date 2024-03-29

Note: Requires options data for that ticker (may need a premium data source)

4. Compare Biased vs Unbiased Prompts

# Unbiased (default)
python scripts/validation/validate_pattern_taxonomy.py \
  --pattern gamma_positioning \
  --symbol SPY \
  --start-date 2024-01-02 \
  --end-date 2024-03-29

# Biased (assumes pattern exists)
python scripts/validation/validate_pattern_taxonomy.py \
  --pattern gamma_positioning \
  --symbol SPY \
  --start-date 2024-01-02 \
  --end-date 2024-03-29 \
  --biased

Compare the detection rates: the biased prompt should detect the pattern on 100% of days, while the unbiased prompt gives a more realistic rate.


Configuration

LLM Model Selection

Available Models:

  • gpt-4o-mini: Fast, cheap (~$0.03/day), good accuracy
  • gpt-4: Slower, expensive (~$0.15/day), highest accuracy
  • gpt-4-turbo: Balanced performance

How to Switch:

# Via environment variable
export LLM_MODEL="gpt-4"

# Or edit config/config.json
{
  "llm": {
    "model": "gpt-4",
    "temperature": 0.0,
    "max_tokens": 2000
  }
}
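The guide doesn't say which setting wins when both LLM_MODEL and config/config.json specify a model. The sketch below assumes the environment variable takes precedence, then the config file's llm.model key, then the documented default gpt-4o-mini. The resolve_model helper and that precedence order are assumptions; check the project's config loader before relying on them.

```python
import json
import os
from pathlib import Path

DEFAULT_MODEL = "gpt-4o-mini"  # default named in Step 3


def resolve_model(config_path="config/config.json", env=None) -> str:
    """Pick the LLM model: LLM_MODEL env var, then config file, then default (assumed order)."""
    env = os.environ if env is None else env
    if env.get("LLM_MODEL"):
        return env["LLM_MODEL"]
    path = Path(config_path)
    if path.is_file():
        with path.open() as f:
            # Same nesting as the config/config.json example above: {"llm": {"model": ...}}.
            model = json.load(f).get("llm", {}).get("model")
        if model:
            return model
    return DEFAULT_MODEL
```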

Obfuscation Settings

Default: Obfuscation enabled (recommended for research)

Disable (for debugging only):

python scripts/validation/validate_pattern_taxonomy.py \
  --pattern gamma_positioning \
  --symbol SPY \
  --start-date 2024-01-02 \
  --end-date 2024-03-29 \
  --no-obfuscate

Warning: Disabling obfuscation may allow the LLM to use temporal context, which invalidates the methodology.
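The per-day report shows obfuscated dates of the form "Day T+0". A minimal sketch of that idea, assuming labels are assigned by position in the sorted list of test dates (the project's actual scheme may differ): the LLM only sees the relative label, while the mapping back to real dates stays on the evaluation side for scoring forward returns.

```python
from datetime import date


def obfuscate_dates(test_dates):
    """Map each real date to a relative label like "Day T+0", by sorted position."""
    ordered = sorted(set(test_dates))
    return {d: f"Day T+{i}" for i, d in enumerate(ordered)}


def deobfuscate(labels):
    """Invert the mapping so LLM answers can be joined back to real dates."""
    return {label: d for d, label in labels.items()}
```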

Cache Settings

Default: Options data cached in .cache/

Clear cache (force fresh data fetch):

rm -rf .cache/options_data_cache.db

Rebuild historical GEX database:

python scripts/data/rebuild_historical_gex.py \
  --symbol SPY \
  --start-date 2024-01-01 \
  --end-date 2024-12-31
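Reports record a net_gex_usd value per day, but this guide doesn't give the formula that rebuild_historical_gex.py uses. Purely as an illustration, the sketch below implements one common convention: dollar gamma per 1% move, computed as gamma × open interest × 100 (the standard US equity option contract multiplier) × spot² × 0.01, with calls counted positive and puts negative under the usual assumption that dealers are long calls and short puts. Conventions vary between sources, so its output need not match the project's figures.

```python
from dataclasses import dataclass

CONTRACT_MULTIPLIER = 100  # shares per standard US equity option contract


@dataclass
class OptionRow:
    kind: str           # "call" or "put"
    gamma: float        # per-share gamma from the options chain
    open_interest: int  # number of contracts


def net_gex_usd(rows, spot: float) -> float:
    """Net dealer gamma exposure in USD per 1% move (one common convention, not the repo's)."""
    total = 0.0
    for r in rows:
        dollar_gamma = r.gamma * r.open_interest * CONTRACT_MULTIPLIER * spot ** 2 * 0.01
        if r.kind == "call":
            total += dollar_gamma  # assumed dealer long calls: positive gamma
        elif r.kind == "put":
            total -= dollar_gamma  # assumed dealer short puts: negative gamma
        else:
            raise ValueError(f"unknown option kind: {r.kind!r}")
    return total
```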

Troubleshooting

Error: "No module named 'src'"

Cause: PYTHONPATH not set

Fix:

export PYTHONPATH=$(pwd):$PYTHONPATH

Error: "OpenAI API key not found"

Cause: API key not in environment

Fix:

export OPENAI_API_KEY="sk-your-key-here"

Error: "No options data found for date X"

Cause: yfinance doesn't have data for that date (weekends, holidays, or too old)

Fix:

  • Use business days only (skip weekends)
  • Check if date is a market holiday
  • Consider premium data source for complete history
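To avoid requesting dates that cannot have data, test dates can be pre-filtered. A stdlib-only sketch: it keeps Monday through Friday and drops any dates in a holiday set you supply. The trading_days helper is hypothetical, and the holiday list is left to the caller; a complete exchange calendar would come from a dedicated calendar package, which this sketch does not assume.

```python
from datetime import date, timedelta


def trading_days(start: date, end: date, holidays=frozenset()):
    """Yield weekdays from start to end (inclusive) that are not in `holidays`."""
    day = start
    while day <= end:
        if day.weekday() < 5 and day not in holidays:  # 0-4 = Monday-Friday
            yield day
        day += timedelta(days=1)
```

For example, `list(trading_days(date(2024, 3, 25), date(2024, 3, 29), {date(2024, 3, 29)}))` returns Monday through Thursday only: 2024-03-29, the end date used in the Q1 examples above, was Good Friday, when US markets were closed.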

Slow Performance

Symptoms: Validation takes >1 hour for 50 days

Causes & Fixes:

  1. Using GPT-4 → Switch to gpt-4o-mini (10x faster)
  2. Fresh data fetches → Enable caching (default)
  3. Serial processing → Use batch mode (experimental)

# Faster: Use gpt-4o-mini + ensure caching
export LLM_MODEL="gpt-4o-mini"
python scripts/validation/validate_pattern_taxonomy.py --pattern gamma_positioning ...

High API Costs

Symptoms: Validation costs $10+ for 50 days

Cause: Using expensive model (GPT-4)

Fix:

# Switch to gpt-4o-mini (10x cheaper, similar accuracy)
export LLM_MODEL="gpt-4o-mini"

Cost Comparison (50 days):

  • GPT-4: ~$7.50
  • GPT-4o-mini: ~$0.75
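The comparison above works out to roughly $0.15 per day for GPT-4 and $0.015 per day for gpt-4o-mini. A back-of-the-envelope estimator built on those rates is sketched below; it assumes cost scales linearly with days × patterns. Treat it as an order of magnitude only: the Configuration section quotes about $0.03/day for gpt-4o-mini, which is closer to the Quick Start estimate of ~$1-2 for 53 days.

```python
# Approximate USD cost per validated day per pattern, derived from the
# 50-day comparison above (GPT-4 ~$7.50, gpt-4o-mini ~$0.75). Rough figures only.
COST_PER_DAY_USD = {
    "gpt-4": 7.50 / 50,
    "gpt-4o-mini": 0.75 / 50,
}


def estimate_cost(days: int, model: str = "gpt-4o-mini", patterns: int = 1) -> float:
    """Estimate API spend, assuming cost scales linearly with days x patterns."""
    try:
        rate = COST_PER_DAY_USD[model]
    except KeyError:
        raise ValueError(f"no cost estimate for model {model!r}") from None
    return round(rate * days * patterns, 2)
```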

Next Steps

Learn More

Run Experiments

  • Reproduce Paper #1: Validate all 3 patterns across full 2024
  • Test new patterns: Define and validate your own dealer constraints
  • Compare assets: Run on QQQ, IWM, or individual stocks

Contribute


Support

Issues: https://github.com/iAmGiG/gex-llm-patterns/issues

Documentation: https://github.com/iAmGiG/gex-llm-patterns/tree/development/docs

Contact: See Publications page


Last Updated: October 25, 2025
