Transform unstructured earnings call transcripts into actionable investment insights using Snowflake Cortex AI, ML Model Registry, and intelligent agents - all accelerated by Cortex Code.
Financial analysts spend countless hours manually reviewing earnings call transcripts. This guide demonstrates how to systematically process unstructured data at scale using AI Functions (AI_COMPLETE, AI_SQL) - turning raw transcript text into structured sentiment scores, analyst participation metrics, and investment signals that feed directly into quantitative models.
Full Guide: For detailed architecture, business impact, and use cases, see the Snowflake Developers Guide.
Harry Yu
Senior Data Scientist, Finance | Snowflake
📧 h.yu@snowflake.com | 💻 GitHub
- How to use Cortex Code to build entire ML pipelines through natural language
- How to extract structured insights from unstructured text using
AI_COMPLETE() - How to train and register ML models in Snowflake's Model Registry
- How to create semantic search over unstructured data with Cortex Search
- How to build a Semantic View for natural language SQL queries via Cortex Analyst
- How to build a Cortex Agent that combines multiple AI tools
- How to access your agent through Snowflake Intelligence
- A sentiment analysis pipeline using
AI_COMPLETE()to score earnings call transcripts (1-10 scale) - A LightGBM stock prediction model with walk-forward validation, registered in Snowflake Model Registry
- A Cortex Search service for semantic search over sentiment insights
- A Semantic View enabling natural language queries via Cortex Analyst
- A Cortex Agent that orchestrates ML predictions, structured queries, semantic search, and email notifications—all accessible via Snowflake Intelligence
- Snowflake account (sign up for a free trial) with
ACCOUNTADMINaccess (see note below)
Note on Privileges: This guide uses
ACCOUNTADMINfor simplicity in demo and learning environments. For production deployments, follow the principle of least privilege by creating a dedicated role with only the specific grants required.
- In Snowsight, navigate to Projects > Workspaces
- Create a new SQL file and copy the contents from
scripts/setup.sql - Run the entire script
This creates the complete demo environment including:
- Auto-installs Snowflake Public Data (Free) from Marketplace
- Database, warehouse, and role setup
- Pre-computed ML features (
FSI_DATAtable) - Tables, stored procedures, and ML model infrastructure
- Deploys reference notebooks from this repository
| Path | Description |
|---|---|
| Path A: Cortex Code | Build everything through natural language prompts |
| Path B: Notebooks | Run pre-built notebooks |
Build the entire quantitative research pipeline through conversation with Cortex Code.
Before starting, run the setup script to create the required database objects:
- Open a SQL worksheet in Snowsight
- Copy and run the contents of scripts/setup.sql
This creates the database, schema, role, warehouse, and base tables needed for the lab.
- Navigate to Projects → Notebooks in Snowsight
- Click + Notebook (top-right)
- Configure the notebook:
- Notebook location:
FSI_DEMO_DB→ANALYTICS - Notebook warehouse:
FSI_DEMO_WH
- Notebook location:
- Click Create
- Delete the auto-populated sample cells (select cell → delete)
- Add a Python cell with this starter code and run it:
import pandas as pd
import numpy as np
from snowflake.snowpark.context import get_active_session
session = get_active_session()
session.use_role("FSI_DEMO_ROLE")
session.use_warehouse("FSI_DEMO_WH")
session.use_database("FSI_DEMO_DB")
session.use_schema("ANALYTICS")-
Click Packages (top menu) and add:
lightgbmscikit-learnsnowflake-ml-pythonmatplotlibseabornstatsmodels
-
Click Start to activate the notebook
Click the Cortex Code icon (bottom-right corner of the notebook).
Tip: Refresh the page before your first prompt - this helps Cortex Code recognize the notebook context.
Use the following prompts one at a time. Run the generated code after each prompt before moving to the next.
Tip: You have multiple options to run generated code:
- Click the + button to add it as a new cell in your notebook
- Use the Run option when Cortex Code offers it
- Click the play button to run it within the chat interface
Using FSI_DEMO_DB.ANALYTICS.UNIQUE_TRANSCRIPTS table, extract analyst sentiment from earnings call transcripts.
Use AI_COMPLETE with claude-4-sonnet to analyze each transcript. Focus ONLY on analyst questions and tone (ignore management remarks). Score sentiment on 1-10 scale where 1=extremely negative, 5=neutral, 10=extremely positive.
Return JSON with: score (1-10), reason (brief explanation), analyst_count (number of unique analysts).
Insert results into AI_TRANSCRIPTS_ANALYSTS_SENTIMENTS table with columns: PRIMARY_TICKER, EVENT_TIMESTAMP, EVENT_TYPE, CREATED_AT, SENTIMENT_SCORE, UNIQUE_ANALYST_COUNT, SENTIMENT_REASON.
Filter out events with analyst_count <= 1.
Using FSI_DEMO_DB.ANALYTICS.FSI_DATA table which has columns: ticker, date, price, r_1, r_5_1, r_10_5, r_21_10, r_63_21, and y (target).
Train a quarterly walk-forward LightGBM regression model:
- Features: r_1, r_5_1, r_10_5, r_21_10, r_63_21
- Target: y (5-day forward return)
- For each test quarter Q: train on quarters < Q-2, validate on Q-2 and Q-1, test on Q
- Use L2 metric with early stopping (200 rounds patience)
- Hyperparameter grid: learning_rate [0.03, 0.05, 0.10], num_leaves [31, 63]
Register each quarter's best model to Snowflake Model Registry as FIS_{quarter} (e.g., FIS_2024Q3, FIS_2025Q1) with:
- version_name="v1"
- sample_input_data from training data (100 rows)
- options={"relax_version": False, "target_methods": ["predict"], "method_options": {"predict": {"case_sensitive": True}}}
Do NOT pass metrics to log_model. Do NOT use target_methods as a separate parameter.
Create a Cortex Search Service named DOW_ANALYSTS_SENTIMENT_ANALYSIS in FSI_DEMO_DB.ANALYTICS schema.
Source table: AI_TRANSCRIPTS_ANALYSTS_SENTIMENTS
Search column: SENTIMENT_REASON
Columns to return: PRIMARY_TICKER, EVENT_TIMESTAMP, SENTIMENT_SCORE, UNIQUE_ANALYST_COUNT, SENTIMENT_REASON
Warehouse: FSI_DEMO_WH
Target lag: 1 day
Create a Semantic View named ANALYST_SENTIMENTS_VIEW in FSI_DEMO_DB.ANALYTICS schema for natural language queries on analyst sentiment data.
Source table: AI_TRANSCRIPTS_ANALYSTS_SENTIMENTS
Dimensions:
- PRIMARY_TICKER: Company stock ticker symbol
- EVENT_TIMESTAMP: Date and time of the earnings call
- EVENT_TYPE: Type of event (Earnings Call)
Measures:
- SENTIMENT_SCORE: Analyst sentiment rating from 1-10
- UNIQUE_ANALYST_COUNT: Number of unique analysts participating
Include SENTIMENT_REASON as descriptive text field for qualitative insights.
Create a Cortex Agent named QUANTITATIVE_RESEARCH_AGENT in FSI_DEMO_DB.ANALYTICS schema.
Display name: "Quantitative Research Agent"
Instructions/System prompt:
"You are a quantitative research assistant specializing in Dow Jones 30 stock analysis. You help users with:
1. ML-based stock predictions - Use GET_TOP_BOTTOM_STOCK_PREDICTIONS to get top/bottom ranked stocks by predicted 5-day returns
2. Analyst sentiment queries - Query structured sentiment data (scores, analyst counts) via the semantic view
3. Sentiment insights - Search earnings call transcripts for qualitative analyst commentary
4. Email alerts - Send portfolio recommendations or research summaries via email
When asked about stock picks or predictions, always use the ML prediction tool first. When asked about analyst opinions or sentiment, combine both the semantic view (for scores) and search service (for reasoning). Be concise and data-driven in responses."
Sample questions:
- "What are the top 5 stocks to buy this week based on ML predictions?"
- "Which stocks have the most positive analyst sentiment?"
- "What did analysts say about Apple's last earnings call?"
- "Show me stocks with bullish sentiment but negative ML predictions"
- "Email me a summary of this week's top stock picks"
Tools to include (use SQL CREATE AGENT syntax with tool_resources array):
1. Semantic view tool: type='semantic_view', identifier='FSI_DEMO_DB.ANALYTICS.ANALYST_SENTIMENTS_VIEW'
2. Cortex search tool: type='cortex_search', identifier='FSI_DEMO_DB.ANALYTICS.DOW_ANALYSTS_SENTIMENT_ANALYSIS'
3. Procedure tool (ML predictions): type='procedure', identifier='FSI_DEMO_DB.ANALYTICS.GET_TOP_BOTTOM_STOCK_PREDICTIONS', execution_environment='sandbox'
- Parameters: MODEL_NAME (STRING, optional - auto-detects latest model if NULL), TOP_N (INTEGER, default 5 - returns both top N and bottom N stocks)
- Returns ranked stocks by predicted 5-day forward returns
4. Procedure tool (email): type='procedure', identifier='FSI_DEMO_DB.ANALYTICS.SEND_EMAIL', execution_environment='sandbox'
- Parameters: RECIPIENT_EMAIL (VARCHAR, optional - uses current user's email if NULL), SUBJECT (VARCHAR), BODY (VARCHAR)
IMPORTANT: For procedure tools, you must include execution_environment='sandbox' in the tool_resources.
Model: claude-3-5-sonnet
Warehouse: FSI_DEMO_WH
Register the agent FSI_DEMO_DB.ANALYTICS.QUANTITATIVE_RESEARCH_AGENT with Snowflake Intelligence object SNOWFLAKE_INTELLIGENCE_OBJECT_DEFAULT so users can interact with it conversationally.
Navigate to AI & ML → Snowflake Intelligence and select Quantitative Research Agent. Try these:
ML Predictions:
Give me top 3 vs bottom 3 trade predictions for the next period
Sentiment Queries:
Which companies have the highest sentiment score?
Semantic Search:
Search for companies with concerns about margins
Combined Analysis:
Compare the top predicted stocks with their analyst sentiment scores
Email Reports:
Note: Email functionality requires your Snowflake user to have a verified email address. Verify your email in Snowsight: User menu → Setting → Profile → Verify Email.
Send me an email summary of today's top stock picks
These prompts are for deeper exploration - not required for the agent.
Optional Prompt A: Feature Engineering (from scratch)
Note: This is already done in
setup.sqlwhich creates theFSI_DATAtable with pre-computed features. Use this prompt if you want to understand or recreate the feature engineering process.
Using SNOWFLAKE_PUBLIC_DATA_FREE.PUBLIC_DATA_FREE.STOCK_PRICE_TIMESERIES for Dow Jones 30 stocks (MMM, AXP, AMGN, AMZN, AAPL, BA, CAT, CVX, CSCO, KO, DIS, GS, HD, HON, IBM, JNJ, JPM, MCD, MRK, MSFT, NKE, PG, RTX, CRM, SHW, TRV, UNH, V, WMT, NVDA).
Construct momentum features using log returns:
- r_1: today's return
- r_5_1: return from t-4 to t-1
- r_10_5: return from t-9 to t-5
- r_21_10: return from t-20 to t-11
- r_63_21: return from t-62 to t-21
Construct target variable y: future return from t+2 to t+6.
Keep as panel data with ticker as a column.
Optional Prompt B: Backtesting Strategy
Test if the ML strategy works starting 2021.
Portfolio construction:
- Generate forecasts on Tuesdays
- At Wednesday close, go long top-5 and short bottom-5 by predicted return (equal weight)
- Hold through Thursday to next Wednesday (the t+2..t+6 window)
- Transaction cost: 3.0 bps one-way via weekly turnover
Show metrics:
- Information Ratio (before/after costs)
- Max drawdown
- Calmar ratio
Plot equity curves for before and after costs.
Optional Prompt C: Sentiment-Return Regression Analysis
Analyze the relationship between analyst sentiment and stock returns.
Merge sentiment data with price data using merge_asof (forward direction).
Create:
- 1D return: reaction during earnings call
- 3D return: return_lead_1 + return_lead_2 + return_lead_3 (post-earnings drift)
Run OLS regression: return ~ sentiment_score
Winsorize returns at 1st/99th percentiles.
Create scatter plots showing:
- Sentiment Score vs 1D Return with OLS fit line, β, t-stat
- Sentiment Score vs 3D Return with OLS fit line, β, t-stat
Repeat analysis using sentiment_change (vs previous earnings call).
If you prefer running pre-built code instead of Cortex Code prompts:
- Navigate to Projects > Notebooks
- Switch role to
FSI_DEMO_ROLE - Open
START_HEREnotebook - Run all cells to extract analyst sentiment using AI Functions
Equivalent to Prompt 1
- Open
TRAIN_ML_MODELSnotebook - Run all cells to train and register ML models
Equivalent to Prompt 2
- Open
CREATE_CORTEX_COMPONENTSnotebook - Run all cells to create Cortex Search, Semantic View, and Agent
Equivalent to Prompts 3-6
Navigate to AI & ML → Snowflake Intelligence and select Quantitative Research Agent.
Beyond the guided prompts, here's what Cortex Code can do:
What tables exist in FSI_DEMO_DB.ANALYTICS? Describe each one.
Explain this notebook cell by cell
This cell is throwing an error - help me fix it
Why is my model prediction returning NULL?
Summarize the model's feature importance
Which stocks had the biggest prediction errors?
To remove all demo objects:
- Navigate to Projects > Workspaces
- Create a new SQL file with contents from
scripts/teardown.sql - Run the script
You've built an end-to-end AI-powered quantitative research pipeline entirely within Snowflake. From here:
- Expand coverage - Add more companies beyond DOW 30
- Add new features - Use Cortex Code to add technical indicators (RSI, MACD)
- Improve the model - Experiment with different algorithms
- Build dashboards - Create a Streamlit app for visualization
- Automate updates - Schedule daily predictions with Snowflake Tasks
Use Cortex Code to help with all of it—just describe what you want to build.
- Cortex Code Documentation
- Cortex AI Functions
- Cortex Search
- Cortex Analyst
- Snowflake Intelligence
- Snowflake ML Model Registry
- Snowflake Notebooks
Copyright (c) Snowflake Inc. All rights reserved.
The code in this repository is licensed under the Apache 2.0 License.




