Commit 374e229: "All URLs updated"

1 parent 3b5a851 commit 374e229

File tree: 57 files changed (+1793, -2989 lines)


_podcast/to-update/s18e03-ai-for-ecology-biodiversity-and-conservation.md renamed to _podcast/ai-for-ecology-biodiversity-and-conservation.md

Lines changed: 33 additions & 9 deletions

@@ -1,7 +1,6 @@
 ---
-title: "Context: The episode frames a biodiversity crisis made harder by fragmented, sparse data and limited monitoring capacity, then surveys AI tools (computer vision, remote sensing, platforms, citizen science), technical challenges, ethical concerns, and policy needs for conservation.
-
-Core narrative: AI's most important role in conservation is as an integrative, trustworthy infrastructure that turns heterogeneous, messy ecological data into continuous, scalable, and actionable knowledge—bridging camera traps, drones, satellites, citizen science, and field expertise through interoperable standards, robust models, edge deployment, and open platforms. Real impact requires coupling technical advances with ethics, community engagement, capacity building, sustainable funding, and multistakeholder governance so that AI-enabled monitoring directly informs equitable conservation decisions, enforcement, and long-term policy."
+title: 'AI for Ecology, Biodiversity, and Conservation: Computer Vision, Remote Sensing
+  and Citizen Science'
 short: AI for Ecology, Biodiversity, and Conservation
 season: 18
 episode: 3
@@ -16,12 +15,26 @@ links:
   apple: https://podcasts.apple.com/us/podcast/ai-for-ecology-biodiversity-and-conservation-tanya/id1541710331?i=1000653709956
   spotify: https://open.spotify.com/episode/3Hhz5N8ZDvsOPlPP3wxQxq?si=Oz7y_pBrTfeypfYZXubu-g
   youtube: https://www.youtube.com/watch?v=30tTrozbAkg
-
-description: 'Discover AI-driven wildlife conservation: computer vision, remote sensing & citizen science for scalable species ID, habitat maps, alerts and policy impact.'
-intro: How can AI actually scale wildlife conservation in the face of accelerating biodiversity loss and persistent data gaps? In this episode, computational ecologist Tanya Berger-Wolf—director of TDAI@OSU, co‑founder of the Wildbook project, and director of technology at Wild Me—walks us through practical ways computer vision, remote sensing, and citizen science are transforming biodiversity monitoring. <br><br> We explore core AI techniques (machine learning, transfer learning, domain adaptation), image‑based monitoring with camera traps, drones and photo‑ID for individual tracking, and remote sensing for habitat mapping and change detection. Tanya addresses key data challenges—labeling, class imbalance, sparse observations—and the need for interoperable datasets, open standards and FAIR principles. We also cover model robustness, edge deployment in the field, ethics and Indigenous knowledge, scalable platforms like Wildbook, and how citizen science and crowdsourcing support quality control and long‑term monitoring. <br><br> Listeners will come away with a clearer understanding of tools and workflows for wildlife monitoring, practical barriers to scaling AI for conservation, policy and funding considerations, and resources to begin applying computer vision, remote sensing, and citizen science in their own conservation projects
+description: Discover AI-driven computer vision and remote sensing strategies to scale
+  biodiversity monitoring, improve species ID, and inform conservation policy.
+intro: How can AI help close critical data gaps in biodiversity monitoring and turn
+  images and sensor data into actionable conservation decisions? In this episode Tanya
+  Berger‑Wolf, a computational ecologist, director of TDAI@OSU, and co‑founder of
+  the Wildbook project (Wild Me), walks through practical applications of AI for ecology,
+  biodiversity monitoring, and conservation. <br><br> We cover core techniques—computer
+  vision, machine learning, and remote sensing—and their use in image‑based monitoring
+  with camera traps, drones, and species identification. Tanya explains individual
+  identification and longitudinal tracking, habitat mapping and change detection,
+  and the data challenges of labeling, class imbalance, and sparse observations. The
+  conversation addresses integration of heterogeneous datasets, model robustness (domain
+  shift and transfer learning), and ethical considerations including Indigenous knowledge
+  and equity. You’ll also hear about scalable platforms like Wildbook, citizen science
+  workflows for crowdsourcing and quality control, policy relevance, open data and
+  FAIR principles, edge deployment in the field, and building sustainable monitoring
+  programs. <br><br> Listen to gain concrete insights on tools, pitfalls, and next
+  steps for applying AI to conservation—what works now, what remains hard, and resources
+  to explore further.
 dateadded: 2024-04-28
-
-
 quotableClips:
 - name: Podcast Introduction
   startOffset: 0
@@ -119,9 +132,20 @@ quotableClips:
   startOffset: 3720
   url: https://www.youtube.com/watch?v=30tTrozbAkg&t=3720
   endOffset: 3720
+context: 'Context: The episode frames a biodiversity crisis made harder by fragmented,
+  sparse data and limited monitoring capacity, then surveys AI tools (computer vision,
+  remote sensing, platforms, citizen science), technical challenges, ethical concerns,
+  and policy needs for conservation.
 
+  Core narrative: AI''s most important role in conservation is as an integrative,
+  trustworthy infrastructure that turns heterogeneous, messy ecological data into
+  continuous, scalable, and actionable knowledge—bridging camera traps, drones, satellites,
+  citizen science, and field expertise through interoperable standards, robust models,
+  edge deployment, and open platforms. Real impact requires coupling technical advances
+  with ethics, community engagement, capacity building, sustainable funding, and multistakeholder
+  governance so that AI-enabled monitoring directly informs equitable conservation
+  decisions, enforcement, and long-term policy.'
 ---
-
 Links:
 
 * [Biodiversity and Artificial Intelligence pdf](https://www.gpai.ai/projects/responsible-ai/environment/biodiversity-and-AI-opportunities-recommendations-for-action.pdf){:target="_blank"}

_podcast/to-update/s20e01-trends-in-ai-infrastructure.md renamed to _podcast/ai-infrastructure-hybrid-cloud-on-prem-distributed-training.md

Lines changed: 43 additions & 20 deletions

@@ -1,17 +1,6 @@
 ---
-title: "Context: A conversation with an AI-infrastructure practitioner about moving from developer tools to building DStack, exploring real-world trade-offs across hardware, software, deployment, and business models for practical AI adoption.
-
-Core theme (single unifying idea): Practical AI is an infrastructure-first problem — success depends less on chasing the biggest model and more on designing cost-effective, controllable, and efficient stacks (hardware, orchestration, and software) that fit hybrid cloud/on‑prem realities, leverage open-source ecosystems, and optimize distributed training and serving for real-world constraints.
-
-Dominant through-line: Every segment — from cost of ownership and cloud vs on‑prem trade‑offs to open vs proprietary models, decentralization, distributed training bottlenecks, orchestration gaps, and edge/federated use cases — returns to the same tension: how to deliver AI that is scalable, performant, and economically sustainable by choosing the right mix of tooling, deployment model, and optimizations.
-
-Key themes implied by the narrative:
-- Cost and control drive architecture choices more than raw model capability.
-- Hybrid cloud + on‑prem is the pragmatic reality; orchestration must adapt.
-- Open-source ecosystems accelerate feedback, tooling, and business flexibility.
-- Efficient distributed training and communication optimizations trump brute-force scaling.
-- Decentralization (privacy, local control, edge) is often a matter of fit and trade-offs, not ideology.
-- Practical provisioning, automation, and orchestration are the unsolved scaling problems for non–AI‑first organizations."
+title: 'Post-ChatGPT AI Infrastructure: Open Source Orchestration, On-Prem Economics
+  & Distributed Training at Scale'
 short: Trends in AI Infrastructure
 season: 20
 episode: 1
@@ -26,13 +15,26 @@ links:
   apple: https://podcasts.apple.com/us/podcast/redefining-ai-infrastructure-open-source-chips-and/id1541710331?i=1000687565459
   spotify: https://open.spotify.com/episode/5MIc1pAXPxVYSr0E4pndU4
   youtube: https://www.youtube.com/watch?v=1aMuynlLM3o
-
-description: Discover DStack to cut AI infrastructure costs with on‑prem GPU training and MLOps alternatives—optimize distributed training, reduce orchestration overhead
-intro: 'How can engineering teams cut AI infrastructure costs without sacrificing performance or control? In this episode, Andrey Cheptsov — founder and CEO of dstack and former JetBrains engineer — walks through the motivation behind DStack, an open‑source orchestration alternative designed to lower AI infrastructure total cost of ownership. We trace the cloud vs on‑prem economics (including MLOps limitations like SageMaker), the decision to build open‑source developer tooling, and the trade‑offs between open and proprietary models. <br><br> You’ll hear practical discussion of on‑prem GPU training and distributed training challenges: GPU requirements, PyTorch + NCCL communication bottlenecks, optimization strategies such as DeepSpeed, and tips for fine‑tuning and serving models for non–AI‑first companies. The episode also covers orchestration gaps — Kubernetes and SLURM limitations — plus bare‑metal provisioning, hybrid cloud realities, edge computing scope, and federated learning versus distributed compute. <br><br> If you’re evaluating MLOps alternatives, on‑prem GPU coordination, or ways to reduce AI infrastructure cost, this episode offers concrete perspectives on when to choose on‑prem vs cloud, how DStack fits into the stack, and practical trade‑offs for production ML workloads.'
+description: 'Discover AI infrastructure strategies: open source orchestration, on-prem
+  economics and distributed training at scale to cut costs, boost performance and
+  control.'
+intro: How has the rise of ChatGPT reshaped the infrastructure needed to build and
+  run large language models, and when does open source orchestration make sense compared
+  to cloud or proprietary systems? In this episode we speak with Andrey Cheptsov,
+  founder and CEO of dstack — an open-source alternative to Kubernetes and Slurm designed
+  to simplify AI infrastructure orchestration. Drawing on his decade-plus at JetBrains
+  building developer tools, Andrey frames practical trade-offs between on-prem economics
+  and cloud spend, the maturity of open source orchestration tools, and patterns for
+  distributed training at scale. We cover core topics including open source orchestration
+  for AI workloads, cost and operational considerations for on-prem deployments, and
+  strategies to scale distributed training efficiently and reliably. Listen to understand
+  when an open source approach like dstack is appropriate, what to evaluate in orchestration
+  tools, and how to balance performance, cost, and control as you scale AI projects
+  post-ChatGPT. This episode is for engineering leaders and ML infrastructure teams
+  seeking actionable insights on AI infrastructure, orchestration tools, on‑prem economics,
+  and distributed training best practices.
 dateadded: 2025-02-26
-
 duration: PT01H06M04S
-
 quotableClips:
 - name: Episode Kickoff & Guest Introduction
   startOffset: 0
@@ -118,7 +120,6 @@ quotableClips:
   startOffset: 3938
   url: https://www.youtube.com/watch?v=1aMuynlLM3o&t=3938
   endOffset: 3964
-
 transcript:
 - header: Episode Kickoff & Guest Introduction
 - line: This week, we'll talk about AI infrastructure and everything related to it.
@@ -955,8 +956,30 @@ transcript:
   sec: 3964
   time: '1:06:04'
   who: Andrey
----
+context: 'Context: A conversation with an AI-infrastructure practitioner about moving
+  from developer tools to building DStack, exploring real-world trade-offs across
+  hardware, software, deployment, and business models for practical AI adoption.
+
+  Core theme (single unifying idea): Practical AI is an infrastructure-first problem
+  — success depends less on chasing the biggest model and more on designing cost-effective,
+  controllable, and efficient stacks (hardware, orchestration, and software) that
+  fit hybrid cloud/on‑prem realities, leverage open-source ecosystems, and optimize
+  distributed training and serving for real-world constraints.
 
+  Dominant through-line: Every segment — from cost of ownership and cloud vs on‑prem
+  trade‑offs to open vs proprietary models, decentralization, distributed training
+  bottlenecks, orchestration gaps, and edge/federated use cases — returns to the same
+  tension: how to deliver AI that is scalable, performant, and economically sustainable
+  by choosing the right mix of tooling, deployment model, and optimizations.
+
+  Key themes implied by the narrative: - Cost and control drive architecture choices
+  more than raw model capability. - Hybrid cloud + on‑prem is the pragmatic reality;
+  orchestration must adapt. - Open-source ecosystems accelerate feedback, tooling,
+  and business flexibility. - Efficient distributed training and communication optimizations
+  trump brute-force scaling. - Decentralization (privacy, local control, edge) is
+  often a matter of fit and trade-offs, not ideology. - Practical provisioning, automation,
+  and orchestration are the unsolved scaling problems for non–AI‑first organizations.'
+---
 Links:
 
 * [Twitter](https://twitter.com/andrey_cheptsov/){:target="_blank"}

_podcast/to-update/s17e03-stock-market-analysis-with-python-and-machine-learning.md renamed to _podcast/algorithmic-trading-with-python-and-machine-learning.md

Lines changed: 33 additions & 10 deletions

@@ -1,7 +1,5 @@
 ---
-title: "Context: This episode follows Ivan Brigida’s path from finance to analytics and walks listeners step‑by‑step through the practical craft of retail algorithmic investing — covering data sources and quality, time‑series market formats, strategy ideas (like mean reversion), rigorous backtesting and walk‑forward validation, risk management and execution, feature engineering and model choice, explainability, deployment, and learning resources.
-
-Core: The unifying idea is that successful retail algorithmic trading is built like an engineering pipeline — start with clean, well‑understood data; define precise prediction targets; design simple, interpretable models and handcrafted features; validate performance with rigorous, leakage‑free backtests and walk‑forward simulations; embed strict risk controls and disciplined execution; and iterate toward partial automation and reproducible deployment while treating the whole process as a continuous learning project rather than a shortcut to quick profits."
+title: 'Algorithmic Trading with Python: Backtesting, Risk Management and Deployment'
 short: Stock Market Analysis with Python and Machine Learning
 season: 17
 episode: 3
@@ -16,13 +14,26 @@ links:
   apple: https://podcasts.apple.com/us/podcast/stock-market-analysis-with-python-and-machine/id1541710331?i=1000641465239
   spotify: https://open.spotify.com/episode/1ZXAeGr4Kx7F6oLQUip8Cc?si=KJwpYL-3SvuX8nPdc2cyOg
   youtube: https://www.youtube.com/watch?v=NThHAEIazFk
-
-description: 'Discover algorithmic trading & mean reversion: practical backtesting, data APIs, risk management, model choices and trade execution to boost strategy ROI.'
-intro: 'How do you build, backtest, and deploy a robust mean-reversion algorithm without falling prey to bad data or time‑series leakage? In this episode, Ivan Brigida — Analytics Lead and creator of PythonInvest — draws on 10+ years in business intelligence, econometrics, forecasting, machine learning and finance to answer that question. <br><br> We walk through practical steps for algorithmic trading: choosing retail-friendly data APIs (Yahoo, Quandl, Polygon), understanding market data formats like OHLCV and adjusted close, and cleaning for data quality. Ivan explains mean reversion strategy design, risk management fundamentals including stop‑loss and position sizing, and rigorous backtesting methods—covering time‑series leakage and walk‑forward simulation. He also breaks down prediction targets, feature engineering with time‑window statistics, and model choices from logistic regression to XGBoost and neural networks, plus approaches to explainability and evaluation metrics (ROI, precision, trading fees). Finally, deployment options (cron, Airflow, APIs) and learning resources from PythonInvest are discussed. <br><br> Listen to gain actionable guidance on backtesting, data sources, risk controls, and machine learning techniques to move a mean‑reversion idea toward a reproducible algorithmic trading workflow.'
+description: 'Master algorithmic trading: backtesting and risk management—learn practical
+  data sources, features, models & execution to build robust strategies.'
+intro: How do you turn a trading idea into a robust, risk‑managed algorithm in Python?
+  In this episode Ivan Brigida — analytics lead behind PythonInvest with 10+ years
+  in statistical modeling, forecasting, econometrics and finance — walks through practical
+  steps for algorithmic trading with Python, from data sourcing to deployment (and
+  a clear reminder this is educational, not investment advice). <br><br> We cover
+  where retail traders get market data (Yahoo, Quandl, Polygon), OHLCV and adjusted‑close
+  nuances, and a concrete mean‑reversion example. Ivan explains backtesting methodology,
+  common pitfalls like time‑series data leakage, and walk‑forward simulation for realistic
+  validation. He breaks down risk management (stop‑loss thresholds, position sizing),
+  execution and trading fees, plus evaluation metrics (ROI, precision) and defining
+  prediction targets (binary growth thresholds such as 5%). <br><br> On the modeling
+  side you’ll hear practical feature engineering (time‑window stats, handcrafted indicators),
+  model choices (logistic regression, XGBoost, neural nets), explainability via feature
+  importance, and deployment options (cron, Airflow, APIs, partial automation). Listen
+  to gain actionable guidance for building, validating, and deploying algorithmic
+  trading systems in Python.
 dateadded: 2024-01-24
-
 duration: PT01H40S
-
 quotableClips:
 - name: Podcast Introduction
   startOffset: 0
@@ -132,7 +143,6 @@ quotableClips:
   startOffset: 3696
   url: https://www.youtube.com/watch?v=NThHAEIazFk&t=3696
   endOffset: 3640
-
 transcript:
 - header: Podcast Introduction
 - header: 'Guest Introduction: Ivan Brigida — Analytics Lead & PythonInvest'
@@ -1134,8 +1144,21 @@ transcript:
   sec: 3735
   time: '1:02:15'
   who: Ivan
----
+context: 'Context: This episode follows Ivan Brigida’s path from finance to analytics
+  and walks listeners step‑by‑step through the practical craft of retail algorithmic
+  investing — covering data sources and quality, time‑series market formats, strategy
+  ideas (like mean reversion), rigorous backtesting and walk‑forward validation, risk
+  management and execution, feature engineering and model choice, explainability,
+  deployment, and learning resources.
 
+  Core: The unifying idea is that successful retail algorithmic trading is built like
+  an engineering pipeline — start with clean, well‑understood data; define precise
+  prediction targets; design simple, interpretable models and handcrafted features;
+  validate performance with rigorous, leakage‑free backtests and walk‑forward simulations;
+  embed strict risk controls and disciplined execution; and iterate toward partial
+  automation and reproducible deployment while treating the whole process as a continuous
+  learning project rather than a shortcut to quick profits.'
+---
 Links:
 
 * [Exploring Finance APIs](https://pythoninvest.com/long-read/exploring-finance-apis){:target="_blank"}
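The episode intro above mentions walk‑forward simulation and a mean‑reversion example. As a rough Python sketch of those two ideas only (the price series, window sizes, and 2% threshold below are invented for illustration and are not from the episode or this commit):

```python
# Illustrative walk-forward validation for a toy mean-reversion rule.

def walk_forward_splits(n, train_size, test_size):
    """Yield (train_idx, test_idx) windows that move forward in time,
    so test data never precedes the data used for fitting."""
    start = 0
    while start + train_size + test_size <= n:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size

def mean_reversion_signal(prices, i, lookback, threshold):
    """Buy signal when the price falls `threshold` below its trailing mean."""
    window = prices[i - lookback:i]
    mean = sum(window) / lookback
    return prices[i] < mean * (1 - threshold)

# Invented toy price series for demonstration.
prices = [100, 98, 97, 101, 96, 99, 103, 95, 98, 102, 97, 100]

for train, test in walk_forward_splits(len(prices), train_size=6, test_size=3):
    # A real workflow would fit strategy parameters on `train` only,
    # then evaluate the frozen rule on `test`.
    signals = [mean_reversion_signal(prices, i, lookback=3, threshold=0.02)
               for i in test]
    print(list(test), signals)
```

Because each test window starts strictly after its training window, no future observations leak into parameter fitting, which is the time‑series leakage pitfall the episode warns about.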
