9.66 Josh Tenenbaum comp. cog. sci. class + 9.S Vikash<>Josh #250
Replies: 19 comments 11 replies
-
I'm skeptical about "common sense". I think it's a catch-all term that is (over)used to invoke certainty or tradition where the speaker has none. I'm sure there's some underlying quantity that exists, but it's some kind of naive view of causality that applies only to simple situations. Real systems don't always suit common sense (example: https://metasd.com/2019/07/complexity-default-assumption/ ).
-
W2: Foundations of Inductive Learning, Bayesian Inference, Bayesian Concept Learning
Eight concepts: hypothesis; observation/example of a concept; prior over hypotheses; likelihood of observations given a hypothesis; posterior; the size principle for the likelihood (smaller hypotheses receive greater likelihood, more so as n increases); the choice principle for hypotheses (natural over logical); and hypothesis averaging for a new observation (averaging predictions over all hypotheses, weighted by their posteriors).
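These W2 concepts fit together in a toy "number game"-style calculation. A minimal sketch in plain JavaScript, where the hypothesis space, priors, and names are my own illustrative assumptions (not the course's code):

```javascript
// Toy Bayesian concept learning (hypothetical hypothesis space and priors).
// Likelihood uses the size principle: P(examples | h) = (1/|h|)^n for consistent h.
const hypotheses = [
  { name: "even numbers (1-20)", members: [2, 4, 6, 8, 10, 12, 14, 16, 18, 20], prior: 0.5 },
  { name: "powers of two (1-20)", members: [2, 4, 8, 16], prior: 0.5 }
];

// Posterior over hypotheses given observed examples of the concept
function posterior(examples) {
  const scores = hypotheses.map(h => {
    const consistent = examples.every(x => h.members.includes(x));
    const likelihood = consistent ? Math.pow(1 / h.members.length, examples.length) : 0;
    return h.prior * likelihood;
  });
  const z = scores.reduce((a, b) => a + b, 0);
  return scores.map(s => s / z);
}

// Hypothesis averaging: P(y in concept | examples), weighted by the posterior
function predict(examples, y) {
  return posterior(examples).reduce(
    (p, postH, i) => p + postH * (hypotheses[i].members.includes(y) ? 1 : 0), 0);
}

// With examples {2, 4, 8}, the smaller "powers of two" hypothesis dominates.
const post = posterior([2, 4, 8]);
const probSix = predict([2, 4, 8], 6); // 6 fits only "even numbers"
```

The size principle is visible here: both hypotheses are consistent with {2, 4, 8}, but the 4-member hypothesis gets likelihood (1/4)^3 versus (1/10)^3, so its posterior climbs toward 1 as examples accumulate.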
-
pset1
Toolkits:
-
Week 6: Graphical Models and Probabilistic Programming
Best lecture so far, as I unpacked my attraction to probabilistic programs: using how Josh argues that probabilistic programming is better (finer-grained) than graphical models, I can share suggestions here on Value Lab's graphical representation of belief on the nodes and arcs of a value hypothesis. Lecture slides and lecture transcript 🗣️. Personal highlight is the Angie-Josh Q&A!
Size principle: smaller hypotheses receive greater likelihood, exponentially more so as the number of examples n increases. The question for today was
-
W9: Probabilistic Language of Thought (meaning function) and MCMC as an intuitive learning mechanism
The meaning function allows purposeful projection of knowledge onto the hyperplane that matters and that is implementable (i.e., we have the capability to express any point in the hyperplane with available resources). Todo: make a collage of the MCMC slides (TAC; the three-proposal comparison, indistinguishable after crossing) using an optimal fit through a sequential Monte Carlo CLD. I drew analogies between the proposal in the inference algorithm and how it might apply in the equity valuation negotiation situation in the context of #249. The table shows how each inference technique maps to specific aspects of the equity valuation negotiation process, with particular attention to the objective function x*(i,c) and the various states and constraints specified in the problem setup.
The remaining techniques, while useful, play supporting roles in the negotiation process:
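To keep the MCMC mechanics concrete alongside the negotiation analogy, here is a minimal Metropolis-Hastings sketch in plain JavaScript. It is my own illustration, not the lecture's code; the target density, step size, and iteration count are assumptions:

```javascript
// Minimal Metropolis-Hastings: sample from a target given only its log density,
// using a symmetric random-walk proposal (so no proposal-ratio correction needed).
function metropolisHastings(logDensity, steps, stepSize) {
  let x = 0;
  const samples = [];
  for (let i = 0; i < steps; i++) {
    // Propose a local move around the current state
    const proposal = x + stepSize * (Math.random() * 2 - 1);
    // Accept with probability min(1, p(proposal)/p(x))
    const logAccept = logDensity(proposal) - logDensity(x);
    if (Math.log(Math.random()) < logAccept) x = proposal;
    samples.push(x); // rejected proposals repeat the current state
  }
  return samples;
}

// Target: standard normal, log density known only up to a constant.
const samples = metropolisHastings(x => -0.5 * x * x, 20000, 1.0);
const mean = samples.reduce((a, b) => a + b, 0) / samples.length;
```

The proposal/accept split is the part the negotiation analogy leans on: the proposal generates a candidate move, and the acceptance rule decides whether the candidate improves on (or is plausible relative to) the current state.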
-
W10
-
W11
-
W12 hierarchical bayes
-
I'm preparing a mail to Josh after reading Andrew's "Why forecast an election that's too close to call? Predictive models don't make the news, but they have a crucial role in democracy.", regarding Josh's comment in the "one and done" paper that if p is close to .5 (a close call), sampling is a waste (just decide). My argument is the absence of the modeler in the one-and-done model: modeling capability stock, which I believe is the key spirit of the Bayesian workflow (centering the modeler (process) instead of the model (product)). Andrew's line "This predictability affects how politicians and journalists think about elections, the economy and the balance between parties" seems very related to Josh's action understanding.
-
Summary of the thirteen special topics to follow below:
📐 Shallow to deep: structured ignorance priors enable robust inference by starting deliberately broad and refining incrementally as evidence accumulates. This approach maintains flexibility against unexpected cases while leveraging structure when appropriate, yielding systems that scale better than those with premature specificity.
👁️ See flowing mass: probabilistic programming visualization reveals how probability distributes across hypothesis space rather than fixating on point estimates. This perceptual skill transforms inference from opaque guesswork to transparent reasoning, allowing rational uncertainty quantification that drives adaptive computation and robust decision-making.
🪒 Auto Occam's razor: a hierarchical model encodes uncertain beliefs, and sampling navigates that uncertainty representation. Together, they form a consistent algorithm for probabilistic inference that behaves rationally, compared to algorithms violating consistency and hence making predictably irrational decisions.
🧩 Compose to simplify: higher-order probabilistic operations automate mathematical reasoning by transforming sampling-based programs into expectation-operator form. This systematic approach allows gradient estimators and density evaluations to emerge from primitive transformations, replacing pages of derivations with composable building blocks that maintain unbiasedness guarantees while enabling exploration of variance-performance tradeoffs across inference algorithms.
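The 🧩 compose-to-simplify point can be made concrete with the simplest such transformation I know: deriving a score-function (REINFORCE) gradient estimator mechanically from a sampler plus its log-density. This is my own minimal sketch of the general idea; the Bernoulli example and function names are assumptions, not the lecture's code:

```javascript
// Score-function gradient estimator: d/dθ E[f(x)] = E[f(x) · d/dθ log p(x; θ)],
// composed from two primitives: a sampler and the gradient of its log density.
function bernoulliSample(theta) {
  return Math.random() < theta ? 1 : 0;
}
function bernoulliLogGrad(x, theta) {
  // d/dθ log p(x; θ) = x/θ - (1 - x)/(1 - θ)
  return x / theta - (1 - x) / (1 - theta);
}

// The estimator emerges from composing the two primitives; it is unbiased.
function scoreFunctionGradient(f, theta, n) {
  let total = 0;
  for (let i = 0; i < n; i++) {
    const x = bernoulliSample(theta);
    total += f(x) * bernoulliLogGrad(x, theta); // single-sample unbiased estimate
  }
  return total / n;
}

// For f(x) = x, E[f(x)] = θ, so the true gradient is exactly 1 at any θ.
const g = scoreFunctionGradient(x => x, 0.3, 100000);
```

No calculus was done by hand at the call site: the gradient estimator is assembled from the sampler and its log-density gradient, which is the "expectation-operator form" idea in miniature. The tradeoff is variance, which is why variance-performance comparisons across estimators matter.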
-
1. Scaling behavior of intelligence vs. machine learning
Rational AI principle 1, 📐 shallow to deep: structured ignorance priors enable robust inference by starting deliberately broad and refining incrementally as evidence accumulates. This approach maintains flexibility against unexpected cases while leveraging structure when appropriate, yielding systems that scale better than those with premature specificity.
Applying principle 1
-
2. Perception and navigation
Rational AI principle 2, 👁️ see flowing mass: probabilistic programming visualization reveals how probability distributes across hypothesis space rather than fixating on point estimates. This perceptual skill transforms inference from opaque guesswork to transparent reasoning, allowing rational uncertainty quantification that drives adaptive computation and robust decision-making.
Applying principles 1, 2
-
3. Foundations of modeling and inference
Rational AI principle 3, 🪒 auto Occam's razor: a hierarchical model encodes uncertain beliefs, and sampling navigates that uncertainty representation. Together, they form a consistent algorithm for probabilistic inference that behaves rationally, compared to algorithms violating consistency and hence making predictably irrational decisions.
Applying principles 1, 2, 3
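The automatic Occam's razor can be shown in a few lines with the standard fair-coin vs. biased-coin marginal-likelihood comparison. This is my own illustration of the textbook example, not the lecture's code:

```javascript
// Bayesian Occam's razor via marginal likelihood:
// M1 (simple): fair coin, θ fixed at 0.5.
// M2 (flexible): biased coin, θ ~ Uniform(0, 1).
function logFactorial(n) {
  let s = 0;
  for (let i = 2; i <= n; i++) s += Math.log(i);
  return s;
}

// log P(data | M1) for n flips: each flip has probability 0.5
function logMarginalFair(n) {
  return n * Math.log(0.5);
}

// log P(data | M2) for k heads in n flips:
// ∫ θ^k (1-θ)^(n-k) dθ = Beta(k+1, n-k+1) = k!(n-k)!/(n+1)!
function logMarginalBiased(n, k) {
  return logFactorial(k) + logFactorial(n - k) - logFactorial(n + 1);
}

// Near-balanced data: the simpler model wins, because the flexible model
// spreads its probability mass over many datasets it could have explained...
const balancedFavorsSimple = logMarginalFair(20) > logMarginalBiased(20, 10);
// ...but strongly skewed data overwhelms the razor and the flexible model wins.
const skewedFavorsFlexible = logMarginalFair(20) < logMarginalBiased(20, 18);
```

No complexity penalty is added by hand: averaging the likelihood over the prior (the hierarchical layer) penalizes the flexible model automatically, which is the "consistent algorithm" behavior the principle describes.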
-
I applied four principles in automating the paper-writing project Pose Estimation with Sensors.
Basic principle: a robot's position is determined by combining noisy sensor measurements with a motion model to estimate its most likely location.
Particle filtering vs. simple importance sampling: particle filtering sequentially updates and resamples candidate solutions at each step, while importance sampling tries to estimate the entire distribution at once.
Benefits of rejuvenation: rejuvenation perturbs resampled particles to maintain diversity, preventing collapse to a single solution in high-dimensional problems.
Integrated path vs. probabilistic inference: simple path integration accumulates errors, while probabilistic inference maintains uncertainty and can correct itself with new information.
Differential Visualization Implementation
Here's a simple implementation of differential visualization that shows how changes to inputs propagate through the paper generation process:
// The simplest implementation of Differential Visualization for paper generation
function createDifferentialVisualization() {
  // Track current state
  let state = {
    inputs: {
      phenomena: "",
      theory: "",
      application: ""
    },
    activeInput: "phenomena",
    paperSections: {
      introduction: { content: "", impactScores: {} },
      literature: { content: "", impactScores: {} },
      methodology: { content: "", impactScores: {} },
      results: { content: "", impactScores: {} },
      conclusion: { content: "", impactScores: {} }
    }
  };

  // Define impact relationships (how inputs affect sections).
  // These would be learned or defined based on model analysis.
  const impactMatrix = {
    phenomena: {
      introduction: 0.8,
      literature: 0.4,
      methodology: 0.2,
      results: 0.3,
      conclusion: 0.2
    },
    theory: {
      introduction: 0.3,
      literature: 0.7,
      methodology: 0.9,
      results: 0.5,
      conclusion: 0.3
    },
    application: {
      introduction: 0.2,
      literature: 0.1,
      methodology: 0.4,
      results: 0.7,
      conclusion: 0.8
    }
  };

  // Calculate differential impact when an input changes
  function updateDifferentialImpact(inputKey, oldText, newText) {
    // Calculate change magnitude (simplified).
    // In a real implementation, this would use embeddings comparison.
    const changeRatio = newText.length > 0 ?
      1 - (oldText.length / newText.length) : 1;

    // Update impact scores based on change magnitude
    Object.keys(impactMatrix[inputKey]).forEach(section => {
      const baseImpact = impactMatrix[inputKey][section];
      state.paperSections[section].impactScores[inputKey] = baseImpact * changeRatio;
    });

    // Update visualization
    updateHeatmap();
  }

  // Update the heatmap visualization
  function updateHeatmap() {
    // Compute total impact for each section
    Object.keys(state.paperSections).forEach(section => {
      const sectionImpacts = state.paperSections[section].impactScores;
      const totalImpact = Object.values(sectionImpacts).reduce((sum, val) => sum + val, 0);

      // Update the visual heatmap for this section (DOM operation in a real implementation)
      console.log(`Section "${section}" impact: ${totalImpact.toFixed(2)}`);
      // In a real implementation, we would update DOM elements:
      // document.getElementById(`${section}-heatmap`).style.backgroundColor =
      //   `rgba(255, 102, 0, ${totalImpact})`;
    });
  }

  // Input change handler
  function handleInputChange(inputKey, newValue) {
    const oldValue = state.inputs[inputKey];
    state.inputs[inputKey] = newValue;
    state.activeInput = inputKey;

    // Calculate and visualize impact
    updateDifferentialImpact(inputKey, oldValue, newValue);

    // In a real system, we'd regenerate affected sections:
    // regenerateAffectedSections();
  }

  // Public API
  return {
    updateInput: handleInputChange,
    getState: () => state
  };
}

// Usage example
const paperVisualizer = createDifferentialVisualization();
paperVisualizer.updateInput("phenomena", "Experimental choice differences between entrepreneurs");
paperVisualizer.updateInput("theory", "Hierarchical Bayesian model of decision making");
Key Logic for Differential Visualization
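The particle-filtering, rejuvenation, and probabilistic-inference principles stated at the top of this post can be sketched directly. A 1-D toy tracker of my own; all noise levels, values, and names are illustrative assumptions, not the project's code:

```javascript
// One step of a particle filter for 1-D position tracking with a noisy sensor.
function gaussianNoise(sigma) {
  // Box-Muller transform for a normal sample
  const u = 1 - Math.random(), v = Math.random();
  return sigma * Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

function particleFilterStep(particles, control, observation, motionNoise, sensorNoise) {
  // 1. Propagate each particle through the motion model.
  const moved = particles.map(p => p + control + gaussianNoise(motionNoise));
  // 2. Weight each particle by the likelihood of the observation.
  const weights = moved.map(p =>
    Math.exp(-0.5 * ((observation - p) / sensorNoise) ** 2));
  const z = weights.reduce((a, b) => a + b, 0);
  // 3. Resample proportionally to weight (multinomial resampling).
  const resampled = moved.map(() => {
    let r = Math.random() * z;
    for (let i = 0; i < moved.length; i++) {
      r -= weights[i];
      if (r <= 0) return moved[i];
    }
    return moved[moved.length - 1];
  });
  // 4. Rejuvenate: small perturbation keeps particle diversity after resampling.
  return resampled.map(p => p + gaussianNoise(0.05));
}

// Track a robot moving +1 per step, observed with noise, from a broad prior.
let particles = Array.from({ length: 500 }, () => gaussianNoise(5));
let truePos = 0;
for (let t = 0; t < 20; t++) {
  truePos += 1;
  const obs = truePos + gaussianNoise(0.5);
  particles = particleFilterStep(particles, 1, obs, 0.2, 0.5);
}
const estimate = particles.reduce((a, b) => a + b, 0) / particles.length;
```

Unlike pure path integration (which would accumulate the motion noise unboundedly), each observation pulls the particle cloud back toward the true position, and the rejuvenation perturbation prevents the resampled set from collapsing to a single particle.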
-
4. automating math
lew25_auto(math(integ, diff)).pdf
-
I prompted the below to Katie, as we have a shared interest in model synthesis & entrepreneurship, and Angie would like to get advice from Katie on building a domain-specific language for entrepreneurial operations.
Q1. Does it make sense if I say the nail-scale-sail comic book is an attempt at defining semantically rich but syntactically restrictive entrepreneurial operations? Full paper below, where ten case studies are compactly used to illustrate ten tools.
Q2. If Q1 is yes: if you were to design a domain-specific language for entrepreneurial operations (structure below), where would you spend most time developing?
Q3. What do you think about Vikash's separation between generative programs and causality?
-
My favorite seminar so far! Seminar transcript summary: Challenges in Interdisciplinary Research. Key topics covered:
The seminar encouraged participants to be intentional about research communication, to understand field-specific evaluation standards, and to balance technical excellence with strong presentation and positioning.
-
3 Course Description
An introduction to computational theories of human cognition, and the computational frameworks that could support human-like artificial intelligence (AI). Our central questions are: What is the form and content of people's knowledge of the world across different domains, and what are the principles that guide people in learning new knowledge and reasoning to reach decisions based on sparse, noisy data? We survey recent approaches to cognitive science and AI built on these principles:
World knowledge can be described using probabilistic generative models; perceiving, learning, reasoning and other cognitive processes can be understood as Bayesian inferences over these generative models.
To capture the flexibility and productivity of human cognition, generative models can be defined over richly structured symbolic systems such as graphs, grammars, predicate logics, and most generally probabilistic programs.
Inference in hierarchical models can explain how knowledge at multiple levels of abstraction is acquired.
Learning with adaptive data structures allows models to grow in complexity or change form in response to the observed data.
Approximate inference schemes based on sampling (Monte Carlo) and deep neural networks allow rich models to scale up efficiently, and may also explain some of the algorithmic and neural underpinnings of human thought.
We will introduce a range of modeling tools, including core methods from contemporary AI and Bayesian machine learning, as well as new approaches based on probabilistic programming languages. We will show how these methods can be applied to many aspects of cognition, including perception, concept learning and categorization, language understanding and acquisition, common-sense reasoning, decision-making and planning, theory of mind and social cognition. Lectures will focus on the intuitions behind these models and their applications to cognitive phenomena, rather than detailed mathematics. Recitations will fill in mathematical background and give hands-on modeling guidance in several probabilistic programming environments, including WebPPL and Gen.
4 Prerequisites
(1) Basic probability and statistical inference as you would acquire in 9.014, 9.40, 18.05, 18.600, 6.008, 6.036, 6.041, or 6.042, or an equivalent class. If you have not taken one of these classes, please talk to the instructor after the first day. (2) Previous experience with programming, especially in Matlab, Python, Scheme or Javascript, which form the basis of the probabilistic programming environments we use. Also helpful would be previous exposure to core problems and methods in artificial intelligence, machine learning, or cognitive science.
5 Requirements and Grading (to be updated)
Participation: Attendance at lectures is required. Recitations are optional. Students must attend at least 80% of lectures, and not miss more than 2 in a row. Students failing to meet the attendance requirement will be penalized by one full letter grade (for example, the grade will go from an A- to a B-). Attendance will be taken using a Google form with a Question of the Day during each class. The only exception to this attendance requirement is by specific exemption from the S^3 Deans or DAS (Disability and Access Services).
Problem Sets [60%]: There will be 3 problem sets, along with an optional 4th problem set. The release and due dates of the problem sets are still being finalized, and this section will be updated in the future.
Final Project [40%]: this is a project-based course. You will submit a project proposal and a paper-style write up for a final course project. Projects can be done individually or in groups of 2-3, but proportionately more work is expected for a group project.
Late Policy: A late penalty of 5% per day will be applied to late problem sets, up to 1 week past the deadline. We can't accept work later than 1 week after the deadline except in extraordinary circumstances because doing so would hinder our ability to discuss solutions with students in a timely manner.
Collaboration Policy: Students are allowed to talk and work with others on problem sets, but the work they write up and hand in must be their own. Students should also indicate the names of anyone that they worked with or collaborated with.
The Institute obliges us to remind you of its policy on integrity. It can be found at the website http://web.mit.edu/academicintegrity/. Please read it if you have not already done so.
Consistent with MIT policy, there is no curve or pre-set grade distribution for this class.
Please let us know on an individual basis if you have a learning disability or other special concern you would like us to be aware of.
6 Topics
Specific techniques and topics in cognitive modeling to be covered this year will include some or all of the following:
Foundations of inductive learning: philosophical challenges and theories of how learning is possible.
Introduction to Bayesian inference and Bayesian concept learning.
Modeling human cognition as rational statistical inference: Bayes meets Marr’s levels of analysis. Case studies in modeling surface perception and predicting the future.
Graphical models and Bayesian networks. Modeling how people learn and reason about simple causal structures.
Probabilistic programming languages: Generalizations of graphical models that can capture the core of common-sense reasoning. Modeling social evaluation and attribution, visual scene understanding and common-sense physical reasoning.
Sampling-based methods for approximate probabilistic inference: Markov chain Monte Carlo (MCMC) - Gibbs sampling, Metropolis-Hastings; Sequential Monte Carlo (particle filtering). Modeling the dynamics of binocular rivalry, online sentence processing, change detection and multiple object tracking.
Learning as Bayesian inference: parameter estimation and model selection; the Bayesian Occam’s razor. Modeling visual learning and classical conditioning in animals.
Hierarchical Bayesian models: a framework for learning to learn, transfer learning, and multitask learning. Modeling how children learn the meanings of words, and learn the basis for rapid (’one-shot’) learning. Building a machine that learns words like children do.
Probabilistic models for unsupervised clustering: Modeling human categorization and category discovery; prototype and exemplar categories; categorizing objects by relations and causal properties.
Nonparametric Bayesian models - capturing the long tail of an infinitely complex world: Dirichlet processes, adaptor grammars, fragment grammars. Models for morphology in language.
Planning with Markov Decision Processes (MDPs): Modeling single- and multi-agent decision-making. Modeling human ’theory of mind’ as inverse planning.
Modeling human cognitive development - how we get to be so smart: infants’ probabilistic reasoning and curiosity; how children learn about causality and number; the growth of intuitive theories.
W1
Sep.06
Q1. Does common sense evolve with the sensory system? I haven't seen Tameren for the last five years, and this would affect my answer to his question on it.