No single taxonomy is “the” official list, because people use “mode of reasoning” to mean (at least) four different things:
- an inference pattern (e.g., deduction vs. abduction),
- a representation of uncertainty/vagueness (e.g., probability vs. fuzzy membership),
- a problem-solving method (e.g., planning, constraint solving, optimization),
- or a domain style (e.g., scientific, legal, moral reasoning).
Real-world reasoning is therefore almost always hybrid (e.g., science routinely cycles: abduction → deduction → statistical testing → belief revision).
Below is a broad, high-coverage map intended to be usable in practice. For each mode, I give:
- What it is (the core move),
- What it outputs (proof, probability, hypothesis, plan, argument, etc.),
- How it differs (especially from “nearby” modes),
- Best for (where it tends to win),
- Common failure mode (what to watch for).
Most disagreements about “the right reasoning” come from mixing up these axes:
| Axis | One pole | Other pole | Why it matters |
|---|---|---|---|
| Ampliative vs. non‑ampliative | Deduction: conclusion contained in premises | Induction/Abduction/Analogy: goes beyond premises | Discovery/learning requires ampliative moves; assurance requires non-ampliative checks. |
| Monotonic vs. non‑monotonic | Adding info never retracts conclusions | Adding info can retract conclusions | Common-sense & policy reasoning are usually non‑monotonic; pure math proofs are monotonic. |
| Uncertainty vs. vagueness | Uncertainty about crisp facts (probability) | Vagueness in predicates (fuzzy/rough) | Prevents category mistakes like treating “tall” as probabilistic rather than vague. |
| Descriptive vs. normative | What is (facts, causes) | What ought (values, duties, constraints) | Decisions fail when value tradeoffs are smuggled in as “facts.” |
| Belief vs. action | What to believe / accept | What to do / choose | Separating belief updates from decision criteria improves clarity and accountability. |
| Single‑agent vs. multi‑agent | Environment treated as non‑strategic noise | Other agents respond strategically | Strategy, negotiation, security, and markets require game/ToM reasoning. |
| Truth vs. adoption | Accuracy / validity oriented | Audience / coordination oriented | Many org failures are rhetorical (alignment) rather than logical. |
A practical rule: use multiple modes intentionally.
- If you need reliability/assurance (safety, compliance, verification): lean on deduction, proof, constraints.
- If you need learning/prediction (forecasting, measurement): lean on statistics/Bayesian, with calibration.
- If you need explanations/diagnosis (root cause, incident response): lean on abduction + causal + mechanistic.
- If you need choices under tradeoffs (strategy, portfolio): lean on decision theory + satisficing + robust.
- If you need buy‑in (policy adoption, change management): add argumentation + rhetoric.
**Deduction.** What it is: If the premises are true and the inference rules are valid, the conclusion must be true. Outputs: Valid entailments; proofs/derivations; contradictions/counterexamples (via refutation). How it differs: Truth-preserving and typically monotonic; it makes explicit what’s already implicit. Best for: Spec checking, compliance logic, crisp “must/shall” implications, formal arguments. Common failure mode: Garbage in (false premises) or missing premises that matter in reality.
**Proof-theoretic reasoning.** What it is: Deduction where the proof object matters (what counts as a proof, how it’s constructed). Outputs: Formal proofs (sometimes machine-checkable); proof transformations. How it differs: More structured than everyday deduction; emphasizes derivability and proof structure. Best for: Formal methods, theorem proving, certified reasoning pipelines. Common failure mode: Proving the wrong theorem (spec mismatch) or proving something irrelevant to outcomes.
**Constructive (intuitionistic) reasoning.** What it is: A proof of existence must provide a construction/witness; some classical principles are restricted. Outputs: Proofs that often correspond to algorithms (“proofs as programs”). How it differs: Stronger link between “proved” and “computable.” Best for: Verified software, protocols, constructive math, “show me the witness.” Common failure mode: Over-constraining when classical reasoning is acceptable and simpler.
**Equational reasoning (rewriting).** What it is: Transform expressions using equalities and rewrite rules while preserving meaning. Outputs: Equivalent forms; normal forms; simplifications; invariants. How it differs: Deduction specialized to symbol manipulation; often the everyday workhorse in math/CS. Best for: Refactoring, optimization proofs, dimensional reasoning scaffolds, invariant manipulation. Common failure mode: Unsound rewrite rules or implicit domain restrictions (division by zero, overflow).
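A minimal sketch of the rewriting move, assuming a toy tuple encoding and made-up rules (not a real term-rewriting engine). Note how `x * 0 -> 0` silently assumes a domain; under IEEE floats it is wrong when `x` is NaN or infinite, which is exactly the failure mode above.

```python
# Toy rewrite system: expressions are nested tuples ("op", left, right).
# Each rule is a (test, rewrite) pair that must preserve meaning on its domain.
RULES = [
    (lambda e: e[0] == "*" and e[2] == 1, lambda e: e[1]),  # x * 1 -> x
    (lambda e: e[0] == "*" and e[2] == 0, lambda e: 0),     # x * 0 -> 0 (not NaN-safe!)
    (lambda e: e[0] == "+" and e[2] == 0, lambda e: e[1]),  # x + 0 -> x
]

def simplify(expr):
    """Rewrite bottom-up until no rule applies (a normal form)."""
    if not isinstance(expr, tuple):
        return expr
    expr = (expr[0], simplify(expr[1]), simplify(expr[2]))
    for test, rewrite in RULES:
        if test(expr):
            return simplify(rewrite(expr))
    return expr

print(simplify(("*", ("+", "x", 0), 1)))  # ("x" + 0) * 1  ->  x
```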
**Model-theoretic reasoning.** What it is: Reason by constructing/analyzing models that satisfy a theory (true in all models vs. true in some). Outputs: Satisfiable/unsatisfiable; countermodels; interpretations. How it differs: Complements proof theory: instead of “derive,” you “build a world where it holds/doesn’t.” Best for: Consistency checks, finding hidden assumptions, generating counterexamples. Common failure mode: Model doesn’t match the intended semantics of the real system.
**Constraint satisfaction and solving.** What it is: Encode requirements as constraints and solve for assignments that satisfy them (or prove none exist). Outputs: A satisfying assignment; unsat certificate; minimal unsat cores; counterexamples. How it differs: It can implement deduction, but the “mode” is solve-by-consistency rather than argument-by-argument inference. Best for: Scheduling, configuration, verification, policy enforcement, feasibility checks. Common failure mode: Poor encoding (missed constraints) → false confidence.
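To make “solve-by-consistency” concrete, here is a minimal backtracking sketch over a made-up toy schedule (the variables, domains, and constraints are all illustrative; production systems would hand this to a SAT/SMT/CP solver):

```python
def solve(variables, domains, constraints, assignment=None):
    """Depth-first backtracking search: returns a satisfying assignment or None."""
    assignment = assignment or {}
    if len(assignment) == len(variables):
        return assignment
    var = next(v for v in variables if v not in assignment)
    for value in domains[var]:
        trial = {**assignment, var: value}
        # check only constraints whose variables are all assigned so far
        ok = all(pred(*[trial[v] for v in vs])
                 for vs, pred in constraints
                 if all(v in trial for v in vs))
        if ok and (result := solve(variables, domains, constraints, trial)):
            return result
    return None

# Hypothetical toy schedule: three meetings, three hour slots.
variables = ["A", "B", "C"]
domains = {v: [1, 2, 3] for v in variables}
constraints = [(("A", "B"), lambda a, b: a < b),    # A must precede B
               (("B", "C"), lambda b, c: b != c)]   # B and C can't share a slot
print(solve(variables, domains, constraints))       # {'A': 1, 'B': 2, 'C': 1}
```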
**Type-theoretic reasoning.** What it is: Use types (including dependent/refinement types) to enforce invariants; “propositions as types” in some systems. Outputs: Type derivations; well-typed programs; compositional guarantees. How it differs: Reasoning is integrated into construction; great for modular correctness. Best for: API design, correctness-by-construction, safe composition of large systems. Common failure mode: Fighting the type system instead of clarifying the spec it encodes.
**Counterexample-guided abstraction refinement (CEGAR).** What it is: Propose an abstraction; check it; if a counterexample appears, refine the abstraction and repeat. Outputs: Either a proof of the property or a concrete counterexample; refined models. How it differs: It’s a loop blending deduction + model checking + refinement, built for scalability. Best for: Verification, security properties, systems where full modeling is too expensive. Common failure mode: Endless refinement loops if the abstraction boundary is poorly chosen.
**Inductive generalization.** What it is: Infer general patterns from observations (“many observed As are Bs → probably most As are Bs”). Outputs: General rules, trends, predictors. How it differs: Not truth-preserving; new data can overturn it. Best for: Learning from experience, early-stage pattern discovery, forming priors. Common failure mode: Overgeneralizing from small/biased samples.
**Frequentist statistical inference.** What it is: Inference about populations from samples via estimators, confidence intervals, tests, error rates. Outputs: Effect estimates + uncertainty statements tied to sampling procedures. How it differs: Typically avoids “probability of hypotheses”; emphasizes long-run properties of procedures. Best for: Experiments, A/B tests, QA, inference under repeated-sampling assumptions. Common failure mode: P-value worship; confusing “no evidence” with “evidence of no effect.”
**Bayesian reasoning.** What it is: Represent degrees of belief as probabilities and update them with evidence (Bayes’ rule). Outputs: Posterior beliefs; predictive distributions; uncertainty-aware forecasts. How it differs: Probability as rational credence management; coherence arguments motivate consistency; Cox-style results are often interpreted as connecting Boolean logic with graded belief. Best for: Integrating prior knowledge + data, diagnosis, forecasting, online learning. Common failure mode: Overconfident priors or “making up” priors without sensitivity analysis.
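A worked micro-example of the update, with assumed numbers (1% base rate, 90% sensitivity, 5% false-positive rate; not from any real test), showing why a positive result can still leave the hypothesis unlikely:

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(disease | positive test) via Bayes' rule."""
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive

# 1% base rate, 90% sensitivity, 5% false positives (illustrative numbers)
print(round(posterior(0.01, 0.90, 0.05), 3))  # 0.154: still probably a false alarm
```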
**Likelihoodist reasoning.** What it is: Compare how well hypotheses predict observed data via likelihoods, without necessarily committing to priors. Outputs: Likelihood ratios; relative evidential support rankings. How it differs: Separates “data support” from “belief after priors”; sits between Bayesian and frequentist idioms. Best for: Model comparison, forensic evidence strength, hypothesis triage. Common failure mode: Ignoring base rates/priors entirely when they matter for decisions.
**Abduction (inference to the best explanation).** What it is: From observations, propose a hypothesis that would best explain them. Outputs: Candidate explanations/models; “best current story.” How it differs: Unlike induction (generalizing frequencies), abduction introduces hidden mechanisms/causes; unlike deduction, it’s not guaranteed. Best for: Hypothesis generation, incident triage, diagnosis, scientific discovery. Common failure mode: “Story bias” (choosing the most appealing explanation, not the best-supported one).
**Analogical reasoning.** What it is: Transfer relational structure from a known domain/case to a new one (often deeper than surface similarity). Outputs: Candidate inferences; adapted solutions; conceptual models/metaphors. How it differs: Often particular → particular transfer; frequently seeds abduction (“maybe it works like…”). Best for: Innovation, design, teaching, cross-domain problem solving. Common failure mode: False analogies (shared surface traits, different causal structure).
**Case-based reasoning.** What it is: Retrieve similar past cases and adapt their solutions. Outputs: Proposed solution justified by precedent; playbook actions. How it differs: More operational than analogy: emphasizes retrieval metrics + adaptation operators + case libraries. Best for: Law (precedent), customer support, clinical decision support, ops playbooks. Common failure mode: Cargo-culting: applying precedent without checking what has changed in context.
**Explanation-based generalization.** What it is: Use an explanation of why a solution works to generalize a reusable rule/plan. Outputs: Generalized strategies with an explanatory justification. How it differs: It generalizes like induction but is guided/validated by deductive explanation. Best for: Turning expert solutions into SOPs; reducing overfitting to anecdotes. Common failure mode: Explanations that are internally elegant but empirically wrong.
**Parsimony (Occam’s razor / MDL).** What it is: Prefer hypotheses that explain the data with fewer assumptions / shorter descriptions, balancing fit against complexity. Outputs: Bias toward simpler models; complexity penalties; regularization choices. How it differs: It’s a selection principle across hypotheses; often paired with abduction and statistics. Best for: Model selection, avoiding overfitting, choosing parsimonious policies. Common failure mode: Oversimplifying when the world is genuinely complex/nonlinear.
**Reference-class forecasting (the outside view).** What it is: Predict by comparing to a base-rate distribution of similar past projects/cases (“what usually happens?”). Outputs: Base-rate forecasts; adjustment factors. How it differs: It’s an inductive method designed to counter the planning fallacy and inside-view optimism. Best for: Project timelines, budgets, risk forecasting, portfolio-level planning. Common failure mode: Choosing the wrong reference class (too broad or too narrow).
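A minimal sketch, assuming a made-up history of overrun ratios and a crude empirical quantile (no interpolation); the point is that the forecast comes from the reference class, not the inside view:

```python
# Past ratio of (actual / planned) duration for "similar" projects -- invented data.
past_overruns = [1.1, 1.3, 0.9, 1.8, 1.4, 2.2, 1.2, 1.6]

def reference_class_forecast(inside_view_estimate, ratios, quantile):
    """Scale the inside-view estimate by an empirical quantile of past overruns."""
    ratios = sorted(ratios)
    idx = min(int(quantile * len(ratios)), len(ratios) - 1)  # crude quantile
    return inside_view_estimate * ratios[idx]

plan = 10  # weeks, the optimistic inside view
print(reference_class_forecast(plan, past_overruns, 0.5))  # 14.0: median outside view
print(reference_class_forecast(plan, past_overruns, 0.8))  # 18.0: buffered commitment
```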
**Fermi estimation.** What it is: Rough quantitative estimates via decomposition and bounding. Outputs: Back-of-the-envelope estimates; upper/lower bounds; sensitivity drivers. How it differs: A heuristic quantitative mode: aims for scale correctness rather than precision. Best for: Early feasibility, sanity checks, identifying dominant terms. Common failure mode: Hidden unit mistakes or implicit assumptions left untested.
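The classic piano-tuner decomposition as a sketch; every input below is an assumed round number, and only the order of magnitude is meant to survive:

```python
# Fermi estimate: piano tuners in a city of one million (all inputs assumed).
population        = 1e6
people_per_piano  = 100       # one piano per ~100 people
tunings_per_year  = 1         # per piano
tunings_per_tuner = 2 * 250   # 2 tunings per workday, ~250 workdays/year

pianos = population / people_per_piano
tuners = pianos * tunings_per_year / tunings_per_tuner
print(f"~{tuners:.0f} tuners")  # ~20: trust the exponent, not the digits
```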
**Probabilistic-logical (statistical relational) reasoning.** What it is: Blend logical structure (relations/rules/quantifiers) with probabilistic uncertainty (e.g., probabilistic programming + constraints). Outputs: Probabilistic inferences over structured worlds; uncertain rule consequences. How it differs: More expressive than plain Bayesian models for relational domains; more uncertainty-aware than pure logic. Best for: Knowledge graphs with uncertainty; uncertain policies; relational prediction. Common failure mode: “Model soup” (too expressive → hard to validate; brittle inference).
**Imprecise probability.** What it is: Represent uncertainty with ranges of probabilities when precision isn’t justified. Outputs: Bounds on beliefs and decisions; sensitivity analyses. How it differs: Less committal than a single prior/posterior; separates “unknown” from “unlikely.” Best for: High-stakes decisions with weak priors; governance/risk; robustness checks. Common failure mode: Paralysis (“ranges are wide, so we can’t decide”) unless paired with decision rules.
**Dempster–Shafer evidence theory.** What it is: Allocate “mass” to sets of possibilities; combine evidence into belief/plausibility intervals. Outputs: Belief + plausibility ranges; fused evidence from multiple sources. How it differs: Can represent partial support for sets (not point hypotheses) more directly than standard probability. Best for: Multi-source fusion, ambiguous evidence, partial identification. Common failure mode: Misusing combination rules when sources aren’t independent.
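A minimal sketch of Dempster’s combination rule over made-up sensor masses; the division by `1 - conflict` is exactly where non-independent or heavily conflicting sources make the rule misbehave:

```python
from itertools import product

def combine(m1, m2):
    """Dempster's rule: combine two mass functions over frozenset focal elements.
    Assumes independent sources and not-total conflict (conflict < 1)."""
    combined, conflict = {}, 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + x * y
        else:
            conflict += x * y          # mass assigned to the empty set
    return {s: v / (1 - conflict) for s, v in combined.items()}

# Two sensors reporting on a fault location in {A, B} (illustrative masses);
# mass on the whole set {A, B} represents ignorance, not 50/50 belief.
m1 = {frozenset("A"): 0.6, frozenset("AB"): 0.4}
m2 = {frozenset("B"): 0.5, frozenset("AB"): 0.5}
print(combine(m1, m2))  # {A}: ~0.43, {B}: ~0.29, {A,B}: ~0.29
```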
**Maximum entropy.** What it is: Choose distributions satisfying known constraints while assuming as little else as possible (maximize entropy). Outputs: Principled default distributions; minimally committed priors under constraints. How it differs: “Least-committal completion” rather than explanation. Best for: Baselines, priors under constraints, principled defaults in modeling. Common failure mode: Constraints are wrong/underspecified → outputs look “objective” but aren’t.
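A minimal sketch on Jaynes’ dice example: the maximum-entropy distribution over faces 1–6 with a constrained mean has exponential-family form p_i ∝ exp(λ·i), and λ can be found by bisection because the implied mean is monotone in λ:

```python
import math

def maxent_die(target_mean, faces=range(1, 7)):
    """Max-entropy distribution over die faces with a given mean (1 < mean < 6)."""
    def mean(lam):
        w = [math.exp(lam * f) for f in faces]
        return sum(f * wi for f, wi in zip(faces, w)) / sum(w)
    lo, hi = -10.0, 10.0
    for _ in range(100):  # bisection on the monotone mean(lam)
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if mean(mid) < target_mean else (lo, mid)
    lam = (lo + hi) / 2
    w = [math.exp(lam * f) for f in faces]
    return [wi / sum(w) for wi in w]

print([round(p, 3) for p in maxent_die(4.5)])  # tilted toward high faces, nothing more
```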
**Ranking theory.** What it is: Replace numeric probabilities with ordinal “degree of disbelief” ranks; update by shifting ranks. Outputs: Ordered plausibility levels; belief dynamics without precise probabilities. How it differs: More structured than defaults, less numeric than Bayes; useful when only an ordering is defensible. Best for: Early-stage hypothesis ranking; reasoning with weak quantification. Common failure mode: Losing important magnitude information when magnitude actually matters.
**Fuzzy logic.** What it is: Truth is a degree (0–1) because predicates have blurred boundaries (“tall,” “near,” “high risk”). Outputs: Degrees of membership/truth; fuzzy rule outputs. How it differs: Fuzzy truth ≠ probability: probability is uncertainty about crisp facts; fuzzy membership is graded applicability. Best for: Control systems, scoring/rubrics, policies with soft thresholds. Common failure mode: Treating fuzzy scores like calibrated probabilities.
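A tiny illustration with made-up boundary values (160–190 cm); the printed 0.5 is a degree of applicability, not a 50% probability:

```python
def tall(height_cm):
    """Graded membership in 'tall': 0 below 160 cm, 1 above 190 cm, linear between.
    (Boundary values are illustrative, not standard.)"""
    return min(1.0, max(0.0, (height_cm - 160) / 30))

# A 175 cm person is tall *to degree 0.5* -- a statement about vagueness,
# not a 50% chance that they are "really" tall.
print(tall(175), tall(158), tall(193))  # 0.5 0.0 1.0
```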
**Many-valued logic.** What it is: More than two truth values; explicitly represent “unknown” or “undefined.” Outputs: Inferences that track indeterminacy rather than forcing a binary choice. How it differs: Often targets incompleteness more than vagueness. Best for: Databases with nulls, partial specs, missingness-aware reasoning. Common failure mode: Conflating “unknown” with “false.”
**Rough sets.** What it is: Approximate a concept by what is definitely in vs. possibly in, given limited features/indiscernibility. Outputs: Lower/upper bounds on classifications; boundary regions. How it differs: Membership arises from the granularity of observation, not degrees of truth. Best for: Interpretability-focused classification; feature-limited domains. Common failure mode: Overconfidence about what’s “definitely” in/out when features are weak.
**Prototype/exemplar categorization.** What it is: Categorize by similarity to prototypes/exemplars rather than strict necessary-and-sufficient definitions. Outputs: Graded category judgments; typicality effects. How it differs: Natural for human categories; complements fuzzy/rough by focusing on similarity geometry. Best for: UX taxonomies, product categorization, human-facing labeling. Common failure mode: Hidden bias in prototypes; category drift over time.
**Qualitative reasoning.** What it is: Reason with qualitative states (“increasing,” “decreasing,” “positive influence”) instead of exact numbers. Outputs: Directional predictions; qualitative constraints; sanity checks. How it differs: Not primarily about uncertainty; it’s coarse modeling for robustness and early design. Best for: Early architecture, feedback reasoning, “does this trend make sense?” checks. Common failure mode: Missing nonlinear thresholds where sign reasoning breaks down.
**Non-monotonic reasoning.** What it is: Adding information can retract previous conclusions (“birds fly” until “penguin”). Outputs: Default conclusions with explicit revision behavior. How it differs: Classical deduction is monotonic; most real knowledge bases aren’t. Best for: Rule systems with exceptions, policies, “normally” knowledge. Common failure mode: Unclear priority rules → inconsistent or surprising behavior.
**Default reasoning.** What it is: Use “normally/typically” rules overridden by more specific information. Outputs: Typical conclusions; exception handling. How it differs: Often categorical (a default applies or it doesn’t) rather than numeric probabilities. Best for: Ontologies, rule engines, SOPs with carve-outs. Common failure mode: Defaults become “facts” and stop being questioned.
**Defeasible reasoning.** What it is: Conclusions can be defeated by counterevidence or stronger rules; tracks priorities/strength. Outputs: Warranted conclusions given competing reasons. How it differs: More explicit about conflict resolution than plain defaults. Best for: Compliance/policy, medical guidelines, conflicting requirements. Common failure mode: Priority schemes that encode politics rather than relevance.
**Belief revision.** What it is: Principles for revising an accepted belief set with new information, especially when it is inconsistent with what’s held. Outputs: Revised belief sets with minimal-change goals. How it differs: Bayesian updating revises degrees; belief revision revises acceptance of propositions. Best for: Knowledge management, requirements evolution, source reconciliation. Common failure mode: “Minimal change” preserves outdated core assumptions.
**Paraconsistent reasoning.** What it is: Tolerate contradictions without explosion (deriving everything). Outputs: Controlled inferences from inconsistent data. How it differs: Instead of immediately repairing inconsistency, it contains it. Best for: Merging inconsistent sources, messy enterprise data, early incident response. Common failure mode: Never resolving contradictions that actually matter for action.
**Formal argumentation.** What it is: Build arguments and counterarguments; compute which claims stand given attack/defense relations. Outputs: Accepted/warranted claims; rationale maps. How it differs: Not just “derive consequences” but “evaluate competing reasons.” Best for: Governance, policy disputes, legal-style reasoning, stakeholder conflicts. Common failure mode: Mistaking “won the debate” for “is true” (argument strength vs. reality).
**Assurance cases (safety cases).** What it is: A structured argument that a system is acceptably safe/secure/reliable, supported by evidence and subclaims (often tree-like). Outputs: Safety case; risk arguments; evidence traceability. How it differs: It’s argumentation constrained by standards and evidence requirements; bridges formal and empirical reasoning. Best for: Safety-critical systems, compliance audits, AI governance documentation. Common failure mode: Paper compliance (beautiful argument, weak evidence).
**Causal inference.** What it is: Identify causal relations and predict the effects of interventions (distinguish P(Y|X) from P(Y|do(X))). Outputs: Causal effect estimates; intervention predictions; adjustment sets. How it differs: Correlation alone can’t resolve confounding or direction; causal reasoning encodes structural assumptions. Best for: Product impact, policy evaluation, root-cause analysis that must guide action. Common failure mode: Hidden confounders; unjustified causal assumptions.
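A minimal numeric sketch of backdoor adjustment on a made-up joint distribution in which Z confounds X and Y and X has no real effect; the observational and interventional answers diverge:

```python
# Synthetic joint P(z, x, y): Z pushes both X and Y; X itself does nothing.
P = {}
for z in (0, 1):
    pz = 0.5
    for x in (0, 1):
        px = 0.8 if x == z else 0.2            # P(x | z)
        for y in (0, 1):
            py = 0.7 if y == z else 0.3        # P(y | z), independent of x
            P[(z, x, y)] = pz * px * py

def p(pred):
    """Probability of the event defined by pred(z, x, y)."""
    return sum(v for k, v in P.items() if pred(*k))

# Observational: P(y=1 | x=1)
p_y_given_x1 = p(lambda z, x, y: x == 1 and y == 1) / p(lambda z, x, y: x == 1)

# Interventional via backdoor adjustment: P(y=1 | do(x=1)) = sum_z P(z) P(y=1 | x=1, z)
p_y_do_x1 = 0.0
for z0 in (0, 1):
    pz = p(lambda z, x, y: z == z0)
    py_xz = (p(lambda z, x, y: z == z0 and x == 1 and y == 1)
             / p(lambda z, x, y: z == z0 and x == 1))
    p_y_do_x1 += pz * py_xz

print(round(p_y_given_x1, 3), round(p_y_do_x1, 3))  # 0.62 vs 0.5: pure confounding
```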
**Causal discovery.** What it is: Infer causal graph structure from data + assumptions (and ideally interventions). Outputs: Candidate causal graphs; equivalence classes; hypotheses for experimentation. How it differs: Causal inference assumes (some) structure; discovery tries to learn it. Best for: Early-stage domains with unclear mechanisms; prioritizing experiments. Common failure mode: Overtrusting discovery outputs without validating the assumptions (faithfulness, no hidden confounding, etc.).
**Counterfactual reasoning.** What it is: Evaluate alternate histories given a causal model. Outputs: Counterfactual outcomes; blame/credit analyses; individualized explanations. How it differs: Needs causal structure beyond pure statistics. Best for: Postmortems, accountability, scenario evaluation, personalized decision support. Common failure mode: Confident counterfactuals from weak models.
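The standard abduction–action–prediction recipe on a deliberately tiny structural model (the XOR mechanism is an invented example, chosen only so the three steps are visible):

```python
# Tiny structural causal model:
#   X := U_x            (exposure)
#   Y := X XOR U_y      (outcome; U_y is unobserved background noise)
# Observed: X=1, Y=1. Query: would Y have been 1 had X been 0?

def mechanism(x, u_y):
    return x ^ u_y

x_obs, y_obs = 1, 1
u_y = x_obs ^ y_obs       # 1. Abduction: infer noise consistent with the observation
x_cf = 0                  # 2. Action: intervene, setting X to its counterfactual value
y_cf = mechanism(x_cf, u_y)  # 3. Prediction: rerun the mechanism with inferred noise
print(y_cf)               # 0: under this model, X made the difference
```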
**Mechanistic reasoning.** What it is: Explain/predict by identifying parts and their interactions. Outputs: Mechanistic explanations; levers; failure modes. How it differs: Stronger than correlation: gives actionable intervention points and generalizes when the mechanisms hold. Best for: Engineering, debugging, safety analysis, biology/medicine. Common failure mode: “Just-so mechanisms” that sound plausible but aren’t validated.
**Diagnostic reasoning.** What it is: Infer hidden faults/causes from symptoms using a fault/causal model plus uncertainty handling. Outputs: Ranked causes; next-best tests; triage plans. How it differs: Often abduction + Bayesian/likelihood updates, constrained by explicit fault models. Best for: Incident response, troubleshooting, quality triage. Common failure mode: Premature closure (locking onto one cause too early).
**Simulation.** What it is: Run an internal model (mental or computational) to predict consequences under scenarios. Outputs: Scenario traces; sensitivity analyses; “what-if” results. How it differs: Not proof-like; it’s generative prediction from a specified model. Best for: Complex systems, policy design, engineering dynamics, capacity planning. Common failure mode: Simulation overconfidence; unvalidated models.
**Systems thinking.** What it is: Reason about interacting components over time: reinforcing/balancing loops, delays, unintended consequences. Outputs: Causal loop diagrams; leverage points; dynamic hypotheses. How it differs: Explicitly multi-level and dynamic; “locally linear” reasoning often fails here. Best for: Org design, markets, reliability engineering, platform ecosystems. Common failure mode: Vague loop stories without measurable hypotheses.
**Means–ends reasoning.** What it is: From goals, derive the actions/subgoals necessary or helpful to achieve them (“to get X, do Y”). Outputs: Action rationales; subgoals; dependency chains. How it differs: About doing, not merely believing; feeds planning and decision theory. Best for: Strategy decomposition, OKRs, operational planning. Common failure mode: Local means become ends (“process is the goal”).
**Decision theory.** What it is: Combine beliefs with preferences/utilities to choose actions (e.g., expected utility). Outputs: Option rankings; policies; explicit tradeoffs. How it differs: Bayesian reasoning updates beliefs; decision theory adds values and consequences. Best for: Portfolio choices, risk decisions, prioritization, pricing. Common failure mode: Utility mismatch (what you optimize isn’t what you truly value).
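A minimal expected-utility comparison with illustrative probabilities and utilities; the arithmetic is trivial, and the hard part in practice is the utility elicitation flagged in the failure mode:

```python
# Each option is a lottery: a list of (probability, utility) pairs (made-up numbers).
options = {
    "ship_now": [(0.6, 100), (0.4, -50)],
    "wait":     [(0.9,  40), (0.1, -10)],
}

def expected_utility(lottery):
    return sum(p * u for p, u in lottery)

for name, lottery in options.items():
    print(name, expected_utility(lottery))      # ship_now 40.0, wait 35.0
print("choose:", max(options, key=lambda o: expected_utility(options[o])))
```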
**Multi-criteria decision analysis.** What it is: Decide with multiple objectives (cost, speed, safety, equity), often using weights, outranking, or Pareto frontiers. Outputs: Tradeoff surfaces; Pareto-efficient sets; transparent scoring models. How it differs: Makes tradeoffs explicit instead of collapsing them implicitly into one objective. Best for: Strategy, procurement, roadmap planning, governance. Common failure mode: Arbitrary weights hiding politics; false precision.
**Planning.** What it is: Compute action sequences or policies achieving goals under constraints and dynamics. Outputs: Plans, policies, contingencies, playbooks. How it differs: Outputs a procedure, not a proposition. Best for: Operations, project plans, incident response. Common failure mode: Plans that ignore uncertainty and execution reality.
**Optimization.** What it is: Choose the best solution relative to an objective, subject to constraints. Outputs: Optimal/near-optimal decisions; tradeoff curves; shadow prices. How it differs: Constraint satisfaction asks “any feasible?”; optimization asks “best feasible.” Best for: Resource allocation, routing, scheduling, design tradeoffs. Common failure mode: Optimizing the wrong objective or ignoring unmodeled constraints.
**Robust (worst-case) reasoning.** What it is: Choose actions that perform acceptably under the worst plausible conditions or adversaries. Outputs: Conservative policies; guarantees; buffer sizing. How it differs: Expected-value reasoning optimizes averages; robust reasoning optimizes guarantees. Best for: Safety-critical systems, security, compliance, tail-risk control. Common failure mode: Overconservatism (leaving too much value on the table).
**Minimax regret.** What it is: Choose the action minimizing worst-case regret (the difference from the best action in hindsight). Outputs: Regret-robust choices; hedged decisions. How it differs: More compromise-oriented than strict worst-case utility; useful under ambiguity. Best for: Strategy under deep uncertainty; irreversible decisions. Common failure mode: Regret framing that ignores asymmetric catastrophic outcomes.
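A minimal regret-table sketch with made-up payoffs for three actions across three scenarios; the chosen action is the one whose worst hindsight gap is smallest:

```python
# Rows: actions; columns: three future scenarios (illustrative payoffs).
payoff = {
    "expand": [90, -30, 10],
    "hold":   [40,  20, 20],
    "divest": [ 0,  30,  5],
}
scenarios = range(3)
best_per_scenario = [max(payoff[a][s] for a in payoff) for s in scenarios]
regret = {a: max(best_per_scenario[s] - payoff[a][s] for s in scenarios)
          for a in payoff}
print(regret)                                  # {'expand': 60, 'hold': 50, 'divest': 90}
print("choose:", min(regret, key=regret.get))  # hold: smallest worst-case regret
```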
**Satisficing.** What it is: Seek a solution that is “good enough” given time/compute/info limits rather than globally optimal. Outputs: Thresholds; stopping rules; acceptable solutions. How it differs: Not “lazy optimization”; it’s rational under constraints. Best for: Real-time ops, fast-moving environments, early product strategy. Common failure mode: Thresholds too low → chronic mediocrity; too high → disguised optimization.
**Value of information.** What it is: Decide which measurements/experiments reduce uncertainty most per unit cost to improve decisions. Outputs: Experiment priorities; instrumentation plans; the “next best question.” How it differs: Meta-decision theory: it picks information-acquisition actions. Best for: R&D prioritization, analytics roadmaps, incident investigation sequencing. Common failure mode: Measuring what’s easy, not what changes decisions.
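A minimal sketch of the expected value of perfect information (EVPI), the ceiling on what any study or forecast could be worth, with illustrative payoffs:

```python
# Two actions, two equally likely states; payoffs in $k (made-up numbers).
p_state = {"up": 0.5, "down": 0.5}
payoff = {("invest", "up"): 100, ("invest", "down"): -60,
          ("hold",   "up"):  10, ("hold",   "down"):  10}
actions = ("invest", "hold")

# Best you can do now, committing before the state is known:
ev_without = max(sum(p_state[s] * payoff[(a, s)] for s in p_state) for a in actions)
# Best you could do with a perfect forecast, choosing per state:
ev_with = sum(p_state[s] * max(payoff[(a, s)] for a in actions) for s in p_state)

print(ev_with - ev_without)  # 35.0: pay no more than this for any information
```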
**Heuristic reasoning.** What it is: Use simple rules that often work; fast but biased. Outputs: Quick decisions/inferences; prioritization shortcuts. How it differs: Less principled but cheaper; should be paired with checks/calibration. Best for: Triage, first drafts, guiding search. Common failure mode: Heuristics become doctrine.
**Search.** What it is: Systematically explore possibilities (tree search, dynamic programming), guided by heuristics and pruning. Outputs: Candidate solutions; best-found solutions; sometimes optimality proofs. How it differs: A computational method that can realize planning, proof, or optimization. Best for: Large combinatorial spaces, automated reasoning, “try options” problems. Common failure mode: Search blowup without good heuristics/structure.
**Game theory.** What it is: Reason when outcomes depend on others’ choices. Outputs: Strategies; incentive analyses; equilibrium reasoning. How it differs: Decision theory treats uncertainty as nature; game theory treats uncertainty as other optimizers. Best for: Negotiation, pricing competition, security, platform rules. Common failure mode: Assuming rationality/common knowledge where it doesn’t exist.
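A minimal sketch that finds pure-strategy Nash equilibria by best-response checking, using the standard Prisoner’s Dilemma payoffs; mutual defection survives even though both players prefer mutual cooperation:

```python
from itertools import product

# Payoffs as (row player, column player); classic Prisoner's Dilemma numbers.
payoffs = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
           ("D", "C"): (5, 0), ("D", "D"): (1, 1)}
moves = ("C", "D")

def is_nash(r, c):
    """(r, c) is Nash iff neither player can gain by deviating unilaterally."""
    row_ok = all(payoffs[(r, c)][0] >= payoffs[(r2, c)][0] for r2 in moves)
    col_ok = all(payoffs[(r, c)][1] >= payoffs[(r, c2)][1] for c2 in moves)
    return row_ok and col_ok

print([rc for rc in product(moves, moves) if is_nash(*rc)])  # [('D', 'D')]
```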
**Theory of mind.** What it is: Infer the beliefs, intentions, and knowledge states of others (nested beliefs). Outputs: Behavior predictions; communication strategies; coordination plans. How it differs: Focuses on beliefs-about-beliefs; often essential for collaboration. Best for: Leadership, UX, teamwork, threat modeling. Common failure mode: Mind-reading with overconfidence; projecting your incentives onto others.
**Negotiation and coalition reasoning.** What it is: Reason about acceptable agreements and coalition formation under constraints and asymmetric information. Outputs: Offers, concessions, coalition structures; Pareto improvements. How it differs: More process- and constraint-oriented than abstract equilibrium analysis; mixes game theory with norms/rhetoric. Best for: Partnerships, sales, cross-team alignment. Common failure mode: Winning the negotiation but losing the relationship/long-term incentives.
**Mechanism design.** What it is: Design rules so that self-interested behavior leads to desired outcomes (align incentives). Outputs: Policies, marketplaces, compensation plans, governance structures. How it differs: Reverse game theory: instead of predicting behavior under rules, choose rules to shape behavior. Best for: Platforms, internal governance, moderation policies, compensation systems. Common failure mode: Goodharting (metrics become targets and get gamed).
**Dialectic.** What it is: Advance understanding through structured opposition: surface tensions, refine concepts, integrate perspectives. Outputs: Refined positions; conceptual synthesis; clarified distinctions. How it differs: Unlike paraconsistency (tolerating contradictory data), dialectic uses tension to improve concepts and frames. Best for: Strategy debates, assumptions audits, resolving conceptual confusion. Common failure mode: Endless debate without convergence criteria.
**Rhetoric.** What it is: Reasoning aimed at belief change and coordination, considering audience, framing, ethos/pathos/logos. Outputs: Persuasive arguments, narratives, positioning. How it differs: Not primarily truth-preserving; it’s about adoption, ideally constrained by truth. Best for: Change management, stakeholder alignment, external communication. Common failure mode: Manipulative persuasion that erodes trust and long-run effectiveness.
**Interpretive (hermeneutic) reasoning.** What it is: Infer meaning and intent from language, documents, norms, and artifacts using context and interpretive canons. Outputs: Interpretations; reconciled meanings; clarified definitions. How it differs: Emphasizes context and ambiguity management, not only formal entailment. Best for: Contracts, policy docs, requirements, qualitative feedback synthesis. Common failure mode: Over-interpreting; reading in intent that isn’t there.
**Narrative reasoning.** What it is: Build coherent, time-ordered explanations connecting events, motives, and causes into a story that supports prediction and action. Outputs: Postmortems, strategy narratives, scenario stories. How it differs: Integrates causal/abductive/rhetorical constraints; the risk is over-coherence (“too neat”). Best for: Incident reports, executive communication, explaining complex causal chains. Common failure mode: Narrative closure crowding out alternative hypotheses.
**Sensemaking and framing.** What it is: Decide “what kind of situation is this?”: build frames that organize signals, priorities, and actions under ambiguity. Outputs: Situation frames; working hypotheses; shared mental models. How it differs: Precedes many other modes: it selects what counts as relevant evidence and what questions to ask. Best for: Crisis leadership, early-stage strategy, ambiguous competitive landscapes. Common failure mode: Locking onto the wrong frame and then reasoning flawlessly inside it.
Modal, temporal, spatial, and normative reasoning (structured possibility, time, space, and “ought”)
**Modal (and epistemic/dynamic) logic.** What it is: Reason with “necessarily,” “possibly,” “knows,” and “after action α…” operators. Outputs: Claims about possibility spaces, knowledge, and action effects. How it differs: Makes distinctions explicit that classical logic can’t express cleanly. Best for: Security (knowledge), planning (actions), reasoning about contingencies. Common failure mode: Treating “possible” as “likely” or “knowable.”
**Deontic reasoning.** What it is: Reason about what is permitted, required, or forbidden; handle norm conflicts and exceptions. Outputs: Norm-consistent action sets; compliance interpretations. How it differs: Normative: about “ought,” not “is.” Often non-monotonic due to exceptions. Best for: Compliance, policy, ethics constraints in systems. Common failure mode: Inconsistent norms; ignoring the priority/lexical ordering of duties.
**Temporal reasoning.** What it is: Reason about ordering, duration, persistence, and change over time. Outputs: Temporal constraints; timelines; persistence assumptions. How it differs: Truth depends on time; persistence defaults introduce non-monotonicity. Best for: Scheduling, planning, forensics, narrative validity. Common failure mode: Hidden assumptions about persistence (“it stays true unless…”).
**Spatial and diagrammatic reasoning.** What it is: Reason using geometry/topology and often diagrams (containment, adjacency, flows). Outputs: Spatial inferences; layouts; flow arguments. How it differs: Uses the representational affordances of diagrams; can be more direct than symbolic propositions. Best for: Architecture diagrams, supply chains, UX, robotics. Common failure mode: Diagram ≠ truth; pictures can hide missing constraints.
**The scientific method.** What it is: A workflow: abduce hypotheses, deduce predictions, test (statistics), revise beliefs/theories. Outputs: Models, predictions, experiments, updated beliefs. How it differs: An integrated pipeline rather than a single inference rule. Best for: R&D, experimentation platforms, measurement culture. Common failure mode: Confirmation bias; underpowered experiments; publication/reporting bias.
**Experimental design.** What it is: Choose interventions, measurements, and sampling to identify effects (randomization, controls, blocking, instrumentation). Outputs: Experiment plans; power analyses; measurement strategies. How it differs: It’s reasoning about how to learn reliably, not just how to analyze after the fact. Best for: A/B testing, causal learning, evaluation of interventions. Common failure mode: Measuring proxies that don’t capture the real outcome (Goodhart risk).
**Engineering design reasoning.** What it is: Iterate from requirements to architectures to prototypes with tradeoffs, constraints, and failure analyses. Outputs: Designs, specs, tradeoff justifications, test plans. How it differs: Inherently multi-objective and constraint-laden; relies on simulation, optimization, safety margins. Best for: Product development, reliability, architecture decisions. Common failure mode: Premature optimization or over-engineering; ignoring maintainability.
**Legal reasoning.** What it is: Apply rules to facts, interpret texts, reason from precedents; uses burdens/standards of proof and adversarial argumentation. Outputs: Legal positions, compliance interpretations, precedent-based arguments. How it differs: Mixes deduction, analogy, and rhetoric under institutional constraints. Best for: Compliance, governance, dispute resolution. Common failure mode: Treating legal compliance as sufficient for ethical legitimacy (or vice versa).
**Moral reasoning.** What it is: Reason about right/wrong and value tradeoffs (consequentialist, deontological, virtue, contractualist, care ethics, etc.). Outputs: Value constraints; ethical justifications; tradeoff statements. How it differs: Normative: it cannot be reduced to facts alone, though it must be informed by them. Best for: AI governance, product harms, trust & safety, people policy. Common failure mode: Values laundering (“it’s ‘ethical’ because it helps our goal”) without principled constraints.
**Historical and investigative reasoning.** What it is: Reconstruct what happened from incomplete sources; triangulate evidence; assess credibility; compare hypotheses. Outputs: Best-available reconstructions; source assessments; confidence statements. How it differs: Strong emphasis on provenance, bias, and alternative explanations under uncertainty. Best for: Audits, incident reconstruction, due diligence, fraud investigations. Common failure mode: Overfitting to a compelling narrative; neglecting disconfirming evidence.
**Clinical/operational diagnosis.** What it is: Blend pattern recognition (cases), mechanistic models, tests, triage, and risk constraints under time pressure. Outputs: Triage decisions; test sequences; interventions with safety checks. How it differs: A real-world hybrid mode optimized for time-critical, high-stakes diagnosis. Best for: SRE/ops, support escalation, medical-style workflows. Common failure mode: Skipping confirmatory tests; treating correlations as mechanisms.
**Metareasoning.** What it is: Decide how to reason: which mode to use, effort allocation, what uncertainties matter, when to stop. Outputs: Deliberation policies; checklists; stopping rules. How it differs: Second-order: the object is your inference process and resource allocation. Best for: High-stakes decisions, avoiding over-analysis, building reliable org processes. Common failure mode: Meta-infinite regress (“thinking about thinking” forever).
**Calibration.** What it is: Track how reliable your beliefs are (forecast scoring, error bars, backtesting). Outputs: Calibrated confidence; forecast accuracy metrics; improved priors. How it differs: First-order uncertainty asks “what is true?”; calibration asks “how good am I at knowing?” Best for: Forecasting culture, risk reviews, decision reviews. Common failure mode: Confusing confidence with competence; never measuring accuracy.
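A minimal scoring sketch using the Brier score on a made-up forecasting track record; tracking a number like this over time is the “how good am I at knowing?” loop:

```python
def brier_score(forecasts):
    """Mean squared error of probability forecasts vs. binary outcomes.
    0 is perfect; 0.25 is what always saying 50% earns."""
    return sum((p - outcome) ** 2 for p, outcome in forecasts) / len(forecasts)

# (stated probability, what actually happened) -- illustrative track record
track_record = [(0.9, 1), (0.8, 1), (0.7, 0), (0.6, 1), (0.9, 0)]
print(round(brier_score(track_record), 3))  # 0.302: confident, not yet calibrated
```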
**Reflective equilibrium.** What it is: Iteratively adjust both principles and case judgments until they cohere. Outputs: Coherent principles + case decisions; updated policies/norms. How it differs: Not deduction from fixed axioms; principles and judgments co-evolve. Best for: Policy design, governance, value-laden decisions. Common failure mode: Coherence achieved by quietly dropping the hard cases.
**Transcendental reasoning.** What it is: Start from an accepted fact and infer what must be true for it to be possible (Kantian style). Outputs: Necessary preconditions; architectural “must-haves.” How it differs: Not empirical induction; it reasons from possibility back to enabling conditions. Best for: Deep framework design, conceptual audits, first-principles constraints. Common failure mode: Mistaking “necessary for my model” for “necessary in reality.”
**Red-teaming (adversarial reasoning).** What it is: Assume the role of an attacker/critic: try to break arguments, systems, incentives, and assumptions. Outputs: Failure modes, exploits, counterexamples, “what could go wrong” maps. How it differs: It’s intentionally antagonistic to your current plan; pairs with robust reasoning and assurance cases. Best for: Security, safety, governance, strategy stress-testing. Common failure mode: Cynicism theater (finding clever attacks without prioritizing real risk).
**Debiasing.** What it is: Structured checks to reduce predictable errors (base rates, alternative hypotheses, premortems, disconfirmation search). Outputs: Checklists; improved judgments; documented uncertainty. How it differs: Not a new inference rule; it’s a discipline for selecting and constraining inference. Best for: High-stakes decisions, leadership reviews, forecasting, incident postmortems. Common failure mode: Ritualized checklists that aren’t actually used to change conclusions.
High-quality real-world work usually looks like one of these composites:
- Science / product experimentation: abduction → deduction → experimental design → statistical test → belief revision → calibration.
- Incident response: abductive diagnosis + mechanistic model + tests (VoI) + satisficing under time pressure + postmortem counterfactuals.
- Policy / governance: causal inference (“what happens if…”) + decision theory (tradeoffs) + moral reasoning (constraints) + argumentation (stakeholder conflict) + rhetoric (adoption) + assurance case (traceable evidence).
- Engineering / safety: constraints + proof/verification + robust reasoning + red-teaming + safety case + continuous monitoring + calibration loops.
- Strategy under uncertainty: reference-class forecasting + scenario simulation + minimax regret/robust + negotiation/game theory + sensemaking updates.
If you tell me the kinds of problems you’re actually facing (forecasting? root-cause? product strategy? compliance? security? org alignment?), I can propose a concrete “mode stack”: which modes to combine, in what order, and what artifacts (docs, metrics, tests, checklists) to produce so the reasoning reliably converts into execution.