Welcome to open-r1 Discussions! #36
-
Please share your negative results or failure cases so that others don't waste resources repeating the same mistakes.
-
Is this discussion panel only for DeepSeek R1?
-
I want to know how to improve the generality of reward models. It is important for RL!
-
I am interested in two things:
-
Hi folks, I come with a decent understanding of deep learning and transformer models. I have a desktop with an RTX 3060 and an M1 Max with 64 GB. I just tried out both the 7B and 70B versions of DeepSeek R1 with Ollama. Mind-blowing stuff, and it makes me want to learn as much as possible. How can I contribute?
-
Dear all: I am looking to learn and contribute in the process. There are multiple components here (RL, SFT, data, etc.). If I were to pick one area for contribution, it would be around "Distillation". Do let me know how we can go about this.
-
Could you please add a method to export the model to a format supported by Ollama?
-
Awesome work on this project! I didn't know about your project and was working on my own based on the research paper, so it's really exciting to come across your approach. I especially enjoyed the accuracy- and format-based reward functions; the way you've incorporated strict LaTeX parsing and format validation is a solid way to ensure both accuracy and structure. For my project, I took a slightly different approach to reward calculation using cross entropy. It provides a more nuanced signal by factoring in token-level probabilities, which I found works really well for fine-tuning models where precision is key. It's a nice complement to the binary rewards in your method. Thanks for sharing this, it's always inspiring to see the creative ways others solve similar challenges! ❤️
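For anyone curious what that looks like in practice, here is a minimal sketch contrasting a binary accuracy reward with a cross-entropy-style reward, assuming you already have the policy's per-token log-probabilities on a reference solution; the function names and signatures are illustrative, not part of the open-r1 codebase:

```python
import math

def binary_accuracy_reward(completion_answer: str, gold_answer: str) -> float:
    """Sparse 0/1 reward: did the parsed answer match the gold answer?"""
    return 1.0 if completion_answer.strip() == gold_answer.strip() else 0.0

def cross_entropy_reward(token_logprobs: list[float]) -> float:
    """Dense reward from the policy's log-probabilities on a reference solution.

    Returns the geometric-mean token probability in (0, 1], so near-miss
    generations still receive a graded signal instead of a flat 0.
    """
    if not token_logprobs:
        return 0.0
    avg_nll = -sum(token_logprobs) / len(token_logprobs)  # mean cross entropy (nats)
    return math.exp(-avg_nll)
```

For example, `cross_entropy_reward([-0.1, -0.3, -0.2])` is about 0.82, while the binary reward would give 0.0 for any wrong final answer regardless of how close the reasoning was.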
-
Great project
-
Hi, I am a tech lead from Decerno in Göteborg, Sweden 👋 Just wanted to bring up a question regarding distillation of reasoning data from DeepSeek-R1. Correct me if I am wrong, but looking into the DeepSeek-V3 model license, it seems to me that it would make DeepSeek the owner of open-r1, according to these two lines:
Just wanted to highlight that before any distillation process gets started, and sorry if it ruins any plans.
-
First, I am interested in speeding up GRPO training. Can somebody break down the training time and explain why it takes so long, so that the community can speed it up?
-
Hey there, I'm a hobbyist AI tinkerer, currently experimenting with (quantum) pseudocode cognito-constructs (what I call "short-term brains") and thoughtstreams. My idea: to explore a kind of mind-metamorphosis through these constructs, allowing an AI to iteratively transform its reasoning process.

Example Cognito-Construct (Prototype) (not final, just a rough draft of the concept):

The Core Idea
The goal is to inject different prompts at … But here's where it gets really interesting: construct-metamorphosis, where the AI questions itself, transforms into another cognito-construct, and follows a different reasoning path.

Potential Outcomes
By chaining together:
…we might witness emergent, highly dynamic AI behaviors.

Specific Example: Protocol Cycle
Step 1: Initial Construct
Step 2: Prompt & Reasoning
Prompt: "Find the degree for the given field extension Q(sqrt(2), sqrt(3), sqrt(18)) over Q." Reasoning:
Step 3: Self-Prompt
AI questions itself: "Could there be a more optimal approach using field tower arguments?"
Step 4: Construct-Metamorphosis
New Construct:
Step 5: Second Reasoning Process
Prompt: "Analyze the field tower structure in Q(sqrt(2), sqrt(3)) over Q." New Reasoning Path:

This structured cycle illustrates how an AI system could evolve its reasoning through self-inquiry and transformation. This is just my two cents, but I'd love to hear your thoughts! TODO: initially create constructs from prompts; check the example for correctness.
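For reference, the degree in that example can be checked with the tower argument hinted at in Step 3; here is a short worked version (standard field theory, not tied to any particular construct):

$$\sqrt{18} = 3\sqrt{2} \implies \mathbb{Q}(\sqrt{2}, \sqrt{3}, \sqrt{18}) = \mathbb{Q}(\sqrt{2}, \sqrt{3})$$

and since $\sqrt{3} \notin \mathbb{Q}(\sqrt{2})$,

$$[\mathbb{Q}(\sqrt{2}, \sqrt{3}) : \mathbb{Q}] = [\mathbb{Q}(\sqrt{2}, \sqrt{3}) : \mathbb{Q}(\sqrt{2})]\,[\mathbb{Q}(\sqrt{2}) : \mathbb{Q}] = 2 \cdot 2 = 4.$$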
-
Can anyone help me understand the RL part that is done after the cold start with CoT fine-tuning?
-
👋 Welcome!
We’re using Discussions as a place to connect with other members of our community. We hope that you:
- Ask questions you’re wondering about.
- Share ideas.
- Engage with other community members.
- Welcome others and are open-minded. Remember that this is a community we build together 💪.
To get started, comment below with an introduction of yourself and tell us about what you do with this community.