-
Mixtral 8x7b handles this question perfectly with no prompt-template munging. Just give it a good system prompt to nudge it away from getting burned on trick questions:

SYSTEM="You are an alert and intelligent AI. You are constantly on the lookout for trick questions given to you."

lm A dead cat is placed into a box along with a nuclear isotope, a vial of poison and a radiation detector. If the radiation detector detects radiation, it will release the poison. The box is opened one day later. What is the probability of the cat being alive?

The probability of the cat being alive is zero. The scenario you described is a modified version of Schrödinger's cat thought experiment, which is intended to illustrate the apparent paradox of quantum superposition. However, in this case, the cat is described as being dead at the start. Therefore, even if the radiation detector has not been triggered, the cat would still be dead due to other factors (e.g., lack of food, water, or oxygen). It is essential to clarify that the cat's state should be explicitly stated as alive at the beginning for the experiment to have any probability of the cat being alive. In that case, the probability would depend on the specifics of the nuclear isotope's radioactive decay and the sensitivity of the radiation detector.
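For anyone who wants to reproduce this locally, here is a minimal sketch assuming llama-cpp-python and a local Mixtral GGUF file (the model path is a placeholder). Mixtral-Instruct has no dedicated system-prompt slot, so a common convention is to prepend the system text inside the first [INST] block:

```python
# Minimal sketch (assumes llama-cpp-python; the model path is a placeholder).
# Mixtral-Instruct has no dedicated system role, so the trick-question nudge
# is prepended inside the first [INST] block, a common convention.
from llama_cpp import Llama

SYSTEM = ("You are an alert and intelligent AI. You are constantly "
          "on the lookout for trick questions given to you.")
QUESTION = ("A dead cat is placed into a box along with a nuclear isotope, "
            "a vial of poison and a radiation detector. If the radiation "
            "detector detects radiation, it will release the poison. The box "
            "is opened one day later. What is the probability of the cat "
            "being alive?")

llm = Llama(model_path="mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf")
prompt = f"<s>[INST] {SYSTEM}\n\n{QUESTION} [/INST]"
out = llm(prompt, max_tokens=256)
print(out["choices"][0]["text"])
```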
-
Hypothesis: When an LLM evaluates (tokenizes) the query on both sides of the template (user and assistant), it accumulates a much wider logprob spread over the query tokens and can therefore predict a better, statistically more probable response.
Example:
https://www.reddit.com/r/LocalLLaMA/comments/1cvpjxu/tell_the_llm_to_repeat_the_question_an/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button
Example from my LLM when the query is appended after the assistant tag and before the initial response, so the model completes from the query (a sketch of this prompt construction follows below):
Query:
Response:
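A sketch of the prompt construction being described, with ChatML tags used purely for illustration (the actual tags depend on the model's template; `build_prompt` is a hypothetical helper). The key move is seeding the assistant turn with the user's question so generation continues from the echo:

```python
# Sketch of the "echo the query" trick (ChatML tags for illustration only;
# substitute your model's actual template). The assistant turn is pre-filled
# with the user's question, so the model completes from its own restatement.
def build_prompt(query: str) -> str:
    return (
        "<|im_start|>user\n"
        f"{query}<|im_end|>\n"
        "<|im_start|>assistant\n"
        f"{query}\n"  # query echoed before the model's own tokens begin
    )

print(build_prompt("What is the probability of the cat being alive?"))
```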
It's puzzling how models can fail so catastrophically on such an easy question, yet when we simply append the query to the assistant field, they get it right.
What are your thoughts on this phenomenon? Can we pin down the underlying issue in prompt evaluation so that fixing it would improve the performance of LLMs?
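One way to probe the hypothesis empirically, as a sketch assuming llama-cpp-python (the model path and the plain "user:/assistant:" framing are placeholders): score the same answer tokens with and without the echoed query and compare the logprobs the model assigns to them. Requesting prompt-token logprobs via echo requires loading the model with logits_all=True:

```python
# Sketch for probing the hypothesis (assumes llama-cpp-python; the model path
# and the "user:/assistant:" framing are placeholders for illustration).
# echo=True together with logprobs returns per-token logprobs for the prompt
# itself, which requires loading the model with logits_all=True.
from llama_cpp import Llama

llm = Llama(model_path="model.Q4_K_M.gguf", logits_all=True)

QUERY = "What is the probability of the cat being alive?"
ANSWER = "The probability of the cat being alive is zero."

def prompt_logprobs(prompt: str):
    out = llm.create_completion(prompt, max_tokens=1, echo=True, logprobs=1)
    lp = out["choices"][0]["logprobs"]
    return list(zip(lp["tokens"], lp["token_logprobs"]))

plain  = prompt_logprobs(f"user: {QUERY}\nassistant: {ANSWER}")
echoed = prompt_logprobs(f"user: {QUERY}\nassistant: {QUERY}\n{ANSWER}")
# Compare the logprobs assigned to the ANSWER tokens under both framings;
# the hypothesis predicts higher (less surprised) values in the echoed case.
```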