Is there public information about how the OASST LLaMA models were trained? In particular, is [oasst-rlhf-2-llama-30b-7k-steps-xor](https://huggingface.co/OpenAssistant/oasst-rlhf-2-llama-30b-7k-steps-xor) derived from [oasst-sft-7-llama-30b-xor](https://huggingface.co/OpenAssistant/oasst-sft-7-llama-30b-xor), and if so, what hyperparameters were used for the RLHF stage?