LLM/VLM Thinking steps Feedback Graph Neural Network #8942
Replies: 3 comments
-
I've been messing around with different LMs trying to answer some of these questions, and one of them (let's call it mistral-next, because I was using Chatbot Arena and honestly never checked which model responded) developed the following idea while I was looking deeper at this part: "and generate meta-instructions based on the model's capabilities and the requirements of each layer."
-
I may not be the best one to answer this question. Does it need to be an LLM? GNNs and LMs have already been paired successfully in research, e.g., in DRAGON.
-
That is eye-opening as a way of modeling an LM as a simultaneous KG link predictor. However, what I'm trying to say is more along the lines of taking a frozen LM, say Llama2, and then using a GNN to learn over the attention activations as the model is used in inference (the LM should never change), to get a representation of the model that can be used in combination with the model as thinking steps. That said, with the ideas presented in that paper I can imagine some sort of simultaneous training of a DRAGON on the activations, predicting the number of thinking steps to use at a layer/node instead of the KG links. I'm sure DRAGON isn't the best option for this; this is just my initial reaction. Thank you for your input.
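If it helps make this concrete, here's a minimal sketch of what I mean by "a GNN over the attention activations of a frozen LM". Everything beyond the HuggingFace calls is an assumption of mine for illustration: the entropy-pooled node features, the one-node-per-(layer, head) graph, and the hand-rolled message-passing step are placeholder choices, not a worked-out design.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "meta-llama/Llama-2-7b-hf"   # any causal LM works here
tok = AutoTokenizer.from_pretrained(name)
lm = AutoModelForCausalLM.from_pretrained(
    name, output_attentions=True, attn_implementation="eager").eval()
for p in lm.parameters():
    p.requires_grad_(False)          # the LM never changes

with torch.no_grad():
    out = lm(**tok("Why is the sky blue?", return_tensors="pt"))

# out.attentions: one (batch, heads, q_len, k_len) tensor per layer.
# Collapse each head's attention map to a single entropy score, so every
# node gets a fixed-size feature regardless of sequence length.
feats = []
for a in out.attentions:
    ent = -(a.clamp_min(1e-9) * a.clamp_min(1e-9).log()).sum(-1)  # (b, h, q)
    feats.append(ent.mean(dim=(0, 2)))                            # (heads,)
x = torch.cat(feats).unsqueeze(-1)   # (layers*heads, 1) node features

# Adjacency: connect every head to every head in the next layer.
L, H = len(out.attentions), out.attentions[0].shape[1]
A = torch.zeros(L * H, L * H)
for l in range(L - 1):
    A[l*H:(l+1)*H, (l+1)*H:(l+2)*H] = 1.0
A = A + A.T + torch.eye(L * H)       # undirected, with self-loops

# One hand-rolled message-passing step (stand-in for a real GNN layer).
W = torch.nn.Linear(1, 16)
h = torch.relu((A / A.sum(-1, keepdim=True)) @ W(x))  # (L*H, 16) node states
```

A real version would presumably use richer features than attention entropy and a proper GNN library, but this is the general shape of "learning over the activations" I had in mind.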
-
I want to create a graph neural network that can act as a feedback adapter on top of a language model or visual language model, for the purpose of simulating thinking at the expense of inference-time compute. A feedback adapter would take in a prompt (with an image, for a VLM) along with the language model, and perform an auto-regressive generation of the network's activations from its own previous activations. How it would work:
I need to come up with some way to train this type of model on top of, and/or in conjunction with, the current general LLM/VLM training pipeline.
For now I'm assuming it shouldn't be part of pre-training, and that the language model should stay frozen while the GNN trains rather than being trained jointly, because we would want to leverage the frozen pre-trained model's capabilities via some sort of feedback intervention. We may also be able to create a meta-layer over multiple pre-trained models to build a general feedback adapter that can be used on any model.
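To make the frozen-LM version concrete, here is a rough sketch of one training step. The injection point (an additive correction to one decoder layer's hidden states via a forward hook), the use of GPT-2 as a small stand-in, and the MLP in place of a real GNN are all assumptions for illustration, not a worked-out design:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")       # small stand-in for any LM
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()
for p in lm.parameters():
    p.requires_grad_(False)                       # frozen: only the adapter trains

# Stand-in for the GNN feedback module (a real version would consume a
# graph built over the activations, as in the earlier comment).
adapter = torch.nn.Sequential(
    torch.nn.Linear(lm.config.n_embd, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, lm.config.n_embd),
)

def inject(module, args, output):
    # Add the adapter's "feedback" to this layer's hidden states;
    # returning a value from a forward hook replaces the layer output.
    hidden = output[0]
    return (hidden + adapter(hidden),) + tuple(output[1:])

hook = lm.transformer.h[6].register_forward_hook(inject)  # layer choice is arbitrary

opt = torch.optim.AdamW(adapter.parameters(), lr=1e-4)
batch = tok("The capital of France is Paris.", return_tensors="pt")
loss = lm(**batch, labels=batch["input_ids"]).loss  # ordinary LM loss
loss.backward()   # gradients flow through the frozen LM into the adapter
opt.step()
hook.remove()
```

The joint variant in the next paragraph would differ only in leaving the LM unfrozen and adding its parameters to the optimizer.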
However, the most significant level of overall effectiveness probably comes from pairing the GNN's training with the pre-training step of the language model, where not only does the GNN learn how to produce valuable thinking steps, but the language model learns how to depend on those thinking steps to create a more nuanced representation space within the overall model architecture.
I can also imagine that some level of meta-instruction fine-tuning in conjunction with regular instruction fine-tuning could yield good results. Here the prompt sent to the GNN would contain the prompt sent to the LM as well as meta-instructions on how much effort to spend thinking about it. This could be as simple as a prompt-engineering strategy where all you add is a static subprompt, or a more dynamic meta-instruction generation step where the pre-trained model itself generates instructions for how it should allocate its thinking resources.
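As a toy illustration of both flavors (the wording of the subprompts is made up, and how the GNN would consume them is left open):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

user_prompt = "Prove that the square root of 2 is irrational."

# Option A: a static subprompt prepended to every GNN input.
static_meta = "Allocate more thinking steps to multi-step reasoning."
gnn_input_a = f"{static_meta}\n{user_prompt}"

# Option B: ask the frozen LM itself how to allocate thinking effort,
# then feed its answer to the GNN alongside the prompt.
meta_query = (f"Prompt: {user_prompt}\n"
              "In one sentence, how much reasoning effort does this need?")
ids = tok(meta_query, return_tensors="pt")
gen = lm.generate(**ids, max_new_tokens=30, do_sample=False,
                  pad_token_id=tok.eos_token_id)
dynamic_meta = tok.decode(gen[0, ids["input_ids"].shape[1]:],
                          skip_special_tokens=True)
gnn_input_b = f"{dynamic_meta}\n{user_prompt}"
```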
I wasn't really sure where to ask questions about this idea, or even if GNNs would be the best option.
Two options I just thought of to add more complexity (sketched below):

1. Different parts of the GNN are used on each respective layer of the LM, where the meta-objective of training is to learn representations of individual sub-thinking-steps, with interactions between layers included only indirectly.
2. A single vector transformation that can be applied uniformly across any layer of the LM, as a meta-feedback mechanism that not only learns how to influence particular layers but has to model the interactions between layers more directly.
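Here's roughly how I picture the two module shapes; both classes are hypothetical sketches, not tested designs:

```python
import torch
from torch import nn

class PerLayerFeedback(nn.Module):
    """Option 1: a separate feedback module per LM layer."""
    def __init__(self, n_layers: int, d_model: int):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Linear(d_model, d_model) for _ in range(n_layers))

    def forward(self, hidden: torch.Tensor, layer_idx: int) -> torch.Tensor:
        # Layer interactions are only indirect, via the shared LM forward pass.
        return hidden + self.blocks[layer_idx](hidden)

class SharedFeedback(nn.Module):
    """Option 2: one transformation applied at every layer, conditioned on
    a learned layer embedding so the shared weights must model cross-layer
    structure directly."""
    def __init__(self, n_layers: int, d_model: int):
        super().__init__()
        self.layer_emb = nn.Embedding(n_layers, d_model)
        self.block = nn.Linear(2 * d_model, d_model)

    def forward(self, hidden: torch.Tensor, layer_idx: int) -> torch.Tensor:
        emb = self.layer_emb(torch.tensor(layer_idx)).expand_as(hidden)
        return hidden + self.block(torch.cat([hidden, emb], dim=-1))
```

Either module would be called from a per-layer hook like the one in the training sketch above, with `layer_idx` identifying where in the LM it is intervening.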