Question regarding Positional Information and Internal World Models in CTM for Maze Solving #4
-
Hi Tao.

Thank you for your kind words and question!

We discuss this in Section 4.4 of the paper, but I'll expand and reiterate for you, because this is a really crucial question that I believe will be important moving forward with research in this space. I don't quite know if I can speak directly to your TPA perspective, but I shall share my intuition and hopefully that helps. Let us consider an example solve; a few observations:
Gazing beyond the training path horizon

Interestingly, we noticed that even after it has 'looked' all the way to the end of the path it is predicting (e.g., length 100 in the paper; a hyper-parameter choice), it continues to gaze further along the path. If it 'could' output a longer path, it likely would. Perhaps a future experiment should entail taking the pretrained maze model, extending the number of internal ticks (which is crucial to observe this emergent phenomenon), and fine-tuning it; a minimal sketch of that setup follows this reply.

The attention heads are more complex than they seem

This is also a crucial point, and you can see it in the demo here: https://pub.sakana.ai/ctm/ . The CTM's attention is not simply looking beyond the path (although on average it seems to be, based on our analysis and the visualizations). Instead, it is looking at multiple locations, and some attention heads seem to be gathering more global perspectives. Set the animation speed to be very slow and watch the attention heads (below the maze) carefully to observe this; a plotting sketch for this kind of per-head inspection is also included below.

The patterns 'mature' over time

We did not cover this in the paper, but you can confirm it for yourself: as the CTM learns, what the attention pattern is doing changes over time. On several occasions we observed a sort of 'double take' phenomenon where the CTM would look down a path and then double back to fix any mistakes. As it gets better and learns more, it discards this wasteful process, but it is quite interesting to watch it learn. Further, in some instances we observed a 'backwards' solve, similar to some of the parity results.

Note that these points are based on my intuition and my opinions, and are not necessarily proven concretely or shared by everyone on the team.
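On the fine-tuning idea above, here is a minimal, self-contained sketch of the underlying principle in plain PyTorch, not the CTM codebase itself: because the recurrent weights are shared across internal ticks, a model pretrained with T ticks can be rebuilt with a larger tick count and still load the pretrained weights directly. `TinyRecurrentSolver` and all of its details are illustrative stand-ins, not names from the repository.

```python
# Sketch only: the tick count is a loop bound, not a parameter, so a
# checkpoint trained at 75 ticks loads cleanly into a 150-tick model.
import torch
import torch.nn as nn

class TinyRecurrentSolver(nn.Module):
    """Stand-in for a tick-based model: one shared cell unrolled T times."""
    def __init__(self, dim: int = 64, ticks: int = 75):
        super().__init__()
        self.ticks = ticks                # loop count only; no weights depend on it
        self.cell = nn.GRUCell(dim, dim)  # shared across every internal tick
        self.head = nn.Linear(dim, 4)     # e.g. move logits (N/E/S/W) per tick

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = torch.zeros_like(x)
        outputs = []
        for _ in range(self.ticks):       # unroll the internal ticks
            h = self.cell(x, h)
            outputs.append(self.head(h))
        return torch.stack(outputs, dim=1)  # (batch, ticks, 4)

# "Pretrain" at 75 ticks, then rebuild at 150 ticks and reuse the weights:
# the state dicts match exactly because the tick count is not a parameter.
pretrained = TinyRecurrentSolver(ticks=75)
extended = TinyRecurrentSolver(ticks=150)
extended.load_state_dict(pretrained.state_dict())
print(extended(torch.randn(2, 64)).shape)  # torch.Size([2, 150, 4])
```

In the real experiment one would do the analogous thing with the CTM: rebuild it with more internal ticks, load the pretrained maze checkpoint, and fine-tune with the usual training loop.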
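And for watching the attention heads slowly outside of the demo page, one could render each head's attention map over the maze grid, one frame per internal tick. The sketch below uses random stand-in weights with an assumed `(ticks, heads, H, W)` shape; real values would come from whatever attention module the trained model exposes.

```python
# Sketch: per-head attention maps over a maze grid, one frame per tick.
# The weights here are random stand-ins, not real model outputs.
import matplotlib.pyplot as plt
import numpy as np

ticks, heads, H, W = 20, 4, 39, 39              # illustrative sizes only
attn = np.random.rand(ticks, heads, H, W)       # stand-in attention weights
attn /= attn.sum(axis=(-2, -1), keepdims=True)  # normalise each map to sum to 1

fig, axes = plt.subplots(1, heads, figsize=(3 * heads, 3))
for t in range(ticks):                          # step through internal ticks
    for h, ax in enumerate(axes):
        ax.clear()
        ax.imshow(attn[t, h], cmap="viridis")
        ax.set_title(f"head {h}, tick {t}")
        ax.axis("off")
    plt.pause(0.5)                              # slow playback, like the demo's slow setting
plt.show()
```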