
Commit f4d3d23

Added year to blog dates
1 parent: 826ceab


4 files changed, +8 -8 lines changed


_includes/sections/blog.html

Lines changed: 1 addition & 1 deletion
```diff
@@ -36,7 +36,7 @@ <h2>{{ site.blog-title }}</h2>
 </div>
 <div class="article-title">
 <h3 class="title"><a href="{{ site.baseurl }}{{ post.url }}">{{ post.shorttitle }}</a></h3>
-<h4 class="category">{{ post.date | date: "%b" }} {{ post.date | date: "%d" }} </h4>
+<h4 class="category">{{ post.date | date: "%b" }} {{ post.date | date: "%d" }}, {{ post.date | date: "%Y" }} </h4>
 </div>
 </header>
 
```
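Jekyll's Liquid `date` filter uses strftime-style format codes, so the effect of this change can be previewed with Python's `strftime`. A minimal sketch, assuming a hypothetical post date of 2024-05-30:

```python
from datetime import date

# Hypothetical post date, for illustration only.
post_date = date(2024, 5, 30)

# Old template: {{ post.date | date: "%b" }} {{ post.date | date: "%d" }}
print(post_date.strftime("%b %d"))      # -> May 30

# New template appends ", %Y" for the year.
print(post_date.strftime("%b %d, %Y"))  # -> May 30, 2024
```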

_layouts/post.html

Lines changed: 1 addition & 1 deletion
```diff
@@ -15,7 +15,7 @@
 </p>
 
 <h3 class="title">{{ page.title }}</h3>
-<h4 class="category">{{ page.date | date: "%b" }} {{ page.date | date: "%d" }}</h4>
+<h4 class="category">{{ page.date | date: "%b" }}, {{ page.date | date: "%d" }} {{ page.date | date: "%Y" }}</h4>
 </div>
 </header>
 
```
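Note that this hunk is not consistent with the other two templates in this commit: the comma here lands after the month rather than after the day, so the same hypothetical date would render as `May, 30 2024` instead of `May 30, 2024`.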

_posts/2024-05-30-counting.md

Lines changed: 5 additions & 5 deletions
```diff
@@ -18,7 +18,7 @@ One of our goals at Polymathic-AI is to utilize the recent advances in machine l
 
 In this blog post, we summarize a recent paper which is part of an ongoing effort in our team in this direction. In this work, we introduced a new toy problem specifically designed to advance the interpretability of Transformer models in quantitative and scientific contexts. This task, called **contextual counting**, requires the model to identify a specific region of interest within a dataset and perform accurate counting. As such, it simulates scenarios where precise localization and subsequent computation are critical, such as in object detection or region-based analysis in scientific data.
 
-
+<br>
 ## Introducing the Contextual Counting Task
 <br>
 In this task, the input is a sequence composed of zeros, ones, and square bracket delimiters: `{0, 1, [, ]}`. Each sample sequence contains ones and zeros with several regions marked by the delimiters. The task is to count the number of ones within each delimited region. For example, given the sequence:
```
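The hunk above is cut off just before the post's own example sequence, so that example is not reproduced here. As a stand-in, here is a minimal sketch of how such task instances could be generated and labelled; the region sizes, padding, and sampling scheme are illustrative assumptions, not the paper's data pipeline:

```python
import random

def make_sample(n_regions=2, region_len=5, pad_len=3):
    """Build a sequence over {0, 1, [, ]} with delimited regions (illustrative)."""
    seq = []
    for _ in range(n_regions):
        seq += random.choices("01", k=pad_len)      # background tokens
        seq.append("[")
        seq += random.choices("01", k=region_len)   # region contents
        seq.append("]")
    return seq

def count_ones_per_region(seq):
    """Ground-truth labels: number of 1-tokens inside each [...] region."""
    counts, inside, current = [], False, 0
    for tok in seq:
        if tok == "[":
            inside, current = True, 0
        elif tok == "]":
            inside = False
            counts.append(current)
        elif inside and tok == "1":
            current += 1
    return counts

seq = make_sample()
print("".join(seq))                # e.g. 010[11010]100[00110]
print(count_ones_per_region(seq))  # e.g. [3, 2]
```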
```diff
@@ -55,7 +55,7 @@ Toy problems serve as simplified models that help us understand complex systems.
 
 Moreover, toy problems are instrumental in benchmarking and testing new theories and methods. They act as proving grounds for hypotheses about model behavior and performance. For instance, by using toy problems, researchers can quickly iterate on models and interpretability techniques, refining their approaches before deploying them on more sophisticated and critical tasks. This iterative process accelerates the development of robust methods that can be confidently applied in high-stakes domains like healthcare, finance, and scientific research. In the context of Transformers, toy problems help uncover how different architectures and encoding methods influence model performance and interpretability, providing essential knowledge for advancing machine learning technologies.
 
-
+<br>
 ## Theoretical Insights
 <br>
 We provide some theoretical insights into the problem, showing that a Transformer with one causal encoding layer and one decoding layer can solve the contextual counting task for arbitrary sequence lengths and numbers of regions.
```
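One way to build intuition for this result (a toy illustration, not the paper's construction): with no positional encoding, a causal head that attends uniformly over its prefix computes the running fraction of 1-tokens, and rescaling by the prefix length recovers the running count; restricting attention to the current delimited region is the extra step a full construction must supply.

```python
import numpy as np

# Toy illustration, not the paper's construction: uniform causal attention
# over a 0/1 value signal returns the running fraction of ones; rescaling
# by the prefix length recovers the running count.
seq = np.array([0, 1, 1, 0, 1])
prefix_len = np.arange(1, len(seq) + 1)
running_fraction = np.cumsum(seq) / prefix_len  # uniform-attention average
print(running_fraction * prefix_len)            # [0. 1. 2. 2. 3.]
```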
```diff
@@ -88,7 +88,7 @@ For non-causal (bidirectional) Transformers, the task is more complicated:
 These propositions highlight the difficulties non-causal Transformers face in solving this task.
 
 
-
+<br>
 ## Experimental Results
 <br>
 The theoretical results above imply that exact solutions exist but do not clarify whether or not such solutions can indeed be found when the model is trained via SGD. We therefore trained various Transformer architectures on this task. Inspired by the theoretical arguments, we use an encoder-decoder architecture, with one layer and one head for each. A typical output of the network is shown in the following image where the model outputs the probability distribution over the number of ones in each region.
```
```diff
@@ -177,12 +177,12 @@ If you made it this far, here is an interesting bonus point:
 * Even though the model has access to the number n through its attention profile, it still does not construct a probability distribution that is sharply peaked at n. As we see in the above figure, as n gets large, this probability distribution gets wider. This, we believe is partly the side-effect of this specific solution where two curves are being balanced against each other. But it is partly a general problem that as the number of tokens that are attended to gets large, we need higher accuracy to be able to infer n exactly. This is because the information about n is coded non-linearly after the attention layer. In this case, if we assume that the model attends to BoS and 1-tokens equally the output becomes:
 
 <p align="center">
-<img src="/images/blog/counting/n_dependence.png" alt="The n-dependence of the model output." width="55%" style="mix-blend-mode: darken;">
+<img src="/images/blog/counting/n_dependence.png" alt="The n-dependence of the model output." width="25%" style="mix-blend-mode: darken;">
 </p>
 
 We see that as n becomes large, the difference between n and n+1 becomes smaller.
 
-
+<br>
 ## Conclusion
 <br>
 The contextual counting task provides a valuable framework for exploring the interpretability of Transformers in scientific and quantitative contexts. Our experiments show that causal Transformers with NoPE can effectively solve this task, while non-causal models struggle. These findings highlight the importance of task-specific interpretability challenges and the potential for developing more robust and generalizable models for scientific applications.
```
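The `n_dependence.png` figure referenced in the last hunk is not reproduced here, but the scaling it illustrates can be sketched. Assuming, as the bullet point says, equal attention over the BoS token and the n one-tokens, the attention weight landing on the 1-token value vector would be n/(n+1), so the separation between consecutive counts shrinks quadratically; a back-of-the-envelope reconstruction, not the post's exact formula:

```latex
% Equal-attention assumption: 1 BoS token plus n one-tokens, uniform weights.
\[
  w(n) = \frac{n}{n+1}, \qquad
  w(n+1) - w(n) = \frac{n+1}{n+2} - \frac{n}{n+1} = \frac{1}{(n+1)(n+2)}
\]
```

The gap decays like 1/n^2, consistent with the context line "as n becomes large, the difference between n and n+1 becomes smaller."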

blog.html

Lines changed: 1 addition & 1 deletion
```diff
@@ -19,7 +19,7 @@
 </div>
 <div class="article-title">
 <h3 class="title"><a href="{{ site.baseurl }}{{ post.url }}">{{ post.title }}</a></h3>
-<h4 class="category">{{ post.date | date: "%b" }} {{ post.date | date: "%d" }}</h4>
+<h4 class="category">{{ post.date | date: "%b" }} {{ post.date | date: "%d" }}, {{ post.date | date: "%Y" }}</h4>
 </div>
 </header>
 
```
