Description
duration: flexible; can be scoped as either a 175-hour or a 350-hour project
mentor: @oserikov
difficulty: easy
requirements:
- pytorch
- scikit-learn
- experience re-using academic code
- experience with Transformer language models
useful links:
- Models produced by BigScience
- BigScience Interpretability papers curated list
- Survey on probing classifiers
- A Primer in BERTology
Idea Description:
During the 2021/22 season, the BigScience team reached several crucial milestones by producing large-scale Transformer language models. Some of these models even come with archived training checkpoints, making it possible to study how linguistic structure emerges in language models over the course of training. In this task, we propose to augment the released models with supplementary interpretability information by applying the classical XAI and probing methods described in the papers linked above.
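As a rough illustration, a probing study of this kind typically extracts per-layer hidden states from a released model and feeds them to a simple diagnostic classifier. The sketch below uses `gpt2` as a small stand-in model id (an assumption, not one of the BigScience releases); for models whose training checkpoints were archived on the Hugging Face Hub, the same extraction can be repeated per checkpoint via the `revision` argument of `from_pretrained`.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# "gpt2" is a small stand-in; swap in the model id you want to study.
MODEL = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL, output_hidden_states=True).eval()

sentence = "The cat sat on the mat."
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# hidden_states is a tuple of (num_layers + 1) tensors of shape
# [batch, seq_len, hidden_dim]; index 0 is the embedding layer.
for layer, states in enumerate(outputs.hidden_states):
    sentence_vector = states.mean(dim=1)  # mean-pool tokens -> sentence vector
    print(layer, sentence_vector.shape)
```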
Coding Challenge
To get a better feel for what interpretability work looks like, we ask you to perform a diagnostic classification (probing) study of a GPT-like language model using the SentEval data. Reach out to the mentors as early as possible to discuss your analysis results.
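A minimal sketch of such a diagnostic classifier follows, assuming the tab-separated SentEval probing file format (partition `tr`/`va`/`te`, label, sentence) and using the `past_present.txt` task file from the SentEval repository as an example; the model id, the probed layer, and the file path are placeholders to adapt.

```python
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from transformers import AutoModel, AutoTokenizer

MODEL = "gpt2"  # stand-in for the GPT-like model under study
LAYER = 6       # arbitrary middle layer; probe every layer in a real study

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL, output_hidden_states=True).eval()

def embed(sentence: str) -> np.ndarray:
    """Mean-pooled hidden state of one layer as a fixed-size sentence vector."""
    inputs = tokenizer(sentence, return_tensors="pt",
                       truncation=True, max_length=64)
    with torch.no_grad():
        hidden = model(**inputs).hidden_states[LAYER]  # [1, seq_len, dim]
    return hidden.mean(dim=1).squeeze(0).numpy()

def load_split(path: str, split: str):
    """SentEval probing files are tab-separated: partition, label, sentence."""
    X, y = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            partition, label, sentence = line.rstrip("\n").split("\t")
            if partition == split:
                X.append(embed(sentence))
                y.append(label)
    return np.stack(X), y

# Download past_present.txt from the SentEval repo and adjust the path.
X_tr, y_tr = load_split("past_present.txt", "tr")
X_te, y_te = load_split("past_present.txt", "te")

probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("probe accuracy:", accuracy_score(y_te, probe.predict(X_te)))
```

Comparing probe accuracy across layers (and, where checkpoints are available, across training steps) against a majority-class baseline gives a first picture of where and when the probed property emerges.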