Description
duration: scalable, can be either 175 or 350 hours
mentor: @oserikov
difficulty: medium
requirements:
- pytorch
- sklearn
- Python engineering skills (OOP, design patterns)
- experience with Transformer language models
useful links:
Idea Description:
There are many interpretability tools out there, aimed at both industry and academic users.
While some are general-purpose and others are very field-specific, they all have several things in common.
One would typically apply them to HuggingFace models, and all of these methods try to explain the black boxes we work with.
What we propose, in short, is to bring the existing popular model-interpretation stack together. We have surveyed interpretability methods for LLMs and now have both a scientific and an engineering vision of what should be implemented to maximize the interpretability of existing LLMs.
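For illustration only, here is how one of these tools (Captum's LayerIntegratedGradients) is typically applied to a HuggingFace model; the checkpoint name and example sentence below are placeholders, not part of the task:

```python
# Minimal sketch: per-token attributions for an HF classifier via Captum.
import torch
from captum.attr import LayerIntegratedGradients
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "distilbert-base-uncased-finetuned-sst-2-english"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

def forward_func(input_ids, attention_mask):
    # Attribute the logit of the positive class.
    return model(input_ids, attention_mask=attention_mask).logits[:, 1]

enc = tokenizer("The movie was surprisingly good.", return_tensors="pt")

# Attribute the prediction to the input embeddings layer.
lig = LayerIntegratedGradients(forward_func, model.distilbert.embeddings)
attributions = lig.attribute(
    inputs=enc["input_ids"],
    additional_forward_args=(enc["attention_mask"],),
)

# One score per token: sum over the embedding dimension.
token_scores = attributions.sum(dim=-1).squeeze(0)
tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
print(list(zip(tokens, token_scores.tolist())))
```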
You will need to implement an HF-compatible interpretability aggregation API. The exact tasks to accomplish are:
- choose the most important methods provided by Captum, Interpret and NeuroX (which ones? to better understand the task, try to figure this out yourself; having done so, reach out to us ASAP and we will discuss your vision)
- implement an all-in-one interpret method that runs all the chosen ones (a rough sketch follows this list)
- perform an initial analysis of the BigScience model checkpoints
- ensure the codebase makes it easy to add new methods
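As a rough illustration of what the aggregation API could look like, here is a sketch; all class and method names are hypothetical, and choosing the actual methods and design is part of the task:

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class InterpretationResult:
    """Output of a single interpretability method for one input."""
    method: str
    scores: Any  # e.g. per-token attributions, probe accuracies, ...


class InterpretabilityPipeline:
    """Runs registered methods (e.g. thin wrappers around Captum,
    Interpret or NeuroX) over one HF model and collects their outputs."""

    def __init__(self, model, tokenizer):
        self.model = model
        self.tokenizer = tokenizer
        self._methods: Dict[str, Callable] = {}

    def register(self, name: str, fn: Callable) -> None:
        # fn(model, tokenizer, text) -> scores; new methods plug in here
        # without touching the pipeline itself.
        self._methods[name] = fn

    def interpret(
        self, text: str, methods: Optional[List[str]] = None
    ) -> List[InterpretationResult]:
        # Run either the requested subset or every registered method.
        chosen = methods if methods is not None else list(self._methods)
        return [
            InterpretationResult(name, self._methods[name](self.model, self.tokenizer, text))
            for name in chosen
        ]
```

A thin wrapper per backend library would then just call `register`, which keeps adding new methods cheap (the last task above).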
Coding Challenge
See task 1.