Merged
Changes from 4 commits
55 changes: 55 additions & 0 deletions packages/tasks/src/model-libraries-snippets.ts
@@ -132,6 +132,61 @@ wav = model.generate(text, audio_prompt_path=AUDIO_PROMPT_PATH)
ta.save("test-2.wav", wav, model.sr)`,
];

export const contexttab = (): string[] => {
Contributor

As far as I can see, these 3 snippets don't really show how to use the model in the repo. Adding snippets would be very beneficial if contexttab had, for instance, a `from_pretrained` method to load a pre-trained model from a `repo_id`. By doing so, you allow users to share their own weights in new repos, which will result in personalized snippets to reuse their model. Does that make sense to you?

Contributor Author

Ah, I understand the confusion! I've omitted the default parameters, but both ConTextTabClassifier and ConTextTabRegressor shown in the example use the HF Hub model weights: they take `checkpoint` and `checkpoint_revision` arguments pointing to the HF repo. I can add them explicitly to make the example more verbose if you like.

Contributor Author

@Wauplin: done ✅
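The reviewer's point about personalized snippets can be sketched as a snippet function that interpolates the current repo's id into the generated Python. This is a minimal sketch under stated assumptions: `ModelDataLike` is a hypothetical stand-in that keeps only an `id` field, `contexttabPersonalized` is a hypothetical name, and although the author confirms that `checkpoint` and `checkpoint_revision` arguments exist, the assumption that `checkpoint` accepts a bare Hub repo id is not confirmed in this thread.

```typescript
// Hypothetical, minimal stand-in for the model object a snippet function receives;
// only the repo id is needed for this sketch.
interface ModelDataLike {
	id: string;
}

// Sketch of a personalized snippet builder (hypothetical name).
// Assumption: `checkpoint` accepts the Hub repo id as-is.
const contexttabPersonalized = (model: ModelDataLike): string[] => {
	const installSnippet = `pip install git+https://github.com/SAP-samples/contexttab`;
	const classificationSnippet = `from contexttab import ConTextTabClassifier

# Load the weights stored in this repo (assumed checkpoint format)
clf = ConTextTabClassifier(checkpoint="${model.id}", bagging=1, max_context_size=2048)`;
	return [installSnippet, classificationSnippet];
};

// Usage: each repo tagged with library_name: contexttab would get its own id in the snippet.
const snippets = contexttabPersonalized({ id: "my-org/my-contexttab-finetune" });
console.log(snippets[1]);
```

The design choice is that the snippet depends only on the repo id, so any user who pushes weights in the same layout gets a working snippet without a code change in this file.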

const installSnippet = `pip install git+https://github.com/SAP-samples/contexttab`;

const classificationSnippet = `# Run a classification task
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

from contexttab import ConTextTabClassifier

# Load sample data
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=42)

# Initialize a classifier
clf = ConTextTabClassifier(bagging=1, max_context_size=2048)

clf.fit(X_train, y_train)

# Predict probabilities
prediction_probabilities = clf.predict_proba(X_test)
# Predict labels
predictions = clf.predict(X_test)
print("Accuracy", accuracy_score(y_test, predictions))`;

const regressionsSnippet = `# Run a regression task
from sklearn.datasets import fetch_openml
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

from contexttab import ConTextTabRegressor

# Load sample data
data = fetch_openml(data_id=531, as_frame=True)
X = data.data
y = data.target.astype(float)

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=42)

# Initialize the regressor
regressor = ConTextTabRegressor(bagging=1, max_context_size=2048)

regressor.fit(X_train, y_train)

# Predict on the test set
predictions = regressor.predict(X_test)

r2 = r2_score(y_test, predictions)
print("R² Score:", r2)`;
return [installSnippet, classificationSnippet, regressionsSnippet];
};


export const cxr_foundation = (): string[] => [
`# pip install git+https://github.com/Google-Health/cxr-foundation.git#subdirectory=python

7 changes: 7 additions & 0 deletions packages/tasks/src/model-libraries.ts
@@ -208,6 +208,13 @@ export const MODEL_LIBRARIES_UI_ELEMENTS = {
repoUrl: "https://github.com/Unbabel/COMET/",
countDownloads: `path:"hparams.yaml"`,
},
contexttab: {
prettyLabel: "ConTextTab",
repoName: "contexttab",
repoUrl: "https://github.com/SAP-samples/contexttab",
countDownloads: `path:"l2/base.pt"`,
Contributor

Getting back to the discussion above #1528 (comment).

I do think the best structure would be:

  • store the weights at top level in the repo as mentioned by @Vaibhavs10
  • create 1 repo per checkpoint that you want to add
  • explain in each README what makes the checkpoint particular (how it has been trained, on which data, etc.). It is still possible to refer to a "base repo" for more details about the family of models
  • have all models tracked with `library_name: contexttab`

Doing so has many advantages:

  1. you can define a common structure for all contexttab repos on the Hub and therefore implement methods like from_pretrained/push_to_hub in your library that will help other users build on top of your work and share their results
  2. related to 1., the snippets section should be much more relevant with personalized snippets (see above)
  3. you will have 1 download counter per checkpoint. If everything is in the same repo, the counter will mix everything and you won't really know which ones got the most traction
  4. no need to maintain a `countDownloads` rule with hardcoded paths
  5. improves discoverability for users with correct tags and filtering

This is why we usually strongly recommend 1 model == 1 repo instead of having all of them in one repo. If you want, you can use the Collection feature to nicely group several repos.

Contributor Author

Thanks for the context! I absolutely understand the benefits; however, we have to go through an approval process for open-sourcing repositories, so this approach would add quite a lot of overhead on our side. Hence, keeping the folder structure in one repo together with version tagging is, for us, much more efficient to maintain. I do understand the downsides, e.g., related to download counting, but I would propose to keep it as is and only create new repositories for major model changes. Would that work for you?
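The folder-plus-tag layout described here maps onto the Hub's `resolve` URL pattern, `https://huggingface.co/{repo_id}/resolve/{revision}/{filename}`, where the revision can be a branch, a tag, or a commit hash. The sketch below shows how one checkpoint in a sub-folder can be pinned to a version tag. The repo id `your-org/contexttab` and the tag `v1.0` are hypothetical placeholders; `l2/base.pt` is the path used in the `countDownloads` rule of this diff.

```typescript
// Build a direct-download URL for one file inside a Hub repo, pinned to a revision.
function resolveUrl(repoId: string, filePath: string, revision = "main"): string {
	// Revisions such as "refs/pr/1" contain slashes, so the whole revision is encoded.
	const rev = encodeURIComponent(revision);
	// Encode each path segment but keep the "/" separators of the sub-folder layout.
	const path = filePath.split("/").map(encodeURIComponent).join("/");
	return `https://huggingface.co/${repoId}/resolve/${rev}/${path}`;
}

// Hypothetical repo id and tag; only the sub-folder path comes from this PR.
console.log(resolveUrl("your-org/contexttab", "l2/base.pt", "v1.0"));
// → https://huggingface.co/your-org/contexttab/resolve/v1.0/l2/base.pt
```

With this layout, a new model version is a new tag rather than a new repo, which is the trade-off discussed in this thread: less approval overhead, but a single download counter for all checkpoints.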

Contributor

I understand the constraints, so yes let's keep it like this.

Contributor

Wauplin, Jun 17, 2025

btw, we double-checked with @Vaibhavs10 and `path_extension:"pt"` works correctly on sub-folders, so you won't have to manage a list of hardcoded paths.
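The difference between the two `countDownloads` rules can be illustrated with a simplified local model of the matching: `path:"l2/base.pt"` selects one exact file, while `path_extension:"pt"` selects every `.pt` file, including those in sub-folders. This is only an illustration of the behavior described in this comment, assuming the extension is the text after the last dot of the file name; the Hub evaluates these rules server-side, and the `matches` helper and file list below are hypothetical.

```typescript
// The two rule shapes discussed in this thread.
type Rule = { kind: "path"; value: string } | { kind: "path_extension"; value: string };

// Simplified, local illustration of how a download-count rule selects files.
function matches(rule: Rule, filePath: string): boolean {
	if (rule.kind === "path") {
		// Exact match on the full path inside the repo.
		return filePath === rule.value;
	}
	// Extension = text after the last "." of the last path segment.
	const name = filePath.split("/").pop() ?? "";
	const dot = name.lastIndexOf(".");
	return dot !== -1 && name.slice(dot + 1) === rule.value;
}

// Hypothetical repo layout with checkpoints in sub-folders.
const files = ["README.md", "l2/base.pt", "l2/large.pt", "v2/base.pt"];

const hardcoded = files.filter((f) => matches({ kind: "path", value: "l2/base.pt" }, f));
const byExtension = files.filter((f) => matches({ kind: "path_extension", value: "pt" }, f));

console.log(hardcoded); // only "l2/base.pt"
console.log(byExtension); // all three .pt files, in every sub-folder
```

Adding a checkpoint under a new folder would require editing the hardcoded rule, while the extension rule picks it up automatically.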

Contributor Author

Ah nice, perfect, then I'll revert that!

snippets: snippets.contexttab,
},
cosmos: {
prettyLabel: "Cosmos",
repoName: "Cosmos",