Commit f053d48

Add finetuning, alignment repo
1 parent 4c18d3a commit f053d48

2 files changed: +62 -22 lines changed

docs/index.md

Lines changed: 38 additions & 22 deletions
@@ -278,7 +278,7 @@ a.dataset-tag:hover {
 
 <div class="catalog-stats">
 <div class="stat">
-<div class="stat-number">47</div>
+<div class="stat-number">53</div>
 <div class="stat-label">Implementations</div>
 </div>
 <div class="stat">
@@ -340,6 +340,8 @@ a.dataset-tag:hover {
 
 
 
+
+
 
 
 
@@ -374,61 +376,61 @@ a.dataset-tag:hover {
 <div class="grid cards" markdown>
 <div class="card" markdown>
 <div class="header">
-<h3><a href="https://github.com/VectorInstitute/diffusion_models" title="Go to Repository">diffusion-models</a></h3>
-<span class="tag year-tag">2024</span>
+<h3><a href="https://github.com/VectorInstitute/anomaly-detection" title="Go to Repository">anomaly-detection</a></h3>
+<span class="tag year-tag">2023</span>
 <span class="tag type-tag">bootcamp</span>
 </div>
-<p>A repository with demos for various diffusion models for tabular and time series data</p>
+<p>A repository with implementation of anomaly detection techniques</p>
 <div class="tag-container">
-<span class="tag" data-tippy="TabDDPM">TabDDPM</span> <span class="tag" data-tippy="TabSyn">TabSyn</span> <a href="https://arxiv.org/abs/2405.17724" class="tag" target="_blank">ClavaDDPM</a> <span class="tag" data-tippy="CSDI">CSDI</span> <a href="https://arxiv.org/abs/2307.11494" class="tag" target="_blank">TSDiff</a>
+<span class="tag" data-tippy="Logistic Regression (Supervised)">Logistic Regression (Supervised)</span> <span class="tag" data-tippy="Random Forest (Supervised)">Random Forest (Supervised)</span> <span class="tag" data-tippy="XGBoost (Supervised)">XGBoost (Supervised)</span> <span class="tag" data-tippy="CatBoost (Supervised)">CatBoost (Supervised)</span> <span class="tag" data-tippy="Light GBM (Supervised)">Light GBM (Supervised)</span> <span class="tag" data-tippy="TabNet (Supervised and Semi-supervised)">TabNet (Supervised and Semi-supervised)</span> <span class="tag" data-tippy="Autoencoder (AE) (Unsupervised)">Autoencoder (AE) (Unsupervised)</span> <span class="tag" data-tippy="Isolation Forest (Unsupervised)">Isolation Forest (Unsupervised)</span>
 </div>
 <div class="datasets">
-<strong>Datasets:</strong> <a href="https://www.physionet.org/content/challenge-2012/1.0.0/" class="dataset-tag" target="_blank">Physionet Challenge 2012</a> <a href="https://archive.ics.uci.edu/dataset/321/electricityloaddiagrams20112014" class="dataset-tag" target="_blank">Electricity dataset (UCI Machine Learning Repository)</a>
+<strong>Datasets:</strong> <a href="https://arxiv.org/pdf/2211.13358.pdf" class="dataset-tag" target="_blank">Bank Account Fraud Detection</a> <a href="https://dgraph.xinye.com/dataset" class="dataset-tag" target="_blank">DGraph dataset</a> <a href="https://www.mvtec.com/company/research/datasets/mvtec-ad" class="dataset-tag" target="_blank">MVTec dataset</a> <a href="http://www.svcl.ucsd.edu/projects/anomaly/dataset.htm" class="dataset-tag" target="_blank">UCSD Anomaly Detection Dataset</a> <a href="https://www.kaggle.com/datasets/odins0n/ucf-crime-dataset" class="dataset-tag" target="_blank">UCF Crime Dataset</a>
 </div>
 
 </div>
 <div class="card" markdown>
 <div class="header">
-<h3><a href="https://github.com/VectorInstitute/ai-deployment" title="Go to Repository">ai-deployment</a></h3>
+<h3><a href="https://github.com/VectorInstitute/self-supervised-learning" title="Go to Repository">self-supervised-learning</a></h3>
 <span class="tag year-tag">2024</span>
 <span class="tag type-tag">bootcamp</span>
 </div>
-<p>A repository with reference implementations for deploying AI models in production environments, focusing on best practices and cloud-native solutions.</p>
+<p>A repository with reference implementations of self-supervised learning techniques</p>
 <div class="tag-container">
-<a href="https://aws.amazon.com/" class="tag" target="_blank">AWS</a> <a href="https://cloud.google.com/" class="tag" target="_blank">GCP</a>
+<a href="https://proceedings.mlr.press/v162/qiu22b/qiu22b.pdf" class="tag" target="_blank">Internal Contrastive Learning (ICL) + Latent Outlier Exposure (LOE)</a> <a href="https://arxiv.org/abs/2302.00861" class="tag" target="_blank">SimMTM</a> <a href="https://arxiv.org/abs/2303.15747" class="tag" target="_blank">TabRet</a> <a href="https://arxiv.org/abs/2202.03555" class="tag" target="_blank">Data2Vec</a>
+</div>
+<div class="datasets">
+<strong>Datasets:</strong> <a href="https://cs.stanford.edu/~acoates/stl10/" class="dataset-tag" target="_blank">STL-10</a> <a href="https://archive.ics.uci.edu/dataset/381/beijing+pm2+5+data" class="dataset-tag" target="_blank">Beijing PM 2.5</a>
 </div>
-
 
 </div>
 <div class="card" markdown>
 <div class="header">
-<h3><a href="https://github.com/VectorInstitute/anomaly-detection" title="Go to Repository">anomaly-detection</a></h3>
-<span class="tag year-tag">2023</span>
+<h3><a href="https://github.com/VectorInstitute/diffusion_models" title="Go to Repository">diffusion-models</a></h3>
+<span class="tag year-tag">2024</span>
 <span class="tag type-tag">bootcamp</span>
 </div>
-<p>A repository with implementation of anomaly detection techniques</p>
+<p>A repository with demos for various diffusion models for tabular and time series data</p>
 <div class="tag-container">
-<span class="tag" data-tippy="Logistic Regression (Supervised)">Logistic Regression (Supervised)</span> <span class="tag" data-tippy="Random Forest (Supervised)">Random Forest (Supervised)</span> <span class="tag" data-tippy="XGBoost (Supervised)">XGBoost (Supervised)</span> <span class="tag" data-tippy="CatBoost (Supervised)">CatBoost (Supervised)</span> <span class="tag" data-tippy="Light GBM (Supervised)">Light GBM (Supervised)</span> <span class="tag" data-tippy="TabNet (Supervised and Semi-supervised)">TabNet (Supervised and Semi-supervised)</span> <span class="tag" data-tippy="Autoencoder (AE) (Unsupervised)">Autoencoder (AE) (Unsupervised)</span> <span class="tag" data-tippy="Isolation Forest (Unsupervised)">Isolation Forest (Unsupervised)</span>
+<span class="tag" data-tippy="TabDDPM">TabDDPM</span> <span class="tag" data-tippy="TabSyn">TabSyn</span> <a href="https://arxiv.org/abs/2405.17724" class="tag" target="_blank">ClavaDDPM</a> <span class="tag" data-tippy="CSDI">CSDI</span> <a href="https://arxiv.org/abs/2307.11494" class="tag" target="_blank">TSDiff</a>
 </div>
 <div class="datasets">
-<strong>Datasets:</strong> <a href="https://arxiv.org/pdf/2211.13358.pdf" class="dataset-tag" target="_blank">Bank Account Fraud Detection</a> <a href="https://dgraph.xinye.com/dataset" class="dataset-tag" target="_blank">DGraph dataset</a> <a href="https://www.mvtec.com/company/research/datasets/mvtec-ad" class="dataset-tag" target="_blank">MVTec dataset</a> <a href="http://www.svcl.ucsd.edu/projects/anomaly/dataset.htm" class="dataset-tag" target="_blank">UCSD Anomaly Detection Dataset</a> <a href="https://www.kaggle.com/datasets/odins0n/ucf-crime-dataset" class="dataset-tag" target="_blank">UCF Crime Dataset</a>
+<strong>Datasets:</strong> <a href="https://www.physionet.org/content/challenge-2012/1.0.0/" class="dataset-tag" target="_blank">Physionet Challenge 2012</a> <a href="https://archive.ics.uci.edu/dataset/321/electricityloaddiagrams20112014" class="dataset-tag" target="_blank">Electricity dataset (UCI Machine Learning Repository)</a>
 </div>
 
 </div>
 <div class="card" markdown>
 <div class="header">
-<h3><a href="https://github.com/VectorInstitute/self-supervised-learning" title="Go to Repository">self-supervised-learning</a></h3>
+<h3><a href="https://github.com/VectorInstitute/ai-deployment" title="Go to Repository">ai-deployment</a></h3>
 <span class="tag year-tag">2024</span>
 <span class="tag type-tag">bootcamp</span>
 </div>
-<p>A repository with reference implementations of self-supervised learning techniques</p>
+<p>A repository with reference implementations for deploying AI models in production environments, focusing on best practices and cloud-native solutions.</p>
 <div class="tag-container">
-<a href="https://proceedings.mlr.press/v162/qiu22b/qiu22b.pdf" class="tag" target="_blank">Internal Contrastive Learning (ICL) + Latent Outlier Exposure (LOE)</a> <a href="https://arxiv.org/abs/2302.00861" class="tag" target="_blank">SimMTM</a> <a href="https://arxiv.org/abs/2303.15747" class="tag" target="_blank">TabRet</a> <a href="https://arxiv.org/abs/2202.03555" class="tag" target="_blank">Data2Vec</a>
-</div>
-<div class="datasets">
-<strong>Datasets:</strong> <a href="https://cs.stanford.edu/~acoates/stl10/" class="dataset-tag" target="_blank">STL-10</a> <a href="https://archive.ics.uci.edu/dataset/381/beijing+pm2+5+data" class="dataset-tag" target="_blank">Beijing PM 2.5</a>
+<a href="https://aws.amazon.com/" class="tag" target="_blank">AWS</a> <a href="https://cloud.google.com/" class="tag" target="_blank">GCP</a>
 </div>
 
+
 </div>
 <div class="card" markdown>
 <div class="header">
@@ -444,6 +446,21 @@ a.dataset-tag:hover {
 <strong>Datasets:</strong> <a href="https://pubmed.ncbi.nlm.nih.gov" class="dataset-tag" target="_blank">PubMed</a> <a href="https://www.kaggle.com/datasets/prakharrathi25/banking-dataset-marketing-targets" class="dataset-tag" target="_blank">Banking Dataset - Marketing Targets</a>
 </div>
 
+</div>
+<div class="card" markdown>
+<div class="header">
+<h3><a href="https://github.com/VectorInstitute/finetuning-and-alignment" title="Go to Repository">finetuning-and-alignment</a></h3>
+<span class="tag year-tag">2024</span>
+<span class="tag type-tag">bootcamp</span>
+</div>
+<p>A repository with implementations advanced fine-tuning techniques and approaches to enhance Large Language Model performance, reduce their computational cost, with a focus on alignment with human values</p>
+<div class="tag-container">
+<a href="https://docs.pytorch.org/tutorials/intermediate/FSDP_tutorial.html" class="tag" target="_blank">FSDP</a> <a href="https://docs.pytorch.org/tutorials/intermediate/ddp_tutorial.html" class="tag" target="_blank">DDP</a> <span class="tag" data-tippy="Instruction Tuning">Instruction Tuning</span> <a href="https://github.com/huggingface/peft" class="tag" target="_blank">PEFT</a> <span class="tag" data-tippy="Quantization">Quantization</span> <span class="tag" data-tippy="Supervised Fine-tuning">Supervised Fine-tuning</span>
+</div>
+<div class="datasets">
+<strong>Datasets:</strong> <a href="https://huggingface.co/datasets/knkarthick/samsum" class="dataset-tag" target="_blank">SAMSum dataset</a> <a href="https://github.com/cardiffnlp/tweeteval" class="dataset-tag" target="_blank">TweetEval</a>
+</div>
+
 </div>
 
 </div>
@@ -479,4 +496,3 @@ a.dataset-tag:hover {
 </div>
 
 </div>
-
Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+name: finetuning-and-alignment
+repo_id: VectorInstitute/finetuning-and-alignment
+description: "A repository with implementations advanced fine-tuning techniques and approaches to enhance Large Language Model performance, reduce their computational cost, with a focus on alignment with human values"
+implementations:
+  - name: FSDP
+    url: https://docs.pytorch.org/tutorials/intermediate/FSDP_tutorial.html
+  - name: DDP
+    url: https://docs.pytorch.org/tutorials/intermediate/ddp_tutorial.html
+  - name: Instruction Tuning
+    url: null
+  - name: PEFT
+    url: https://github.com/huggingface/peft
+  - name: Quantization
+    url: null
+  - name: Supervised Fine-tuning
+    url: null
+public_datasets:
+  - name: SAMSum dataset
+    url: https://huggingface.co/datasets/knkarthick/samsum
+  - name: TweetEval
+    url: https://github.com/cardiffnlp/tweeteval
+type: bootcamp
+year: 2024
+github_url: https://github.com/VectorInstitute/finetuning-and-alignment
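
Note on how the two files relate: the card added to docs/index.md carries the same fields as the new YAML entry. Implementations with a `url` appear as `<a class="tag">` links, while those with `url: null` appear as `<span class="tag" data-tippy="...">` elements. The Python sketch below shows one plausible mapping from an entry to that markup. It is an illustration only: `render_tag`, `render_dataset`, and `render_card` are hypothetical names, the entry is written as a plain dict (abbreviated to a few items) rather than parsed from YAML, and whatever generator the repository actually uses is not part of this commit.

```python
from html import escape

# Abbreviated copy of the YAML entry from this commit, written as a dict so the
# sketch needs no third-party YAML parser.
ENTRY = {
    "name": "finetuning-and-alignment",
    "description": (
        "A repository with implementations advanced fine-tuning techniques and "
        "approaches to enhance Large Language Model performance, reduce their "
        "computational cost, with a focus on alignment with human values"
    ),
    "implementations": [
        {"name": "PEFT", "url": "https://github.com/huggingface/peft"},
        {"name": "Quantization", "url": None},
    ],
    "public_datasets": [
        {"name": "TweetEval", "url": "https://github.com/cardiffnlp/tweeteval"},
    ],
    "type": "bootcamp",
    "year": 2024,
    "github_url": "https://github.com/VectorInstitute/finetuning-and-alignment",
}


def render_tag(item):
    """Hypothetical helper: a link when a URL exists, a tooltip span otherwise."""
    name = escape(item["name"])
    url = item.get("url")
    if url:
        return f'<a href="{escape(url)}" class="tag" target="_blank">{name}</a>'
    return f'<span class="tag" data-tippy="{name}">{name}</span>'


def render_dataset(item):
    """Hypothetical helper: datasets in the diff are always rendered as links."""
    return (
        f'<a href="{escape(item["url"])}" class="dataset-tag" '
        f'target="_blank">{escape(item["name"])}</a>'
    )


def render_card(entry):
    """Assemble card markup in the same element order as the cards in the diff."""
    lines = [
        '<div class="card" markdown>',
        '<div class="header">',
        f'<h3><a href="{escape(entry["github_url"])}" '
        f'title="Go to Repository">{escape(entry["name"])}</a></h3>',
        f'<span class="tag year-tag">{entry["year"]}</span>',
        f'<span class="tag type-tag">{escape(entry["type"])}</span>',
        "</div>",
        f'<p>{escape(entry["description"])}</p>',
        '<div class="tag-container">',
        " ".join(render_tag(i) for i in entry["implementations"]),
        "</div>",
    ]
    # Cards without datasets (such as ai-deployment in the diff) have no datasets block.
    datasets = entry.get("public_datasets") or []
    if datasets:
        lines += [
            '<div class="datasets">',
            "<strong>Datasets:</strong> "
            + " ".join(render_dataset(d) for d in datasets),
            "</div>",
        ]
    lines.append("</div>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_card(ENTRY))
```

Running the script prints a card for the abbreviated entry. Passing the full implementations and dataset lists from the YAML file should yield the same tag and dataset lines as the added card, assuming the real generator follows this mapping.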
