This repository was archived by the owner on Apr 16, 2024. It is now read-only.

Commit 8773e57

added project
0 parents  commit 8773e57

49 files changed: 34,376 additions and 0 deletions

.gitignore

Lines changed: 259 additions & 0 deletions
@@ -0,0 +1,259 @@
+# Created by https://www.toptal.com/developers/gitignore/api/python,visualstudiocode,linux,macos,windows
+# Edit at https://www.toptal.com/developers/gitignore?templates=python,visualstudiocode,linux,macos,windows
+
+### Linux ###
+*~
+
+# temporary files which can be created if a process still has a handle open of a deleted file
+.fuse_hidden*
+
+# KDE directory preferences
+.directory
+
+# Linux trash folder which might appear on any partition or disk
+.Trash-*
+
+# .nfs files are created when an open file is removed but is still being accessed
+.nfs*
+
+### macOS ###
+# General
+.DS_Store
+.AppleDouble
+.LSOverride
+
+# Icon must end with two \r
+Icon
+
+
+# Thumbnails
+._*
+
+# Files that might appear in the root of a volume
+.DocumentRevisions-V100
+.fseventsd
+.Spotlight-V100
+.TemporaryItems
+.Trashes
+.VolumeIcon.icns
+.com.apple.timemachine.donotpresent
+
+# Directories potentially created on remote AFP share
+.AppleDB
+.AppleDesktop
+Network Trash Folder
+Temporary Items
+.apdisk
+
+### macOS Patch ###
+# iCloud generated files
+*.icloud
+
+### Python ###
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+share/python-wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+
+# PyInstaller
+# Usually these files are written by a python script from a template
+# before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.nox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+*.py,cover
+.hypothesis/
+.pytest_cache/
+cover/
+
+# Translations
+*.mo
+*.pot
+
+# Django stuff:
+*.log
+local_settings.py
+db.sqlite3
+db.sqlite3-journal
+
+# Flask stuff:
+instance/
+.webassets-cache
+
+# Scrapy stuff:
+.scrapy
+
+# Sphinx documentation
+docs/_build/
+
+# PyBuilder
+.pybuilder/
+target/
+
+# Jupyter Notebook
+.ipynb_checkpoints
+
+# IPython
+profile_default/
+ipython_config.py
+
+# pyenv
+# For a library or package, you might want to ignore these files since the code is
+# intended to run in multiple environments; otherwise, check them in:
+# .python-version
+
+# pipenv
+# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
+# However, in case of collaboration, if having platform-specific dependencies or dependencies
+# having no cross-platform support, pipenv may install dependencies that don't work, or not
+# install all needed dependencies.
+#Pipfile.lock
+
+# poetry
+# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
+# This is especially recommended for binary packages to ensure reproducibility, and is more
+# commonly ignored for libraries.
+# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
+#poetry.lock
+
+# pdm
+# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
+#pdm.lock
+# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
+# in version control.
+# https://pdm.fming.dev/#use-with-ide
+.pdm.toml
+
+# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
+__pypackages__/
+
+# Celery stuff
+celerybeat-schedule
+celerybeat.pid
+
+# SageMath parsed files
+*.sage.py
+
+# Environments
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+
+# Spyder project settings
+.spyderproject
+.spyproject
+
+# Rope project settings
+.ropeproject
+
+# mkdocs documentation
+/site
+
+# mypy
+.mypy_cache/
+.dmypy.json
+dmypy.json
+
+# Pyre type checker
+.pyre/
+
+# pytype static type analyzer
+.pytype/
+
+# Cython debug symbols
+cython_debug/
+
+# PyCharm
+# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
+# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
+# and can be added to the global gitignore or merged into this file. For a more nuclear
+# option (not recommended) you can uncomment the following to ignore the entire idea folder.
+#.idea/
+
+### VisualStudioCode ###
+.vscode/*
+!.vscode/settings.json
+!.vscode/tasks.json
+!.vscode/launch.json
+!.vscode/extensions.json
+!.vscode/*.code-snippets
+
+# Local History for Visual Studio Code
+.history/
+
+# Built Visual Studio Code Extensions
+*.vsix
+
+### VisualStudioCode Patch ###
+# Ignore all local history of files
+.history
+.ionide
+
+### Windows ###
+# Windows thumbnail cache files
+Thumbs.db
+Thumbs.db:encryptable
+ehthumbs.db
+ehthumbs_vista.db
+
+# Dump file
+*.stackdump
+
+# Folder config file
+[Dd]esktop.ini
+
+# Recycle Bin used on file shares
+$RECYCLE.BIN/
+
+# Windows Installer files
+*.cab
+*.msi
+*.msix
+*.msm
+*.msp
+
+# Windows shortcuts
+*.lnk
+
+# End of https://www.toptal.com/developers/gitignore/api/python,visualstudiocode,linux,macos,windows
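
Note on the VisualStudioCode block in this .gitignore: it ignores everything directly under .vscode/ and then re-includes selected files with `!` patterns, which is why .vscode/settings.json can be part of this commit. The sketch below illustrates that last-matching-rule-wins behaviour. It is a simplified approximation that uses Python's `fnmatch` rather than Git's own matcher, and the `is_ignored` helper is hypothetical, not part of this repository.

```python
from fnmatch import fnmatchcase

# Rules copied from the VisualStudioCode block of the .gitignore above.
RULES = [
    ".vscode/*",
    "!.vscode/settings.json",
    "!.vscode/tasks.json",
    "!.vscode/launch.json",
    "!.vscode/extensions.json",
    "!.vscode/*.code-snippets",
]


def is_ignored(path: str, rules=RULES) -> bool:
    """Approximate gitignore evaluation: the last matching rule decides.

    Assumption: fnmatch-style globbing stands in for Git's pattern matching,
    which is good enough for these flat .vscode/ patterns only.
    """
    ignored = False
    for rule in rules:
        negated = rule.startswith("!")
        pattern = rule[1:] if negated else rule
        if fnmatchcase(path, pattern):
            # A "!" rule re-includes the path; a plain rule ignores it.
            ignored = not negated
    return ignored


if __name__ == "__main__":
    for candidate in (".vscode/settings.json", ".vscode/keybindings.json"):
        print(candidate, "ignored" if is_ignored(candidate) else "tracked")
```

For real repositories, `git check-ignore -v <path>` reports which rule Git itself applies.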

.vscode/settings.json

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+{
+    "editor.rulers": [
+        120
+    ]
+}

CITATION.cff

Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
+# This CITATION.cff file was generated with cffinit.
+# Visit https://bit.ly/cffinit to generate yours today!
+
+cff-version: 1.2.0
+title: >-
+  Effective and Efficient Ranking using a Dual Encoder
+  Approach
+message: >-
+  If you use this software, please cite it using the
+  metadata from this file.
+type: thesis
+authors:
+  - given-names: Tim
+    family-names: Hagen
+repository-code: >-
+  https://github.com/TheMrSheldon/Effective-and-Efficient-Ranking-using-a-Dual-Encoder-Approach
+abstract: >-
+  Ever since BERT's inception, the Information Retrieval
+  community has worked on harnessing BERT's potential for
+  relevance ranking. To this end, the most prevalent
+  approaches are distilling BERT into smaller transformer
+  architectures or designing efficient ranking architectures
+  around (distilled versions of) BERT. We propose replacing
+  MMSE-ColBERT's document encoder by distilling it into a
+  vastly smaller, graph-based architecture using a modified
+  version of TinyBERT's loss objective. Our architecture
+  creates an initial graph-of-word that is then refined
+  using multiple heads of Graph Structure Learning.
+  Empirically, we find that the smallest variant of our
+  architecture works best. It consists of a single GAT-layer
+  and three GCN-layers. The modified version of TinyBERT's
+  loss objective is competitive with strong baselines, like
+  MMSE-ColBERT, but does not beat them. By using Margin-MSE
+  loss instead, we can further significantly improve
+  effectiveness such that our model beats every baseline
+  except the strongest, MMSE-ColBERT with query expansion.
+  Due to its simplicity, our model is three times as fast as
+  MMSE-ColBERT's document encoding. Our experiments show
+  promise that our model can further be used for document
+  encoding and to replace the query encoding of MMSE-ColBERT
+  as well for more efficient ranking.
+keywords:
+  - BERT
+  - re-ranking
+  - cross-architecture knowledge distillation
+  - graph neural networks
+license: GPL-3.0
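
Note on the abstract in CITATION.cff: it names Margin-MSE as the loss that improved effectiveness. The sketch below shows the commonly used formulation of that loss, the mean squared error between the student's and the teacher's score margins (relevant passage score minus non-relevant passage score). It is an illustration under that assumption only: the function name, argument names, and plain-Python implementation are hypothetical and not taken from this repository.

```python
from typing import Sequence


def margin_mse(
    student_pos: Sequence[float],
    student_neg: Sequence[float],
    teacher_pos: Sequence[float],
    teacher_neg: Sequence[float],
) -> float:
    """Mean squared error between student and teacher score margins.

    Each argument holds one score per (query, positive, negative) triple.
    """
    n = len(student_pos)
    if n == 0 or not (len(student_neg) == len(teacher_pos) == len(teacher_neg) == n):
        raise ValueError("all score lists must be non-empty and of equal length")

    total = 0.0
    for sp, sn, tp, tn in zip(student_pos, student_neg, teacher_pos, teacher_neg):
        # Compare how far apart each model places the positive and the negative.
        diff = (sp - sn) - (tp - tn)
        total += diff * diff
    return total / n


if __name__ == "__main__":
    # Student margin 1.0 vs. teacher margin 2.0 gives a squared error of 1.0.
    print(margin_mse([2.0], [1.0], [3.0], [1.0]))
```

Because only margins are compared, the student is free to score on a different scale than the teacher as long as the gap between relevant and non-relevant passages matches.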
