Commit f249659

Add docs for all scripts.
1 parent bb837c9 commit f249659

6 files changed

+152
-37
lines changed


docs/finalfusion.scripts.rst

Lines changed: 118 additions & 0 deletions
@@ -0,0 +1,118 @@
+Scripts
+=======
+
+Installing ``finalfusion`` adds some executables:
+
+* ``ffp-convert`` for converting embeddings
+* ``ffp-similar`` for similarity queries
+* ``ffp-analogy`` for analogy queries
+* ``ffp-bucket-to-explicit`` to convert bucket subword to explicit subword embeddings
+
+.. Convert:
+
+Convert
+-------
+
+``ffp-convert`` makes conversion between all supported embedding formats possible:
+
+.. code-block:: bash
+
+   $ ffp-convert --help
+   usage: ffp-convert [-h] [-f INPUT_FORMAT] [-t OUTPUT_FORMAT] INPUT OUTPUT
+
+   Convert embeddings.
+
+   positional arguments:
+     INPUT                 Input embeddings
+     OUTPUT                Output path
+
+   optional arguments:
+     -h, --help            show this help message and exit
+     -f INPUT_FORMAT, --from INPUT_FORMAT
+                           Valid choices: ['word2vec', 'finalfusion', 'fasttext',
+                           'text', 'textdims'] Default: 'word2vec'
+     -t OUTPUT_FORMAT, --to OUTPUT_FORMAT
+                           Valid choices: ['word2vec', 'finalfusion', 'fasttext',
+                           'text', 'textdims'] Default: 'finalfusion'
+
+.. Similar:
+
+Similar
+-------
+
+``ffp-similar`` supports similarity queries:
+
+.. code-block:: bash
+
+   $ ffp-similar --help
+   usage: ffp-similar [-h] [-f INPUT_FORMAT] [-k K] EMBEDDINGS [input]
+
+   Similarity queries.
+
+   positional arguments:
+     EMBEDDINGS            Input embeddings
+     input                 Optional input file with one word per line. If
+                           unspecified reads from stdin
+
+
+   optional arguments:
+     -h, --help            show this help message and exit
+     -f INPUT_FORMAT, --format INPUT_FORMAT
+                           Valid choices: ['word2vec', 'finalfusion', 'fasttext',
+                           'text', 'textdims'] Default: 'finalfusion'
+     -k K                  Number of neighbours. Default: 10
+
+.. Analogy:
+
+Analogy
+-------
+
+``ffp-analogy`` answers analogy queries:
+
+.. code-block:: bash
+
+   $ ffp-analogy --help
+   usage: ffp-analogy [-h] [-f INPUT_FORMAT] [-i {a,b,c} [{a,b,c} ...]] [-k K]
+                      EMBEDDINGS [input]
+
+   Analogy queries.
+
+   positional arguments:
+     EMBEDDINGS            Input embeddings
+     input                 Optional input file with 3 words per line. If
+                           unspecified reads from stdin
+
+   optional arguments:
+     -h, --help            show this help message and exit
+     -f INPUT_FORMAT, --format INPUT_FORMAT
+                           Valid choices: ['word2vec', 'finalfusion', 'fasttext',
+                           'text', 'textdims'] Default: 'finalfusion'
+     -i {a,b,c} [{a,b,c} ...], --include {a,b,c} [{a,b,c} ...]
+                           Specify query parts that should be allowed as answers.
+                           Valid choices: ['a', 'b', 'c']
+     -k K                  Number of neighbours. Default: 10
+
+.. bucket to explicit:
+
+Bucket to Explicit
+------------------
+
+
+``ffp-bucket-to-explicit`` converts bucket subword embeddings to explicit subword embeddings:
+
+.. code-block:: bash
+
+   $ ffp-bucket-to-explicit --help
+   usage: ffp-bucket-to-explicit [-h] [-f INPUT_FORMAT] INPUT OUTPUT
+
+   Convert bucket embeddings to explicit lookups.
+
+   positional arguments:
+     INPUT                 Input bucket embeddings
+     OUTPUT                Output path
+
+   optional arguments:
+     -h, --help            show this help message and exit
+     -f INPUT_FORMAT, --from INPUT_FORMAT
+                           Valid choices: ['finalfusion', 'fasttext'] Default:
+                           'finalfusion'
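
The option handling shown in the ``ffp-convert`` help text can be reproduced with ``argparse``. The following is a minimal sketch, not the package's actual source: the function name ``build_convert_parser`` and the ``dest`` names ``input_format`` and ``output_format`` are assumptions made for illustration, while the format names and defaults are copied from the help output.

```python
import argparse

# Format names copied from the ffp-convert help text.
FORMATS = ["word2vec", "finalfusion", "fasttext", "text", "textdims"]


def build_convert_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented ffp-convert interface."""
    parser = argparse.ArgumentParser(prog="ffp-convert",
                                     description="Convert embeddings.")
    parser.add_argument("input", metavar="INPUT", help="Input embeddings")
    parser.add_argument("output", metavar="OUTPUT", help="Output path")
    # "from" is a Python keyword, so the value is stored under an explicit
    # dest (hypothetical name) instead of args.from.
    parser.add_argument("-f", "--from", dest="input_format", choices=FORMATS,
                        default="word2vec", metavar="INPUT_FORMAT",
                        help=f"Valid choices: {FORMATS} Default: 'word2vec'")
    parser.add_argument("-t", "--to", dest="output_format", choices=FORMATS,
                        default="finalfusion", metavar="OUTPUT_FORMAT",
                        help=f"Valid choices: {FORMATS} Default: 'finalfusion'")
    return parser


# Same argument order as the quickstart example for ffp-convert.
args = build_convert_parser().parse_args(
    ["-f", "fasttext", "from_fasttext.bin",
     "-t", "finalfusion", "to_finalfusion.fifu"])
print(args.input_format, args.input, args.output_format, args.output)
```

With no ``-f`` or ``-t`` given, such a parser would fall back to reading ``word2vec`` and writing ``finalfusion``, matching the defaults listed in the help text.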

docs/index.rst

Lines changed: 4 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -38,8 +38,9 @@ other tools from the ``finalfusion`` ecosystem.
3838
It integrates nicely with :mod:`.numpy` since its :class:`~.Storage` types can be
3939
treated as numpy arrays.
4040

41-
``finalfusion`` comes with :doc:`ffp-convert <scripts/finalfusion.scripts.ffp-convert>` to convert between
42-
any of the supported embedding formats.
41+
``finalfusion`` comes with some :doc:`scripts <finalfusion.scripts>` to convert between
42+
embedding formats, do analogy and similarity queries and turn bucket subword embeddings
43+
into explicit subword embeddings.
4344

4445
The package is implemented in Python with some ``Cython`` extensions, it is not based on bindings
4546
to the `finalfusion-rust crate <https://github.com/finalfusion/finalfusion-rust/>`__.
@@ -58,7 +59,7 @@ Contents
5859
install
5960
modules/re-exports
6061
modules/api
61-
scripts/finalfusion.scripts.ffp-convert
62+
finalfusion.scripts
6263

6364
Indices and tables
6465
------------------

docs/quickstart.rst

Lines changed: 15 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -62,10 +62,22 @@ The full API documentation can be found :doc:`here <modules/api>`.
6262
Conversion
6363
----------
6464

65-
``finalfusion`` also comes with a conversion tool to convert between supported file formats:
65+
``finalfusion`` also comes with a conversion tool to convert between supported file formats
66+
and from bucket subword embeddings to explicit subword embeddings:
6667

6768
.. code-block:: bash
6869
69-
ffp-convert -f fasttext from_fasttext.bin -t finalfusion to_finalfusion.fifu
70+
$ ffp-convert -f fasttext from_fasttext.bin -t finalfusion to_finalfusion.fifu
71+
$ ffp-bucket-to-explicit buckets.fifu explicit.fifu
7072
71-
See :doc:`ffp-convert <scripts/finalfusion.scripts.ffp-convert>`
73+
See :doc:`Scripts<finalfusion.scripts>`
74+
75+
Similarity and Analogy
76+
----------------------
77+
78+
.. code-block:: bash
79+
80+
$ echo Tübingen | ffp-similar embeddings.fifu
81+
$ echo Tübingen Stuttgart Heidelberg | ffp-analogy embeddings.fifu
82+
83+
See :doc:`Scripts<finalfusion.scripts>`
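
To illustrate what a similarity query returns, here is a minimal pure-Python sketch. It assumes that neighbours are ranked by cosine similarity, which the documentation in this commit does not state; the function names and the toy two-dimensional vectors are hypothetical and stand in for real embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norms = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norms


def most_similar(embeddings, word, k=10):
    """Return the k words closest to `word`, excluding the word itself."""
    query = embeddings[word]
    scored = [(cosine(query, vec), other)
              for other, vec in embeddings.items() if other != word]
    scored.sort(reverse=True)  # highest similarity first
    return [other for _, other in scored[:k]]


# Toy vectors invented for illustration only.
toy = {
    "Tübingen": [1.0, 0.0],
    "Stuttgart": [0.9, 0.1],
    "Heidelberg": [0.8, 0.3],
    "Banane": [0.0, 1.0],
}
neighbours = most_similar(toy, "Tübingen", k=2)
print(neighbours)
```

The ``k`` parameter plays the role of the ``-k`` option of ``ffp-similar``, whose documented default is 10.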

docs/scripts/finalfusion.scripts.ffp-convert.rst

Lines changed: 0 additions & 27 deletions
This file was deleted.

src/finalfusion/scripts/analogy.py

Lines changed: 7 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -36,9 +36,14 @@ def main() -> None: # pylint: disable=missing-function-docstring
3636
parser.add_argument("-k",
3737
type=int,
3838
default=10,
39-
help=f"Number of neighbours. Default: 1",
39+
help="Number of neighbours. Default: 10",
4040
metavar="K")
41-
parser.add_argument("input", nargs='?', default=0)
41+
parser.add_argument(
42+
"input",
43+
help=
44+
"Optional input file with 3 words per line. If unspecified reads from stdin",
45+
nargs='?',
46+
default=0)
4247
args = parser.parse_args()
4348
if args.include != [] and len(args.include) > 3:
4449
print("-i/--include can take up to 3 unique values: a, b and c.",
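
The interplay between the three query words and ``-i/--include`` can be sketched in pure Python. This is an illustration under assumptions, not the code of ``analogy.py``: it assumes the common vector-offset formulation ("a is to b as c is to ?" ranked against ``b - a + c``) and cosine ranking, and the function names and toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norms = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norms


def analogy(embeddings, a, b, c, k=10, include=()):
    """Rank answers to "a is to b as c is to ?" using the offset b - a + c."""
    target = [vb - va + vc for va, vb, vc
              in zip(embeddings[a], embeddings[b], embeddings[c])]
    # Query words are skipped unless their slot ("a", "b" or "c") is included.
    skipped = {word for slot, word in zip("abc", (a, b, c))
               if slot not in include}
    scored = [(cosine(target, vec), word)
              for word, vec in embeddings.items() if word not in skipped]
    scored.sort(reverse=True)  # highest similarity first
    return [word for _, word in scored[:k]]


# Toy vectors invented for illustration only.
toy = {
    "man": [1.0, 0.2],
    "woman": [1.0, 1.0],
    "king": [3.0, 0.1],
    "queen": [3.0, 1.0],
    "apple": [0.0, -1.0],
}
default_answer = analogy(toy, "man", "woman", "king", k=1)
with_a_allowed = analogy(toy, "man", "woman", "king", k=2, include=("a",))
print(default_answer, with_a_allowed)
```

In this sketch, passing ``include=("a",)`` corresponds to ``-i a``: the first query word becomes a valid answer again, while ``b`` and ``c`` stay excluded.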

src/finalfusion/scripts/similar.py

Lines changed: 8 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -25,10 +25,16 @@ def main() -> None: # pylint: disable=missing-function-docstring
2525
metavar="INPUT_FORMAT")
2626
parser.add_argument("-k",
2727
type=int,
28+
help="Number of neighbours. Default: 10",
2829
default=10,
29-
help=f"Number of neighbours. Default: 10",
3030
metavar="K")
31-
parser.add_argument("input", nargs='?', default=0)
31+
parser.add_argument(
32+
"input",
33+
help=
34+
"Optional input file with one word per line. If unspecified reads from stdin",
35+
nargs='?',
36+
default=0,
37+
)
3238
args = parser.parse_args()
3339
embeds = Format(args.format).load(args.embeddings)
3440
with open(args.input) as queries:
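
The ``default=0`` on the optional ``input`` argument works because the built-in ``open()`` accepts an integer file descriptor as well as a path, and descriptor 0 is stdin. A minimal sketch of that pattern follows; ``build_parser`` and ``read_queries`` are hypothetical helper names, and the demonstration reads from a temporary file so that it does not block waiting on stdin.

```python
import argparse
import os
import tempfile


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Read queries (sketch).")
    # nargs="?" makes the positional optional; default=0 is the file
    # descriptor of stdin, which open() accepts in place of a path.
    parser.add_argument(
        "input",
        nargs="?",
        default=0,
        help="Optional input file with one word per line. "
        "If unspecified reads from stdin")
    return parser


def read_queries(source):
    """Read non-empty lines from a path (str) or a file descriptor (int)."""
    with open(source, encoding="utf8") as queries:
        return [line.strip() for line in queries if line.strip()]


# Demonstrate with a temporary file instead of stdin.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf8") as tmp:
    tmp.write("Tübingen\nStuttgart\n")
args = build_parser().parse_args([tmp.name])
words = read_queries(args.input)
os.unlink(tmp.name)
print(words)
```

One consequence of this design is that leaving the ``with`` block also closes stdin when the descriptor 0 was opened, which is harmless for a script that reads its queries once and exits.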
