Commit a258c19

Update README.md
1 parent 3da9ec6 commit a258c19

1 file changed: +128 -1 lines changed

README.md

Lines changed: 128 additions & 1 deletion
@@ -1 +1,128 @@
-# finalfusion python
+# finalfusion-python

## Introduction

`finalfusion` is a Python package for reading, writing and using
[finalfusion](https://finalfusion.github.io) embeddings. It also
supports other commonly used embedding formats such as fastText, GloVe
and word2vec.

The Python package supports the same types of embeddings as the
[finalfusion-rust crate](https://docs.rs/finalfusion/):

* Vocabulary:
  * No subwords
  * Subwords
* Embedding matrix:
  * Array
  * Memory-mapped
  * Quantized
* Norms
* Metadata

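To make the "Subwords" vocabulary type more concrete, here is a minimal sketch of fastText-style subword indexing: the word is wrapped in `<` and `>`, split into character n-grams, and each n-gram is hashed into a fixed number of buckets whose rows follow the rows of the known words. The function names, the n-gram range 3 to 6, the bucket count and the FNV-1a hash are assumptions made for illustration; the hash functions actually used by finalfusion and fastText may differ.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return all character n-grams of the bracketed word, shortest first."""
    bracketed = f"<{word}>"
    ngrams = []
    for n in range(min_n, max_n + 1):
        for start in range(len(bracketed) - n + 1):
            ngrams.append(bracketed[start:start + n])
    return ngrams


def fnv1a_32(data):
    """32-bit FNV-1a hash of a bytes object (a stand-in hash, see lead-in)."""
    h = 0x811C9DC5
    for byte in data:
        h ^= byte
        h = (h * 0x01000193) % 2**32
    return h


def bucket_indices(word, n_words, n_buckets=2**21, min_n=3, max_n=6):
    """Map each n-gram to a row index located after the n_words word rows."""
    return [
        (ngram, n_words + fnv1a_32(ngram.encode("utf-8")) % n_buckets)
        for ngram in char_ngrams(word, min_n, max_n)
    ]


if __name__ == "__main__":
    # Hypothetical vocabulary of 1000 known words.
    for ngram, index in bucket_indices("Test", n_words=1000):
        print(ngram, index)
```

Because unknown words still decompose into n-grams, a subword vocabulary can produce an embedding for a word it has never seen, which a vocabulary without subwords cannot.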
## Installation

The finalfusion module is
[available](https://pypi.org/project/finalfusion/#files) on PyPI for Linux,
macOS and Windows. You can use `pip` to install the module:

~~~shell
$ pip install --upgrade finalfusion
~~~

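One way to confirm that the installation succeeded is to ask Python's packaging metadata for the installed version. This is a small sketch using only the standard library (`importlib.metadata`, Python 3.8+); the helper name `installed_version` is hypothetical and not part of `finalfusion`.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("finalfusion")
    if version is None:
        print("finalfusion is not installed in this environment")
    else:
        print(f"finalfusion {version} is installed")
```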
## Installing from source

Building from source depends on `Cython`. If you install the package using
`pip`, you don't need to explicitly install the dependency since it is
specified in `pyproject.toml`.

~~~shell
$ git clone https://github.com/finalfusion/finalfusion-python
$ cd finalfusion-python
$ pip install .
~~~

If you want to build wheels from source, `wheel` needs to be installed.
It's then possible to build wheels through:

~~~shell
$ python setup.py bdist_wheel
~~~

The wheels can be found in `dist`.

## Package Usage

### Basic usage

~~~python
import finalfusion
import numpy as np

# loading from different formats
w2v_embeds = finalfusion.load_word2vec("/path/to/w2v.bin")
text_embeds = finalfusion.load_text("/path/to/embeds.txt")
text_dims_embeds = finalfusion.load_text_dims("/path/to/embeds.dims.txt")
fasttext_embeds = finalfusion.load_fasttext("/path/to/fasttext.bin")
fifu_embeds = finalfusion.load_finalfusion("/path/to/embeddings.fifu")

# serialization to formats works similarly
finalfusion.compat.write_word2vec("to_word2vec.bin", fifu_embeds)

# embedding lookup
embedding = fifu_embeds["Test"]

# reading an embedding into a preallocated buffer
buffer = np.zeros(fifu_embeds.storage.shape[1], dtype=np.float32)
fifu_embeds.embedding("Test", out=buffer)

# similarity and analogy query
sim_query = fifu_embeds.word_similarity("Test")
analogy_query = fifu_embeds.analogy("A", "B", "C")

# accessing the vocab and printing the first 10 words
vocab = fifu_embeds.vocab
print(vocab.words[:10])

# SubwordVocabs give access to the subword indexer:
subword_indexer = vocab.subword_indexer
print(subword_indexer.subword_indices("Test", with_ngrams=True))

# accessing the storage and calculating its dot product with an embedding;
# the (rows, dims) storage goes on the left so the shapes line up
res = fifu_embeds.storage.dot(embedding)

# printing metadata
print(fifu_embeds.metadata)
~~~

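Conceptually, a similarity query such as the one above ranks vocabulary words by the cosine similarity between their embeddings and the query embedding. The sketch below illustrates that idea in plain Python on made-up toy vectors; the function names and data are hypothetical, and this is not how `finalfusion` implements `word_similarity` internally.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length (assumes it is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def most_similar(query, words, matrix, k=3):
    """Rank words by cosine similarity to `query`, excluding the query itself."""
    rows = {word: l2_normalize(vec) for word, vec in zip(words, matrix)}
    q = rows[query]
    scores = [
        # The dot product of unit vectors equals their cosine similarity.
        (word, sum(a * b for a, b in zip(q, vec)))
        for word, vec in rows.items()
        if word != query
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    words = ["king", "queen", "apple"]
    matrix = [[1.0, 0.9, 0.0], [0.9, 1.0, 0.0], [0.0, 0.1, 1.0]]
    print(most_similar("king", words, matrix, k=2))
```

Normalizing every row once up front means each query reduces to a set of dot products against the normalized rows.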
### Beyond Embeddings

~~~python
# load only a vocab from a finalfusion file
from finalfusion import load_vocab
path = "/path/to/finalfusion_file.fifu"
vocab = load_vocab(path)

# serialize vocab to single file
vocab.write("/path/to/vocab_file.fifu")

# more specific loading functions exist
from finalfusion.vocab import load_finalfusion_bucket_vocab
fifu_bucket_vocab = load_finalfusion_bucket_vocab(path)
~~~

The package supports loading and writing all `finalfusion` chunks this way.

## Scripts

`finalfusion` also includes a conversion script `ffp-convert` to convert
between the supported formats.

~~~shell
# convert from fastText format to finalfusion
$ ffp-convert -f fasttext fasttext.bin -t finalfusion embeddings.fifu
~~~

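The same conversion can presumably be done from Python by combining a loader with a writer. The sketch below uses `finalfusion.load_fasttext` as shown under "Basic usage"; that the returned embeddings object offers a `write` method analogous to `vocab.write` is an assumption, and the helper name `convert_fasttext_to_finalfusion` is hypothetical.

```python
def convert_fasttext_to_finalfusion(src_path, dst_path):
    """Load fastText embeddings and store them in finalfusion format."""
    # Imported lazily so this sketch can be defined without the package.
    import finalfusion

    embeds = finalfusion.load_fasttext(src_path)
    # Assumption: embeddings expose `write`, mirroring `vocab.write`.
    embeds.write(dst_path)
    return embeds


if __name__ == "__main__":
    import sys

    # Usage: python convert.py fasttext.bin embeddings.fifu
    if len(sys.argv) == 3:
        convert_fasttext_to_finalfusion(sys.argv[1], sys.argv[2])
```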
## Where to go from here

* [finalfrontier](https://finalfusion.github.io/finalfrontier)
* [finalfusion](https://finalfusion.github.io/)
* [pretrained embeddings](https://finalfusion.github.io/pretrained)
