Commit 95ad62e

update examples
1 parent 825bba1 commit 95ad62e

14 files changed

+103
-91
lines changed

README.md

Lines changed: 1 addition & 1 deletion
@@ -101,7 +101,7 @@ If you find this code useful in your research, please cite it using the followin
 <td>
 <a href="https://github.com/zanshuxun"><img width="70" height="70" src="https://github.com/zanshuxun.png?s=40" alt="pic"></a><br>
 <a href="https://github.com/zanshuxun">Zan Shuxun</a>
-<p>Beijing University <br> of Posts and <br> Telecommunications </p>
+<p>Alibaba Group </p>
 </td>
 <td>
 <a href="https://github.com/pandeconscious"><img width="70" height="70" src="https://github.com/pandeconscious.png?s=40" alt="pic"></a><br>

deepctr/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
 from .utils import check_version

-__version__ = '0.8.6'
+__version__ = '0.8.7'
 check_version(__version__)

deepctr/layers/sequence.py

Lines changed: 1 addition & 1 deletion
@@ -561,7 +561,7 @@ def call(self, inputs, mask=None, training=None, **kwargs):
 try:
     outputs = tf.matrix_set_diag(outputs, tf.ones_like(outputs)[
         :, :, 0] * (-2 ** 32 + 1))
-except AttributeError as e:
+except AttributeError:
     outputs = tf.compat.v1.matrix_set_diag(outputs, tf.ones_like(outputs)[
         :, :, 0] * (-2 ** 32 + 1))

docs/source/Examples.md

Lines changed: 66 additions & 0 deletions
@@ -322,6 +322,72 @@ if __name__ == "__main__":
     history = model.fit(model_input, data[target].values,
                         batch_size=256, epochs=10, verbose=2, validation_split=0.2, )
 ```
+## Hash Layer with pre-defined key-value vocabulary
+
+This example shows how to use a pre-defined key-value vocabulary in the `Hash` layer. `movielens_age_vocabulary.csv` stores the key-value mapping for the `age` feature.
+
+```python
+from deepctr.models import DeepFM
+from deepctr.feature_column import SparseFeat, VarLenSparseFeat, get_feature_names
+import numpy as np
+import pandas as pd
+from tensorflow.python.keras.preprocessing.sequence import pad_sequences
+
+try:
+    import tensorflow.compat.v1 as tf
+except ImportError as e:
+    import tensorflow as tf
+
+if __name__ == "__main__":
+    data = pd.read_csv("./movielens_sample.txt")
+    sparse_features = ["movie_id", "user_id",
+                       "gender", "age", "occupation", "zip", ]
+
+    data[sparse_features] = data[sparse_features].astype(str)
+    target = ['rating']
+
+    # 1.Use hashing encoding on the fly for sparse features,and process sequence features
+
+    genres_list = list(map(lambda x: x.split('|'), data['genres'].values))
+    genres_length = np.array(list(map(len, genres_list)))
+    max_len = max(genres_length)
+
+    # Notice : padding=`post`
+    genres_list = pad_sequences(genres_list, maxlen=max_len, padding='post', dtype=str, value=0)
+
+    # 2.set hashing space for each sparse field and generate feature config for sequence feature
+
+    fixlen_feature_columns = [SparseFeat(feat, data[feat].nunique() * 5, embedding_dim=4, use_hash=True,
+                                         vocabulary_path='./movielens_age_vocabulary.csv' if feat == 'age' else None,
+                                         dtype='string')
+                              for feat in sparse_features]
+    varlen_feature_columns = [
+        VarLenSparseFeat(SparseFeat('genres', vocabulary_size=100, embedding_dim=4,
+                                    use_hash=True, dtype="string"),
+                         maxlen=max_len, combiner='mean',
+                         )]  # Notice : value 0 is for padding for sequence input feature
+    linear_feature_columns = fixlen_feature_columns + varlen_feature_columns
+    dnn_feature_columns = fixlen_feature_columns + varlen_feature_columns
+    feature_names = get_feature_names(linear_feature_columns + dnn_feature_columns)
+
+    # 3.generate input data for model
+    model_input = {name: data[name] for name in feature_names}
+    model_input['genres'] = genres_list
+
+    # 4.Define Model,compile and train
+    model = DeepFM(linear_feature_columns, dnn_feature_columns, task='regression')
+    model.compile("adam", "mse", metrics=['mse'], )
+    if not hasattr(tf, 'version') or tf.version.VERSION < '2.0.0':
+        with tf.Session() as sess:
+            sess.run(tf.tables_initializer())
+            history = model.fit(model_input, data[target].values,
+                                batch_size=256, epochs=10, verbose=2, validation_split=0.2, )
+    else:
+        history = model.fit(model_input, data[target].values,
+                            batch_size=256, epochs=10, verbose=2, validation_split=0.2, )
+
+```
+

 ## Estimator with TFRecord: Classification Criteo
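
Review note on the version check in the added example: `tf.version.VERSION < '2.0.0'` compares strings lexicographically. That gives the intended answer for 1.x and 2.x releases, but it would misorder a hypothetical two-digit major version such as `'10.0.0'`. A minimal sketch of a numeric comparison follows, assuming version strings of the form `major.minor.patch` with an optional suffix such as `rc1`. The helpers `parse_version` and `is_tf1` are hypothetical names used for illustration only; they are not part of DeepCTR or TensorFlow.

```python
import re


def parse_version(version):
    """Return the leading numeric components of a version string as a 3-tuple of ints.

    Parsing stops at the first component that does not start with a digit,
    and missing components are padded with zeros.
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    # Pad so that "2" and "2.0" compare equal to (2, 0, 0).
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts[:3])


def is_tf1(version):
    """Hypothetical helper: True when the version is numerically below 2.0.0."""
    return parse_version(version) < (2, 0, 0)


if __name__ == "__main__":
    for candidate in ("1.15.5", "2.4.1", "10.0.0"):
        # Show the numeric result next to the lexicographic string comparison.
        print(candidate, is_tf1(candidate), candidate < "2.0.0")
```

Comparing integer tuples keeps the check independent of how many digits each component has, which the string comparison does not guarantee.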

docs/source/Features.md

Lines changed: 2 additions & 2 deletions
@@ -26,10 +26,10 @@ DNN based CTR prediction models usually have following 4 modules:
 ``SparseFeat`` is a namedtuple with signature ``SparseFeat(name, vocabulary_size, embedding_dim, use_hash, vocabulary_path, dtype, embeddings_initializer, embedding_name, group_name, trainable)``

 - name : feature name
-- vocabulary_size : number of unique feature values for sprase feature or hashing space when `use_hash=True`
+- vocabulary_size : number of unique feature values for sparse feature or hashing space when `use_hash=True`
 - embedding_dim : embedding dimension
 - use_hash : default `False`.If `True` the input will be hashed to space of size `vocabulary_size`.
-- vocabulary_path : default `None`. The `CSV` text file path of the vocabulary table used by `tf.lookup.TextFileInitializer`, which assigns one entry in the table for each line in the file. One entry contains two columns seperated by comma, the first is the value column, the second is the key column. The `0` value is reserved to use if a key is missing in the table, so hash value need start from `1`.
+- vocabulary_path : default `None`. The `CSV` text file path of the vocabulary table used by `tf.lookup.TextFileInitializer`, which assigns one entry in the table for each line in the file. One entry contains two columns separated by comma, the first is the value column, the second is the key column. The `0` value is reserved to use if a key is missing in the table, so hash value need start from `1`.
 - dtype : default `int32`.dtype of input tensor.
 - embeddings_initializer : initializer for the `embeddings` matrix.
 - embedding_name : default `None`. If None, the embedding_name will be same as `name`.
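
Review note: the file layout described for `vocabulary_path` (value column first, key column second, values starting from `1` because `0` is reserved for missing keys) can be sketched in plain Python. This is a minimal illustration under those stated rules only; `build_vocabulary_csv` is a hypothetical helper, not a DeepCTR or TensorFlow function, and it produces text in memory rather than writing a file.

```python
import csv
import io


def build_vocabulary_csv(keys):
    """Return CSV text with one 'value,key' row per distinct key.

    Values are assigned in first-seen order and start at 1, leaving 0 free
    for keys that are missing from the table.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    seen = set()
    value = 0
    for key in keys:
        if key in seen:
            continue  # each key gets exactly one row
        seen.add(key)
        value += 1
        writer.writerow([value, key])
    return buffer.getvalue()


if __name__ == "__main__":
    # Duplicate keys are collapsed; the value column comes first.
    print(build_vocabulary_csv(["1", "18", "25", "18"]), end="")
```

The resulting text could be saved to a path and passed as `vocabulary_path`; that last step is left out here because it depends on the local setup.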

docs/source/History.md

Lines changed: 1 addition & 0 deletions
@@ -1,4 +1,5 @@
 # History
+- 07/18/2021 : [v0.8.7](https://github.com/shenweichen/DeepCTR/releases/tag/v0.8.7) released.Support pre-defined key-value vocabulary in `Hash` Layer. [example](./Examples.html#hash-layer-with-pre-defined-key-value-vocabulary)
 - 06/14/2021 : [v0.8.6](https://github.com/shenweichen/DeepCTR/releases/tag/v0.8.6) released.Add [IFM](./Features.html#ifm-input-aware-factorization-machine) [DIFM](./Features.html#difm-dual-input-aware-factorization-machine), [FEFM and DeepFEFM](./Features.html#deepfefm-deep-field-embedded-factorization-machine) model.
 - 03/13/2021 : [v0.8.5](https://github.com/shenweichen/DeepCTR/releases/tag/v0.8.5) released.Add [BST](./Features.html#bst-behavior-sequence-transformer) model.
 - 02/12/2021 : [v0.8.4](https://github.com/shenweichen/DeepCTR/releases/tag/v0.8.4) released.Fix bug in DCN-Mix.

docs/source/Quick-Start.md

Lines changed: 1 addition & 1 deletion
@@ -86,7 +86,7 @@ fixlen_feature_columns = [SparseFeat(feat, vocabulary_size=data[feat].max() + 1,
 ```
 - Feature Hashing on the fly
 ```python
-fixlen_feature_columns = [SparseFeat(feat, vocabulary_size=1e6,embedding_dim=4, use_hash=True, dtype='string') # since the input is string
+fixlen_feature_columns = [SparseFeat(feat, vocabulary_size=1e6,embedding_dim=4, use_hash=True, dtype='string') # the input is string
                           for feat in sparse_features] + [DenseFeat(feat, 1, )
                           for feat in dense_features]
 ```

docs/source/conf.py

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@
 # The short X.Y version
 version = ''
 # The full version, including alpha/beta/rc tags
-release = '0.8.6'
+release = '0.8.7'


 # -- General configuration ---------------------------------------------------

docs/source/index.rst

Lines changed: 2 additions & 2 deletions
@@ -42,12 +42,12 @@ You can read the latest code and related projects

 News
 -----
+07/18/2021 : Support pre-defined key-value vocabulary in `Hash` Layer. `example <./Examples.html#hash-layer-with-pre-defined-key-value-vocabulary>`_ `Changelog <https://github.com/shenweichen/DeepCTR/releases/tag/v0.8.7>`_
+
 06/14/2021 : Add `IFM <./Features.html#ifm-input-aware-factorization-machine>`_ , `DIFM <./Features.html#difm-dual-input-aware-factorization-machine>`_ and `DeepFEFM <./Features.html#deepfefm-deep-field-embedded-factorization-machine>`_ . `Changelog <https://github.com/shenweichen/DeepCTR/releases/tag/v0.8.6>`_

 03/13/2021 : Add `BST <./Features.html#bst-behavior-sequence-transformer>`_ . `Changelog <https://github.com/shenweichen/DeepCTR/releases/tag/v0.8.5>`_

-02/12/2021 : Fix bug in DCN-Mix. `Changelog <https://github.com/shenweichen/DeepCTR/releases/tag/v0.8.4>`_
-
 DisscussionGroup
 -----------------------

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+1,1
+2,18
+3,25
+4,35
+5,45
+6,50
+7,56
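
Review note: per the `vocabulary_path` description in `docs/source/Features.md`, each row above is `value,key`, and `0` is returned for a key that is not in the table. The sketch below reproduces that lookup rule with a plain dictionary so the mapping is easy to check by hand. It only illustrates the semantics and does not reflect how the `Hash` layer or `tf.lookup.TextFileInitializer` is implemented; `load_vocabulary` and `lookup` are hypothetical names.

```python
import csv
import io

# The seven rows added by this commit, copied verbatim.
VOCABULARY_TEXT = """1,1
2,18
3,25
4,35
5,45
6,50
7,56
"""


def load_vocabulary(text):
    """Parse 'value,key' rows into a dict that maps key (str) to value (int)."""
    table = {}
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        value, key = row
        table[key] = int(value)
    return table


def lookup(table, key, default=0):
    """Return the value for a key, or 0 (the reserved value) when the key is missing."""
    return table.get(str(key), default)


if __name__ == "__main__":
    table = load_vocabulary(VOCABULARY_TEXT)
    for age in ("18", "56", "99"):
        print(age, lookup(table, age))
```

Keys are compared as strings here because the example in `docs/source/Examples.md` casts the sparse features with `astype(str)` and declares them with `dtype='string'`.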
