Commit d4b656b

Merge pull request #9836 from hylee817:yonsei-yt8m
PiperOrigin-RevId: 365507990
2 parents daae6a0 + 136cb32 commit d4b656b

19 files changed (+2507, -0 lines changed)
Lines changed: 151 additions & 0 deletions
@@ -0,0 +1,151 @@
# YouTube-8M Tensorflow Starter Code

This repo contains starter code (written in TensorFlow 2.x) for training and
evaluating machine learning models over the [YouTube-8M][1] dataset.
This is the TensorFlow 2 version of the original starter code,
[YouTube-8M Tensorflow Starter Code][2],
which was tested on TensorFlow 1.14. The code gives an end-to-end
working example for reading the dataset, training a TensorFlow model,
and evaluating the performance of the model. Functionality is preserved;
only the migrations needed to run in a TF2 environment were made.

### Requirements

The starter code requires Tensorflow. If you haven't installed it yet, follow
the instructions on [tensorflow.org][3].
This code has been tested with Tensorflow 2.4.0. Going forward,
we will continue to target the latest released version of Tensorflow.

Please verify that you have Python 3.6+ and Tensorflow 2.4.0 or higher
installed by running the following commands:

```sh
python --version
python -c 'import tensorflow as tf; print(tf.__version__)'
```
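
The same check can be scripted. The sketch below is illustrative only: `version_tuple` and `meets_minimum` are hypothetical helpers, not part of this repo, and they compare only the leading numeric components of a version string, so suffixes such as `rc1` are ignored.

```python
import re
import sys


def version_tuple(version):
  """Returns the leading numeric components of a version string as a tuple."""
  parts = []
  for piece in version.split("."):
    match = re.match(r"\d+", piece)
    if match is None:
      break
    parts.append(int(match.group()))
  return tuple(parts)


def meets_minimum(version, minimum):
  """True if `version` is at least `minimum`, comparing numeric parts only."""
  have, want = version_tuple(version), version_tuple(minimum)
  width = max(len(have), len(want))
  # Pad the shorter tuple with zeros so that "2.4" compares equal to "2.4.0".
  have += (0,) * (width - len(have))
  want += (0,) * (width - len(want))
  return have >= want


python_version = "{}.{}.{}".format(*sys.version_info[:3])
print("Python OK:", meets_minimum(python_version, "3.6"))
# With TensorFlow installed, pass tf.__version__ in the same way:
#   import tensorflow as tf
#   print("TensorFlow OK:", meets_minimum(tf.__version__, "2.4.0"))
```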

Refer to the [instructions here][4]
for using the model in this repo. Make sure to add the models folder to your
Python path.

[1]: https://research.google.com/youtube8m/
[2]: https://github.com/google/youtube-8m
[3]: https://www.tensorflow.org/install/
[4]:
https://github.com/tensorflow/models/tree/master/official#running-the-models

#### Using GPUs

If your Tensorflow installation has GPU support
(which should have been provided with `pip install tensorflow` for any version
above Tensorflow 1.15), this code will make use of all of your compatible GPUs.
You can verify your installation by running the following in Python:

```
import tensorflow as tf
print(tf.config.list_physical_devices('GPU'))
```

This will print out something like the following for each of your compatible
GPUs.

```
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720]
Found device 0 with properties:
pciBusID: 0000:00:04.0 name: Tesla P100-PCIE-16GB computeCapability: 6.0
coreClock: 1.3285GHz coreCount: 56 deviceMemorySize: 15.90GiB
deviceMemoryBandwidth: 681.88GiB/s
...
```
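
For a scripted check, the sketch below wraps the same call in a hypothetical helper, `gpu_summary`, which is not part of this repo. It assumes only the public `tf.config.list_physical_devices` API and falls back to a message when TensorFlow is not installed.

```python
import importlib.util


def gpu_summary():
  """Returns a one-line description of the GPUs TensorFlow can see."""
  if importlib.util.find_spec("tensorflow") is None:
    return "tensorflow is not installed"
  import tensorflow as tf  # Imported lazily so the check above runs first.
  gpus = tf.config.list_physical_devices("GPU")
  names = ", ".join(gpu.name for gpu in gpus) or "none"
  return "{} GPU(s) visible: {}".format(len(gpus), names)


print(gpu_summary())
```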

### Train and inference
Train a video-level model on frame-level features and run inference at the
segment level.

#### Train using the config file
Create a YAML or JSON file specifying the parameters to be overridden.
Working examples can be found in the `yt8m/experiments` directory.
```yaml
task:
  model:
    cluster_size: 2048
    hidden_size: 2048
    add_batch_norm: true
    sample_random_frames: true
    is_training: true
    activation: "relu6"
    pooling_method: "average"
    yt8m_agg_classifier_model: "MoeModel"
  train_data:
    segment_labels: false
    temporal_stride: 1
    num_devices: 1
    input_path: 'gs://youtube8m-ml/2/frame/train/train*.tfrecord'
    num_examples: 3888919
...
```
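
To illustrate what "parameters to be overridden" means, here is a minimal sketch of a recursive merge: values in the override file replace the matching defaults and everything else is left alone. It is a stand-in for illustration, not the mechanism the training script actually uses. `DEFAULTS` and `apply_overrides` are hypothetical names, and JSON is used so that only the standard library is needed.

```python
import copy
import json

# A small illustrative subset of the defaults shown above.
DEFAULTS = {
    "task": {
        "model": {"cluster_size": 2048, "hidden_size": 2048},
        "train_data": {"segment_labels": False, "num_devices": 1},
    }
}


def apply_overrides(defaults, overrides):
  """Returns a copy of `defaults` with `overrides` merged in recursively."""
  merged = copy.deepcopy(defaults)
  for key, value in overrides.items():
    if isinstance(value, dict) and isinstance(merged.get(key), dict):
      merged[key] = apply_overrides(merged[key], value)
    else:
      merged[key] = value
  return merged


# Only cluster_size is overridden; hidden_size keeps its default.
overrides = json.loads('{"task": {"model": {"cluster_size": 1024}}}')
config = apply_overrides(DEFAULTS, overrides)
print(config["task"]["model"])
```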

The code can be run in different modes: `train / train_and_eval / eval`.
Run `yt8m_train.py` and specify which mode you wish to execute.
Training is done using frame-level features with video-level labels,
while inference can be done at segment-level.
Setting `segment_labels=True` in your configuration forces
the segment-level labels to be used in the evaluation/validation phase.
If set to `False`, video-level labels are used for inference.

The following command will train a model on Google Cloud over frame-level
features.

```bash
python3 yt8m_train.py --mode='train' \
  --experiment='yt8m_experiment' \
  --model_dir=$MODEL_DIR \
  --config_file=$CONFIG_FILE
```

In order to run evaluation after each training epoch,
set the mode to `train_and_eval`.
By default, the paths to the train and validation datasets on Google Cloud are
train: `input_path=gs://youtube8m-ml/2/frame/train/train*.tfrecord`
validation: `input_path=gs://youtube8m-ml/3/frame/validate/validate*.tfrecord`

```bash
python3 yt8m_train.py --mode='train_and_eval' \
  --experiment='yt8m_experiment' \
  --model_dir=$MODEL_DIR \
  --config_file=$CONFIG_FILE
```

Running in evaluation mode loads the saved checkpoint from the specified path
and runs the inference task.
```bash
python3 yt8m_train.py --mode='eval' \
  --experiment='yt8m_experiment' \
  --model_dir=$MODEL_DIR \
  --config_file=$CONFIG_FILE
```
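
The three invocations differ only in `--mode`, so they can be generated from one place. The sketch below assumes nothing beyond the four flags shown above. `build_command` is a hypothetical helper, and the paths are placeholders.

```python
import shlex

VALID_MODES = ("train", "train_and_eval", "eval")


def build_command(mode, model_dir, config_file, experiment="yt8m_experiment"):
  """Builds the argv list for one of the yt8m_train.py invocations above."""
  if mode not in VALID_MODES:
    raise ValueError(
        "mode must be one of {}, got {!r}".format(VALID_MODES, mode))
  return [
      "python3", "yt8m_train.py",
      "--mode=" + mode,
      "--experiment=" + experiment,
      "--model_dir=" + model_dir,
      "--config_file=" + config_file,
  ]


# Placeholder paths; substitute your own $MODEL_DIR and $CONFIG_FILE.
command = build_command("train_and_eval", "/tmp/yt8m_model", "config.yaml")
print(" ".join(shlex.quote(arg) for arg in command))
# To launch it: subprocess.run(command, check=True)
```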


Once a job starts executing, you will see output similar to the following:
```
train | step: 15190 | training until step 22785...
train | step: 22785 | steps/sec: 0.4 | output:
{'learning_rate': 0.0049961056,
 'model_loss': 0.0012011167,
 'total_loss': 0.0013538885,
 'training_loss': 0.0013538885}
```
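
The logged `learning_rate` can be sanity-checked by hand. The sketch below assumes the exponential schedule from the `yt8m_experiment` config in this commit (initial rate 0.005, decay rate 0.95, decay steps 1500000) and assumes continuous rather than staircase decay. `exponential_decay` is a hypothetical helper written for this check.

```python
def exponential_decay(initial_learning_rate, decay_rate, decay_steps, step):
  """Continuous (non-staircase) exponential decay of the learning rate."""
  return initial_learning_rate * decay_rate ** (step / decay_steps)


learning_rate = exponential_decay(
    initial_learning_rate=0.005,
    decay_rate=0.95,
    decay_steps=1500000,
    step=22785)
print("{:.7f}".format(learning_rate))
```

Under these assumptions the result agrees closely with the value of about 0.0049961 logged at step 22785.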

and the following for evaluation:

```
eval | step: 22785 | running 2172 steps of evaluation...
eval | step: 22785 | eval time: 1663.4 | output:
{'avg_hit_at_one': 0.5572835238737471,
 'avg_perr': 0.557277077999072,
 'gap': 0.768825760186494,
 'map': 0.19354554465020685,
 'model_loss': 0.0005052475,
 'total_loss': 0.0006564412,
 'validation_loss': 0.0006564412}
```
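
As a rough guide to reading these numbers, the sketch below computes Hit@1 on toy data. It assumes `avg_hit_at_one` follows the usual YouTube-8M definition (the fraction of examples whose highest-scoring class is one of the true labels); the metric code itself is not shown in this README. The scores and labels are made up for illustration.

```python
def hit_at_one(predictions, labels):
  """Fraction of examples whose highest-scoring class is a true label."""
  hits = 0
  for scores, truth in zip(predictions, labels):
    top_class = max(range(len(scores)), key=scores.__getitem__)
    if truth[top_class]:
      hits += 1
  return hits / len(predictions)


# Three toy examples over four classes.
predictions = [
    [0.1, 0.7, 0.1, 0.1],
    [0.6, 0.2, 0.1, 0.1],
    [0.2, 0.2, 0.5, 0.1],
]
labels = [
    [0, 1, 0, 0],  # Top-scoring class 1 is a true label.
    [0, 0, 1, 0],  # Top-scoring class 0 is not a true label.
    [0, 0, 1, 1],  # Top-scoring class 2 is a true label.
]
print(hit_at_one(predictions, labels))
```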
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Configs package definition."""

from official.vision.beta.projects.yt8m.configs import yt8m
Lines changed: 157 additions & 0 deletions
@@ -0,0 +1,157 @@
# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Video classification configuration definition."""
from typing import Optional, Tuple
from absl import flags
import dataclasses

from official.core import config_definitions as cfg
from official.core import exp_factory
from official.modeling import hyperparams
from official.modeling import optimization

FLAGS = flags.FLAGS

YT8M_TRAIN_EXAMPLES = 3888919
YT8M_VAL_EXAMPLES = 1112356
# 2/frame -> frame level
# 3/frame -> segment level
YT8M_TRAIN_PATH = 'gs://youtube8m-ml/2/frame/train/train*.tfrecord'
YT8M_VAL_PATH = 'gs://youtube8m-ml/3/frame/validate/validate*.tfrecord'


@dataclasses.dataclass
class DataConfig(cfg.DataConfig):
  """The base configuration for building datasets."""
  name: Optional[str] = 'yt8m'
  split: Optional[str] = None
  feature_sizes: Tuple[int, ...] = (1024, 128)
  feature_names: Tuple[str, ...] = ('rgb', 'audio')
  segment_size: int = 1
  segment_labels: bool = False
  temporal_stride: int = 1
  max_frames: int = 300
  num_frames: int = 300  # set smaller to allow random sample (Parser)
  num_classes: int = 3862
  num_devices: int = 1
  input_path: str = ''
  is_training: bool = True
  random_seed: int = 123
  num_examples: int = -1


def yt8m(is_training):
  """YT8M dataset configs."""
  return DataConfig(
      num_frames=30,
      temporal_stride=1,
      segment_labels=False,
      segment_size=5,
      is_training=is_training,
      split='train' if is_training else 'valid',
      num_examples=YT8M_TRAIN_EXAMPLES if is_training else YT8M_VAL_EXAMPLES,
      input_path=YT8M_TRAIN_PATH if is_training else YT8M_VAL_PATH)


@dataclasses.dataclass
class YT8MModel(hyperparams.Config):
  """The model config."""
  cluster_size: int = 2048
  hidden_size: int = 2048
  add_batch_norm: bool = True
  sample_random_frames: bool = True
  is_training: bool = True
  activation: str = 'relu6'
  pooling_method: str = 'average'
  yt8m_agg_classifier_model: str = 'MoeModel'


@dataclasses.dataclass
class Losses(hyperparams.Config):
  name: str = 'binary_crossentropy'
  from_logits: bool = False
  label_smoothing: float = 0.0


@dataclasses.dataclass
class YT8MTask(cfg.TaskConfig):
  """The task config."""
  model: YT8MModel = YT8MModel()
  train_data: DataConfig = yt8m(is_training=True)
  validation_data: DataConfig = yt8m(is_training=False)
  losses: Losses = Losses()
  gradient_clip_norm: float = 1.0
  num_readers: int = 8
  top_k: int = 20
  top_n: Optional[int] = None


def add_trainer(
    experiment: cfg.ExperimentConfig,
    train_batch_size: int,
    eval_batch_size: int,
    learning_rate: float = 0.005,
    train_epochs: int = 44,
):
  """Add and config a trainer to the experiment config."""
  if YT8M_TRAIN_EXAMPLES <= 0:
    raise ValueError('Wrong train dataset size {!r}'.format(
        experiment.task.train_data))
  if YT8M_VAL_EXAMPLES <= 0:
    raise ValueError('Wrong validation dataset size {!r}'.format(
        experiment.task.validation_data))
  experiment.task.train_data.global_batch_size = train_batch_size
  experiment.task.validation_data.global_batch_size = eval_batch_size
  steps_per_epoch = YT8M_TRAIN_EXAMPLES // train_batch_size
  experiment.trainer = cfg.TrainerConfig(
      steps_per_loop=steps_per_epoch,
      summary_interval=steps_per_epoch,
      checkpoint_interval=steps_per_epoch,
      train_steps=train_epochs * steps_per_epoch,
      validation_steps=YT8M_VAL_EXAMPLES // eval_batch_size,
      validation_interval=steps_per_epoch,
      optimizer_config=optimization.OptimizationConfig({
          'optimizer': {
              'type': 'adam',
              'adam': {}
          },
          'learning_rate': {
              'type': 'exponential',
              'exponential': {
                  'initial_learning_rate': learning_rate,
                  'decay_rate': 0.95,
                  'decay_steps': 1500000,
              }
          },
      }))
  return experiment


@exp_factory.register_config_factory('yt8m_experiment')
def yt8m_experiment() -> cfg.ExperimentConfig:
  """Video classification general."""
  exp_config = cfg.ExperimentConfig(
      runtime=cfg.RuntimeConfig(mixed_precision_dtype='bfloat16'),
      task=YT8MTask(),
      trainer=cfg.TrainerConfig(),
      restrictions=[
          'task.train_data.is_training != None',
          'task.validation_data.is_training != None',
          'task.train_data.num_classes == task.validation_data.num_classes',
          'task.train_data.feature_sizes != None',
          'task.train_data.feature_names != None',
      ])

  return add_trainer(exp_config, train_batch_size=512, eval_batch_size=512)
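
The step counts that `add_trainer` derives for the default batch size of 512 can be worked out without TensorFlow. The sketch below repeats only the integer arithmetic from the function above. `trainer_steps` is a hypothetical helper written for illustration and is not part of the commit.

```python
YT8M_TRAIN_EXAMPLES = 3888919
YT8M_VAL_EXAMPLES = 1112356


def trainer_steps(train_batch_size, eval_batch_size, train_epochs=44):
  """Mirrors the integer arithmetic used in add_trainer."""
  steps_per_epoch = YT8M_TRAIN_EXAMPLES // train_batch_size
  return {
      "steps_per_epoch": steps_per_epoch,
      "train_steps": train_epochs * steps_per_epoch,
      "validation_steps": YT8M_VAL_EXAMPLES // eval_batch_size,
  }


steps = trainer_steps(train_batch_size=512, eval_batch_size=512)
print(steps)
```

With these defaults an epoch is 7595 steps and validation runs for 2172 steps, which is consistent with the sample logs in the README: evaluation reports 2172 steps, and step 22785 is the end of the third epoch (3 x 7595).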
