My config:
%YAML 1.2
---
#EARLY RELEASE###########################################################################
# If you are reading this, this guided-example.yaml is still a work in progress.
# Please read it over, add corrections/improvements/explanations/recommended
# parameter values, and share it back via PM to 12341234 on discord. Eventually this .yaml
# will help jumpstart new trainers and reduce the amount of redundant work answering the same
# questions on discord.
#########################################################################################
#CREDITS#################################################################################
# This .yaml was produced by the hard work of many contributors from the Leela Chess Discord Channel
#########################################################################################
#----------------------------------------------------------------------------------------#
#----------------------------------------------------------------------------------------#
#WELCOME#################################################################################
# Welcome to the training configuration example.yaml! Here you can experiment with various
# training parameters. Each parameter is commented to help you better understand the params
# and values. If you are just starting, you can leave most of the values as they are (just change
# "YOURUSERNAME" and YOURPATH). Try to familiarize yourself with the options available below.
# Happy training!
#########################################################################################
#INITIAL SETUP###########################################################################
# Choose a name which properly reflects the parameters of your training. In this example
# I use "ex" for example, separated by a dash (ideally use no spaces in the name),
# and 64x6 for the network size, which is defined at the end of this configuration file.
# I added se4 to show the parameter "squeeze/excite" is at a ratio of 4.
# Of course you can come up with your own logical convention or fun ones such as terminator
# or Cyberdyne Systems Model 101, but try to add meta information to help identify specific
# training parameters, such as Schwarzenegger-64x6-se10, etc...
name: 'asd-ok'
#----------------------------------------------------------------------------------------
# gpu: 0 sets the GPU ID to use for training. Leela can only use 1 GPU for training.
gpu: 0
##########################################################################################
# dataset section is a group of parameters which control the input data for training and testing
dataset:
#TRAINING/TESTING DATA#####################################################################
# input: edit the directory path to reflect your prepared data
input: '/home/france1/t60data/prepareddata/'
#----------------------------------------------------------------------------------------
# train_ratio value is the proportion of data divided between training and testing.
# This example.yaml is set up for a one-shot run with all data in one directory.
# .65=65% training /35% testing, .70=70/30, .80=80/20, .90=90/10, etc...
train_ratio: 0.90
#----------------------------------------------------------------------------------------
# For manually setting up separate test and train data, uncomment input_train and input_test below
# and comment out input and train_ratio above.
# You can randomly choose what data goes into what folder, but it should be in the proportion
# you desire (see training set ratio for example proportions)
# input_train: 'C:\Users\YOURUSERNAME\lczero-training\t60data\t60preppeddata\train\' # supports glob
# input_test: 'C:\Users\YOURUSERNAME\lczero-training\t60data\t60preppeddata\test\' # supports glob
# Manually setting up test and train directories allows you to have more control over what data is
# used for testing and training.
############################################################################################
#CHUNKS###################################################################################
# Chunks are input samples needed for training. For Leela these samples come from chunk files
# which have at least one and sometimes many games. Samples are positions selected randomly
# from the games in the chunk files.
#----------------------------------------------------------------------------------------
# num_chunks specifies the number of chunks from which to select samples. It uses only the X most
# recent chunks to parse, for the rolling training that the RL (reinforcement learning) method uses.
# For instance, if you have 10 million games in the training path, it only uses the X most recent ones.
# Advanced info: It is quite common to train with hundreds of thousands of chunk files.
# When starting training, the code will create a list of all the files and then sort them by
# date/time stamp. This can take a considerable amount of time. Consuming the files in order
# is important for reinforcement learning (RL) and less so for supervised learning (SL).
# So, if you are doing SL, you can just comment out the sort in train.py currently at lines 51
# and 62 [chunks.sort(key=os.path.getmtime, reverse=True)] and it will run much faster.
# Also, Windows typically slows down with more than about 30-50,000 files in a directory, so use
# a multi-tier directory structure for a large number of files. If you get the not enough chunks
# message with zero, it usually means there is an error in the input: parameter string pointing to
# the top chunk file directory. Because syntax errors are common (at least for me), I like to comment
# out the sort to make sure the chunk files are being found first, and then restart training with
# the sort if doing RL.
num_chunks: 100000
# -----------------------------------------------------------------------------------------
# allow_less_chunks: true allows the training to proceed even if there are fewer chunk files than
# num_chunks, as will almost always be the case. Earlier versions of the training code did not
# have this option and training would fail with a not enough chunks message.
allow_less_chunks: true
# num_chunks_train sets the number of chunks to be used for training data.
# This is not needed if you are using the train_ratio setting.
#num_chunks_train: 10000000
# num_chunks_test sets the number of chunks to be used for test data.
# Also not needed if you are using the train_ratio setting. (See the illustrative example below.)
#num_chunks_test: 500000
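# Illustrative example (not part of the original guided-example.yaml): with num_chunks: 100000,
# setting num_chunks_train: 90000 and num_chunks_test: 10000 would be roughly equivalent to the
# train_ratio: 0.90 split used above.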
###########################################################################################
# ADVANCED DATA OPTIONS ###################################################################
# The games in PGN format are converted into input bit planes in the chunk files during self-play
# (controlled by the client) or using the trainingdata-tool (typically for SL). During training
# the input bit planes are converted into tensor format for Tensorflow processing. This is done
# in one of two ways. The most recent training code uses protobufs:
# experimental_v5_only_dataset automatically creates workers for reading and converting input.
# The older way uses the chunkparser.py code, which is part of a Python generator providing the
# samples, and uses additional processes called workers. Reading and converting the input chunk files
# can constrain training performance, so more workers is generally good. In addition, it is best to
# keep the chunk files on a fast SSD. More workers can be created than physical cores, depending on your
# system, so experiment. First get training working with the simplest options, then experiment
# and try different things to train faster. Training speed for smaller nets (10b size) will often be
# limited by disk reads or chunk processing in the CPU. Larger nets will quickly become limited
# by GPU speed. However, if you use the newer format the workers are created automatically.
# So, start with experimental_v5_only_dataset: true
experimental_v5_only_dataset: true
#----------------------------------------------------------------------------------------
# train_workers manually specifies the number of workers to create for reading and converting the input
# for training
#train_workers: x
#----------------------------------------------------------------------------------------
# test_workers manually specifies the number of workers to create for reading and converting the input
# for testing
#test_workers: x
#----------------------------------------------------------------------------------------
#input_validation: 'C:\YOURDIRECTORY\validate_v5\'
############################################################################################
# training section is a group of parameters which control the training and testing process
training:
#PATH SETUP###############################################################################
# path specifies where to save your networks and checkpoints
path: '/home/france1/t60data/trainednet/'
##########################################################################################
#TEST POSITIONS###########################################################################
# num_test_positions is the number of positions used for each test-set evaluation
num_test_positions: 10000
##########################################################################################
#BATCH SETTINGS##################################################################################
# batch_size: value specifies the number of training positions per step sent in parallel to the GPU
# (a step is the fundamental training unit); my usual is 4096.
# A good rule of thumb is: for every 1 million training games, use 10k steps with a 4096 batch,
# which is 40,960,000 positions sampled, or just over 40 positions per game. In the same way, with
# a 2048 batch use 20k steps. More sampling than this may lead to overfitting from oversampling the
# same games. (See the worked example below.)
# Each training.tar from http://data.lczero.org/files/ contains about 10k games, so every 100
# training.tar files is roughly 1 million games.
batch_size: 4096
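# Worked example applying the rule of thumb above (illustrative numbers, not from the original file):
# for 3 million training games, ~30,000 steps at batch 4096 samples 30,000 * 4096 = 122,880,000
# positions, again just over 40 positions per game.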
# num_batch_splits is a "technical trick" to avoid overflowing the GPU memory. It divides the
# batch_size into "splits". Make sure batch_size divides evenly by num_batch_splits. Although the
# training speed decrease is small, it's best to use the smallest split your VRAM can handle;
# doubling the number of splits (e.g., 8 instead of 4) halves the VRAM needed per sub-batch with
# little speed difference. If the net is very small, num_batch_splits can be 1. If the net is any
# decent size it needs to be higher. (See the worked example below.)
num_batch_splits: 8
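# Worked example (illustrative): batch_size 4096 with num_batch_splits 8 sends sub-batches of
# 4096 / 8 = 512 positions to the GPU at a time; with 4 splits the sub-batches would be 1024
# positions and need roughly twice the VRAM per sub-batch.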
##########################################################################################
#STOCHASTIC WEIGHT AVERAGING############################################################
# swa is a weight averaging method that is usually a good idea to apply; most nets use this.
# It averages weights over some number of recent networks, with the sampling determined by
# the parameters below. This “average net” will tend to be a little better than just taking the final net.
# https://arxiv.org/pdf/1803.05407.pdf
swa: true
#----------------------------------------------------------------------------------------
# swa_output turns on stochastic weight averaging for weights outside of the optimization, averaging
# the (non-active) weights over a number of defined steps. This takes considerable computational
# resources but is generally desired, and the resulting weights are stronger than those created
# without this feature.
swa_output: true
#----------------------------------------------------------------------------------------
# swa_steps specifies how many steps apart the consecutively averaged network samples are taken
swa_steps: 20
#----------------------------------------------------------------------------------------
# swa_max_n is the cap on n used in the 1/n multiplier for the current weights when merging them with
# the current SWA weights. First it's 1/1 - all of it comes from the current weights, then 1/2 - half
# and half - then 1/3, etc... When it reaches n the network is the average of the last n sample points,
# but once n becomes capped it's a weighted average of all previous sample points. This is biased towards
# more recent samples - but still theoretically has tiny contributions from samples all the way back.
# In practice, floating point accuracy means that the old contributions can be ignored at some point.
# So for a swa_max_n of 10, networks farther back than the last 10 networks will not be
# multiplied by 1/11, 1/12, 1/13 etc., but instead by 1/10. The larger n is, the less past averages
# affect the current swa network; the smaller n is, the more past averages affect the new swa average.
# (See the sketch of the merge rule below.)
swa_max_n: 10
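# Sketch of the merge rule described above (illustrative, assuming the standard running-average form):
#   swa_new = swa_old + (current_weights - swa_old) / min(n, swa_max_n)
# so the multiplier on the current weights goes 1/1, 1/2, 1/3, ... and stays at 1/10 once n reaches
# swa_max_n: 10.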
##########################################################################################
#CONFIGURE STEP TRIGGERS##################################################################
# eval test set values after this many steps
test_steps: 2000
#----------------------------------------------------------------------------------------
# training reports values after this many steps.
train_avg_report_steps: 200
#----------------------------------------------------------------------------------------
# validation_steps specifies how often the validation data are used (see below); the total length
# of training is set by total_steps further down.
# Each step consists of a number of input samples: this is the batch_size. So, if the batch_size
# is 1024 and the number of steps is 100,000, then 102,400,000 input samples will be used.
# Note that this is the number of sample positions, not games. Each game has roughly 150 positions;
# even so, a large number of games is needed. Not every position in a game is used; in fact, generally
# only about every 32nd position is. Generally, with machine learning (ML) the input samples are
# divided into several groups: training, testing, and validation. The training samples are the ones
# used to do the actual learning. The test samples are used every so often (test_steps parameter)
# to evaluate how well the net is doing against samples it has not seen before. Sometimes
# validation samples are also used; like test_steps, validation_steps specifies how often the
# validation data get used.
# Also, note that the validation samples should be totally separate from any of the train and test
# data. Tensorflow can produce graphs (using tensorboard) that show how the learning is progressing.
# The train, test and optional validation trends help to know when to change things to improve
# training, or not. The validation_steps parameter is relatively new and can certainly be omitted
# until experimenting with more advanced training.
# Finally, note that in general ML training is also measured in epochs. An epoch is one run through
# all of the train/test/validation number of steps. Then, training starts again with the somewhat
# smarter net using the same collection of input samples. Most ML training runs for many epochs.
# With Leela, there is so much data that the epoch concept is not used. So, when reading papers
# about ML that give results after many epochs, keep in mind that the technique used may not work
# for Leela (which effectively uses just one epoch).
validation_steps: 2000
#----------------------------------------------------------------------------------------
# total_steps terminates training after this many (total) steps; for batch 4096, <10k steps per
# million games (see the worked example below)
total_steps: 140000
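# Worked example for the values in this file (illustrative): 140,000 steps * 4096 positions/step
# = 573,440,000 sampled positions; at the ~40 samples-per-game rule of thumb above, that suits a
# dataset of roughly 14 million games.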
#----------------------------------------------------------------------------------------
# optional frequency for checkpoints before finish
checkpoint_steps: 10000
##########################################################################################
#RENORMALIZATION##########################################################################
# Renormalization of outputs for residual layers: basically, when activation values (not weights)
# for each layer are computed, it redistributes them to match some preferred distribution
# (renormalizes). The renorm_max_r/renorm_max_d values below should start low and gradually increase.
renorm: false
#----------------------------------------------------------------------------------------
#Start at renorm_max_r=1.0 renorm_max_d=0.0
#---------------------------------
#renorm_max_r=1.1 renorm_max_d=0.1
#renorm_max_r=1.2 renorm_max_d=0.2
#renorm_max_r=1.3 renorm_max_d=0.3
#renorm_max_r=1.4 renorm_max_d=0.4
#renorm_max_r=1.5 renorm_max_d=0.5
#(all of the above can happen during the 1st LR (0.2))
#---------------------------------
#renorm_max_r=1.7 renorm_max_d=0.7
#renorm_max_r=2.0 renorm_max_d=1.0
#renorm_max_r=3.0 renorm_max_d=2.5
#renorm_max_r=4.0 renorm_max_d=5.0
#(all of the above can happen during the 2nd LR (0.02))
#---------------------------------
#If continuing training on a relatively mature net, you can set these to their final values
#-------------------------------------------------------------------------------
#renorm_max_r: 4.0 # gradually raise from 1.0 to 4.0
#renorm_max_d: 5.0 # gradually raise from 0.0 to 5.0
#max_grad_norm: 5.0 # can be much higher than 2.0 after first LR NEEDS MORE EXPLANATION
##########################################################################################
#LEARNING RATE############################################################################
# The learning rate (LR) is a critical hyper-parameter in machine learning (ML). This term is
# multiplied in the calculation to determine how much to change things after a step. If the LR
# is too big, the steps will be too large and will not converge to a good value (local or global
# minimum) as the net "bounces" around the solution space. If the LR is too small, the net will
# converge, but very slowly, and it may get "stuck" at a local minimum and not be able to see past
# that to find a better minimum. There are many papers and techniques about how to pick a starting
# learning rate, and how and when to change it during training.
#-------------------------------------------------------------------------------
# list of LR values
lr_values:
- 0.02
- 0.002
- 0.0005
#-------------------------------------------------------------------------------
# list of boundaries in steps
lr_boundaries:
- 100000 # steps until 1st LR drop
- 130000 # steps until 2nd LR drop
#-------------------------------------------------------------------------------
# warmup_steps specifies how many steps to spend ramping up to the first value in the lr_values list,
# which is then used until the number of steps exceeds the first value in the lr_boundaries list.
# Since small LR values will head in a relatively good direction, small values are initially used
# during a "warm up" period to get the net started. Use the default value before attempting more
# advanced training. (A summary of the resulting schedule for this config is sketched below.)
warmup_steps: 250
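# Resulting schedule for the values in this config (illustrative summary, assuming the usual linear
# warmup ramp up to the first lr_values entry):
#   steps      0 -    250 : warmup toward LR 0.02
#   steps    250 - 100000 : LR 0.02
#   steps 100000 - 130000 : LR 0.002
#   steps 130000 - 140000 : LR 0.0005 (training stops at total_steps)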
##########################################################################################
#LOSS WEIGHTS##############################################################################
# Each loss term measures how far the current net is from matching the training data.
# There is policy loss, value loss, moves left loss and (optionally) Q loss.
# For each of those loss terms, a "backpropagation" method changes the net weights to better
# match the training data.
# The measurement is repeated on batch_size random positions from the training data and the average
# losses are used for backpropagation.
# Each such batch is a "step".
# It isn't all that clear whether Q loss is useful; sometimes I use it and sometimes I don't.
policy_loss_weight: 1.0 # weight of policy loss, value range: 0-1
#-------------------------------------------------------------------------------
value_loss_weight: 1.0 # weight of value loss, values range: 0-1
#-------------------------------------------------------------------------------
moves_left_loss_weight: 0.0 # weight of moves_left loss, values range: 0-1
##########################################################################################
#Q-RATIO##################################################################################
# The loss weights of the q ratio and the z outcome sum to 1.
# Q = W-L (no draw term)
# (ex: a q_ratio of .35 sets z_outcome_loss_weight to .65)
q_ratio: 0.00
##########################################################################################
#SHUFFLE#############################################################################
# This controls the "stochastic" part of training: basically, loading a large set of positions and
# randomizing their order. As positions get used in training, they are replaced with new random positions.
shuffle_size: 200000 # typically 500k or more, but you can get away with a lot less
##########################################################################################
#MISCELLANEOUS###NEEDS TO BE SORTED TO A PROPER CATEGORY IN THE .YAML, IF ONE EXISTS; OTHERWISE STAYS IN MISC##
# The lookahead optimizer is not supported by the published Lczero training scripts; it has not shown
# any significant improvements.
#lookahead_optimizer: false
mask_legal_moves: true # Filters out illegal moves. NEEDS BETTER EXPLANATION
#-------------------------------------------------------------------------------
# precision: 'single' or 'half' specifies the floating point precision used in the training calculations.
# Single is tf.float32 and half is tf.float16. Some GPUs can use different precision formats
# faster than others. Even with tf.float16, which is not as accurate as tf.float32, the key internal
# calculations requiring the most precision are done with tf.float32.
# Start with single and, once training is working, see how your GPU does with half.
# It may not be supported, or it may be slower or faster.
precision: 'single'
#-------------------------------------------------------------------------------
# model section is a group of parameters which control the architecture of the network
model:
#NETWORK ARCHITECTURE#####################################################################
filters: 64 # Number of filters
residual_blocks: 6 # Number of blocks
#-------------------------------------------------------------------------------
se_ratio: 4 # Squeeze-Excite (SE) structural network architecture.
# (SE provides significant improvement with minimal speed costs.)
# The se_ratio should be based on the net size.
# (Not all values are supported with the backend optimizations.) A common/default value is 8.
# You want the se_ratio to divide into your filters evenly (at least a multiple of 8; 32 and 64
# are popular target outcomes).
# A recent update to the codebase means that for 320 filters, SE ratios 10 and 5 are optimized in the
# chess fp16 flows using a fused SE kernel. (See the worked example below.)
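# Worked example (illustrative, assuming the usual SE bottleneck of filters / se_ratio channels):
# with filters: 64 and se_ratio: 4 the squeeze-excite layer reduces to 64 / 4 = 16 channels;
# the common default of 8 would give 8 channels, and both divide 64 evenly.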
##########################################################################################
##########################################################################################
#POLICY##################################################################################
# policy: the default is 'convolution', with 'classical' as an option.
# policy: convolution allows skipping blocks.
policy: 'convolution'
# policy: classical does not have block skipping.
# policy_channels is only used with the "policy: classical" architecture. Default value is 32.
# The value is the number of outputs of a specific layer, in this case the policy head.
# policy_channels: 80
##########################################################################################
##########################################################################################
#MISCELLANEOUS###NEEDS TO BE SORTED TO A PROPER CATEGORY IN THE .YAML, IF ONE EXISTS; OTHERWISE STAYS IN MISC##
# value: 'wdl' stands for (wins/draws/losses); default is value: 'wdl', can optionally be false.
value: 'wdl'
# moves_left is a recent third type of head, in addition to policy and value. It tries to predict
# game length to reduce endgame shuffling. Default value is moves_left: 'v1', optional value is false.
moves_left: 'none'
# input_type default value is classic
# input_type: 'frc_castling' changes input format for FRC (Fischer Random Chess)
# input_type: 'canonical' changes input format so that enpassant is passed to the net, rather
# than inferred from history. History is clipped at last 50 move reset or castling move. If
# no castling rights, board is flipped to ensure your king is on the righthand side. If no pawns
# (and no castling rights) board is flipped/mirrored/transposed to ensure king is in bottom right
# octant towards the middle. If king is on the diagonal, other rules apply to decide whether to
# transpose or not, to ensure a consistent view of the board is always given to the net.
# Also, the castling information is encoded for the net to understand FRC castling.
# input_type: 'canonical_100' is the same as 'canonical' but divides the 50 move rule by 100 instead.
# input_type: 'canonical_armageddon' changes input format for the armageddon chess variant.
# More on input_type:
# Self-play games are saved in PGN and bitplane format, the latter of which is used for training.
# There have been many bitplane versions. These are the samples (positions) that are randomly picked
# from chunk file games. After being generated by self-play, the chunk files are typically rescored.
# The rescorer is a custom version of Lc0 that checks syzygy and Gaviota tablebases (TBs) to correct
# the results using the perfect information from the TBs. After rescoring, the chunks are finally
# ready for training. The versions should match. Rescorer is here: https://github.com/Tilps/lc0/tree/rescore_tb
# The rescorer 'up converts' versions, not input types: 3 to 4 or 4 to 5 (with classical input type),
# or 5 (classical) to 5 (canonical). I would have to study the code more to see the difference between
# an input "type" and a "version".
input_type: 'canonical'
##########################################################################################
##########################################################################################
And the output:
python train.py --cfg ../../asd.yaml
dataset:
allow_less_chunks: true
experimental_v5_only_dataset: true
input: /home/france1/t60data/prepareddata/
num_chunks: 100000
train_ratio: 0.9
gpu: 0
model:
filters: 64
input_type: canonical
moves_left: none
policy: convolution
residual_blocks: 6
se_ratio: 4
value: wdl
name: asd-ok
training:
batch_size: 4096
checkpoint_steps: 10000
lr_boundaries:
- 100000
- 130000
lr_values:
- 0.02
- 0.002
- 0.0005
mask_legal_moves: true
moves_left_loss_weight: 0.0
num_batch_splits: 8
num_test_positions: 10000
path: /home/france1/t60data/trainednet/
policy_loss_weight: 1.0
precision: single
q_ratio: 0.0
renorm: false
shuffle_size: 200000
swa: true
swa_max_n: 10
swa_output: true
swa_steps: 20
test_steps: 2000
total_steps: 140000
train_avg_report_steps: 200
validation_steps: 2000
value_loss_weight: 1.0
warmup_steps: 250
got 20673 chunks for /home/france1/t60data/prepareddata/
sorting 20673 chunks...[done]
training.283088654.gz - training.283109356.gz
Using 30 worker processes.
Using 30 worker processes.
2022-02-27 11:22:38.905570: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2022-02-27 11:22:39.533747: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2022-02-27 11:22:39.552706: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-02-27 11:22:39.552831: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:07:00.0 name: NVIDIA GeForce GTX 1660 SUPER computeCapability: 7.5
coreClock: 1.785GHz coreCount: 22 deviceMemorySize: 5.80GiB deviceMemoryBandwidth: 312.97GiB/s
2022-02-27 11:22:39.552844: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2022-02-27 11:22:39.554756: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2022-02-27 11:22:39.554778: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2022-02-27 11:22:39.569353: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2022-02-27 11:22:39.569492: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2022-02-27 11:22:39.569837: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2022-02-27 11:22:39.571197: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2022-02-27 11:22:39.571279: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2022-02-27 11:22:39.571334: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-02-27 11:22:39.571466: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-02-27 11:22:39.571564: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
2022-02-27 11:22:39.572156: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-02-27 11:22:39.573005: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-02-27 11:22:39.573113: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:07:00.0 name: NVIDIA GeForce GTX 1660 SUPER computeCapability: 7.5
coreClock: 1.785GHz coreCount: 22 deviceMemorySize: 5.80GiB deviceMemoryBandwidth: 312.97GiB/s
2022-02-27 11:22:39.573144: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-02-27 11:22:39.573255: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-02-27 11:22:39.573351: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2022-02-27 11:22:39.573368: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2022-02-27 11:22:39.852428: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-02-27 11:22:39.852457: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264] 0
2022-02-27 11:22:39.852462: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0: N
2022-02-27 11:22:39.852602: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-02-27 11:22:39.852737: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-02-27 11:22:39.852840: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-02-27 11:22:39.852941: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1893 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce GTX 1660 SUPER, pci bus id: 0000:07:00.0, compute capability: 7.5)
2022-02-27 11:22:39.982744: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
2022-02-27 11:22:39.996678: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 3401190000 Hz
Using 19 evaluation batches
2022-02-27 11:22:40.993055: W tensorflow/core/framework/op_kernel.cc:1755] Unknown: AssertionError:
Traceback (most recent call last):
File "/home/france1/anaconda3/envs/lc0-training/lib/python3.6/site-packages/tensorflow/python/ops/script_ops.py", line 249, in __call__
ret = func(*args)
File "/home/france1/anaconda3/envs/lc0-training/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 645, in wrapper
return func(*args, **kwargs)
File "/home/france1/anaconda3/envs/lc0-training/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 961, in generator_py_func
values = next(generator_state.get_iterator(iterator_id))
File "/home/france1/smr-hdd/asd/lczero-training/tf/chunkparser.py", line 565, in parse
for b in gen:
File "/home/france1/smr-hdd/asd/lczero-training/tf/chunkparser.py", line 551, in batch_gen
s = list(itertools.islice(gen, self.batch_size))
File "/home/france1/smr-hdd/asd/lczero-training/tf/chunkparser.py", line 542, in tuple_gen
yield self.convert_v6_to_tuple(r)
File "/home/france1/smr-hdd/asd/lczero-training/tf/chunkparser.py", line 335, in convert_v6_to_tuple
assert input_format == self.expected_input_format
AssertionError
Traceback (most recent call last):
File "train.py", line 257, in <module>
main(argparser.parse_args())
File "train.py", line 234, in main
batch_splits=batch_splits)
File "/home/france1/smr-hdd/asd/lczero-training/tf/tfprocess.py", line 625, in process_loop
self.process(batch_size, test_batches, batch_splits=batch_splits)
File "/home/france1/smr-hdd/asd/lczero-training/tf/tfprocess.py", line 811, in process
self.calculate_test_summaries(test_batches, steps + 1)
File "/home/france1/smr-hdd/asd/lczero-training/tf/tfprocess.py", line 933, in calculate_test_summaries
x, y, z, q, m = next(self.test_iter)
File "/home/france1/anaconda3/envs/lc0-training/lib/python3.6/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 761, in __next__
return self._next_internal()
File "/home/france1/anaconda3/envs/lc0-training/lib/python3.6/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 747, in _next_internal
output_shapes=self._flat_output_shapes)
File "/home/france1/anaconda3/envs/lc0-training/lib/python3.6/site-packages/tensorflow/python/ops/gen_dataset_ops.py", line 2728, in iterator_get_next
_ops.raise_from_not_ok_status(e, name)
File "/home/france1/anaconda3/envs/lc0-training/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 6897, in raise_from_not_ok_status
six.raise_from(core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.UnknownError: AssertionError:
Traceback (most recent call last):
File "/home/france1/anaconda3/envs/lc0-training/lib/python3.6/site-packages/tensorflow/python/ops/script_ops.py", line 249, in __call__
ret = func(*args)
File "/home/france1/anaconda3/envs/lc0-training/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 645, in wrapper
return func(*args, **kwargs)
File "/home/france1/anaconda3/envs/lc0-training/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 961, in generator_py_func
values = next(generator_state.get_iterator(iterator_id))
File "/home/france1/smr-hdd/asd/lczero-training/tf/chunkparser.py", line 565, in parse
for b in gen:
File "/home/france1/smr-hdd/asd/lczero-training/tf/chunkparser.py", line 551, in batch_gen
s = list(itertools.islice(gen, self.batch_size))
File "/home/france1/smr-hdd/asd/lczero-training/tf/chunkparser.py", line 542, in tuple_gen
yield self.convert_v6_to_tuple(r)
File "/home/france1/smr-hdd/asd/lczero-training/tf/chunkparser.py", line 335, in convert_v6_to_tuple
assert input_format == self.expected_input_format
AssertionError
[[{{node PyFunc}}]] [Op:IteratorGetNext]
Started it with python train.py --cfg ../../asd.yaml
I had to install tensorflow-gpu, not tensorflow; the requirements file is pretty broken.
input_validation: 'C:\YOURDIRECTORY\validate_v5\' Is this required? What is it?