
Conversation

@csukuangfj
Contributor

@csukuangfj csukuangfj commented Feb 19, 2020

Support distributed training across multiple GPUs.

TODOs:

  • there is a lot of duplicated code

Part of the training log

2020-02-19 13:55:10,646 INFO [ddp_train.py:160] Device (1) processing 1100/4724(23.285351%) global average objf: -0.225449 over 6165760.0 frames, current batch average objf: -0.130735 over 6400 frames, epoch 0
2020-02-19 13:55:55,251 INFO [ddp_train.py:160] Device (0) processing 1200/4724(25.402202%) global average objf: -0.216779 over 6732672.0 frames, current batch average objf: -0.123979 over 3840 frames, epoch 0
2020-02-19 13:55:55,252 INFO [ddp_train.py:160] Device (1) processing 1200/4724(25.402202%) global average objf: -0.216412 over 6738176.0 frames, current batch average objf: -0.132368 over 4736 frames, epoch 0

The training seems to be working.

@csukuangfj
Contributor Author

csukuangfj commented Feb 19, 2020

@qindazhu @fanlu

Could you please try this pull request when you have time?

If you have multiple GPUs on a single machine, it should accelerate your training.

I have NOT tested multiple machines with multiple GPUs using DDP.
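For reference, the general single-node multi-GPU DDP pattern looks roughly like the sketch below (illustrative only; the actual entry point and flags of ddp_train.py in this PR may differ):

```python
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP


def run(rank, world_size):
    os.environ.setdefault('MASTER_ADDR', '127.0.0.1')
    os.environ.setdefault('MASTER_PORT', '29500')
    dist.init_process_group('nccl', rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    model = torch.nn.Linear(40, 10).cuda(rank)  # placeholder model
    model = DDP(model, device_ids=[rank])       # gradients are all-reduced across GPUs
    # ... build a per-rank dataloader with a DistributedSampler and run the training loop ...
    dist.destroy_process_group()


if __name__ == '__main__':
    num_gpus = torch.cuda.device_count()
    mp.spawn(run, args=(num_gpus,), nprocs=num_gpus)
```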

@qindazhu
Contributor

Sure, will do. Maybe later today or tomorrow.

@csukuangfj
Contributor Author

Here are the results for DDP (DistributedDataParallel) training WITHOUT constrain_orthonormal()

WER/CER

|          | DDP training (no constrain_orthonormal) | previous pull request (#3925) |
|----------|------------------------------------------|-------------------------------|
| test cer | 8.23  | 7.91  |
| test wer | 16.98 | 16.49 |
| dev cer  | 6.74  | 6.48  |
| dev wer  | 14.77 | 14.48 |

Training time

|                   | DDP (2 GPUs)                  | previous pull request (#3925) with a single GPU |
|-------------------|-------------------------------|-------------------------------------------------|
| 6 epochs in total | 2 hours 10 minutes 49 seconds | 3 hours, 54 seconds                             |

I will enable constrain_orthonormal() via forward hooks.
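Roughly, the hook-based approach could look like this (an illustrative sketch; the hook body and layer here are placeholders, and the actual update used in this PR may differ):

```python
import torch
import torch.nn as nn


def constrain_orthonormal_hook(module, inputs):
    # Applied right before every forward pass during training.
    if not module.training:
        return
    with torch.no_grad():
        w = module.weight                   # (out_dim, in_dim)
        p = torch.mm(w, w.t())              # should stay close to scale * I
        scale = torch.trace(p) / p.size(0)  # a simple choice of target scale
        eye = torch.eye(p.size(0), device=w.device, dtype=w.dtype)
        w.sub_(0.125 * torch.mm(p - scale * eye, w))  # one small corrective step


layer = nn.Linear(625, 1024)
layer.register_forward_pre_hook(constrain_orthonormal_hook)
```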

@danpovey
Contributor

danpovey commented Feb 19, 2020 via email

@fanlu

fanlu commented Feb 19, 2020

cool! I will try later.

@fanlu

fanlu commented Feb 20, 2020

These are my results based on this PR:

| exp (DDP)       | test cer | test wer | dev cer | dev wer | training time |
|-----------------|----------|----------|---------|---------|---------------|
| 2 GPUs          | 7.96     | 16.61    | 6.53    | 14.50   | 2h53m54s      |
| 4 GPUs          | 8.25     | 16.94    | 6.73    | 14.76   | 1h36m8s       |
| 4 GPUs, lr=4e-3 | 7.96     | 16.61    | 6.48    | 14.45   | 1h33m2s       |

@csukuangfj
Contributor Author

@fanlu

thank you very much.

Did you use the commit with constrain_orthonormal() enabled?

@fanlu

fanlu commented Feb 20, 2020

Yes, self.register_forward_pre_hook(constrain_orthonormal_hook) is in my exp.

@csukuangfj
Contributor Author

csukuangfj commented Feb 20, 2020

@fanlu

Your results show that the training time is indeed decreased, although not linearly.

There is a slight degradation in WER/CER; I guess this is caused by setting shuffle=False
in the dataloader. PyTorch requires that if we provide a sampler, we have
to set shuffle=False.

According to this paper
"Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour" https://arxiv.org/abs/1706.02677

> Linear Scaling Rule: When the minibatch size is multiplied by k, multiply the learning rate by k.

When 4 GPUs are used, the effective minibatch size is increased by 4x, so the
learning rate has to be changed from 1e-3 to 4e-3.
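Concretely (a sketch, with values only for illustration):

```python
base_lr = 1e-3           # learning rate tuned for a single GPU
num_gpus = 4             # world size of the DDP job
lr = base_lr * num_gpus  # 1e-3 -> 4e-3, per the linear scaling rule
```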

@qindazhu
Contributor

Regarding shuffle, it seems DistributedSampler supports shuffle among its parameters, so you can just set it to True.
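A minimal sketch of that usage (the dataset is a placeholder, and it assumes the process group has already been initialized so the sampler can infer the rank and world size):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

dataset = TensorDataset(torch.randn(1000, 40))       # placeholder dataset
sampler = DistributedSampler(dataset, shuffle=True)  # the sampler does the shuffling
loader = DataLoader(dataset, batch_size=64, shuffle=False, sampler=sampler)

for epoch in range(6):
    sampler.set_epoch(epoch)  # gives a different shuffle order each epoch
    for (batch,) in loader:
        pass  # training step goes here
```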

@fanlu

fanlu commented Feb 20, 2020

Yes, I have already changed the lr from 1e-3 to 4e-3; see the last line of my results.
I have also changed num_epochs from 6 to 18, which is Kaldi's config. Please wait a while.

@fanlu

fanlu commented Feb 20, 2020

| DDP training    | epoch 18, lr=4e-3 | epoch 6, lr=4e-3 + dropout(0.3) |
|-----------------|-------------------|---------------------------------|
| test cer        | 7.88              | 7.72                            |
| test wer        | 16.53             | 16.51                           |
| dev cer         | 6.52              | 6.49                            |
| dev wer         | 14.47             | 14.46                           |
| global avg objf | -0.039195         | -0.053055                       |
| valid avg objf  | -0.056288         | -0.057104                       |

dropout was copied dropout

@csukuangfj
Contributor Author

@fanlu
Thanks.

So the results show that dropout improves the CER/WER a little bit, while
increasing the number of epochs from 6 to 18 does not really have
a big effect. I guess the final learning rate is really small because of the learning rate decay.

@csukuangfj
Contributor Author

By the way, the way you inserted the hyperlink using markdown is not correct.

@danpovey
Contributor

danpovey commented Feb 20, 2020 via email

@fanlu

fanlu commented Feb 20, 2020

Added the global objf and validation objf to the result above.

> You could reduce the weight decay. Please always report the final train and valid objective function values.


@fanlu

fanlu commented Feb 21, 2020

Updated results after reducing the weight decay:

| base lr=4e-3      | test cer | test wer | dev cer | dev wer | global avg objf | validation avg objf |
|-------------------|----------|----------|---------|---------|-----------------|---------------------|
| weight_decay=5e-4 | 7.80     | 16.49    | 6.52    | 14.46   | -0.045908       | -0.057462           |
| weight_decay=1e-4 | 7.89     | 16.63    | 6.58    | 14.54   | -0.047904       | -0.058500           |
| weight_decay=5e-5 | 7.82     | 16.46    | 6.53    | 14.51   | -0.045770       | -0.056831           |

@danpovey
Contributor

Show what happens to the norm of output.affine. It's odd that there are not bigger differences in WER or objf. I suspect either a bug, or it was too small.

@fanlu

fanlu commented Feb 21, 2020

| base lr=4e-3      | parameters/module.output_affine.weight |
|-------------------|----------------------------------------|
| weight_decay=5e-4 | 80.1                                   |
| weight_decay=1e-4 | 75.78                                  |
| weight_decay=5e-5 | 80.35                                  |

[plot: norm of module.output_affine.weight during training for the three weight_decay settings]

@danpovey
Contributor

danpovey commented Feb 21, 2020 via email

@qindazhu
Contributor

Here is the result with multiple machines:

| WER/CER  |       |
|----------|-------|
| dev_cer  | 6.75  |
| dev_wer  | 14.82 |
| test_cer | 8.21  |
| test_wer | 16.94 |

It takes about 45 minutes for 6 epochs with 8 GPUs on different machines.

2020-02-21 21:14:23,462 INFO [ddp_train_copy.py:338] epoch 0, learning rate 0.007
2020-02-21 21:14:24,798 INFO [ddp_train_copy.py:178] Device (0) processing 0/1181(0.000000%) global average objf: -1.187857 over 6400.0 frames, current batch average objf: -1.187857 over 6400 frames, epoch 0

......

2020-02-21 22:02:40,117 INFO [ddp_train_copy.py:178] Device (0) processing 1100/1181(93.141406%) global average objf: -0.045180 over 6252288.0 frames, current batch average objf: -0.047351 over 3840 frames, epoch 5
2020-02-21 22:03:08,614 INFO [common.py:76] Save checkpoint to exp/chain_cleaned_pybind/tdnn1c_sp/best_model.pt: epoch=5, learning_rate=7.168000000000002e-05, objf=-0.04506898490268912

I queue jobs with Kaldi's queue.pl on SGE clusters; see this PR for details. Jobs communicate with each other via a shared file (so your cluster must support a shared file system).

  INIT_FILE=$dir/ddp_init
  rm -f $INIT_FILE  # delete the old one before starting
  init_method=file://$(readlink -f $INIT_FILE)
  echo "$init_method"

  queue.pl -q v100.q --gpu $num_gpus JOB=1:$num_gpus $dir/logs/task.JOB.log \
      python3 ./chain/ddp_train_copy.py \
      --train.init-method $init_method \
      ......

# use SGE_TASK_ID as rank
local_rank = int(os.environ['SGE_TASK_ID']) - 1
dist.init_process_group('nccl',
                        init_method=args.init_method,
                        rank=local_rank,
                        world_size=args.world_size)

Note learning_rate=7*1e-3 instead of 8*1e-3 (8*1e-3 makes the result a little worse).

@danpovey
Contributor

danpovey commented Feb 21, 2020 via email

@fanlu

fanlu commented Feb 22, 2020

Updated results after increasing the weight decay:

| base lr=4e-3      | test cer | test wer | dev cer | dev wer | global avg objf | validation avg objf | parameters/module.output_affine.weight |
|-------------------|----------|----------|---------|---------|-----------------|---------------------|----------------------------------------|
| weight_decay=1e-3 | 8.83     | 17.73    | 7.13    | 15.30   | -0.060892       | -0.075458           | 31.87                                  |
| weight_decay=5e-3 | 10.08    | 19.32    | 8.06    | 16.53   | -0.088142       | -0.105510           | 18.94                                  |

@danpovey
Contributor

Is that consistent with the plot you showed before? It looked like the output_affine.weight param's norm was never smaller than about 70 or 80, but I couldn't see the legend of the plot, so I'm not sure.

@fanlu

fanlu commented Feb 22, 2020

Maybe there is a problem caused by changing the code below from

opts = chain.ChainTrainingOptions()
opts.l2_regularize = args.l2_regularize

optimizer = optim.Adam(model.parameters(),
                           lr=learning_rate,
                           weight_decay=5e-4)

to

opts = chain.ChainTrainingOptions()
opts.l2_regularize = args.l2_regularize

optimizer = optim.Adam(model.parameters(),
                           lr=learning_rate,
                           weight_decay=args.l2_regularize)

I'll make the weight_decay different from opts.l2_regularize and rerun this exp.

[plot: norm of module.output_affine.weight during training after the weight_decay change]

@danpovey
Contributor

danpovey commented Feb 22, 2020 via email

@fanlu

fanlu commented Feb 26, 2020

@qindazhu I have a convergence problem with the SGD optimizer when momentum=0.9, but it seems the problem does not appear when that setting is not used. Have you seen a similar phenomenon?
Here is the log:

2020-02-26 10:22:10,397 INFO [ddp_train.py:344] epoch 0, learning rate 0.003999350705525672
2020-02-26 10:22:12,160 INFO [ddp_train.py:179] Device (1) processing 0/2364(0.000000%) global average objf: -1.277822 over 3840.0 frames, current batch average objf: -1.277822 over 3840 frames, epoch 0, scheduler lr is 0.0039987015164471736
WARNING ([5.5.791~1-be084]:DenominatorComputation():chain-denominator.cc:64) Nnet outputs -7.77028e+09, 3.26698e+09 outside the range [-30,30], derivs may be inaccurate.
WARNING ([5.5.791~1-be084]:DenominatorComputation():chain-denominator.cc:64) Nnet outputs -6.8477e+09, 2.8795e+09 outside the range [-30,30], derivs may be inaccurate.
2020-02-26 10:22:20,200 INFO [ddp_train.py:194] Validation average objf: -737218620.269293 over 17869.0 frames
2020-02-26 10:23:42,259 INFO [ddp_train.py:179] Device (1) processing 100/2364(4.230118%) global average objf: -7.088334 over 558336.0 frames, current batch average objf: -4.976176 over 6400 frames, epoch 0, scheduler lr is 0.003934311931812134
2020-02-26 10:25:00,463 INFO [ddp_train.py:179] Device (1) processing 200/2364(8.460237%) global average objf: -7.405582 over 1115520.0 frames, current batch average objf: -7.530059 over 6400 frames, epoch 0, scheduler lr is 0.003870959188409785
2020-02-26 10:26:20,953 INFO [ddp_train.py:179] Device (1) processing 300/2364(12.690355%) global average objf: -7.439822 over 1671168.0 frames, current batch average objf: -9.668958 over 6400 frames, epoch 0, scheduler lr is 0.003808626590376223
WARNING ([5.5.791~1-be084]:DenominatorComputation():chain-denominator.cc:64) Nnet outputs -46.543, 61.0312 outside the range [-30,30], derivs may be inaccurate.
WARNING ([5.5.791~1-be084]:DenominatorComputation():chain-denominator.cc:64) Nnet outputs -42.9122, 55.318 outside the range [-30,30], derivs may be inaccurate.
WARNING ([5.5.791~1-be084]:DenominatorComputation():chain-denominator.cc:64) Nnet outputs -51.7164, 74.0772 outside the range [-30,30], derivs may be inaccurate.
2020-02-26 10:27:41,098 INFO [ddp_train.py:179] Device (1) processing 400/2364(16.920474%) global average objf: -7.498366 over 2221696.0 frames, current batch average objf: -10.663645 over 3840 frames, epoch 0, scheduler lr is 0.0037472977106947553
WARNING ([5.5.791~1-be084]:DenominatorComputation():chain-denominator.cc:64) Nnet outputs -52.6615, 59.2798 outside the range [-30,30], derivs may be inaccurate.
2020-02-26 10:28:59,904 INFO [ddp_train.py:179] Device (1) processing 500/2364(21.150592%) global average objf: -7.657387 over 2778752.0 frames, current batch average objf: -8.657759 over 6400 frames, epoch 0, scheduler lr is 0.0036869563868667512
WARNING ([5.5.791~1-be084]:DenominatorComputation():chain-denominator.cc:64) Nnet outputs -47.5137, 49.1453 outside the range [-30,30], derivs may be inaccurate.
2020-02-26 10:30:17,675 INFO [ddp_train.py:179] Device (1) processing 600/2364(25.380711%) global average objf: -7.948807 over 3345280.0 frames, current batch average objf: -10.879383 over 4736 frames, epoch 0, scheduler lr is 0.0036275867166522097

And maybe we should focus on SGD now, as your result is nearly the same as Kaldi's.

@danpovey
Contributor

danpovey commented Feb 26, 2020 via email

@fanlu

fanlu commented Feb 26, 2020

Thanks, Dan

@qindazhu
Contributor

@fanlu, I suppose you are using a small L2; try a larger one, say, 1e-3 to 1e-4.

@qindazhu
Contributor

@csukuangfj, I am wondering when this PR will be ready to merge. I'll make another PR based on yours.

@csukuangfj
Contributor Author

@qindazhu @danpovey

The only issue with this pull request is that train.py and ddp_train.py
share some common code blocks.

I think it can be merged now and we can remove the duplicated code later;
I do not have time to remove it right now.

@qindazhu
Contributor

OK, then we can merge it for now. @danpovey, can you help with this?

@csukuangfj
Contributor Author

I have permission to merge.

@csukuangfj csukuangfj merged commit b63efc3 into kaldi-asr:pybind11 Feb 27, 2020
@csukuangfj
Contributor Author

merged.

@fanlu

fanlu commented Feb 29, 2020

Hi @danpovey, I want to compare each layer's computation between Kaldi and PyTorch, so I have loaded the Kaldi tdnn_1c model into the PyTorch model. We can see that the parameters of the two models are the same.

Kaldi's model:
[screenshot: parameters of the Kaldi model]

Kaldi's model loaded into the PyTorch model:
[screenshot: parameters after loading the Kaldi model into the PyTorch model]

Kaldi's output was computed with "nnet3-copy --nnet-config='echo output-node name=output input=$output_name |' $model -|"; PyTorch's output was obtained by calling forward after model.eval().

We can see that the LDA layer of Kaldi and PyTorch gives the same result on the same feature data:
[screenshot: LDA layer outputs of Kaldi and PyTorch on the same features]

So we can assume that Kaldi's tdnn1.affine and PyTorch's tdnn1_affine layer have the same input. But after the tdnn1_affine layer, PyTorch gives a different result from Kaldi's output (lines 2~28 on the right side of the picture are PyTorch's first-frame padding, so we can ignore them):
[screenshot: tdnn1 affine outputs of Kaldi and PyTorch]

I am not sure what the problem is. Any help would be appreciated. Thanks.

@danpovey
Contributor

danpovey commented Mar 1, 2020 via email

@fanlu

fanlu commented Mar 1, 2020

Does the layer below use the -1, 0, 1 positions in its computation?

relu-batchnorm-dropout-layer name=tdnn1 $affine_opts dim=1024

Kaldi probably uses this code for Propagate:

void* AffineComponent::Propagate(const ComponentPrecomputedIndexes *indexes,
                                 const CuMatrixBase<BaseFloat> &in,
                                 CuMatrixBase<BaseFloat> *out) const {

  // No need for asserts as they'll happen within the matrix operations.
  out->CopyRowsFromVec(bias_params_); // copies bias_params_ to each row
  // of *out.
  out->AddMatMat(1.0, in, kNoTrans, linear_params_, kTrans, 1.0);
  return NULL;
}

And I have tried three methods in PyTorch. They give me the same result, but it differs from Kaldi's:

torch.nn.functional.conv1d(x.permute(0,2,1), self.tdnn1_affine.weight.view((1024, 120, -1)), bias=self.tdnn1_affine.bias).permute(0,2,1)[:,25:30,:]

(torch.matmul(x, self.tdnn1_affine.weight.permute(1,0))+self.tdnn1_affine.bias)[:,25:30,:]

torch.nn.functional.linear(x, self.tdnn1_affine.weight, self.tdnn1_affine.bias).cpu().detach().numpy()[:,25:30,:]

@csukuangfj
Contributor Author

relu-batchnorm-dropout-layer name=tdnn1 $affine_opts dim=1024

The affine layer above in kaldi is equivalent to nn.Linear in PyTorch. There is no splicing here.

How did you load Kaldi's affine weights into PyTorch? Please show the code here.

@csukuangfj
Copy link
Contributor Author

You can refer to the code that loads lda.mat in PyTorch.

@fanlu

fanlu commented Mar 1, 2020

kaldi-onnx

import logging

import numpy as np
import torch


def load_kaldi_ckpt(filename, model):
    dst_state_dict = model.state_dict()
    pair = {"weight": "params", "bias": "bias", 
            "running_mean": "stats_mean", "running_var": "stats_var", "num_batches_tracked": "count"}     
    from parser import Nnet3Parser
    kaldi_dict = {}
    with open(filename, 'r') as f:
        p = Nnet3Parser(f)
        p.run()
        for component in p._components:
            kaldi_dict[component["name"]] = component
    for name in dst_state_dict.keys():
        kaldi_key = ""
        if "tdnn1" in name:
            kaldi_name = name.split(".")[0].replace("_", ".")
            pair_key = name.split(".")[-1]
            if pair_key in pair:
                kaldi_key = f"{kaldi_name}.{pair.get(pair_key)}"
        if "tdnnfs" in name:
            index = int(name.split(".")[1])
            l_a_b = name.split(".")[2]
            kaldi_name = f"tdnnf{index+2}.{l_a_b}"
            pair_key = name.split(".")[-1]
            if pair_key in pair:
                kaldi_key = f"{kaldi_name}.{pair.get(pair_key)}"
        if "prefinal" in name:
            kaldi_n = name.split(".")[0].replace("_", "-")
            pair_key = name.split(".")[-1]
            if "prefinal_l.conv" not in name:
                kaldi_name = f"{kaldi_n}.{name.split('.')[1]}"
            else:
                kaldi_name = kaldi_n
            if pair_key in pair:
                kaldi_key = f"{kaldi_name}.{pair.get(pair_key)}"
        if "output" in name:
            k_name, pair_key = name.split(".")
            kaldi_name = k_name.replace("_affine", ".affine").replace("_", "-")
            if pair_key in pair:
                kaldi_key = f"{kaldi_name}.{pair.get(pair_key)}"
        kaldi_value = kaldi_dict.get(kaldi_name, {}).get(pair.get(pair_key))
        # dst_state_dict[name] = kaldi_value
        if isinstance(kaldi_value, np.ndarray) and kaldi_value.size > 20:
            kaldi_key = f"{kaldi_key}, shape is {kaldi_value.shape}"
        if isinstance(kaldi_value, np.ndarray):
            if kaldi_value.ndim != dst_state_dict[name].ndim:
                kaldi_value = kaldi_value.reshape(dst_state_dict[name].shape)
            dst_state_dict[name] = torch.from_numpy(kaldi_value)
        else:
            dst_state_dict[name] = torch.tensor(kaldi_value)
        logging.info(f"name: {name}, shape: {dst_state_dict[name].shape}, kaldi_key: {kaldi_key}")
    model.load_state_dict(dst_state_dict)
    return kaldi_dict

@csukuangfj
Contributor Author

How did you define the model? What is its architecture?

@fanlu

fanlu commented Mar 1, 2020

The kernel (2,2) version that I submitted as a pull request to your git repo.

@csukuangfj
Contributor Author

(torch.matmul(x, self.tdnn1_affine.weight.permute(1, 0)) + self.tdnn1_affine.bias)

torch.nn.functional.linear(x, self.tdnn1_affine.weight, self.tdnn1_affine.bias)

should produce the same output as kaldi's affine layer when

  • the weight, bias
  • and the input

are identical.


There must be something wrong somewhere.
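For example, a quick check on fake data (a sketch; the shapes are made up):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(3, 120)          # fake input frames
weight = torch.randn(1024, 120)  # fake affine weight (out_dim x in_dim)
bias = torch.randn(1024)         # fake bias

out1 = torch.matmul(x, weight.permute(1, 0)) + bias
out2 = F.linear(x, weight, bias)
print(torch.allclose(out1, out2))  # True: both compute y = x W^T + b
```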

@fanlu

fanlu commented Mar 1, 2020

Yes, I think so. The LDA layer gives the same result, as I showed above, so we can assume that we have the same input to the tdnn1 affine layer. And the parameters look the same as well. I have no idea what is going on.

@csukuangfj
Contributor Author

You can create some fake data to debug the program.

For example, create a 2x2 weight matrix and a 2x1 bias. Calculate the result on paper,
compare it with the program, and you will find the reason.
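For instance, a toy check could look like this (all numbers made up):

```python
import torch
import torch.nn.functional as F

weight = torch.tensor([[1.0, 2.0],
                       [3.0, 4.0]])
bias = torch.tensor([0.5, -0.5])
x = torch.tensor([[1.0, 1.0]])

# By hand: [1*1 + 1*2 + 0.5, 1*3 + 1*4 - 0.5] = [3.5, 6.5]
out = F.linear(x, weight, bias)
print(out)  # tensor([[3.5000, 6.5000]])
```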

@fanlu

fanlu commented Mar 1, 2020

All the data below was computed by Kaldi. I have no problem with PyTorch's operations, only with Kaldi's. Let me show you what Kaldi does.
This is the first row of the output after the LDA layer:

In [19]: a = "2.300342 -0.2682175 0.5862507 -0.06933385 0.434848 -0.149417 0.04790286 0.1030131 -0.01732294 -0.09611828 0.136682 0.1814303 -0.05899795 -0.201803
    ...: 8 0.1067935 -0.02466351 0.08162478 -0.02118591 0.124744 0.01374727 -0.06972296 -0.005960235 -0.03282106 0.04361587 0.002798802 0.05377146 -0.0349902 0.
    ...: 01767303 -0.0250284 0.01511859 -0.01311266 0.04825886 -0.0002267546 -0.09969836 -0.002055793 0.02966474 0.02634935 -0.02635059 0.03734357 0.02878513 -0
    ...: .01532372 -0.04639435 0.02087254 0.008594703 -0.003571067 0.01858955 -0.03278194 -0.01541399 0.005724381 -0.02329349 -0.0201427 -0.001844179 -0.0260285
    ...: 2 0.0314902 0.03338297 0.0105119 -0.004434433 0.06942096 -0.03313885 -0.05314681 -0.01254019 -0.0003705059 0.08503703 0.02103205 -0.0002556719 0.003088
    ...: 835 -0.004352001 -0.008468844 0.003002871 0.0007132483 -0.01116457 0.02239098 0.0113832 0.06207408 -0.05190803 0.007352485 0.023138 0.03291445 -0.04150
    ...: 248 0.002104727 -0.02316702 -0.005105506 -0.001692873 0.002865741 0.05086495 0.004914522 0.000904361 0.008471237 0.008629705 -0.01996578 0.05934002 0.0
    ...: 1623728 -0.007537241 0.03523977 -0.01370161 -0.009647677 -0.01505581 0.003700913 -0.02384711 0.01867253 -0.01007206 0.01386262 0.02045397 -0.03400788 0
    ...: .01091084 0.02664956 0.0385044 -0.05636362 0.03181829 0.05379674 0.03861693 -0.004055074 -0.02378738 0.02643323 -0.01879072 -0.03662547 0.01476509 -0.0
    ...: 2003892 -0.0251219 0.02554548"

This is the first row of tdnn_1c's tdnn1.affine parameter matrix:

In [21]: b = "-0.02925109 -0.03204259 -0.05238809 0.0258671 -0.01542425 0.01202498 -0.09273408 0.03191283 -0.02427235 0.009102023 -0.08190203 0.01645355 0.01740
    ...: 446 -0.06069969 -0.01059853 0.00589438 -9.836891e-05 0.005326115 -0.01414909 -0.08893981 0.1006957 -0.08094332 -0.03947062 0.001208165 -0.08148591 -0.0
    ...: 3153317 -0.07677743 0.02790448 -0.01788927 -0.112223 -0.07447639 -0.07033557 0.0335387 0.0151288 -0.07006572 0.0713387 0.03158211 0.07567337 0.02985496
    ...:  -0.05704632 -0.01395622 -0.03394018 0.01646953 -0.04268685 -0.08250771 -0.02267475 -0.09654517 0.03479848 -0.2189472 -0.1256533 0.03554992 -0.0160882
    ...: -0.03978832 0.1058252 -0.02577725 0.09129423 0.0773345 0.06518164 0.08061377 -0.0003800528 -0.007160172 -0.008283933 0.0393545 0.02754511 0.02618444 0.
    ...: 03073195 -0.07522035 -0.00805568 0.05134711 0.01226415 -0.01922073 0.02499216 0.01480277 0.005939888 0.02740019 -0.007221481 -0.008180852 0.03079132 0.
    ...: 02221892 -0.09397198 0.0006076556 0.02956847 -0.0363856 0.03460713 0.01634974 0.01089886 0.0822057 -0.008049296 0.009631585 0.0568191 -0.02234845 -0.01
    ...: 608458 -0.04806978 -0.05295536 0.02976446 0.02290896 0.005668058 0.00426801 0.007513668 -0.03325405 -0.0145233 -0.007884383 -0.04083884 0.002169831 0.0
    ...: 06858787 0.00686969 -0.008723632 -0.03914807 -0.01512483 0.002803016 0.01384272 0.02919035 -0.01326019 -0.001008323 -0.02860635 0.02909619 0.05683269 0
    ...: .009913826 0.008573316 -0.01170192"

The first element of the bias is 0.04767551,
so the first element of the first row of the tdnn1.affine output should be:

In [24]: sum([float(c)*float(d) for c,d in zip(a.split(), b.split())])+0.04767551
Out[24]: -0.05162175507930113

But the value at the same position in Kaldi's tdnn1.affine output is -0.08376444,
so I suspect that there is some operation other than just the affine.

@danpovey
Contributor

danpovey commented Mar 1, 2020 via email

@fanlu

fanlu commented Mar 1, 2020

So I need to debug Kaldi to find some hints. Let me try it.

@danpovey
Contributor

danpovey commented Mar 1, 2020 via email

@fanlu

fanlu commented Mar 1, 2020

Hi, Dan. Is there a convenient way to print one row of data in Kaldi's C++ code?

@danpovey
Contributor

danpovey commented Mar 1, 2020 via email

@fanlu

fanlu commented Mar 1, 2020

OK. Thanks.
