Dear authors,
Thank you very much for your outstanding work and for making the code and resources publicly available.
I am currently working on evaluating the energy consumption of a Spiking Transformer-based model.
I would greatly appreciate it if you could help me verify the correctness of my current approach.
-
Pairing of firing rates and FLOPs.
For the MS_SPS module, I assigned the firing rates of the five convolutional layers as follows: 1.0, MS_SPS_lif, MS_SPS_lif1, MS_SPS_lif2, and MS_SPS_lif3.

In the self-attention component of MS_SSA, I calculated the firing rate for Q, K, and V as the average:
(MS_SSA_ConvN_q_lif + MS_SSA_ConvN_k_lif + MS_SSA_ConvN_v_lif) / 3,
and the rate for the attention function f(Q, K, V) as:
MS_SSA_ConvN_q_lif + MS_SSA_ConvN_k_lif.

For the linear projection following the attention, I used MS_SSA_ConvN_x_after_qkv.

Regarding the MLP module, I used MS_MLP_ConvN_fc1_lif and MS_MLP_ConvN_fc2_lif for the first and second layers, respectively.

Could you kindly let me know whether there are any mistakes or incorrect assumptions in this methodology?
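For concreteness, here is a minimal sketch of how I combine these firing rates with the layer FLOPs. The E_MAC/E_AC values are the commonly cited 45 nm estimates; the FLOP counts and rates below are placeholders rather than measured numbers:

```python
# Minimal sketch of the per-layer energy estimate I am using.
# E_MAC / E_AC are the commonly cited 45 nm figures (Horowitz, ISSCC 2014);
# the FLOP counts and firing rates below are placeholders, not measured values.
E_MAC = 4.6e-12  # J per 32-bit FP multiply-accumulate
E_AC = 0.9e-12   # J per 32-bit FP accumulate

def layer_energy(flops, firing_rate, spike_driven=True):
    """Energy of one layer: spike-driven layers accumulate only on incoming spikes,
    while the first conv sees analog pixel input and uses full MACs (rate = 1.0)."""
    unit = E_AC if spike_driven else E_MAC
    return unit * flops * firing_rate

# Example pairing for the five MS_SPS convolutions (FLOPs/rates are placeholders):
sps_layers = [
    ("conv0", 1.0e9, 1.00, False),  # image input, rate fixed at 1.0 -> MACs
    ("conv1", 1.0e9, 0.25, True),   # rate taken from MS_SPS_lif
    ("conv2", 1.0e9, 0.20, True),   # rate taken from MS_SPS_lif1
    ("conv3", 1.0e9, 0.18, True),   # rate taken from MS_SPS_lif2
    ("conv4", 1.0e9, 0.15, True),   # rate taken from MS_SPS_lif3
]
total_J = sum(layer_energy(f, r, s) for _, f, r, s in sps_layers)
print(f"MS_SPS energy: {total_J * 1e3:.3f} mJ")
```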
-
Firing rate tendencies.
While analyzing the output of firing_num.py, I noticed something unexpected: in my case, the firing rate of Q was more than 10 times higher than that of K and V. However, the paper suggests that V usually has the highest firing rate, so my observations show the opposite trend.

I used the command:

CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node=1 --master_port 29501 firing_num.py -c conf/imagenet/8_512_300E_t4.yml --model sdt --spike-mode lif --resume 8_512.pth.tar --no-resume-opt

Would you happen to have any insights or advice regarding this discrepancy?
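To rule out a bookkeeping error on my side, I also recomputed the Q/K/V rates directly with forward hooks. This is only a rough sketch; the name matching on the *_lif modules is my own assumption about how the LIF neurons are named inside the model:

```python
# Rough sketch: recompute Q/K/V firing rates with forward hooks, to cross-check the
# numbers reported by firing_num.py. The endswith() match on module names is my own
# assumption about how the q/k/v LIF neurons are named in the model.
rates = {}

def make_hook(name):
    def hook(module, inputs, output):
        # a LIF node emits a binary spike tensor; its mean over all elements
        # (and over time steps, if they are stacked) is the firing rate
        rates[name] = output.detach().float().mean().item()
    return hook

def attach_qkv_hooks(model):
    handles = []
    for name, module in model.named_modules():
        if name.endswith(("q_lif", "k_lif", "v_lif")):
            handles.append(module.register_forward_hook(make_hook(name)))
    return handles

# usage: handles = attach_qkv_hooks(model); run one validation batch; inspect `rates`
```

With this check I still see the same ordering (Q much higher than K and V), which is why I suspect either my pairing above or my understanding of the expected trend is off.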
If you have time to review it, I would be sincerely grateful for your guidance.
Thank you again for your valuable work and your time.