A working software environment for lag-llama #29

@hohe12ly

I reported in another issue that the most recent pytorch-lightning does not work with lag-llama. I also tried several version combinations of pytorch, pytorch-lightning, and gluonts. Eventually I got the code to run for 385 epochs with the following requirements.txt:

orjson
torch==2.0.0
gluonts==0.13.5
pytorch-lightning==1.9.5
datasets
xformers
git+https://github.com/kashif/hopfield-layers@pytorch-2
etsformer-pytorch
reformer_pytorch
einops
opt_einsum
pykeops
scipy
apex
git+https://github.com/microsoft/torchscale

But the run still failed with a divide-by-zero in gluonts during evaluation. Before I try more combinations, I thought it would be more efficient to ask here: could you share a working requirements.txt with pinned version numbers?
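To make environments easier to compare, here is a minimal sketch that prints the installed versions of the three packages pinned above. The helper name `installed_versions` and the `PINNED` tuple are my own, not part of lag-llama; it only relies on the standard-library `importlib.metadata` module.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages pinned in the requirements.txt above; extend the tuple as needed.
PINNED = ("torch", "gluonts", "pytorch-lightning")


def installed_versions(packages=PINNED):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in the active environment.
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}=={ver}" if ver else f"{name}: not installed")
```

Running this inside the conda environment gives lines that can be pasted straight back into a requirements.txt.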

BTW, the error I got with my requirements.txt is:

Epoch 385: : 110it [00:23,  4.66it/s, loss=-0.64, v_num=0, val_loss=-.690, train_loss=-1.10]
Epoch 385, global step 38600: 'val_loss' was not in top 1

Epoch 385: : 110it [00:23,  4.65it/s, loss=-0.64, v_num=0, val_loss=-.690, train_loss=-1.10]
Use checkpoint: /home/lagllama_test/test/pytorch-transformer-ts/lag-llama/model-size-scaling-logs/0/experiments/lightning_logs/version_0/checkpoints/epoch=335-step=33600.ckpt
Predict on m4_weekly
m4_weekly prediction length: 13

Running evaluation:   0%|          | 0/359 [00:00<?, ?it/s]
Running evaluation: 100%|██████████| 359/359 [00:00<00:00, 81024.28it/s]
logger.log_dir :  /home/lagllama_test/test/pytorch-transformer-ts/lag-llama/model-size-scaling-logs/0/experiments/lightning_logs/version_0
os.path.exists(logger.log_dir) :  True
Predict on traffic
traffic prediction length: 24

Running evaluation:   0%|          | 0/6034 [00:00<?, ?it/s]
Running evaluation: 100%|██████████| 6034/6034 [00:00<00:00, 1090128.80it/s]
/home/lagllama_test/conda/envs/lagllama/lib/python3.10/site-packages/gluonts/evaluation/_base.py:422: RuntimeWarning: divide by zero encountered in scalar divide
  metrics["ND"] = cast(float, metrics["abs_error"]) / cast(
/home/lagllama_test/conda/envs/lagllama/lib/python3.10/site-packages/gluonts/evaluation/_base.py:422: RuntimeWarning: divide by zero encountered in scalar divide
  metrics["ND"] = cast(float, metrics["abs_error"]) / cast(
/home/lagllama_test/conda/envs/lagllama/lib/python3.10/site-packages/gluonts/evaluation/_base.py:422: RuntimeWarning: divide by zero encountered in scalar divide
  metrics["ND"] = cast(float, metrics["abs_error"]) / cast(
/home/lagllama_test/conda/envs/lagllama/lib/python3.10/site-packages/gluonts/evaluation/_base.py:422: RuntimeWarning: divide by zero encountered in scalar divide
  metrics["ND"] = cast(float, metrics["abs_error"]) / cast(
/home/lagllama_test/conda/envs/lagllama/lib/python3.10/site-packages/pandas/core/dtypes/astype.py:134: UserWarning: Warning: converting a masked element to nan.
  return arr.astype(dtype, copy=True)
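
The warning points at the line that computes the ND metric. The denominator is cut off in the output above, so the following is only a sketch under the assumption that ND is the total absolute error divided by the sum of absolute target values. It shows how an all-zero (or empty) set of targets leads to a zero denominator, and one possible way to guard against it. `normalized_deviation` is a hypothetical helper, not a gluonts function.

```python
import math


def normalized_deviation(targets, forecasts):
    """ND sketch: sum(|target - forecast|) / sum(|target|).

    Assumption: this mirrors the ratio in the warning above. Returns NaN
    instead of dividing when the targets sum to zero in absolute value.
    """
    abs_error = sum(abs(t - f) for t, f in zip(targets, forecasts))
    abs_target_sum = sum(abs(t) for t in targets)
    if abs_target_sum == 0:
        # All-zero or empty targets: the ratio is undefined.
        return math.nan
    return abs_error / abs_target_sum


if __name__ == "__main__":
    print(normalized_deviation([1.0, 1.0], [1.0, 3.0]))       # non-zero targets
    print(normalized_deviation([0.0, 0.0, 0.0], [1.0, 2.0, 3.0]))  # zero targets
```

If this assumption holds, the warning would mean that the evaluated target values sum to zero in absolute value for some aggregate, which may point at the data passed to the evaluator rather than at the package versions.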

Thanks a lot.

Yan
