convert_hf_to_gguf.py (6 additions, 0 deletions)

@@ -381,6 +381,12 @@ def prepare_metadata(self, vocab_only: bool):
# output in the same directory as the model by default
self.fname_out = self.dir_model / f"{fname_default}.gguf"

# If the model UUID is missing, generate one based on the tensor content
if not vocab_only and self.metadata.uuid is None:
    self.metadata.uuid = self.gguf_writer.generate_tensors_uuid()
    max_name_len = max(len(s) for _, s in self.tensor_map.mapping.values()) + len(".weight,")
    logger.info(f"{'generating general.uuid':<{max_name_len}} {self.metadata.uuid}")

self.set_type()

logger.info("Set meta model")
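The scheme above yields a deterministic identifier: the same tensor bytes always hash to the same UUID, no matter when or where the conversion runs. A minimal standalone sketch of the idea (the namespace UUID is the one hard-coded in generate_tensors_uuid below; the toy arrays and the content_uuid helper are made up for illustration):

import hashlib
import uuid

import numpy as np

# Namespace taken from GGUFWriter.generate_tensors_uuid (see diff below)
NAMESPACE = uuid.UUID('ef001206-dadc-5f6d-a15f-3359e577d4e5')

def content_uuid(tensors: list) -> str:
    # Hash the namespace first, then every tensor's raw bytes in order,
    # mirroring the UUIDv5 construction used by the writer.
    sha1 = hashlib.sha1()
    sha1.update(NAMESPACE.bytes)
    for t in tensors:
        sha1.update(t.tobytes('C'))  # raw bytes in C (row-major) order
    # SHA-1 yields 20 bytes; keep the first 16 and let version=5 stamp
    # the UUID version/variant bits.
    return str(uuid.UUID(bytes=sha1.digest()[:16], version=5))

a = np.arange(6, dtype=np.float32).reshape(2, 3)
print(content_uuid([a]))                              # stable across runs
print(content_uuid([a]) == content_uuid([a.copy()]))  # True: same bytes, same UUID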
gguf-py/gguf/gguf_writer.py (15 additions, 0 deletions)

@@ -2,6 +2,8 @@

import logging
import os
import uuid
import hashlib
import shutil
import struct
import tempfile
@@ -417,6 +419,19 @@ def write_tensor_data(self, tensor: np.ndarray[Any, Any]) -> None:

    self.state = WriterState.WEIGHTS

def generate_tensors_uuid(self) -> str:
    uuidv5_sha1 = hashlib.sha1()
    uuidv5_sha1.update(uuid.UUID('ef001206-dadc-5f6d-a15f-3359e577d4e5').bytes)

    for tensors in self.tensors:
        # relying on the fact that Python dicts preserve insertion order (since 3.7)
        for name, ti in tensors.items():
            assert ti.tensor is not None
            assert ti.tensor.nbytes == ti.nbytes
            uuidv5_sha1.update(ti.tensor.tobytes('C'))
Review comment from @compilade (Collaborator), Jul 23, 2024:
While writing #8645 (comment), I've realized that this specific line materializes the lazy tensors by reading their data, which would cause a very noticeable memory regression (making it no better than --no-lazy, which is not good RAM-usage-wise), at least when no UUID is specified (i.e., by default). This is because the data read here is not immediately written (unlike in GGUFWriter.write_tensors_to_file), so all the tensors are held in memory before the metadata is even written.

This will be more visible with models with BF16 weights and/or MoE models, because their original tensors are not used as-is (type conversion and/or expert stacking), so the output tensor list is never mmap-ed.

If you can prove there is no memory regression, I'd be happy to dismiss this review.

(Otherwise, be quick with Ctrl+C (at least on Linux) to interrupt the conversion with SIGINT and avoid OOM.)


    # truncate the 20-byte SHA-1 digest to 16 bytes; version=5 stamps the UUID version/variant bits
    return str(uuid.UUID(bytes=uuidv5_sha1.digest()[:16], version=5))

def write_tensors_to_file(self, *, progress: bool = False) -> None:
    self.write_ti_data_to_file()

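One way to address the review above would be to fold the hash update into the write path itself, so each tensor's bytes are hashed as they are streamed to disk and can be freed immediately afterwards. A sketch of that idea; the chunk iterator and the write_chunk callback are hypothetical stand-ins, not the actual GGUFWriter API:

import hashlib
import uuid
from typing import Callable, Iterable

NAMESPACE = uuid.UUID('ef001206-dadc-5f6d-a15f-3359e577d4e5')

def write_and_hash(chunks: Iterable[bytes], write_chunk: Callable[[bytes], None]) -> str:
    # Hash incrementally while writing, so at most one chunk of tensor
    # data needs to be resident at a time.
    sha1 = hashlib.sha1()
    sha1.update(NAMESPACE.bytes)
    for chunk in chunks:
        write_chunk(chunk)  # bytes go straight to disk...
        sha1.update(chunk)  # ...and into the running hash
    return str(uuid.UUID(bytes=sha1.digest()[:16], version=5))

The catch, as the comment implies, is ordering: general.uuid is part of the metadata, which is written before the tensor data, so a UUID computed during the write pass would only be known afterwards and would have to be patched into the header after the fact (e.g., by seeking back), which is presumably why the PR hashes in a separate pass up front.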