
All You Need to Know About GGUF

MasakiMu319

WARNING

The GGUF metadata content listed here is current as of 2024-11-06. If models add new metadata fields later, defer to the model publisher's documentation first, or check: gguf/constants.py.

GGUF is a binary file format supported by Hugging Face Hub features, designed for fast model loading and saving, which makes it efficient for inference in practice.

GGUF is built for GGML and GGML-based runtimes. Models developed in frameworks like PyTorch can be converted into GGUF and used by those engines.

GGUF evolved from GGJT with changes for better extensibility and usability. Target capabilities include:

    • Single-file deployment: everything needed to load the model is contained in one file.
    • Extensibility: new information can be added without breaking compatibility with existing models.
    • mmap compatibility: models can be loaded via memory mapping for fast loading and saving.
    • Ease of use: models can be loaded and saved with a small amount of code, with no external libraries required.
    • Full information: all information needed to load a model is contained in the model file itself.

GGUF Naming Rules

GGUF naming convention:

<BaseName><SizeLabel><FineTune><Version><Encoding><Type><Shard>.gguf

Each component, if present, is separated by -, so key details are visible directly from the filename. In practice, historical naming diversity means not all filenames are fully parseable.

Components:

  1. BaseName: descriptive base model family/architecture name; can be derived from general.basename in GGUF metadata.

  2. SizeLabel: model size label, usually expressed as <expertCount>x<count><scale-prefix>, indicating the parameter weight class (useful for leaderboards).

    It can be derived from general.size_label, or computed if missing.

    Single-letter scale prefixes include:

    • Q: quadrillion parameters.
    • T: trillion parameters.
    • B: billion parameters.
    • M: million parameters.
    • K: thousand parameters.
  3. FineTune: descriptive fine-tuning purpose label, such as Chat, Instruct; can be derived from general.finetune.

  4. Version (optional): version in the form v<Major>.<Minor>. If the version is absent, v1.0 is often assumed as the first public release. Can be derived from general.version.

  5. Encoding: weight encoding / quantization representation used by the model.

  6. Type: GGUF file type and intended use. If missing, the file is assumed to be a typical tensor model file.

    • LoRA: GGUF file is a LoRA adapter.
    • vocab: GGUF file contains only vocabulary data and metadata.
  7. Shard (optional): indicates the model is split into shards, format <ShardNum>-of-<ShardTotal>.

    • ShardNum: shard index in the model; must be zero-padded to 5 digits. Shard numbering always starts from 00001 (not 00000).
    • ShardTotal: total shard count; also zero-padded to 5 digits.
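As an illustrative sketch, the components above can be assembled into a filename as follows. The helper name and its defaults are hypothetical, not from any official tool; only the component order, the `-` separator, and the 5-digit shard padding come from the convention above.

```python
# Hypothetical helper: assembles a GGUF filename from its components.
def gguf_filename(base, size, version, finetune=None, encoding=None,
                  type_=None, shard=None, shard_total=None):
    parts = [base, size]
    if finetune:
        parts.append(finetune)
    parts.append(version)
    if encoding:
        parts.append(encoding)
    if type_:
        parts.append(type_)
    if shard is not None:
        # Shard indices start at 00001 and are zero-padded to 5 digits.
        parts.append(f"{shard:05d}-of-{shard_total:05d}")
    return "-".join(parts) + ".gguf"

print(gguf_filename("Mixtral", "8x7B", "v0.1", encoding="KQ2"))
# Mixtral-8x7B-v0.1-KQ2.gguf
print(gguf_filename("Grok", "100B", "v1.0", encoding="Q4_0",
                    shard=3, shard_total=9))
# Grok-100B-v1.0-Q4_0-00003-of-00009.gguf
```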

A model filename should contain at least BaseName, SizeLabel, and Version so that it can be easily validated against the GGUF naming convention.

You can use the following regex to validate core ordering and extraction:

^(?<BaseName>[A-Za-z0-9\s]*(?:(?:-(?:(?:[A-Za-z\s][A-Za-z0-9\s]*)|(?:[0-9\s]*)))*))-(?:(?<SizeLabel>(?:\d+x)?(?:\d+\.)?\d+[A-Za-z](?:-[A-Za-z]+(\d+\.)?\d+[A-Za-z]+)?)(?:-(?<FineTune>[A-Za-z0-9\s-]+))?)?-(?:(?<Version>v\d+(?:\.\d+)*))(?:-(?<Encoding>(?!LoRA|vocab)[\w_]+))?(?:-(?<Type>LoRA|vocab))?(?:-(?<Shard>\d{5}-of-\d{5}))?\.gguf$
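The regex above uses `(?<Name>...)` named-group syntax (.NET/PCRE style); in Python it must be written as `(?P<Name>...)`. A sketch of applying the converted pattern:

```python
import re

# The naming-convention regex above, with named groups converted to
# Python's (?P<name>...) syntax. The pattern itself is unchanged.
GGUF_NAME_RE = re.compile(
    r"^(?P<BaseName>[A-Za-z0-9\s]*(?:(?:-(?:(?:[A-Za-z\s][A-Za-z0-9\s]*)"
    r"|(?:[0-9\s]*)))*))-"
    r"(?:(?P<SizeLabel>(?:\d+x)?(?:\d+\.)?\d+[A-Za-z]"
    r"(?:-[A-Za-z]+(\d+\.)?\d+[A-Za-z]+)?)"
    r"(?:-(?P<FineTune>[A-Za-z0-9\s-]+))?)?-"
    r"(?:(?P<Version>v\d+(?:\.\d+)*))"
    r"(?:-(?P<Encoding>(?!LoRA|vocab)[\w_]+))?"
    r"(?:-(?P<Type>LoRA|vocab))?"
    r"(?:-(?P<Shard>\d{5}-of-\d{5}))?\.gguf$"
)

m = GGUF_NAME_RE.match("Mixtral-8x7B-v0.1-KQ2.gguf")
print(m.group("BaseName"), m.group("SizeLabel"),
      m.group("Version"), m.group("Encoding"))
# Mixtral 8x7B v0.1 KQ2
```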

Examples

Some example filenames following the convention:

    • Mixtral-8x7B-v0.1-KQ2.gguf: base name Mixtral, 8 experts of roughly 7B parameters each, version v0.1, KQ2 encoding.
    • Grok-100B-v1.0-Q4_0-00003-of-00009.gguf: shard 3 of 9 of a 100B-parameter Grok model, version v1.0, Q4_0 quantization.

GGUF File Structure

The GGUF v3 file structure is shown below. It uses a global alignment specified by general.alignment (many models do not set it explicitly, in which case a default of 32 bytes is used).

[Figure: GGUF v3 file structure]

From this structure you can see that a GGUF file contains format/version info, the tensor count, the metadata length, the metadata itself, and tensor descriptors, followed by the tensor data.
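A minimal sketch of parsing the fixed-size header fields, assuming the GGUF v2/v3 little-endian layout: a 4-byte magic "GGUF", a uint32 version, a uint64 tensor count, and a uint64 metadata key-value count. The synthetic byte buffer below is for demonstration only.

```python
import struct

# Reads the fixed-size GGUF header (v2/v3 layout, little-endian):
# 4-byte magic "GGUF", uint32 version, uint64 tensor count,
# uint64 metadata key-value count.
def read_gguf_header(buf: bytes) -> dict:
    magic, version, n_tensors, n_kv = struct.unpack_from("<4sIQQ", buf, 0)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    return {"version": version, "tensor_count": n_tensors,
            "metadata_kv_count": n_kv}

# Synthetic header for demonstration: version 3, 291 tensors, 24 KV pairs.
header = struct.pack("<4sIQQ", b"GGUF", 3, 291, 24)
print(read_gguf_header(header))
# {'version': 3, 'tensor_count': 291, 'metadata_kv_count': 24}
```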

Metadata (Standardized Key-Value Pairs)

The key-value pairs below are standardized. The list may grow as more use cases emerge. Naming tries to align with original model definitions where possible, making mapping easier.

Not all keys are mandatory, but all are recommended. Required keys are marked in bold. If a key is omitted, readers should treat the value as unknown and fall back to default or error handling as needed.

Communities can define custom namespaced keys for extra data. To avoid conflicts, use a community prefix. For example, rustformers.* for Rustformers-specific fields.

By convention, unless otherwise specified, most count- and length-like values are uint64 so that very large models remain representable. Some models (notably older GGUF v1 files) use uint32; readers should support both widths.

General

Required

General metadata

Source metadata

Metadata about model origin/provenance. Useful for tracing source models and upstream references when conversion/modification is involved (for example GGML-to-GGUF conversion lineage).

Usually not the primary focus in day-to-day usage.

LLM

[llm] should be replaced by the architecture name: for example, llama for LLaMA, bert for BERT, etc. If an architecture's section requires these keys, they should be provided (not every key applies to every architecture).
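As a sketch of how this substitution works in practice: the architecture name from general.architecture becomes the key prefix. The metadata dict below is illustrative, though llama.context_length and llama.embedding_length are real standardized keys.

```python
# Illustrative metadata, as a reader might see after parsing the KV section.
metadata = {
    "general.architecture": "llama",
    "llama.context_length": 4096,
    "llama.embedding_length": 4096,
}

# The [llm] placeholder resolves to the value of general.architecture,
# so architecture-specific keys are looked up with that prefix.
arch = metadata["general.architecture"]
ctx = metadata[f"{arch}.context_length"]
print(arch, ctx)
# llama 4096
```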

Attention

The remaining key-value fields are used less frequently in practice, so they are not covered one by one here.

GGUF Conversion

  1. Hugging Face Space: https://huggingface.co/spaces/ggml-org/gguf-my-repo