Editing Openai/691a41cd-2efc-800c-9eff-de439224a90d (section)

===== File family: loretoken_gpu_compressor, lt_decode*.cu, lt_pack.py, etc. =====

This is where the actual compression strategies live. From the docs and report, it supports:
* Zero-run encoding - Extremely effective on ReLU activations that are mostly zero.
* Sparse matrix compression - Store only non-zero values + indices for pruned or sparse layers.
* Value dictionary compression - If a tensor uses a small set of values (quantized weights), store the dictionary + small indices.
* Batch delta compression - For similar inputs in a batch, store a base sample and small deltas instead of each full tensor.
* LoreToken semantic patterns - This is the special sauce: - Certain known structures / layouts (e.g., standard transformer blocks, repeated KV-cache patterns) can be represented using LoreToken-style codes rather than raw floats. - That turns big, regular patterns into tiny semantic markers that decode back into tensors when needed.

The compressor tools:
* Feed synthetic and real tensors through each strategy.
* Measure: - Compression ratio - Encode/decode time - Overall throughput impact
* Write everything to logs and JSON so you can see which strategy wins for each tensor type.