oumi.quantize#

Quantization module for Oumi.

This module provides comprehensive model quantization capabilities, including AWQ, BitsAndBytes, and GGUF quantization methods.
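
A minimal end-to-end sketch of the module-level entry point. The import path for QuantizationConfig and ModelParams and the exact config field names are assumptions inferred from the parameter descriptions below, not guaranteed API:

from oumi.core.configs import ModelParams, QuantizationConfig  # assumed import path
from oumi.quantize import quantize

# Field names (model, method, output_path) are assumptions based on the
# config description under quantize() further down this page.
config = QuantizationConfig(
    model=ModelParams(model_name="meta-llama/Llama-3.2-1B"),
    method="awq_q4_0",        # one of the methods listed under AwqQuantization
    output_path="llama3-awq",
)
result = quantize(config)
print(result.output_path, result.quantized_size_bytes)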

class oumi.quantize.AwqQuantization[source]#

Bases: BaseQuantization

AWQ (Activation-aware Weight Quantization) implementation.

This class handles AWQ quantization, with support for a simulation mode when AWQ libraries are not available.

quantize(config: QuantizationConfig) → QuantizationResult[source]#

Main quantization method for AWQ.

Parameters:

config – Quantization configuration

Returns:

QuantizationResult containing quantization results

raise_if_requirements_not_met()[source]#

Raise an error if AWQ dependencies are not available.

supported_formats: list[str] = ['safetensors']#
supported_methods: list[str] = ['awq_q4_0', 'awq_q4_1', 'awq_q8_0', 'awq_f16']#
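
A short sketch of driving AwqQuantization directly rather than going through the module-level quantize() router; the config here is a QuantizationConfig built as in the (assumed) module example above:

from oumi.quantize import AwqQuantization

quantizer = AwqQuantization()
quantizer.raise_if_requirements_not_met()  # errors out if AWQ libraries are missing

# Inspect what this quantizer can do before handing it a config.
print(quantizer.get_supported_methods())   # ['awq_q4_0', 'awq_q4_1', 'awq_q8_0', 'awq_f16']
print(quantizer.get_supported_formats())   # ['safetensors']

result = quantizer.quantize(config)        # config: a QuantizationConfig as above
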
class oumi.quantize.BaseQuantization[source]#

Bases: ABC

Abstract base class for all quantization methods.

This class defines the common interface that all quantization implementations must follow, ensuring consistency across different quantization approaches.

get_supported_formats() → list[str][source]#

Return list of output formats supported by this quantizer.

Returns:

List of format names (e.g., ["gguf", "pytorch"])

get_supported_methods() → list[str][source]#

Return list of quantization methods supported by this quantizer.

Returns:

List of method names (e.g., ["awq_q4_0", "awq_q8_0"])

abstractmethod quantize(config: QuantizationConfig) → QuantizationResult[source]#

Main quantization method; must be implemented by subclasses.

Parameters:

config – Quantization configuration containing model parameters, method, output path, and other settings.

Returns:

QuantizationResult containing:

  • quantized_size_bytes: Size of the quantized model in bytes

  • output_path: Path to the quantized model

  • quantization_method: Quantization method used

  • format_type: Format type of the quantized model

  • additional_info: Additional method-specific information

Return type:

QuantizationResult

Raises:
  • RuntimeError – If quantization fails for any reason

  • ValueError – If configuration is invalid for this quantizer

abstractmethod raise_if_requirements_not_met() → None[source]#

Raise an error if the requirements are not met.

supported_formats: list[str] = []#
supported_methods: list[str] = []#
supports_format(format_name: str) → bool[source]#

Check if this quantizer supports the given output format.

Parameters:

format_name – Output format name to check

Returns:

True if format is supported, False otherwise

supports_method(method: str) → bool[source]#

Check if this quantizer supports the given method.

Parameters:

method – Quantization method name to check

Returns:

True if method is supported, False otherwise
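
supports_method() and supports_format() make it straightforward to route a request to the right backend. A hypothetical routing helper (find_quantizer is not part of the library) mirroring what the module-level quantize() presumably does:

from oumi.quantize import AwqQuantization, BaseQuantization, BitsAndBytesQuantization

def find_quantizer(method: str) -> BaseQuantization:
    # Return the first quantizer that advertises support for the method.
    for quantizer in (AwqQuantization(), BitsAndBytesQuantization()):
        if quantizer.supports_method(method):
            return quantizer
    raise ValueError(f"Unsupported quantization method: {method}")

print(type(find_quantizer("bnb_4bit")).__name__)  # BitsAndBytesQuantization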

validate_config(config: QuantizationConfig) → None[source]#

Validate configuration for this quantizer.

Parameters:

config – Quantization configuration to validate

Raises:

ValueError – If configuration is invalid for this quantizer

validate_requirements() → bool[source]#

Check if all required dependencies are available.

Returns:

True if all dependencies are available and quantization can proceed, False otherwise.
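
To add a new backend, subclass BaseQuantization, populate the two class attributes, and implement the two abstract methods. A minimal sketch; the GgufQuantization name and its internals are purely illustrative:

from oumi.quantize import BaseQuantization, QuantizationResult

class GgufQuantization(BaseQuantization):
    supported_formats: list[str] = ["gguf"]
    supported_methods: list[str] = ["gguf_q4_0"]

    def raise_if_requirements_not_met(self) -> None:
        # Raise RuntimeError here if a required library cannot be imported.
        pass

    def quantize(self, config) -> QuantizationResult:
        # config: a QuantizationConfig; output_path is an assumed field name.
        self.validate_config(config)  # inherited; raises ValueError if invalid
        # ... perform the actual quantization, writing to config.output_path ...
        return QuantizationResult(
            quantized_size_bytes=0,
            output_path=config.output_path,
            quantization_method="gguf_q4_0",
            format_type="gguf",
        )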

class oumi.quantize.BitsAndBytesQuantization[source]#

Bases: BaseQuantization

BitsAndBytes quantization implementation.

This class handles quantization using the BitsAndBytes library, supporting both 4-bit and 8-bit quantization methods.

quantize(config: QuantizationConfig) → QuantizationResult[source]#

Main quantization method for BitsAndBytes.

Parameters:

config – Quantization configuration

Returns:

QuantizationResult containing quantization results

raise_if_requirements_not_met() → None[source]#

Raise an error if BitsAndBytes dependencies are not available.

Raises:

RuntimeError – If BitsAndBytes dependencies are not available.

supported_formats: list[str] = ['safetensors']#
supported_methods: list[str] = ['bnb_4bit', 'bnb_8bit']#
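
Usage mirrors AwqQuantization; validate_requirements() offers a non-raising alternative to raise_if_requirements_not_met() for optional code paths:

from oumi.quantize import BitsAndBytesQuantization

quantizer = BitsAndBytesQuantization()
if quantizer.validate_requirements():    # True if bitsandbytes is importable
    result = quantizer.quantize(config)  # config: QuantizationConfig with method="bnb_4bit"
else:
    print("bitsandbytes not installed; skipping quantization")
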
class oumi.quantize.QuantizationResult(quantized_size_bytes: int, output_path: str, quantization_method: str, format_type: str, additional_info: dict[str, Any] = <factory>)[source]#

Bases: object

Result of quantization.

additional_info: dict[str, Any]#

Additional information about the quantization process.

format_type: str#

Format type of the quantized model.

output_path: str#

Path to the quantized model.

quantization_method: str#

Quantization method used.

quantized_size_bytes: int#

Size of the quantized model in bytes.
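
QuantizationResult is a plain dataclass, so its fields can be read directly. A small sketch of reporting a result, using only the five documented fields:

from oumi.quantize import QuantizationResult

def report(result: QuantizationResult) -> None:
    size_gb = result.quantized_size_bytes / 1e9
    print(f"{result.quantization_method} -> {result.output_path} "
          f"({result.format_type}, {size_gb:.2f} GB)")
    for key, value in result.additional_info.items():
        print(f"  {key}: {value}")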

oumi.quantize.quantize(config: QuantizationConfig) → QuantizationResult[source]#

Main quantization function that routes to the appropriate quantizer.

Parameters:

config – Quantization configuration containing method, model parameters, and other settings.

Returns:

QuantizationResult containing quantization results, including file sizes and compression ratios.

Raises:
  • ValueError – If quantization method is not supported

  • RuntimeError – If quantization fails
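
Since quantize() raises ValueError for unsupported methods and RuntimeError for runtime failures, callers that want to degrade gracefully can catch both. A brief sketch:

from oumi.quantize import quantize

try:
    result = quantize(config)  # config: a QuantizationConfig as in the module example
except ValueError as err:
    print(f"Bad configuration: {err}")    # e.g., unsupported method name
except RuntimeError as err:
    print(f"Quantization failed: {err}")  # e.g., missing dependencies
else:
    print(f"Wrote quantized model to {result.output_path}")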