
Utils API

llm_feature_gen.utils


Reusable utility helpers used by discovery and generation pipelines.

Modules:

  • image

  • text

  • video

Functions:

  • downsample_batch

    Takes a large list of base64 images (e.g. from multiple videos) and selects the most diverse set using K-Means clustering.

  • extract_audio_track

    Extracts the audio track from a video file and saves it as a temporary WAV file.

  • extract_key_frames

    Selects diverse keyframes from a video using K-Means clustering.

  • extract_text_from_file

    Extracts text from a file and returns a list of text chunks (strings).

downsample_batch(b64_list: List[str], target_count: int = 15) -> List[str]

Takes a large list of base64 images (e.g. from multiple videos) and selects the most diverse set using K-Means clustering.
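The following is a minimal sketch of this kind of diversity sampling, not the library's code. It assumes each image is first reduced to a feature vector by a caller-supplied `embed` function; the features `downsample_batch` actually uses are not documented here. The vectors are clustered with K-Means (k = `target_count`, k-means++ seeding), and the image nearest each centroid is kept. `downsample_batch_sketch`, `kmeans_pp`, `embed`, and `coarse_byte_histogram` are hypothetical names. The byte histogram is a toy stand-in: a real embedding would decode the pixels (e.g. with Pillow) and use something like a color histogram or a downscaled thumbnail.

```python
import base64
from typing import Callable, List

import numpy as np


def kmeans_pp(x: np.ndarray, k: int, iters: int = 50, seed: int = 0) -> np.ndarray:
    """Lloyd's K-Means with k-means++ seeding; returns a (k, d) array of centroids."""
    rng = np.random.default_rng(seed)
    centroids = [x[rng.integers(len(x))]]
    for _ in range(1, k):
        # Seed the next centroid with probability proportional to squared distance.
        d2 = np.min([((x - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(x[rng.choice(len(x), p=(d2 / d2.sum() if d2.sum() > 0 else None))])
    c = np.array(centroids, dtype=float)
    for _ in range(iters):
        labels = ((x[:, None, :] - c[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        # Move each centroid to its members' mean; an empty cluster keeps its centroid.
        new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j) else c[j]
                        for j in range(k)])
        if np.allclose(new, c):
            break
        c = new
    return c


def coarse_byte_histogram(raw: bytes, bins: int = 4) -> np.ndarray:
    """Toy embedding: normalised histogram of the raw bytes (NOT a visual feature)."""
    arr = np.frombuffer(raw, dtype=np.uint8)
    hist, _ = np.histogram(arr, bins=bins, range=(0, 256))
    return hist / max(arr.size, 1)


def downsample_batch_sketch(b64_list: List[str], target_count: int = 15, *,
                            embed: Callable[[bytes], np.ndarray]) -> List[str]:
    """Keep target_count images, one per K-Means cluster (hypothetical helper)."""
    if len(b64_list) <= target_count:
        return list(b64_list)
    feats = np.stack([embed(base64.b64decode(s)) for s in b64_list]).astype(float)
    chosen: List[int] = []
    for centroid in kmeans_pp(feats, target_count):
        # Keep the not-yet-chosen image closest to this centroid.
        for idx in np.argsort(((feats - centroid) ** 2).sum(axis=1)):
            if int(idx) not in chosen:
                chosen.append(int(idx))
                break
    # Preserve the original (e.g. temporal) order of the selected images.
    return [b64_list[i] for i in sorted(chosen)]
```

Picking the member nearest each centroid, rather than the centroid itself, keeps the output a subset of real input images.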

extract_audio_track(file_path: str) -> Optional[str]

Extracts the audio track from a video file and saves it as a temporary WAV file. Uses FFmpeg to convert the stream to mono, 16 kHz PCM, the input format commonly expected by Whisper and other speech-to-text (STT) models.

Returns:

  • Optional[str]

    The path to the generated temporary WAV file, or None if extraction fails.
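One way to reproduce the documented conversion with the FFmpeg command-line tool is sketched below. It assumes `ffmpeg` is on the PATH and that samples are 16-bit little-endian (`pcm_s16le`), since the description says only "PCM". `extract_audio_track_sketch` and `build_ffmpeg_cmd` are hypothetical names, not the library's implementation. As with the documented function, failures return None instead of raising.

```python
import os
import subprocess
import tempfile
from typing import List, Optional


def build_ffmpeg_cmd(src: str, dst: str) -> List[str]:
    """FFmpeg arguments for a mono, 16 kHz, 16-bit PCM WAV file."""
    return [
        "ffmpeg", "-y",           # overwrite the placeholder temp file
        "-i", src,
        "-vn",                    # ignore video streams
        "-ac", "1",               # downmix to one channel (mono)
        "-ar", "16000",           # resample to 16 kHz
        "-acodec", "pcm_s16le",   # 16-bit little-endian PCM (assumed bit depth)
        dst,
    ]


def extract_audio_track_sketch(file_path: str) -> Optional[str]:
    """Return the path of a temporary WAV file, or None if FFmpeg fails or is missing."""
    fd, wav_path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)  # FFmpeg reopens the path itself
    try:
        subprocess.run(build_ffmpeg_cmd(file_path, wav_path), check=True,
                       stdin=subprocess.DEVNULL, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL)
    except (FileNotFoundError, subprocess.CalledProcessError):
        # FileNotFoundError: no ffmpeg on PATH; CalledProcessError: bad input or no audio stream.
        os.remove(wav_path)
        return None
    return wav_path  # the caller is responsible for deleting this file
```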

extract_key_frames(video_path: str, frame_limit: int = 10, sharpness_threshold: float = 40.0, max_resolution: int = 1024) -> List[str]

Selects diverse keyframes from a video using K-Means clustering. Rather than sampling frames at uniform intervals, it clusters visually similar frames and keeps the sharpest frame from each cluster, so the selected frames cover as many distinct scenes as possible.

Parameters:

  • video_path

    (str) –

    Path to the video file.

  • frame_limit

    (int, default: 10) –

    Maximum number of frames to extract (the target K for clustering).

  • sharpness_threshold

    (float, default: 40.0) –

    Minimum variance of the Laplacian a frame must reach; frames below this threshold are treated as blurry and ignored.

  • max_resolution

    (int, default: 1024) –

    Maximum width or height, in pixels, of each extracted frame; larger frames are resized to control payload size.
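The two frame-level parameters can be illustrated with a NumPy sketch. `laplacian_variance` applies the 3×3 kernel [[0, 1, 0], [1, -4, 1], [0, 1, 0]], which is the kernel `cv2.Laplacian` uses with its default aperture. It works on interior pixels only, so values may differ slightly from OpenCV's border-padded result. Whether the library uses OpenCV at all is an assumption. `scaled_size` assumes downscaling that preserves aspect ratio and never enlarges a frame. Both helper names are hypothetical.

```python
from typing import Tuple

import numpy as np


def laplacian_variance(gray: np.ndarray) -> float:
    """Variance of the 4-neighbour Laplacian over the interior of a 2-D grayscale image."""
    g = gray.astype(np.float64)
    # Kernel [[0, 1, 0], [1, -4, 1], [0, 1, 0]] applied without border padding.
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())


def scaled_size(width: int, height: int, max_resolution: int = 1024) -> Tuple[int, int]:
    """Shrink (never enlarge) so the longer side is at most max_resolution, keeping aspect ratio."""
    longest = max(width, height)
    if longest <= max_resolution:
        return width, height
    scale = max_resolution / longest
    return max(1, round(width * scale)), max(1, round(height * scale))
```

A flat or defocused frame has almost no local intensity change, so its Laplacian variance stays near zero and falls under a threshold such as 40.0. With OpenCV, the computed size could then be passed as `dsize` to `cv2.resize`, which expects `(width, height)`.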

Returns:

  • List[str]

    List of base64-encoded image strings.
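The selection logic described above can be sketched as follows. This assumes each decoded frame has already been reduced to a feature vector (for instance a flattened, downscaled grayscale thumbnail) and given a sharpness score. Frames below `sharpness_threshold` are dropped, the rest are grouped into at most `frame_limit` clusters, and the sharpest frame in each cluster is kept. `select_key_frames` and `_kmeans_labels` are hypothetical helpers. Reading the video and JPEG/base64 encoding are omitted. Returning an empty list when no frame is sharp enough is a choice made for this sketch; the library may fall back differently.

```python
from typing import List, Sequence

import numpy as np


def _kmeans_labels(x: np.ndarray, k: int, iters: int = 50, seed: int = 0) -> np.ndarray:
    """Cluster rows of x with Lloyd's K-Means (k-means++ seeding); returns one label per row."""
    rng = np.random.default_rng(seed)
    centroids = [x[rng.integers(len(x))]]
    for _ in range(1, k):
        d2 = np.min([((x - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(x[rng.choice(len(x), p=(d2 / d2.sum() if d2.sum() > 0 else None))])
    c = np.array(centroids, dtype=float)
    labels = np.zeros(len(x), dtype=int)
    for _ in range(iters):
        labels = ((x[:, None, :] - c[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j) else c[j]
                        for j in range(k)])
        if np.allclose(new, c):
            break
        c = new
    return labels


def select_key_frames(features: np.ndarray, sharpness: Sequence[float],
                      frame_limit: int = 10, sharpness_threshold: float = 40.0) -> List[int]:
    """Return indices of the sharpest frame in each visual cluster, in original order."""
    # 1. Drop blurry frames.
    sharp = [i for i, s in enumerate(sharpness) if s >= sharpness_threshold]
    if not sharp:
        return []
    # 2. Group the remaining frames into at most frame_limit clusters.
    k = min(frame_limit, len(sharp))
    labels = _kmeans_labels(np.asarray(features, dtype=float)[sharp], k)
    # 3. Keep the sharpest member of each non-empty cluster.
    picks = []
    for j in np.unique(labels):
        members = [sharp[m] for m in np.flatnonzero(labels == j)]
        picks.append(max(members, key=lambda i: sharpness[i]))
    return sorted(picks)
```

Returning frame indices keeps the selection independent of decoding and encoding. A caller could then JPEG-encode the chosen frames (for example with `cv2.imencode(".jpg", frame)`) and base64-encode them to match the documented `List[str]` return type.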

extract_text_from_file(path: Path) -> List[str]

Extracts text from a file and returns a list of text chunks (strings).
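The reference does not say which file types are supported or how chunk boundaries are chosen. The following minimal sketch covers plain-text files only and assumes paragraph-based chunking with a hypothetical `max_chars` budget of 2000 characters by default. Paragraphs are packed greedily into chunks, and any paragraph longer than the budget is hard-split. `extract_text_chunks_sketch` is an illustrative name, not the library's function.

```python
from pathlib import Path
from typing import List


def extract_text_chunks_sketch(path: Path, max_chars: int = 2000) -> List[str]:
    """Split a UTF-8 text file into chunks of at most max_chars characters (hypothetical)."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    # Blank lines separate paragraphs; text mode already normalises \r\n to \n.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: List[str] = []
    current = ""
    for para in paragraphs:
        # Hard-split any paragraph that alone exceeds the budget.
        for i in range(0, len(para), max_chars):
            piece = para[i:i + max_chars]
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate       # the piece still fits in the open chunk
            else:
                chunks.append(current)    # close the open chunk and start a new one
                current = piece
    if current:
        chunks.append(current)
    return chunks
```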

llm_feature_gen.utils.image


llm_feature_gen.utils.text


Functions:

extract_text_from_file(path: Path) -> List[str]

Extracts text from a file and returns a list of text chunks (strings).

llm_feature_gen.utils.video


Functions:

  • downsample_batch

    Takes a large list of base64 images (e.g. from multiple videos) and selects the most diverse set using K-Means clustering.

  • extract_audio_track

    Extracts the audio track from a video file and saves it as a temporary WAV file.

  • extract_key_frames

    Selects diverse keyframes from a video using K-Means clustering.

downsample_batch(b64_list: List[str], target_count: int = 15) -> List[str]

Takes a large list of base64 images (e.g. from multiple videos) and selects the most diverse set using K-Means clustering.

extract_audio_track(file_path: str) -> Optional[str]

Extracts the audio track from a video file and saves it as a temporary WAV file. Uses FFmpeg to convert the stream to mono, 16 kHz PCM, the input format commonly expected by Whisper and other speech-to-text (STT) models.

Returns:

  • Optional[str]

    The path to the generated temporary WAV file, or None if extraction fails.

extract_key_frames(video_path: str, frame_limit: int = 10, sharpness_threshold: float = 40.0, max_resolution: int = 1024) -> List[str]

Selects diverse keyframes from a video using K-Means clustering. Rather than sampling frames at uniform intervals, it clusters visually similar frames and keeps the sharpest frame from each cluster, so the selected frames cover as many distinct scenes as possible.

Parameters:

  • video_path

    (str) –

    Path to the video file.

  • frame_limit

    (int, default: 10) –

    Maximum number of frames to extract (the target K for clustering).

  • sharpness_threshold

    (float, default: 40.0) –

    Minimum variance of the Laplacian a frame must reach; frames below this threshold are treated as blurry and ignored.

  • max_resolution

    (int, default: 1024) –

    Maximum width or height, in pixels, of each extracted frame; larger frames are resized to control payload size.

Returns:

  • List[str]

    List of base64-encoded image strings.