Utils API
llm_feature_gen.utils
utils
Reusable utility helpers used by discovery and generation pipelines.
Modules:
Functions:
- downsample_batch – Takes a large list of base64 images (e.g. from multiple videos) and selects the most diverse set using K-Means clustering.
- extract_audio_track – Extracts the audio track from a video file and saves it as a temporary WAV file.
- extract_key_frames – Selects diverse keyframes from a video using K-Means clustering.
- extract_text_from_file – Extracts text from a file and returns a list of text chunks (strings).
downsample_batch(b64_list: List[str], target_count: int = 15) -> List[str]
Takes a large list of base64 images (e.g. from multiple videos) and selects the most diverse set using K-Means clustering.
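The selection idea can be sketched without any image libraries. The example below is a minimal illustration, not the library's implementation: it assumes each image has already been decoded and reduced to a numeric feature vector (for instance a small colour histogram), and the names select_diverse and _dist2 are hypothetical. It runs a plain K-Means loop and keeps the item closest to each centroid.

```python
from typing import List, Sequence

Vector = Sequence[float]


def _dist2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_diverse(features: List[Vector], target_count: int,
                   iterations: int = 20) -> List[int]:
    """Return indices of up to target_count items, one per K-Means cluster.

    Hypothetical helper: the feature vectors are assumed to be precomputed
    from the decoded images.
    """
    n = len(features)
    if n <= target_count:
        return list(range(n))  # nothing to drop

    # Deterministic initialisation: evenly spaced items act as first centroids.
    centroids = [list(features[(i * n) // target_count])
                 for i in range(target_count)]
    assignment = [0] * n

    for _ in range(iterations):
        # Assignment step: attach every item to its nearest centroid.
        assignment = [
            min(range(target_count), key=lambda c: _dist2(f, centroids[c]))
            for f in features
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(target_count):
            members = [features[i] for i in range(n) if assignment[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]

    # Keep the member closest to each centroid as that cluster's representative.
    picked = []
    for c in range(target_count):
        members = [i for i in range(n) if assignment[i] == c]
        if members:
            picked.append(min(members, key=lambda i: _dist2(features[i], centroids[c])))
    return sorted(picked)
```

Picking the member nearest the centroid is one reasonable choice of representative; a real pipeline might rank members by another criterion such as sharpness.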
extract_audio_track(file_path: str) -> Optional[str]
Extracts the audio track from a video file and saves it as a temporary WAV file. Uses FFmpeg to convert the stream to mono, 16 kHz PCM (the standard input format for Whisper and other speech-to-text models).
Returns:
- Optional[str] – The path to the generated temporary WAV file, or None if extraction fails.
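For orientation, a conversion of this kind could be driven from Python roughly as follows. This is a sketch under assumptions, not the library's code: the helper names build_ffmpeg_args and extract_wav are hypothetical, and 16-bit little-endian PCM (pcm_s16le) is assumed because the description only says "PCM". The FFmpeg options used (-vn, -ac, -ar, -c:a) are standard.

```python
import os
import subprocess
import tempfile
from typing import List, Optional


def build_ffmpeg_args(src: str, dst: str) -> List[str]:
    """Build an FFmpeg command line for mono, 16 kHz PCM WAV output."""
    return [
        "ffmpeg", "-y",        # overwrite the (empty) temp file without asking
        "-i", src,             # input video
        "-vn",                 # drop the video stream
        "-ac", "1",            # one audio channel (mono)
        "-ar", "16000",        # 16 kHz sample rate
        "-c:a", "pcm_s16le",   # assumed sample format: 16-bit little-endian PCM
        dst,
    ]


def extract_wav(src: str) -> Optional[str]:
    """Return the path of a temporary WAV file, or None if FFmpeg fails."""
    fd, dst = tempfile.mkstemp(suffix=".wav")
    os.close(fd)
    try:
        subprocess.run(
            build_ffmpeg_args(src, dst),
            check=True,                # non-zero exit raises CalledProcessError
            capture_output=True,
            stdin=subprocess.DEVNULL,
        )
        return dst
    except (OSError, subprocess.CalledProcessError):
        # OSError covers a missing ffmpeg binary; clean up the unused temp file.
        os.remove(dst)
        return None
```

Because the file is temporary, the caller is responsible for deleting it once transcription is done.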
extract_key_frames(video_path: str, frame_limit: int = 10, sharpness_threshold: float = 40.0, max_resolution: int = 1024) -> List[str]
Selects diverse keyframes from a video using K-Means clustering. Instead of simple uniform sampling, it groups visually similar scenes and picks the sharpest image from each group to maximize information density.
Parameters:
- video_path (str) – Path to the video file.
- frame_limit (int, default: 10) – Maximum number of frames to extract (target K for clustering).
- sharpness_threshold (float, default: 40.0) – Variance-of-Laplacian threshold below which a frame is treated as blurry and ignored.
- max_resolution (int, default: 1024) – Maximum dimension (width or height) for resizing, to control payload size.
Returns:
- List[str] – List of base64-encoded image strings.
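The sharpness test named by sharpness_threshold can be illustrated in pure Python. The sketch below assumes a grayscale image given as a 2D list of 0-255 intensities and uses the 4-neighbour Laplacian kernel; laplacian_variance and is_sharp are hypothetical names, and whether the library compares with >= or > is not stated in the source. With OpenCV the same measure is commonly computed as cv2.Laplacian(gray, cv2.CV_64F).var().

```python
from statistics import pvariance
from typing import List

Gray = List[List[float]]  # rows of pixel intensities, assumed 0-255


def laplacian_variance(img: Gray) -> float:
    """Variance of the 4-neighbour Laplacian response over interior pixels.

    Requires an image of at least 3x3 pixels.
    """
    height, width = len(img), len(img[0])
    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Kernel [[0, 1, 0], [1, -4, 1], [0, 1, 0]] applied at (x, y).
            responses.append(float(
                img[y - 1][x] + img[y + 1][x]
                + img[y][x - 1] + img[y][x + 1]
                - 4 * img[y][x]
            ))
    return pvariance(responses)


def is_sharp(img: Gray, threshold: float = 40.0) -> bool:
    """Treat a frame as sharp when its Laplacian variance reaches the threshold."""
    return laplacian_variance(img) >= threshold
```

A uniform image has no edges, so its Laplacian variance is zero and it is rejected, while a high-contrast pattern scores far above the default threshold.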
extract_text_from_file(path: Path) -> List[str]
Extracts text from a file and returns a list of text chunks (strings).
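The source does not say how chunks are formed, so the following is only one plausible sketch for plain-text input: paragraphs are packed greedily into chunks of at most max_chars characters. The names chunk_text and read_chunks, the max_chars parameter, and the paragraph-based splitting rule are all assumptions made for illustration.

```python
from pathlib import Path
from typing import List


def chunk_text(text: str, max_chars: int = 1000) -> List[str]:
    """Pack blank-line-separated paragraphs into chunks of at most max_chars.

    A single paragraph longer than max_chars is kept whole as its own chunk.
    """
    chunks: List[str] = []
    current = ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        # +2 accounts for the blank line that would join the two pieces.
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks


def read_chunks(path: Path, max_chars: int = 1000) -> List[str]:
    """Read a UTF-8 text file and return its chunks (hypothetical helper)."""
    return chunk_text(path.read_text(encoding="utf-8"), max_chars)
```

Splitting on paragraph boundaries keeps sentences intact, which tends to matter more for downstream prompts than hitting an exact chunk size.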
llm_feature_gen.utils.image
image
llm_feature_gen.utils.text
text
Functions:
- extract_text_from_file – Extracts text from a file and returns a list of text chunks (strings).
extract_text_from_file(path: Path) -> List[str]
Extracts text from a file and returns a list of text chunks (strings).
llm_feature_gen.utils.video
video
Functions:
- downsample_batch – Takes a large list of base64 images (e.g. from multiple videos) and selects the most diverse set using K-Means clustering.
- extract_audio_track – Extracts the audio track from a video file and saves it as a temporary WAV file.
- extract_key_frames – Selects diverse keyframes from a video using K-Means clustering.
downsample_batch(b64_list: List[str], target_count: int = 15) -> List[str]
Takes a large list of base64 images (e.g. from multiple videos) and selects the most diverse set using K-Means clustering.
extract_audio_track(file_path: str) -> Optional[str]
Extracts the audio track from a video file and saves it as a temporary WAV file. Uses FFmpeg to convert the stream to mono, 16 kHz PCM (the standard input format for Whisper and other speech-to-text models).
Returns:
- Optional[str] – The path to the generated temporary WAV file, or None if extraction fails.
extract_key_frames(video_path: str, frame_limit: int = 10, sharpness_threshold: float = 40.0, max_resolution: int = 1024) -> List[str]
Selects diverse keyframes from a video using K-Means clustering. Instead of simple uniform sampling, it groups visually similar scenes and picks the sharpest image from each group to maximize information density.
Parameters:
- video_path (str) – Path to the video file.
- frame_limit (int, default: 10) – Maximum number of frames to extract (target K for clustering).
- sharpness_threshold (float, default: 40.0) – Variance-of-Laplacian threshold below which a frame is treated as blurry and ignored.
- max_resolution (int, default: 1024) – Maximum dimension (width or height) for resizing, to control payload size.
Returns:
- List[str] – List of base64-encoded image strings.