Generate API
Autogenerated reference for the generation module.
generate
Feature value generation helpers built on top of discovered schemas.
The functions in this module take discovery artifacts produced by
llm_feature_gen.discover and apply them to class-organized folders of raw
inputs, producing CSV files that are ready for downstream analysis.
Functions:
- assign_feature_values_from_folder – Generate feature values for every supported file in one class folder.
- generate_features – Run the full feature-generation pipeline for a class-organized dataset.
- generate_features_from_images – Generate features using outputs/discovered_image_features.json by default.
- generate_features_from_tabular – Generate features using outputs/discovered_tabular_features.json by default.
- generate_features_from_texts – Generate features using outputs/discovered_text_features.json by default.
- generate_features_from_videos – Generate features using outputs/discovered_video_features.json by default.
- load_discovered_features – Load and normalize a discovered-features JSON artifact.
- parse_json_from_markdown – Parse JSON content that may be wrapped in a fenced Markdown block.
assign_feature_values_from_folder(folder_path: Union[str, Path], class_name: str, discovered_features: Dict[str, Any], provider: Optional[OpenAIProvider] = None, output_dir: Union[str, Path] = 'outputs', use_audio: bool = True, text_column: Optional[str] = None, label_column: Optional[str] = None) -> Path
Generate feature values for every supported file in one class folder.
Parameters:
- folder_path (Union[str, Path]) – Root dataset folder containing one subdirectory per class.
- class_name (str) – Name of the class subdirectory to process.
- discovered_features (Dict[str, Any]) – Discovery payload containing the feature schema.
- provider (Optional[OpenAIProvider], default: None) – Provider instance used to generate values. When None, OpenAIProvider is used.
- output_dir (Union[str, Path], default: 'outputs') – Directory where the per-class CSV is written.
- use_audio (bool, default: True) – Whether video files should include audio-transcription context when the provider supports it.
- text_column (Optional[str], default: None) – Required when processing tabular files. Names the column whose text is sent to the LLM.
- label_column (Optional[str], default: None) – Optional tabular column whose values override the class label in row-level outputs.
Returns:
- Path – The path to the generated per-class CSV file.
Raises:
- FileNotFoundError – If the requested class folder does not exist.
- ValueError – If tabular generation is attempted without text_column.
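For tabular input, the interplay between text_column, label_column, and the two documented exceptions can be illustrated with a minimal sketch. It is not the module's implementation: sketch_assign_tabular and value_fn are hypothetical names, value_fn stands in for the provider call, only *.csv files are treated as tabular, and the output filename is an assumption.

```python
import csv
from pathlib import Path
from typing import Callable, Dict, List, Optional, Union


def sketch_assign_tabular(
    folder_path: Union[str, Path],
    class_name: str,
    value_fn: Callable[[str], Dict[str, str]],  # hypothetical stand-in for the LLM provider
    output_dir: Union[str, Path] = "outputs",
    text_column: Optional[str] = None,
    label_column: Optional[str] = None,
) -> Path:
    """Hypothetical tabular-only sketch; not the module's implementation."""
    class_dir = Path(folder_path) / class_name
    if not class_dir.is_dir():
        raise FileNotFoundError(f"Class folder not found: {class_dir}")
    if text_column is None:
        # Mirrors the documented ValueError for tabular input without text_column.
        raise ValueError("text_column is required for tabular files")

    rows: List[Dict[str, str]] = []
    for csv_path in sorted(class_dir.glob("*.csv")):  # assumption: *.csv means tabular
        with csv_path.open(newline="", encoding="utf-8") as fh:
            for record in csv.DictReader(fh):
                features = value_fn(record[text_column])
                # label_column, when given, overrides the folder-derived class label.
                label = record[label_column] if label_column else class_name
                rows.append({"file": csv_path.name, "class": label, **features})

    # Union of keys keeps columns stable even if rows return different features.
    fieldnames: List[str] = ["file", "class"]
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)

    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{class_name}_feature_values.csv"  # assumed filename
    with out_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return out_path
```

Building the header from the union of row keys means a provider that omits a feature for one row leaves an empty cell instead of making the CSV writer fail.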
generate_features(root_folder: Union[str, Path], discovered_features_path: Union[str, Path], output_dir: Union[str, Path] = 'outputs', classes: Optional[List[str]] = None, provider: Optional[OpenAIProvider] = None, merge_to_single_csv: bool = False, merged_csv_name: str = 'all_feature_values.csv', use_audio: bool = True, text_column: Optional[str] = None, label_column: Optional[str] = None) -> Dict[str, str]
Run the full feature-generation pipeline for a class-organized dataset.
Parameters:
- root_folder (Union[str, Path]) – Dataset root containing one subfolder per class.
- discovered_features_path (Union[str, Path]) – Path to a JSON artifact produced by one of the discovery helpers.
- output_dir (Union[str, Path], default: 'outputs') – Directory where CSV outputs are written.
- classes (Optional[List[str]], default: None) – Optional subset of class-folder names to process. When omitted, all immediate subdirectories are used.
- provider (Optional[OpenAIProvider], default: None) – Provider instance used to generate values.
- merge_to_single_csv (bool, default: False) – Whether to concatenate the per-class CSVs into one additional file.
- merged_csv_name (str, default: 'all_feature_values.csv') – Filename to use for the merged CSV artifact.
- use_audio (bool, default: True) – Whether video generation should include transcript context.
- text_column (Optional[str], default: None) – Required for tabular generation.
- label_column (Optional[str], default: None) – Optional row-level label override for tabular generation.
Returns:
- Dict[str, str] – A mapping from class name to generated CSV path. When merge_to_single_csv is enabled, the merged output is returned under the "__merged__" key.
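A minimal sketch of the orchestration described above, assuming a hypothetical per-class worker (process_class) in place of assign_feature_values_from_folder, and assuming that merging is a plain row concatenation with a union of column headers. Class folders are sorted here only for deterministic output; the module's actual ordering is not documented.

```python
import csv
from pathlib import Path
from typing import Callable, Dict, List, Optional, Union


def sketch_generate_features(
    root_folder: Union[str, Path],
    process_class: Callable[[Path, str, Path], Path],  # hypothetical per-class worker
    output_dir: Union[str, Path] = "outputs",
    classes: Optional[List[str]] = None,
    merge_to_single_csv: bool = False,
    merged_csv_name: str = "all_feature_values.csv",
) -> Dict[str, str]:
    """Hypothetical orchestration sketch; not the module's implementation."""
    root = Path(root_folder)
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    if classes is None:
        # Immediate subdirectories only; sorted here for deterministic output.
        classes = sorted(p.name for p in root.iterdir() if p.is_dir())

    results: Dict[str, str] = {}
    for class_name in classes:
        results[class_name] = str(process_class(root, class_name, out_dir))

    if merge_to_single_csv:
        fieldnames: List[str] = []
        merged_rows: List[Dict[str, str]] = []
        for csv_path in results.values():
            with open(csv_path, newline="", encoding="utf-8") as fh:
                reader = csv.DictReader(fh)
                for name in reader.fieldnames or []:
                    if name not in fieldnames:
                        fieldnames.append(name)
                merged_rows.extend(reader)
        merged_path = out_dir / merged_csv_name
        with merged_path.open("w", newline="", encoding="utf-8") as fh:
            writer = csv.DictWriter(fh, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(merged_rows)
        # The merged artifact is reported under the reserved "__merged__" key.
        results["__merged__"] = str(merged_path)
    return results
```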
generate_features_from_images(*args, **kwargs) -> Dict[str, str]
Generate features using outputs/discovered_image_features.json by default.
generate_features_from_tabular(*args, **kwargs) -> Dict[str, str]
Generate features using outputs/discovered_tabular_features.json by default.
generate_features_from_texts(*args, **kwargs) -> Dict[str, str]
Generate features using outputs/discovered_text_features.json by default.
generate_features_from_videos(*args, **kwargs) -> Dict[str, str]
Generate features using outputs/discovered_video_features.json by default.
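These four wrappers differ only in their default artifact path. One way such a wrapper could be built is sketched below, assuming it forwards every other argument to generate_features unchanged; make_default_path_wrapper is a hypothetical helper, not part of the module.

```python
from typing import Any, Callable, Dict


def make_default_path_wrapper(
    generate: Callable[..., Dict[str, str]], default_path: str
) -> Callable[..., Dict[str, str]]:
    """Hypothetical helper: fill in discovered_features_path when the caller omits it."""

    def wrapper(*args: Any, **kwargs: Any) -> Dict[str, str]:
        # discovered_features_path is the second positional parameter of
        # generate_features, so two or more positional args mean it was supplied.
        if len(args) < 2 and "discovered_features_path" not in kwargs:
            kwargs["discovered_features_path"] = default_path
        return generate(*args, **kwargs)

    return wrapper
```

Under this assumption, wrapping generate_features with "outputs/discovered_image_features.json" yields a function that behaves like generate_features_from_images, while an explicitly passed path still takes precedence.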
load_discovered_features(path: Union[str, Path]) -> Dict[str, Any]
Load and normalize a discovered-features JSON artifact.
The helper accepts both the list-oriented structure written by the
discovery functions and a direct dictionary payload. It normalizes both into
a dictionary with a proposed_features key.
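The exact shape of the list-oriented artifact is not documented here. The sketch below assumes it is a JSON array whose entries either carry their own proposed_features list or are bare feature entries; sketch_load_discovered_features is a hypothetical name, and the real helper may handle other shapes differently.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Union


def sketch_load_discovered_features(path: Union[str, Path]) -> Dict[str, Any]:
    """Hypothetical normalizer; the list layout handled here is an assumption."""
    payload = json.loads(Path(path).read_text(encoding="utf-8"))
    if isinstance(payload, dict):
        # Direct dictionary payload: assumed to carry proposed_features already.
        if "proposed_features" not in payload:
            raise ValueError("dictionary payload has no 'proposed_features' key")
        return payload
    if isinstance(payload, list):
        features: List[Any] = []
        for entry in payload:
            if isinstance(entry, dict) and "proposed_features" in entry:
                features.extend(entry["proposed_features"])  # assumed wrapper entry
            else:
                features.append(entry)  # assumed bare feature entry
        return {"proposed_features": features}
    raise ValueError(f"unsupported payload type: {type(payload).__name__}")
```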
parse_json_from_markdown(text: str) -> Dict[str, Any]
Parse JSON content that may be wrapped in a fenced Markdown block.
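A minimal sketch of this behavior using a regular expression, assuming the fence is either unlabeled or labeled json and that only the first fenced block matters; sketch_parse_json_from_markdown is a hypothetical name, and the module's own parsing rules may differ.

```python
import json
import re
from typing import Any, Dict

# Matches an optional "json" label after the opening fence, then captures the body lazily.
_FENCE_RE = re.compile(r"```(?:json)?\s*(.*?)\s*```", re.DOTALL | re.IGNORECASE)


def sketch_parse_json_from_markdown(text: str) -> Dict[str, Any]:
    """Hypothetical parser: unwrap the first fenced block if present, else parse as-is."""
    match = _FENCE_RE.search(text)
    body = match.group(1) if match else text.strip()
    return json.loads(body)
```

Falling back to the stripped input when no fence is found lets the same call handle LLM replies that return bare JSON.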