Generate API

Autogenerated reference for the generation module.

generate

Feature value generation helpers built on top of discovered schemas.

The functions in this module take discovery artifacts produced by llm_feature_gen.discover and apply them to class-organized folders of raw inputs, producing CSV files that are ready for downstream analysis.

Functions:

assign_feature_values_from_folder(folder_path: Union[str, Path], class_name: str, discovered_features: Dict[str, Any], provider: Optional[OpenAIProvider] = None, output_dir: Union[str, Path] = 'outputs', use_audio: bool = True, text_column: Optional[str] = None, label_column: Optional[str] = None) -> Path

Generate feature values for every supported file in one class folder.

Parameters:

  • folder_path

    (Union[str, Path]) –

    Root dataset folder containing one subdirectory per class.

  • class_name

    (str) –

    Name of the class subdirectory to process.

  • discovered_features

    (Dict[str, Any]) –

    Discovery payload containing the feature schema.

  • provider

    (Optional[OpenAIProvider], default: None ) –

    Provider instance used to generate values. Defaults to OpenAIProvider.

  • output_dir

    (Union[str, Path], default: 'outputs' ) –

    Directory where the per-class CSV should be written.

  • use_audio

    (bool, default: True ) –

    Whether video files should include audio transcription context when supported by the provider.

  • text_column

    (Optional[str], default: None ) –

    Required when processing tabular files. Identifies the column sent to the LLM.

  • label_column

    (Optional[str], default: None ) –

    Optional tabular column whose values override the class label for row-level outputs.

Returns:

  • Path

    The path to the generated per-class CSV file.

Raises:

  • FileNotFoundError

    If the requested class folder does not exist.

  • ValueError

    If tabular generation is attempted without text_column.
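The folder layout and the FileNotFoundError condition described above can be illustrated with a small stand-alone sketch. The helper name resolve_class_folder is hypothetical and is not part of the module; it only mirrors the documented behaviour that folder_path is the dataset root and class_name selects one subdirectory of it.

```python
from pathlib import Path
from typing import Union


def resolve_class_folder(folder_path: Union[str, Path], class_name: str) -> Path:
    """Return <folder_path>/<class_name>, or raise if it is not a directory.

    Hypothetical helper for illustration only; the library performs its own
    check internally and may word the error differently.
    """
    class_dir = Path(folder_path) / class_name
    if not class_dir.is_dir():
        raise FileNotFoundError(f"Class folder not found: {class_dir}")
    return class_dir
```

With a dataset laid out as data/cats and data/dogs, a call such as assign_feature_values_from_folder("data", "cats", discovered_features) would be expected to read from data/cats and write one CSV for that class under output_dir.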

generate_features(root_folder: Union[str, Path], discovered_features_path: Union[str, Path], output_dir: Union[str, Path] = 'outputs', classes: Optional[List[str]] = None, provider: Optional[OpenAIProvider] = None, merge_to_single_csv: bool = False, merged_csv_name: str = 'all_feature_values.csv', use_audio: bool = True, text_column: Optional[str] = None, label_column: Optional[str] = None) -> Dict[str, str]

Run the full feature-generation pipeline for a class-organized dataset.

Parameters:

  • root_folder

    (Union[str, Path]) –

    Dataset root containing one subfolder per class.

  • discovered_features_path

    (Union[str, Path]) –

    Path to a JSON artifact produced by one of the discovery helpers.

  • output_dir

    (Union[str, Path], default: 'outputs' ) –

    Directory where CSV outputs should be written.

  • classes

    (Optional[List[str]], default: None ) –

    Optional subset of class-folder names to process. When omitted, all immediate subdirectories are used.

  • provider

    (Optional[OpenAIProvider], default: None ) –

    Provider instance used to generate values.

  • merge_to_single_csv

    (bool, default: False ) –

    Whether to concatenate per-class CSVs into one additional file.

  • merged_csv_name

    (str, default: 'all_feature_values.csv' ) –

    Filename to use for the merged CSV artifact.

  • use_audio

    (bool, default: True ) –

    Whether video generation should include transcript context.

  • text_column

    (Optional[str], default: None ) –

    Column sent to the LLM. Required for tabular generation.

  • label_column

    (Optional[str], default: None ) –

    Optional row-level label override for tabular generation.

Returns:

  • Dict[str, str]

    A mapping from class name to generated CSV path. When merge_to_single_csv is enabled, the merged output is returned under the "__merged__" key.
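Because the merged file shares the returned dictionary with the per-class entries, callers that iterate over classes usually want to separate the two. The following is a minimal sketch of that step; split_generation_result and MERGED_KEY are illustrative names, not part of the module, and the sample paths in the usage note are made up.

```python
from typing import Dict, Optional, Tuple

# Key documented for the merged CSV in the mapping returned by generate_features.
MERGED_KEY = "__merged__"


def split_generation_result(
    result: Dict[str, str],
) -> Tuple[Dict[str, str], Optional[str]]:
    """Separate per-class CSV paths from the optional merged CSV path."""
    per_class = {name: path for name, path in result.items() if name != MERGED_KEY}
    merged_path = result.get(MERGED_KEY)  # None when no merged file was requested
    return per_class, merged_path
```

For example, passing {"cats": "outputs/cats.csv", "__merged__": "outputs/all_feature_values.csv"} yields a one-entry per-class mapping and the merged path as the second element.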

generate_features_from_images(*args, **kwargs) -> Dict[str, str]

Generate features using outputs/discovered_image_features.json by default.

generate_features_from_tabular(*args, **kwargs) -> Dict[str, str]

Generate features using outputs/discovered_tabular_features.json by default.

generate_features_from_texts(*args, **kwargs) -> Dict[str, str]

Generate features using outputs/discovered_text_features.json by default.

generate_features_from_videos(*args, **kwargs) -> Dict[str, str]

Generate features using outputs/discovered_video_features.json by default.
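The four modality helpers above differ only in the discovery artifact they use by default. One plausible way to build such wrappers is sketched below. This is an assumption about the pattern, not the module's actual source: make_modality_wrapper and fake_pipeline are hypothetical names, and only the default paths and the discovered_features_path parameter name come from this reference.

```python
from typing import Any, Callable, Dict

# Default artifact paths as listed for each helper in this reference.
DEFAULT_ARTIFACTS = {
    "images": "outputs/discovered_image_features.json",
    "tabular": "outputs/discovered_tabular_features.json",
    "texts": "outputs/discovered_text_features.json",
    "videos": "outputs/discovered_video_features.json",
}


def make_modality_wrapper(
    pipeline: Callable[..., Dict[str, str]], modality: str
) -> Callable[..., Dict[str, str]]:
    """Wrap a pipeline so it falls back to the modality's default artifact."""
    default_path = DEFAULT_ARTIFACTS[modality]

    def wrapper(*args: Any, **kwargs: Any) -> Dict[str, str]:
        # Fill in the artifact path only when the caller did not pass one by
        # keyword. Passing it positionally is not handled in this sketch.
        kwargs.setdefault("discovered_features_path", default_path)
        return pipeline(*args, **kwargs)

    return wrapper


def fake_pipeline(root_folder: str, discovered_features_path: str, **_: Any) -> Dict[str, str]:
    """Stand-in for generate_features that reports which artifact was used."""
    return {"used_artifact": str(discovered_features_path)}
```

Under this assumption, passing discovered_features_path explicitly overrides the default, while every other argument is forwarded unchanged.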

load_discovered_features(path: Union[str, Path]) -> Dict[str, Any]

Load and normalize a discovered-features JSON artifact.

The helper accepts both the list-oriented structure written by the discovery functions and a direct dictionary payload. It normalizes both into a dictionary with a proposed_features key.
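The normalization step can be sketched as follows. The exact list layout written by the discovery functions is not shown in this reference, so the sketch assumes a list of dictionaries and takes the first entry that carries a proposed_features key. The function names are hypothetical and the real helper may behave differently.

```python
import json
from pathlib import Path
from typing import Any, Dict, Union


def normalize_discovered(payload: Any) -> Dict[str, Any]:
    """Return a dict with a 'proposed_features' key from either input shape."""
    if isinstance(payload, dict):
        if "proposed_features" not in payload:
            raise ValueError("dictionary payload has no 'proposed_features' key")
        return payload
    if isinstance(payload, list):
        # Assumed list layout: entries are dicts, one of which holds the schema.
        for entry in payload:
            if isinstance(entry, dict) and "proposed_features" in entry:
                return {"proposed_features": entry["proposed_features"]}
        raise ValueError("no list entry contains 'proposed_features'")
    raise TypeError(f"unsupported payload type: {type(payload).__name__}")


def load_discovered_sketch(path: Union[str, Path]) -> Dict[str, Any]:
    """Read a JSON artifact from disk and normalize it."""
    with Path(path).open(encoding="utf-8") as handle:
        return normalize_discovered(json.load(handle))
```

Downstream code can then read payload["proposed_features"] without caring which shape was stored on disk.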

parse_json_from_markdown(text: str) -> Dict[str, Any]

Parse JSON content that may be wrapped in a fenced Markdown block.
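LLM responses often wrap JSON in a fenced block, optionally tagged json, so a parser has to strip the fence before decoding. The sketch below shows one common way to do that with the standard library. parse_fenced_json is a hypothetical name and this is not the module's implementation; the fence marker is built programmatically only to keep the example easy to embed.

```python
import json
import re
from typing import Any, Dict

FENCE = "`" * 3  # the three-backtick Markdown fence marker

# Match the first fenced block, with or without a "json" language tag.
_FENCE_RE = re.compile(
    FENCE + r"(?:json)?\s*(.*?)" + FENCE,
    re.DOTALL | re.IGNORECASE,
)


def parse_fenced_json(text: str) -> Dict[str, Any]:
    """Decode JSON from text that may or may not be wrapped in a fence."""
    match = _FENCE_RE.search(text)
    # Fall back to the raw text when no fenced block is present.
    candidate = match.group(1) if match else text
    return json.loads(candidate.strip())
```

Input that is neither fenced JSON nor bare JSON raises json.JSONDecodeError in this sketch; how the real helper reports malformed input is not stated in this reference.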