Generation Pipeline

The Udility Diffuser runs a multi-stage pipeline that turns natural-language prompts into labeled educational illustrations. Unlike conventional diffusion models, which synthesize images by iteratively denoising pixel data, Udility takes a structured scripting approach: an LLM writes the illustration as SVG code, so shapes and text labels are stated explicitly rather than approximated in pixels.

Pipeline Overview

The generation process follows four distinct phases:

Instruction Synthesis: The user prompt is expanded into a detailed technical blueprint for an illustration.
SVG Code Generation: The blueprint is translated into raw SVG (Scalable Vector Graphics) code.
Rasterization: The SVG code is converted into a PNG image and saved to disk.
Rendering: The saved PNG is displayed to the user.

Primary Interface

For most use cases, the entire pipeline is handled by a single high-level function.

`generate_image_from_text()`

This is the main entry point for the library. It orchestrates the flow from text input to the final image display.

from Udility import diffuser

diffuser.generate_image_from_text(
    text_description="A diagram of a simple electric circuit with a battery and a bulb.",
    output_filename="circuit_diagram.png"
)

Parameters:

text_description (str): A clear description of the image or educational concept you want to visualize.
output_filename (str, optional): The filename for the generated PNG. Defaults to 'output.png'.
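
Because output_filename defaults to 'output.png', repeated calls that rely on the default would presumably write to the same path. The following is a minimal sketch of generating several illustrations with distinct filenames. The helpers `slugify` and `generate_batch` are hypothetical and not part of Udility; the `generate` argument is assumed to be `diffuser.generate_image_from_text`, called with the two keyword parameters documented above.

```python
import re
from typing import Callable, Dict, Iterable


def slugify(text: str, max_length: int = 40) -> str:
    """Turn a free-text description into a safe, lowercase filename stem."""
    slug = re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")
    return slug[:max_length].rstrip("_") or "image"


def generate_batch(
    descriptions: Iterable[str],
    generate: Callable[..., object],
) -> Dict[str, str]:
    """Call `generate` once per description, each with its own PNG filename.

    Returns a mapping of description -> filename that was requested.
    """
    requested = {}
    for description in descriptions:
        filename = f"{slugify(description)}.png"
        # Keyword names match the documented parameters of the entry point.
        generate(text_description=description, output_filename=filename)
        requested[description] = filename
    return requested


# Hypothetical usage (requires Udility and a configured API key):
# from Udility import diffuser
# generate_batch(["The water cycle process."], diffuser.generate_image_from_text)
```

Two descriptions that reduce to the same slug would still collide, so a counter or hash suffix may be worth adding for larger batches.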

Granular Pipeline Control

For advanced users who require more control (e.g., extracting the raw SVG code or modifying the instructions before generation), Udility exposes the internal stages of the pipeline.

1. Contextual Instruction Generation

This stage uses Meta Llama-3.1-405B (via OpenRouter) to generate a technical description of how the requested image should look.

instructions = diffuser.get_detailed_instructions("The water cycle process.")

Input: text_description (str)
Returns: detailed_instructions (str)
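
Since the instructions are returned as a plain string, they can be adjusted before the SVG stage. The sketch below appends extra drawing requirements to that string. `add_constraints` is a hypothetical helper, not a Udility function, and whether the model honors the appended requirements is an assumption that should be checked against actual output.

```python
from typing import Sequence


def add_constraints(instructions: str, constraints: Sequence[str]) -> str:
    """Append extra drawing requirements to the generated instructions."""
    if not constraints:
        return instructions
    bullet_list = "\n".join(f"- {c}" for c in constraints)
    return f"{instructions.rstrip()}\n\nAdditional requirements:\n{bullet_list}"


# Hypothetical usage, with `instructions` from get_detailed_instructions():
# instructions = add_constraints(
#     instructions, ["Use a white background.", "Label every arrow."]
# )
```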

2. SVG Scripting

The instructions are then passed to the LLM to write valid, standalone SVG code.

svg_code = diffuser.generate_svg_from_instructions(instructions)

Input: detailed_instructions (str)
Returns: svg_code (str): A string that begins with the opening <svg> tag and ends with </svg>.
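
Because the SVG is written by an LLM, it may be worth checking that the returned string is well-formed before rasterizing it, and keeping the raw vector source alongside the PNG. The sketch below uses only the Python standard library. `is_svg_document` and `save_svg` are hypothetical helpers, not part of Udility.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

SVG_NAMESPACE = "http://www.w3.org/2000/svg"


def is_svg_document(svg_code: str) -> bool:
    """Return True if the string parses as XML with an <svg> root element."""
    try:
        root = ET.fromstring(svg_code)
    except ET.ParseError:
        return False
    # ElementTree reports namespaced tags as "{namespace}svg".
    return root.tag in ("svg", f"{{{SVG_NAMESPACE}}}svg")


def save_svg(svg_code: str, path: str) -> Path:
    """Write the SVG source to disk, refusing input that is not valid SVG."""
    if not is_svg_document(svg_code):
        raise ValueError("svg_code is not a well-formed SVG document")
    target = Path(path)
    target.write_text(svg_code, encoding="utf-8")
    return target
```

This check only confirms that the XML parses and has an svg root; it does not verify that the drawing itself is sensible.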

3. Image Rasterization

The library uses cairosvg to convert the SVG code into a PNG file.

diffuser.svg_to_png(svg_code, output_filename='result.png')

Input: svg_code (str), output_filename (str)
Output: Saves a PNG file to the specified path.

4. Visualization

Finally, the image is rendered within the environment (optimized for Jupyter/Colab notebooks) using matplotlib.

diffuser.display_image('result.png')

Input: image_path (str)
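
The four stages above can be chained by hand when the intermediate SVG is needed. The following sketch takes the stage functions as arguments so the ordering is explicit and the wrapper can be exercised without network access. `run_pipeline` is a hypothetical wrapper, not part of Udility, and it assumes the four functions accept the arguments shown in the sections above.

```python
from typing import Callable


def run_pipeline(
    text_description: str,
    output_filename: str,
    get_instructions: Callable[[str], str],
    make_svg: Callable[[str], str],
    rasterize: Callable[..., None],
    show: Callable[[str], None],
) -> str:
    """Run the four stages in order and return the intermediate SVG code."""
    instructions = get_instructions(text_description)        # 1. instruction generation
    svg_code = make_svg(instructions)                         # 2. SVG scripting
    rasterize(svg_code, output_filename=output_filename)      # 3. rasterization to PNG
    show(output_filename)                                     # 4. display
    return svg_code


# Hypothetical usage (requires Udility and a configured API key):
# from Udility import diffuser
# svg_code = run_pipeline(
#     "The water cycle process.",
#     "water_cycle.png",
#     diffuser.get_detailed_instructions,
#     diffuser.generate_svg_from_instructions,
#     diffuser.svg_to_png,
#     diffuser.display_image,
# )
```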

Pipeline Requirements

To ensure the pipeline functions correctly, the following must be configured:

API Key: An OPENROUTER_API_KEY must be set in your environment variables.
System Dependencies: The pipeline relies on cairosvg, which requires the native Cairo library (libcairo) on the host system. Many environments already provide it; on some Linux distributions it may need to be installed manually, for example with apt-get install libcairo2.
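
A missing key is easier to diagnose if it is checked before the first pipeline call. The sketch below reads the OPENROUTER_API_KEY variable named above and fails early with a descriptive message. `require_api_key` is a hypothetical helper, not part of Udility, and how the library itself reports a missing key is not covered here.

```python
import os


def require_api_key(name: str = "OPENROUTER_API_KEY") -> str:
    """Return the API key from the environment or raise a clear error."""
    key = os.environ.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set. Set it in your environment before "
            "calling the pipeline."
        )
    return key
```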