Generation Pipeline
The Udility Diffuser uses a multi-stage pipeline that turns natural language prompts into labeled educational illustrations. Unlike traditional diffusion models, which produce images by iteratively denoising pixel data, Udility has an LLM write SVG code, a structured scripting approach intended to keep shapes clear and labels accurate.
Pipeline Overview
The generation process follows four distinct phases:
- Instruction Synthesis: The user prompt is expanded into a detailed technical blueprint for an illustration.
- SVG Code Generation: The blueprint is translated into raw SVG (Scalable Vector Graphics) code.
- Rasterization: The SVG code is converted into a high-quality PNG image.
- Rendering: The saved PNG is displayed to the user.
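The four phases can be read as a simple function composition. The sketch below is a minimal illustration, not Udility's actual implementation: `run_pipeline` and its stage parameters are hypothetical names, and the stages are passed in as callables so the flow can be followed with stubs and without an API key.

```python
from typing import Callable


def run_pipeline(
    prompt: str,
    output_filename: str,
    synthesize: Callable[[str], str],
    write_svg: Callable[[str], str],
    rasterize: Callable[[str, str], None],
    render: Callable[[str], None],
) -> str:
    """Chain the four phases; every stage callable is supplied by the caller."""
    instructions = synthesize(prompt)       # 1. Instruction Synthesis
    svg_code = write_svg(instructions)      # 2. SVG Code Generation
    rasterize(svg_code, output_filename)    # 3. Rasterization
    render(output_filename)                 # 4. Rendering
    return svg_code


if __name__ == "__main__":
    # Stub stages stand in for the real LLM and rasterizer calls.
    calls = []
    svg = run_pipeline(
        "A simple circuit",
        "circuit.png",
        synthesize=lambda p: f"Draw: {p}",
        write_svg=lambda i: f"<svg><title>{i}</title></svg>",
        rasterize=lambda s, f: calls.append(("rasterize", f)),
        render=lambda f: calls.append(("render", f)),
    )
    print(svg)
    print(calls)
```

With the library installed, the stage functions documented on this page (`diffuser.get_detailed_instructions`, `diffuser.generate_svg_from_instructions`, `diffuser.svg_to_png`, `diffuser.display_image`) appear to have matching shapes and could presumably be passed in place of the stubs.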
Primary Interface
For most use cases, the entire pipeline is handled by a single high-level function.
generate_image_from_text()
This is the main entry point for the library. It orchestrates the flow from text input to the final image display.
from Udility import diffuser

diffuser.generate_image_from_text(
    text_description="A diagram of a simple electric circuit with a battery and a bulb.",
    output_filename="circuit_diagram.png",
)
Parameters:
- text_description (str): A clear description of the image or educational concept you want to visualize.
- output_filename (str, optional): The filename for the generated PNG. Defaults to 'output.png'.
Granular Pipeline Control
For advanced users who require more control (e.g., extracting the raw SVG code or modifying the instructions before generation), Udility exposes the internal stages of the pipeline.
1. Contextual Instruction Generation
This stage uses Meta Llama-3.1-405B (via OpenRouter) to generate a technical description of how the requested image should look.
instructions = diffuser.get_detailed_instructions("The water cycle process.")
- Input: text_description (str)
- Returns: detailed_instructions (str)
2. SVG Scripting
The instructions are then passed to the LLM to write valid, standalone SVG code.
svg_code = diffuser.generate_svg_from_instructions(instructions)
- Input: detailed_instructions (str)
- Returns: svg_code (str), a string starting with <svg> and ending with </svg>.
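Because the SVG is written by an LLM, it can be worth checking that the returned string is well-formed before rasterizing it. The sketch below is one way to do that with the standard library only; `extract_svg` and `is_well_formed_svg` are hypothetical helpers, not part of Udility.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

# Greedy match from the first "<svg" to the last "</svg>", so that any
# surrounding prose is dropped while nested <svg> elements are kept.
_SVG_RE = re.compile(r"<svg\b.*</svg>", re.DOTALL | re.IGNORECASE)


def extract_svg(text: str) -> Optional[str]:
    """Return the <svg>...</svg> span found in text, or None if there is none."""
    match = _SVG_RE.search(text)
    return match.group(0) if match else None


def is_well_formed_svg(svg_code: str) -> bool:
    """Check that svg_code parses as XML and that its root element is <svg>."""
    try:
        root = ET.fromstring(svg_code)
    except ET.ParseError:
        return False
    # The tag is '{namespace}svg' when an xmlns is declared, plain 'svg' otherwise.
    return root.tag.rsplit("}", 1)[-1] == "svg"
```

A check like this only confirms XML well-formedness and the root element; it does not guarantee that the drawing is correct or that a rasterizer will accept every element in it.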
3. Image Rasterization
The library utilizes cairosvg to convert the vector-based SVG script into a standard image format.
diffuser.svg_to_png(svg_code, output_filename='result.png')
- Input: svg_code (str), output_filename (str)
- Output: Saves a PNG file to the specified path.
4. Visualization
Finally, the image is rendered within the environment (optimized for Jupyter/Colab notebooks) using matplotlib.
diffuser.display_image('result.png')
- Input: image_path (str)
Pipeline Requirements
To ensure the pipeline functions correctly, the following must be configured:
- API Key: An OPENROUTER_API_KEY must be set in your environment variables.
- System Dependencies: The pipeline relies on cairosvg, which requires libcairo to be installed on the host system (automatically handled in most Python environments, but may require apt-get install libcairo2 on some Linux distributions).
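Both requirements can be checked up front, before any LLM call is made. The following is a minimal preflight sketch using only the standard library; `check_requirements` is a hypothetical helper and not a Udility function.

```python
import importlib.util
import os
from typing import List


def check_requirements() -> List[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if not os.environ.get("OPENROUTER_API_KEY"):
        problems.append("OPENROUTER_API_KEY is not set")
    # find_spec only confirms that the Python package is installed; it does
    # not verify that the native libcairo library can actually be loaded.
    if importlib.util.find_spec("cairosvg") is None:
        problems.append("cairosvg is not installed")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print("Missing requirement:", problem)
```

If libcairo itself is missing, the failure would typically surface only when cairosvg is imported or first used, so this check is a first line of defense rather than a guarantee.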