flow-manager/PROJECT.md

Project: Auteur AI (Flow Edition) - Production Spec

Version: 2.1 (Expanded Production Build)
Target Platform: Full-Stack Web Application
Infrastructure: Self-Hosted Linux Environment (Docker Compose / K8s)
Goal: Create a functional pre-production asset management suite for Google Flow (Veo 3.1).

1. System Architecture & Infrastructure

Since resources are not a constraint and the target environment is a robust self-hosted Linux cluster, we will run a microservices-ready monolith orchestrated via Docker. This architecture prioritizes data privacy (keeping scripts and assets local), low latency for heavy asset manipulation, and the flexibility to scale individual components (like the AI inference engine) without refactoring the entire stack.

The Stack

  • Frontend: React 18 (built via Vite). We will utilize TypeScript for strict type safety across the complex JSON data structures required by Google Flow.

    • Styling: TailwindCSS for utility-first styling combined with Shadcn/UI (Radix Primitives) for accessible, keyboard-navigable components.
    • State Management: TanStack Query (React Query) is critical here. It will handle server-state caching, deduping requests, and managing the "loading" and "error" states of asynchronous AI operations. We will use Zustand for transient client-side state (e.g., dragging an ingredient into a slot).
  • Backend: Python (FastAPI).

    • Rationale: While Node.js is capable, Python is the native language of AI. Using FastAPI allows us to integrate directly with libraries like langchain, llama-index, or raw transformers pipelines if we decide to move beyond API-based LLMs in the future. FastAPI also provides automatic OpenAPI (Swagger) documentation and high-performance async support via Starlette.
  • Database: PostgreSQL 16.

    • Rationale: We need a robust relational database to manage the strict hierarchy of Projects -> Scenes -> Shots. PostgreSQL's binary JSON (JSONB) support is essential for storing the flexible metadata associated with AI assets and the complex, nested JSON payloads generated for Veo.
  • Object Storage: MinIO.

    • Rationale: A self-hosted, S3-compatible object storage server. This allows us to handle gigabytes of video references and high-res character sheets without clogging the database or the application server's file system. It supports pre-signed URLs, offloading file serving traffic directly to the client.
  • AI Inference: Local Ollama instance.

    • Rationale: Running Llama 3 (8B or 70B parameters) or Mistral locally ensures zero data leakage. The backend will communicate with Ollama via HTTP requests, allowing for easy model swapping (e.g., testing codellama for JSON generation vs llama3 for creative writing).
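Ollama's HTTP API is simple enough to reach with the standard library alone. A minimal sketch of building such a request (the endpoint URL and the use of stdlib `urllib` instead of the `ollama` client are illustrative choices, not the final implementation):

```python
import json
import urllib.request

# Assumed local Ollama endpoint; inside Docker the hostname would differ.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_ollama_request(model: str, prompt: str, temperature: float = 0.2) -> urllib.request.Request:
    """Build a POST request for Ollama's /api/generate endpoint."""
    payload = {
        "model": model,          # e.g. "llama3" or "codellama"
        "prompt": prompt,
        "stream": False,         # one complete response instead of a token stream
        "options": {"temperature": temperature},
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Sending it would be: urllib.request.urlopen(build_ollama_request("llama3", "..."))
```

Keeping request construction in one helper makes model swapping a one-argument change, which is the point of the Ollama choice above.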

Docker Compose Services

The production docker-compose.yml will orchestrate the following interconnected services:

  1. frontend: A high-performance Nginx container serving the static React build. It will also act as a reverse proxy to route /api requests to the backend, eliminating CORS issues.
  2. backend: The FastAPI application server running via uvicorn (Port 8000). It acts as the orchestrator.
  3. db: PostgreSQL 16 (Port 5432) with a persistent volume for data safety.
  4. minio: The S3-compatible storage engine (Port 9000 for API, 9001 for Console).
  5. redis: Redis (Port 6379).
    • Usage: This is crucial for a robust production app. It will serve as the message broker for Celery or ARQ (async task queues). When a user uploads a 4K video or requests a full script breakdown, the API will offload this "heavy lifting" to a background worker to keep the interface snappy.
  6. worker: A Python container running the background task consumer (Celery/ARQ) to process video thumbnails and long-running LLM inference tasks.
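The six services above can be outlined as a docker-compose.yml skeleton. This is a minimal sketch only: image tags, build paths, environment variable names, and the Celery entrypoint are placeholders, not the final configuration.

```yaml
services:
  frontend:
    build: ./frontend
    ports: ["80:80"]            # Nginx serves the React build and proxies /api
    depends_on: [backend]
  backend:
    build: ./backend
    ports: ["8000:8000"]
    environment:
      DATABASE_URL: postgresql+asyncpg://auteur:auteur@db:5432/auteur
      S3_ENDPOINT: http://minio:9000
      REDIS_URL: redis://redis:6379/0
    depends_on: [db, minio, redis]
  worker:
    build: ./backend
    command: celery -A app.worker worker --loglevel=info
    depends_on: [redis, db, minio]
  db:
    image: postgres:16
    volumes: [pgdata:/var/lib/postgresql/data]
  minio:
    image: minio/minio
    command: server /data --console-address ":9001"
    ports: ["9000:9000", "9001:9001"]
    volumes: [miniodata:/data]
  redis:
    image: redis:7
volumes:
  pgdata:
  miniodata:
```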

2. Database Schema (PostgreSQL)

The database schema needs to be robust enough to handle the relationships between creative entities. The agent must create migrations (using Alembic for Python) for the following schema. Note the addition of indices for performance.

-- Enum for strict typing of asset categories, critical for the "Slot" system
CREATE TYPE asset_type AS ENUM ('Character', 'Location', 'Object', 'Style');

-- Projects: The top-level container
CREATE TABLE projects (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name TEXT NOT NULL,
    resolution TEXT DEFAULT '4K', -- e.g., '4K' or an explicit '3840x2160'
    aspect_ratio TEXT DEFAULT '16:9',
    veo_version TEXT DEFAULT '3.1',
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);

-- Ingredients: The reusable assets (Actors, Sets, Props)
CREATE TABLE ingredients (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    project_id UUID REFERENCES projects(id) ON DELETE CASCADE,
    name TEXT NOT NULL,
    type asset_type NOT NULL,
    s3_key TEXT NOT NULL, -- The path in the MinIO bucket
    s3_bucket TEXT DEFAULT 'auteur-assets',
    thumbnail_key TEXT, -- Path to a generated low-res thumbnail
    metadata JSONB DEFAULT '{}', -- Stores AI-generated tags (e.g., {"hair": "blue", "mood": "dark"})
    created_at TIMESTAMP DEFAULT NOW()
);

-- Index for faster filtering by type within a project
CREATE INDEX idx_ingredients_project_type ON ingredients(project_id, type);

-- Scenes: Logical groupings within the script
CREATE TABLE scenes (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    project_id UUID REFERENCES projects(id) ON DELETE CASCADE,
    slugline TEXT NOT NULL, -- e.g., "INT. SERVER ROOM - NIGHT"
    raw_content TEXT, -- The full text body of the scene
    sequence_number INT NOT NULL, -- For ordering scenes in the UI
    created_at TIMESTAMP DEFAULT NOW()
);

-- Shots: The atomic unit of generation
CREATE TABLE shots (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    scene_id UUID REFERENCES scenes(id) ON DELETE CASCADE,
    description TEXT NOT NULL, -- The visual description
    duration FLOAT, -- Estimated duration in seconds
    sequence_number INT, -- Order within the scene
    
    -- "The Slot System": JSONB array of Ingredient UUIDs assigned to slots 1, 2, 3.
    -- Example: ["uuid-char-1", "uuid-loc-2", null]
    assigned_ingredients JSONB DEFAULT '[]', 
    
    -- The computed prompt context sent to the LLM
    llm_context_cache TEXT, 
    
    -- The final output for Google Flow/Veo
    veo_json_payload JSONB,
    
    status TEXT DEFAULT 'draft', -- 'draft', 'generating_json', 'ready'
    updated_at TIMESTAMP DEFAULT NOW()
);

CREATE INDEX idx_shots_scene ON shots(scene_id);

3. API Module Specifications

Module 1: Asset Library (Real Uploads & Processing)

Endpoint: POST /api/assets/upload

Logic Flow:

  1. Validation: Frontend sends FormData containing the file and metadata (project_id, type). Backend validates file type (image/png, image/jpeg) and size limits.
  2. Storage: Backend streams the file directly to the MinIO bucket auteur-assets using boto3 or minio-py. It generates a unique object key (e.g., proj_id/uuid.jpg).
  3. Database: A record is created in the ingredients table with the s3_key.
  4. Background Processing (Async): A task is pushed to the Redis queue to:
    • Generate a 200px thumbnail for the UI.
    • (Optional) Run a "Vision" LLM task (using Ollama's llava model) to auto-caption the image and populate the metadata JSONB field (e.g., "A robotic dog standing in rain").
  5. Return: The API returns the new Asset object, including a pre-signed URL for immediate display.
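Step 2's unique object key (proj_id/uuid.jpg) can be produced with a small stdlib helper, sketched here (the function name and the `.bin` fallback extension are assumptions):

```python
import uuid
from pathlib import PurePosixPath

def make_object_key(project_id: str, filename: str) -> str:
    """Generate a unique MinIO object key of the form proj_id/uuid.ext.

    The original filename is discarded except for its extension, so
    collisions and unsafe characters in user uploads are never an issue.
    """
    ext = PurePosixPath(filename).suffix.lower() or ".bin"
    return f"{project_id}/{uuid.uuid4()}{ext}"
```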

Module 2: Intelligent Script Parser (The "Ingestion Engine")

Endpoint: POST /api/scripts/parse

Logic Flow:

  1. Ingest: User uploads a .txt or .fountain screenplay file.
  2. Preprocessing: Backend reads the text. If it's a large script, it chunks it by Scene Headers (INT., EXT.).
  3. AI Analysis (Ollama): The content is sent to the local LLM.
    • System Prompt: "You are a Script Supervisor. Break the following screenplay text into a structured JSON array of shots. Identify the action lines that denote visual changes. Ignore dialogue unless it implies visual action."
    • Schema Enforcement: We will use Pydantic models to validate that the LLM's output matches the expected JSON structure (Shot Description, Estimated Duration).
  4. Persistence: The backend iterates through the validated JSON array and performs a bulk insert into the scenes and shots tables, ensuring sequence_numbers are preserved.
  5. Notification: The frontend is notified (via polling or WebSocket) that the script is ready for review.
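Step 2's chunking by scene headers can be done in a single regex pass. A sketch, assuming Fountain-style sluglines that begin a line with INT. or EXT. (the regex and dict shape are illustrative):

```python
import re

# Matches sluglines at the start of a line, e.g. "INT. SERVER ROOM - NIGHT"
SLUG_RE = re.compile(r"^(?:INT|EXT|I/E)\..*$", re.MULTILINE)

def chunk_by_scene(script: str) -> list[dict]:
    """Split raw screenplay text into ordered scene chunks keyed by slugline."""
    matches = list(SLUG_RE.finditer(script))
    scenes = []
    for i, m in enumerate(matches):
        # Scene body runs from the end of this slugline to the next one (or EOF)
        end = matches[i + 1].start() if i + 1 < len(matches) else len(script)
        scenes.append({
            "sequence_number": i + 1,
            "slugline": m.group().strip(),
            "raw_content": script[m.end():end].strip(),
        })
    return scenes
```

Each chunk maps directly onto a row in the scenes table, with sequence_number preserved as required by step 4.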

Module 3: Flow Assembly & JSON Generation (The "Translator")

Endpoint: POST /api/shots/{id}/generate-flow

Logic Flow:

  1. Data Gathering: The endpoint fetches the shot record. It then queries the ingredients table to retrieve the full details (Name, Metadata, Visual Description) of the UUIDs stored in assigned_ingredients.
  2. Context Construction: A rich text prompt is assembled.
    • Example: "Construct a Google Veo 3.1 JSON configuration. The shot is: '{shot.description}'. The Character is '{ingredient[0].name}', described as '{ingredient[0].metadata}'. The Location is '{ingredient[1].name}'."
  3. Prompt Engineering: The prompt explicitly forbids the LLM from adding hallucinated details and forces it to map the Ingredient characteristics to the specific JSON fields required by Veo (e.g., subject.description, environment.lighting).
  4. AI Action: Send to Ollama. We use a low-temperature setting (e.g., 0.2) to ensure deterministic, strictly formatted JSON output.
  5. Validation: The backend parses the returned JSON string into a Python Dictionary. If parsing fails, it retries up to 2 times.
  6. Update: The valid JSON is saved to shots.veo_json_payload, and the status is updated to 'ready'.
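Steps 4 and 5 amount to a parse-and-retry loop around the LLM call. A sketch (the `generate` callable stands in for the real Ollama client and is a hypothetical parameter; the two-retry limit follows step 5):

```python
import json
from typing import Callable

MAX_RETRIES = 2  # per step 5: retry up to 2 times on a parse failure

def generate_veo_payload(generate: Callable[[str], str], prompt: str) -> dict:
    """Call the LLM and parse its output as JSON, retrying on malformed output.

    `generate` is any callable wrapping the Ollama client; injecting it
    keeps this loop testable without a running model.
    """
    last_error = None
    for _attempt in range(1 + MAX_RETRIES):
        raw = generate(prompt)
        try:
            return json.loads(raw)
        except json.JSONDecodeError as exc:
            last_error = exc
    raise ValueError(
        f"LLM returned invalid JSON after {1 + MAX_RETRIES} attempts"
    ) from last_error
```

On success the returned dict is what gets written to shots.veo_json_payload in step 6.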

4. Frontend Integration Guidelines

  • API Client: Use axios with a configured base URL (e.g., /api/v1). Implement interceptors to handle 401/403 errors or global loading states.
  • Image Handling (Presigned URLs): The frontend should never try to fetch images directly from the MinIO container's internal IP. Instead, the API returns a presigned URL (valid for 1 hour) that allows the browser to fetch the image directly from the MinIO public endpoint.
  • Optimistic UI: When a user updates a shot description or drags an ingredient, the UI should update immediately (using react-query's setQueryData) before the API call resolves. If the API call fails, the change is rolled back with a toast notification.
  • Editor Component: For the JSON editor, use @monaco-editor/react to provide syntax highlighting and code folding, giving the "IDE" feel.

5. Implementation Prompt for Coding Agent

Copy and paste this detailed instruction block into Antigravity/Cursor/Windsurf to begin the build process:

"Act as a Senior Full-Stack Software Architect. We are building 'Auteur AI', a professional video production management application.

Core Constraint: DO NOT USE MOCK DATA. This is a real implementation meant for production deployment on a Linux cluster.

Technology Stack Definition:

  1. Backend: Python FastAPI.

    • Use SQLAlchemy (Async) for ORM.
    • Use Alembic for database migrations.
    • Use Pydantic for strict data validation (Models).
    • Use boto3 for MinIO (S3) interaction.
    • Use the ollama Python library for communicating with the local LLM (http://host.docker.internal:11434; on Linux, map host.docker.internal via extra_hosts: ["host.docker.internal:host-gateway"]).
  2. Frontend: React (Vite) + TypeScript + TailwindCSS.

    • Use axios for API requests.
    • Use tanstack/react-query for data fetching and caching.
    • Use shadcn/ui components for the interface.
    • Use lucide-react for iconography.

Task 1: Infrastructure Setup

Create a production-ready docker-compose.yml. It must include:

  • postgres (v16) with a named volume for persistence.
  • minio with a create-bucket entrypoint script.
  • backend service (FastAPI) with hot-reload enabled for dev.
  • frontend service (Node/Vite) proxying requests to the backend.
  • redis and worker services, as specified in the Docker Compose section above.

Task 2: Database Layer

Define the SQLAlchemy models exactly matching the schema provided in the TDD (Projects, Ingredients, Scenes, Shots). Create the initial Alembic migration script.

Task 3: Backend API Implementation

  • Implement the POST /api/assets/upload endpoint using UploadFile. It must save to MinIO and Postgres.
  • Implement the POST /api/scripts/parse endpoint. It must accept a text file, chunk it, send it to Ollama for analysis, and store the resulting Shots.

Task 4: Frontend Development

  • Set up the React Router with layouts (Sidebar/Header).
  • Build the 'Asset Library' view: Fetch real data from the API, display images using presigned URLs, and implement a real file upload dropzone."