# Project: Auteur AI (Flow Edition) - Production Spec
**Version:** 2.1 (Expanded Production Build)

**Target Platform:** Full Stack Web Application

**Infrastructure:** Self-Hosted Linux Environment (Docker Compose / K8s)

**Goal:** Create a functional pre-production asset management suite for Google Flow (Veo 3.1).
## 1. System Architecture & Infrastructure
Since resources are effectively unlimited and the target environment is a robust self-hosted Linux cluster, we will run a microservices-ready monolithic structure via Docker. This architecture prioritizes data privacy (keeping scripts and assets local), low latency for heavy asset manipulation, and the flexibility to scale individual components (such as the AI inference engine) without refactoring the entire stack.

### The Stack

* **Frontend:** **React 18** (built via Vite). We will use **TypeScript** for strict type safety across the complex JSON data structures required by Google Flow.
* **Styling:** **TailwindCSS** for utility-first styling, combined with **Shadcn/UI** (Radix primitives) for accessible, keyboard-navigable components.
* **State Management:** **TanStack Query (React Query)** is critical here. It will handle server-state caching, request deduplication, and the "loading" and "error" states of asynchronous AI operations. We will use **Zustand** for transient client-side state (e.g., dragging an ingredient into a slot).
* **Backend:** **Python (FastAPI)**.
    * *Rationale:* While Node.js is capable, Python is the native language of AI tooling. FastAPI lets us integrate directly with libraries like `langchain`, `llama-index`, or raw `transformers` pipelines if we decide to move beyond API-based LLMs in the future. It also provides automatic OpenAPI (Swagger) documentation and high-performance async support via Starlette.
* **Database:** **PostgreSQL 16**.
    * *Rationale:* We need a robust relational database to manage the strict hierarchy of Projects -> Scenes -> Shots. PostgreSQL's binary JSON (JSONB) support is essential for storing the flexible metadata associated with AI assets and the complex, nested JSON payloads generated for Veo.
* **Object Storage:** **MinIO**.
    * *Rationale:* A self-hosted, S3-compatible object storage server. It lets us handle gigabytes of video references and high-res character sheets without clogging the database or the application server's file system. It supports pre-signed URLs, offloading file-serving traffic directly to the client.
* **AI Inference:** **Local Ollama instance**.
    * *Rationale:* Running Llama 3 (8B or 70B) or Mistral locally ensures zero data leakage. The API communicates with Ollama via HTTP, allowing easy model swapping (e.g., testing `codellama` for JSON generation vs. `llama3` for creative writing).

### Docker Compose Services

The production `docker-compose.yml` will orchestrate the following interconnected services:

1. `frontend`: A high-performance **Nginx** container serving the static React build. It also acts as a reverse proxy, routing `/api` requests to the backend and eliminating CORS issues.
2. `backend`: The **FastAPI** application server running via `uvicorn` (port 8000). It acts as the orchestrator.
3. `db`: **PostgreSQL 16** (port 5432) with a persistent volume for data safety.
4. `minio`: The S3-compatible storage engine (port 9000 for the API, 9001 for the console).
5. `redis`: **Redis** (port 6379).
    * *Usage:* Crucial for a robust production app. Redis serves as the message broker for **Celery** or **ARQ** (async task queues). When a user uploads a 4K video or requests a full script breakdown, the API offloads this heavy lifting to a background worker to keep the interface snappy.
6. `worker`: A Python container running the background task consumer (Celery/ARQ) to process video thumbnails and long-running LLM inference tasks.
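The service topology above can be sketched roughly as follows. This is a minimal illustration, not a hardened configuration: image tags, credentials, build contexts, and environment variable names are assumptions for the sake of the example.

```yaml
# Sketch of the six-service topology (illustrative values throughout).
services:
  frontend:
    image: nginx:alpine            # serves the static React build, proxies /api
    ports: ["80:80"]
    depends_on: [backend]
  backend:
    build: ./backend
    command: uvicorn app.main:app --host 0.0.0.0 --port 8000
    environment:
      DATABASE_URL: postgresql+asyncpg://auteur:auteur@db:5432/auteur
      S3_ENDPOINT: http://minio:9000
      REDIS_URL: redis://redis:6379/0
    depends_on: [db, minio, redis]
  worker:
    build: ./backend               # same image, runs the Celery/ARQ consumer
    command: celery -A app.tasks worker --loglevel=info
    depends_on: [redis, db, minio]
  db:
    image: postgres:16
    environment:
      POSTGRES_USER: auteur
      POSTGRES_PASSWORD: auteur
      POSTGRES_DB: auteur
    volumes: [pgdata:/var/lib/postgresql/data]
  minio:
    image: minio/minio
    command: server /data --console-address ":9001"
    ports: ["9000:9000", "9001:9001"]
    volumes: [miniodata:/data]
  redis:
    image: redis:7
volumes:
  pgdata:
  miniodata:
```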

## 2. Database Schema (PostgreSQL)

The database schema must be robust enough to handle the relationships between creative entities. The agent must create migrations (using **Alembic** for Python) for the following schema. Note the addition of indices for performance.
```sql
-- Enum for strict typing of asset categories, critical for the "Slot" system
CREATE TYPE asset_type AS ENUM ('Character', 'Location', 'Object', 'Style');

-- Projects: The top-level container
CREATE TABLE projects (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name TEXT NOT NULL,
    resolution TEXT DEFAULT '4K', -- e.g., '3840x2160'
    aspect_ratio TEXT DEFAULT '16:9',
    veo_version TEXT DEFAULT '3.1',
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);

-- Ingredients: The reusable assets (Actors, Sets, Props)
CREATE TABLE ingredients (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    project_id UUID REFERENCES projects(id) ON DELETE CASCADE,
    name TEXT NOT NULL,
    type asset_type NOT NULL,
    s3_key TEXT NOT NULL, -- The path in the MinIO bucket
    s3_bucket TEXT DEFAULT 'auteur-assets',
    thumbnail_key TEXT, -- Path to a generated low-res thumbnail
    metadata JSONB DEFAULT '{}', -- Stores AI-generated tags (e.g., {"hair": "blue", "mood": "dark"})
    created_at TIMESTAMP DEFAULT NOW()
);

-- Index for faster filtering by type within a project
CREATE INDEX idx_ingredients_project_type ON ingredients(project_id, type);

-- Scenes: Logical groupings within the script
CREATE TABLE scenes (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    project_id UUID REFERENCES projects(id) ON DELETE CASCADE,
    slugline TEXT NOT NULL, -- e.g., "INT. SERVER ROOM - NIGHT"
    raw_content TEXT, -- The full text body of the scene
    sequence_number INT NOT NULL, -- For ordering scenes in the UI
    created_at TIMESTAMP DEFAULT NOW()
);

-- Shots: The atomic unit of generation
CREATE TABLE shots (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    scene_id UUID REFERENCES scenes(id) ON DELETE CASCADE,
    description TEXT NOT NULL, -- The visual description
    duration FLOAT, -- Estimated duration in seconds
    sequence_number INT, -- Order within the scene

    -- "The Slot System": JSONB array of Ingredient UUIDs assigned to slots 1, 2, 3.
    -- Example: ["uuid-char-1", "uuid-loc-2", null]
    assigned_ingredients JSONB DEFAULT '[]',

    -- The computed prompt context sent to the LLM
    llm_context_cache TEXT,

    -- The final output for Google Flow/Veo
    veo_json_payload JSONB,

    status TEXT DEFAULT 'draft', -- 'draft', 'generating_json', 'ready'
    updated_at TIMESTAMP DEFAULT NOW()
);

CREATE INDEX idx_shots_scene ON shots(scene_id);
```
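As a quick illustration of why `assigned_ingredients` is JSONB, the containment operator makes slot lookups a one-line query. The UUID literals below are placeholders:

```sql
-- Hypothetical query: find every 'ready' shot in a project that uses a
-- given ingredient in any slot (JSONB containment via the @> operator).
SELECT sh.id, sh.description
FROM shots sh
JOIN scenes sc ON sc.id = sh.scene_id
WHERE sc.project_id = '00000000-0000-0000-0000-000000000000'
  AND sh.status = 'ready'
  AND sh.assigned_ingredients @> '["uuid-char-1"]';
```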

## 3. API Module Specifications

### Module 1: Asset Library (Real Uploads & Processing)

**Endpoint:** `POST /api/assets/upload`

**Logic Flow:**

1. **Validation:** The frontend sends `FormData` containing the file and metadata (`project_id`, `type`). The backend validates the file type (image/png, image/jpeg) and enforces size limits.
2. **Storage:** The backend streams the file directly to the **MinIO** bucket `auteur-assets` using `boto3` or `minio-py`, generating a unique object key (e.g., `proj_id/uuid.jpg`).
3. **Database:** A record is created in the `ingredients` table with the `s3_key`.
4. **Background Processing (Async):** A task is pushed to the Redis queue to:
    * Generate a 200px thumbnail for the UI.
    * (Optional) Run a vision LLM task (using Ollama's `llava` model) to auto-caption the image and populate the `metadata` JSONB field (e.g., "A robotic dog standing in rain").
5. **Return:** The API returns the new asset object, including a pre-signed URL for immediate display.
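Steps 1-2 (validation and key generation) can be sketched in isolation. The size cap and the `build_object_key` helper are assumptions for illustration; only the allowed MIME types and the `proj_id/uuid.jpg` key shape come from the spec.

```python
import mimetypes
import uuid

# The spec fixes the allowed image types; the 25 MiB cap is an assumption.
ALLOWED_TYPES = {"image/png", "image/jpeg"}
MAX_BYTES = 25 * 1024 * 1024

def build_object_key(project_id: str, content_type: str, size: int) -> str:
    """Validate an upload and derive a unique MinIO object key of the
    form '<project_id>/<uuid>.<ext>' (steps 1-2 above)."""
    if content_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported content type: {content_type}")
    if size > MAX_BYTES:
        raise ValueError("file exceeds size limit")
    # Derive the extension from the declared MIME type, not the filename.
    ext = mimetypes.guess_extension(content_type) or ".bin"
    return f"{project_id}/{uuid.uuid4()}{ext}"

key = build_object_key("proj-123", "image/png", 1024)
```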

### Module 2: Intelligent Script Parser (The "Ingestion Engine")

**Endpoint:** `POST /api/scripts/parse`

**Logic Flow:**

1. **Ingest:** The user uploads a `.txt` or `.fountain` screenplay file.
2. **Preprocessing:** The backend reads the text. For large scripts, it chunks the text by scene headers (`INT.`, `EXT.`).
3. **AI Analysis (Ollama):** The content is sent to the local LLM.
    * *System Prompt:* "You are a Script Supervisor. Break the following screenplay text into a structured JSON array of shots. Identify the action lines that denote visual changes. Ignore dialogue unless it implies visual action."
    * *Schema Enforcement:* We will use **Pydantic** models to validate that the LLM's output matches the expected JSON structure (shot description, estimated duration).
4. **Persistence:** The backend iterates through the validated JSON array and performs a bulk insert into the `scenes` and `shots` tables, preserving `sequence_number` ordering.
5. **Notification:** The frontend is notified (via polling or WebSocket) that the script is ready for review.
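The preprocessing step can be sketched with a simple regex splitter. The `chunk_by_scene` helper is a hypothetical name, and the pattern covers only the two header prefixes the spec names (`INT.`, `EXT.`), not the full Fountain grammar:

```python
import re

# Simplified scene-header pattern: a line beginning with INT. or EXT.
SCENE_HEADER = re.compile(r"^(INT\.|EXT\.).*$", re.MULTILINE)

def chunk_by_scene(script: str) -> list[dict]:
    """Split raw screenplay text into {slugline, raw_content} chunks,
    one per scene header (preprocessing step 2 above)."""
    headers = list(SCENE_HEADER.finditer(script))
    scenes = []
    for i, match in enumerate(headers):
        end = headers[i + 1].start() if i + 1 < len(headers) else len(script)
        body = script[match.end():end].strip()
        scenes.append({"slugline": match.group(0).strip(),
                       "raw_content": body})
    return scenes

sample = """INT. SERVER ROOM - NIGHT
A robotic dog paces between racks.

EXT. ROOFTOP - DAWN
Rain falls on solar panels."""
chunks = chunk_by_scene(sample)
```

Each chunk maps directly onto a `scenes` row (`slugline`, `raw_content`), with the list index serving as `sequence_number`.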

### Module 3: Flow Assembly & JSON Generation (The "Translator")

**Endpoint:** `POST /api/shots/:id/generate-flow`

**Logic Flow:**

1. **Data Gathering:** The endpoint fetches the `shot` record, then queries the `ingredients` table to retrieve the full details (name, metadata, visual description) of the UUIDs stored in `assigned_ingredients`.
2. **Context Construction:** A rich text prompt is assembled.
    * *Example:* "Construct a Google Veo 3.1 JSON configuration. The shot is: '{shot.description}'. The Character is '{ingredient[0].name}', described as '{ingredient[0].metadata}'. The Location is '{ingredient[1].name}'."
3. **Prompt Engineering:** The prompt explicitly forbids the LLM from adding hallucinated details and forces it to map the ingredient characteristics to the specific JSON fields required by Veo (e.g., `subject.description`, `environment.lighting`).
4. **AI Action:** Send the prompt to Ollama. We use a low temperature (e.g., 0.2) to ensure deterministic, strictly formatted JSON output.
5. **Validation:** The backend parses the returned JSON string into a Python dictionary. If parsing fails, it retries up to 2 times.
6. **Update:** The valid JSON is saved to `shots.veo_json_payload`, and the status is updated to `'ready'`.
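The parse-and-retry loop of steps 4-5 can be sketched independently of any LLM client. `generate_payload` is a hypothetical helper; the stub below stands in for a real Ollama call:

```python
import json
from typing import Callable

def generate_payload(generate: Callable[[str], str], prompt: str,
                     max_retries: int = 2) -> dict:
    """Call an LLM (any str -> str function, e.g. an Ollama client wrapper)
    and parse its output as JSON, retrying on malformed output (steps 4-5).
    Raises ValueError after exhausting the retries."""
    last_error = None
    for _ in range(1 + max_retries):  # one initial attempt plus retries
        raw = generate(prompt)
        try:
            return json.loads(raw)
        except json.JSONDecodeError as exc:
            last_error = exc
    raise ValueError(f"LLM returned invalid JSON: {last_error}")

# Stub generator that fails once, then returns valid JSON.
attempts = iter(["not json", '{"subject": {"description": "robotic dog"}}'])
payload = generate_payload(lambda p: next(attempts), "prompt text")
```

On success, `payload` is exactly what gets written to `shots.veo_json_payload` before the status flips to `'ready'`.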

## 4. Frontend Integration Guidelines

* **API Client:** Use `axios` with a configured base URL (e.g., `/api/v1`). Implement interceptors to handle 401/403 errors and global loading states.
* **Image Handling (Presigned URLs):** The frontend should never fetch images from the MinIO container's internal IP. Instead, the API returns a presigned URL (valid for 1 hour) that lets the browser fetch the image directly from the MinIO public endpoint.
* **Optimistic UI:** When a user updates a shot description or drags an ingredient, the UI should update immediately (using `react-query`'s `setQueryData`) before the API call resolves. If the call fails, the change is rolled back and a toast notification is shown.
* **Editor Component:** For the JSON editor, use `@monaco-editor/react` to provide syntax highlighting and code folding, giving the "IDE" feel.
## 5. Implementation Prompt for Coding Agent

*Copy and paste this detailed instruction block into Antigravity/Cursor/Windsurf to begin the build process:*

> "Act as a Senior Full-Stack Software Architect. We are building 'Auteur AI', a professional video production management application.
>
> **Core Constraint:** DO NOT USE MOCK DATA. This is a real implementation meant for production deployment on a Linux cluster.
>
> **Technology Stack Definition:**
>
> 1. **Backend:** Python **FastAPI**.
>     * Use `SQLAlchemy` (async) for the ORM.
>     * Use `Alembic` for database migrations.
>     * Use `Pydantic` for strict data validation (models).
>     * Use `boto3` for MinIO (S3) interaction.
>     * Use the `ollama` Python library for communicating with the local LLM (`http://host.docker.internal:11434`).
>
> 2. **Frontend:** React (Vite) + TypeScript + TailwindCSS.
>     * Use `axios` for API requests.
>     * Use `tanstack/react-query` for data fetching and caching.
>     * Use `shadcn/ui` components for the interface.
>     * Use `lucide-react` for iconography.
>
> **Task 1: Infrastructure Setup**
> Create a production-ready `docker-compose.yml`. It must include:
>
> * `postgres` (v16) with a named volume for persistence.
> * `minio` with a create-bucket entrypoint script.
> * `backend` service (FastAPI) with hot-reload enabled for dev.
> * `frontend` service (Node/Vite) proxying requests to the backend.
>
> **Task 2: Database Layer**
> Define the SQLAlchemy models exactly matching the schema provided in the TDD (Projects, Ingredients, Scenes, Shots). Create the initial Alembic migration script.
>
> **Task 3: Backend API Implementation**
>
> * Implement the `POST /api/assets/upload` endpoint using `UploadFile`. It must save to MinIO and Postgres.
> * Implement the `POST /api/scripts/parse` endpoint. It must accept a text file, chunk it, send it to Ollama for analysis, and store the resulting Shots.
>
> **Task 4: Frontend Development**
>
> * Set up the React Router with layouts (Sidebar/Header).
> * Build the 'Asset Library' view: fetch real data from the API, display images using presigned URLs, and implement a real file-upload dropzone."