LLM Assisted Content Creation Screen

Building AI-assisted screens in any B2B application today is a hand-tooled affair. Every time a product manager wants to add a “generate summary,” “draft outreach,” or “suggest next steps” panel, a developer must hard-code the prompt, wire up the LLM call, and ship a full release. Teams burn time creating, implementing, and debugging the same screens over and over again, and once the release is out we often discover the prompt misses the mark and needs yet another cycle of edits.

The LLM Assist Screen Registry turns that process into a basic configuration task. Instead of hard-coding prompts, developers register a screen once, reference it in their app as a custom React component, and they're done. The System Administrator can then select and control the LLM provider and version and map any dynamic fields in a low-code form. Post-release, System Administrators can edit prompts, audience context, or model settings live—no repo commit, no CI/CD pipeline, no App Store review. For our Developer persona the job-to-be-done is “ship AI-powered UI in hours, not days.” For the System Administrator it is “safely tune prompts in production without waking engineering.” For the Product Manager it is “experiment rapidly and measure which prompt variant lifts conversion or NPS.”
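As a rough sketch of that developer experience, embedding a registered screen might look like the snippet below. The package name, component, and props are illustrative assumptions; this spec defines the runtime contract only at the REST-endpoint level (see the runtime screen section later).

```tsx
// Hypothetical developer usage: drop the registered screen into any page.
// The package name, component, and props are assumptions, not a committed API.
import React from "react";
import { LlmAssistScreen } from "@baseplate/llm-assist"; // assumed package name

export function OpportunitySummaryPanel({ opportunityId }: { opportunityId: string }) {
  // screenSlug matches the slug saved in the registry; context carries the
  // dynamic fields the System Administrator mapped in the low-code form.
  return (
    <LlmAssistScreen
      screenSlug="opportunity-summary"
      context={{ opportunityId }}
      onGenerated={(text: string) => console.log("draft ready:", text.length, "chars")}
    />
  );
}
```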

Consider the workflow today: a PM drafts a prompt in Google Docs, an engineer pastes it into code, QA opens Postman to mock calls, and DevOps pushes a release. With the registry, the PM fills out a form, clicks Test, and sees side-by-side outputs from GPT-4o, Claude 3.5, and an open-source model. They choose the best result, hit Deploy, and the screen instantly updates for every user. We estimate this trims prompt-iteration time from ~20 minutes and two humans to under 90 seconds and one human, unlocking dozens of micro-optimizations each sprint.

Jobs-to-be-Done come into sharper focus:

  • “Generate tailored content without touching code.” Developers now assemble rather than author AI flows, focusing on business logic instead of boilerplate.

  • “Continuously improve prompt quality in production.” Admins can A/B test prompts, roll back poor performers, and keep an audit trail of every query and response for compliance.

  • “Turn system data into narrative insight.” PMs can map CRM fields, user properties, or transaction logs into the prompt template and trust the registry to pass the right variables every time (a minimal interpolation sketch follows this list).

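To make that variable mapping concrete, the registry could interpolate registered tokens such as {{user.richRole}} or {{objective}} into the saved prompt templates server-side. The helper below is a minimal sketch under that assumption; the function name and error behaviour are not part of the spec.

```ts
// Minimal sketch of server-side token interpolation for prompt templates.
// Token syntax ({{path.to.value}}) follows the examples in this spec; the
// function name and error behaviour are assumptions.
type TemplateVars = Record<string, unknown>;

export function renderPromptTemplate(template: string, vars: TemplateVars): string {
  return template.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (_match, path: string) => {
    // Walk dotted paths like "user.richRole" through the variables object.
    const value = path.split(".").reduce<unknown>(
      (acc, key) => (acc != null && typeof acc === "object" ? (acc as TemplateVars)[key] : undefined),
      vars
    );
    if (value === undefined) {
      throw new Error(`Unresolved prompt token: {{${path}}}`);
    }
    return String(value);
  });
}

// Example usage:
// renderPromptTemplate("Write for a {{user.richRole}} about {{objective}}.",
//   { user: { richRole: "VP of Sales" }, objective: "Q2 pipeline review" });
```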
We drew inspiration from a few best-in-class patterns. Notion’s AI blocks let users insert a “Rewrite” or “Summarize” action anywhere in a doc—the power is in how effortlessly a generic capability is packaged for reuse. Likewise, Retool’s new AI-powered components allow builders to drop an “LLM Query” node onto a canvas and bind it to data without writing fetch code. Finally, Salesforce’s emerging Einstein Prompt Builder gives admins a WYSIWYG interface to craft and version LLM prompts, showing that no-code prompt orchestration is rapidly becoming table stakes for enterprise platforms.

By embedding a similar experience straight into Baseplate’s Feature Registry, we keep our open-source stack competitive with much larger commercial suites while preserving developer control. Every screen created through the registry inherits automatic logging, error handling, and rate-limit protection, so compliance and support teams gain visibility for free. Most importantly, teams can now treat AI interactions as living product copy—iterated daily, measured like any other funnel step, and always a quick edit away from getting better.

In short, the LLM Assist Screen Registry collapses prompt engineering, UI wiring, and deployment into a single, governed workflow. It liberates engineering hours, empowers non-technical teammates, and turns AI-assisted experiences into just another knob a product team can twist to drive growth.

🔐 Role Access

System Administrator

System Administrators own the entire LLM Assist Screen Registry. They can create, edit, disable, or delete screen definitions; select or onboard new LLM providers and versions; configure every prompt field (role, system prompt, style guide, audience inserts, refinement prompt); run side-by-side model tests; deploy versions to all tenants; manage rate-limits and error-handling rules; inspect full query/response logs; and assign granular permissions to other roles.

User Stories

  1. As a System Administrator, I want to create a new LLM Assist Screen by filling out the registry form so that product teams can add AI functionality without code pushes.

  2. As a System Administrator, I want to pick “GPT-4o” (v 0610) from the LLM dropdown so that I can target the latest model for better quality.

  3. As a System Administrator, I want to insert the {{user.richRole}} token into the System Prompt so that audience context is dynamically injected.

  4. As a System Administrator, I want to click Generate Preview and see outputs from three models in tabbed panels so that I can choose the most on-brand result before deploying.

  5. As a System Administrator, I want to save a new version and mark it “staged” so that QA can test in production-like conditions without exposing it to all users.

  6. As a System Administrator, I want to press Deploy and have the screen update instantly for every tenant so that we avoid the CI/CD pipeline.

  7. As a System Administrator, I want to roll back to the prior version if error rates exceed 5 % in the first hour so that user impact is minimized.

  8. As a System Administrator, I want to view a timeline of every query, response, and latency metric on the Logs tab so that I can troubleshoot performance.

  9. As a System Administrator, I want to bulk-export logs as CSV so that compliance teams can audit prompts.

  10. As a System Administrator, I want to set a per-tenant daily token cap so that runaway usage does not spike costs.

  11. As a System Administrator, I want to receive an alert when an LLM provider returns a 429 (rate-limit) error so that we can auto-retry or switch models.

  12. As a System Administrator, I want to assign “Edit Prompt” permission only to Customer Administrators in beta tenants so that early adopters can self-serve while others cannot.

Customer Success

Customer Success can view every screen definition, run test prompts, and inspect per-tenant logs. They cannot edit prompt text or deploy new versions.

User Stories

  1. As Customer Success, I want to search the Logs tab for a user’s email so that I can reproduce the error they reported.

  2. As Customer Success, I want to impersonate the user and hit Refine on the screen so that I can verify the issue end-to-end.

All Other Roles (Customer Administrator, Manager, Standard User)

Can access whichever LLM-assisted screens are enabled on the system, input required data, generate and refine content, and copy or save results. They cannot view logs, change prompts, or see other users’ outputs unless granted additional permissions.

User Stories

  1. As a Standard User, I want to select “Q2 Product Line” from the dropdown and press Generate Summary so that I get a first draft in seconds.

  2. As a Standard User, I want to edit a sentence and click Refine so that the LLM polishes my wording.

  3. As a Standard User, I want an error message if I leave the “Objective” field blank so that I know why generation failed.

  4. As a Standard User, I want to copy the final text to clipboard with the Copy icon so that I can paste it into an email.

  5. As a Standard User, I want the screen to remember my last inputs in session storage so that I don’t retype fields on refresh.

  6. As a Standard User, I want a tooltip explaining token limits when my input exceeds 1,000 characters so that I can shorten it.

  7. As a Standard User, I want to see a spinner and “generating…” label while the LLM is processing so that I know the system is working.

  8. As a Standard User, I want to click Report Issue if the output is factually incorrect so that admins can improve prompts.

📄 Pages and Screens

Feature Registry → “LLM Assist Screens” Tab

After signing in, system administrators visit Feature Registry to manage modules. A new tab lists every AI-assisted screen. From here they create, inspect, or open any definition.

Screen Content

Static text

H1: LLM Assist Screen Registry
Sub-copy: Define, test, and deploy AI-assisted UI without code pushes.
| Name | Constraints | Error Checking | Field Notes |
|---|---|---|---|
| Search Screens | optional, ≤ 64 chars | | Debounced 300 ms |
| New Screen (button) | | | Opens Screen Editor |
| Table – Screen Name | read-only | | Click opens editor |
| Table – Status | read-only (Draft / Staged / Live) | | Pill styling |
| Table – Model | read-only | | “GPT-4o (v 0610)” |
| Table – Last Updated | read-only, ISO | | Sortable |

Illustration: simple list view with sticky header.
Figma: TBD

Query & Setup

| Endpoint | Params | Notes |
|---|---|---|
| GET /api/llm-screens | tenantId, status?, search?, sort?, page | Paged list |
| GET /api/llm-screens/counts | tenantId | Tiny payload for status pills |

Business logic – role gate (System Admin sees all tenants); join latest version; redact drafts for non-owners.
Caching – counts cached 5 min in Redis.
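A possible shape for that 5-minute Redis cache, assuming a node-redis client and a hypothetical countScreensByStatus query behind it:

```ts
// Sketch of the 5-minute Redis cache for GET /api/llm-screens/counts.
// Client setup, key naming, and the countScreensByStatus query are assumptions;
// assumes `await redis.connect()` has run at service start-up.
import { createClient } from "redis";

const redis = createClient({ url: process.env.REDIS_URL });
const COUNTS_TTL_SECONDS = 5 * 60; // "counts cached 5 min in Redis"

export async function getScreenStatusCounts(tenantId: string): Promise<Record<string, number>> {
  const cacheKey = `llm-screens:counts:${tenantId}`;
  const cached = await redis.get(cacheKey);
  if (cached) return JSON.parse(cached);

  const counts = await countScreensByStatus(tenantId); // hypothetical DB query
  await redis.set(cacheKey, JSON.stringify(counts), { EX: COUNTS_TTL_SECONDS });
  return counts;
}

declare function countScreensByStatus(tenantId: string): Promise<Record<string, number>>;
```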

Actions

| Action | Trigger | Side Effects |
|---|---|---|
| Open Screen | row click | none; navigates |
| New Screen | “New Screen” button | creates empty draft row; redirects |
| Delete | kebab menu > Delete | soft-delete in llm_screens; toast “Deleted” |

Notes

  • ARIA roles for table, search.

  • Load target < 150 ms (cached).

  • Internationalized date column via Intl.DateTimeFormat.

LLM Assist Screen Editor

Overview

Opened from the list (new or existing). Users step through five accordion sections, then stage or deploy.

Screen Content

Five collapsible sections:

  1. LLM Settings – provider, model, type.

  2. Prompt Template – role, system prompt, style guide.

  3. Audience Context – rich role, user/company chips.

  4. Refinement – enable flag + refinement prompt.

  5. Test & Deploy – Test button, version notes, stage/deploy.

Key fields (excerpt)

| Name | Constraints | Errors | Field Notes |
|---|---|---|---|
| Screen Name | required, ≤ 60 | duplicate name | slug autofills |
| System Prompt | required, ≤ 8 kB | profanity, unmatched {{ | token counter |
| List Fields | ≥ 1 row if Type=List | | datatype dropdown |
| Test (button) | enabled on valid form | | opens Model Comparison |

Illustrative wireframe: multi-step form.
Figma: TBD

Query & Setup

| Endpoint | Params | Notes |
|---|---|---|
| GET /api/llm-screens/{id} | | Full definition |
| GET /api/llm-providers | | Static 1 h cache |
| POST /api/llm-screens / PUT … | body = form JSON | Create/update |
| GET /api/variables | tenantId | tokens for autocomplete |

Business logic – server prompt-linter; audit diff; autosave drafts 30 min.
Prefetch – provider list + tokens on mount.
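The server prompt-linter mentioned above could enforce the constraints listed in the key-fields table (system prompt ≤ 8 kB, unmatched {{ tokens, profanity). A minimal sketch under those assumptions; the error codes and word list are placeholders:

```ts
// Minimal prompt-linter sketch enforcing the editor constraints described above.
// Error codes and the profanity list are placeholders, not final values.
interface LintIssue {
  code: "TOO_LONG" | "UNMATCHED_TOKEN" | "PROFANITY";
  message: string;
}

const MAX_SYSTEM_PROMPT_BYTES = 8 * 1024; // "≤ 8 kB"
const PROFANITY = ["damn"]; // placeholder word list

export function lintSystemPrompt(prompt: string): LintIssue[] {
  const issues: LintIssue[] = [];

  if (Buffer.byteLength(prompt, "utf8") > MAX_SYSTEM_PROMPT_BYTES) {
    issues.push({ code: "TOO_LONG", message: "System prompt exceeds 8 kB." });
  }

  // Every "{{" must be closed by "}}"; flag unmatched token delimiters.
  const opens = (prompt.match(/\{\{/g) ?? []).length;
  const closes = (prompt.match(/\}\}/g) ?? []).length;
  if (opens !== closes) {
    issues.push({ code: "UNMATCHED_TOKEN", message: "Unmatched {{ }} token delimiters." });
  }

  const lower = prompt.toLowerCase();
  if (PROFANITY.some((word) => lower.includes(word))) {
    issues.push({ code: "PROFANITY", message: "Prompt contains blocked words." });
  }

  return issues;
}
```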

Actions

| Action | Trigger | Data Changes | UI Feedback | Events |
|---|---|---|---|---|
| Save Draft | button | upsert record, status=Draft | toast “Saved” | analytics draft_saved |
| Test Prompt | button | none (logs in llm_test_logs) | opens modal, spinner | prompt_tested |
| Stage | button | new version row, status=Staged | toast “Staged” | version_staged |
| Deploy | button | marks version Live | modal → spinner → success toast | version_deployed |

Notes

  • Keyboard Ctrl+S triggers Save.

  • WCAG 2.2: color contrast 4.5:1; all inputs labelled.

  • Expect ≤ 2 s load including provider fetch.

Model Comparison Modal

Overview

Inline test bed—user enters sample input, compares model outputs in tabs, then closes.

Screen Content

| Field | Constraints | Errors | Notes |
|---|---|---|---|
| Test Input | required; obey Type | validation inline | multiline textarea |
| Tabs (one per model) | read-only | “Model unavailable” banner | shows latency |
| Accept Output (button) | disabled if error | | writes chosenModel back |

Static header:

H2: Compare model outputs side-by-side

Query & Setup

Endpoint: POST /api/llm-run/comparison
Body: screenId, versionId, testInput, models[].
Server runs fan-out, stores logs, returns outputs+latency.

Caching – none (real-time).
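One plausible shape for the server-side fan-out, enforcing the 30-second per-model cap noted later in this section; callModel stands in for the real provider adapter layer and the result shape is an assumption:

```ts
// Sketch of POST /api/llm-run/comparison fan-out: run each selected model in
// parallel, enforce a 30 s timeout per call, and return outputs with latency.
// callModel() and the result shape are assumptions standing in for the real
// provider adapter layer.
interface ComparisonResult {
  model: string;
  output?: string;
  latencyMs: number;
  errorCode?: string;
}

const PER_MODEL_TIMEOUT_MS = 30_000;

export async function runComparison(testInput: string, models: string[]): Promise<ComparisonResult[]> {
  return Promise.all(
    models.map(async (model) => {
      const started = Date.now();
      try {
        const output = await withTimeout(callModel(model, testInput), PER_MODEL_TIMEOUT_MS);
        return { model, output, latencyMs: Date.now() - started };
      } catch (err) {
        const errorCode = err instanceof Error && err.message === "timeout" ? "TIMEOUT" : "PROVIDER_ERROR";
        return { model, latencyMs: Date.now() - started, errorCode };
      }
    })
  );
}

function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error("timeout")), ms);
    promise.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); }
    );
  });
}

declare function callModel(model: string, input: string): Promise<string>;
```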

Actions

| Action | Trigger | Side Effects |
|---|---|---|
| Run Comparison | modal open or “Re-run” | inserts rows in llm_test_logs; UI spinners |
| Accept Output | button | sets preferredModel on draft version |

Notes

  • Escape key closes modal.

  • Outputs in <pre> with copy-to-clipboard.

  • 30 s max per model call, else timeout error card.

Deployment Confirmation Dialog

Overview

Final guardrail—confirms scope and rollback rule.

Screen Content

| Field | Constraints | Errors | Notes |
|---|---|---|---|
| Rollback Threshold % | optional, 0–100 | numeric | default 5 |

Static copy:

Title: Deploy Version X.X?
Body: This makes the screen live for all enabled tenants.

Query & Setup

POST /api/llm-screens/{id}/deploy
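The Rollback Threshold % field ties into the auto-rollback guard (see rollback_threshold_pct in the data model). Below is a sketch of a scheduled check that could run during the first hour after deploy; all helper functions are hypothetical:

```ts
// Sketch of the post-deploy rollback guard: if the error rate for the newly
// live version exceeds rollback_threshold_pct within the first hour, re-promote
// the prior version. Query helpers and the scheduler are assumptions.
export async function checkRollback(screenId: string): Promise<void> {
  const live = await getLiveVersion(screenId); // hypothetical lookup
  if (!live?.rollbackThresholdPct || minutesSince(live.liveAt) > 60) return;

  const { total, errors } = await getQueryStats(screenId, live.versionId, live.liveAt);
  if (total === 0) return;

  const errorRatePct = (errors / total) * 100;
  if (errorRatePct > live.rollbackThresholdPct) {
    await promotePriorVersion(screenId); // mark previous version Live, current rolled_back
    await notifyAdmins(screenId, `Auto-rollback: error rate ${errorRatePct.toFixed(1)}%`);
  }
}

const minutesSince = (date: Date) => (Date.now() - date.getTime()) / 60_000;

// Hypothetical helpers backed by llm_screen_versions and llm_query_logs.
declare function getLiveVersion(screenId: string): Promise<{
  versionId: string; liveAt: Date; rollbackThresholdPct: number | null;
} | null>;
declare function getQueryStats(screenId: string, versionId: string, since: Date): Promise<{ total: number; errors: number }>;
declare function promotePriorVersion(screenId: string): Promise<void>;
declare function notifyAdmins(screenId: string, message: string): Promise<void>;
```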

Actions

| Action | Trigger | Side Effects |
|---|---|---|
| Confirm Deploy | button | enqueue deploy job, toast, close |
| Cancel | button | none |

Notes

Accessibility – focus trap in dialog; ESC closes.

Screen Logs & Analytics Page

Overview

Ops staff investigate usage spikes or errors here, launched from the list view’s “View Logs” action.

Screen Content

| Field | Constraints | Errors | Notes |
|---|---|---|---|
| Date Range | required | invalid dates | preset buttons |
| Search by User | email | regex | auto-complete |
| Export CSV (button) | | | disabled if >100 k rows w/out filter |

Charts: volume, latency, error rate (SVG).

Query & Setup

Endpoints:

  • GET /api/llm-screens/{id}/metrics (agg)

  • GET /api/llm-screens/{id}/logs (raw / paged)

  • POST /api/llm-screens/{id}/logs/export (async)

Caching – metrics 5 min; raw no cache.

Actions

| Action | Trigger | Side Effects |
|---|---|---|
| Filter Logs | filter changes | new query; chart re-renders |
| Export CSV | button | job row in exports; email link; toast “Export started” |

Notes

  • Charts keyboard-navigable (role="img" + <title>).

  • Log rows virtualised for 100 k+.

 Generated LLM-Assist Runtime Screen

Overview

End-users land here from app nav; fill inputs; click Generate/Refine; copy or save output.

Screen Content

| Field | Constraints | Errors | Notes |
|---|---|---|---|
| Dynamic Inputs | per config | custom | first input auto-focus |
| Generate (button) | disabled until valid | | spinner |
| Output Area | read-only | | click-to-edit inline |
| Refine (button) | disabled until text edited | | spinner |
| Copy (icon) | | | toast |
| Report Issue (link) | | | opens feedback modal |

Instruction banner: style guide paragraphs.

Query & Setup

  • GET /api/runtime/llm-screens/{sid} – merged definition

  • POST /api/runtime/llm-screens/{sid}/generate

  • POST /api/runtime/llm-screens/{sid}/refine

Prefetch – prompt cached CDN 15 min.
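From the client, Generate is a plain POST to the runtime endpoint above. A minimal fetch sketch; the request and response shapes are assumptions, only the endpoint path comes from this spec:

```ts
// Minimal client sketch for the runtime Generate action. The request/response
// shapes are assumptions; only the endpoint path comes from this spec.
interface GenerateResponse {
  output: string;
  queryId: string;
}

export async function generate(screenId: string, inputs: Record<string, unknown>): Promise<GenerateResponse> {
  const res = await fetch(`/api/runtime/llm-screens/${screenId}/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ inputs }),
  });
  if (!res.ok) {
    // X-Error-Code drives the global toast component described later.
    throw new Error(res.headers.get("X-Error-Code") ?? "GENERATION_FAILED");
  }
  return res.json();
}
```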

Actions

| Action | Trigger | Data Changes | UI Feedback | Events |
|---|---|---|---|---|
| Generate | button | log row in llm_query_logs | spinner → output | generate |
| Refine | button | another log row | spinner → updated text | refine |
| Copy | icon | none | toast “Copied!” | copy_output |
| Report Issue | link | feedback row | modal close → toast | issue_reported |

Notes

  • Full keyboard path (Tab → Shift+Tab).

  • Offline/timeout fallback message.

  • Target P95 latency ≤ 4 s.

 Global Error / Toast Components

Overview

Reusable notifications for model errors, quotas, validation.

Screen Content

Toast texts

  • “Model timeout – please retry.”

  • “Daily token quota reached.”

  • “Prompt validation failed.”

No form fields.

Query & Setup

Client-side only; error codes come in X-Error-Code header.
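A sketch of how a client-side helper could map X-Error-Code values to the toast copy above and dispatch the showToast event; the specific error-code names and event payload are assumptions:

```ts
// Sketch: map X-Error-Code header values to the toast copy listed above and
// dispatch the showToast event. Error-code names and payload are assumptions.
const TOAST_BY_ERROR_CODE: Record<string, string> = {
  MODEL_TIMEOUT: "Model timeout – please retry.",
  TOKEN_QUOTA_EXCEEDED: "Daily token quota reached.",
  PROMPT_VALIDATION_FAILED: "Prompt validation failed.",
};

export function toastFromResponse(res: Response): void {
  const code = res.headers.get("X-Error-Code");
  if (!code) return;
  const message = TOAST_BY_ERROR_CODE[code] ?? "Something went wrong – please retry.";
  // Any screen can dispatch this; the global toast component listens for it.
  window.dispatchEvent(new CustomEvent("showToast", { detail: { message, severity: "error" } }));
}
```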

Actions

Displayed automatically when any screen dispatches showToast event.

Notes

  • Announces via aria-live="assertive".

  • Auto-dismiss 6 s; manual × button with keyboard focus.

Usage Dashboard → “AI Usage” Tab

Overview

Roll-up analytics for cost/accounting; accessed by SysAdmin or Customer Admin.

Screen Content

| Field | Constraints | Errors | Notes |
|---|---|---|---|
| Scope Selector | all / tenant | | dropdown |
| Date Range | required | | defaults 30 d |
| Charts | read-only | | volume, cost |

Query & Setup

  • GET /api/llm-usage/aggregate

  • GET /api/llm-usage/costs

Aggregates materialised hourly; cached 1 h.

Actions

| Action | Trigger | Side Effects |
|---|---|---|
| Change Date/Scope | selector | new fetch; charts update |
| Drill-down | click bar | navigates to Screen Logs with pre-filter |

Notes

  • Currency shown in tenant-billing currency.

  • Charts label font supports CJK widths.

  • Goal: first paint ≤ 800 ms for cached window.

Global Notes

  • Accessibility – All interactive elements reachable via keyboard; ARIA roles on tablists, modals, toasts; color-blind-safe palette.

  • I18N – Copy held in i18n JSON; dates via locale; numeric commas/decimals.

  • Security – Tenant isolation enforced at every endpoint; prompt strings HTML-escaped before render.

  • Performance – Target < 2 s TTI for config screens; < 4 s P95 for generate/refine.

  • Compliance – Full audit trail (version diffs, query logs) retained 13 months; export encrypted at rest.

This document enumerates every UI surface and the requisite backend integration, giving design, engineering, and QA clear blueprints for the upcoming LLM Assist Screen Registry.

Implementation Notes

[To be filled in by development team]

🧱 Data Model

Table llm_screens

| Name | Description | Constraints | Notes |
|---|---|---|---|
| screen_id | Primary key for a logical screen (prompt workflow). | PK, uuid, generated by gen_random_uuid() | Never changes once created. |
| screen_name | Human-readable label shown in UI list. | varchar(60), NOT NULL, unique per tenant | |
| slug | URL-safe identifier derived from screen_name. | varchar(80), NOT NULL, lowercase letters, digits, hyphens | Regenerated only if name changes. |
| created_by | User who first saved the draft. | uuid, FK → users.user_id, NOT NULL | |
| created_at | Timestamp of initial creation. | timestamptz, default now() | |
| updated_at | Timestamp of last metadata change (not version edits). | timestamptz, auto-updated via trigger | |
| live_version_id | The version currently served in production (if any). | uuid, FK → llm_screen_versions.version_id, NULLABLE | Allows O(1) look-ups at runtime. |
| status | Aggregate lifecycle state: draft / staged / live / archived. | screen_status enum, default draft | Redundant but speeds list queries. |

Table llm_screen_versions

| Name | Description | Constraints | Notes |
|---|---|---|---|
| version_id | Primary key for a specific prompt/config snapshot. | PK, uuid | |
| screen_id | Parent link to llm_screens. | FK, uuid, NOT NULL | On delete → cascade. |
| version_number | Monotonic integer per screen (1, 2, 3…). | int, NOT NULL | Generated in DB with max()+1. |
| status | draft / staged / live / rolled_back | version_status enum, default draft | |
| llm_provider | e.g. openai, anthropic, mistral. | varchar(32), NOT NULL | Enumerated options from a system-wide LLM table are likely the best implementation here. |
| llm_model | Marketing name: gpt-4o, claude-3.5-sonnet. | varchar(64), NOT NULL | |
| llm_model_version | Provider-specific build tag (0610, 20240601…). | varchar(32), NULLABLE | |
| prompt_type | text / list | prompt_type enum, NOT NULL | Drives UI rendering; allows a degree of future-proofing for screen options to come. |
| role | “Persona” the LLM should assume. | varchar(100), NULLABLE | |
| system_prompt | Full system-level prompt text. | text, NOT NULL, ≤ 8 kB | Token-counted in UI & API. |
| style_guide | Audience writing-style guidance. | text, NULLABLE, ≤ 2 kB | |
| include_audience | Flag: inject audience facts. | boolean, default false | |
| include_rich_role | Include user.rich_role if present. | boolean, default false | Effective only when include_audience = true. |
| include_user_fields | JSONB array of user field keys to pass. | jsonb, default '[]' | |
| include_company_fields | JSONB array of company field keys to pass. | jsonb, default '[]' | |
| initial_prompt | Template sent on first Generate. | text, NOT NULL | May reference tokens like {{objective}}. |
| refinement_prompt | Template for subsequent Refine requests. | text, NOT NULL | |
| created_by | Author of this version. | uuid, FK → users | |
| created_at | Timestamp of version save. | timestamptz, default now() | |
| staged_at | When status first became staged. | timestamptz, NULLABLE | |
| live_at | When status became live. | timestamptz, NULLABLE | |
| rollback_threshold_pct | Auto-rollback error % guard. | numeric(5,2), NULLABLE | Null = no auto-rollback. |
| preferred_model | Set when tester picks the “winning” model. | varchar(64), NULLABLE | Used as default on deploy. |

Table llm_screen_list_fields

(Only populated when prompt_type = 'list'.)

| Name | Description | Constraints | Notes |
|---|---|---|---|
| list_field_id | Row primary key. | PK, uuid | |
| version_id | Link to specific version. | FK, NOT NULL | Cascade delete with version. |
| field_name | Key referenced in prompt ({{field_name}}). | varchar(60), NOT NULL | Snake-case enforced. |
| data_type | string / integer / decimal / date / boolean | varchar(16), NOT NULL | |
| is_required | Must user supply value? | boolean, default false | |
| max_length | Soft cap (chars) for strings. | int, NULLABLE | |
| sort_order | Display order in UI. | int, default 0 | |

 

Table llm_query_logs

| Name | Description | Constraints | Notes |
|---|---|---|---|
| query_id | Unique log row. | PK, uuid | |
| tenant_id | Tenant making the call. | uuid, FK → tenants | |
| screen_id | Logical screen. | uuid, FK → llm_screens | |
| version_id | Version in use. | uuid, FK → llm_screen_versions | |
| user_id | End-user invoking action. | uuid, FK → users | |
| op_type | generate / refine | varchar(8) | |
| llm_provider | Recorded provider. | varchar(32) | |
| llm_model | Model name. | varchar(64) | |
| llm_model_version | Provider build tag. | varchar(32), NULLABLE | |
| input_payload | JSONB of variables sent. | jsonb | PII encrypted at column level. |
| output_text | Raw LLM response (or partial). | text | |
| prompt_tokens | # tokens in request. | int | |
| completion_tokens | # tokens in response. | int | |
| cost_usd | Normalised USD cost. | numeric(10,4) | |
| latency_ms | End-to-end duration. | int | |
| error_code | Standardised error label. | varchar(32), NULLABLE | NULL = success. |
| created_at | Timestamp of call. | timestamptz, default now() | BRIN index by date for fast range scans. |

Table llm_test_logs

| Name | Description | Constraints | Notes |
|---|---|---|---|
| test_id | Unique test record. | PK, uuid | |
| version_id | Version under test. | uuid, FK | |
| user_id | Tester (usually System / Customer Admin). | uuid, FK | |
| llm_provider | Provider tested. | varchar(32) | |
| llm_model | Model tested. | varchar(64) | |
| test_input | Prompt payload. | text | |
| test_output | Model output. | text | |
| latency_ms | Duration in ms. | int | |
| error_code | Error if any. | varchar(32), NULLABLE | |
| created_at | When test executed. | timestamptz, default now() | |

Table llm_usage_daily

(Materialised daily aggregate for dashboard & billing.)

| Name | Description | Constraints | Notes |
|---|---|---|---|
| usage_date | Calendar date (UTC). | PK, date | Part of composite PK. |
| tenant_id | Tenant owner. | PK, uuid | |
| screen_id | Screen source. | PK, uuid | |
| llm_provider | Provider. | varchar(32) | |
| prompt_tokens | Sum of prompt tokens. | bigint | |
| completion_tokens | Sum of completion tokens. | bigint | |
| cost_usd | Total cost that day. | numeric(12,4) | |
| updated_at | Last roll-up time. | timestamptz | Populated by nightly ETL. |

Enumerated Types

```sql
CREATE TYPE screen_status  AS ENUM ('draft','staged','live','archived');
CREATE TYPE version_status AS ENUM ('draft','staged','live','rolled_back');
CREATE TYPE prompt_type    AS ENUM ('text','list');
```

General Notes
  • Naming – Snake_case, plural table names (llm_screen_versions) to match project conventions.

  • Indices – In addition to PK/FK indexes, create GIN on input_payload and output_text (for GDPR-driven deletion searches) and BRIN on created_at for logs.

  • PII / Compliance – Encrypt any direct user text stored in input_payload & output_text with PGP-sym-encrypted columns or field-level AES.

  • Partitioning – llm_query_logs and llm_test_logs should be time-partitioned (monthly) to cap index bloat.

  • Row-level Security – Enable RLS on every table keyed by tenant_id plus role checks for System Admin overrides (a migration sketch follows these notes).
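As one possible expression of the index and RLS notes above, here is a migration sketch using node-postgres; the index/policy names and the app.tenant_id session-setting convention are assumptions:

```ts
// Migration sketch for the notes above: BRIN index on created_at and row-level
// security keyed by tenant_id on llm_query_logs. Uses node-postgres; object
// names and the app.tenant_id setting convention are assumptions.
import { Pool } from "pg";

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

export async function up(): Promise<void> {
  await pool.query(`
    CREATE INDEX IF NOT EXISTS llm_query_logs_created_at_brin
      ON llm_query_logs USING BRIN (created_at);

    ALTER TABLE llm_query_logs ENABLE ROW LEVEL SECURITY;

    -- Each request sets app.tenant_id; rows from other tenants are invisible.
    CREATE POLICY llm_query_logs_tenant_isolation ON llm_query_logs
      USING (tenant_id = current_setting('app.tenant_id')::uuid);
  `);
}
```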

This data model captures every entity needed to configure, deploy, override, audit, and bill for LLM-assisted screens in Baseplate.

 

Risks and Mitigations

| Risk | Impact | Mitigation |
|---|---|---|
| Leakage of private user / company data to third-party models | High – legal/compliance breach (GDPR, SOC-2) | Server-side redaction & tokenisation of PII fields; ensure provider data retention (“store data”) is disabled for all LLM calls. |
| Unchecked token usage driving runaway costs | High – unexpected cloud bill | Per-tenant daily quotas and basic real-time cost reporting. |
| Prompt or model changed in production without QA | Medium–High – degraded UX or brand tone | Version gating (Draft → Staged → Live), mandatory test run before Deploy, and 1-click rollback with automatic re-promotion of the prior version. |
| Provider API deprecation / breaking change | Medium – sudden outage | Abstract provider SDK behind an adapter layer; nightly canary test hitting each model/version; fail-over to secondary model if the canary fails. |
| Rate-limit errors or high latency under load | Medium – user frustration, lost trust | Local response cache for identical prompts < 1 h, exponential back-off with 3 retries (sketched below), pooled connections, and multi-provider fallback when 429/5xx exceeds threshold. |
| Unauthorized edits or data access (RBAC gaps) | High – security breach | Fine-grained role scopes, row-level security in PostgreSQL, and signed audit trail of every create/update/delete with immutable storage (WORM). |
| Offensive / biased output harming user or brand | Medium – reputational damage | Post-generation content moderation filter (OpenAI / AWS Comprehend), tenant-level blocklists, and user ratings captured to retrain prompts. |
| Over-retention of query logs containing user text | Medium – storage cost & privacy risk | Partition logs monthly, purge/aggregate after 13 months by default (configurable), and provide a “forget me” API for GDPR erasure. |
| Slow initial load of registry/config screens | Medium – lowers adoption | 5-minute Redis cache for provider catalog & variable lists, code-split wizard sections, and defer heavy charts until the tab is visible. |
| Poor prompt quality lowering completion rates | Medium – feature seen as unreliable | Built-in A/B test harness, star-rating widget feeding prompt analytics, and periodic prompt-review SLAs for product owners. |
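To illustrate the back-off mitigation in the rate-limit row above, here is a retry sketch; the delay base, jitter, and retryable-status check are assumptions:

```ts
// Sketch of the "exponential back-off with 3 retries" mitigation for 429/5xx
// provider errors. Delay base and jitter values are assumptions.
const MAX_RETRIES = 3;
const BASE_DELAY_MS = 500;

const isRetryable = (status: number) => status === 429 || status >= 500;
const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

export async function callProviderWithRetry(request: () => Promise<Response>): Promise<Response> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= MAX_RETRIES; attempt++) {
    try {
      const res = await request();
      if (!isRetryable(res.status)) return res; // success or non-retryable error
      lastError = new Error(`Provider returned ${res.status}`);
    } catch (err) {
      lastError = err; // network failure, also retryable
    }
    if (attempt < MAX_RETRIES) {
      const delay = BASE_DELAY_MS * 2 ** attempt + Math.random() * 100; // jittered back-off
      await sleep(delay);
    }
  }
  // After exhausting retries, the caller can fall back to a secondary model.
  throw lastError;
}
```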