Project — Future Robot Software

Nextmost — LLM Evaluation & Benchmarking Platform

Highlights

Evaluates multiple LLMs across simulated personas
Dashboard + command-driven console MVPs
Structured scoring and analytics for product messaging
Full AWS deployment with secure delivery
Backend architecture built around LLM prompt orchestration

Summary

Nextmost evaluates how different Large Language Models describe, position, and recommend products for simulated customer personas. It replaces intuition-based messaging decisions with analytics and structured scoring — guiding strategy and identifying which models perform best in specific contexts.

Client

Armando Kirwin, creative technologist with projects showcased at Cannes Film Festival, Venice Biennale, and SXSW. Co-founder of Artie (acquired 2024).

Scope & Key Deliverables

Designed and developed two MVP versions to validate core capabilities
Dashboard (v1) — Interactive analytics for persona-based evaluation
Console Interface (v2) — Command-driven web app for rapid simulation
Backend architecture for prompt orchestration and scoring
Structured PostgreSQL schema for analytics and comparison
Secure authentication and environment-based configuration
Full AWS deployment with HTTPS, logging and automation

Tech Stack

Frontend (v1): Next.js, Tremor UI, ECharts, Chart.js
Frontend (v2): Static console web client (HTML/CSS/JS)
Backend: FastAPI, Uvicorn, OpenRouter API
Database: PostgreSQL (structured analytics schema)
Infra: AWS EC2 + Nginx + Certbot (SSL) + systemd

Technical Overview

Nextmost unifies LLM orchestration and analytics visualization into a single simulation platform. FastAPI manages persona definitions, prompt templates, and multi-model requests via OpenRouter. Simulation outputs flow into PostgreSQL, powering visual comparisons and strategy insights.

Two complementary frontends connect to the same API — a Next.js dashboard for visual analysis, and a lightweight console for rapid experimentation. Deployed over AWS EC2 with secure HTTPS, automated services, access control, and monitoring — ensuring stability during R&D scaling.

Links

Dashboard Demo

Console Demo