Nextmost — LLM Evaluation & Benchmarking Platform

Highlights

  • Evaluates multiple LLMs across simulated personas
  • Dashboard + command-driven console MVPs
  • Structured scoring and analytics for product messaging
  • Full AWS deployment with secure delivery
  • Backend architecture built around LLM prompt orchestration

Summary

Nextmost evaluates how different Large Language Models describe, position, and recommend products for simulated customer personas. It replaces intuition-based messaging decisions with analytics and structured scoring — guiding strategy and identifying which models perform best in specific contexts.

Client

Armando Kirwin, creative technologist with projects showcased at Cannes Film Festival, Venice Biennale, and SXSW. Co-founder of Artie (acquired 2024).

Scope & Key Deliverables

  • Designed and developed two MVP versions to validate core capabilities
  • Dashboard (v1) — Interactive analytics for persona-based evaluation
  • Console Interface (v2) — Command-driven web app for rapid simulation
  • Backend architecture for prompt orchestration and scoring
  • Structured PostgreSQL schema for analytics and comparison
  • Secure authentication and environment-based configuration
  • Full AWS deployment with HTTPS, logging and automation

Tech Stack

  • Frontend (v1): Next.js, Tremor UI, ECharts, Chart.js
  • Frontend (v2): Static console web client (HTML/CSS/JS)
  • Backend: FastAPI, Uvicorn, OpenRouter API
  • Database: PostgreSQL (structured analytics schema)
  • Infra: AWS EC2 + Nginx + Certbot (SSL) + systemd

Technical Overview

Nextmost unifies LLM orchestration and analytics visualization into a single simulation platform. FastAPI manages persona definitions, prompt templates, and multi-model requests via OpenRouter. Simulation outputs flow into PostgreSQL, powering visual comparisons and strategy insights.

Two complementary frontends connect to the same API — a Next.js dashboard for visual analysis, and a lightweight console for rapid experimentation. Deployed over AWS EC2 with secure HTTPS, automated services, access control, and monitoring — ensuring stability during R&D scaling.

Links