Vertical voice AI · Live · IDR pricing tiers active · early traction

beeready

A voice-first AI interview coach for Indonesian scholarship (LPDP, Chevening) and civil-service (CPNS) candidates. GPT-4o + ElevenLabs, scoring against official rubrics.

In one sentence: Voice AI interview coach scoring candidates against official LPDP, IELTS, and TOEFL rubrics. Chose GPT-4o over Claude because following a 40-row rubric without drifting is the product, not open-ended writing. Live with three IDR pricing tiers.

beeready.dev · Solo PM + builder: product, evals design, pricing

Stack

GPT-4o (evaluation + coaching)
ElevenLabs (realtime voice)
Next.js + Vercel
Rubric-based scoring engine
Stripe / local payment rails

The call I'd own

GPT-4o over Claude for evaluation. Claude 3.5/4 wins on open-ended writing; GPT-4o wins on following a 40-row rubric without drifting, which is the thing the product sells. The rubric is the product, not the prose.

The problem

LPDP (Indonesia's largest scholarship), Chevening, IELTS/TOEFL, and CPNS (civil service) interviews are high-stakes moments for Indonesian applicants. Practice is expensive, inconsistent, and almost never benchmarked against the actual rubrics real evaluators use. Candidates rehearse with friends and hope.

What it does

A realtime voice interview with four distinct AI evaluator personas across the interview phases (warm-up, behavioral, technical/situational, closing).

Scoring happens against official rubrics: LPDP band descriptors, IELTS band descriptors, TOEFL speaking rubrics. Output dimensions: Communication, Problem-solving, Leadership, with specific evidence pulled from the session transcript.
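A minimal sketch of what a score report with transcript citations could look like. All type and field names here are illustrative, not the production schema:

```typescript
// Hypothetical shape of one session's score report (names are illustrative).
type Dimension = "Communication" | "Problem-solving" | "Leadership";

interface Evidence {
  quote: string;       // verbatim excerpt from the session transcript
  timestampSec: number;
}

interface DimensionScore {
  dimension: Dimension;
  band: number;         // e.g. an IELTS-style 1-9 band
  rubricRow: string;    // which rubric descriptor the band maps to
  evidence: Evidence[]; // every score must cite the transcript
}

interface SessionReport {
  rubric: "LPDP" | "IELTS" | "TOEFL";
  scores: DimensionScore[];
  summary: string;
}

// Example report for a single dimension:
const example: SessionReport = {
  rubric: "IELTS",
  scores: [
    {
      dimension: "Communication",
      band: 7,
      rubricRow: "Fluency and coherence: band 7",
      evidence: [{ quote: "I led a team of five engineers", timestampSec: 412 }],
    },
  ],
  summary: "Strong structure; hedge less under follow-up questions.",
};
```

Making evidence a required field is the structural version of the "actionable feedback" promise: a score with no transcript quote simply fails validation.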

For professional interviews, a CV-to-JD gap analysis seeds the questions so practice matches the role, not a generic template.
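The gap analysis can be sketched as a simple diff between extracted skills. In production the extraction itself would be an LLM step; the keyword version below (with hypothetical function names) just shows the shape of the idea:

```typescript
// Hypothetical gap analysis: which JD requirements the CV never mentions.
// Real skill extraction would use the LLM; this keyword diff is a sketch.
function gapAnalysis(cvSkills: string[], jdRequirements: string[]): string[] {
  const have = new Set(cvSkills.map((s) => s.toLowerCase()));
  return jdRequirements.filter((r) => !have.has(r.toLowerCase()));
}

const gaps = gapAnalysis(
  ["SQL", "Stakeholder management", "Roadmapping"],
  ["SQL", "A/B testing", "Roadmapping", "Pricing"],
);
// gaps -> ["A/B testing", "Pricing"]: these seed the interview questions
```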

PM decisions I'm proud of

Rubric-first, not LLM-first. The scoring module ingests the actual official rubric as context and has the model produce a score with citations from the transcript. This is the difference between "fuzzy encouragement" and "actionable feedback", and it's what lets us charge.
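A sketch of what rubric-first prompt assembly could look like; the function and field names are illustrative, not the actual code. The point is that the official rubric rides along as context and the instructions force row-level citations:

```typescript
// Sketch of rubric-first prompt assembly (names are illustrative).
interface RubricRow { id: string; dimension: string; descriptor: string }

function buildScoringPrompt(rubric: RubricRow[], transcript: string): string {
  const rubricBlock = rubric
    .map((r) => `${r.id} | ${r.dimension} | ${r.descriptor}`)
    .join("\n");
  return [
    "You are an interview evaluator. Score ONLY against the rubric rows below.",
    "For every score, cite the rubric row id and quote the transcript verbatim.",
    "If a row has no evidence in the transcript, mark it 'no evidence'. Do not guess.",
    "",
    "RUBRIC:",
    rubricBlock,
    "",
    "TRANSCRIPT:",
    transcript,
  ].join("\n");
}

const prompt = buildScoringPrompt(
  [{ id: "C1", dimension: "Communication", descriptor: "Ideas are clearly structured" }],
  "Candidate: I led a team of five...",
);
```

The "no evidence" escape hatch matters: without it, a model under rubric pressure tends to invent support rather than abstain.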

GPT-4o over Claude for evaluation. I chose GPT-4o specifically because structured-output reliability and instruction-following under rubric constraints tested better on our golden set. Claude 3.5/4 wins on open-ended writing; GPT-4o wins on "follow this 40-row rubric without drifting."

ElevenLabs over OpenAI realtime. Voice is the product. ElevenLabs' emotional range on Indonesian + English code-switching beats alternatives for the local market, and code-switching is the default in real LPDP interviews.

Pricing in IDR with granularity. A Rp 10K single session costs less than a coffee. The Rp 28K pack nudges toward commitment. Rp 150K monthly locks in power users. The ladder is designed for the psychology of exam-prep spend, not SaaS norms.
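The ladder as data, with the amounts from above. The pack size and tier ids are assumptions for illustration; only the three price points come from the write-up:

```typescript
// The IDR ladder as config. Prices are from the write-up; the pack size (3)
// and tier ids are assumed for illustration.
interface Tier { id: string; priceIdr: number; sessions: number | "unlimited" }

const tiers: Tier[] = [
  { id: "single",  priceIdr: 10_000,  sessions: 1 },           // below a coffee
  { id: "pack",    priceIdr: 28_000,  sessions: 3 },           // nudges commitment
  { id: "monthly", priceIdr: 150_000, sessions: "unlimited" }, // power users
];

// Effective per-session price makes the nudge visible in the UI:
function perSession(t: Tier): number | null {
  return t.sessions === "unlimited" ? null : t.priceIdr / t.sessions;
}
```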

Tradeoffs I'd revisit

Voice latency is the single biggest UX lever; turn-taking under 800 ms is what makes it feel like a real interview. Getting there meant carefully sequencing TTS chunks and accepting slightly degraded prosody at chunk boundaries.
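The chunk-sequencing trick can be sketched as cutting the LLM token stream at sentence boundaries and handing each chunk to TTS immediately, instead of waiting for the full reply. This is a simplified sketch with illustrative names, and the naive punctuation split is exactly where the prosody cost at boundaries comes from:

```typescript
// Cut a streaming token sequence at sentence boundaries so TTS can start
// speaking the first sentence while the rest is still generating.
function* sentenceChunks(tokens: Iterable<string>): Generator<string> {
  let buf = "";
  for (const t of tokens) {
    buf += t;
    // Flush on sentence-final punctuation so TTS starts early.
    if (/[.!?]\s*$/.test(buf)) {
      yield buf.trim();
      buf = "";
    }
  }
  if (buf.trim()) yield buf.trim(); // trailing fragment
}

const chunks = [...sentenceChunks(["Tell me", " about a time", " you led. ", "Why", " then?"])];
// chunks -> ["Tell me about a time you led.", "Why then?"]
```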

Calibrating the rubric scoring against actual human graders is the next milestone. Right now I sanity-check against my own graded set; scaling this needs partnerships with LPDP-prep tutors.

Want to talk about beeready?

Currently taking conversations about AI PM and founding PM roles in the UK, Singapore, and Indonesia. Remote also works. Fastest reply is email.