Vertical voice AI · Live · IDR pricing tiers active · early traction

beeready

A voice-first AI interview coach for Indonesian scholarship (LPDP, Chevening) and civil-service (CPNS) candidates. GPT-4o + ElevenLabs, scoring against official rubrics.

In one sentence: Voice AI interview coach scoring candidates against official LPDP, IELTS, and TOEFL rubrics. Chose GPT-4o over Claude because following a 40-row rubric without drifting is the product, not open-ended writing. Live with three IDR pricing tiers.

beeready.dev · Solo PM + builder: product, evals design, pricing

Stack

GPT-4o (evaluation + coaching)
ElevenLabs (realtime voice)
Next.js + Vercel
Rubric-based scoring engine
Stripe / local payment rails

The call I'd own

GPT-4o over Claude for evaluation. Claude 3.5/4 wins on open-ended writing; GPT-4o wins on following a 40-row rubric without drifting, which is the thing the product sells. The rubric is the product, not the prose.

The problem

LPDP (Indonesia's largest scholarship), Chevening, IELTS/TOEFL, and CPNS (civil service) interviews are high-stakes moments for Indonesian applicants. Practice is expensive, inconsistent, and almost never benchmarked against the actual rubrics real evaluators use. Candidates rehearse with friends and hope.

What it does

A realtime voice interview with four distinct AI evaluator personas across the interview phases (warm-up, behavioral, technical/situational, closing).

Scoring happens against official rubrics: LPDP band descriptors, IELTS band descriptors, TOEFL speaking rubrics. Output dimensions: Communication, Problem-solving, Leadership, with specific evidence pulled from the session transcript.
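A minimal sketch of what a score report with transcript citations could look like. All type and field names here are illustrative, not the production schema:

```typescript
// Hypothetical shape of one session's score report (names are illustrative).
type Dimension = "Communication" | "Problem-solving" | "Leadership";

interface Evidence {
  quote: string;       // verbatim excerpt from the session transcript
  timestampSec: number;
}

interface DimensionScore {
  dimension: Dimension;
  band: number;         // e.g. an IELTS-style 1-9 band
  rubricRow: string;    // which rubric descriptor the band maps to
  evidence: Evidence[]; // every score must cite the transcript
}

interface SessionReport {
  rubric: "LPDP" | "IELTS" | "TOEFL";
  scores: DimensionScore[];
  summary: string;
}

// Example report for a single dimension:
const example: SessionReport = {
  rubric: "IELTS",
  scores: [
    {
      dimension: "Communication",
      band: 7,
      rubricRow: "Fluency and coherence: band 7",
      evidence: [{ quote: "I led a team of five engineers", timestampSec: 412 }],
    },
  ],
  summary: "Strong structure; hedge less under follow-up questions.",
};
```

Making evidence a required field is the structural version of the "actionable feedback" promise: a score with no transcript quote simply fails validation.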

For professional interviews, a CV-to-JD gap analysis seeds the questions so practice matches the role, not a generic template.
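The gap analysis can be sketched as a simple diff between extracted skills. In production the extraction itself would be an LLM step; the keyword version below (with hypothetical function names) just shows the shape of the idea:

```typescript
// Hypothetical gap analysis: which JD requirements the CV never mentions.
// Real skill extraction would use the LLM; this keyword diff is a sketch.
function gapAnalysis(cvSkills: string[], jdRequirements: string[]): string[] {
  const have = new Set(cvSkills.map((s) => s.toLowerCase()));
  return jdRequirements.filter((r) => !have.has(r.toLowerCase()));
}

const gaps = gapAnalysis(
  ["SQL", "Stakeholder management", "Roadmapping"],
  ["SQL", "A/B testing", "Roadmapping", "Pricing"],
);
// gaps -> ["A/B testing", "Pricing"]: these seed the interview questions
```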

PM decisions I'm proud of

Rubric-first, not LLM-first. The scoring module ingests the actual official rubric as context and has the model produce a score with citations from the transcript. This is the difference between "fuzzy encouragement" and "actionable feedback", and it's what lets us charge.
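A sketch of what rubric-first prompt assembly could look like; the function and field names are illustrative, not the actual code. The point is that the official rubric rides along as context and the instructions force row-level citations:

```typescript
// Sketch of rubric-first prompt assembly (names are illustrative).
interface RubricRow { id: string; dimension: string; descriptor: string }

function buildScoringPrompt(rubric: RubricRow[], transcript: string): string {
  const rubricBlock = rubric
    .map((r) => `${r.id} | ${r.dimension} | ${r.descriptor}`)
    .join("\n");
  return [
    "You are an interview evaluator. Score ONLY against the rubric rows below.",
    "For every score, cite the rubric row id and quote the transcript verbatim.",
    "If a row has no evidence in the transcript, mark it 'no evidence'. Do not guess.",
    "",
    "RUBRIC:",
    rubricBlock,
    "",
    "TRANSCRIPT:",
    transcript,
  ].join("\n");
}

const prompt = buildScoringPrompt(
  [{ id: "C1", dimension: "Communication", descriptor: "Ideas are clearly structured" }],
  "Candidate: I led a team of five...",
);
```

The "no evidence" escape hatch matters: without it, a model under rubric pressure tends to invent support rather than abstain.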

GPT-4o over Claude for evaluation. I chose GPT-4o specifically because structured-output reliability and instruction-following under rubric constraints tested better on our golden set. Claude 3.5/4 wins on open-ended writing; GPT-4o wins on "follow this 40-row rubric without drifting."

ElevenLabs over OpenAI realtime. Voice is the product. ElevenLabs' emotional range on Indonesian + English code-switching beats alternatives for the local market, and code-switching is the default in real LPDP interviews.

Pricing in IDR with granularity. A Rp 10K single session costs less than a coffee. The Rp 28K pack nudges toward commitment. Rp 150K monthly locks in power users. The ladder is designed for the psychology of exam-prep spend, not SaaS norms.
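The ladder as data, with the amounts from above. The pack size and tier ids are assumptions for illustration; only the three price points come from the write-up:

```typescript
// The IDR ladder as config. Prices are from the write-up; the pack size (3)
// and tier ids are assumed for illustration.
interface Tier { id: string; priceIdr: number; sessions: number | "unlimited" }

const tiers: Tier[] = [
  { id: "single",  priceIdr: 10_000,  sessions: 1 },           // below a coffee
  { id: "pack",    priceIdr: 28_000,  sessions: 3 },           // nudges commitment
  { id: "monthly", priceIdr: 150_000, sessions: "unlimited" }, // power users
];

// Effective per-session price makes the nudge visible in the UI:
function perSession(t: Tier): number | null {
  return t.sessions === "unlimited" ? null : t.priceIdr / t.sessions;
}
```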

Tradeoffs I'd revisit

Voice latency is the single biggest UX lever; turn-taking under 800 ms is what makes it feel like a real interview. Getting there meant carefully sequencing TTS chunks and accepting slightly degraded prosody at chunk boundaries.
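The chunk-sequencing trick can be sketched as cutting the LLM token stream at sentence boundaries and handing each chunk to TTS immediately, instead of waiting for the full reply. This is a simplified sketch with illustrative names, and the naive punctuation split is exactly where the prosody cost at boundaries comes from:

```typescript
// Cut a streaming token sequence at sentence boundaries so TTS can start
// speaking the first sentence while the rest is still generating.
function* sentenceChunks(tokens: Iterable<string>): Generator<string> {
  let buf = "";
  for (const t of tokens) {
    buf += t;
    // Flush on sentence-final punctuation so TTS starts early.
    if (/[.!?]\s*$/.test(buf)) {
      yield buf.trim();
      buf = "";
    }
  }
  if (buf.trim()) yield buf.trim(); // trailing fragment
}

const chunks = [...sentenceChunks(["Tell me", " about a time", " you led. ", "Why", " then?"])];
// chunks -> ["Tell me about a time you led.", "Why then?"]
```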

Calibrating the rubric scoring against actual human graders is the next milestone. Right now I sanity-check against my own graded set; scaling this needs partnerships with LPDP-prep tutors.

Want to talk about beeready?

Currently taking conversations about AI PM and founding PM roles in the UK, Singapore, and Indonesia. Remote also works. Fastest reply is email.