A good model choice is a tradeoff, not a leaderboard pick. Start with the job to be done, then compare only the models that can actually satisfy it.
Compare with the same token assumptions
Do not treat unknown prices as zero
Verify prices and limits against the official source before production
Define the job
Classify the workload as chat, coding, extraction, summarization, RAG, vision, or agentic automation. Each category weights cost and quality differently.
Set hard constraints
Context window, output limit, JSON mode, tool support, vision support, and provider route are pass/fail requirements: a model that misses any one of them is out, however attractive it looks otherwise.
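One way to apply hard constraints is as a pass/fail filter that runs before any cost comparison. The sketch below is a minimal illustration, not a reference implementation: the `ModelSpec` and `Requirements` types, their field names, and the catalog entries (`model-a`, `model-b`, `model-c`) are all hypothetical placeholders, and real limits should come from each provider's documentation.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ModelSpec:
    """Hypothetical capability record; the fields mirror the constraints above."""
    name: str
    context_window: int      # maximum tokens the model accepts
    max_output_tokens: int   # maximum tokens the model can generate
    json_mode: bool
    tools: bool
    vision: bool


@dataclass(frozen=True)
class Requirements:
    """What the job needs. Anything not listed is treated as optional."""
    min_context_window: int
    min_output_tokens: int
    needs_json_mode: bool = False
    needs_tools: bool = False
    needs_vision: bool = False


def satisfies(model: ModelSpec, req: Requirements) -> bool:
    """Return True only if the model meets every hard constraint."""
    return (
        model.context_window >= req.min_context_window
        and model.max_output_tokens >= req.min_output_tokens
        and (model.json_mode or not req.needs_json_mode)
        and (model.tools or not req.needs_tools)
        and (model.vision or not req.needs_vision)
    )


def shortlist(models: List[ModelSpec], req: Requirements) -> List[ModelSpec]:
    """Keep only the models that pass, preserving catalog order."""
    return [m for m in models if satisfies(m, req)]


# Placeholder catalog with made-up limits, for illustration only.
CATALOG = [
    ModelSpec("model-a", 128_000, 4_096, json_mode=True, tools=True, vision=False),
    ModelSpec("model-b", 32_000, 8_192, json_mode=True, tools=False, vision=False),
    ModelSpec("model-c", 200_000, 8_192, json_mode=True, tools=True, vision=True),
]

REQ = Requirements(min_context_window=100_000, min_output_tokens=4_000, needs_tools=True)
SHORTLIST = [m.name for m in shortlist(CATALOG, REQ)]
```

With these placeholder values, `model-b` drops out on context window and tool support regardless of its price, which is the point of filtering on capability first.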
Compare cost after capability
Once a model can satisfy the job, compare scenario cost using the same request volume and token assumptions for every candidate.
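A scenario cost comparison can be sketched as follows, assuming per-million-token pricing with separate input and output rates. The `Pricing` and `Scenario` types, the model names, and every price shown are hypothetical placeholders, not real quotes. An unknown price is represented as `None` and carried through as "unknown" instead of being counted as zero.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Pricing:
    """USD per one million tokens; None means the price is unknown."""
    input_per_million: Optional[float]
    output_per_million: Optional[float]


@dataclass(frozen=True)
class Scenario:
    """One shared set of volume and token assumptions for all candidates."""
    requests_per_month: int
    input_tokens_per_request: int
    output_tokens_per_request: int


def monthly_cost(pricing: Pricing, scenario: Scenario) -> Optional[float]:
    """Return the monthly cost in USD, or None if any price is unknown."""
    if pricing.input_per_million is None or pricing.output_per_million is None:
        return None  # unknown stays unknown; it is never treated as free
    input_tokens = scenario.requests_per_month * scenario.input_tokens_per_request
    output_tokens = scenario.requests_per_month * scenario.output_tokens_per_request
    return (
        input_tokens * pricing.input_per_million
        + output_tokens * pricing.output_per_million
    ) / 1_000_000


def rank(
    pricings: Dict[str, Pricing], scenario: Scenario
) -> Tuple[List[Tuple[str, float]], List[str]]:
    """Return (known costs sorted cheapest first, names with unknown prices)."""
    costs = {name: monthly_cost(p, scenario) for name, p in pricings.items()}
    known = sorted(
        ((name, cost) for name, cost in costs.items() if cost is not None),
        key=lambda item: item[1],
    )
    unknown = sorted(name for name, cost in costs.items() if cost is None)
    return known, unknown


# Placeholder prices for illustration; verify real ones with the provider.
PRICINGS = {
    "model-a": Pricing(input_per_million=1.0, output_per_million=4.0),
    "model-c": Pricing(input_per_million=0.5, output_per_million=1.5),
    "model-d": Pricing(input_per_million=None, output_per_million=None),
}

SCENARIO = Scenario(
    requests_per_month=100_000,
    input_tokens_per_request=1_000,
    output_tokens_per_request=250,
)

KNOWN, UNKNOWN = rank(PRICINGS, SCENARIO)
```

Under these assumptions the scenario totals 100M input and 25M output tokens per month, so `model-c` comes to 87.50 USD and `model-a` to 200.00 USD, while `model-d` is listed separately as unpriced instead of appearing at the top of the ranking.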
