GSM8K Benchmark

Category - GSM8K Benchmark: grade-school math word problems for LLMs (~8.5k problems in total: ~7.5k train, ~1.3k test). Evaluates multi-step reasoning by exact match on the final numeric answer; includes chain-of-thought prompting, self-consistency, and tool-use baselines.
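To make "exact match" and "self-consistency" concrete, here is a minimal scoring sketch. It relies on GSM8K reference solutions ending in a line of the form "#### <answer>". The rule for reading a model's answer (take the last number in the completion) is an assumed heuristic, and all function names are hypothetical. Published evaluation harnesses use their own extraction rules, so scores from this sketch may differ from theirs. Self-consistency here means sampling several chain-of-thought completions and taking a majority vote over their extracted answers.

```python
import re
from collections import Counter
from typing import List, Optional

# GSM8K reference solutions end with a line like "#### 72".
_REF_PATTERN = re.compile(r"####\s*(.+)")
# Assumed heuristic: the model's answer is the last number in its output.
_NUM_PATTERN = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def normalize(ans: str) -> str:
    """Drop thousands separators, surrounding whitespace and a trailing period."""
    return ans.replace(",", "").strip().rstrip(".")


def extract_reference(solution: str) -> str:
    """Pull the gold answer from a reference solution's '####' line."""
    m = _REF_PATTERN.search(solution)
    if m is None:
        raise ValueError("no '####' marker in reference solution")
    return normalize(m.group(1))


def extract_prediction(completion: str) -> Optional[str]:
    """Return the last number in a free-form completion, or None if there is none."""
    nums = _NUM_PATTERN.findall(completion)
    return normalize(nums[-1]) if nums else None


def self_consistency(completions: List[str]) -> Optional[str]:
    """Majority vote over answers extracted from several sampled completions."""
    votes = Counter(p for p in map(extract_prediction, completions) if p is not None)
    return votes.most_common(1)[0][0] if votes else None


def exact_match_accuracy(predictions: List[Optional[str]], references: List[str]) -> float:
    """Fraction of predictions that equal the normalized gold answer."""
    correct = sum(p == extract_reference(r) for p, r in zip(predictions, references))
    return correct / len(references)


if __name__ == "__main__":
    ref = "She sold 48/2 = 24 clips in May.\n48 + 24 = 72\n#### 72"
    samples = ["... so the answer is 72.", "She sold 70 clips.", "Total: 72"]
    voted = self_consistency(samples)  # "72" wins 2 votes to 1
    print(voted, exact_match_accuracy([voted], [ref]))
```

Comparing normalized strings keeps the check strict, so answers such as "72.0" and "72" would count as a mismatch. A harness that wants numeric equivalence would compare parsed numbers instead.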
