How well can LLMs solve chess puzzles?

httpjames@sh.itjust.works · 8 months ago

Blóðbók@slrpnk.net · 8 months ago

Yeah, I don’t know why anyone knowledgeable would expect them to be good at chess. LLMs don’t generalise, reason or spot patterns, so unless they read a chess book where the problems came from…

Model	Solved	Solved %	Illegal Moves	Illegal Moves %	Adjusted Elo
gpt-4-turbo-preview	229	22.9%	163	16.3%	1144
gpt-4	195	19.5%	183	18.3%	1047
claude-3-opus-20240229	72	7.2%	464	46.4%	521
claude-3-haiku-20240307	38	3.8%	590	59.0%	363
claude-3-sonnet-20240229	23	2.3%	663	66.3%	286
gpt-3.5-turbo	23	2.3%	683	68.3%	269
claude-instant-1.2	10	1.0%	707	70.7%	245
mistral-large-latest	4	0.4%	813	81.3%	149
mixtral-8x7b	9	0.9%	832	83.2%	136
gemini-1.5-pro-latest*	FAIL	-	-	-	-
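
For anyone sanity-checking the percentage columns: they look like plain counts over a fixed puzzle set. Below is a minimal sketch, assuming each model was run on 1000 puzzles (inferred from 229 solved being listed as 22.9%, not stated in the table itself). The `percent` helper and the `TOTAL_PUZZLES` constant are made up for illustration and do not come from the benchmark repo. The sketch does not attempt the Adjusted Elo column, since the table gives no formula for it.

```python
# Assumption: 1000 puzzles per model, inferred from 229 solved = 22.9%.
TOTAL_PUZZLES = 1000

# (model, solved, illegal_moves) copied from the first three table rows.
rows = [
    ("gpt-4-turbo-preview", 229, 163),
    ("gpt-4", 195, 183),
    ("claude-3-opus-20240229", 72, 464),
]


def percent(count, total=TOTAL_PUZZLES):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Print the recomputed Solved % and Illegal Moves % for each model.
for model, solved, illegal in rows:
    print(f"{model}\t{percent(solved)}%\t{percent(illegal)}%")
```

Running the same helper over the remaining rows is a quick way to spot a percentage that does not match its count.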

GitHub - kagisearch/llm-chess-puzzles: Benchmark LLM reasoning capability by solving chess puzzles.