Benchmark
GameTraversalBenchmark: Evaluating Planning Abilities Of Large Language Models Through Traversing 2D Game Maps
Large language models (LLMs) have recently demonstrated great success in generating and understanding natural language. While they have …
Muhammad Umair Nasir, Steven James, Julian Togelius
MinePlanner: A Benchmark for Long-Horizon Planning in Large Minecraft Worlds
We propose a new benchmark for planning tasks based on the Minecraft game. Our benchmark contains 45 tasks overall, but also provides …
William Hill, Ireton Liu, Anita De Mello Koch, Damion Harvey, Nishanth Kumar, George Konidaris, Steven James
MiDaS: A Large-Scale Minecraft Dataset for Non-Natural Image Benchmarking
Reinforcement learning (RL) has recently made several significant advances using video games as a testbed. While many of these games …
David Torpey, Max Parkin, Jonah Alter, Richard Klein, Steven James
Constructing a Visual Dataset to Study the Effects of Spatial Apartheid in South Africa
Aerial images of neighborhoods in South Africa show the clear legacy of Apartheid, a former policy of political and economic …
Raesetje Sefala, Timnit Gebru, Luzango Mfupe, Nyalleng Moorosi, Richard Klein