Holistic Agent Leaderboard: The Missing Infrastructure for AI Agent Evaluation


arXiv: https://arxiv.org/abs/2510.11977v1 (DOI: https://doi.org/10.48550/arXiv.2510.11977)

  • Sayash Kapoor
  • Benedikt Stroebl
  • Peter Kirgis
  • Nitya Nadgir
  • Zachary S Siegel
  • Boyi Wei
  • Tianci Xue
  • Ziru Chen
  • Felix Chen
  • Saiteja Utpala
  • Franck Ndzomga
  • Dheeraj Oruganty
  • Sophie Luskin
  • Kangheng Liu
  • Botao Yu
  • Amit Arora
  • Dongyoon Hahm
  • Harsh Trivedi
  • Huan Sun
  • Juyong Lee
  • Tengjun Jin
  • Yifan Mai
  • Yifei Zhou
  • Yuxuan Zhu
  • Rishi Bommasani
  • Daniel Kang
  • Dawn Song
  • Peter Henderson
  • Yu Su
  • Percy Liang
  • Arvind Narayanan
  • License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
  • Subjects: cs.AI, cs.CL