Too Long; Didn't Read
This article presents benchmark results assessing the empathetic capabilities of generative AI models, using both established psychological measures and purpose-built ones. The tests include the TAS-20, EQ-60, SQ-R, and IRI. A new measure, the AEQ (Applied Empathy Quotient), is also introduced. Most raw LLMs struggle to connect empathetically with users because their empathetic and systemizing thinking capabilities are balanced. The closed model Willow demonstrates the highest empathetic capacity, while ChatGPT does not stand out significantly from other LLMs. Claude v3 Opus showed a decline in empathetic ability compared with its previous version. More specialized tests still need to be developed.
@anywhichway
Simon Y. Blackwell
Working in the clouds around Seattle on open source projects. Sailing when it's clear.