Testing the Depths of AI Empathy: Q1 2024 Benchmarks

Too Long; Didn't Read

This article presents benchmark results assessing the empathetic capabilities of generative AI models using psychological and purpose-built measures. The tests include the TAS-20, EQ-60, SQ-R, and IRI, and a new measure, the AEQ (Applied Empathy Quotient), is introduced. Most raw LLMs struggle to connect empathetically with users because their empathetic and systemizing thinking capabilities are balanced. The closed model Willow demonstrates the highest empathetic capacity, while ChatGPT does not stand out significantly among other LLMs. Claude v3 Opus showed a decline in empathetic ability compared to its previous version. More specialized tests need to be developed.

@anywhichway

Simon Y. Blackwell

Working in the clouds around Seattle on open source projects. Sailing when it's clear.
