Article | REF: R727 V1

Artificial intelligence evaluation

Author: Guillaume AVRIN

Publication date: February 10, 2023


ABSTRACT

Artificial intelligence (AI) is growing rapidly and raises questions for every audience: individuals, professionals and academics. Rational, shared principles and practices for measuring the performance and limits of intelligent systems need to be established.

A methodical approach that complies with the rules of metrology makes it possible to outline the main components of such an evaluation: metrics for quantitative, repeatable performance measurements; physical and virtual test environments for reproducible experiments that are representative of the real operating conditions of the AI being evaluated; and organizational tools (benchmarking, challenges, competitions) that meet the needs of the entire ecosystem.
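The idea of a repeatable performance measurement can be illustrated with a minimal Python sketch. It assumes a hypothetical stochastic model under test (`toy_classifier`, a noisy threshold rule) and a fixed, hypothetical test set. The same accuracy measurement is repeated with different random seeds, and the mean and sample standard deviation of the scores are reported. The names, the noise level and the use of accuracy as the metric are illustrative assumptions, not the author's protocol.

```python
import random
import statistics


def toy_classifier(x: float, rng: random.Random) -> int:
    """Hypothetical stochastic model under test: a threshold rule with input noise."""
    return 1 if x + rng.gauss(0.0, 0.1) > 0.5 else 0


def accuracy(inputs, labels, rng: random.Random) -> float:
    """Fraction of test cases on which the model output matches the reference label."""
    correct = sum(toy_classifier(x, rng) == y for x, y in zip(inputs, labels))
    return correct / len(labels)


def repeated_measurement(inputs, labels, seeds):
    """Repeat the same test with different seeds; return mean, std-dev and raw scores."""
    scores = [accuracy(inputs, labels, random.Random(seed)) for seed in seeds]
    # statistics.stdev needs at least two measurements.
    return statistics.mean(scores), statistics.stdev(scores), scores


if __name__ == "__main__":
    # Fixed, hypothetical test set: 100 inputs in [0, 1) with threshold-based labels.
    test_inputs = [i / 100 for i in range(100)]
    test_labels = [1 if x > 0.5 else 0 for x in test_inputs]
    mean, sd, _ = repeated_measurement(test_inputs, test_labels, seeds=range(10))
    print(f"accuracy = {mean:.3f} +/- {sd:.3f} over 10 runs")
```

The standard deviation gives a first indication of how much the metric varies between runs under unchanged conditions. In the sense of the VIM, reproducibility would additionally require changing conditions, such as the test environment or the operator.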


AUTHOR

Guillaume AVRIN: Head of AI Evaluation Department - Laboratoire national de métrologie et d'essais (LNE), Paris, France

INTRODUCTION

Since 2017, artificial intelligence (AI) has seen major developments in many professional sectors (diagnostic assistance, biometric identification, chatbots, detection of cybersecurity vulnerabilities and threats, collaborative industrial robots, inspection and maintenance robots, autonomous mobility systems, etc.) and in the home (personal assistance robots, medical devices, personal assistants, etc.). AI is therefore one of the top European and international priorities for technological and industrial development, and the 2020 health crisis has contributed to this transformation towards a more "virtualized" society that is less exposed to biological vulnerabilities.

To ensure that the market is not driven solely by supply, and that the conditions are in place for matching supply with demand, scientific and technical methods for evaluating AI are needed. Such methods should provide reliable, quantitative results on the levels of performance, robustness and explainability achieved by different AI systems. These results will give end users the guarantees on which the acceptability of these technologies depends, and will let them choose between existing solutions on the basis of objective, unambiguous common references. Developers, for their part, will benefit from benchmarks to guide their R&D and quality-control efforts, as well as tools to demonstrate their lead and stand out from the competition. Evaluation will therefore build the confidence needed to move from AI under development to marketable AI.
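Robustness is often quantified by measuring how a performance metric degrades as the inputs are perturbed. The Python sketch below assumes a hypothetical deterministic model (`classifier`, a threshold at 0.5) and uses additive Gaussian noise as the perturbation. Both are illustrative choices; a real evaluation would use perturbations representative of the system's actual operating conditions.

```python
import random


def classifier(x: float) -> int:
    """Hypothetical deterministic model under test: thresholds its input at 0.5."""
    return 1 if x > 0.5 else 0


def accuracy(inputs, labels) -> float:
    """Fraction of inputs whose predicted class matches the reference label."""
    return sum(classifier(x) == y for x, y in zip(inputs, labels)) / len(labels)


def robustness_curve(inputs, labels, noise_levels, seed: int = 0) -> dict:
    """Accuracy on copies of the test set corrupted by Gaussian noise of each std-dev."""
    rng = random.Random(seed)  # seeded so the experiment is reproducible
    curve = {}
    for sigma in noise_levels:
        perturbed = [x + rng.gauss(0.0, sigma) for x in inputs]
        curve[sigma] = accuracy(perturbed, labels)
    return curve


if __name__ == "__main__":
    test_inputs = [i / 100 for i in range(100)]
    test_labels = [1 if x > 0.5 else 0 for x in test_inputs]
    for sigma, acc in robustness_curve(test_inputs, test_labels, [0.0, 0.05, 0.1, 0.2]).items():
        print(f"noise std-dev {sigma:.2f}: accuracy {acc:.2f}")
```

With a noise level of zero the curve reduces to the clean accuracy. The rate at which accuracy falls as the noise grows gives one possible robustness indicator that can be compared across systems tested on the same data.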

Standardization work, notably within CEN-CENELEC JTC 21 and ISO/IEC JTC 1/SC 42, is underway to adapt existing software development standards (IEC 62304 for medical devices, ISO 26262 for road vehicles, etc.) to the specificities of AI.

This work will focus in particular on assessment tools and methods, among which two generic approaches can be distinguished (cf. ISO/IEC 17011): auditing and testing. Audits consist of analyzing verifiable evidence of compliance, whether qualitative or quantitative, such as records or statements of fact. Implementing audits for AI is similar to doing so for other products and technologies. For example, LNE has proposed a certification standard for AI feature development processes,...
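Unlike an audit, a test produces a measured value that must then be compared with a requirement. The minimal Python sketch below assumes that the requirement is a maximum error rate; the threshold and the 95% confidence level (z = 1.96) are hypothetical. It computes a Wilson score interval for the observed error rate and declares conformity only if the upper bound does not exceed the requirement. This conservative decision rule is one possible choice, not a procedure taken from the article or from ISO/IEC 17011.

```python
import math


def wilson_interval(errors: int, trials: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (here, an observed error rate)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = errors / trials
    denom = 1 + z**2 / trials
    centre = (p + z**2 / (2 * trials)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2))
    return max(0.0, centre - half), min(1.0, centre + half)


def conformity_test(errors: int, trials: int, max_error_rate: float, z: float = 1.96) -> bool:
    """Hypothetical rule: pass only if the upper bound is within the required error rate."""
    _, upper = wilson_interval(errors, trials, z)
    return upper <= max_error_rate


if __name__ == "__main__":
    for errors in (0, 10):
        low, high = wilson_interval(errors, 1000)
        verdict = conformity_test(errors, 1000, max_error_rate=0.01)
        print(f"{errors}/1000 errors: interval [{low:.4f}, {high:.4f}], conforms: {verdict}")
```

For instance, 0 errors out of 1,000 test cases passes a 1% requirement (the upper bound is about 0.0038), whereas 10 errors out of 1,000 fails it because the interval straddles 1%. Using the upper bound rather than the point estimate protects the end user against accepting a system on the strength of a lucky test run.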


This article is included in

Software technologies and System architectures

Standards and norms

BIPM: International Vocabulary of Metrology – Basic and general concepts and associated terms (VIM), 3rd edition, JCGM 200:2012

https://www.bipm.org/utils/common/documents/jcgm/JCGM_200_2012.pdf

Events

METRICS project (2020-2023, funded by H2020) – Metrological evaluation and testing of robots in international competitions

The aim is to organize intelligent robot competitions in four fields: healthcare, agri-food, infrastructure inspection and maintenance, and agile production. In particular, the project seeks to build a permanent structure that brings together European expertise to jointly provide a satisfactory...
