Overview
ABSTRACT
Artificial intelligence (AI) is growing rapidly and raising questions for every audience: individuals, professionals and academics. Rational, shared principles and practices must be established to measure the performance and limits of intelligent systems.
A methodical approach that complies with the rules of metrology makes it possible to outline them: metrics for quantitative and repeatable performance measurements; physical and virtual testing environments for reproducible experiments that are representative of the real operating conditions of the AI being evaluated; and organizational tools (benchmarking, challenges, competitions) that meet the needs of the entire ecosystem.
AUTHOR
Guillaume AVRIN: Head of AI Evaluation Department - Laboratoire national de métrologie et d'essais, Paris, France
INTRODUCTION
Since 2017, artificial intelligence (AI) has seen major developments in many professional sectors (diagnostic assistance, biometric identification, chatbots, detection of vulnerabilities and cybersecurity threats, collaborative industrial robots, inspection and maintenance robots, autonomous mobility systems, etc.) and in the home (personal assistance robots, medical devices, personal assistants, etc.). It is therefore one of the top European and international priorities for technological and industrial development, and the health crisis of 2020 has contributed to this transformation towards a more "virtualized" society that is less exposed to biological vulnerabilities.
To ensure that the market is not driven solely by supply, and that the conditions are in place for matching supply with demand, scientific and technical methods are needed to evaluate AI. Such evaluation promises reliable, quantitative results on the levels of performance, robustness and explainability achieved by different AI systems. These results give end-users the guarantees that determine the acceptability of these technologies, and let them choose between existing solutions on the basis of objective, unambiguous common references. Developers, for their part, gain benchmarks to guide their R&D and quality control efforts, as well as tools to demonstrate their lead and stand out from the competition. Evaluation will therefore build the confidence needed to move from AI under development to marketable AI.
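As an illustration of what a quantitative, repeatable result on a common reference might look like, the following minimal sketch repeats the same test several times and reports the mean score together with its spread. Everything in it is an assumption made for illustration: the metric (plain accuracy), the helper names `accuracy` and `repeated_measurement`, the toy systems and the four-sample test set are hypothetical and are not taken from the article.

```python
import random
import statistics


def accuracy(predictions, labels):
    """Fraction of predictions that match the reference labels."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and of equal length")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def repeated_measurement(system, inputs, labels, repetitions=5):
    """Run the same test several times under identical conditions.

    Returns (mean, sample standard deviation) of the accuracy scores.
    A small standard deviation indicates a repeatable measurement.
    """
    if repetitions < 2:
        raise ValueError("at least two repetitions are needed to estimate a spread")
    scores = [
        accuracy([system(x) for x in inputs], labels)
        for _ in range(repetitions)
    ]
    return statistics.mean(scores), statistics.stdev(scores)


def threshold_system(x):
    """Hypothetical deterministic system: classifies by a fixed threshold."""
    return int(x >= 0.5)


def make_noisy_system(seed, flip_probability=0.25):
    """Hypothetical non-deterministic system: randomly flips some decisions."""
    rng = random.Random(seed)  # seeded so the experiment itself is reproducible

    def system(x):
        decision = int(x >= 0.5)
        return 1 - decision if rng.random() < flip_probability else decision

    return system


# Hypothetical common test set shared by both systems.
test_inputs = [0.1, 0.4, 0.6, 0.9]
test_labels = [0, 0, 1, 1]

det_mean, det_std = repeated_measurement(threshold_system, test_inputs, test_labels)
noisy_mean, noisy_std = repeated_measurement(make_noisy_system(seed=0), test_inputs, test_labels)
```

Reporting the spread alongside the mean is one simple way to make two systems comparable on the same reference: a difference in mean scores is only meaningful if it exceeds the variation observed across repetitions.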
Standardization work is underway to adapt existing software development standards (IEC 62304 for medical devices, ISO 26262 for road vehicles, etc.) to the specificities of AI, notably within CEN-CENELEC JTC 21 and ISO/IEC JTC 1/SC 42.
This work will focus in particular on assessment tools and methods, for which two generic approaches can be distinguished (cf. ISO/IEC 17011): auditing and testing. An audit consists of analyzing verifiable evidence of compliance, whether qualitative or quantitative, such as records or statements of fact. Audits are implemented for AI in much the same way as for other products and technologies. For example, LNE has proposed a certification standard for AI feature development processes,...
KEYWORDS
performance | metrology | artificial intelligence | metrics | experiment | test | AI
This article is included in
Instrumentation and measurement methods
Evaluating artificial intelligence
Bibliography
Standards and norms
- BIPM: International Vocabulary of Metrology – Basic and general concepts and associated terms (VIM), 3rd edition, JCGM 200:2012
https://www.bipm.org/utils/common/documents/jcgm/JCGM_200_2012.pdf
Events
METRICS project (2020-2023, funded under Horizon 2020) – Metrological evaluation and testing of robots in international competitions
The project organizes intelligent robot competitions in four fields: healthcare, agri-food, infrastructure inspection and maintenance, and agile production. In particular, the aim is to build a permanent structure bringing together all European skills to jointly provide a satisfactory...