The United States Department of Defense (DOD) and intelligence community require computational systems that can robustly and automatically analyze large amounts of multimedia. These systems must also communicate and cooperate with people to resolve ambiguities and improve performance over time.
However, today’s machine learning approaches result in artificial intelligence (AI) agents that cannot interact with humans through conversation except in limited, specifically designed applications. Current computational paradigms rely on statistical methods and lack the sufficiently diverse, representative, annotated training data needed to achieve the accuracy required for successful implementation. Moreover, these agents lack the ability to understand concepts, such as the properties and capabilities of objects, which prevents them from handling previously unseen objects, activities, scenes, or entities.
DARPA’s Environment-driven Conceptual Learning (ECOLE) program aims to radically improve these technologies by creating AI agents capable of continually learning from linguistic and visual input. The goal is to enable human-machine collaborative analysis of image, video, and multimedia documents during time-sensitive, mission-critical DOD analytic tasks where reliability and robustness are essential.
“Today’s multimedia analysis systems lack introspection,” said Dr. William Corvey, ECOLE program manager in DARPA’s Information Innovation Office. “Furthermore, symbolic representations as they’ve been constructed in the past simply do not scale. The core innovation in ECOLE will be teaching the AI to learn representations that are faceted and conceptual in nature – such as representations that can be iterated on with a human partner; representations that can be reasoned over; and representations that can be readily generalized.”
The results of ECOLE will be broadly applicable across technology sectors: the semantic web community, commercial companies that reason over information on the internet, the robotics industry, public safety organizations processing images or video for object and activity recognition, and anyone requiring robust, automatic reasoning over image and video data – autonomous vehicles, for example.
Previously, Corvey managed an exploratory effort called Grounded Artificial Intelligence Language Acquisition (GAILA) that investigated aspects of human language acquisition in children. As a result, researchers developed technologies that mirrored a child’s approach to learning. ECOLE seeks to expand upon that research area with applications specific to multimedia analysis. The program’s scope will include developing algorithms that can identify, represent, and ground the attributes that form the symbolic and contextual model for a particular object through interactive learning with a human analyst.
“Creating a holistic and extensible representation of multimedia content will require innovation in both unsupervised learning and the elevation of that acquired knowledge to the symbolic level,” said Corvey. “Our goal is to bridge the gap between state-of-the-art symbolic reasoners that rely on manual input of features and fully unsupervised learning, which at present can only perform tasks like coarse-grained image captioning.”
ECOLE is a four-year program divided into three phases. The first two 18-month phases will entail fundamental research to build neuro-symbolic scaffolding that advances computational analysis of multimedia across all AI application areas. In its 12-month third phase, ECOLE will concentrate on developing capabilities relevant to geospatial intelligence workflows.
Please see the Broad Agency Announcement for more information.