AI is often perceived as a ‘black box’ - mysteriously complex, poorly understood and feared. It is also sometimes dismissed as pie in the sky; another new-fangled idea that never quite materialises.
AI was the topic of the 2019 UKeiG Members’ Day event featuring three speakers who focused on the impact of AI on the information sector.
Hype or game-changer?
Michael Upshall’s presentation explored the role of AI in digital academic publishing, content enrichment and knowledge identification. Far from being blue sky hype, AI is already a game changer. He articulated the work of the UNSILO project: “Rethinking publishing with AI.” UNSILO goes beyond traditional string matching and keyword extraction using fully automated concept matching to extract meaning and context. The project utilises a mathematical algorithm to analyse a huge corpus of text identifying descriptive “significant phrases” within a document. It creates clusters of concepts and identifies semantic relationships by processing the proximity of words surrounding a term. The word “bridge”, for example, has many meanings and synonymous alternatives. It could allude to a connecting structure, part of a ship, a partial denture or a part of a stringed instrument. The terminology that surrounds it is what imparts context and meaning.
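The proximity idea can be illustrated with a toy sketch. The sense inventory and indicator words below are invented for illustration and are not UNSILO's actual method or vocabulary: each candidate sense of "bridge" is scored by how many of its indicator words occur in a window around the term.

```python
# Hypothetical sense inventory for the ambiguous term "bridge".
# These indicator words are illustrative only, not a real vocabulary.
SENSES = {
    "structure": {"river", "span", "traffic", "crossing", "steel"},
    "ship": {"captain", "deck", "helm", "navigation", "vessel"},
    "dental": {"tooth", "crown", "denture", "implant", "enamel"},
    "instrument": {"strings", "violin", "fretboard", "neck", "resonance"},
}

def disambiguate(term, sentence, window=5):
    """Pick the sense whose indicator words best match the words
    surrounding `term` - a toy proximity-based approach."""
    words = sentence.lower().split()
    if term not in words:
        return None
    i = words.index(term)
    # Collect up to `window` words either side of the term.
    context = set(words[max(0, i - window):i] + words[i + 1:i + 1 + window])
    scores = {sense: len(context & indicators)
              for sense, indicators in SENSES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

print(disambiguate("bridge", "the captain stood on the bridge of the vessel"))
# prints "ship"
```

A production system would of course learn these associations from a large corpus rather than hand-coding them, but the principle is the same: surrounding terminology imparts meaning.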
Upshall is using this approach to build “semantic profiles” of scholarly journals, linking the technology to the academic workflow: the complex “wheel of scholarship” (the six areas of activity in the research cycle: Discovery; Analysis; Writing; Publication; Outreach; Assessment) and the hundreds of tools that support the research process.
Eliminating the ambiguity of human language
Machine learning is crucial in scientific publishing, where there are currently twenty-four thousand journals and three thousand papers published a day. Manual classification schemes, vocabularies, taxonomies and ontologies have always played an essential role in information retrieval, but Upshall argues that they are expensive and fundamentally flawed; reactive, not proactive. “They will never be complete. They will never be large enough.” The pervasive controlled vocabulary MeSH (Medical Subject Headings) maps the paradigm of biomedical science - neologisms and synonymous relationships - but humans are required to build and maintain it. This human imposition of terminology can distort context and create an artificial language. Moreover, the multiplicity of controlled vocabularies, ontologies, cataloguing standards, frameworks and classification schemes makes interoperability and translation between schemes incredibly difficult. UNSILO, Upshall argues, eliminates the ambiguity of human language. By linking data and analysing the proximity of phrases it enables the disambiguation of problematic terminological conundrums such as abbreviations and synonymous phrases. “Semantic enrichment” is the way forward, argued Upshall. More controversially, in devil’s advocate mode, he asked: “Why even bother building a taxonomy?”
UNSILO’s approach is far from pie in the sky and already has several practical, real-life uses, enabling, for example, AI-built profiles of specific academics and researchers, and journal- and article-level analytics. A corpus of twenty-eight million abstracts from the PubMed database has been analysed to develop a “Reviewer Finder.” Upshall noted that in 2016, 26% of US academics declined requests to peer review papers because they were irrelevant to their research expertise. By identifying the juxtaposition and overlap of concepts, the project has made substantial inroads into improving the peer review workflow, making it much easier to identify the most relevant organisations and researchers to approach for review. Similarly, “Journal Analysis” supports the identification of the most relevant journals to publish in. AI is facilitating a much more sophisticated level of data analysis; the notion of “concept curation” goes way beyond text-based information retrieval. The technology is also capable of translating across subject domains even when there are significant differences and variations in terminology.
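The concept-overlap idea behind reviewer matching can be sketched in miniature. The reviewer names and concept profiles below are invented for illustration; the production system extracts significant phrases from millions of PubMed abstracts rather than using hand-written sets. Candidates are ranked by the overlap between their concept profile and the concepts extracted from a submitted manuscript.

```python
def jaccard(a, b):
    """Overlap between two concept sets: 0.0 (disjoint) to 1.0 (identical)."""
    return len(a & b) / len(a | b) if a | b else 0.0

# Hypothetical semantic profiles: reviewer -> concepts from their publications.
reviewers = {
    "reviewer_a": {"gene expression", "rna sequencing", "transcriptomics"},
    "reviewer_b": {"machine learning", "image segmentation", "radiology"},
    "reviewer_c": {"rna sequencing", "single cell",
                   "gene expression", "bioinformatics"},
}

# Concepts extracted from the submitted manuscript (also invented).
manuscript = {"rna sequencing", "single cell", "gene expression"}

# Rank candidate reviewers by concept overlap, most relevant first.
ranked = sorted(reviewers, key=lambda r: jaccard(manuscript, reviewers[r]),
                reverse=True)
print(ranked)  # reviewer_c shares the most concepts with the manuscript
```

Jaccard similarity is just one simple choice of overlap measure; the point is that matching happens at the level of concepts rather than literal keyword strings.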
Information resource management
David Haynes, City, University of London, presented on the potential role for AI in information resource management, arguing that, if anything, ontological and typological models were in the ascendant, and that human intervention was key to the implementation of AI in this area. He agreed that concept management was fundamental, and that extracting meaning and context from complex linguistic relationships and associations between ideas goes way beyond the traditional mapping of hierarchical associations between words.
The key consideration in David’s discussion was the nebulous nature of AI across the library and information community. Was it synonymous with automation? Was it the replacement of cognitive processing by machines? There is a multiplicity of definitions that change every day, but the consensus is that AI equates to decision making capability utilising iterative systems that can learn and modify their behaviour.
In order to address the impact of AI on information resource management, the first step is to articulate the IRM cycle:
- Identifying information needs
- Defining scope
- Collecting resources
- Organising the resources
- Storage and retrieval
- Making resources discoverable
- Feedback and evaluation
- Disposal
In a workshop format Haynes introduced a practical approach to assessing each aspect of the cycle. What role could AI play in each area of activity?
- Could AI enhance human activity in each area or not?
- Could AI replace this human activity?
- Is there anything uniquely human about these activities?