The case of the missing datasets
In late June 2021, a mini-scandal erupted over a dataset of Covid-19 genetic sequences that had been “mysteriously deleted,” as Ewen Callaway documented in Nature. Researchers at Wuhan University had first posted a preprint of their article “Nanopore target sequencing for accurate and comprehensive detection of SARS-CoV-2 and other respiratory viruses” on medRxiv; the full article was later published in the OA journal Small. Missing, apparently at the authors' request, were some of the SARS-CoV-2 genome sequence data behind the study. Combine the suggestion of scandal with the nerd appeal of information management and retention issues, and what self-respecting librarian isn't going to get curious?
In the case of the mysteriously deleted data, it appears that a spreadsheet of genetic sequences was submitted for posting in the Sequence Read Archive (SRA), a public repository for DNA sequencing data managed by the (U.S.) National Library of Medicine. Three months later, the investigator who had submitted the data asked that it be withdrawn; ostensibly, the sequences were being updated and would be submitted to another repository at a later date. (Interestingly, a different researcher, at the Fred Hutchinson Cancer Research Center in Seattle, Washington, was able to recover some of the withdrawn sequences by searching Google Cloud for backup copies of SRA data.)
Becoming research literate
Whether or not there were sinister motives behind the withdrawal of the dataset, the episode got me thinking about the core concerns of information professionals (information literacy, information management, copyright, and the licensing of information) and about our responsibility for raising the information consciousness of our clients and stakeholders. Coincidentally, a colleague recently asked me to help brainstorm an enticing title for a presentation she was giving on information literacy. We realized that, while the phrase is both meaningful and positively charged for us info pros, most “civilians” will recognize neither what information literacy entails nor that they lack it.
The news coverage of the deleted Covid-19 dataset could provide an opportunity to talk with library users about a number of issues:
* How OA journals work and why libraries still incur high licensing fees for digital content
* How repositories of open data are built and maintained, and what kinds of information are available
* How data that one might think has been permanently removed from a repository may still be accessible with minimal sleuthing
None of these concepts are foreign to info pros, but we sometimes forget that they are not intuitive to most library users. Most folks are not thinking about information architecture, data retention policies, FAIR (findable, accessible, interoperable and reusable) data principles, or intellectual property concerns when they are looking for information, bless their hearts.
Unfortunately, that means they lack an understanding of what lies behind the information they retrieve. One consequence is that non-info pros believe they are all above-average searchers. Their Google searches always turn up “relevant” material (which often just means that confirmation bias leads them to browse and read only the results that support their existing assumptions), so they must be skilled researchers.
Challenging the notion of research
This is one reason why I no longer describe what I do as “research”; I have heard too many people claim that their “research” has informed them that the Covid vaccine alters their DNA, that humans never landed on the moon, or that the world is flat. The distinction matters even more when I find myself using open-web (that is, Google-able) content to address a client's information needs. Yes, I can provide a URL for each of the resources I identified, but that does not necessarily mean that my client could have Googled (or “researched”) the topic himself.
This highlights the need for info pros to have more conversations about what goes into real research. To most people, the research process includes (1) having a question, (2) typing or speaking the question to Google, and (3) reviewing the first five or six results.
Search professionals take similar steps, but each step is far more nuanced. We start with the client's question and then conduct a reference interview to elicit the question behind the question, to uncover any hidden assumptions, and to establish the depth and breadth of the information need. We conduct initial research to identify the best approaches, refine our strategy, try another couple of resources, regroup, then fill in whatever is missing. Finally, we organize and distill the results so that they are as frictionless as possible for the client. While the two research processes appear similar on their face, the depth of understanding that info pros bring requires a new way of talking about what we do and how we do it.
My approach now is to focus on outcomes rather than the search process itself when talking about what info pros do. We "bring fresh perspectives to a question" or "support strategic decision making with better information" or "provide access to information you can't find anywhere else." Information literacy initiatives are as critical as ever, but we need to use language that challenges users' search complacency—“Using the Web to Get a Better Job,” or “Black Belt Google Searching.”
Mary Ellen Bates (mbates@BatesInfo.com, Reluctant-Entrepreneur.com) describes herself as a "funny kind of librarian."