[Before I was a librarian, before I was an exercise physiologist, I was a minister. I was recently asked, after many years, to serve as a guest preacher one Sunday. I would usually share this on my non-library-related blog, but as the subject came from my day-to-day work in scholarly communications and data, I thought some followers of this blog might find it of interest.
The scripture referenced is Proverbs 1:20-33. The sermon was delivered on September 12, 2021 at First Baptist Church, Worcester, MA. A recording of the service can be found on the Church’s website and/or Facebook page.]

When Brent first asked me if I would consider giving a sermon here this morning, I thanked him, told him I was very touched and humbled that he’d consider me, but said I didn’t think it was a very good idea. I haven’t preached a sermon in a very long time. Eight years, to be exact. I’ve not been to church in a good while, either. I struggle with reconciling a lot of the things I once believed about God and faith and Christianity with what I believe about the world today. I struggle with Church. Capital “C” Church. The institution of it. Organized religion. I shared all of this with Brent and he, in his very pastoral way, assured me that all of this is okay. And he convinced me that perhaps, just perhaps, I might still have something to share from this place.
Once I said yes, of course, I was stuck. I’ve been a librarian for a very long time. Now I found myself struggling over different things – wondering what the heck I do in my day-to-day life that I could possibly translate into something you might find relatable, meaningful, or especially inspiring for a sermon. But then, lying in bed one night a couple weeks back, I got to looking at the books in my bedroom and I thought of something I’d read in a book called “Living in Data: A Citizen’s Guide to a Better Information Future” by the engineer, artist, computer programmer, National Geographic Explorer, and really wonderful storyteller, Jer Thorp. I highly recommend this book and I’ll reference it throughout these thoughts this morning. But lying in bed, I thought of the opening to the second chapter of the book. It reads:
Open the window and let the words in. Let them flow into the room in a stream, all of the words, hundreds of thousands of them, let them fill the space, let them hang in the air, tiny sparkling motes of language.
And the next morning I got up and counted the number of books in my room.
There are 318 books in my bedroom. It’s not an unruly number – really! – and while there are a few stacks of them by my bed and on a dresser, most are neatly arranged in a bookcase and on a couple of shelves on the wall. The shortest one is 13 pages long. A Prairie Dog’s Life. The longest is a 2,000-plus page anthology of English literature. They probably average out though, as a whole, to around 300 pages. At about 350 words to a page, 318 books of 300 pages each come to 33,390,000 words hanging out in my room. And this is just my bedroom. I could add to it all of the books in my home, in my office, in my Little Free Library outside of my house. And then multiply all of those out. Think about how many ideas these words generate; how many characters, real and imagined; how many interactions; how many emotions; how many more words they give rise to. They expand and expand and expand. Infinite.
Did you know that the word “data” comes from Latin where it meant, “a thing given, a gift delivered or sent”? Early in its appearance in the English language, it is tied to the fields of mathematics and theology. All the way back in 1614, a clergyman named Thomas Tuke called the Sacraments Data. With a capital “D”. Divinely given.
Today, of course, we think of data as numbers, words, bits and bytes, stuff collected in notebooks or spreadsheets, crunched by computers, analyzed and visualized. We don’t think of it as divine.
Or do we?
I receive a weekly newsletter from Educause, a nonprofit association whose mission is “to advance higher education through the use of information technology”. In an article last week, I read this sentence:
From improving student success to forming optimal strategies that can maximize corporate and foundational relationships, data analytics is now higher education’s divining rod.
Interesting description, wouldn’t you say? Divining, dowsing, doodlebugging – that pseudoscience where a stick – a divining rod – leads you to water, the biological requirement for life. Data analytics is now perceived as what will lead us to our life source.
That sure makes data sound divine to me. It sounds just like the thing that’s going to save us. And it isn’t just slick marketing for a certain college major or field of study. There’s real evidence all around us of the downright amazing – some might say miraculous – outcomes of harnessing big data.
The National Center for Biotechnology Information, NCBI, is part of the National Library of Medicine at the U.S. National Institutes of Health. One of the many things that NCBI does, as part of the Library of Medicine, is build, host, and manage a series of biomedical databases, including PubMed, one of the world’s largest bibliographic databases – a free resource for citations and abstracts in life sciences and biomedical literature. Since November 17, 2019, when the first case of what we now know so well as COVID-19 was reported (just shy of 22 months ago), 175,593 peer-reviewed, published research articles on COVID-19 have been indexed in PubMed. The full-text versions of more than 200,000 articles are freely available via the public access site, PubMed Central. About 1.4 million nucleotide records have been uploaded and made available, along with a million sequence-related records. 317 articles on COVID-19 have been written by researchers just down the road at UMass Medical School, where I work. We house the full text of these in my library’s institutional repository and as of yesterday morning, those 317 papers had been downloaded more than 30,000 times by people all over the world.
It is this unprecedented open sharing of information and data that allowed us to watch science unfold over the past year at a pace hardly ever seen. The biomedical research community, worldwide, developed multiple vaccines to fight COVID with an effectiveness unheard of before. These vaccines were developed, tested, trialed, and delivered in a 12-month time period. Amazing. Miraculous. Divine?
There is no doubt that many see the hand of God at work in all of this. If you believe that God grants us gifts – skills and knowledge and wisdom – to create and use all of these towards the betterment of the world, then yes, divine. Data is divine. A gift from God.
But. That’s a bit easy, isn’t it? A bit simple.
In his book, The Promise of Access: Technology, Inequality, and the Political Economy of Hope, Daniel Greene defines and traces what he calls “the access doctrine,” a belief born out of the technology boom of the 1990s, when it seemed almost common sense that all one really needed to enter into the new information economy was access – access to technology (think the “laptop for every child” programs), access to the Internet (think broadband expansion), access to tech education (think charter schools and diploma programs with a hard focus on students’ use of and proficiency in different types of software and technology). Public libraries in particular have played a big role in propagating this promise. Holding true to their own belief that they exist to freely provide access to information, they were among the first institutions to make computers freely available to the public.
Unfortunately, as Greene describes in his book, this promise has fallen short. Technology offers a simple solution to the vastly complex problem of inequality. But like data, it’s a simple sell. And this simple selling of information and technology and data as some kind of commonsense cure to everything is powerful. It IS power. Much in the same way that the selling of a simplified idea of God or of faith or of religion is power.
Joel Osteen and Bill Gates: you may have vastly different opinions of these two powerful men, but they are strikingly similar in that each preaches a simple message with unwavering conviction. For Osteen it’s that a belief in God will grant one every bit of peace and prosperity. For Gates it’s the belief that every problem – from access to vaccines to climate change – can be solved via some form of technological invention – or intervention – almost always one that will generate the data, the information, the knowledge, and ultimately, the solution. It’s a great hope.
And data as a great hope starts to sound an awful lot like what we think of as religious faith. It holds in our minds and in our hearts this unfettered sense, this belief, that somehow, someway, somewhere within it is the key. The solution. The answer. To everything. If we can only write the right algorithm, if we can only spot the trends, the patterns, then what we once didn’t know, well, now we will. It harkens right back to the word’s Latin origins: data as something already out there – just like God – something given, something true. We just have to see it and recognize it. The truth that is already there.
Simple. Powerful. Comforting, even.
But the concept that both ideas, data and religious faith, leave out is a central and crucial one – that they are humanly constructed. As an aside, I’m not positing that God is a human construction. That’s an entirely different argument. But faith – what people believe and, by extension, how they act on those beliefs – is certainly all tied up in the limits of what we can and do construct. Just like data.
If you return to NCBI’s SARS-CoV-2 resources web page, the site where I found many of those numbers on publications and genome sequence runs that I mentioned a few minutes ago, you’ll find a link to a resource called LitCovid, “a curated literature hub for tracking up-to-date scientific information about the 2019 novel Coronavirus.” There’s a chart that shows how many publications are added weekly to the database and there’s also a map of the world, shaded to show the countries mentioned in the abstracts of all of these publications. Darker shading means more mentions. No shading means none. The United States and China stand out as the darkest blue. Most of the rest of the world is a slightly lighter shade, but there are some noticeable blank spots – Central America, a few countries in South America, and a large swath of Central and Western Africa. Does no one have COVID in those places? No. Do they lack the expertise and resources for scientific research? In some cases, definitely yes. But why haven’t those with the expertise and resources focused their research on the people in these parts of the world? There are many answers to this question, of course. It’s complex. But it clearly highlights the flaw in that belief that data is this objective, unbiased entity that need only be collected and curated and analyzed to bring us the solutions to our problems.
There is a chapter in Living in Data called “Data’s Dark Matter” and in it, Jer Thorp tells another story that highlights the limitations of data, even when one is trying their very best to avoid them. In 2009, he wrote a pair of algorithms to determine the placement of the almost 3,000 names of those killed in the 9/11 attacks on the World Trade Center – what would be a significant part of the 9/11 Memorial. The designers were seeking what they called “meaningful adjacencies” – people related to one another, people who worked together. This is a mathematical problem I cannot begin to fathom, let alone solve. But Thorp did – at least to some degree. He admits his own shortcomings – or better put, the shortcomings of any data-driven solution – in this story:
Even in the meaningful adjacencies that my algorithm dutifully satisfied, there is much missing. Mohammad Salman Hamdani was a Pakistani American scientist and NYPD cadet who, like so many other first responders, rushed to the scene on September 11 determined to help. Like so many other first responders, he was killed. Hamdani’s name is inscribed on a parapet on the south pool of the memorial, on the very last panel dedicated to the victims who were killed in World Trade Center South. The algorithm placed Hamdani there, in part because there were no meaningful adjacencies recorded, no other names indicated by the data set for his name to sit beside. Why was this man, a police officer in training, not placed alongside the other first responders? According to memorial officials, Hamdani was not included with the other police officers because he wasn’t on active duty, an explanation that sits at odds with the fact that he was given a police funeral with full honors by the NYPD. We can find a more likely answer in a headline from the “New York Post” on October 12, 2001: “Missing – or Hiding? – Mystery of NYPD Cadet from Pakistan.”
Thorp’s algorithm could only run on the data he was provided. Data constructed by humans – from human stories, human reports, human experiences, and human biases. Sadly, but truthfully, also from human hatred, human fear, and human denial.
The artist and data scientist Mimi Onuoha created a mixed-media installation in 2016 entitled The Library of Missing Datasets. It is a white file cabinet filled with labeled, yet empty, file folders. In her artist’s statement on her website (you can see pictures of the piece there, along with photographs of the 2018 installation, The Library of Missing Datasets, 2.0) she says:
“The Library of Missing Datasets” is a physical repository of those things that have been excluded in a society where so much is collected. “Missing data sets” are the blank spots that exist in spaces that are otherwise data-saturated. Wherever large amounts of data are collected, there are often empty spaces where no data live. The word “missing” is inherently normative. It implies both a lack and an ought: something does not exist, but it should. That which should be somewhere is not in its expected place; an established system is disrupted by distinct absence. That which we ignore reveals more than what we give our attention to. It’s in these things that we find cultural and colloquial hints of what is deemed important. Spots that we’ve left blank reveal our hidden social biases and indifferences.
Some examples of missing datasets in Onuoha’s piece include:
- People excluded from public housing because of criminal records
- Trans people killed or injured in instances of hate crime
- Poverty and employment statistics that include people who are behind bars
- Muslim mosques/communities surveilled by the FBI/CIA
- Mobility for older adults with physical disabilities or cognitive impairments
- LGBT older adults discriminated against in housing
- Undocumented immigrants currently incarcerated and/or underpaid
- Firm statistics on how often police arrest women for making false rape reports
- Master database that details if/which Americans are registered to vote in multiple states
There are many more. Fortunately, one dataset that used to be missing, “Civilians killed in encounters with police or law enforcement agencies,” no longer is, thanks to the efforts of multiple citizen-led data collection projects around the US. In 2015, when she began collecting the missing datasets for her piece, this wasn’t the case. 2015. Just 6 short years ago. A small grace from the growth of the Black Lives Matter movement and the tragedy of George Floyd’s murder.
The scripture reading this morning from the Book of Proverbs speaks of the wisdom and knowledge of God; from God’s mouth comes knowledge and understanding. I believe that the real kernel of wisdom within those words is the reminder to keep searching. Keep seeking knowledge, keep searching for information, keep collecting the data not because we simply haven’t found the right answer yet, but because we don’t yet possess enough of whatever it is that may yield the right answer. It is missing, if it even exists at all.
To my understanding, this is hope. It’s the hope of data, of science, of art, of technology, of education, of human relations, of human society, of all of creation. Perhaps it is a hope for the Church, too. Paul wrote to the Church in Corinth that faith, hope, and love abide; and the greatest of these is love. Me, I’ll take hope, for we can always hope to have faith, even when we have none. We can hope for love, even when there is no love to be found. And we can always hope to be better, even when we are far from it.
Amen