When I was defending my graduate thesis a number of years ago, I was asked by one of the faculty in attendance to explain why I had done “x” rather than “y” with my data. I stumbled for a bit until I finally said, somewhat out of frustration at not knowing the right answer, “Because that’s not what I said I’d do.” My statistics professor was also in attendance and, as I quickly tried to backtrack from my response, piped in, “That’s the right answer.”
As I’ve watched and listened to and read and been a part of so many discussions about data – data sharing, data citation, data management – over the past several years, I often find myself thinking back on that defense and my answer. More, I’ve thought of my professor’s comment: that data is collected, managed, and analyzed according to certain rules that a researcher or graduate student or any data collector decides from the outset. That’s best practice, anyway. And such an understanding always makes me wonder if, in our exuberance to claim the importance, the need, the mandates, and the “sky’s the limit” views over data sharing, we don’t forget that.
I really enjoyed the panel that the Medical Library Association put together last week for their webinar, “The Diversity of Data Management: Practical Approaches for Health Sciences Librarianship.” The panelists included two data librarians and one research data specialist: Lisa Federer of the National Institutes of Health Library, Kevin Read from New York University’s Health Sciences Library, and Jacqueline Wirz of Oregon Health & Science University, respectively. As a disclosure, I know Lisa, Kevin and Jackie each personally and consider them great colleagues, so I guess I could be a little biased in my opinion, but putting that aside, I do feel that they each have a wealth of experience and knowledge in the topic and it showed in their presentations and dialogue.
Listening to the kind of work and the projects that these data-centric professionals shared, it’s easy and exciting to see the many opportunities that exist for libraries, librarians, and others with an interest in data science. At the same time, I admit that I wince when I sense our “We can do this! Librarians can do anything!” enthusiasm bubble up – as occasionally occurs when we gather together and talk about this topic – because I don’t think it’s true. I do believe that individually, librarians can move into an almost limitless career field, given our basic skills in information collection, retrieval, management, preservation, etc. We are well-positioned in an information age. That said, though, I also believe that (1) there IS a difference between information and data and (2) the skills librarians have as a foundation in terms of information science don’t, in and of themselves, translate directly to the age of big data. (I’m not a fan of that descriptor, by the way. I tend to think it was created and is perpetuated by the tech industry and the media, both wishing we believe things are simpler than they ever are.) Some librarians, with a desire and propensity towards the opportunities in data science, will find their way there. They’ll seek out the extra skills needed and they’ll identify new places and new roles that they can take on. I feel like I’ve done this myself and I know a good handful of others who’ve done the same. But can we sell it as the next big thing that academic and research libraries need to do? Years later, I still find myself a little skeptical.
Moving beyond the individual, though, I wonder if libraries and other entities within information science, as a whole, don’t have a word of caution to share in the midst of our calls for openness of data. It’s certainly the belief of our profession(s) that access to information is vital for the health of a society on every level. However, in many ways it seems that in our discussions of data, we’ve simply expanded our dedication towards the principle of openness to information to include data, as well. Have we really thought through all that we’re saying when we wave that banner? Can we have a more tempered response and/or approach to the big data bandwagon?
Arguably, there are MANY valid reasons for supporting access in this area: peer review, expanded and more efficient science, reproducibility, transparency, etc. Good things, all. But going back to that lesson that I learned in grad school, it’s important to remember that data is collected, managed, and analyzed in certain ways for a reason, in ways decided by the original researcher. In other words, data has context. Just like information. And like information, I wonder (and have concern for) what happens to data when it’s taken out of its original context. And I wonder if my profession could perhaps advocate this position, too, along with those of openness and sharing, if for nothing more than to raise the collective awareness and consciousness of everyone in this new world. To curb the exuberance just a tad.
I recently started getting my local paper delivered to my home. The real thing. The newsprint newspaper. The one that you spread out on the kitchen table and peruse, page by page. You know what I’ve realized in taking up this long-lost activity again? When you look at a front page with articles about an earthquake in Nepal, nearby horses attacked by a bear, the hiring practices of a local town’s police force, and gay marriage, you’re forced to think of the world in its bigger context. At the very least, you’re made aware of the fact that there’s a bigger picture to see.
When I think of how information is so bifurcated today, I can’t help but ask if there’s a lesson there that can be applied to data before we jump overboard into the “put it all out there” sea. We take research articles out of the context of journals. We take scientific findings out of the context of science. We take individual experiences out of the context of the very experience in which they occur. And of course, the most obvious, we take any and every politician’s words out of context in order to support whatever position we either want or don’t want him/her to support. I don’t know about you, but each and every one of these examples appears as a pretty clear reason to at least think about what can and will happen (already happens) to data if and when it suffers the same fate.
Are there reasons why librarians and information specialists are concerned with big data? Absolutely! I just hope that our concern also takes in the big picture.
Sally, this is great — one of your best blog posts yet!
Thanks so much, Celia! I’m glad that you enjoyed it.
I think the context part of this is where librarians can play the biggest role: metadata, DOIs, links between data and other contextual information, searchability. These are not concepts that researchers pay a lot of attention to, but they add a lot of value to sharing data.
You are totally correct in that there are issues with data reuse and possible misuse. Analysts need to be responsible with the data they are using, whether it’s primary data or a meta-analysis, and context will help them do that.
That said, data science is definitely a field that requires a lot of training, and the pathways to get proper training as a librarian or otherwise can be fuzzy. Overall I think it’s an exciting time to be involved, and I’m very interested to see how the field evolves.
Thanks for the comment, tobinmagle. It IS an exciting time, isn’t it? I just posted a comment below that gives a good example of the problem.
As always, an excellent blog post Sally.
One thing I’d like to comment on is this sentence: “But can we sell it as the next big thing that academic and research libraries need to do?”
I agree that sometimes libraries and other information professions need to be cautious when seeking to add new initiatives (know your institution’s mission!), but as new librarians and information professionals are hired, it does the profession good to not only look for the ‘traditional’ skills, but also at the new skills these new professionals are bringing into our libraries. This enables us to offer new initiatives, services and perhaps meet a need in our institution that other departments/services can’t, or won’t, meet. In this way we grow our skills, and our profession.
Indeed, Regina. I agree that it’s really important to both look for new staff with different abilities and allow staff to use all of their skills in the workplace. It’s a win-win that way.
You may find the IOM report that I note below of interest, too.
Here’s an excellent summary regarding health data that I just read. It’s from the newly-released report, “Vital Signs: Core Metrics for Health and Health Care Progress,” from the Institute of Medicine:
“Ironically, the rapid proliferation of interest in, support for, and capacity for new measurement efforts for a variety of purposes—including performance assessment and improvement, public and funder reporting, and internal improvement initiatives—has blunted the effectiveness of those efforts. This situation reflects in part the fragmentation of the health care sector, as well as the range of legislatively mandated activities that involve measurement of health and health care. Absent a shared strategy, the variation inherent in thousands of disconnected measurement and accountability systems frustrates understanding of health system performance and the accomplishment of shared goals.” (http://www.iom.edu/Reports/2015/Vital-Signs-Core-Metrics.aspx)
Addressing the notion of “absent a shared strategy” is right on target for what I’m stating is an important role that libraries and librarians can play in this data-intense arena, i.e. raise awareness of the lack of and need for shared strategies, and promote movement forward at a pace that addresses such.