Tag Archives: data

Is Big Data Missing the Big Picture?

27 Apr


When I was defending my graduate thesis a number of years ago, I was asked by one of the faculty in attendance to explain why I had done “x” rather than “y” with my data. I stumbled for a bit until I finally said, somewhat out of frustration at not knowing the right answer, “Because that’s not what I said I’d do.” My statistics professor was also in attendance and, as I quickly tried to backtrack from my response, piped up, “That’s the right answer.”

As I’ve watched and listened to and read and been a part of so many discussions about data – data sharing, data citation, data management – over the past several years, I often find myself thinking back on that defense and my answer. More, I’ve thought of my professor’s comment; that data is collected, managed, and analyzed according to certain rules that a researcher or graduate student or any data collector decides from the outset. That’s best practice, anyway. And such an understanding always makes me wonder if in our exuberance to claim the importance, the need, the mandates, and the “sky’s the limit” views over data sharing, we don’t forget that.

I really enjoyed the panel that the Medical Library Association put together last week for their webinar, “The Diversity of Data Management: Practical Approaches for Health Sciences Librarianship.” The panelists included two data librarians and one research data specialist: Lisa Federer of the National Institutes of Health Library, Kevin Read from New York University’s Health Sciences Library, and Jacqueline Wirz of Oregon Health & Science University, respectively. As a disclosure, I know Lisa, Kevin and Jackie each personally and consider them great colleagues, so I guess I could be a little biased in my opinion, but putting that aside, I do feel that they each have a wealth of experience and knowledge in the topic and it showed in their presentations and dialogue.

Listening to the kind of work and the projects that these data-centric professionals shared, it’s easy and exciting to see the many opportunities that exist for libraries, librarians, and others with an interest in data science. At the same time, I admit that I wince when I sense our “We can do this! Librarians can do anything!” enthusiasm bubble up – as occasionally occurs when we gather together and talk about this topic – because I don’t think it’s true. I do believe that individually, librarians can move into an almost limitless career field, given our basic skills in information collection, retrieval, management, preservation, etc. We are well-positioned in an information age. That said, though, I also believe that (1) there IS a difference between information and data and (2) the skills librarians have as a foundation in terms of information science don’t, in and of themselves, translate directly to the age of big data. (I’m not a fan of that descriptor, by the way. I tend to think it was created and is perpetuated by the tech industry and the media, both wishing we would believe things are simpler than they ever are.) Some librarians, with a desire and propensity towards the opportunities in data science, will find their way there. They’ll seek out the extra skills needed and they’ll identify new places and new roles that they can take on. I feel like I’ve done this myself and I know a good handful of others who’ve done the same. But can we sell it as the next big thing that academic and research libraries need to do? Years later, I still find myself a little skeptical.

Moving beyond the individual, though, I wonder if libraries and other entities within information science, as a whole, don’t have a word of caution to share in the midst of our calls for openness of data. It’s certainly the belief of our profession(s) that access to information is vital for the health of a society on every level. However, in many ways it seems that in our discussions of data, we’ve simply expanded our dedication towards the principle of openness to information to include data, as well. Have we really thought through all that we’re saying when we wave that banner? Can we have a more tempered response and/or approach to the big data bandwagon?

Arguably, there are MANY valid reasons for supporting access in this area: peer review, expanded and more efficient science, reproducibility, transparency, etc. Good things, all. But going back to that lesson that I learned in grad school, it’s important to remember that data is collected, managed, and analyzed in certain ways for a reason, ways decided by the original researcher. In other words, data has context. Just like information. And like information, I wonder (and have concern for) what happens to data when it’s taken out of its original context. And I wonder if my profession could perhaps advocate this position, too, along with those of openness and sharing, if for nothing more than to raise the collective awareness and consciousness of everyone in this new world. To curb the exuberance just a tad.

I recently started getting my local paper delivered to my home. The real thing. The newsprint newspaper. The one that you spread out on the kitchen table and peruse, page by page. You know what I’ve realized in taking up this long-lost activity again? When you look at a front page with articles about an earthquake in Nepal, nearby horses attacked by a bear, the hiring practices of a local town’s police force, and gay marriage, you’re forced to think of the world in its bigger context. At the very least, you’re made aware of the fact that there’s a bigger picture to see.

When I think of how information is so fragmented today, I can’t help but ask if there’s a lesson there that can be applied to data before we jump overboard into the “put it all out there” sea. We take research articles out of the context of journals. We take scientific findings out of the context of science. We take individual experiences out of the context of the very experience in which they occur. And of course, the most obvious, we take any and every politician’s words out of context in order to support whatever position we either want or don’t want him/her to support. I don’t know about you, but each and every one of these examples appears as a pretty clear reason to at least think about what can and will happen (already happens) to data if and when it suffers the same fate.

Are there reasons why librarians and information specialists are concerned with big data? Absolutely! I just hope that our concern also takes in the big picture.

 

Do you REALLY want it all?

10 Apr
Feeling the Big Squeeze? Remember that even a squeeze box can make a pretty song.


There’s a billboard across the street from my office building, promoting the hospital that’s affiliated with the medical school where I work. It features a friendly looking young woman with the words above her head, “I want it all.” The implication, of course, is that the medical center can meet all of the health needs of this person, indeed of anyone who uses the hospital and its network of health care providers.

This isn’t a criticism of their advertising campaign, but more just a few thoughts that come to my mind every time that I drive past that sign. Wanting it all is pretty much the American dream, is it not? Maybe it’s the dream of all people, everywhere. We all want whatever it is that we want, whether we necessarily need it or not. You may not subscribe to this belief personally, but you have to admit that it’s an awfully loud societal message.

From the perspective of a provider, be one a provider of health care services or a provider of information services, we want it all, too. We want to say that we can provide anything and everything to anyone and everyone who comes through our doors. Libraries, especially, have this idea deeply ingrained in their DNA. They exist for everyone.

But as we have become such a specialized world, I think we’d do well to face the fact that our ability to meet that mission is dwindling, if not altogether extinct. I’ve been working on an evaluation of one of the research cores for the CCTS and in talking to those involved with it, I can’t help but notice they voice many of the same concerns that I long heard in my former home in the library: a handful of people simply cannot meet the needs and demands of everyone.

This imbalance causes us to rethink much of what we do, how we measure our success, and how we plan for the future. The reality of health care is that you really cannot have it all. A few weeks back, I was feeling really miserable and went to the walk-in clinic of the hospital next door only to learn that it’s really not a walk-in clinic, but rather a place for patients who see a certain group of doctors there. These patients can walk in for a last-minute appointment. If one is available. My doctor is a doctor within the same system, but while he has an office a few floors above the very clinic where I was seeking treatment, his clinical office is in another location, thus I wasn’t able to use the services provided there. Again, not a criticism of the provider network (though I am a big critic of the messed-up system that dictates these types of decisions), but I share the story as an example of how claiming all can be provided to everyone ought to be a statement with an asterisk after it. Some restrictions DO apply.

One of the reasons that I chose to leave the library and work for the CCTS is that I felt the expectations in this new role were somewhat more realistic. Here was a defined group of programs and research cores for me to evaluate. It’s a lot, but still seems a manageable number. It allows me the ability to focus more, to feel less scattered, to feel less pulled, to feel less like I’m always falling short of meeting my goals, not because I’m not trying hard or working hard, but because I am only one person and trying to give time to everyone feels like a losing proposition. To me.

Sustainability is a key issue as we continue to work in institutions and businesses and governments that are constantly under the pressure of too few resources to meet all of the required needs. We are limited in people, certainly. Positions are cut or people leave posts and are never replaced. Everyone feels overworked as we try to fill holes and do more.

But we’re also limited by our current service models. Yesterday, I was able to attend the annual eScience Symposium hosted by the NN/LM NER. The afternoon session featured two speakers from different universities who described their particular programs for data services. Regarding their data repositories, one school allows self-deposit while the other offers a mediated service, i.e., researchers send their data to the library and staff then deposit it on their behalf, adding all of the proper metadata, annotation, etc. necessary for people to search and find the data sets in the repository. During the Q&A, I asked the speakers about the differences between their models. I asked them some of the same questions that are asked in the process of evaluating research cores and programs:

How did you decide which path to follow? How did you decide which aspect of your repository to sacrifice: the quality of the content (enhanced by the mediation) or the ability to be a bigger service (because you’re not limited by the time/efforts of staff in the library)?

As one speaker said, “It’s a balancing act.” Indeed. And it’s also a clear example of how believing we can be all for all is misguided. It’s just not possible. We have to set priorities and make choices.

For good and bad, though, these are the realities of academic institutions, health care providers, research centers, and libraries. The one thing that we all really do have is the challenge to face these limitations, all the while trying to come up with the solutions for providing the best of whatever we can offer to as many as possible. Whether it’s what we really want or not, THAT is the “all” that we have.

Candy Cane 11: Are we leaping (like the lords) to conclusions?

11 Dec

 December 11 – Managing Information and/or Managing Data

I admit that I struggle greatly with how easily we librarians interchangeably use the terms information and data. I believe that there are significant differences between managing information and managing data. I also think that our history, professionally, is in the former more than the latter. That said, as we move more and more into the realm of data management, we’re making the argument that we also have a history of managing data. 

In a recent post on the e-Science Community Blog (a part of the e-Science Portal for New England Librarians), Nancy Glassman, Assistant Director for Informatics at the D. Samuel Gottesman Library, Albert Einstein College of Medicine, argues that “Librarians are the Original Data Managers.” I’m not sure that I wholeheartedly agree with Nancy, but what I do really like about this post is how she lays out the thesis for a class of students who attended a data management workshop she led. What I like best is that she convinced them that librarians do, in fact, have a role in this area. They understood her explanation and she gained credibility not just for herself, but for other librarians these attendees might encounter in the future.

That’s a win-win for all!

INFORMATION

Tomorrow is Friday! What will the treat be? Check in to find out.