Tag Archives: data

Is Big Data Missing the Big Picture?

27 Apr


When I was defending my graduate thesis a number of years ago, I was asked by one of the faculty in attendance to explain why I had done “x” rather than “y” with my data. I stumbled for a bit until I finally said, somewhat out of frustration at not knowing the right answer, “Because that’s not what I said I’d do.” My statistics professor was also in attendance and as I quickly tried to backtrack from my response piped in, “That’s the right answer.”

As I’ve watched and listened to and read and been a part of so many discussions about data – data sharing, data citation, data management – over the past several years, I often find myself thinking back on that defense and my answer. More, I’ve thought of my professor’s comment; that data is collected, managed, and analyzed according to certain rules that a researcher or graduate student or any data collector decides from the outset. That’s best practice, anyway. And such an understanding always makes me wonder if in our exuberance to claim the importance, the need, the mandates, and the “sky’s the limit” views over data sharing, we don’t forget that.

I really enjoyed the panel that the Medical Library Association put together last week for their webinar, “The Diversity of Data Management: Practical Approaches for Health Sciences Librarianship.” The panelists included two data librarians and one research data specialist; Lisa Federer of the National Institutes of Health Library, Kevin Read from New York University’s Health Sciences Library, and Jacqueline Wirz of Oregon Health & Science University, respectively. As a disclosure, I know Lisa, Kevin and Jackie each personally and consider them great colleagues, so I guess I could be a little biased in my opinion, but putting that aside, I do feel that they each have a wealth of experience and knowledge in the topic and it showed in their presentations and dialogue.

Listening to the kind of work and the projects that these data-centric professionals shared, it’s easy and exciting to see the many opportunities that exist for libraries, librarians, and others with an interest in data science. At the same time, I admit that I wince when I sense our “We can do this! Librarians can do anything!” enthusiasm bubble up – as occasionally occurs when we gather together and talk about this topic – because I don’t think it’s true. I do believe that individually, librarians can move into an almost limitless career field, given our basic skills in information collection, retrieval, management, preservation, etc. We are well-positioned in an information age. That said, though, I also believe that (1) there IS a difference between information and data and (2) the skills librarians have as a foundation in terms of information science don’t, in and of themselves, translate directly to the age of big data. (I’m not a fan of that descriptor, by the way. I tend to think it was created and is perpetuated by the tech industry and the media, both wishing we believe things are simpler than they ever are.) Some librarians, with a desire and propensity towards the opportunities in data science, will find their way there. They’ll seek out the extra skills needed and they’ll identify new places and new roles that they can take on. I feel like I’ve done this myself and I know a good handful of others who’ve done the same. But can we sell it as the next big thing that academic and research libraries need to do? Years later, I still find myself a little skeptical.

Moving beyond the individual, though, I wonder if libraries and other entities within information science, as a whole, don’t have a word of caution to share in the midst of our calls for openness of data. It’s certainly the belief of our profession(s) that access to information is vital for the health of a society on every level. However, in many ways it seems that in our discussions of data, we’ve simply expanded our dedication towards the principle of openness to information to include data, as well. Have we really thought through all that we’re saying when we wave that banner? Can we have a more tempered response and/or approach to the big data bandwagon?

Arguably, there are MANY valid reasons for supporting access in this area; peer review, expanded and more efficient science, reproducibility, transparency, etc. Good things, all. But going back to that lesson that I learned in grad school, it’s important to remember that data is collected, managed, and analyzed in certain ways for a reason; things decided by the original researcher. In other words, data has context. Just like information. And like information, I wonder (and have concern for) what happens to data when it’s taken out of its original context. And I wonder if my profession could perhaps advocate this position, too, along with those of openness and sharing, if nothing more than to raise the collective awareness and consciousness of everyone in this new world. To curb the exuberance just a tad.

I recently started getting my local paper delivered to my home. The real thing. The newsprint newspaper. The one that you spread out on the kitchen table and peruse through, page by page. You know what I’ve realized in taking up this long-lost activity again? When you look at a front page with articles of an earthquake in Nepal, nearby horses attacked by a bear, the hiring practices of a local town’s police force, and gay marriage, you’re forced to think of the world in its bigger context. At the very least, you’re made aware of the fact that there’s a bigger picture to see.

When I think of how information is so bifurcated today, I can’t help but ask if there’s a lesson there that can be applied to data before we jump overboard into the “put it all out there” sea. We take research articles out of the context of journals. We take scientific findings out of the context of science. We take individual experiences out of the context of the very experience in which they occur. And of course, the most obvious, we take any and every politician’s words out of context in order to support whatever position we either want or don’t want him/her to support. I don’t know about you, but each and every one of these examples appears as a pretty clear reason to at least think about what can and will happen (already happens) to data if and when it suffers the same fate.

Are there reasons why librarians and information specialists are concerned with big data? Absolutely! I just hope that our concern also takes in the big picture.

 

Do you REALLY want it all?

10 Apr
Feeling the Big Squeeze? Remember that even a squeeze box can make a pretty song.


There’s a billboard across the street from my office building, promoting the hospital that’s affiliated with the medical school where I work. It features a friendly looking young woman with the words above her head, “I want it all.” The implication, of course, is that the medical center can meet all of the health needs of this person, indeed of anyone who uses the hospital and its network of health care providers.

This isn’t a criticism of their advertising campaign, but more just a few thoughts that come to my mind every time that I drive past that sign. Wanting it all is pretty much the American dream, is it not? Maybe it’s the dream of all people, everywhere. We all want whatever it is that we want, whether we necessarily need it or not. You may not subscribe to this belief personally, but you have to admit that it’s an awfully loud societal message.

From the perspective of a provider, be one a provider of health care services or a provider of information services, we want it all, too. We want to say that we can provide anything and everything to anyone and everyone who comes through our doors. Libraries, especially, have this idea deeply ingrained in their DNA. They exist for everyone.

But as we have become such a specialized world, I think we’d do well to face the fact that our ability to meet that mission anymore is dwindling, if not altogether extinct. I’ve been working on an evaluation of one of the research cores for the CCTS and in talking to those involved with it, I can’t help but notice they voice many of the same concerns that I long heard in my former home in the library; a handful of people simply cannot meet the needs and demands of everyone.

This imbalance causes us to rethink much of what we do, how we measure our success, and how we plan for the future. The reality of health care is that you really cannot have it all. A few weeks back, I was feeling really miserable and went to the walk-in clinic of the hospital next door only to learn that it’s really not a walk-in clinic, but rather a place for patients who see a certain group of doctors there. These patients can walk in for a last-minute appointment. If one is available. My doctor is a doctor within the same system, but while he has an office a few floors above the very clinic where I was seeking treatment, his clinical office is in another location, thus I wasn’t able to use the services provided there. Again, not a criticism of the provider network (though I am a big critic of the messed-up system that dictates these types of decisions), but I share the story as an example of how claiming all can be provided to everyone ought to be a statement with an asterisk after it. Some restrictions DO apply.

One of the reasons that I chose to leave the library and work for the CCTS is that I felt the expectations in this new role were somewhat more realistic. Here was a defined group of programs and research cores for me to evaluate. It’s a lot, but still seems a manageable number. It allows me the ability to focus more, to feel less scattered, to feel less pulled, to feel less like I’m always falling short of meeting my goals, not because I’m not trying hard or working hard, but because I am only one person and trying to give time to everyone feels like a losing proposition. To me.

Sustainability is a key issue as we continue to work in institutions and businesses and governments that are constantly under the pressures of too few resources to meet all of the required needs. We are limited in people, certainly. Positions are cut or people leave posts and are never replaced. Everyone feels overworked as we try to fill holes and do more.

But we’re also limited by our current service models. Yesterday, I was able to attend the annual eScience Symposium hosted by the NN/LM NER. The afternoon session featured two speakers from different universities who described their particular programs for data services. Regarding their data repositories, one school allows self-deposit while the other offers a mediated service, i.e. researchers send their data to the library and then staff deposit it on their behalf, adding all of the proper metadata, annotation, etc. necessary in order for people to search and find the data sets in the said repository. During the Q&A, I asked the speakers about the differences between their models. I asked them some of the same questions that are asked in the process of evaluating research cores and programs:

How did you decide which path to follow? How did you decide which aspect of your repository to sacrifice; the quality of the content (enhanced by the mediation) or the ability to be a bigger service (because you’re not limited by the time/efforts of staff in the library)?

As one speaker said, “It’s a balancing act.” Indeed. And it’s also a clear example of how believing we can be all for all is misguided. It’s just not possible. We have to set priorities and make choices.

For good and bad, though, these are the realities of academic institutions, health care providers, research centers, and libraries. The one thing that we all really do have is the challenge to face these limitations, all the while trying to come up with the solutions for providing the best of whatever we can offer to as many as possible. Whether it’s what we really want or not, THAT is the “all” that we have.

Candy Cane 11: Are we leaping (like the lords) to conclusions?

11 Dec

 December 11 – Managing Information and/or Managing Data

I admit that I struggle greatly with how easily we librarians interchangeably use the terms information and data. I believe that there are significant differences between managing information and managing data. I also think that our history, professionally, is in the former more than the latter. That said, as we move more and more into the realm of data management, we’re making the argument that we also have a history of managing data. 

In a recent post on the e-Science Community Blog (a part of the e-Science Portal for New England Librarians), Nancy Glassman, Assistant Director for Informatics at the D. Samuel Gottesman Library, Albert Einstein College of Medicine, argues that Librarians are the Original Data Managers. I’m not sure that I wholeheartedly agree with Nancy, but what I do really like about this post is how she lays out the thesis for a class of students who attended a data management workshop she led. What I like best is that she convinced them that librarians do, in fact, have a role in this area. They understood her explanation and she gained credibility not just for herself, but for other librarians these attendees might encounter in the future. 

That’s a win-win for all!


Tomorrow is Friday! What will the treat be? Check in to find out.

TEDMED at Home

17 Apr

My workplace is live streaming the terrific annual event, TEDMED, this week. Many of the talks eventually become available through the TED website, so if you’re not able to watch now, do check in at a later date to see what gets posted. In particular, you might want to watch Larry Smarr describe his hard-to-imagine quest for gathering, tracking, and analyzing every kind of microbe living in his colon. Perhaps it sounds a bit dry, but trust me, it was a fascinating talk.

If you’re interested in mobile health, don’t miss Deborah Estrin’s talk on the work she is doing at Cornell towards an “Open mHealth” movement. Assessing our “social pulse,” she argues, can tell as much about our health as anything, and doing such a thing is becoming more and more possible with the advent of so many tools and apps available for mobile devices. (Visit Small Data to use/see your own small data.)

EVERY academic librarian, along with every single person who utilizes the resources of an academic library, needs to watch Elizabeth Marincola speak on, “What happens when science, money, and freedom of information collide?”  Marincola is a business person and a publisher… and a VERY strong advocate for making published scientific research available to all. “I don’t know anyone who believes that the mission of science is the commodification of data.” GREAT quote!

Max Little spoke of the role of applied mathematics and “prediction competitions” to drive science forward. Amy Abernethy proposed the wonderful idea of Info Data Drives, based on the model of blood drives, where individuals can donate their health data to build the kind of data sets needed to solve complex medical mysteries. Mick Cornett, the mayor of Oklahoma City, talked about how his city redesigned itself for people, as opposed to automobiles, and in doing so went from being on the list of “Most Obese Cities” to “Most Fit Cities” in a matter of a couple of years. Even more, building infrastructure that focuses on community, recreation, and other healthy social activities has made Oklahoma City a destination for many young adults and families, bringing with them the talent and skills needed to keep a city thriving. Sally Okun is the first nurse to grace the TEDMED stage and, not surprisingly to me, she was the one speaker so far who hit home the importance of listening to what patients say. She’s involved in some really interesting contextual language research, trying to develop a lexicon of patient language. I’ve made a note to follow up on it.

The morning also brought a couple of terrific interludes; Jill Sobule (I loved her already, but now that I know she’s the TEDMED troubadour…) sang a song with fantastic lyrics that I’m afraid I can’t provide here on this family/work-oriented blog. Let’s just say, in the wake of bombs going off at the Boston Marathon, politicians arguing over gun control, and every eye focused on immigration reform, Sobule gives me a nice little refrain to sing over and over again in my head (“When they say, ‘We want our America back’…”). Thank you, Jill. And if you’ve never seen Zubin Damania’s alter ego, “ZDoggMD,” and his PSAs for different health issues, well you’ve just never seen an internist rapper before, have you? Check him out!

Finally, our very own Myrna Morales, Technology Coordinator for the NN/LM NER, worked with the students organizing today’s streaming to make it possible for a few of us to give our own TED Talks during the breaks! I’m really pleased and honored to work in a library where six people stepped up to the plate and spoke. I captured them on video and after editing (and if I receive permission from the individual speakers), I’ll share their talks on my blog. In the meantime, here is my own and very first TED Talk. Not quite ready for the big leagues, but it was awfully fun to do. Hope you enjoy it!


Data Hoarder

10 Apr

Next time you have to teach data management to a group of researchers or students, here’s a very funny piece you can share (with the right audience, of course). Thanks to my colleague, Katie Houk, at Tufts Medical School for bringing it to my attention. Enjoy! 🙂

Repeat After Me

13 Mar

Quote from Science

Preparing for some upcoming work, I took part in a webinar on systematic reviews yesterday morning. It was a brief, but good, review/overview of the process and the roles librarians and/or information scientists have in it. One thing that stuck out for me was the reminder by Dr. Edoardo Aromataris of the Joanna Briggs Institute, one of the program’s speakers, that a systematic review is a type of research and as such, it needs to be reproducible. He noted that the search strategy ultimately constructed in a review should yield pretty much the same results for anyone who repeats it.

Replication is a hallmark of the scientific method. As Jasny et al. state in the above-referenced quote from a special issue of Science on data replication and reproducibility, it is the gold standard of research. Science grows in value as it builds upon itself. Without the characteristic of replication, such growth is thwarted and findings become limited to a study’s specific subject pool. If a study’s design becomes so complicated and the research question(s) keep changing along the way, the study’s value gets clouded, if it remains at all.

I remember during my master’s thesis defense, one of my advisers asked me why I hadn’t done a particular statistical analysis to answer another question about the data I collected. I admit that the question threw me, but after thinking about it for a moment, I said, “Because that isn’t what I said that I would do.” My statistics professor, who was also sitting in on the defense, said calmly, after I hemmed and hawed and tried to defend my answer in a long and drawn out way, “That’s the right answer.” In other words, when I proposed my study and laid out my methodology, I stated that I would do “x, y, and z.” If I later decided to do “q” simply because I thought “q” was more interesting, I wouldn’t have necessarily answered the research question that I set out to answer, nor would my methods be as strong as I initially put forward.

I bring all of this up this week because as I’ve been sitting in on the weekly meetings of my research team these past months, I can’t help but notice how often new questions are asked and how often those questions result in an awareness that the data needed to answer them is missing. This fact then leads to a lot of going back and gathering the missing data. Sometimes this is possible and sometimes it isn’t. For instance, you might go to see your doctor one time and you’re asked the question, “Do you smoke?” But the next time you visit, the nurse doesn’t ask you that same question. Usually, you’re asked something like, “Are you still taking (name the medication)?” You answer, “Yes,” but you fail to mention that you’ve changed dosage. Or that your doctor changed the dosage sometime during the past year. Is that captured in the record? Maybe, maybe not. And further, some insurance carriers require certain patient information while others do not. If you’re drawing subjects for a study from multiple insurance carriers, you’d better be sure that each is collecting all of the data that you need, otherwise you cannot compare the groups. As the analyst on our study said yesterday, “If you can’t get all of the data, you might as well not get any of it.”

Now please remember that I am working as an informationist on a study led by two principal investigators and a research team that has been doing research for a very long time. They have secured any number of big grants to do big studies. They are well-respected and know a whole helluva lot more about clinical research than me and my little master’s-thesis-experienced self. I’m not questioning their methods or their expertise at all. Rather, I’m pointing out that this kind of research – research that involves a lot of people (25+ on the research team), thousands of subjects, a bunch of years, several sources of data (and data and data and data…), and a whole lot of money over time – is messy. Really, really messy! In other words, an awful lot, if not the majority, of biomedical and/or health research today is messy. And as an observer of such research, I cannot help but wonder how in the world these studies could ever be replicated. As that issue of Science noted, research today is at a moment when so many factors are affecting the outcomes that it’s a time for those involved in it to stop and evaluate these factors, and to ensure that the work being done – the science being done – meets high standards.

More, as a supposed “expert” in the area of information and a presumed member of the research team, I’m feeling at a loss as to what I can do, at this point in the study, to clean it up. Yes, I admit that yesterday just wasn’t my best day on the study and maybe that’s coloring part of my feelings today. I didn’t have anything to offer in the meeting. I didn’t feel like much of a part of the team. It happens.

So can I take a lesson from the day’s events? The answer to that is unequivocally “YES!” and here’s why…

In the afternoon, I had a meeting with a different PI for a different study. We’re exploring areas where I can help her team; writing up a “scope of work” to embed me as an informationist on the study. It’s a very different kind of study and not as big as the mammography study (above), but it still involves multiple players across multiple campuses, and it ultimately will generate a whole bunch of data from a countless number of subjects. The biggest difference, though, is timing. And this is the take-away lesson for me in regard to what brings success to my role. When a researcher is just putting together his/her team, when s/he is just beginning to think about the who and what and where and why of the study, if THEN s/he thinks of including an individual with expertise in information, knowledge, and/or data management, the potential value of that person to the team and to the work is multiplied severalfold.

This is because it’s in the beginning of a study when an informationist can put his/her skills to use in building the infrastructure, the system, and/or the tools needed to make the flow of information and data and communication go much more smoothly. It’s hard to go back and fix stuff. It’s much easier to do things right from the beginning. Again, I’m not saying that the mammography study is doing anything wrong, but building information organization into your methods from the get-go can surely help reduce the headaches down the road. And fewer headaches + cleaner data = better science, all the way around.

Follow Along

7 Jan

I’m a HUGE fan of Twitter. I know that many of my colleagues, associates, and people in general still don’t get it. They don’t understand how a continuing stream of bits of information could be relevant to anyone. Mostly, I find that those who either don’t get or don’t like the social media tool always sum up their feelings by stating, “I don’t care if you brushed your teeth today.”

Concerns for halitosis and dental hygiene aside, these short-sighted and shallow accusations of Twitter are just that. But this isn’t a blog post to share the merits of Twitter. I need to write that piece for another blog (NAHSL) later this week. Instead, this is a very quick collection of BLOGS that, in many cases, Twitter led me to. In other words, the 140 characters shared by someone on Twitter ultimately took me to the following substantive resources that I check daily. The blogs themselves are not all updated on a daily basis, but I decided that this year I would put them into a folder on my bookmarks toolbar and look at them each morning. Anything new that these people write never ceases to inform, inspire, energize, and/or entertain. I share them with you here in the hopes that you will choose to either follow them as well, or perhaps create your own “Top Ten” to share with others.

  • Get Moving: Fitting Fitness into Your Day is the blog of Boston.com’s senior health and wellness producer, Elizabeth Comeau. You can follow along with Elizabeth on her own journey to live a healthy life, as well as find many links to important news stories related to health and wellness. Elizabeth gets the first listing in this list because today marks her one year “blogiversary”. Congrats, Elizabeth! You can also follow Elizabeth on Twitter at @BeWellBoston.
  • FUDiet is the blog of, admittedly, my favorite researcher at UMass Medical School. Librarians are not supposed to choose favorites (I think I’ve typed this before), but I have a bias towards Sherry Pagoto, PhD, a clinical psychologist and researcher in the areas of health, nutrition, fitness, depression and obesity. She lets me work with her, she planks in the Library, she makes me laugh. Ranking #1 for sure! Her blog and her social media movement, #PlankADay, are not to be missed. If you want to know the FACTS about health and fitness, follow an expert. Follow Sherry! @DrSherryPagoto
  • The Brilliant Blog is home of the musings of author, journalist, consultant and speaker, Annie Murphy Paul. Annie is a regular contributor to numerous news sources including Time, CNN, Forbes, MindShift, Psychology Today, and The New York Times, to name a few. She writes fascinating and thought-provoking pieces on the science behind learning and intelligence. You can also find Annie on Twitter at @anniemurphypaul
  • I started following Laura Vanderkam’s blog after reading her book, 168 Hours. I need all the help I can find, all the tips offered, to help me manage the multiple projects I have going on in my life, both at work and away from here. Laura provides these through her books, her videos, and her blog. Feeling overwhelmed? Take a few minutes to read her stuff. You really DO have more time than you think. @lvanderkam
  • Librarians know Daniel Pink. Members of the Medical Library Association were lucky to have Dan speak at our annual meeting a few years back, as well as host a webcast just for us! When it comes to understanding people and how to put that understanding to practice in my people-oriented work, his books are at the top of my list. And his blog is a great way to keep those ideas going in between the publication of said books. @DanielPink is also on Twitter.
  • I would be remiss if I didn’t include my colleague Donna Kafel’s blog in this list. Donna oversees the e-Science Community Blog, a multi-contributor source for all information related to librarians, eScience, and data. I slip in a post there myself, from time to time. If you’re an informationist, a research librarian, any kind of librarian working with data, you can find a lot of relevant information here. The NER eScience Portal tweets, too – @NERescience.
  • Speaking of data, David McCandless and Omid Kashan’s website and blog, Information is Beautiful, is… beautiful! Leaders in data visualization, these guys regularly publish amazing pieces on all kinds of topics. It’s a fun stop in your busy day. Info=Beautiful, @infobeautiful
  • The Chronicle of Higher Education hosts a number of great blogs, but the one I choose to list here is Percolator: Research that Matters. From politics to morality to academia, Percolator is worth your attention. Grab a cup of Joe(sephine) and enjoy! You can keep up with all news from The Chronicle on Twitter at @chronicle.

And now, perhaps the two most important blogs to follow (save my own, of course!):

  • Because a life without music is no life at all, read Kim Ruehl’s blog for great writing on music and community. Kim writes regularly for No Depression, FolkAlley, NPR, and Yes! magazine. Though you can find her work at each of these places, I like to follow her own website. One-stop reading.
  • Ask Amy. Go ahead, ask her! She will answer. The Chicago Tribune’s nationally syndicated advice columnist, Amy Dickinson, is a sure thing for a 2-minute daily ponder regarding some important life lesson. Wondering what to say to your tacky neighbors (nothing, you McSnippy!), your whining children (just do the chores, you lazy kiddos!), the last guy to not return your calls after a date (seriously?! move on!)? No worries, someone has surely asked Amy and she’s provided just the right advice. If you work in a cubicled environment with other people (as opposed to being a zoo keeper), Amy can help you get through the days a little bit easier. Her memoir, The Mighty Queens of Freeville, is also worthy of a list, just not this one. Even better, buy the audio version and Amy will read it to you herself. Follow Amy on Twitter @AskingAmy and catch her from time to time as a regular panelist on NPR’s Wait Wait… Don’t Tell Me!

Yes, I can see that you’re hard-pressed to make an argument that each of these blogs is relevant to the librarian life, but this librarian’s life would be much less of what it is without them. Thanks to each of the writers for writing them!