Tag Archives: data sharing

All of the Data that’s Fit to Collect

28 Jul

My graduate thesis in exercise physiology involved answering a research question that required collecting an awful lot of data before I had enough for analysis. I was comparing muscle fatigue in males and females, and to do this I had to find enough male-female pairs that matched for muscle volume. I took skinfold measurements and calculated the muscle volume of about 150 thighs belonging to men and women on the Ithaca College crew teams. Out of all of that, I found 8 matched pairs. It was hardly enough for grand findings, but it was enough to do the analysis, write my thesis, successfully defend it, and earn my degree. After all, that’s what research at this level is all about: learning how to put together a study and carry it all the way through to completion.

During my defense, one of my advisers asked, “With all of that data, you could have answered ___, too. Why didn’t you?” I hemmed and hawed for a bit before finally answering, “Because that’s not what I said I was going to do,” an answer that my statistics professor, also in attendance, said was the right one. Was my adviser trying to trick me? I’m not sure, but it’s an experience I think of often today as I read, talk, and work in a field obsessed with the “data deluge.”

The temptation to do more than you set out to do is ever present, maybe more today than ever before. We have years’ worth of data (a lot of data) for the mammography study. When the grant proposal was written and funded, it laid out specifically what analyses would be done and what questions would be answered. Five years down the road, it’s easy to see lots of other questions that the same data could answer. Common statements in our team meetings are, “I think people want to know Y” or “Z is really important to find out.” The problem, however, is that we set out to answer X. While Y and Z may well be valuable, X is what the study was designed to answer.


“LOD Cloud Diagram as of September 2011” by Anja Jentzsch – Own work. Licensed under Creative Commons Attribution-Share Alike 3.0 via Wikimedia Commons

I see a couple of issues with this scenario. First, grant money is a finite resource. At a time when practically all research operates under this funding model, people have a certain amount of time dedicated to (i.e., paid for by) a grant. If that time gets used up answering peripheral questions or going down interesting but unplanned rabbit holes, completing the initial work on time is put in jeopardy. Having seen my own funded aims change over time, I know this can be frustrating. Don’t hear me saying that it’s all frustrating, though. On the contrary, along with the frustration can come some pretty cool work. The mini-symposium on data management that I described in earlier posts was a HUGE success for my work, but it’s not what we originally set out to do. The ends justified the means in that case, but that isn’t always what happens.

The second issue I see is one that many researchers express when the topics of data sharing and data reuse come up: data is collected a certain way to answer a certain question, and it’s managed with that same purpose in mind. Being concerned about what another researcher will do with data that was collected for a different reason is legitimate. It’s not a concern that can’t be addressed, but it’s certainly worth noting. When I was finished with my thesis data, a couple of faculty members offered to take it and do some further research with it. There were different questions that could be answered using the larger data set, but not without taking into account the original research question and the methods I used to collect it all. Anonymous data sharing and reuse, stripped of that context, doesn’t always allow for this, at least not in the current climate, where data citation and identification are still evolving. (All the more reason to keep working in this area.)

We have so many tools today that allow faster and more efficient data collection. We have grant projects that go on for years, making it difficult to say “no” to the new questions that come up along the way. We are inundated with data and information and resources that make it virtually impossible to focus on any one thing for any length of time.

The possibilities of science in a data-driven environment seem limitless. It’s easy to forget that some limits do, in fact, exist.

Back to the Starting Square

2 May

One of my favorite singer-songwriters is Lucy Wainwright Roche. Fans of folk music who don’t know Lucy may well know her familiar last names. The daughter of Suzzy Roche and Loudon Wainwright III, she comes by her musical gifts honestly. One of my favorite songs of hers is “Starting Square.” It’s a song about seeing an old love again and taking note of the changes that happen after a relationship ends. That’s my take, anyway. And it’s summed up in the line,

I can tell you can tell it from there
That I may have been everywhere
But I’m back
Back to the starting square

Enjoy Lucy singing it.

I may not have been everywhere in the first round of informationist work, but as I met with the principal investigator of my latest grant-funded project this week, I did feel like I was back at square one. This latest project is very different from the mammography study that I’ve worked on for the past couple of years. This supplemental grant provides informationist services to the larger grant entitled “A Knowledge Environment for Neuroimaging in Child Psychiatry.” Our ultimate goal (and there are more than a few steps to take before we’ll get there) is “to establish best practices and standards around data sharing in the discipline of neuroinformatics so that it becomes possible to generate accurate, easy to obtain quantitative metrics that give credit to the original source of data.” In short, it’s a project that will hopefully give researchers a means to cite their data, both to support data sharing and to make the science reproducible. I’ll work on determining the proper level of identification for neuroimages, the best identifier for the images (is it a DOI?), and the most efficient means of organizing and naming new data sets derived from bits and pieces of multiple other data sets.

During our first meeting, the PI showed me a whole bunch of really interesting websites and told me of many interesting projects happening in this area (directly and tangentially). I came back to my desk and promptly created a new folder of bookmarks for this work. So now… I’m back to the starting square. I’ve got a mountain of stuff to read and watch and become familiar with. It’s like the first day of class. The first assignments. And I need a new notebook!

I include a few of the resources below, if you’re interested in the topic and want to play a little catch up, too. Enjoy!

Two and Two and Two: Making Connections

24 Oct

Two meetings with two principal investigators about two grant proposals over two days led me to two observations and thoughts about the state of our profession and the work that we do:

1. Is the library a silo, too?

We speak a good bit in the profession about how often those we serve, our patrons, live and work in silos. Scientists do research in specific areas. Departments treat diseases within a specialized field. Administrators make decisions within the context of the top level that they know best. It’s very common. And it frustrates us, because in reality we rarely function in a world that doesn’t (or couldn’t) benefit from other areas, if only we knew about them. However, “Nobody knows what I do!” is a common cry not just from librarians, but across the board. Is this perhaps a glimpse that we, like our patrons, are living in a silo we’ve created for ourselves?

Yesterday, I sat down with a researcher to work on the proposal that we’re submitting for the next round of informationist grants from the National Library of Medicine. It is an absolutely fantastic project, and each time I come away from talking with Dr. Kennedy, I can’t help but think how refreshing it is to speak with a researcher who knows as much as I do (no, more) about the issues related to data sharing. Turns out that he’s internationally known as a proponent of data sharing in his field (neuroimaging), leading projects, initiatives, and working groups, and advocating in all sorts of ways among his peers for the necessity of this practice. It is by chance (pure chance) that our paths crossed and that this crossing led us to work on the grant proposal together. You see, he knew of the RFP for the informationist supplement grants because of his connections to colleagues at the National Library of Medicine. I happened to give a talk at one of his lab’s meetings a while back on an unrelated topic, and he noted that the title on my signature line includes “informationist.” So he asked me what this meant, what I did, what I was doing related to the supplement awards, and whether I’d be interested in helping him on a project idea that he had. This is how we came to yesterday.

What I want to point out, however, is that Dr. Kennedy came across this information with no connection to the library. He learned of it from a colleague at the National Library of Medicine, yet that colleague, evidently, didn’t think to point him to his library as a place to find an informationist. 

Are you following me?

There’s a chat happening on the MEDLIB-l listserv today (and other days and in other circles of our profession, too, of course) regarding our name, i.e. should we incorporate “knowledge” into our job titles, use it in some form instead of “library” to describe our workplace, etc. I’m not going to get into that discussion here, but I bring it up because a consistent thread in these discussions is that if our patrons don’t know the value of the library, then we are evidently doing something wrong in our work.

To this I say, “Yes and no.” 

Yes, sometimes we haven’t done the best job at getting out and letting people know how we can build partnerships, collaborate on research projects, embed ourselves in curriculum, teach classes on a variety of relevant subjects, and much more. Our history is as a passive profession. For years and years and years we were able to meet our patrons here, in the library. They had to come to us to use our resources. Once here, they made the association that librarians were important because libraries had resources. But those days have been gone for decades now, and we haven’t always been the best at getting out and helping people associate us less with the library and more with our skills. WE are the resource that we really need to save now, not the library or the journal collections or the subscription to UpToDate. We cost the administration more than those other resources, so we had better be able to prove that we are the resource worth keeping the next time forced budget cuts come along.

But I also say no to the belief that if people don’t know our value, we’re doing something wrong. I’ve done a ton of right things over the past year and a half as an embedded informationist that have led me to all sorts of fantastic new opportunities, yet it’s still only by chance that I discover someone right here on my own campus who has been working on and advocating for many of the same things we’ve been talking about here in the Library. We all work in different worlds, and despite the forward strides and promise of networked science, it remains so darned impossible to make all of the connections that would ultimately lead to better work, e.g. science, medicine, information management. Work that would prove our value.

To me, that realization really hit home yesterday when I thought about how someone who works for the National Library of Medicine, the funding agency behind these informationist grants (the National LIBRARY of Medicine), didn’t associate the library with those awards. I don’t say that out of any place of judgment, either. Well… maybe a little, but the truth is that there’s no point in judging and/or blaming and/or pointing fingers. It is simply our reality. We all live and work in some degree of a silo, but if we want to be associated with value, we need to be valuable. Visible and valuable. Both.

2. “You have a unique skill that only a handful of people on this campus have.”

I was told this today by another principal investigator as we discussed the rewriting of another grant proposal. The skill she refers to is my knowledge of how to use and leverage social media for all sorts of positive things. Her point was that when you have something that few others have, you’ve got to use it. Social media is trendy in medical research today, but few medical researchers actually use social media. They want the money to do the research, yet don’t have the expertise in the products to know how to use them effectively. Thus, when you do have the expertise, you have value. Research teams need you on their team. This is terrific!

Yet I felt myself hesitating at the thought that, as a librarian, the skill I would bring to a research team lies in social media. Is that a librarian skill? As we talked, though, the researcher described to me how knowing the social media tools and the social media landscape gives you a better sense of how to collect and manage the data generated by these tools. Novices don’t have that. And data management… now THAT is a skill that the library is clamoring to get into. But even for me and my “out of the library box” thinking, making this connection took a few minutes. Even for me!

It surprised me, but I wonder if as we break out of our silos and work closely with others, perhaps one of the things that gets blurry is the answer to the question, “Who knows what?” What are librarians supposed to know? What are researchers supposed to know? What do doctors know? And who does what? I think that it’s this vagueness that makes us argue over (or more politely, discuss) what we call ourselves, what services we provide, and what our value really is. Silos and walls keep us separated, but they also keep us neat and orderly. We say that they need to go. Are we ready for the flood of uncertainty that all the mixing-up to come will bring? 

National Preparedness Month was last month, but you can still celebrate it today.
