Archive | Aim 1

Let’s Ask the Expert

26 Mar

Normal Distribution

The research team has a new statistician; not a new analyst, but a new statistician. If you look at it as a pecking order, the statistician oversees the analyst. Our former statistician retired recently, leaving the team to find a replacement. The University has a relatively new Quantitative Health Sciences Department and many of the services once procured through individual department statisticians are now going through QHS. Or at least this is how I think it’s going. These are things that I don’t necessarily need to know and as I have plenty of things occupying my “need to know” gray matter right now, I can just follow along here.

The significance of the new team member, to me, was that it generated the need for a meeting so that he could be brought up to speed on the project. This meeting happened this afternoon. I believe it was good for him (as well as the Chair of the Quantitative Methods Core, his boss, also in attendance). I know that it was good for me. I’ve now heard the project and its various aspects described on a number of occasions, and each time gain some new insight. Today, that insight was that I have a pretty good grasp on where the data for this study comes from, the different sources that generate it, how it’s stored, where it’s stored, who’s managing it, and so forth. I also have a pretty clear understanding of where the problem spots and/or issues with it are (mostly gone over, yet again, in today’s morning meeting).

I decided to pay close attention during the meeting to the questions that the statistician asked. I imagine that these are the kinds of questions that an informationist, embedded librarian, or anyone concerned with data management and planning would ask a research team. Here are some that I noted. If you’re doing an interview with a researcher about his/her data, are you asking these questions?

  • Is the data in one place or multiple places? 
  • Do the different sources merge together easily?
  • Are the variable names consistent across the sources?
  • Where is the merged data stored and how?
  • When and/or how often do you do data pulls from the sources?

Additionally, the statistician said that he wanted to be walked through the process. He wanted to generate a visual for himself of how everything works together. I found this request a confirmation of much of what I’ve been reading and thinking about in terms of how we best see, understand, and communicate systems and processes. Visuals are important. I remember meeting with one of the chief programmers a few months back and how helpful it was when he pulled out a marker and drew us a picture on the whiteboard to explain all of this.*

*NOTE: If you’re interested in the art of explanation, check out The Art of Explanation by Common Craft founder, Lee LeFever. I’m pretty sure I mentioned this a few posts back, but in case you missed it… Also, Common Craft has made wonderful templates of their cut-out characters available for free to download and use in your own creations. Give it a try and see how well you do at explaining a concept or problem. Make a little video and share it with me.

So, if you’re keeping up with the process of the research study, the next step for the statistician is to collect data from the first cohort and start to play with it; see what it shows so far; see if it identifies any gaps of missing data and/or holes in the process that need to be addressed. It’ll be a couple of months, at least, before we hear back, but it was obvious that the team was excited about this move.

A few questions that I’m left with, following today, are:

  • What’s the difference between an analyst and a statistician?
  • What is my role, if any, in this aspect of the study?

One last interesting aside – When we went around the table to introduce ourselves and I said, “I’m from the library, serving as the informationist,” Dr. Barton, the Director of the Quantitative Methods Core, said, “Oh, good.” I’m the only one who got an “Oh, good.” I’ve no idea what he meant by it, but I like to see it as a positive sign that my library is engaged in this kind of work. Regardless, it was a nice gesture.

Repeat After Me

13 Mar

Quote from Science

Preparing for some upcoming work, I took part in a webinar on systematic reviews yesterday morning. It was a brief, but good, review/overview of the process and the roles librarians and/or information scientists have in it. One thing that stuck out for me was the reminder by Dr. Edoardo Aromataris of the Joanna Briggs Institute, one of the program’s speakers, that a systematic review is a type of research and as such, it needs to be reproducible. He noted that the search strategy ultimately constructed in a review should yield pretty much the same results for anyone who repeats it.

Replication is a hallmark of the scientific method. As Jasny et al. state in the above-referenced quote from a special issue of Science on data replication and reproducibility, it is the gold standard of research. Science grows in value as it builds upon itself. Without the characteristic of replication, such growth is thwarted and findings become limited to a study’s specific subject pool. If a study’s design becomes so complicated and the research question(s) keep changing along the way, the study’s value gets clouded, if it remains at all.

I remember during my master’s thesis defense, one of my advisers asked me why I hadn’t done a particular statistical analysis to answer another question about the data I collected. I admit that the question threw me, but after thinking about it for a moment, I said, “Because that isn’t what I said that I would do.” My statistics professor, who was also sitting in on the defense, said calmly, after I hemmed and hawed and tried to defend my answer in a long and drawn out way, “That’s the right answer.” In other words, when I proposed my study and laid out my methodology, I stated that I would do “x, y, and z.” If I later decided to do “q” simply because I thought “q” was more interesting, I wouldn’t have necessarily answered the research question that I set out to answer, nor would my methods be as strong as I initially put forward.

I bring all of this up this week because as I’ve been sitting in on the weekly meetings of my research team these past months, I can’t help but notice how often new questions are asked and how often those questions result in an awareness that the data needed to answer them is missing. This fact then leads to a lot of going back and gathering the missing data. Sometimes this is possible and sometimes it isn’t. For instance, you might go to see your doctor one time and you’re asked the question, “Do you smoke?” But the next time you visit, the nurse doesn’t ask you that same question. Usually, you’re asked something like, “Are you still taking (name the medication)?” You answer, “Yes,” but you fail to mention that you’ve changed dosage. Or that your doctor changed the dosage sometime during the past year. Is that captured in the record? Maybe, maybe not. And further, some insurance carriers require certain patient information while others do not. If you’re drawing subjects for a study from multiple insurance carriers, you’d better be sure that each is collecting all of the data that you need, otherwise you cannot compare the groups. As the analyst on our study said yesterday, “If you can’t get all of the data, you might as well not get any of it.”

Now please remember that I am working as an informationist on a study led by two principal investigators and a research team that has been doing research for a very long time. They have secured any number of big grants to do big studies. They are well-respected and know a whole helluva lot more about clinical research than me and my little master’s-thesis-experienced self. I’m not questioning their methods or their expertise at all. Rather, I’m pointing out that this kind of research – research that involves a lot of people (25+ on the research team), thousands of subjects, a bunch of years, several sources of data (and data and data and data…), and a whole lot of money over time – is messy. Really, really messy! In other words, an awful lot, if not the majority, of biomedical and/or health research today is messy. And as an observer of such research, I cannot help but wonder how in the world these studies could ever be replicated. As that issue of Science noted, research today is at a moment when so many factors are affecting the outcomes that it’s a time for those involved in it to stop and evaluate these factors, and to ensure that the work being done – the science being done – meets high standards.

More, as a supposed “expert” in the area of information and a presumed member of the research team, I’m feeling at a loss as to what I can do, at this point in the study, to clean it up. Yes, I admit that yesterday just wasn’t my best day on the study and maybe that’s coloring part of my feelings today. I didn’t have anything to offer in the meeting. I didn’t feel like much of a part of the team. It happens.

So can I take a lesson from the day’s events? The answer to that is unequivocally “YES!” and here’s why…

In the afternoon, I had a meeting with a different PI for a different study. We’re exploring areas where I can help her team; writing up a “scope of work” to embed me as an informationist on the study. It’s a very different kind of study and not as big as the mammography study (above), but it still involves multiple players across multiple campuses, and it ultimately will generate a whole bunch of data from a countless number of subjects. The biggest difference, though, is timing. And this is the take-away lesson for me in regard to what brings success to my role. When a researcher is just putting together his/her team, when s/he is just beginning to think about the who and what and where and why of the study, if THEN s/he thinks of including an individual with expertise in information, knowledge, and/or data management, the potential value of that person to the team and to the work is multiplied severalfold.

This is because it’s in the beginning of a study when an informationist can put his/her skills to use in building the infrastructure, the system, and/or the tools needed to make the flow of information and data and communication go much more smoothly. It’s hard to go back and fix stuff. It’s much easier to do things right from the beginning. Again, I’m not saying that the mammography study is doing anything wrong, but building information organization into your methods from the get-go can surely help reduce the headaches down the road. And fewer headaches + cleaner data = better science, all the way around.

He Said, She Said (and who can possibly remember?)

13 Feb

One of the tasks I have as an informationist on the study team is to help improve communication. In fact, it’s Aim #1 in the proposal we wrote to the National Library of Medicine for the grant: “Develop tools to improve data specification and communication.” For most of the past month or so, I’ve been working on a data request form. Back and forth and back and forth we go with iterations of it. Last week, it finally went through a test-drive as one of the principal investigators used it to request several analyses from our analyst. (Isn’t it convenient for an analyst that s/he does analyses? So clear. An analyst analyzes. A librarian… librarianizes? We should be so lucky.) It’s back in my hands now to make a few more tweaks based upon her feedback, but it’s coming along nicely. Hopefully, it will become a well-used tool in the future, making the communication of statistical analyses between requester and analyst more efficient.

As I sat in on yesterday’s meeting, I heard in the conversation another area where a tool would help improve communication between team members. Much of the history of this study can be found in email correspondence. Often, someone will say something like, “I remember that we changed such and such to so and so back in 2010,” and the indication is that somewhere in the virtual mound of emails of 2010, there exists documentation of this change. Everyone remembers the email, the discussions during team meetings, the outcome, etc. but the details are sometimes lacking. When it comes to writing articles, however, a lot of these details become very important pieces of information needed to describe exactly what happened and when. I began to wonder if we had a searchable archive of all of the email involved in the study, would it be a useful tool for the team. I posed the question later in the afternoon (via an email, of course!) and heard back from several people that they agreed.

To figure out how to accomplish this task, I began searching for things like communication log software, email exporters, and tools for Outlook. I also revisited Zoho Creator to see how and if it could work to create a database for these things. Basically, my thinking was to export pertinent fields like date, sender, and body of the email; index them (using tags); and make them searchable. Then, if someone was curious about the development of the phone counseling system, s/he could do a search for “MCRS” in all of the emails and receive a nice, chronological report of everything communicated about the process during the software development. “This is good!” I thought.
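For anyone who would rather see the idea than read about it, here is a minimal sketch of that kind of search in Python. It is not what I built in Zoho Creator; it only illustrates the logic. The column names (date, sender, body) and the three sample emails are made up for the example, and a real export would need whatever field names and date format your export tool produces.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample standing in for an exported CSV file.
# The column names (date, sender, body) are assumptions for illustration.
SAMPLE_CSV = """date,sender,body
2010-06-14,analyst@example.org,Updated the MCRS call script per Tuesday's meeting.
2010-03-02,pi@example.org,Can we add a field to MCRS for callback time?
2011-01-20,coordinator@example.org,Cohort 1 recruitment letters went out today.
"""


def search_emails(csv_text, term):
    """Return rows whose body mentions `term` (case-insensitive), oldest first."""
    rows = csv.DictReader(io.StringIO(csv_text))
    hits = [row for row in rows if term.lower() in row["body"].lower()]
    # Sort by the parsed date so the report reads chronologically.
    return sorted(hits, key=lambda row: datetime.strptime(row["date"], "%Y-%m-%d"))


for hit in search_emails(SAMPLE_CSV, "MCRS"):
    print(hit["date"], hit["sender"], hit["body"], sep=" | ")
```

The search is a plain substring match, which is enough for an acronym like “MCRS.” Tags would need their own column in the export.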


Screen Capture of search results. A mini test.

I set to work downloading the add-on tool for Outlook that I decided on, CodeTwo Outlook Export. It was pretty straightforward, no hiccups or frustrations. Then I practiced exporting the “Informationist” folder in my email inbox. The export gave me a CSV file that I then opened in Excel. I didn’t get exactly what I wanted, so I tried a few other export field options until it looked right. At this point though, I could tell there would be a good bit of cleanup to do in the Excel file. We have a lot of stuff in the body of emails – stuff that runs all together in an Excel cell. I decided to delete content in the body of the emails that was irrelevant and/or redundant. This helped a lot. Once I had the spreadsheet the way I wanted it, I then uploaded it into a new application in Zoho Creator, did some more tweaking here and there, and eventually got something that worked!
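I did my cleanup by hand in Excel, but part of it could probably be scripted. The sketch below is one guess at how: it assumes that the redundant content is mostly quoted replies (lines starting with “>”) and that the run-together text comes from stray line breaks and extra spaces. The function name tidy_body is hypothetical, and real emails will have other kinds of clutter (signatures, disclaimers) that this does not touch.

```python
import re


def tidy_body(body):
    """Drop quoted-reply lines and collapse whitespace so the text fits one cell."""
    # Assumption: lines beginning with ">" are quoted text from earlier emails.
    kept = [line for line in body.splitlines() if not line.lstrip().startswith(">")]
    # Join what is left and squeeze every run of whitespace into a single space.
    return re.sub(r"\s+", " ", " ".join(kept)).strip()


raw = (
    "Sounds good.\r\n\r\n"
    "> On Tuesday you wrote:\r\n"
    "> Should we change the form?\r\n"
    "  See you   Thursday."
)
print(tidy_body(raw))
```

Run over the body column before uploading, something like this would leave only each sender’s own words in the cell.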

Admit it. It’s always a rush when you create something, isn’t it? 

I sent some screenshots to the team members and asked for feedback. Already I’ve heard from several who think it’s a great idea! It will take some doing to collect and clean up several years of emails related to the study from everyone involved, but I think it will be a real help. Also, the system will be in place for future studies. As a matter of fact, I already have laid out in my mind how I can use this with the new CER group that I’m going to be embedded in soon. As their email list is fairly new, it will be a much easier start-up.

If you decide to try either of these tools – or if you’ve instituted a similar email archive to help with communication within a group – I hope you’ll share your experience in the comments section here. It will be great to hear what works for others.