My second full day of on-the-job informationist-ing leaves me reflecting upon two things:
- Language is one big, messy pot of mess, and
- Everything I needed to know about data I learned in statistics.
Let’s look at these a little closer. Those who also know me on Facebook can tell you that my status update yesterday afternoon read, “When you really stop and think about communication, you realize that it’s a MIRACLE that we understand one another even half the time.” I can’t share the conversation verbatim, but it’s close enough to say that part of the process evaluation meeting yesterday morning went something like:
So X represents those eligible. Okay. Now Y should be the number of eligible less those eligible for study. And then Z, those eligible and approved, represents a subgroup that should add up to those eligible for the intervention plus those approved, less X. Aha! So now I see that our final N is correct!
Confused? How could you not be? At first I thought it was only me, being new to the process and all, but I admit that I felt a lot better when I noticed others around the table also had that crinkly look on their foreheads. When Dr. Costanza asked, “Would you like me to draw that out for you?” I said with a little too much enthusiasm, “YES, PLEASE!”
The good thing about working with this group is that everyone is in agreement that the biggest obstacle in their study right now is related to communication. In fact, it’s THE reason they were so excited to have me be on their team. I shared last week about the inherent complexities of multiple data sources, many people on the team, several sites/locations involved, tens of thousands of subjects, etc. Trouble communicating between and within all of these is expected. So where do we begin in fixing the problem?
Specifically, definitions of words.
And a mandate to quit using the same word to describe multiple things.
Controlled vocabulary is a librarian’s forte. We cringe when we hear it, but Dewey Decimal did indeed go a long way in helping us make our mark as a profession. Organizing, indexing, cataloging… these things work when we create and/or implement some rules that everyone can follow. God knows I hate the Barnes and Noble method of “cataloging.” You need more than “Philosophy” and “Business” and an alphabet, for heaven’s sake. What my research team wants – and desperately needs – is a data dictionary. They need a way to know what “eligible” means and, if there are multiple levels of eligibility, then we need to give each of these a different name and a definition. Either that, or I’m going to re-introduce cave drawings. I think they might work better.
So, tasked with creating said data dictionary, I began (last week and most of yesterday) identifying and collecting any existing code books and/or dictionaries. Once I have them all, I can then merge them together, look for commonalities, create unique identifiers where needed, clear up the fuzzy language, and then, ultimately, implement the use of the dictionary in future communications. Goal: When someone fills out a data request form for a specific set of data elements, the analysts will know just what the researcher wants.
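For the curious, here is a minimal sketch of what that merge-and-compare step might look like in Python. It assumes each existing code book can be boiled down to a simple term-to-definition mapping. The source names (`site_a`, `site_b`), the terms, and the definitions are all made up for illustration; none of them come from the actual study.

```python
from collections import defaultdict

# Hypothetical code books from two sources (invented for illustration).
site_a = {
    "eligible": "Meets age and enrollment criteria",
    "approved": "Physician signed off",
}
site_b = {
    "Eligible ": "Meets criteria and was approved by physician",
    "approved": "Physician signed off",
}


def merge_code_books(books):
    """Collect every definition given for each term, keyed by source."""
    merged = defaultdict(dict)
    for source, book in books.items():
        for term, definition in book.items():
            # Normalize case and whitespace so "Eligible " matches "eligible".
            key = term.strip().lower()
            merged[key][source] = definition.strip()
    return dict(merged)


def find_conflicts(merged):
    """Return the terms whose sources disagree on the definition."""
    return sorted(
        term for term, defs in merged.items() if len(set(defs.values())) > 1
    )


merged = merge_code_books({"site_a": site_a, "site_b": site_b})
conflicts = find_conflicts(merged)
print(conflicts)  # ['eligible']
```

Any term that lands in `conflicts` is a candidate for being split into separately named, separately defined entries, which is exactly the "quit using the same word to describe multiple things" mandate.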
Which brings me to Reflection #2: Everything I needed to know about data, I learned in statistics. While one might think that the foundation for building a data dictionary, i.e., a code book, is learned in information science, my experience is different. I learned how to create a code book when I learned how to do statistics. Before you can collect the first bit of data, you have to have a code book in place, defining each element and/or variable that you’re collecting. You need to be clear that this field in this form is answering this question and in this way. The “this” is really important. I learned a lot about how to organize information in library school, but I learned about collecting information in … statistics.
And I didn’t take statistics in library school.
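To make the code-book-first idea a little more concrete, here is a rough sketch in Python of a code book that exists before any data is collected, and a check that holds incoming records to it. Everything named here is hypothetical: the `Variable` class, the field names `eligible_study` and `eligible_intervention`, and the 0/1 coding are assumptions chosen for illustration, not the research team's actual definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    """One code book entry: this field answers this question, in this way."""
    name: str       # the field on the form
    question: str   # the question the field answers
    allowed: dict   # permitted codes mapped to their meanings


# Hypothetical code book: two distinct kinds of "eligible", each with
# its own name and definition.
CODE_BOOK = {
    v.name: v
    for v in [
        Variable(
            "eligible_study",
            "Does the subject meet the study's inclusion criteria?",
            {0: "no", 1: "yes"},
        ),
        Variable(
            "eligible_intervention",
            "Does the subject qualify to receive the intervention?",
            {0: "no", 1: "yes"},
        ),
    ]
}


def validate(record, code_book):
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    for field, value in record.items():
        if field not in code_book:
            problems.append(f"{field}: not defined in code book")
        elif value not in code_book[field].allowed:
            problems.append(f"{field}: {value!r} is not an allowed code")
    return problems


# A record that uses the ambiguous, undefined word "eligible" gets flagged.
print(validate({"eligible_study": 1, "eligible": "yes"}, CODE_BOOK))
```

The point of the sketch is the order of operations: the definitions come first, and a bare "eligible" with no entry in the code book is rejected rather than guessed at.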
I admit that I entered into my informationist role with a bias. I’m convinced that library schools need to – must – start requiring those students who wish to become academic or research librarians of any sort to do original research. Along with research methods, statistics is the foundation for working with data. We’re simply ill-prepared to embed ourselves into a research team and work with data effectively, to help solve issues related to data, if we don’t know much about it. Yes, you can do it otherwise, but I fear the learning curve is awfully steep, and given all of the other stressors that come simply from trying to get everything done at work nowadays, the fewer hills you have to climb, the better.
Librarians have a head start in that we understand information, but I worry that we too often use the words “data” and “information” interchangeably. That’s a mistake. They have different definitions. They mean different things. And they require different skills when dealing with them.
You could look it up.