Archive | March, 2013

So DO You Want to be a Data Scientist?

28 Mar

Last week, a colleague whom I follow on Twitter retweeted a post from the blog NatureJobs titled “So you want to be a data scientist,” by Michael Koploy of SoftwareAdvice.com. The colleague who originally brought the piece to my attention, Kristi Holmes, PhD, is a bioinformaticist at Becker Medical Library at Washington University in St. Louis School of Medicine. She’s also an all-around good egg and one of my absolute favorite colleagues in the field, but that’s beside the point. I would have read the piece regardless of who tweeted it to my attention. However, because it came from Kristi, we then engaged in a mini tweetchat that we’ve had before, i.e. Where and what is the intersection between data scientists and librarians, if there even is one?

One of the interesting things about this discussion, to me, is that Kristi is a scientist who happens to work in a library, while I am a librarian, trying to work in the arena of scientists. And from our different perspectives, she is the one who is routinely much more optimistic about librarians getting into the area of data than I. There’s probably a thing or two you can decipher from this, but that’s for another time.

Another thing that happened after I retweeted and commented on the post was that I got an email from Brittany Richards at Software Advice thanking me for the tweet and additionally, asking if I’d do a blog post of their article here on the Librarian Hats blog. Specifically, Brittany wrote, “You mentioned library science and I was interested to see your thoughts on how the two are related to each other.”

Now if you’ve read this blog for any time, you know my answer was an enthusiastic, “SURE!” So here goes – a recap of that article and some summarizing of conversations I’ve had with Kristi and other scientists on the topic:

I once saw/heard a librarian give a presentation where he identified himself as a data scientist. I called him on it. I am a librarian with a graduate degree in library & information science. I also have a graduate degree in an applied biological science (exercise physiology). Given that background, I feel pretty comfortable stating that while the two share the word, there is a world of difference between the science that librarians do and that which takes place in laboratories, clinics, the field, etc. As I’ve stated in this blog before, my background in exercise physiology is what I feel gives me the extra tools that I need to be effective as an informationist. That’s the science background that is recognized in the sciences.

I hope you don’t hear me dissing my library degree, education, or career. I’m not at all. They are just different and when I read articles like Koploy’s, as well as many books on data, specifically library and librarians’ roles in working with data, I cannot help but keep this thought in mind. It’s what comes to my mind. Every time.

In his post, Koploy recalls the description of a data scientist that he got from Bruno Aziza, a big name in Big Data. Aziza called a data scientist a “business analyst-plus.” He highlights mathematics, statistics, and business strategy as their core skills. Koploy himself adds, “While programming and statistical expertise is the foundation for any data scientist, a strong background in business and strategy can help jettison a younger scientist’s career to the next level.” Further, he notes that successful data scientists are drawn from the fields of biostatistics, econometrics, engineering, computer science, and the like. I’ve read the article several times. Library or information science is not on the list.

Again, this isn’t a slight against my field, but rather an observation that there are different skill sets required for different jobs and the job of a data scientist is not the job of a librarian. And vice versa.

So the question then becomes, how much does a librarian – or an informationist – need to learn to become a data scientist? I say, “A lot.” However, that “a lot” comes with the assumption that one isn’t entering data science from one of those previously mentioned fields. If this is the case, then of course, that individual is well prepared. You’ll note though, that even with the background, Koploy points out that data science is (1) fast-growing, (2) extremely competitive, and (3) new. Even the most seasoned statistician needs to learn some new skills and/or subjects to keep up.

The optimistic among us – those who believe the cross-over between information and data science is broad – focus upon those characteristics that are, in fact, mentioned by experts in the data science field as ones that separate the exceptional data scientist from the average: inquisitiveness, the ability to spot trends, and the tendency (skill) to ask the right questions. It’s the last of these where librarians, informationists, and information scientists both have experience and often excel. We know how to ask the right questions that get to the heart of information problems, e.g. How does the business work? How does it collect data? How will it use the data? (per Krishna Gopinathan, Global Analytics Holdings)

So, do you want to be a data scientist? If you’re a librarian or an informationist, depending upon your background, you may or may not have a little or a lot of work to do to get ready to take on the role. If you don’t have the background, I see two possibilities:

  • Get it (hit the books!)
  • Find the right partner(s) where your skills can be paired to produce a good data science team

We choose careers for a lot of different reasons, but I like to believe that in the best case scenario, we choose something that we’re both interested in and good at. Remember those aptitude tests you took in the guidance counselor’s office in high school? They were (and still are) meant to measure something. They measure what we like and what we have an aptitude for. They measure what career would fit us best. It means something to be a librarian. It also means something to be a scientist. I believe that it’s a sign of the times, and a bit of a challenging time at that, that careers and skills and tasks that once sat neatly within cubicles and labs and computer workstations are now all mixed up together. This melting pot of vocations is difficult to navigate. On the one hand, it opens a wealth of new opportunities. On the other, though, it means for everyone working with information and/or data, we will never enjoy sitting back and doing the same old same old for very long.

If you’re interested, I also encourage you to read the original piece that Michael Koploy wrote, along with some of the links he suggests for further reading. In particular, I really enjoyed Hilary Mason’s blog. Good stuff there. I also happened to notice, just this morning, that Coursera’s free Introduction to Data Science class that’s listed is starting up in the not too distant future. If it piques your interest, give it a go. You might well find that you have a hidden talent that will take you far in this new area.

Which brings me full-circle to the question I began with, i.e. Is this new area in the library? Well, quite obviously there are individuals like Kristi, bioinformaticists and data scientists who find their home in libraries*. There are also librarians or informationists with training in data science who find their homes outside of the library. And then there are librarians. And then there are data scientists. In other words, there’s a big mix of us. If you’re comfortable in the mix and you’re up to the task of getting and/or honing new skills, you’ll likely do really well wherever you are.

The times they are a changin’, sings Mr. Dylan, and we look to change with them. At the same time, though, we need to be realistic. We need to see clearly what we know, what we do well, what we like, and more. We need changes in graduate education across the board to address these issues, and likewise those of us working need to accept that we’ll be learning for a lifetime. These are the times we live in. You can’t just call yourself something different. You need to do something different. Or do things differently. Likely all of the above.

Rockin’ out with my pals, The Special Agents, at Houghton Elementary School. Support art, music, and physical education in your public schools, people! You could get a band out of it.

Now I’m off to play drums with a friend’s band, dressed up like the Cat in the Hat. You’ve got to have a really big tool box o’ skills, friends. Really big!

* And then there’s the matter of money. If you have the chops to get a job as a data scientist, are you willing to work in a library for about half of what you could make in business or industry? It’s a question that comes up in our professional discussions often. If you want to have at it in the comments section to this post, go for it!

Informationist Map: Climb Aboard!

27 Mar

My poster for next week’s eScience Symposium for Librarians. If you’re coming, you can pick up your own pocket-sized map!

Let’s Ask the Expert

26 Mar

The research team has a new statistician; not a new analyst, but a new statistician. If you look at it as a pecking order, the statistician oversees the analyst. Our former statistician retired recently, leaving the team to find a replacement. The University has a relatively new Quantitative Health Sciences Department and many of the services once procured through individual department statisticians are now going through QHS. Or at least this is how I think it’s going. These are things that I don’t necessarily need to know and as I have plenty of things occupying my “need to know” gray matter right now, I can just follow along here.

The significance of the new team member, to me, was that it generated the need for a meeting so that he could be brought up to speed on the project. This meeting happened this afternoon. I believe it was good for him (as well as the Chair of the Quantitative Methods Core, his boss, also in attendance). I know that it was good for me. I’ve now heard the project and its various aspects described on a number of occasions, and each time gain some new insight. Today, that insight was that I have a pretty good grasp on where the data for this study comes from, the different sources that generate it, how it’s stored, where it’s stored, who’s managing it, and so forth. I also had a pretty clear understanding of where the problem spots and/or issues with it are (mostly gone over, yet again, in today’s morning meeting).

I decided to pay close attention during the meeting to the questions that the statistician asked. I imagine that these are the kinds of questions that an informationist, embedded librarian, or anyone concerned with data management and planning would ask a research team. Here are some that I noted. If you’re doing an interview with a researcher about his/her data, are you asking these questions?

  • Is the data in one place or multiple places? 
  • Do the different sources merge together easily?
  • Are the variable names consistent across the sources?
  • Where is the merged data stored and how?
  • When and/or how often do you do data pulls from the sources?
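The question about variable names is one that lends itself to a quick check before anyone tries to merge anything. Here is a minimal sketch in Python, assuming each source can hand you a list of its variable (column) names. The function name, the source names, and the columns below are all made up for illustration; none of them come from the study.

```python
def compare_variable_names(sources):
    """Compare variable names across data sources.

    `sources` maps a source name to an iterable of its variable (column) names.
    Returns the names shared by every source, plus, for each source, the names
    that appear somewhere else but not in that source.
    """
    name_sets = {src: set(cols) for src, cols in sources.items()}
    if not name_sets:
        return [], {}
    all_names = set().union(*name_sets.values())
    shared = set.intersection(*name_sets.values())
    # For each source, list the names used elsewhere that it lacks
    missing = {src: sorted(all_names - names) for src, names in name_sets.items()}
    return sorted(shared), missing


# Hypothetical example: two sources that label the same field differently
sources = {
    "clinic_records": ["patient_id", "visit_date", "smoker"],
    "claims_data": ["patient_id", "visit_date", "smoking_status"],
}
shared, missing = compare_variable_names(sources)
print(shared)
print(missing)
```

A mismatch like "smoker" versus "smoking_status" shows up in the second result, which is exactly the kind of thing worth raising with the team before the merge rather than after.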

Additionally, the statistician said that he wanted to be walked through the process. He wanted to generate a visual for himself of how everything works together. I found this request confirmation of much of what I’ve been reading and thinking about in terms of how we best see, understand, and communicate systems and processes. Visuals are important. I remember meeting with one of the chief programmers a few months back and how helpful it was when he pulled out a marker and drew us a picture on the whiteboard to explain all of this.*

*NOTE: If you’re interested in the art of explanation, check out The Art of Explanation by Common Craft founder, Lee Lefever. I’m pretty sure I mentioned this a few posts back, but in case you missed it… Also, Common Craft has made wonderful templates of their cut-out characters available for free to download and use in your own creations. Give it a try and see how well you do at explaining a concept or problem. Make a little video and share it with me.

So, if you’re keeping up with the process of the research study, the next step for the statistician is to collect data from the first cohort and start to play with it; see what it shows so far; see if it identifies any gaps of missing data and/or holes in the process that need to be addressed. It’ll be a couple of months, at least, before we hear back, but it was obvious that the team was excited about this move.

A few questions that I’m left with, following today, are:

  • What’s the difference between an analyst and a statistician?
  • What is my role, if any, in this aspect of the study?

One last interesting aside – When we went around the table to introduce ourselves and I said, “I’m from the library, serving as the informationist,” Dr. Barton, the Director of the Quantitative Methods Core said, “Oh, good.” I’m the only one who got an “Oh, good.” I’ve no idea what he meant by it, but I like to see it as a positive sign that my library is engaged in this kind of work. Regardless, it was a nice gesture.

17 Mar

salgore:

This is a post from my personal blog, but I’m sharing it on the “Librarian Hats” blog for my librarian friends who might want to see some fun work on parade!

Originally posted on blahg, blahg, blahg...:

I have several sketchbooks traveling the country (and Canada) this year through projects of The Sketchbook Project (Art House Co-op) of Brooklyn, NY. If one comes to your town, I hope you’ll take the chance to seek it and its many friends on the road trips.

The Memoir Project

500 handwritten books from writers and illustrators around the globe.

  • Brooklyn – June 28-30
  • San Francisco – July 26-28
  • Washington, DC – August 16-18

The Mysterious Maps Tour Mobile Library Tour

The Mystery Maps Tour asks you to make original maps of real and imagined places.

  • Providence, RI – June 13
  • Portland, ME – June 14
  • Montreal, Quebec – June 17

The 2013 Sketchbook Tour

11,000 sketchbooks on the road starting March, 2013. Check ‘em out!
Brooklyn, Austin, Atlanta, Toronto, Chicago, Portland (OR), San Francisco, Chicago, and Los Angeles.

What does success look like?

15 Mar

In her memoir, The Mighty Queens of Freeville, author, columnist, and occasional panelist on NPR’s “Wait, Wait… Don’t Tell Me!,” Amy Dickinson, writes, “I am surrounded by people who are unimpressed with me.” It’s perhaps my favorite line in a book that I really liked. The self-deprecating humor of famous folks. It’s funny. As I was walking from the parking lot to the library this morning, I couldn’t help but think that it’s been a bit of a surreal week for me as I’ve had encounters with some incredibly successful people. To paraphrase Ms. Dickinson, “I am surrounded by impressive people… and I remain impressed by them.”

It’s a testament to the world we live in, the social media aspect of it in particular, that I had the week that I had…

Friday, March 8 - Brattleboro, VT

Rosanne Cash

Last Friday, my spouse and I traveled to Vermont to see Rosanne Cash and John Leventhal in concert. This was not the first time I got to meet Rosanne. We became acquainted via Twitter about a year and a half ago. I followed her. We tweeted back and forth to one another a few times. She started following me. In November of 2011, I got tickets to see her perform in Fall River, MA and asked if she’d be so kind as to let me say hi to her after the show. Ever gracious, she did. The same happened last Friday. Kind and funny and smart and one of the greatest singer-songwriters of our time, she gave me a hug, joked about our mutual love of ironing (remembering this from our previous meeting), talked about librarians… it’s one of those moments I’ll cherish. And then, perhaps even more unreal, the next morning as Lynn and I were walking down Main Street, we heard from behind us, “Hello, ladies!” Turned around and there was Rosanne. We chatted for a minute on the sidewalk in Brattleboro, VT like some kind of old friends. Pinch me.

Sherry Pagoto

Bright research star here on our campus, #Plankaday Nation co-founder, author of the #1 health blog of 2011 (FUdiet), and one of my biggest advocates for the new work I’m doing on campus, Sherry Pagoto and I hung out in her office on Tuesday to work on the details of a proposal that will allow me to work on her President’s Award grant. She took me on a few years ago as an exercise physiologist for one of her studies and today is a fantastic champion of me as an informationist. We have a Nobel Laureate on campus, a few Howard Hughes investigators, and some really outstanding leaders in biomedical and health sciences research. How I got lucky enough to have one of them in my corner… well, pretty lucky!

Facebook chatting with Amy, Wednesday, March 13.

Amy Dickinson

As mentioned earlier, I’m also a fan of Amy Dickinson, the Amy of the syndicated advice column, “Ask Amy.” We also “met” through Twitter and I take part in the discussions she tosses out on her Facebook page. She’s promised to take part in my “Jam 51” birthday party, if she can. Maybe the folks in Freeville are unimpressed, but not me. I’m counting on the face-to-face meeting in the future. In the meantime, I’m working up some turmoil in my life so that I can call into her Thursday noontime webcast from the Chicago Tribune. And look! She’s hoping for the same. :)

Suzy Becker

I had lunch yesterday with my uber talented and brilliant friend, Suzy Becker. Suzy is an author and a cartoonist and a teacher and one of those people you’d hate if she weren’t so darned nice. We talk over chicken shawarma sandwiches about girls’ high school basketball, her next book, her latest class at the Worcester Art Museum, her innate aptitude for Twitter, Lynn’s and my trip to Brattleboro, the PBS documentary “Makers,” why no women have sports talk shows, and the fact that she’s been on the Diane Rehm Show three times (3 times?!). She gives me a lucky horse shoe as a belated birthday present. I’m going to hang it in my new studio. She leaves to get her kid to the dentist on time and I walk home, still thinking about talking to Diane Rehm and helping someone with a Ford Foundation grant and knowing someone who’s putting together a new radio show and… lunch with me?

So “Lean In” on This

As I think about my week and how it intersected in different ways with 4 unbelievably successful women, I notice how not a single one of them fits the mold of “success” that Sheryl Sandberg espouses in her book, “Lean In,” that coincidentally also had a big week. Sandberg has been all over the air waves, sharing her thoughts on why women have not achieved success equal to men, despite now years of “equality.” We need to lean in, be more aggressive, change our priorities. Maybe. If you want to be the CEO of a gazillion dollar enterprise. Me, I’m glad for the successful people that I know (or at least have had the chance to briefly meet) in my life. And incidentally, not a one of them fits Sandberg’s definition of success.

Tip #1 in Daniel Coyle’s, “The Little Book of Talent” is “Stare at Who You Want to Become.” These are some of the people that I stare at. Despite their respective success – and a few of them are darned successful! – I’m not star struck. (Well, maybe a little.) No, just grateful to see and know and have people in my life to stare at, so that I can model the things that they do that bring them success.

How about you? How was your week? Did you find inspiration from anyone? Do you look to certain people to be your models of success?

(As an aside, just as I was finishing this post, my friend and colleague, Lisa Palmer, showed me pictures of her trip to Italy – when Pope John Paul II blessed her in 1983. I think it may have been some divine message for me to stay humble. I am surrounded by people who are unimpressed with me.)

Repeat After Me

13 Mar

Quote from Science

Preparing for some upcoming work, I took part in a webinar on systematic reviews yesterday morning. It was a brief, but good, review/overview of the process and the roles librarians and/or information scientists have in it. One thing that stuck out for me was the reminder by Dr. Edoardo Aromataris of the Joanna Briggs Institute, one of the program’s speakers, that a systematic review is a type of research and as such, it needs to be reproducible. He noted that the search strategy ultimately constructed in a review should yield pretty much the same results for anyone who repeats it.

Replication is a hallmark of the scientific method. As Jasny et al. state in the above-referenced quote from a special issue of Science on data replication and reproducibility, it is the gold standard of research. Science grows in value as it builds upon itself. Without the characteristic of replication, such growth is thwarted and findings become limited to a study’s specific subject pool. If a study’s design becomes so complicated and the research question(s) keep changing along the way, the study’s value gets clouded, if it remains at all.

I remember during my master’s thesis defense, one of my advisers asked me why I hadn’t done a particular statistical analysis to answer another question about the data I collected. I admit that the question threw me, but after thinking about it for a moment, I said, “Because that isn’t what I said that I would do.” My statistics professor, who was also sitting in on the defense, said calmly, after I hemmed and hawed and tried to defend my answer in a long and drawn out way, “That’s the right answer.” In other words, when I proposed my study and laid out my methodology, I stated that I would do “x, y, and z.” If I later decided to do “q” simply because I thought “q” was more interesting, I wouldn’t have necessarily answered the research question that I set out to answer, nor would my methods be as strong as I initially put forward.

I bring all of this up this week because as I’ve been sitting in on the weekly meetings of my research team these past months, I can’t help but notice how often new questions are asked and how often those questions result in an awareness that the data needed to answer them is missing. This fact then leads to a lot of going back and gathering the missing data. Sometimes this is possible and sometimes it isn’t. For instance, you might go to see your doctor one time and you’re asked the question, “Do you smoke?” But the next time you visit, the nurse doesn’t ask you that same question. Usually, you’re asked something like, “Are you still taking (name the medication)?” You answer, “Yes,” but you fail to mention that you’ve changed dosage. Or that your doctor changed the dosage sometime during the past year. Is that captured in the record? Maybe, maybe not. And further, some insurance carriers require certain patient information while others do not. If you’re drawing subjects for a study from multiple insurance carriers, you’d better be sure that each is collecting all of the data that you need, otherwise you cannot compare the groups. As the analyst on our study said yesterday, “If you can’t get all of the data, you might as well not get any of it.”
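One way to get ahead of that problem is to count, source by source, how often each needed field is actually filled in. What follows is only a sketch under my own assumptions: the records are plain Python dictionaries tagged with a "source" key, and the carrier names and fields are invented for illustration. It is not how our study's data is stored or checked.

```python
def missing_by_source(records, required_fields):
    """Count, per source, how many records lack a value for each required field.

    A field counts as missing when the key is absent or its value is None or "".
    """
    counts = {}
    for rec in records:
        per_source = counts.setdefault(
            rec["source"], {field: 0 for field in required_fields}
        )
        for field in required_fields:
            if rec.get(field) in (None, ""):
                per_source[field] += 1
    return counts


# Hypothetical records drawn from two carriers
records = [
    {"source": "carrier_a", "smoker": "no", "dosage_mg": 20},
    {"source": "carrier_a", "smoker": "yes", "dosage_mg": None},
    {"source": "carrier_b", "smoker": "", "dosage_mg": 10},
    {"source": "carrier_b", "dosage_mg": 40},
]
report = missing_by_source(records, ["smoker", "dosage_mg"])
print(report)
```

If one carrier shows a field missing in most of its records, that is a sign the groups may not be comparable on that field, and it is far better to learn that early than at analysis time.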

Now please remember that I am working as an informationist on a study led by two principal investigators and a research team that has been doing research for a very long time. They have secured any number of big grants to do big studies. They are well-respected and know a whole helluva lot more about clinical research than me and my little master’s-thesis-experienced self. I’m not questioning their methods or their expertise at all. Rather, I’m pointing out that this kind of research – research that involves a lot of people (25+ on the research team), thousands of subjects, a bunch of years, several sources of data (and data and data and data…), and a whole lot of money over time – is messy. Really, really messy! In other words, an awful lot, if not the majority, of biomedical and/or health research today is messy. And as an observer of such research, I cannot help but wonder how in the world these studies could ever be replicated. As that issue of Science noted, research today is at a moment when so many factors are affecting the outcomes that it’s a time for those involved in it to stop and evaluate these factors, and to ensure that the work being done – the science being done – meets high standards.

More, as a supposed “expert” in the area of information and a presumed member of the research team, I’m feeling at a loss as to what I can do, at this point in the study, to clean it up. Yes, I admit that yesterday just wasn’t my best day on the study and maybe that’s coloring part of my feelings today. I didn’t have anything to offer in the meeting. I didn’t feel like much of a part of the team. It happens.

So can I take a lesson from the day’s events? The answer to that is unequivocally “YES!” and here’s why…

In the afternoon, I had a meeting with a different PI for a different study. We’re exploring areas where I can help her team; writing up a “scope of work” to embed me as an informationist on the study. It’s a very different kind of study and not as big as the mammography study (above), but it still involves multiple players across multiple campuses, and it ultimately will generate a whole bunch of data from a countless number of subjects. The biggest difference, though, is timing. And this is the take-away lesson for me with regard to what brings success to my role. When a researcher is just putting together his/her team, when s/he is just beginning to think about the who and what and where and why of the study, if THEN s/he thinks of including an individual with expertise in information, knowledge, and/or data management, the potential value of that person to the team and to the work is multiplied several fold.

This is because it’s in the beginning of a study when an informationist can put his/her skills to use in building the infrastructure, the system, and/or the tools needed to make the flow of information and data and communication go much more smoothly. It’s hard to go back and fix stuff. It’s much easier to do things right from the beginning. Again, I’m not saying that the mammography study is doing anything wrong, but building information organization into your methods from the get-go can surely help reduce the headaches down the road. And fewer headaches + cleaner data = better science, all the way around.

What is it again that you do?

7 Mar

Have you ever noticed how if you’re thinking of something in particular, it begins appearing more often in your life? It happens all the time. If you’re thinking of some old song, it pops up on the radio. If you’re thinking of a person you haven’t heard from in a while, you get an email or a letter from them. And if you’ve been thinking about something related to your work – some general idea or a belief about how things go – all of a sudden, everyone is thinking of that idea; everyone believes this (or is actively arguing against it!).

One thing that I’ve noticed the profession of librarianship talk about and/or think about and/or explore over the past decade that I’ve been a librarian is our identity. My role now, as an informationist, is a direct example of this exploration. Informationists are another kind of librarian – another way that we’re doing our job. We try on different names a lot. It’s one strategy for trying to sell our skills and our value to others, oftentimes new groups and/or patrons. As such, we spend a lot of time explaining what we do.

I was in a meeting just this morning where I was asked directly, “So what is it that you’re doing, specifically, for the CER group?” I was asked a very similar question on Tuesday, while giving my lecture to the graduate class on Team Science. It also happened in a meeting last Thursday. It happened in a conversation I was having with a church member the other night. It happens at the supper table on a fairly regular basis. “What is it that you do again?”

I used to think that this was simply a side effect of being a librarian. It’s a profession with such a strong stereotype that whenever I’d share something about my day with someone, s/he would be taken a little aback. When I say, “I couldn’t check out a book to you if I had to,” people are aghast. I say that I do a lot of information and knowledge management, but that jargon (as I was reminded this morning) means little or nothing to most people. I’ve come to see, in my line of work, that what people really want to know is the answer to the question, “What do you do and how will it help me?”

But what I’ve also come to see in my new line of work that takes me out of the library and into the worlds of my patrons, is that my patrons also struggle a lot with answering that same question. Just the other day, I heard a researcher say, “Nobody knows what the hell I do!” And inside, I shouted to myself, “WE’RE NOT ALONE!!”

And it’s true. Do you really know – do you really understand – what your friends, family members, colleagues, or patrons do? As an aside, I always wondered what Ward Cleaver and Steven Douglas did when they went off to the office. My parents were teachers, so I knew what they did, but what the heck did people do in offices all day? I had no idea. Similarly, I can stand on the new sidewalk and look up at the new research building on my campus and wonder just what’s going on in those labs.

As an informationist and/or embedded librarian, one of the skills I’m learning to master is interviewing. Part of a good interview involves clearly explaining to the researchers what I do. This involves practice. I need to think about it (a lot), talk about it with others, make sure that I’m making sense to people both in and outside of my profession. A good interview also involves my being flexible. I need to turn the tables on the researchers and ask them, “What do YOU do?,” and then, as I listen to their answers, I need to be able to think critically and creatively about when and where and how I can insert my skills and expertise into their work. I need to really be able to answer the question, “Where do I fit here?” I’m getting better with this as I do it more, as I’m gaining practice on and off the field.

But the real nugget of new-found knowledge that I want to share here today is this… we’re not alone. The people that we’re trying to help struggle as much as we do in explaining what they do to others. We can make that easier for them in the interview. I asked a cardiologist last week, “What is that?” while pointing to these two medical devices that he had framed on his wall, looking like crossed sabers. And in explaining what they were, I learned a lot about what he does. Changing the tone of the conversation, making it more personable and comfortable and oftentimes less formal, helps both parties involved understand one another better. I wrote a couple of posts back about empathy. That’s what this is – putting one’s self maybe not so much in another’s shoes, but in the same room and on the same level. Being part of the team.

It’s been a big week out of the library. Teaching the Team Science class went really well. I found a couple of other good opportunities for collaboration. I’m exploring another possible grant-funded part on a research team that looks really promising. And by golly, yesterday I spent the last hour of my day figuring out the H-index for an author based upon a long list of his citations he sent me, i.e. some good old fashioned librarian work! It’s still winter and we’re wearing a bunch of hats!
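For anyone who hasn't worked one out by hand, the H-index is the largest number h such that the author has h papers with at least h citations each. Here's a small Python sketch of that arithmetic, assuming you already have a citation count for each paper. The counts below are invented and are not the author's actual numbers.

```python
def h_index(citation_counts):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Hypothetical citation counts, one per paper
counts = [10, 8, 5, 4, 3]
print(h_index(counts))  # 4: four papers have at least 4 citations each
```

The sorting is the part that's tedious by hand with a long list. Once the papers are ranked from most cited to least, you just walk down the list until the citation count drops below the rank.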

 
