Is Big Data Missing the Big Picture?

27 Apr

When I was defending my graduate thesis a number of years ago, I was asked by one of the faculty in attendance to explain why I had done “x” rather than “y” with my data. I stumbled for a bit until I finally said, somewhat out of frustration at not knowing the right answer, “Because that’s not what I said I’d do.” My statistics professor was also in attendance and, as I quickly tried to backtrack from my response, piped up, “That’s the right answer.”

As I’ve watched and listened to and read and been a part of so many discussions about data – data sharing, data citation, data management – over the past several years, I often find myself thinking back on that defense and my answer. More, I’ve thought of my professor’s comment; that data is collected, managed, and analyzed according to certain rules that a researcher or graduate student or any data collector decides from the outset. That’s best practice, anyway. And such an understanding always makes me wonder if in our exuberance to claim the importance, the need, the mandates, and the “sky’s the limit” views over data sharing, we don’t forget that.

I really enjoyed the panel that the Medical Library Association put together last week for their webinar, “The Diversity of Data Management: Practical Approaches for Health Sciences Librarianship.” The panelists included two data librarians and one research data specialist: Lisa Federer of the National Institutes of Health Library, Kevin Read from New York University’s Health Sciences Library, and Jacqueline Wirz of Oregon Health & Science University, respectively. As a disclosure, I know Lisa, Kevin, and Jackie each personally and consider them great colleagues, so I guess I could be a little biased in my opinion, but putting that aside, I do feel that they each have a wealth of experience and knowledge in the topic and it showed in their presentations and dialogue.

Listening to the kind of work and the projects that these data-centric professionals shared, it’s easy and exciting to see the many opportunities that exist for libraries, librarians, and others with an interest in data science. At the same time, I admit that I wince when I sense our “We can do this! Librarians can do anything!” enthusiasm bubble up – as occasionally occurs when we gather together and talk about this topic – because I don’t think it’s true. I do believe that individually, librarians can move into an almost limitless career field, given our basic skills in information collection, retrieval, management, preservation, etc. We are well-positioned in an information age. That said, though, I also believe that (1) there IS a difference between information and data and (2) the skills librarians have as a foundation in terms of information science don’t, in and of themselves, translate directly to the age of big data. (I’m not a fan of that descriptor, by the way. I tend to think it was created and is perpetuated by the tech industry and the media, both wishing we’d believe things are simpler than they ever are.) Some librarians, with a desire and propensity towards the opportunities in data science, will find their way there. They’ll seek out the extra skills needed and they’ll identify new places and new roles that they can take on. I feel like I’ve done this myself and I know a good handful of others who’ve done the same. But can we sell it as the next big thing that academic and research libraries need to do? Years later, I still find myself a little skeptical.

Moving beyond the individual, though, I wonder if libraries and other entities within information science, as a whole, don’t have a word of caution to share in the midst of our calls for openness of data. It’s certainly the belief of our profession(s) that access to information is vital for the health of a society on every level. However, in many ways it seems that in our discussions of data, we’ve simply expanded our dedication towards the principle of openness to information to include data, as well. Have we really thought through all that we’re saying when we wave that banner? Can we have a more tempered response and/or approach to the big data bandwagon?

Arguably, there are MANY valid reasons for supporting access in this area: peer review, expanded and more efficient science, reproducibility, transparency, etc. Good things, all. But going back to that lesson that I learned in grad school, it’s important to remember that data is collected, managed, and analyzed in certain ways for a reason; things decided by the original researcher. In other words, data has context. Just like information. And like information, I wonder (and have concern for) what happens to data when it’s taken out of its original context. And I wonder if my profession could perhaps advocate this position, too, along with those of openness and sharing, if for nothing more than to raise the collective awareness and consciousness of everyone in this new world. To curb the exuberance just a tad.

I recently started getting my local paper delivered to my home. The real thing. The newsprint newspaper. The one that you spread out on the kitchen table and peruse, page by page. You know what I’ve realized in taking up this long-lost activity again? When you look at a front page with articles about an earthquake in Nepal, nearby horses attacked by a bear, the hiring practices of a local town’s police force, and gay marriage, you’re forced to think of the world in its bigger context. At the very least, you’re made aware of the fact that there’s a bigger picture to see.

When I think of how information is so bifurcated today, I can’t help but ask if there’s a lesson there that can be applied to data before we jump overboard into the “put it all out there” sea. We take research articles out of the context of journals. We take scientific findings out of the context of science. We take individual experiences out of the context of the very experience in which they occur. And of course, the most obvious, we take any and every politician’s words out of context in order to support whatever position we either want or don’t want him/her to support. I don’t know about you, but each and every one of these examples appears as a pretty clear reason to at least think about what can and will happen (already happens) to data if and when it suffers the same fate.

Are there reasons why librarians and information specialists are concerned with big data? Absolutely! I just hope that our concern also takes in the big picture.

 

Summer Picks

18 Jul

I’ve but a short post to share this week. Honestly, it’s just too hot to even think clearly enough to write, BUT not to read. With this in mind, I thought I’d share a few of the informationist-related books that I’m working through this summer. If you have others to contribute or thoughts to share about any of these, I hope you’ll do so in the comments section.

Beginning Database Design, Clare Churcher

It’s true that most librarians learn about database design in grad school and it’s surely a skill that we should have expertise in throughout our careers, but a good refresher text is never anything to sniff at. I picked up this one at the MIT bookstore when I was taking the Software Carpentry Bootcamp several weeks back. It’s a keeper for the bookshelf on my desk.

Visualize This, Nathan Yau

Data Points: Visualization that Matters, Nathan Yau

These two books by Nathan Yau, together, are providing me with both a skill set to retrieve data from the Web and a really good understanding of how to present data and/or information so that it makes the most sense to an audience. Yau writes clearly and with a tone that keeps you interested in a topic that, let’s face it, could easily slip into the dry and “put you to sleep” mode. As one with an appreciation for design, I also think that the books are treasures to look at. They’re a great starter set for what is my summer reading’s real focus, data visualization.

Visualizing Data: Exploring and Explaining Data with the Processing Environment, Ben Fry

This one is more technical and dense than Yau’s books. I had a half-price coupon for an O’Reilly Media ebook, so I picked it up. It’s definitely good for reference and troubleshooting, though I know it’s not one that I’ll read cover-to-cover.

The Functional Art: An Introduction to Information Graphics and Visualization, Alberto Cairo

Cairo’s is another really beautiful book to both look at and read. Design is first and foremost. I’m finding Yau’s books more practical for my learning, but I love picking this one up and flipping through its pages every now and then, just because it’s so nice to peruse. But not to sell it short, it’s filled with a lot of good advice for communicating information in a clear and interesting manner. It fits well with the others on my shelf.

Beautiful Visualization: Looking at Data through the Eyes of Experts (Theory in Practice), edited by Julie Steele and Noah Iliinsky

As the title suggests, this is a phenomenal collection of works by many of the leading practitioners of data visualization working today. This is the perfect working informationist beach book, offering a bunch of short, quick reads, each standing on its own, that together give you a really high bar to shoot for if you want to go into this field.

A Simple Introduction to Data Science, Lars Nielsen & Noreen Burlingame

Short and sweet (just 75 pages long), this is a staple on my Kindle. It explains data science in lay terms, yet from the scientist’s (not the librarian’s) point of view. It’s a nice reference to keep handy.

Pretty Good for a Girl: Women in Bluegrass (Music in American Life), Murphy Hicks Henry

And finally, lest you think I’ve completely rearranged all of my life’s priorities, I’m really, (really), enjoying this compilation of women (most forgotten and/or overlooked) from the 1920s to present who have held their own in the male-dominated world of bluegrass music. It’s stellar!

That’s a full beach bag of books for me (and you, if you want to seek some or all of them out) and summer is really only so long. In fact, how many days do I have ’til vacation?!?!

Happy reading and stay cool!

So DO You Want to be a Data Scientist?

28 Mar

Last week, a colleague I follow on Twitter retweeted a post from the NatureJobs blog titled “So you want to be a data scientist” by Michael Koploy of SoftwareAdvice.com. The colleague who originally brought the piece to my attention, Kristi Holmes, PhD, is a bioinformaticist at Becker Medical Library at Washington University in St. Louis School of Medicine. She’s also an all-around good egg and one of my absolute favorite colleagues in the field, but that’s beside the point. I would have read the piece regardless of who tweeted it to my attention. However, because it came from Kristi, we then engaged in a mini tweetchat that we’ve had before, i.e. Where and what is the intersection between data scientists and librarians, if there even is one?

One of the interesting things about this discussion, to me, is that Kristi is a scientist who happens to work in a library, while I am a librarian, trying to work in the arena of scientists. And from our different perspectives, she is the one who is routinely much more optimistic about librarians getting into the area of data than I. There’s probably a thing or two you can decipher from this, but that’s for another time.

Another thing that happened after I retweeted and commented on the post was that I got an email from Brittany Richards at Software Advice thanking me for the tweet and asking if I’d do a blog post about their article here on the Librarian Hats blog. Specifically, Brittany wrote, “You mentioned library science and I was interested to see your thoughts on how the two are related to each other.”

Now if you’ve read this blog for any time, you know my answer was an enthusiastic, “SURE!” So here goes – a recap of that article and some summarizing of conversations I’ve had with Kristi and other scientists on the topic:

I once saw/heard a librarian give a presentation where he identified himself as a data scientist. I called him on it. I am a librarian with a graduate degree in library & information science. I also have a graduate degree in an applied biological science (exercise physiology). Given that background, I feel pretty comfortable stating that while the two share the word, there is a world of difference between the science that librarians do and that which takes place in laboratories, clinics, the field, etc. As I’ve stated in this blog before, my background in exercise physiology is what I feel gives me the extra tools that I need to be effective as an informationist. That’s the science background that is recognized in the sciences.

I hope you don’t hear me dissing my library degree, education, or career. I’m not at all. They are just different and when I read articles like Koploy’s, as well as many books on data, specifically on libraries’ and librarians’ roles in working with data, I cannot help but keep this thought in mind. It’s what comes to my mind. Every time.

In his post, Koploy recalls the description of a data scientist that he got from Bruno Aziza, a big name in Big Data. Aziza called a data scientist a “business analyst-plus.” He highlights mathematics, statistics, and business strategy as their core skills. Koploy himself adds, “While programming and statistical expertise is the foundation for any data scientist, a strong background in business and strategy can help jettison a younger scientist’s career to the next level.” Further, he notes that successful data scientists are drawn from the fields of biostatistics, econometrics, engineering, computer science, and the like. I’ve read the article several times. Library or information science is not on the list.

Again, this isn’t a slight against my field, but rather an observation that there are different skill sets required for different jobs and the job of a data scientist is not the job of a librarian. And vice versa.

So the question then becomes, how much does a librarian – or an informationist – need to learn to become a data scientist? I say, “A lot.” However, that “a lot” comes with the assumption that one isn’t entering data science from one of those previously mentioned fields. If this is the case, then of course, that individual is well prepared. You’ll note though, that even with the background, Koploy points out that data science is (1) fast-growing, (2) extremely competitive, and (3) new. Even the most seasoned statistician needs to learn some new skills and/or subjects to keep up.

The optimistic among us – those who believe the cross-over between information and data science is broad – focus upon those characteristics that are, in fact, mentioned by experts in the data science field as ones that separate the exceptional data scientist from the average: inquisitiveness, the ability to spot trends, and the tendency (skill) to ask the right questions. It’s the last of these where librarians, informationists, and information scientists all have experience and often excel. We know how to ask the right questions that get to the heart of information problems, e.g. How does the business work? How does it collect data? How will it use the data? (per Krishna Gopinathan, Global Analytics Holdings)

So, do you want to be a data scientist? If you’re a librarian or an informationist, depending upon your background, you may have a little or a lot of work to do to get ready to take on the role. If you don’t have the background, I see two possibilities:

  • Get it (hit the books!)
  • Find the right partner(s) where your skills can be paired to produce a good data science team

We choose careers for a lot of different reasons, but I like to believe that in the best case scenario, we choose something that we’re both interested in and good at. Remember those aptitude tests you took in the guidance counselor’s office in high school? They were (and still are) meant to measure something. They measure what we like and what we have an aptitude for. They measure what career would fit us best. It means something to be a librarian. It also means something to be a scientist. I believe that it’s a sign of the times, and a bit of a challenging time at that, that careers and skills and tasks that once sat neatly within cubicles and labs and computer workstations are now all mixed up together. This melting pot of vocations is difficult to navigate. On the one hand, it opens a wealth of new opportunities. On the other, though, it means that everyone working with information and/or data will never enjoy sitting back and doing the same old same old for very long.

If you’re interested, I also encourage you to read the original piece that Michael Koploy wrote, along with some of the links he suggests for further reading. In particular, I really enjoyed Hilary Mason’s blog. Good stuff there. I also happened to notice, just this morning, that Coursera’s free Introduction to Data Science class that’s listed is starting up in the not too distant future. If it piques your interest, give it a go. You might well find that you have a hidden talent that will take you far in this new area.

Which brings me full-circle to the question I began with, i.e. Is this new area in the library? Well, quite obviously there are individuals like Kristi, bioinformaticists and data scientists who find their home in libraries*. There are also librarians or informationists with training in data science who find their homes outside of the library. And then there are librarians. And then there are data scientists. In other words, there’s a big mix of us. If you’re comfortable in the mix and you’re up to the task of getting and/or honing new skills, you’ll likely do really well wherever you are.

The times they are a changin’, sings Mr. Dylan, and we look to change with them. At the same time, though, we need to be realistic. We need to see clearly what we know, what we do well, what we like, and more. We need changes in graduate education across the board to address these issues, and likewise those of us working need to accept that we’ll be learning for a lifetime. These are the times we live in. You can’t just call yourself something different. You need to do something different. Or do things differently. Likely all of the above.

Rockin’ out with my pals, The Special Agents, at Houghton Elementary School. Support art, music, and physical education in your public schools, people! You could get a band out of it.

Now I’m off to play drums with a friend’s band, dressed up like the Cat in the Hat. You’ve got to have a really big tool box o’ skills, friends. Really big!

* And then there’s the matter of money. If you have the chops to get a job as a data scientist, are you willing to work in a library for about half of what you could make in business or industry? It’s a question that comes up in our professional discussions often. If you want to have at it in the comments section to this post, go for it!
