Tag Archives: statistics

Show Me the Numbers

3 Feb

I’ve noticed that ever since I became an evaluator, I’m much more attuned to numbers. This isn’t to say that I never paid any attention to numbers before, but now, when I hear stories on the radio or read articles in my local newspaper, I look more closely at what’s being reported regarding those numbers. What’s really being said? And more, I find myself asking, “What do these numbers really represent?” Here’s an example:

This morning, I was listening to a story on NPR about the voter turnout in this week’s Iowa caucus. Specifically, the story was about the turnout among younger voters (17-29 years of age) in Iowa and what, if anything, this turnout says about this voting bloc nationally.

Aside: You can find interesting data regarding the Iowa electorate (as well as other states) on the U.S. Census Bureau’s website. You can find specifics regarding the turnout of younger Iowa voters on the website of CIRCLE (The Center for Information and Research on Civic Learning and Engagement).

But back to the NPR story… Renee Montagne interviewed Kei Kawashima-Ginsberg, director of the Center for Information and Research on Civic Learning and Engagement at Tufts University, about these millennial voters. Phrases like “record numbers” make my ears perk up. “What was the record?” I wonder. “What are we talking about?” In brief, Kawashima-Ginsberg stated, “The youth turnout was 11.2%.”

“11.2% of what?” I ask out loud in my car, to no one.

“On the Republican side, Ted Cruz received 27% of the votes, Marco Rubio 24%, and Donald Trump 19%.”

Again I ask, “27% of what?” No one answers.

Bernie Sanders, I’m told, won 84% of the Democratic vote, compared to Hillary Clinton’s 14%.

“Wow! 84%. That’s a lot! You do keep reporting how he’s winning the hearts of young folks.”

I pull out the notepad that I keep in the dashboard cubby of my car and write down, “Young voters 84%, 14% // 11% = x.” I put the note in my pocket, determined to figure out what these numbers mean. Later, I did.

The total number of young people, defined here as voters between the ages of 17 and 29, who participated in the Iowa caucus was 53,215. What does that look like? I need a visual reference. I think of this demographic and I think of college. It’s a natural reference point for me, a college grad. When I think of college and crowds, I think football. (Plus, the Super Bowl is but a few days away. Think football.) Thus, to give myself the visual that I need, I decide to compare these numbers to the capacities of various college football stadiums. Here’s what I found…

… 53,215 people equals a sold-out crowd for a football game at Rutgers University’s High Point Solutions Stadium.

High Point Solutions Stadium, Rutgers University, Piscataway, NJ

Okay, that’s a good-sized crowd. Granted, it’s not quite half of the capacity of the University of Michigan’s stadium, but let’s remember, it’s Iowa, a state whose population makes up 0.97% of the United States as a whole. Michigan is up there at 3.11%. (All of this data comes from Census.gov.)

Of these 53,215 caucus-goers, 22,415 were Republicans and 30,800 were Democrats. Bernie Sanders won the support of 84% of those 30,800, or approximately 25,800 young people. I need a reference. What do 25,800 people look like? A sold-out crowd at my alma mater, James Madison University’s Bridgeforth Stadium. Go Dukes!

Bridgeforth Stadium, James Madison University, Harrisonburg, VA

Hillary Clinton’s 14%, or 4,312 youthful supporters from Tuesday night, could fit in at Sacred Heart University’s (Fairfield, CT) Campus Field.

Campus Field, Sacred Heart University, Fairfield, CT

Ted Cruz and his 27% of young Republicans (5,828) fill up the Butler Bowl of the Butler University’s Bulldogs in Indianapolis, IN.

Butler Bowl, Butler University, Indianapolis, IN

Marco Rubio’s 5,155 (24%) supporters would fill the stands of the University of Rhode Island’s Meade Stadium, home of the Rams.

University of Rhode Island, Meade Stadium, Kingston, RI

And finally, Donald Trump’s 4,483 supporters, or 19% of the young Republican caucus-goers, would fit nicely in Bryant College’s (Rhode Island) Bulldog Stadium. Or perhaps, more apropos, they could stay approximately 3 to a room in the 1,250 “deluxe guest rooms and palatial suites” of the Trump Taj Mahal casino in Atlantic City.

Bulldog Stadium

Put into these contexts, the numbers make so much more sense to me. Sure, 25,800 people (that 84% Bernie came home with) is a lot of people, but in perspective, my alma mater isn’t exactly a gigantic school. It’s a good-sized school, mind you, but it’s hardly representative of the number of people who might vote in a general election, even if they could all agree on anything, en masse, besides cheering for the Dukes.
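For anyone who wants to do the math themselves, here is a minimal sketch of the arithmetic behind these crowd sizes. It assumes only the totals and percentages quoted above; the names `SHARES`, `head_count`, and `head_counts` are made up for illustration. One thing it surfaces: multiplying the quoted Republican percentages by 22,415 does not land exactly on the head counts given above, which sit closer to 26%, 23%, and 20% of that total. The difference may simply be rounding or a slightly different source, and it is exactly the kind of thing worth checking.

```python
# Totals as stated in the post (young caucus-goers, ages 17 to 29).
YOUNG_DEMOCRATS = 30_800
YOUNG_REPUBLICANS = 22_415

# Candidate -> (base population, quoted percentage). Illustrative structure only.
SHARES = {
    "Sanders": (YOUNG_DEMOCRATS, 84),
    "Clinton": (YOUNG_DEMOCRATS, 14),
    "Cruz": (YOUNG_REPUBLICANS, 27),
    "Rubio": (YOUNG_REPUBLICANS, 24),
    "Trump": (YOUNG_REPUBLICANS, 19),
}


def head_count(base: int, percent: int) -> int:
    """Return `percent` percent of `base`, rounded to the nearest whole person."""
    return round(base * percent / 100)


def head_counts() -> dict[str, int]:
    """Turn every quoted percentage into an approximate number of people."""
    return {name: head_count(base, pct) for name, (base, pct) in SHARES.items()}


if __name__ == "__main__":
    for name, count in head_counts().items():
        print(f"{name}: about {count:,} young caucus-goers")
```

The point of writing it down this way is that every percentage is paired with its base, so “27% of what?” always has an answer.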

Additionally, these stories say an awful lot about how numbers and statistics get used in our reporting. “The American People,” a phrase that every single politician, pollster, and news junkie talking head overuses, means … what? A percentage of a percentage of a percentage of a percentage of people is generally a number way smaller than the image that “The American People” conjures up. It’s also, more than likely, a smaller sample of ideas and beliefs, morals and behaviors, arguments and agreements, and problems and solutions than the 323,000,000 people in the United States hold in total.

Yes, the political season in America is just getting rolling and it’s a great time to pay attention to the numbers reported, seek out sites for trustworthy statistics, do some math yourself, and hone your data fluency skills. (That last bit is a nod to a terrific book, Data Fluency, from the smart folks at Juice Analytics. Check it out.)

 

Learn Something New Every Day

27 Aug

My spouse recently got a call from a couple of faculty members in the computer science department at the college where she teaches. Lynn teaches in the art department: graphic design, motion design, typography, and the like. The computer science guys wanted to explore the possibility of her teaching a course in data visualization. Knowing that I have both an interest in the topic and the need to fumble through learning it (and using the new-found skills) for my job as an evaluator, she asked me what I thought about the opportunity.

Lynn knows enough about data visualization to know there’s a computer programming aspect to it. The computer science guys know enough to know there’s a design element to it. They all know that there’s math involved, specifically statistical analysis. I also suggested that it involves writing and/or journalism. She was hesitant – and rightly so – to jump on board without thinking and talking it through, because what she is an expert in is only one area of a multi-disciplinary field.

“It’s team science,” my boss, Nate, said when I shared the story with him. Exactly. And in many ways it’s an example of how the ways we traditionally teach, research, and work need to be re-examined and re-worked.

Too often, I find, we search for collaborators within our own circles of expertise. Librarians collaborate with other librarians. They might be from different types of libraries or different library departments, but often we’re all librarians. Researchers collaborate with other researchers. Scientists with other scientists. In some ways, it can be argued, this is team science (or team-based work), but it falls short of the ideal.

At its best, team science brings together experts from across different disciplines to work on problems that simply cannot be tackled by any one group. Think about a health problem like obesity. It’s huge and, as such, touches upon so many different aspects of life. Addressing it requires everyone from geneticists to behavioral psychologists to nutritionists to exercise physiologists to public policy makers to urban planners to educators to medical doctors to parents to science writers to … it’s probably easier to identify the experts not needed than those who are. The point is that some of the most successful efforts at addressing obesity are those that bring as many of these fields of expertise together as possible, to work together towards a solution. (The UMass Worcester Prevention Research Center is an example, close to home for me.)

But back to data visualization. What I’ve found is that those who do it best are either freakingly gifted (there’s always an Edward Tufte in any area) or they’re smart enough – and talented enough – to assemble good teams for the work. As I’m seeking to discover the best resources to learn and practice the skills for this job, I’m continually reminded to look across lots of different disciplines. I look to evaluators (Stephanie Evergreen and Chris Lysy), graphic designers (Nigel Holmes), business intelligence consultants (Stephen Few), journalists and journalism professors (David McCandless and Alberto Cairo, respectively), artists (Manuel Lima), statisticians (Nathan Yau), doctors (Hans Rosling), and the people in my very own Quantitative Health Sciences Department. I read things by people who are good presenters, experts in visual communication, and those skilled in improvisation. In other words, while I’m limited in resources to actually form a real team of experts to do data visualization for the UMCCTS, I’ve learned enough to seek them out from across lots of corners so that I can do a better job. (I’m also lucky enough to be working in an environment where people don’t mind me trying things out on them. It’s a benefit of being in academia.)

Thanks to Chris Lysy’s (DiY Data Design) weekly creative challenge, this week I practiced using icon arrays to report on the findings of a course evaluation with a small (n=15) class size. We get so hung up on “big data” that it’s easy to forget the real challenges of working with and presenting the results from small data sets. I really enjoyed taking this challenge and putting it to use. Here are a couple of examples. For the sake of privacy, I’ve redacted the questions being reported.

Time

Sample Arrays

Now, here’s one lesson that I learned for the next time that I use this visualization device. I need to make them like this:

better copy

This example allows me to better show that each response is represented by a single box, thus 11 people answered “Yes” and 4 answered “Somewhat.” Live and learn. Every day.
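If you’d like to try the idea without any design software, here is a rough sketch of a text-only icon array. It is not how I made the images above; the function name `icon_array`, the symbols, and the five-per-row layout are all my own assumptions for illustration. The one rule it follows is the lesson learned: one box per response.

```python
def icon_array(counts: dict[str, int], symbols: dict[str, str], per_row: int = 5) -> str:
    """Build a plain-text icon array with exactly one symbol per response.

    counts  maps each answer to the number of people who gave it.
    symbols maps each answer to the character used to draw it (assumed choices).
    """
    cells: list[str] = []
    for answer, n in counts.items():
        # One cell per respondent, in the order the answers are listed.
        cells.extend([symbols[answer]] * n)
    rows = [" ".join(cells[i:i + per_row]) for i in range(0, len(cells), per_row)]
    return "\n".join(rows)


if __name__ == "__main__":
    responses = {"Yes": 11, "Somewhat": 4}        # the n=15 example from this post
    marks = {"Yes": "■", "Somewhat": "□"}         # hypothetical symbols
    print(icon_array(responses, marks))
```

With 15 responses and five boxes per row, the result is three rows in which each box can be counted individually, which is the whole appeal of the device for small data sets.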

Next Tuesday, I’m taking a workshop on creating podcasts. It’s something that I’ve wanted to try and I found a 2-hour, evening class in Boston. Stay tuned to see what that new learning might bring. 

2013 in Review

31 Dec

The WordPress.com stats helper monkeys prepared a 2013 annual report for this blog.

Here’s an excerpt:

The concert hall at the Sydney Opera House holds 2,700 people. This blog was viewed about 33,000 times in 2013. If it were a concert at Sydney Opera House, it would take about 12 sold-out performances for that many people to see it.

Click here to see the complete report.

Let’s Ask the Expert

26 Mar

Normal Distribution

The research team has a new statistician; not a new analyst, but a new statistician. If you look at it as a pecking order, the statistician oversees the analyst. Our former statistician retired recently, leaving the team to find a replacement. The University has a relatively new Quantitative Health Sciences Department and many of the services once procured through individual department statisticians are now going through QHS. Or at least this is how I think it’s going. These are things that I don’t necessarily need to know and as I have plenty of things occupying my “need to know” gray matter right now, I can just follow along here.

The significance of the new team member, to me, was that it generated the need for a meeting so that he could be brought up to speed on the project. This meeting happened this afternoon. I believe it was good for him (as well as the Chair of the Quantitative Methods Core, his boss, also in attendance). I know that it was good for me. I’ve now heard the project and its various aspects described on a number of occasions, and each time gain some new insight. Today, that insight was that I have a pretty good grasp on where the data for this study comes from, the different sources that generate it, how it’s stored, where it’s stored, who’s managing it, and so forth. I also had a pretty clear understanding of where the problem spots and/or issues with it are (mostly gone over, yet again, in today’s morning meeting).

I decided to pay close attention during the meeting to the questions that the statistician asked. I imagine that these are the kinds of questions that an informationist, embedded librarian, or anyone concerned with data management and planning would ask a research team. Here are some that I noted. If you’re doing an interview with a researcher about his/her data, are you asking these questions?

  • Is the data in one place or multiple places? 
  • Do the different sources merge together easily?
  • Are the variable names consistent across the sources?
  • Where is the merged data stored and how?
  • When and/or how often do you do data pulls from the sources?
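The question about variable names is one that can be checked mechanically once you have the column lists in hand. Below is a minimal sketch of such a check. The source names and column names are invented for illustration and have nothing to do with the actual study, and `inconsistent_variables` is a hypothetical helper, not part of any tool the team uses.

```python
def inconsistent_variables(sources: dict[str, list[str]]) -> dict[str, set[str]]:
    """Report variable names that do not appear in every data source.

    Returns a mapping from each such variable name to the set of sources
    that lack it. Names are compared exactly, so "PatientID" and
    "patient_id" count as two different variables.
    """
    all_names = set().union(*sources.values())
    report: dict[str, set[str]] = {}
    for name in sorted(all_names):
        lacking = {src for src, columns in sources.items() if name not in columns}
        if lacking:
            report[name] = lacking
    return report


if __name__ == "__main__":
    # Hypothetical exports from two systems feeding one study.
    example = {
        "clinic_export": ["patient_id", "visit_date", "weight_kg"],
        "survey_export": ["PatientID", "visit_date", "weight_kg"],
    }
    for variable, missing_from in inconsistent_variables(example).items():
        print(f"{variable!r} is missing from: {', '.join(sorted(missing_from))}")
```

A check like this only flags naming mismatches; whether two differently named columns actually hold the same thing is still a question for the people who manage the data.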

Additionally, the statistician said that he wanted to be walked through the process. He wanted to generate a visual for himself of how everything works together. I found this request to be confirmation of much of what I’ve been reading and thinking about in terms of how we best see, understand, and communicate systems and processes. Visuals are important. I remember meeting with one of the chief programmers a few months back and how helpful it was when he pulled out a marker and drew us a picture on the whiteboard to explain all of this.*

*NOTE: If you’re interested in the art of explanation, check out The Art of Explanation by Common Craft founder Lee LeFever. I’m pretty sure I mentioned this a few posts back, but in case you missed it… Also, Common Craft has made wonderful templates of their cut-out characters available for free to download and use in your own creations. Give it a try and see how well you do at explaining a concept or problem. Make a little video and share it with me.

So, if you’re keeping up with the process of the research study, the next step for the statistician is to collect data from the first cohort and start to play with it; see what it shows so far; see if it identifies any gaps of missing data and/or holes in the process that need to be addressed. It’ll be a couple of months, at least, before we hear back, but it was obvious that the team was excited about this move.

A few questions that I’m left with, following today, are:

  • What’s the difference between an analyst and a statistician?
  • What is my role, if any, in this aspect of the study?

One last interesting aside – When we went around the table to introduce ourselves and I said, “I’m from the library, serving as the informationist,” Dr. Barton, the Director of the Quantitative Methods Core, said, “Oh, good.” I’m the only one who got an “Oh, good.” I’ve no idea what he meant by it, but I like to see it as a positive sign that my library is engaged in this kind of work. Regardless, it was a nice gesture.