DSVIL Write-Up

My write-up for the NCSU Data Science and Visualization Institute for Librarians has been published in the NNLM MidContinental Region’s newsletter (recently renamed The MidContinental Messenger)!

Thanks to the A-mazing @ncsulibraries for showing us around during #dsvil. And @NCStateVetMed too




Journey to DSVIL

The next Data Science and Visualization Institute for Librarians (DSVIL) at North Carolina State University (NCSU) was announced in late November 2016, and naturally everyone made sure I knew about it. My job title may say “Data Science Librarian,” but I’m always on the lookout for relevant professional development. I had come back from the Bibliometrics and Research Assessment Symposium earlier that month and was just winding down from hosting the Research Reproducibility Conference. For my own work, I am especially interested in creating a reproducible workflow to generate visualizations and documentation of research outputs and impact at different levels of the university. I have grand plans to use the ORCID Public API to build an interactive web dashboard that pulls in publication data and visualizes it, once I figure out how!
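The dashboard is still only a plan, but its first step, pulling publication data from ORCID and summarizing it for a chart, can be sketched in Python. This is a minimal sketch, assuming the ORCID Public API v3.0 `/{ORCID iD}/works` endpoint and its JSON layout, in which each `group` entry holds one or more `work-summary` records describing the same work. The helper names `fetch_works` and `works_per_year` are hypothetical, and ORCID may expect a registered client's read-public access token for anything beyond light use, so check its documentation before building on this.

```python
import json
import urllib.request
from collections import Counter

# Assumption: ORCID Public API, version 3.0, anonymous read access.
ORCID_API = "https://pub.orcid.org/v3.0"


def fetch_works(orcid_id):
    """Download the works summary for one ORCID iD as parsed JSON (hypothetical helper)."""
    request = urllib.request.Request(
        f"{ORCID_API}/{orcid_id}/works",
        headers={"Accept": "application/json"},  # ask for JSON instead of XML
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def works_per_year(works_json):
    """Count works by publication year, counting each work group once (hypothetical helper).

    Assumes the v3.0 works-summary layout; any field along the way may be null.
    """
    counts = Counter()
    for group in works_json.get("group", []):
        summaries = group.get("work-summary") or []
        if not summaries:
            continue
        # ORCID groups duplicate records of the same work; use the first summary.
        date = summaries[0].get("publication-date") or {}
        year = (date.get("year") or {}).get("value")
        counts[year or "unknown"] += 1
    # Sort by key so years come out in order, with "unknown" last.
    return dict(sorted(counts.items()))
```

A call such as `works_per_year(fetch_works("0000-0002-1825-0097"))` (ORCID's example researcher record) would return a year-to-count dictionary ready to hand to a plotting library. Works without a publication year are tallied under `"unknown"` rather than silently dropped, so gaps in the ORCID record stay visible on the dashboard.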

My knowledge of data science has so far been acquired piecemeal: I taught myself how to program in R and watched webinars on data visualization. Every few years I teach myself Python again to keep up with a second programming language; I’ve relearned it three times now. I have guest-lectured in undergraduate and graduate-level courses on data visualization best practices, using the ggplot2 package to generate plots, and writing in LaTeX. However, I still lack the depth of knowledge (e.g., statistical analysis, data wrangling, using APIs) and the practical experience to be a valuable member of a research team. DSVIL looked like the perfect program to build up additional knowledge and skills!

Costs & Funding

Tuition alone for this program is $2,500; it covers an evening reception and daily breakfast and lunch, but not travel or lodging. Sticker shock led me to dismiss the opportunity at first, until my library director encouraged me to go for it: we would figure out a way to fund my attendance if I was accepted. So I applied for every external funding source I could find to supplement EHSL funds:

  • MLA Continuing Education (CE) Grant [$500]
  • MLA MIS Career Development Grant [$1500]
  • NNLM MCR Professional Development Funding [$1500]

Happily, I was accepted to DSVIL! Thanks to EHSL, MLA CE, and NNLM MCR for covering the tuition, travel expenses, and remaining meals. And thanks to my extended family for housing me this coming week! Who knew Raleigh could be so expensive‽


As a condition of funding, I will formally write up this experience for NNLM MCR’s quarterly newsletter. I can’t promise I’ll have time to post here daily, but my goal is to have detailed notes by the end. I plan to do my usual tweeting with #DSVIL, but the schedule is tight and there’s a lot to learn! I am looking forward to meeting the other participants and being part of this network of data science librarians.

Teaching ggplot2

Last week I gave a short presentation to a math class on advanced graphing in R with ggplot2. The updated presentation is available at http://goo.gl/QNQjuV

I think the session went well overall. The students are master’s and PhD candidates who are interested in learning more statistics and applied math. Their course textbook uses R for basic graphing, and the professor thought it would be nice to introduce more advanced graphing. While I don’t know much about multilinear models, I do know a thing or two about advanced graphing in R.

One hitch: I’ve never taught R or ggplot2 before. I use them in my work at the library, but teaching them is a whole other animal. I spent a week pondering the best way to show how ggplot2 is superior to base graphics. Other constraints included time (a 50-minute class) and students bringing their own installations. Ultimately, I decided the best approach was to show comparisons and then allow time for them to try it for themselves.

What I learned after the session, including feedback I received:

  1. Prepare a structured exercise. I let them loose and they needed more direction, so the updated presentation includes a couple of exercise slides.
  2. Post the slides ahead of time. This was actually not easy to do, since WordPress doesn’t host HTML files and Dropbox makes you download the file…
  3. The presentation took only about 20 of the 50 minutes. So for future presentations, I added a couple more slides on even more advanced graphing after the Exercise slide.

This presentation would fit well in a lunchtime tech series, much like Eccles Express from a few years ago. I can see building a series of short presentations on using R. Maybe a breakdown like this?

  1. Setting Up to Learn R
  2. Basics of Using R (this one might need to be split into two parts)
  3. Basic Graphing in R
  4. Advanced Graphing in R
  5. Writing in R Markdown