Getting into the habit...
A PhD student’s perspective on data management
Written by Annemarie Hildegard Eckes
A PhD project is a significant period in a
researcher’s life. During the project, we generally must develop our own
research question and methodology, generate data and publish our results in
papers and as a final thesis. Such a project is meant to teach us how to
conduct research. This is the crucial time in which we as early career
researchers should pick up the right habits for our future as successful scientists.
Research Data Management
(RDM) is an important day-to-day activity for scientists. Research output,
collaborations and productivity depend on it. No surprise, then, that the
documentation of a project’s RDM has become a requirement for many grant
applications. By writing a Data Management Plan as part of the PhD proposal, we
students are not only confronted with the whole data lifecycle of our research
data before it is even generated, but we also gain experience in how such a
plan is written. Early career researchers such as us PhD students should not
underestimate the importance of skills in RDM, which in my opinion are nowadays
pretty essential for a good scientific career.
I think “data management” in its most basic form starts with managing
your email inbox. To me, it often is simply the act of keeping all information
that I deal with in order. We all do it more or less all the time. The tricky bit is how to deal with what we
data the best way. As we may be new to the research subject of our PhD,
we may not know how best to collect, document and manage the data we are
PhD students and RDM
The point of a PhD is to learn how to conduct research, and RDM is part of that process. It is important, however, to learn about good practices right from the beginning, rather than getting into bad habits that cause problems later in your research.
I think that training in RDM for us PhD students is useful for two reasons: first, to learn the right habits, and second, to enhance productivity throughout the PhD (see figure below).
Figure 1. RDM smileys: In talks I give about research data management, I like to use these smileys, showing the first row at the beginning and the second row at the end of the talk. The principle behind these smileys is based on a presentation by the Cambridge Office of Scholarly Communication (https://osc.cam.ac.uk/).
I conducted a survey and
interviews at our department, asking fellow PhD students about their data
management practices, the data types they collected and their training needs.
Participants at the end of their PhD indicated that they generally felt
prepared to conduct data management in their coming research career, but they
also said that they would have benefited from training at the beginning of their
PhD. This came out especially clearly in one interview, with the interviewee stating
that “the lack of training in research data management slowed me down”. This
shows that while we PhD students have taught ourselves more about aspects of
data management during our PhD, early training would have made us more
productive, and certainly happier (see Figure 1)! For a blog entry where I discuss some of the
survey results, please see here: https://researchdata.jiscinvolve.org/wp/2018/03/28/bad-habits-best-practices-survey-rdm-among-early-career-researchers/
PhD students and the Data Tree training platform
An online platform that provides training on RDM for PhD students is, in my opinion, a much-needed resource! At the beginning of my PhD, I would have been happy if Data Tree had existed to provide me with a good overview of RDM.
As an online course it is accessible to all PhD students at
any time. And with some of us having crazy schedules and weird working and
sleeping habits, doing such training in our own time might help us remain
flexible. It’s my experience that people do not spend the time to come to talks
or workshops. While my survey showed clearly that PhD students do think RDM is
important, the turnout to stand-alone talks, workshops and other events I have
organised has been rather low. I hope that such a continuously accessible
platform would decrease the barrier to learning more about RDM.
While time and timing might be a barrier to learning about
and performing RDM, I wonder whether the main reason for PhD students not
attending training courses is a lack of priority. For many busy PhD students, RDM never seems to
be a priority, and neither does RDM training. Therefore, PhD students will
probably need to be encouraged in some way to make use of this online platform.
One option could be that universities make this online course count in their PhD
It will be interesting to see how the platform is taken up and what strategies are used to encourage us busy PhD students to do this online course. I wish this platform a good start, a lot of users and that it makes a significant contribution to PhD students’ success in RDM!
I am a PhD student (https://www.geog.cam.ac.uk/people/eckes/) in Biogeography at the Department of Geography in Cambridge, working with all sorts of data and formats: climate data in NetCDF and .txt formats; tree growth dynamics data in Excel spreadsheets; and tree ring anatomical data as images, and later as .txt files.
My project involves the development of a computer model that
simulates how a tree stem grows in width, in response to the environment
(temperature, precipitation, etc.). The ultimate aim is for the final model to be
used in the vegetation model HYBRID, developed by Andrew Friend (https://www.geog.cam.ac.uk/people/friend/),
my supervisor, to help in projections of how vegetation will behave under
climate change in the future.
All this data needs to be described and managed well. For example: Who gave it to me? What did I do to it? How do I make sure I don’t lose it? How do I version control and document the scripts that use the data and the model that I compare the data against? How will I make sure the data and scripts produced during my PhD are shared with the community, and what standards should I adhere to, to make reuse really easy? I didn’t feel that I had enough expertise in this, but wanted to do it right from the start.
Before I started my PhD I worked with a database for crop data. That’s when I really learned how poorly documented and poorly organised research data can slow down a research project immensely, and I did not want to make the same mistakes that I have seen experienced researchers make. My previous experience and my motivation to acquire good habits right from the start got me very interested in RDM and made me an advocate for it as a Cambridge and JISC Data Champion.
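As an illustration of the questions raised above (who gave me the data, what did I do to it, and how would I notice if a file changed), here is a minimal Python sketch that stores a small metadata "sidecar" file next to each data file. It is only a sketch under stated assumptions, not the workflow used in this project: the `.meta.json` naming convention, the field names and the helper functions `write_sidecar` and `is_unchanged` are all hypothetical choices made for this example.

```python
import hashlib
import json
from datetime import date
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sidecar_path(data_file: Path) -> Path:
    """Hypothetical convention: 'data.txt' is described by 'data.txt.meta.json'."""
    return data_file.with_name(data_file.name + ".meta.json")


def write_sidecar(data_file: Path, source: str, steps: list[str]) -> Path:
    """Record who provided the file, what was done to it, and its checksum."""
    record = {
        "file": data_file.name,
        "sha256": sha256_of(data_file),
        "source": source,              # who gave it to me?
        "processing_steps": steps,     # what did I do to it?
        "recorded_on": date.today().isoformat(),
    }
    sidecar = sidecar_path(data_file)
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar


def is_unchanged(data_file: Path) -> bool:
    """Compare the file's current checksum with the one stored in its sidecar."""
    record = json.loads(sidecar_path(data_file).read_text(encoding="utf-8"))
    return record["sha256"] == sha256_of(data_file)
```

Calling `write_sidecar(Path("climate.txt"), "name of the data provider", ["removed header rows"])` would create `climate.txt.meta.json` beside the data file, and `is_unchanged(Path("climate.txt"))` can later confirm that the file still matches what was documented. Because the sidecar is plain text, it can be put under version control together with the scripts that use the data.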