Datatree: Getting into the habit... A PhD student’s perspective on data management

Written by Annemarie Hildegard Eckes

A PhD project is a significant period in a researcher’s life. During the project, we generally must develop our own research question and methodology, generate data and publish our results in papers and as a final thesis. Such a project is meant to teach us how to conduct research. This is the crucial time in which we as early career researchers should pick up the right habits for our future as successful scientists.

Research Data Management (RDM) is an important day-to-day activity for scientists. Research output, collaborations and productivity depend on it. No surprise, then, that documenting a project’s RDM has become a requirement for many grant applications. By writing a Data Management Plan as part of the PhD proposal, we students are not only confronted with the whole lifecycle of our research data before it is even generated, but we also gain experience in how such a plan is written. Early career researchers such as us PhD students should not underestimate the importance of skills in RDM, which in my opinion are nowadays essential for a good scientific career.

I think “data management” in its most basic form starts with managing your email inbox. To me, it is often simply the act of keeping all the information that I deal with in order. We all do it, more or less, all the time. The tricky bit is how best to deal with what we call research data. As we may be new to the research subject of our PhD, we may not know how best to collect, document and manage the data we are dealing with.

PhD students and RDM training

The point of a PhD is to learn how to conduct research, and RDM is part of that process. But it can be important to learn about good practices right from the beginning, rather than getting into bad habits that cause problems later in your research.

I think that training in RDM for us PhD students is useful for two reasons: firstly, to learn the right habits, and secondly, to enhance productivity throughout the duration of the PhD (see figure below).

Figure 1 RDM-smileys: In talks I give about research data management, I like to show these smileys, the first row at the beginning of the talk and the second row at the end. The principle behind these smileys is based on a presentation by the Cambridge Office of Scholarly Communication (https://osc.cam.ac.uk/).

I conducted a survey and interviews at our department, asking fellow PhD students about their data management practices, the data types they collected and their training needs. Participants at the end of their PhD indicated that they generally felt prepared to conduct data management in their coming research career, but they also said that they would have benefited from training at the beginning of their PhD. This came out especially clearly in one interview, in which the interviewee stated that “the lack of training in research data management slowed me down”. This shows that while we PhD students have taught ourselves more about aspects of data management during our PhD, early training would have made us more productive, and certainly happier (see Figure 1)! I discuss some of the survey results in a blog entry: https://researchdata.jiscinvolve.org/wp/2018/03/28/bad-habits-best-practices-survey-rdm-among-early-career-researchers/

PhD students and the Data Tree training platform

An online platform that provides training on RDM for PhD students is, in my opinion, a much-needed resource! At the beginning of my PhD, I would have been happy if Data Tree had existed to give me a good overview of RDM.

As an online course, it is accessible to all PhD students at any time. And with some of us having crazy schedules and weird working and sleeping habits, doing such training in our own time might help us remain flexible. It’s my experience that people do not take the time to come to talks or workshops. While my survey showed clearly that PhD students do think RDM is important, the turnout at stand-alone talks, workshops and other events I have organised has been rather low. I hope that such a continuously accessible platform will lower the barrier to learning more about RDM.

While time and timing might be a barrier to learning about and performing RDM, I wonder whether the main reason for PhD students not attending training courses is a lack of priority. For many busy PhD students, RDM never seems to be a priority, and neither does RDM training. Therefore, PhD students will probably need to be encouraged in some way to make use of this online platform. One option could be for universities to make this online course count in their PhD training logs.

It will be interesting to see how the platform is taken up and what strategies are used to encourage us busy PhD students to do this online course. I wish this platform a good start and a lot of users, and I hope it makes a significant contribution to PhD students’ success in RDM!

About me:

I am a PhD student (https://www.geog.cam.ac.uk/people/eckes/) in Biogeography at the Department of Geography in Cambridge, working with all sorts of data and formats: climate data in NetCDF and .txt format, tree growth dynamics data in Excel spreadsheets, and tree ring anatomical data as images and later as .txt files.

My project involves the development of a computer model that simulates how a tree stem grows in width in response to the environment (temperature, precipitation etc.). The ultimate aim is for the final model to be used in the vegetation model HYBRID, developed by my supervisor Andrew Friend (https://www.geog.cam.ac.uk/people/friend/), to help project how vegetation will behave under future climate change.

All this data needs to be described and managed well. For example: Who gave it to me? What did I do to it? How do I make sure I don’t lose it? How do I version control and document the scripts that use the data and the model that I compare the data against? How will I make sure the data and scripts produced during my PhD are shared with the community, and what standards should I adhere to, to make reuse really easy? I didn’t feel that I had enough expertise in this, but wanted to do it right from the start. Before I started my PhD, I worked with a database for crop data. That’s when I really learned how poorly documented and poorly organised research data can slow down a research project immensely, and I did not want to make the same mistakes I have seen experienced researchers make. My previous experience and my motivation to acquire good habits right from the start got me very interested in RDM and made me an advocate for it as a Cambridge and JISC data champion.

Last modified: Friday, 1 June 2018, 10:47 AM