Data Management

Author: Sina G. Lewis

Chances are that you will create large amounts of data on a daily basis, that’s research for you! Not every piece of that data will be useful necessarily, but you don’t always know immediately. This is where your personal 5 TB and the separate Teams 25 TB comes in handy. Use those 5 TB that are just yours to their max, moving things to the appropriate place within Teams as it becomes apparent the data is important. A good rule of thumb to identify ‘important’ data is to ask yourself if it’s needed to reproduce your results or marks an important milestone. For example, when running DFT calculations you often start with an un-optimized ‘guess’ at the system’s structure, and you end up with a structure whose forces are below a selected tolerance. Keeping both this initial guess and the final structure can save time for both yourself and future graduate students. The below sections will detail how to organize this important data within Teams.

A Word of Warning: In the summer of 2022, the University of Colorado Boulder transferred from unlimited Google Drive storage space for everyone to OneDrive storage space, with individual users getting 5 TB of storage and 25 TB in our Team storage. Because of this migration, we lost research data from previous members. We want to ensure that as we build our new storage space on OneDrive, we put into place practices that make data easy to find and backup in case of future bureaucracy.

Down to Summary List

Channels

When you start on a project, you should be given access to a channel, or several channels, that pertain to your project(s). Each channel has its own dedicated file space, separate from other channels. Your first step should be to create a folder with your name on it within each channel that you can access. Data you generate for a project should go within your folder in the appropriate channel.

How you organize your files within your folder in a channel is largely up to you. However, you should strive to organize your files with a future graduate student in mind. Will they be able to parse through your data and understand the purpose of every file? As part of this process, you should have a README.txt in every folder. This file should detail what you would tell someone if you were giving them a tour of your files with the intention of them continuing your work. For example, in a folder titled ‘fromClaire’ in the ‘molecules attached to silicon’ project, the README.txt is

Claire at UT Austin gave me these .xyz files. The powerpoint details all the surfaces they can make / are in the process of comparing to one another. The goal is to use VASP to look at the hybridization between bulk silicon and a molecule attachment.

This README details where the files came from and their purpose. If these were final result data files, the README should detail what analysis has been done and the conclusions we have drawn from the results. As an example of this, I have written the following README in a folder titled “Clock Transition Plots” that contains various .jpgs

As part of a grant that Joel and Niels were writing, Joel asked me to revisit my clock transition work. Specifically, we were curious if there was another point where maybe the magnetic field wasn’t as stabilized, but the splitting was minimally sensitive to the angle of the molecule within the field. To help answer this question, I revisited my Mathematica notebook with my JDE code and started thinking of different ways to plot the splitting. These plots show the lower energy splitting and its derivative for varying values of beta, phi, and theta. The most revealing plots are the vector plots that show the gradient of the lower energy as a function of phi and theta for different beta. This shows how at parallel/anti-parallel configurations the energy splitting has a weak tendency to stabilize with B0 along x,z, but for any other angle of beta, there is a flow towards B0 along y

Presentations

With collaborations, group meetings, supergroup, and conferences, you will generate posters and presentations about your work. Because these may reference multiple projects and be useful to everyone in the group, these files should be contained in “General > Posters-and-Presentations”. There are sub-folders for posters and supergroup presentations, but we want to avoid over categorizing the files. The image tags will do most of the heavy lifting. Whenever you add a file, be sure to add image tags in the details as described in the image tags page. Some possible tags: your name and any others involved, a mention of the topic(s) discussed in the file, the purpose of the file (supergroup, conference, etc.), or date (year-month-day format) the presentation was presented.

Meetings with Joel also will go more smoothly if he has access to any notes you want him to read beforehand. Under “Meeting Notes” you should have a folder with your name where you add anything pertaining to upcoming meetings with Joel. These files should also be labeled through image tags with the date that the meeting occurs being most important to add.

Paper Data

As you can imagine, most papers have a fair bit of data that supports their findings. It has become increasingly important that this data is easy to find or reproduce on demand. For this purpose, we have devised a slightly different protocol for data associated with a paper.

When you start a paper, create a folder just for the paper and its data within your folder in the appropriate channel. I recommended naming the folder by the name of the paper. You can update the name if it changes when it’s published. You then want to make this folder available to everyone in the Team. This is done by clicking “Share” and specifying that “CHEM-EAVES Members” with the link have access. Copy the share link, head to the folder “General > Papers > Group_Papers”, and create a new link using the dropdown menu. The URL is the share link and the name can either be the name of the paper, or something short yet descriptive. That last setup step you need is to add descriptive “image tags” in the details as described on this page.

As you write the paper, you should save the data used to create figures, pre-submission versions (PDF and LaTex code), reviewer responses, and so on to this folder. Finish it off with a README to explain any specific details one might need to know about the files.

GitHub

Summary

Initial Setup

Gain access to the channel(s) associated with your project(s)
Create a folder with your name (FirstLast) in each channel folder

Day-to-Day

Add important data to relevant channel as needed
- Keep it organized with a README.txt in each folder
Upload presentations to “General > Posters-and-Presentations”
- Add image tags

Papers

Create a folder for the paper in the correct channel
Share with “CHEM-EAVES Members”
Add link in “General > Papers > Group_Papers”
Add image tags