Research Guides: Research Data Management: Best Practices for Data Preservation

Develop a File-Naming Strategy

To find your files in the future and understand what they contain, name them consistently and descriptively. Have conventions for naming directories, folders, and files, and always include the same information in the same order. Include relevant information such as a unique identifier, date, author name, and project name. When using sequential numbering, use leading zeros so that files still sort correctly once the numbers reach multiple digits. For example, a sequence of 1-10 should be numbered 01-10. Do not use special characters such as & , * % # ; ( ) ! @ $ ^ ~ ' { } [ ] ? < >. Write dates in year-month-day order (yyyymmdd, as in the examples below, or yyyy-mm-dd) so that files sort chronologically.

Examples of good naming conventions:

20130503_DOEProject_DesignDocument_Smith_v2-01.docx
20130709_DOEProject_MasterData_Jones_v1-00.xlsx
20130825_DOEProject_Ex1Test1_Data_Gonzalez_v3-03.xls
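One way to keep the fields in the same order every time is to build names with a small helper rather than typing them by hand. The following is a minimal Python sketch, not a prescribed tool: the function name `build_filename`, its parameters, and the rule that each field may contain only letters, digits, and underscores are assumptions made for illustration, chosen to reproduce the pattern of the examples above.

```python
import re
from datetime import date

# Assumed rule for this sketch: fields contain only letters, digits, underscores.
_SAFE_FIELD = re.compile(r"^[A-Za-z0-9_]+$")


def build_filename(day, project, description, author, major, minor, extension):
    """Return a name shaped like 20130503_DOEProject_DesignDocument_Smith_v2-01.docx."""
    for field in (project, description, author, extension):
        if not _SAFE_FIELD.match(field):
            raise ValueError(f"unsupported characters in field: {field!r}")
    # The date comes first as yyyymmdd; the minor version is zero-padded to two digits.
    return (
        f"{day:%Y%m%d}_{project}_{description}_{author}"
        f"_v{major}-{minor:02d}.{extension}"
    )


if __name__ == "__main__":
    print(build_filename(date(2013, 5, 3), "DOEProject", "DesignDocument",
                         "Smith", 2, 1, "docx"))
```

Rejecting unexpected characters up front means a typo such as a stray space or ampersand is caught before the file is saved.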

You should also document your decisions for creating file naming conventions. You may want to include in the directory a readme.txt file that explains your naming format along with any abbreviations or codes you have used.

If you already have a lot of data collected and wish to organize and rename the files for easier data management, there are applications available that allow you to rename files in bulk. Examples include ReNamer or PSRenamer.
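If you prefer a script to a dedicated application, bulk renaming can also be sketched in a few lines of Python. The example below is illustrative only: the function name `pad_sequence_numbers` and the assumption that the sequence number sits directly before the file extension (as in a hypothetical `sample_7.csv`) are not taken from any particular tool. It adds leading zeros and defaults to a dry run so you can review the planned changes before anything is renamed.

```python
import re
from pathlib import Path

# Matches names that end in a number just before the extension, e.g. "sample_7.csv".
_TRAILING_NUMBER = re.compile(r"^(.*\D)(\d+)(\.[^.]+)$")


def pad_sequence_numbers(folder, width=2, dry_run=True):
    """Zero-pad trailing sequence numbers; return a list of (old_name, new_name)."""
    planned = []
    # sorted() takes a snapshot of the listing so renaming does not disturb the loop.
    for path in sorted(Path(folder).iterdir()):
        match = _TRAILING_NUMBER.match(path.name)
        if not path.is_file() or match is None:
            continue
        prefix, number, extension = match.groups()
        new_name = f"{prefix}{number.zfill(width)}{extension}"
        if new_name == path.name:
            continue  # already padded
        target = path.with_name(new_name)
        if target.exists():
            raise FileExistsError(f"{target} already exists")
        planned.append((path.name, new_name))
        if not dry_run:
            path.rename(target)
    return planned
```

Call it once with the default `dry_run=True` to inspect the returned list, then again with `dry_run=False` to apply the changes. Work on a copy of your data until you trust the result.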

Back Up Data

Data should be backed up on a regular basis. Ideally, data will be backed up both locally and on remote servers so that a copy can be restored in the event of a disaster. It is recommended that you keep three copies of your data in separate, ideally geographically dispersed, locations: for example, one on your computer, one on an external hard drive, and one in networked storage.

Develop a regular backup routine for your data and keep your backup copies synchronized. Software and utilities can be used to schedule automatic backups of your files; examples include Microsoft Backup and Apple Time Machine.
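A backup is only useful if the copy matches the original. As a rough illustration of a scripted routine, the Python sketch below copies a data folder to several backup locations and compares SHA-256 checksums after each copy. The function names `sha256_of` and `back_up` are hypothetical, and the sketch assumes that the backup folders lie outside the source folder; it is not a substitute for dedicated backup software.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 checksum of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source_dir, backup_dirs):
    """Copy every file under source_dir into each backup dir and verify it."""
    source_dir = Path(source_dir)
    copied = []
    for backup_dir in map(Path, backup_dirs):
        for src in sorted(source_dir.rglob("*")):
            if not src.is_file():
                continue
            dest = backup_dir / src.relative_to(source_dir)
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)  # copy2 also preserves timestamps
            if sha256_of(src) != sha256_of(dest):
                raise OSError(f"checksum mismatch for {dest}")
            copied.append(dest)
    return copied
```

The backup locations might be, for instance, a mounted external drive and a network share; those paths depend entirely on your own setup.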

Use Stable Formats

Use open, non-proprietary file formats for archiving your data. These are more likely to remain usable even if the software that created them becomes unavailable or no longer functional, so they are generally better for re-use and long-term preservation. The following formats are recommended:

Plain text -- .txt, .xml, .html
Tabular data -- .csv
Databases -- .xml, .csv
Image -- .tiff, .jp2 (JPEG 2000), .png, .gif
Documents -- PDF/A (.pdf)
Audio -- .wav, .mp3
Video -- .mp4, .mov

In addition, files should be unencrypted, uncompressed, and in a format commonly used by the research community.
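Before archiving a project, it can help to scan its files for formats that are not on the list. The Python sketch below is a simple illustration: the names `RECOMMENDED`, `categories_for`, and `needs_conversion` are hypothetical, JPEG 2000 is listed by its usual `.jp2` extension, and a `.pdf` extension alone cannot confirm that a file is actually PDF/A, so treat the result as a first pass only.

```python
from pathlib import Path

# Extensions from the recommended list above, grouped by category.
RECOMMENDED = {
    "plain text": {".txt", ".xml", ".html"},
    "tabular data": {".csv"},
    "databases": {".xml", ".csv"},
    "image": {".tiff", ".jp2", ".png", ".gif"},
    "documents": {".pdf"},  # the extension alone does not prove PDF/A conformance
    "audio": {".wav", ".mp3"},
    "video": {".mp4", ".mov"},
}


def categories_for(filename):
    """Return the recommended categories that accept this file's extension."""
    extension = Path(filename).suffix.lower()
    return sorted(name for name, exts in RECOMMENDED.items() if extension in exts)


def needs_conversion(filenames):
    """Return the files whose extension is not in any recommended category."""
    return [name for name in filenames if not categories_for(name)]


if __name__ == "__main__":
    print(needs_conversion(["results.csv", "budget.xlsx", "scan.TIFF"]))
```

Files flagged by `needs_conversion` are candidates for export to one of the recommended formats before they are archived.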

Use Version Control

Versioning refers to saving new copies of your files when you make changes so that you can go back and retrieve specific versions of your files later. There are three basic ways to keep track of versions:

Manually save versions when you make changes, including a version number in the file name, e.g., "v1," "v2," or "v2.1".
Use tools that automatically assign version numbers to manage data, for example Google Drive or Dropbox.
Use version control software such as Git, TortoiseSVN, or Apache Subversion to track revisions.
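The first, manual approach is easy to get wrong by accidentally overwriting an earlier version. As a minimal sketch of how it could be scripted, the Python function below copies a file to the next free version number. The name `save_new_version` and the `_vN` suffix pattern are assumptions for this illustration; adapt the pattern to whatever naming convention you have documented.

```python
import re
import shutil
from pathlib import Path

# Assumed pattern for this sketch: names end in "_v<number>" before the extension.
_VERSIONED = re.compile(r"^(?P<stem>.+)_v(?P<num>\d+)$")


def save_new_version(path):
    """Copy path to the next unused version number and return the new path."""
    path = Path(path)
    match = _VERSIONED.match(path.stem)
    if match:
        stem, number = match["stem"], int(match["num"]) + 1
    else:
        stem, number = path.stem, 1  # unversioned files start at v1
    target = path.with_name(f"{stem}_v{number}{path.suffix}")
    # Never overwrite an existing version: keep counting until a name is free.
    while target.exists():
        number += 1
        target = path.with_name(f"{stem}_v{number}{path.suffix}")
    shutil.copy2(path, target)
    return target
```

You would then edit the returned copy and leave the earlier version untouched, so any version can be retrieved later.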

Document Data

It's important to document your data so that it can be found, identified, reused, and understood in the future. This information should explain how your data was created, what the context is for the data, the structure of the data and its contents, and any manipulations that have been done to the data.

Metadata is a form of documentation that describes characteristics of the data, such as creator, origin, purpose, geographic location, and terms of access. There are a variety of metadata standards available, many of them discipline-specific. For more detailed information about creating metadata for your research data, visit the Metadata tab.
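As a small illustration, the characteristics listed above could be recorded in a machine-readable file stored next to the data. The Python sketch below writes them to a JSON "sidecar" file. The function name `write_metadata`, the field names, and the `.metadata.json` suffix are all hypothetical choices for this example and do not follow any particular metadata standard; a discipline-specific standard should take precedence where one exists.

```python
import json
from pathlib import Path

# Hypothetical field names based on the characteristics mentioned above.
REQUIRED_FIELDS = (
    "creator",
    "origin",
    "purpose",
    "geographic_location",
    "terms_of_access",
)


def write_metadata(data_file, **fields):
    """Write the given fields to '<data_file>.metadata.json' and return its path."""
    missing = [name for name in REQUIRED_FIELDS if not fields.get(name)]
    if missing:
        raise ValueError(f"missing metadata fields: {', '.join(missing)}")
    sidecar = Path(str(data_file) + ".metadata.json")
    sidecar.write_text(
        json.dumps(fields, indent=2, sort_keys=True), encoding="utf-8"
    )
    return sidecar
```

Requiring every field up front makes gaps visible when the data is created, which is when the missing details are easiest to recover.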

Image Credits

All graphics on this page were created by Jørgen Stamp and published under a Creative Commons Attribution 2.5 Denmark License (www.digitalbevaring.dk).