Data Management

Data organization considerations

There are some fundamental decisions that you need to make when you start your research, and data organization should be among them. The choices that you make will vary with the type of research that you do, but everyone must address the same issues:

   Data identifiers

   File formats

   File versioning

   Naming conventions

Data Identifier considerations

If you want to share or cite your data, consider using a more sophisticated naming scheme. You'll want to put your datasets where other people can access them, and give your datasets identifiers that can be referenced easily.

Data identifiers must be globally unique and persistent. That is to say, they must not be repeated elsewhere and they must not change over time.

There are many different schemes:

PURL -- A PURL is a Persistent Uniform Resource Locator. Functionally, a PURL is a URL. However, instead of pointing directly to the location of an Internet resource, a PURL points to an intermediate resolution service. The PURL resolution service associates the PURL with the actual URL and returns that URL to the client.
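The indirection described above can be sketched with a small in-memory lookup table. This is only an illustration: the `PurlResolver` class, the `/lab/dataset-42` path, and the example.org URLs are all hypothetical, and a real PURL service answers with an HTTP redirect rather than a Python return value.

```python
# Hypothetical in-memory stand-in for a PURL resolution service.
class PurlResolver:
    def __init__(self):
        # Maps each persistent path to the resource's current URL.
        self._targets = {}

    def register(self, purl, target_url):
        """Create or update the URL that a PURL points to."""
        self._targets[purl] = target_url

    def resolve(self, purl):
        """Return the current URL for a PURL, as a redirect would."""
        try:
            return self._targets[purl]
        except KeyError:
            raise LookupError(f"no target registered for {purl}") from None


resolver = PurlResolver()
resolver.register("/lab/dataset-42", "https://old-server.example.org/data/42.csv")
first = resolver.resolve("/lab/dataset-42")

# The data moves: only the resolver's table changes, the PURL does not.
resolver.register("/lab/dataset-42", "https://new-server.example.org/archive/42.csv")
second = resolver.resolve("/lab/dataset-42")
```

Anyone who cited `/lab/dataset-42` still reaches the data after the move, which is the property that makes the identifier persistent.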

DOI -- A DOI (Digital Object Identifier) is a name (not a location) for an entity on digital networks. It provides a system for persistent and actionable identification and interoperable exchange of managed information on digital networks.
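Because a DOI is a name rather than a location, it is usually turned into a link by prefixing it with a resolver address. A minimal sketch, assuming the public https://doi.org/ resolver and a deliberately simplified syntax check (the pattern below is an assumption for illustration, not the full DOI grammar); `doi_to_url` is a hypothetical helper name.

```python
import re
from urllib.parse import quote

# Simplified shape check (assumption): "10.", a numeric registrant code,
# a slash, then a non-empty suffix without whitespace.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi):
    """Build a resolver link for a DOI string such as '10.1000/182'."""
    doi = doi.strip()
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"does not look like a DOI: {doi!r}")
    # Percent-encode characters that are unsafe in a URL path, keeping "/".
    return "https://doi.org/" + quote(doi, safe="/")


link = doi_to_url("10.1000/182")
```

Citing the DOI itself, and building the link only when needed, keeps the citation valid even if the resolver address changes.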

ACCESSION -- Accession numbers used by the National Center for Biotechnology Information (NCBI) are unique and citable.

InChI -- The IUPAC International Chemical Identifier (InChI™) is a non-proprietary identifier for chemical substances that can be used in printed and electronic data sources, thus enabling easier linking of diverse data compilations.

URI -- Uniform Resource Identifier (URI) consists of a string of characters used to identify or name a resource on the Internet. Such identification enables interaction with representations of the resource over a network, typically the World Wide Web, using specific protocols.
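To make the structure of a URI concrete, the sketch below splits one into its parts with Python's standard `urllib.parse` module. The example.org address and the ISBN-style URN are illustrative values only, not identifiers for real datasets.

```python
from urllib.parse import urlsplit

# A URI that also locates its resource (a URL).
web = urlsplit("https://example.org/data/set1?format=csv#row5")
web_parts = (web.scheme, web.netloc, web.path, web.query, web.fragment)

# A URI that only names its resource: there is no host to contact.
name_only = urlsplit("urn:isbn:0451450523")
```

In both cases the scheme (`https` or `urn`) is what tells software how the rest of the string should be interpreted.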

File format considerations

It is important to think carefully about what file format will be best for long-term preservation and continued access to your data.

Formats that are more likely to remain accessible in the future are:

   Non-proprietary

   Open, with a documented standard

   Common, used by the research community

   In a standard representation (ASCII, Unicode)

   Unencrypted

   Uncompressed

   Not software specific
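As one possible illustration of these criteria, the sketch below saves a small table as comma-separated text in UTF-8, with no compression or encryption, using only Python's standard library. The function name `write_plain_csv` and the sample column names are hypothetical, and CSV is just one format that fits the list, not a recommendation for every kind of data.

```python
import csv


def write_plain_csv(path, header, rows):
    """Write rows as plain-text, UTF-8 encoded CSV that any editor can open."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


def read_plain_csv(path):
    """Read the file back as a list of rows, each a list of strings."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle))
```

A file written this way can be read back by `read_plain_csv`, by a spreadsheet, or by a text editor, so it does not depend on any single piece of software.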

 

File version considerations

Keeping track of versions of documents and datasets is critical. Strategies include:

  • Use directory structure naming conventions
  • Use file naming conventions
  • Record every change to a file, no matter how small
  • Discard obsolete versions after making backups
  • Track changes using:
    • Headers inside the file
    • Log files
    • Version control software (e.g. SVN)
    • File sharing software (e.g. Google Docs or Amazon S3)
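Two of these strategies, versioned file names and a change log, can be sketched in a few lines. The `_vNN` suffix and the tab-separated log line are assumptions chosen for illustration; `next_version` and `log_entry` are hypothetical helpers, not part of any tool named above.

```python
import re
from datetime import date

# Assumed convention: a version is a trailing "_v" plus digits, e.g. "_v02".
VERSION_RE = re.compile(r"_v(\d+)$")


def next_version(filename):
    """Return the name of the next version: survey_v02.csv -> survey_v03.csv."""
    stem, dot, ext = filename.rpartition(".")
    if not dot:  # no extension present
        stem, ext = filename, ""
    match = VERSION_RE.search(stem)
    if match:
        number = int(match.group(1)) + 1
        stem = stem[:match.start()]
    else:
        number = 1  # an unversioned file becomes version 01
    versioned = f"{stem}_v{number:02d}"
    return f"{versioned}.{ext}" if dot else versioned


def log_entry(filename, note, when):
    """Format one change-log line: ISO date, file name, and what changed."""
    return f"{when.isoformat()}\t{filename}\t{note}"


new_name = next_version("survey_v02.csv")
line = log_entry(new_name, "removed duplicate rows", date(2024, 1, 15))
```

Appending each `log_entry` line to a plain-text log file next to the data gives a record of every change without requiring version control software.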

Naming convention considerations

Directory Structure Naming Conventions

  • When organizing files, the name of the top-level directory should include the project title, a unique identifier, and the date (year).
  • The substructure should have a clear, documented naming convention; for example, each run of an experiment, each version of a dataset, and/or each person in the group.
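A directory convention along these lines could be generated rather than typed by hand. In the sketch below, the underscore-separated layout, the `run-NNN` subfolder pattern, and the names `project_root` and `run_dir` are all assumptions for illustration; the guidance above only asks that the convention be clear and documented.

```python
import re
from pathlib import Path


def slug(text):
    """Lowercase a title and replace runs of other characters with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def project_root(base, title, identifier, year):
    """Top-level folder name: project title, unique identifier, and year."""
    return Path(base) / f"{slug(title)}_{identifier}_{year}"


def run_dir(root, run_number):
    """Subfolder for one run of an experiment, zero-padded so names sort."""
    return root / f"run-{run_number:03d}"


root = project_root("data", "Coastal Erosion Survey", "P0042", 2024)
seventh_run = run_dir(root, 7)
```

Calling `seventh_run.mkdir(parents=True, exist_ok=True)` would create the folders on disk; the sketch stops short of that so it has no side effects.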

File Naming Conventions

  • Reserve the 3-letter file extension for the application-specific format code, for example .wrl, .mov, or .tif.
  • Identify the activity or project in the file name.
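The two rules above can be combined into a small helper that builds file names consistently. This is a sketch under assumed conventions: the `project_activity.ext` layout, the lowercase hyphenated words, and the function name `data_file_name` are hypothetical choices, not a prescribed standard.

```python
import re


def clean(part):
    """Replace spaces and punctuation with hyphens and lowercase the result."""
    return re.sub(r"[^A-Za-z0-9]+", "-", part).strip("-").lower()


def data_file_name(project, activity, extension):
    """Name a file after its project and activity; the extension marks the format."""
    ext = extension.lstrip(".").lower()
    if not ext.isalnum():
        raise ValueError(f"unexpected file extension: {extension!r}")
    return f"{clean(project)}_{clean(activity)}.{ext}"


photo_name = data_file_name("Erosion Survey", "Site Photos", ".TIF")
```

Keeping descriptive words out of the extension means the operating system and analysis software can still recognize the file format.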

File Naming Conventions for Specific Disciplines

File Renaming