Data organization is one of the fundamental decisions you need to make when you start your research. The choices you make will vary with the type of research you do, but everyone must address the same issues:
Data identifiers
File formats
File versioning
Naming conventions
You may want to consider a more sophisticated naming scheme if you plan to share or cite your data. You'll want to put your datasets where other people can access them, and give them identifiers that can be referenced easily.
Data identifiers must be globally unique and persistent. That is to say, they must not be repeated elsewhere and they must not change over time.
There are many different schemes:
PURL -- A PURL is a Persistent Uniform Resource Locator. Functionally, a PURL is a URL. However, instead of pointing directly to the location of an Internet resource, a PURL points to an intermediate resolution service. The PURL resolution service associates the PURL with the actual URL and returns that URL to the client.
DOI -- A DOI (Digital Object Identifier) is a name (not a location) for an entity on digital networks. It provides a system for persistent and actionable identification and interoperable exchange of managed information on digital networks.
ACCESSION -- Accession numbers used by the National Center for Biotechnology Information (NCBI) are unique and citable.
InChI -- The IUPAC International Chemical Identifier (InChI™) is a non-proprietary identifier for chemical substances that can be used in printed and electronic data sources, thus enabling easier linking of diverse data compilations.
URI -- A Uniform Resource Identifier (URI) is a string of characters used to identify or name a resource on the Internet. Such identification enables interaction with representations of the resource over a network, typically the World Wide Web, using specific protocols.
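The indirection that PURLs and DOIs rely on can be illustrated with a small sketch. The following is a toy, in-memory resolver written in Python. It is not how any real PURL or DOI service is implemented; the class name `ToyResolver`, the identifier `example/dataset-1`, and the example.org URLs are all made up for illustration. It only shows why an identifier can stay fixed while the location of the data changes.

```python
class ToyResolver:
    """Toy in-memory resolver: maps persistent identifiers to current URLs.

    Hypothetical illustration only; real resolution services are far more involved.
    """

    def __init__(self):
        self._locations = {}

    def register(self, identifier, url):
        # Identifiers must be unique: refuse to register the same one twice.
        if identifier in self._locations:
            raise ValueError(f"identifier already registered: {identifier}")
        self._locations[identifier] = url

    def move(self, identifier, new_url):
        # The data moved: only the stored location changes, never the identifier.
        if identifier not in self._locations:
            raise KeyError(identifier)
        self._locations[identifier] = new_url

    def resolve(self, identifier):
        # Return the current location recorded for the identifier.
        return self._locations[identifier]


resolver = ToyResolver()
resolver.register("example/dataset-1", "https://old-server.example.org/data/1")
resolver.move("example/dataset-1", "https://new-server.example.org/archive/1")
print(resolver.resolve("example/dataset-1"))
```

Anyone who cited `example/dataset-1` before the move still reaches the data afterwards, which is the property that "persistent" refers to.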
It is important to think carefully about what file format will be best for long-term preservation and continued access to your data.
Formats that are more likely to remain accessible in the future are:
Non-proprietary
Open, documented standard
Common, used by the research community
Standard representation (ASCII, Unicode)
Unencrypted
Uncompressed
Not software specific
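Some of these properties can be checked roughly by script. The sketch below, in Python, is a heuristic only: the function name `describe_format` and the short signature table are assumptions made for this example, the table is far from exhaustive, and the check says nothing about whether a format is open or widely used. It reads the whole file into memory, so it suits small files.

```python
from pathlib import Path

# Leading bytes of a few common compressed or archive formats (not exhaustive).
COMPRESSED_SIGNATURES = {
    b"\x1f\x8b": "gzip",
    b"PK\x03\x04": "zip",
    b"BZh": "bzip2",
}


def describe_format(path):
    """Roughly classify a file as compressed, plain UTF-8 text, or other binary."""
    data = Path(path).read_bytes()
    for signature, name in COMPRESSED_SIGNATURES.items():
        if data.startswith(signature):
            return f"compressed ({name})"
    # NUL bytes almost never appear in plain text, so treat them as a binary marker.
    if b"\x00" in data:
        return "binary or unknown encoding"
    try:
        data.decode("utf-8")  # ASCII is a subset of UTF-8
    except UnicodeDecodeError:
        return "binary or unknown encoding"
    return "plain text (UTF-8)"
```

A call such as `describe_format("results.csv")` (a hypothetical file name) returns one of the three labels above, which can help flag files worth converting before deposit.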
Keeping track of versions of documents and datasets is critical. Strategies include:
Directory Structure Naming Conventions
File Naming Conventions
File Naming Conventions for Specific Disciplines
File Renaming
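One way to combine versioning with a file naming convention is to embed a date and a zero-padded version number in every file name. The Python sketch below assumes a pattern of `project_description_YYYYMMDD_vNN.ext`; that pattern, the function names, and the sample values are illustrative choices for this example, not a convention prescribed by this guide or by any discipline.

```python
import re
from datetime import date


def _clean(part):
    # Lowercase, and replace runs of anything but letters, digits, or hyphens with "-".
    return re.sub(r"[^A-Za-z0-9-]+", "-", part.strip()).strip("-").lower()


def versioned_name(project, description, day, version, extension):
    """Build a name such as 'soilstudy_ph-readings_20240115_v02.csv'."""
    return f"{_clean(project)}_{_clean(description)}_{day:%Y%m%d}_v{version:02d}.{extension}"


def next_version(existing_names):
    """Return one more than the highest '_vNN.' number found, or 1 if none is found."""
    versions = [
        int(match.group(1))
        for name in existing_names
        if (match := re.search(r"_v(\d+)\.", name))
    ]
    return max(versions, default=0) + 1


existing = ["soilstudy_ph-readings_20240110_v01.csv"]
print(versioned_name("SoilStudy", "pH readings", date(2024, 1, 15),
                     next_version(existing), "csv"))
```

Zero-padding the version number and writing the date as year, month, day keeps files in chronological order when they are sorted by name.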