Metadata

Data and Research Data

• Data are numerical quantities or other factual attributes derived from observation, experiment or calculation.

  - National Research Council, 1992a. "Setting priorities for space research: Opportunities and imperatives."

• Data are facts, numbers, letters, and symbols that describe an object, idea, condition, situation, or other factors. Data in a database may be characterized as predominantly word oriented (e.g., as in a text, bibliography, directory, dictionary), numeric (e.g., properties, statistics, experimental values), image (e.g., fixed or moving video, such as a film of microbes under magnification or time-lapse photography of a flower opening), or sound (e.g., a sound recording of a tornado or a fire)... Data can also be referred to as raw, processed, or verified.

  - Committee for a Study on Promoting Access to Scientific and Technical Data for the Public Interest, National Research Council. A Question of Balance: Private Rights and the Public Interest in Scientific and Technical Databases (1999). 

• The term "data" is used in this report to refer to any information that can be stored in digital form, including text, numbers, images, video or movies, audio, software, algorithms, equations, animations, models, simulations, etc. Such data may be generated by various means including observation, computation, or experiment. 

  - National Science Foundation (2005). Long-Lived Digital Data Collections: Enabling Research and Education in the 21st Century. P.9. Available at: http://www.nsf.gov/pubs/2005/nsb0540/nsb0540.pdf

• Research data, unlike other types of information, is collected, observed, or created, for purposes of analysis to produce original research results.

   - University of Edinburgh. How to manage research data: Defining research data.

• In the context of these Principles and Guidelines [Principles and Guidelines for Access to Research Data from Public Funding], “research data” are defined as factual records (numerical scores, textual records, images and sounds) used as primary sources for scientific research, and that are commonly accepted in the scientific community as necessary to validate research findings.   

    - Organisation for Economic Co-operation and Development (OECD, 2007). OECD Principles and Guidelines for Access to Research Data from Public Funding. P.13. Available at: http://www.oecd.org/dataoecd/9/61/38500813.pdf

Dataset

Data set: A logically meaningful collection or grouping of similar or related data, usually assembled as a matter of record or for research, for example, the American FactFinder Data Sets provided online by the U.S. Census Bureau or the National Elevation Dataset available from the U.S. Geological Survey. Also spelled dataset.

   - Online Dictionary for Library and Information Science (ODLIS). Available at: http://www.abc-clio.com/ODLIS/odlis_A.aspx.

A research data set constitutes a systematic, partial representation of the subject being investigated.

   - Organisation for Economic Co-operation and Development (OECD, 2007). Available at: http://www.oecd.org/dataoecd/9/61/38500813.pdf.

Over the life course of a survey that results in a data set – from initial conceptualization to data publication and beyond – a huge amount of metadata is typically produced. These metadata can be recorded in DDI format and re-used as the data collection, processing, tabulation, and reporting/dissemination take place.

   - Arofan Gregory, Open Data Foundation (2011). The Data Documentation Initiative (DDI): An Introduction for National Statistical Institutes. Available at: http://odaf.org/papers/DDI_Intro_forNSIs.pdf

Research Data Types

Research data can be generated for different purposes and through different processes. Following the classification used by the Research Information Network, it can include the following types of data:

  • Observational: data captured in real time, usually irreplaceable. For example, sensor data, survey data, sample data, neuroimages.
  • Experimental: data from lab equipment, often reproducible, but can be expensive. For example, gene sequences, chromatograms, toroid magnetic field data.
  • Simulation: data generated from test models where model and metadata are more important than output data. For example, climate models, economic models.
  • Derived or compiled: data that is reproducible but expensive. For example, text and data mining, compiled databases, 3D models.
  • Reference or canonical: a (static or organic) conglomeration or collection of smaller (peer-reviewed) datasets, most probably published and curated. For example, gene sequence databanks, chemical structures, or spatial data portals.
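
When datasets are catalogued, the five types above can be recorded as a controlled vocabulary so that every record uses the same term. The following is a minimal sketch in Python, not part of any metadata standard: the names ResearchDataType, PRESERVATION_NOTES, and describe, as well as the sample title, are hypothetical and chosen only for illustration. The notes simply restate the characteristics given in the list.

```python
from enum import Enum


class ResearchDataType(Enum):
    """Hypothetical controlled vocabulary for the five types listed above."""
    OBSERVATIONAL = "observational"
    EXPERIMENTAL = "experimental"
    SIMULATION = "simulation"
    DERIVED_OR_COMPILED = "derived or compiled"
    REFERENCE_OR_CANONICAL = "reference or canonical"


# Characteristics restated from the list above (illustrative wording only).
PRESERVATION_NOTES = {
    ResearchDataType.OBSERVATIONAL: "usually irreplaceable",
    ResearchDataType.EXPERIMENTAL: "often reproducible, but can be expensive",
    ResearchDataType.SIMULATION: "model and metadata matter more than output data",
    ResearchDataType.DERIVED_OR_COMPILED: "reproducible but expensive",
    ResearchDataType.REFERENCE_OR_CANONICAL: "most probably published and curated",
}


def describe(title, data_type):
    """Return a one-line description of a dataset and its type."""
    note = PRESERVATION_NOTES[data_type]
    return f"{title}: {data_type.value} ({note})"


# Hypothetical example: sensor data is listed above as observational.
print(describe("Stream gauge readings", ResearchDataType.OBSERVATIONAL))
```

Using an enumeration rather than free text keeps spelling variants such as "derived" and "compiled" from being recorded as separate types.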