3.3 Data Directory
3.3.1 Gaelic Algorithmic Research Group Data Directory
All Gaelic Algorithmic Research Group data is stored in subfolders of the Data folder on the Gaelic Algorithmic Research Group Team Drive (Gaelic Algorithmic Research Group/data). To document these data, we use the Gaelic Algorithmic Research Group Data Directory, which includes key, standardized information from each readme metadata file. Every data file in the Gaelic Algorithmic Research Group/data folder has a record (row) in the Gaelic Algorithmic Research Group Data Directory. The Gaelic Algorithmic Research Group Data Directory file contains two sheets: (1) Data directory (the record and standardized documentation for each data file); (2) Metadata (information needed to populate the Data Directory, i.e. the meta-metadata).
In the case of placeholder metadata (as described in the Metadata section), only the following columns should be filled out: folder, filename, contact, and summary. This (mostly blank) row serves two purposes: 1) it retains some of the searchability function for that dataset and 2) it serves as a visual reminder that those datasets are in need of more robust metadata development.
Column | Description |
---|---|
Domain | Climate/Energy; Land; Ocean; General; Other [drop down menu] |
Description | A few word description (e.g. SST US 2017); max 5 words |
Folder | Name of folder containing data |
Filename | Name of data file |
Year | Year of publication |
Version | Sub category of year; NA if not applicable |
Project | Project name that used these data (can have multiple listings) or ‘General’ if widely used (e.g. FAO data), hyperlinked to OneDrive/Box folder |
Code | Link to GitHub repo or wherever code is stored |
Data Stage | ‘raw’ if raw data; ‘final input’ for the input data used for the analysis; ‘output’ for what was used for the project and/or published [drop down menu] |
Filetype | File extension (e.g. csv; tif; rds); note: do not include ‘.’ |
Citation | Hyperlinked reference to publication or online resource or contact for individual/group data author |
URL | Link to original data source |
Extent | global; regional; national; local [drop down menu] |
Resolution | Resolution of spatial data (in degrees) |
Permissions | open = open source/open access; restricted = need author permission; secure = confidential data and likely involves a DUA or NDA [drop down menu] |
Start year | Data set start year; numeric |
End year | Data set end year; numeric |
Source | e.g. Gaelic Algorithmic Research Group; FAO; Rare |
Contact | Name and email of contact person in Gaelic Algorithmic Research Group who used/stored data |
Gaelic Algorithmic Research Group reference | Hyperlinked reference to Gaelic Algorithmic Research Group publication using data (can be NA) |
Keywords | e.g. fisheries; fire; utilities; property value; VDS; MPA; oceanography; temperature; habitat; biodiversity (up to 5 per entry, separated by semi-colons) |
Summary | Brief description of the data (1-2 sentences). Include years for timeseries; location/spatial extent for spatial data; key variables; resolution; sampling frequency; species; etc. |
Notes | Other relevant information about the data, e.g. whether it was processed (such as subset from a larger dataset) and what specifically was done; whether there are suspicious data points; any known issues. Initial your entry. |
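The drop-down menus and format rules in the table above lend themselves to an automated check before a row is added. The following is a minimal sketch in Python, not part of the group's tooling: the function name `validate_row`, the constants `DROPDOWNS` and `PLACEHOLDER_COLUMNS`, and the representation of a row as a dictionary of strings keyed by column name are all assumptions made for illustration. The allowed values are copied from the table, and the four required columns are those listed for placeholder metadata.

```python
# Hypothetical validator for one Data Directory row. A row is assumed to be
# a dict mapping column names (as in the table above) to string values.

# Allowed values for the drop-down columns, copied from the table.
DROPDOWNS = {
    "Domain": {"Climate/Energy", "Land", "Ocean", "General", "Other"},
    "Data Stage": {"raw", "final input", "output"},
    "Extent": {"global", "regional", "national", "local"},
    "Permissions": {"open", "restricted", "secure"},
}

# Columns that must be filled in even for placeholder metadata rows.
PLACEHOLDER_COLUMNS = ["Folder", "Filename", "Contact", "Summary"]


def validate_row(row):
    """Return a list of human-readable problems found in the row."""
    problems = []

    # The four placeholder columns are required for every row.
    for column in PLACEHOLDER_COLUMNS:
        if not row.get(column, "").strip():
            problems.append(f"{column} is required")

    # Drop-down columns may be blank, but not an unlisted value.
    for column, allowed in DROPDOWNS.items():
        value = row.get(column, "").strip()
        if value and value not in allowed:
            problems.append(f"{column} has unexpected value: {value!r}")

    # Description: max 5 words.
    if len(row.get("Description", "").split()) > 5:
        problems.append("Description exceeds 5 words")

    # Filetype: extension without the leading '.'.
    if row.get("Filetype", "").strip().startswith("."):
        problems.append("Filetype should not include '.'")

    # Keywords: up to 5 per entry, separated by semicolons.
    keywords = [k for k in row.get("Keywords", "").split(";") if k.strip()]
    if len(keywords) > 5:
        problems.append("Keywords lists more than 5 entries")

    # Start year and End year: numeric when present.
    for column in ("Start year", "End year"):
        value = row.get(column, "").strip()
        if value and not value.isdigit():
            problems.append(f"{column} must be numeric")

    return problems
```

A placeholder row that fills in only Folder, Filename, Contact, and Summary returns an empty list, while a row with a Filetype of `.csv` would be flagged. Collecting every problem in a list, rather than stopping at the first, lets the person entering the row fix everything in one pass.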
Any time you add a new dataset to the shared Gaelic Algorithmic Research Group data folder and directory, please message the #data-streamlining Slack channel so that others on the team know about the new dataset.
3.3.2 Project-level Data Directory
We highly recommend that research teams create a data_overview spreadsheet for keeping track of project-related data (i.e. a separate spreadsheet stored in the project’s Google Shared Drive data folder). This centralized document can be used to record project-relevant information and to show team members which datasets have already been saved. It can then guide and simplify data migration to the Gaelic Algorithmic Research Group Data Directory once the project is complete. Suggested attributes include:
- File name
- Folder name
- Source of data
- Link where data was downloaded
- Description of data
- Name of the researcher who downloaded the data
- Data directory entry (complete, in progress, not started, etc.)
- Metadata sheet (complete, in progress, not started, etc.)
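For teams that prefer to start the data_overview file from a script, the sketch below writes a header-only template and appends one record per dataset. It is only an illustration under stated assumptions: the CSV format, the function names `write_overview_template` and `append_dataset`, and the default status of "not started" are not prescribed by this guide, and the resulting file would still need to be uploaded to the project's shared data folder.

```python
import csv

# Column headers taken from the suggested attributes listed above.
OVERVIEW_COLUMNS = [
    "File name",
    "Folder name",
    "Source of data",
    "Link where data was downloaded",
    "Description of data",
    "Name of the researcher who downloaded the data",
    "Data directory entry",
    "Metadata sheet",
]

# The two status columns; defaulting them to "not started" is an assumption.
STATUS_COLUMNS = ("Data directory entry", "Metadata sheet")


def write_overview_template(path):
    """Create a data_overview CSV containing only the header row."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerow(OVERVIEW_COLUMNS)


def append_dataset(path, record):
    """Append one dataset record (a dict keyed by OVERVIEW_COLUMNS).

    Columns missing from the record are left blank, except the two
    status columns, which default to "not started".
    """
    row = {column: "" for column in OVERVIEW_COLUMNS}
    for column in STATUS_COLUMNS:
        row[column] = "not started"
    row.update(record)
    with open(path, "a", newline="", encoding="utf-8") as handle:
        csv.DictWriter(handle, fieldnames=OVERVIEW_COLUMNS).writerow(row)
```

Keeping the status columns in the same file makes it easy to filter for datasets whose Data Directory entry or metadata sheet is still outstanding when the project wraps up.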