3.3 Formatting Guidelines

You are here:

General Rules

  • You may provide multiple formats for the same data.
  • Consider submitting the files used by your statistical program as they are already machine-readable.
  • Provide data as CSV (comma-separated values) wherever possible for tabular data. In addition, you may also submit spreadsheets to capture formulas.
  • Use standard terminology wherever possible (e.g. include both the Latin and common names for plants and animals). 
    • If you are a scientific researcher, you may have access to disciplinary thesauri or ontologies. Use these for standardization or information and remember to cite them! For example, the Integrated Taxonomic Information System provides standardized names for plants, animals, fungi, and microbes and is internationally recognized.
  • Do not include dates/years in your dataset title.  Instead include dates in the dataset description and temporal coverage fields.  The resources (data files) can include date/years in their title.
    • Why?  When a dataset is created the system generates a unique DOI and URL link.  This ID can be used as a quick link to access the data, it may be referenced by the public using your data and it also used in the website statistics.
      Amending the title of a dataset will mean the URL is incorrect and will break these linkages.  When a dataset is refreshed eg with a new year of data, the title quickly becomes out of date.
  • Avoid using acronyms or ambiguous terms in your dataset title.  The title should be understood by members of the public.  Describe any terms or words specific to your field in the description.
  • Files can be zipped but you should include a list to let people know what is inside the zipped file.  
  • Shapefiles are made up of at least 3 files (shp, shx, dbf) and can be zipped. The format selected should be shp. KML files can also be zipped (KMZ) and should be entered with the format kml

Formatting Excel and CSV files

  1. CSV should be UTF-8 character set encoding.
  2. First row should contain the single column header row to label your variables.
    1. If your current data has more than one header row, consider combining these into one row in a way that makes sense.
    2. Always start at row 1, column 1
    3. The second row or column should always be the first row or column of data
  3.  File should only contain a single table of data
    1. Combine your data onto a single spreadsheet page. Delete any additional pages by right-clicking the blank pages at the bottom of your spreadsheet and choosing “Delete”. Note that one multi-tab spreadsheet might become several CSV files.
  4. Column headers should use the universal text format
    1. Only small caps letters a-z and digits 0-9 should be used
    2. No spaces should be used in column names
    3. Separate words should be connected with the underscore (_) character
  5. Use of Units/Measure
    • Also, use consistent units of measure (e.g. always kilograms or always tonnes but not both)
    • The units of measure cannot be mixed with the numeric values in the column. If the unit is the same for the entire column, the unit can be added to the column name (e.g. weight_kg)
    • In the case the unit/measure has different values for a column, the unit/measure must be stored in a separate column (ideally following the numeric value column)
      • e.g. Column header: “description”;”amount”;”currency”
        • “Drone”;”86.56″;”EUR”
        • “CTD”;56.23″;”CDN”
  6. Only one data type per column
    1. Data values in the same column should be of the same data type (e.g. text, integer, decimal, date, time, etc.)
  7. Data values of type Data
    1. For data values of type date, the ISO-format should be used as follows
      1. YYYY-MMM-DD
    2. For type date/time, the ISO-format should be as follows:
      1. YYYY-MMM-DD-HHMMSS (if there are SS)
  8. Do not merge cells
  9. Do not use blank rows or columns
  10. Do not use zeroes or leave a cell blank. Select a code to identify missing data or to otherwise flag data (e.g. missing data identified as NA, -999 or -9999, data too low or high flagged as -1111 or -22222). You can add a separate data dictionary or page to document this information.
  11. Do not include notes, macros or graphs (use a data dictionary or additional document page)
  12. For numeric columns, use simple decimal number (e.g. 23.5) NOT fractions (e.g. 23 1/2) or ranges (e.g. 23-24)
  13. No “thousands” formatting characters
    1. Number values can only consist of digits 0-9 and the optional “-” character for negative number, and the decimal point (.) for decimals
    2. Cannot contain formatting characters/separators (e.g. “,” or blank for the “thousands” position. Or currency symbols
  14. In order to use the built-in chart previews in CanWIN, remove any special characters from your column header rows and data cells. Some characters in the column headers prevent the embed/link features from working with charts and graphs. This step is not necessary to create a CSV but is necessary to use the Embed feature for sharing your data visualizations created in the CanWIN. Below is a list of incompatible and acceptable special characters.

Characters that should NOT be used in column headers or data cells:

Character Description Character Description
` accent ! exclamation point
@ “at” symbol # number / hashtag symbol
$ dollar symbol % percent symbol
^ caret & “and” symbol
* asterisk ( ) parenthesis
< > greater / less than symbols ? question mark
[ ] brackets { } brackets
| pipe \ back slash
apostrophe ~ tilde
  1. Remove any commas from your document. Because the delimiter is a comma, extra commas in your text can cause errors in interpreting the data.
  2. IF you wish to save your document as CSV, choose “Save as: CSV” (Comma delimited)

Information modified from Resource Formatting, QGov Online, Queensland Government and USDA National Agriculture Library ( https://data.nal.usda.gov/guidelines-formatting-your-data) and 2015. Intrasoft et al. European Data Portal- Data Supplier Guidelines. 

Last Updated On September 20, 2017