Metadata Control File Reference
Version: 1.0
Basic Concepts
- Sections are case sensitive
- Section parameters are case sensitive
- Section parameter values are case sensitive
- YAML rules, conventions and features are suppported, such as:
- node anchors / references
- data typing
- If an optional section is specified, then its child parameters' cardinality are enforced
- Filename conventions are up to the user. However, below are some suggestions:
- use
.yml
as file extension - name the MCF basename the same as the dataset (e.g.
foo.shp
,foo.yml
)
- use
- To add a comment in a MCF, a line that begins with a hash tag (
#
) will be ignored - If you have a colon (
:
) in your text / value, use quotation marks ("
or'
) on either side of your text to avoid mapping errors when parsing the YAML file
Version format
MCFs are versioned using X.Y (MAJOR.MINOR changes) format. If a non supported MCF version is provided, pygeometa will throw an error and stop processing the MCF. Thus, the user must provide a valid and supported MCF version to generate the metadata.
Encoding
The MCF MUST be utf8 encoded. That is all.
Is your MCF Encoded as UTF8?
# editing in vim
# :set encoding=utf8
# pasting in vim
# in insert mode, hit CTRL-V $CODE, where $CODE is as per http://www.htmlhelp.com/reference/charset
# to see how the file is actually encoded on disk
file --mime-encoding file.txt
file -i file.txt
# to convert from one encoding to another
iconv -f iso8859-1 -t utf-8 file.yml > file.yml.new
Nesting MCFs
In the case where the user is generating metadata for multiple datasets which have common information, it becomes efficient to nest MCF together. pygeometa allows chaining MCFs to inherit values from other MCFs. Example: multiple datasets MCFs can refer a single MCF that contain contact information common to all those datasets.
To use MCF nesting:
- At the top of any section of a MCF add
base_mcf: foo.yml
- Specify the corresponding section in the base_mcf file
Notes about nesting MCFs:
- You can refer to one
base_mcf
per section of a MCF - Multiple sections can refer to the same base_mcf file
- When a parameter is defined in both the base_mcf file and the current MCF, it's always the current MCF that overwrites the base_mcf file
- Note that if a parameter in the current MCF is a YAML list, the corresponding base_mcf list (if it exists) is entirely overwritten
- MCFs can be nested in chains, meaning a MCF can be use a 'child' MCF and be used by a 'parent' MCF
Environment variables
pygeometa supports environment variables, using the notation in the example MCF snippet below:
metadata:
identifier: 3f342f64-9348-11df-ba6a-0014c2c00eab
parentidentifier: ${COLLECTION_ID}
Multilingual support
pygeometa supports default and alternate languages in ISO metadata.
Multilingual support is driven by the following sections in [metadata]
:
language
: 2 letter language code (i.e.en
,fr
) of primary languagelanguage_alternate
: 2 letter language code (i.e.en
,fr
) of secondary language
Example:
metadata:
language: en
language_alternate: fr
If language_alternate
is not defined or missing, pygeometa assumes a single language.
Values which support multilingual values can be specified with as keys to denote the respective language. Examples:
# single language
title: foo
# single language, explicit
title:
en: foo
# two languages, explicit
title:
en: foo
fr: bar
The language
value in the metadata
section must be a 2 letters language code. The user can use any language. For example: es
for Spanish.
Sections
mcf
Property Name | Mandatory/Optional | Description | Example | Reference |
---|---|---|---|---|
version | Mandatory | version of MCF format | 1.0 | pygeometa |
metadata
Property Name | Mandatory/Optional | Description | Example | Reference |
---|---|---|---|---|
identifier | Mandatory | unique identifier for this metadata file | 11800c2c-e6b9-11df-b9ae-0014c2c00eab | ISO 19115:2003 Section B.2.1 |
language | Mandatory | primary language used for documenting metadata, the metadata records themselves can be provided in multiple languages nonetheless | en | ISO 19115:2003 Section B.2.1 |
language_alternate | Optional | alternate language used for documenting metadata | en | ISO 19115:2003 Annex J |
charset | Mandatory | full name of the character coding standard used for the metadata set | utf8 | ISO 19115:2003 Section B.2.1 |
parentidentifier | Optional | file identifier of the metadata to which this metadata is a subset | 11800c2c-e6b9-11df-b9ae-0014c2c33ebe | ISO 19115:2003 Section B.2.1 |
hierarchylevel | Mandatory | level to which the metadata applies (must be one of 'series', 'software', 'featureType', 'model', 'collectionHardware', 'collectionSession', 'nonGeographicDataset', 'propertyType', 'fieldSession', 'dataset', 'service', 'attribute', 'attributeType', 'tile', 'feature', 'dimensionGroup' | dataset | ISO 19115:2003 Section B.2.1 |
datestamp | Mandatory | date that the metadata was created, pygeometa supports specifying the $date$ or $datetime$ variable to update the date value at run time | 2000-11-11 or 2000-01-12T11:11:11Z | ISO 19115:2003 Section B.2.1 |
dataseturi | Optional | Uniformed Resource Identifier (URI) of the dataset to which the metadata applies | urn:x-wmo:md:int.wmo.wis::http://geo.woudc.org/def/data/uv-radiation/uv-irradiance |
ISO 19115:2003 Section B.2.1 |
spatial
Property Name | Mandatory/Optional | Description | Example | Reference |
---|---|---|---|---|
datatype | Mandatory | method used to represent geographic information in the dataset (must be one of 'vector', 'grid', 'textTable', 'tin', 'stereoModel', 'video') | vector | Section B.5.26 |
geomtype | Mandatory | name of point or vector objects used to locate zero-, one-, two-, or threedimensional spatial locations in the dataset (must be one of 'complex', 'composite', 'curve', 'point', 'solid', 'surface') | point | ISO 19115:2003 B.5.15 |
identification
Property Name | Mandatory/Optional | Description | Example | Reference |
---|---|---|---|---|
doi | Optional | Digital Object Identifier (DOI) | 12345 | ISO 19115:2003 Section B.3.2.1 |
language | Optional | language(s) used within the dataset. If the dataset is made of numerical values, the dataset language can be set to 'missing', 'withheld', 'inapplicable', 'unknown' or 'template' | eng; CAN | ISO 19115:2003 Section B.2.2.1 |
charset | Optional | full name of the character coding standard used for the dataset | eng; CAN | ISO 19115:2003 Section B.2.1 |
title | Mandatory | name by which the cited resource is known | Important Bird Areas | ISO 19115:2003 Section B.3.2.1 |
edition | Optional | version of the cited resource | 1.8.0 | ISO 19115:2003 Section B.3.2.1 |
abstract | Mandatory | brief narrative summary of the content of the resource(s) | Birds in important areas... | ISO 19115:2003 Section B.2.2.1 |
topiccategory | Optional | main theme(s) of the dataset (must be one of 'geoscientificInformation', 'farming', 'elevation', 'utilitiesCommunication', 'oceans', 'boundaries', 'inlandWaters', 'intelligenceMilitary', 'environment', 'location', 'economy', 'planningCadastre','biota', 'health', 'imageryBaseMapsEarthCover', 'transportation', 'society', 'structure', 'climatologyMeteorologyAtmosphere'. More than one topic category can be specified | climatologyMeteorologyAtmosphere | ISO 19115:2003 Section B.5.27 |
fees | Optional | fees and terms for retreiving the resource. Include monetary units (as specified in ISO 4217). If there are no fees, use the term 'None' | None | ISO 19115:2003 Section B.2.10.6 |
accessconstraints | Optional | access constraints applied to assure the protection of privacy or intellectual property, and any special restrictions or limitations on obtaining the resource or metadata (must be one of 'patent', 'otherRestrictions','copyright','trademark', 'patentPending','restricted','license', 'intellectualPropertyRights'). If there are no accessconstraints, use the term 'otherRestrictions' | None | ISO 19115:2003 Section B.2.3 |
rights | Mandatory | Information about rights held in and over the resource. pygeometa supports using the $year$ variable to update the year value at run time. | Copyright (c) 2010 Her Majesty the Queen in Right of Canada | DMCI 1.1 |
url | Mandatory | URL of the dataset to which the metadata applies | http://host/path/ | ISO 19115:2003 Section B.2.1 |
status | Mandatory | "the status of the resource(s) (must be one of 'planned','historicalArchive','completed','onGoing', 'underDevelopment','required','obsolete')" | completed | ISO 19115:2003 Section B.2.2.1 |
maintenancefrequency | Optional | frequency with which modifications and deletions are made to the data after it is first produced (must be one of 'continual', 'daily', 'weekly', 'fortnightly', 'monthly', 'quarterly', 'biannually', 'annually', 'asNeeded', 'irregular', 'notPlanned', 'unknown' | continual | ISO 19115:2003 B.5.18 |
browsegraphic | Optional | graphic that provides an illustration of the dataset | http://example.org/dataset.png | ISO 19115:2003 B.2.2.2 |
identification.dates
identification.dates
sections can have 1..n dates
sections as required with the following object names/types:
Property Name | Mandatory/Optional | Description | Example | Reference |
---|---|---|---|---|
creation | Mandatory* | 'creation' reference date for the cited resource, referring to when the resource was brought into existence, *: presence of creation or publication or revision is mandatory | 2000-09-01 or 2000-09-01T00:00:00Z | ISO 19115:2003 Section B.3.2.4 |
publication | Optional* | 'publication' reference date for the cited resource, referring to when the resource was issued, *: presence of creation or publication or revision is mandatory | 2000-09-01 or 2000-09-01T00:00:00Z | ISO 19115:2003 Section B.3.2.4 |
revision | Optional* | 'revision' reference date for the cited resource, refersring to when the resource was examined or re-examined and improved or amended, *: presence of creation or publication or revision is mandatory | 2000-09-01 or 2000-09-01T00:00:00Z | ISO 19115:2003 Section B.3.2.4 |
identification:
...
dates:
creation: 2011-11-11
publication: 2000-09-01T00:00:00Z
identification.extents
identification.extents
sections can have 1..n spatial
and temporal
sections as required with the following properties.
Property Name | Mandatory/Optional | Description | Example | Reference |
---|---|---|---|---|
spatial.bbox | Mandatory | geographic position of the dataset, formatted as as list of [minx,miny,maxx,maxy] | -141,42,-52,84 | ISO 19115:2003 Section B.3.1.2 |
spatial.crs | Mandatory | EPSG code identifier | 4326 | ISO 19115:2003 Section B.2.7.3 |
spatial.description | Optional | description of the geographic area using an identifier | Toronto, Ontario, Canada | ISO 19115:2003 Section B.3.1.2 |
temporal.begin | Optional | Starting time period covered by the content of the dataset, either time period (startdate/enddate) or a single point in time value | 1950-07-31 | ISO 19115:2003 Section B.3.1.3 |
temporal.end | Optional | End time period covered by the content of the dataset, either time period (startdate/enddate) or a single point in time value. For data updated in realtime, use the term now |
now | ISO 19115:2003 Section B.3.1.3 |
temporal.resolution | Optional | Minimum time period resolvable in the dataset, as an ISO 8601 duration | P1D | ISO 19108 |
identification:
...
extents:
spatial:
- bbox: [-141,42,-52,84]
crs: 4326
temporal:
- begin: 1950-07-31
end: now
resolution: P1D
identification.keywords
identification
sections can have 1..n keywords
sections as required using nesting. Example:
identification:
...
keywords:
default:
keywords:
en: [foo1, bar1]
fr: [foo2, bar2]
keywords_type: theme
vocabulary:
name: my vocabulary
url: https://example.org/vocab
wmo:
keywords:
en: [foo3, bar3]
fr: [foo4, bar4]
keywords_type: theme
keywords_codelist: http://wis.wmo.int/2011/schemata/iso19139_2007/schema/resources/Codelist/gmxCodelists.xml#MD_KeywordTypeCode
Schema specific keywords sections
wmo
: World Meteorological Organization keywords (used for WMO Core Metadata Profile)gc_cst
: Government of Canada Core Subject Thesaurus (used for HNAP)hnap_category_information
: HNAPhnap_category_geography
: HNAPhnap_category_content
: HNAP
Within each keywords
section, the following elements are supported:
Property Name | Mandatory/Optional | Description | Example | Reference |
---|---|---|---|---|
keywords | Mandatory | category keywords | keyword1,keyword2,keyword3 | ISO 19115:2003 Section B.2.2.1 |
keywords_type | Mandatory | subject matter used to group similar keywords (must be one of 'discipline', 'place', 'stratum', 'temporal', 'theme') | theme | ISO 19115:2003 Section B.2.2.3 |
keywords_codelist | Optional | specific code list URL (for advanced use cases, else the default is as per the given specified schema) | http://wis.wmo.int/2011/schemata/iso19139_2007/schema/resources/Codelist/gmxCodelists.xml | ISO 19115:2003 Section B.2.2.3 |
identification.keywords.vocabulary
identification.keywords
sections can specify an optional vocabulary
section with the following elements:
Property Name | Mandatory/Optional | Description | Example | Reference |
---|---|---|---|---|
name | Mandatory | name of the source of keywords (English) | my thesaurus name | ISO 19115:2003 Section B.2.2.3 |
url | Optional | URL of source of keywords | https://example.org/my-vocab | - |
identification.license
identification.license
sections can provide a optinoal license via a name or URL using the following elements:
Property Name | Mandatory/Optional | Description | Example | Reference |
---|---|---|---|---|
name | Mandatory | name of license | CC BY 4.0 | - |
url | Optional | URL of license | https://creativecommons.org/licenses/by/4.0 | - |
content_info
Property Name | Mandatory/Optional | Description | Example | Reference |
---|---|---|---|---|
type | Mandatory | Content type (must be one of 'coverage', 'image', 'feature_catalogue' | image | ISO 19115:2003 Section B.2.8.1 |
cloud_cover | Optional | area of the dataset obscured by clouds, expressed as a percentage of the spatial extent | 72 | ISO 19115:2003 Section B.2.8.1 |
processing_level | Optional | image distributor’s code that identifies the level of radiometric and geometric processing that has been applied | L1 | ISO 19115:2003 Section B.2.8.1 |
content_info.attributes
content_info
sections can have 1..n attributes
sections as required using nesting. Example:
attributes:
- name: temperature
title:
en: Air temperature
abstract:
en: Description of air temperature attribute
type: number
units: K
values: [1.2, 2.5, 1.3]
Property Name | Mandatory/Optional | Description | Example | Reference |
---|---|---|---|---|
name | Mandatory | attribute name | temperature |
|
title | Optional | attribute title | Air temperature | |
abstract | Optional | attribute description | Description of air temperature | |
url | Optional | URL with more information about the attribute | Air temperature | |
type | Optional | data type | (must be one of 'string', 'number', 'integer', 'object', 'array', 'boolean') | string |
units | Optional | SI units | K | https://en.wikipedia.org/wiki/International_System_of_Units |
values | Optional | specification of attribute values | see below |
content_info.attributes.values
Attributes may also provide the values within a given domain, via one of the following constructs:
content_info.attributes.values.enum
Property Name | Mandatory/Optional | Description | Example | Reference |
---|---|---|---|---|
enum | Optional | Enumerated list of values | [1, 2, 3, 4] |
Example:
values:
enum: [1, 2, 3. 4]
content_info.attributes.values.range
Property Name | Mandatory/Optional | Description | Example | Reference |
---|---|---|---|---|
range | Optional | Range of values (min/max) | [1, 4] |
Example:
values:
range: [1, 4]
content_info.attributes.values.codelist
Property Name | Mandatory/Optional | Description | Example | Reference |
---|---|---|---|---|
codelist | Optional | codelist of values | see below | |
codelist.name | Mandatory | value name | foo | |
codelist.title | Optional | value title | Foo | |
codelist.abstract | Optional | value abstract | Description of value Foo | |
codelist.url | Optional | URL with more information about the value | https://example.org/foo |
Example:
values:
codelist:
- name: foo
title:
en: foo title
abstract:
en: foo description
url: https://example.org/values/foo
content_info.dimensions
content_info
objects support 1..n dimension
objects.
Property Name | Mandatory/Optional | Description | Example | Reference |
---|---|---|---|---|
name | Mandatory | name of dimension | B1 | ISO 19115:2003 Section B.2.8.2 |
units | Mandatory | units in which sensor wavelengths are expressed | nm | ISO 19115:2003 Section B.2.8.2 |
min | Mandatory | shortest wavelength that the sensor is capable of collecting within a designated band | 101 | ISO 19115:2003 Section B.2.8.2 |
max | Mandatory | longest wavelength that the sensor is capable of collecting within a designated band | 199 | ISO 19115:2003 Section B.2.8.2 |
contact
MCFs can have 1..n contact
sections as required using nesting. Example:
contact:
pointOfContact:
....
distributor:
....
The contact.pointOfContact
section provides information for the pointOfContact
role (see ISO 19115:2003 Section B.3.2.1). This section is minimally required.
Within each contact
section, the following elements are supported:
Property Name | Mandatory/Optional | Description | Example | Reference |
---|---|---|---|---|
organization | Mandatory | name of the responsible organization | Environment Canada | ISO 19115:2003 Section B.3.2.1 |
url | Mandatory | on-line information that can be used to contact the individual or organization | http://host/path | ISO 19115:2003 Section B.3.2.3 |
individualname | Mandatory | name of the responsible person-surname | given name | title seperated by a delimiter |
positionname | Mandatory | role or position of the responsible person | Senior Systems Scientist | ISO 19115:2003 Section B.3.2.1 |
phone | Mandatory | telephone number by which individuals can speak to the responsible organization or individual | +01-123-456-7890 | ISO 19115:2003 Section B.3.2.7 |
fax | Mandatory | telephone number of a facsimile machine for the responsible organization or individual | +01-123-456-7890 | ISO 19115:2003 Section B.3.2.7 |
address | Mandatory | address line for the location (as described in ISO 11180 | Annex A) | 4905 Dufferin Street |
city | Mandatory | city of the location | Toronto | ISO 19115:2003 Section B.3.2.2 |
administrativearea | Mandatory | state, province of the location | Ontario | ISO 19115:2003 Section B.3.2.2 |
postalcode | Mandatory | ZIP or other postal code | M3H 5T4 | ISO 19115:2003 Section B.3.2.2 |
country | Mandatory | country of the physical address | Canada | ISO 19115:2003 Section B.3.2.2 |
Mandatory | address of the electronic mailbox of the responsible organization or individual | foo@bar.tld | ISO 19115:2003 Section B.3.2.2 | |
hoursofservice | Optional | time period (including time zone) when individuals can contact the organization or individual | 0700h - 1500h EST | ISO 19115:2003 Section B.3.2.3 |
contactinstructions | Optional | supplementalinstructions on how or when to contact the individual or organization | contact during working business hours | ISO 19115:2003 Section B.3.2.3 |
contact.distributor
The contact.distributor
section provides information for the distributor
role (see ISO 19115:2003 Section B.3.2.1) and has the identical structure as contact.pointOfContact
.
If contact information is the same for both, use YAML node anchors and references to have it provided in both sections in the metadata:
contact:
pointOfContact: &id_contact_poc
...
distributor: *id_contact_poc
distribution
MCFs can have 1..n distribution
sections as required using nesting. Example:
distribution:
wms:
....
waf:
....
Within each distribution
section, the following elements are supported:
Property Name | Mandatory/Optional | Description | Example | Reference |
---|---|---|---|---|
url | Mandatory | location (address) for on-line access using a Uniform Resource Locator address or similar addressing scheme such as http://www.isotc211.org/ | http://host/path | ISO 19115:2003 Section B.3.2.5 |
type | Mandatory | connection protocol to be used. Must be one of the identifier values from https://github.com/OSGeo/Cat-Interop/blob/master/LinkPropertyLookupTable.csv |
WWW:LINK | ISO 19115:2003 Section B.3.2.5 |
rel | Optional | the type or semantic of the relation. The value should be an IANA link relation or a relation type specific to an established standard | canonical | Link Relations - Internet Assigned Numbers Authority |
name | Mandatory | name of the online resource | Download portal | ISO 19115:2003 Section B.3.2.5 |
description | Mandatory | detailed text description of what the online resources is/does | brief description of the online resource (English) | ISO 19115:2003 Section B.3.2.5 |
function | Mandatory | code for function performed by the online resource (must be one of 'download', 'information', 'offlineAccess', 'order', 'search') | download | ISO 19115:2003 Section B.3.2.5 |
format | Optional | Format of the distribution method | WMS | HNAP 2.3 |
format_version | Optional | Format version of the distribution method. Note: this value needs to be encoded as a string | 1.0 | HNAP 2.3 |
channel | Optional | channel/topic/exchange when link is a Pub/Sub endpoint | my/cool/topic | OGC API - Pub/Sub |
dataquality
Property Name | Mandatory/Optional | Description | Example | Reference |
---|---|---|---|---|
scope.level | Optional | hierarchical level of the data specificed by the scope | dataset | ISO 19115:2003 Section B.2.4.5 |
lineage.statement | Optional | general explanationn of the data producer's knowledge about the lineage of a dataset | this dataset was produced from a custom process against dataset xyz | ISO 19115:2003 Section B.2.4.2.1 |
Tips
Keyword Substitution
pygeometa supports using the following keyword substitutions:
$year$
, which is substituted for the current year with the YYYY format, example: 2016$date$
, which is substituted for the current date with the YYYY-MM-DD, example: 2016-12-22 format$datetime$
, which is substituted for the current date and time with the YYYY-MM-DDThh:mm:ssZ, example: 2016-12-22T16:34:15Z format