H5M Convention
Version 0.1
Revision 15 Author: Gerco de Jager, MARIN MSG

| Rev | When | Who | What & Why |
|---|---|---|---|
| 15 | 2018-03-13 | GdJ | SignalSet's 'programmeNo' renamed to 'programNo' to be consistent with the application pool. |
| 14 | 2018-02-27 | GdJ | Added kind of information per attribute: s: structural, i: informational, a: interpreted by application(s). |
|  |  | GdJ | Conformed signal set's 'projectNo' to MARIN integer representation. |
| 13 | 2018-01-31 | GdJ | Signal: added 'order' to indicate whether the signal represents a first or second order effect. Only if applicable. |
| 12 | 2018-01-30 | GdJ | Signal: unit is singular, not plural. Removed definitionsList from |
| 11 | 2018-01-12 | GdJ | Meeting with DJN and JB: need for domain / source specific metadata (attributes) for signals and sets. For Tank / MMS2 this is mainly sensor and calibration facts (no, this is NOT stored in TDMS). |
| 10 | 2018-01-09 | GdJ | Processed feedback of MP, HT, RH, NC. Made file structure image readable for non-programmers. Added extra note on the default value 'not specified'. Improved iso_fmt UTC explanation. Added optional attributes 'definitionsListVersion' and '...Name' at file level to cater for future change of a common set of definitions for units, quantities and so on. This might be redundant if this is implied by application name and version. Reduced signal 'direction' to 3D as it is either a translation OR a rotation. Which one can be inferred from the 'signalType' and / or the 'units'. |
| 9 | 2017-12-19 | GdJ | Removed "rawUnits" and upgraded "units" to UTF-8. (Rationale: leave interpretation of units to client code.) |
|  |  | GdJ | Added signal attribute "baseNames" for informational purposes. Reason: ObjectReference does not render very descriptive. |
| 8 | 2017-12-14 | GdJ | Changed "original*" prefix to "raw*" as it expresses the cause better. Changed "datetime*" to "dateTime*". |
| 7 | 2017-12-08 | GdJ | Removed 'format' prefix on file header. |
| 6 | 2017-11-28 | GdJ | Changed 'reference' to 'referenceSystem' for clarity. |
| 5 | 2017-11-16 | GdJ | Support non-unit-definition-strings as Can-be-missing attribute "rawUnits". (a.r.o. units meeting EvdB et al dd 2017-11-16) |
| 4 | 2017-10-23 | GdJ | Unicode support in attributes and names. |
| 3 | 2017-10-12 | GdJ | Video chat with Gerd Heber (HDF Group) dd 2017-10-06. |
| 2 | 2017-10-02 | GdJ | Discarding timezone information in datetime strings. |
| 1 | 2017-09-21 | GdJ | Results of meeting dd 21-09-2017. |
| 0 | 2017-09-20 | GdJ | First draft for H5M.NET, "MMS2 writes H5M". |

...

Note before reading any further: This file format is still in development. Not all features and intended usages are supported yet by this standard. New features will be added gradually and lead to new versions of this standard.

Contents
1 Introduction
1.1 Intended usage
1.2 Supported features
2 Glossary
3 H5M Structure
3.1 Root : Group "/"
3.2 Signal Set : Group
3.3 Signal : Dataset
4 H5M Data Descriptions
4.1 Naming conventions
4.2 Attributes
4.3 Data types
4.3.1 Default value
4.3.2 Strings: favour unicode (UTF-8) over ascii
4.3.3 iso_fmt
4.3.4 unit_str
4.3.5 prop_str
4.3.6 obj_ref

Introduction

H5M is an acronym for HDF5 MARIN Datasets File.
The data it contains can come from any source available at MARIN: basin measurements, simulator runs, CFD calculations, full-scale measurements, and post-processing or analysis of this data.
This convention specifies how the content of an HDF5 file is organised for MARIN. For more information about HDF5 itself and for generic HDF5 viewers, please visit https://support.hdfgroup.org/.
The glossary of terms is given in the next chapter. The data structure and its properties are specified in chapter 3. Details of the data types used and the naming convention followed are specified in chapter 4.

Intended usage

MARIN Internal use:

  • Give data analysts access to the results of the measurements in the basins for further processing. This includes the raw data from MMS2 and supersedes the MMS format.
  • Exchange the results of simulations and calculations.
  • Exchange the results of analysis and other post-processing of raw data between data analysts and between departments.

MARIN External use:

  • Exchange processed results of measurements, calculations, simulations, experiments or analysis with clients. (Portal, e-mail, ftp, …)

Supported features

The specified format supports the following features:

  1. Identification of the software and system configuration used to write the data set.
  2. Identification of the process step(s) that generated the data set (e.g. experiment number, project number).
  3. Identification of the experiment variables (a.k.a. signals) and, if applicable, the sensors used.
  4. Grouping of signals with shared origin or properties. This is called a signal set.
  5. Linking of a signal set to another signal set to create a dependency of one on the other.
  6. Time branching: the available signal sets can form a graph from which a signal can be composed by following a specific path.
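
Features 5 and 6 can be pictured as parent links between signal sets. The following is only a sketch under stated assumptions: the dictionary of parent names, the set names and the function `path_to_root` are hypothetical, and the convention does not prescribe how a composed signal is actually assembled. It only shows how one path through the graph could be resolved.

```python
from typing import Dict, List, Optional


def path_to_root(parents: Dict[str, Optional[str]], leaf: str) -> List[str]:
    """Return the signal set names from the root ancestor down to `leaf`."""
    chain: List[str] = []
    seen = set()
    current: Optional[str] = leaf
    while current is not None:
        if current in seen:
            # Parent links must not loop back onto themselves.
            raise ValueError(f"cycle in parent links at {current!r}")
        seen.add(current)
        chain.append(current)
        current = parents[current]
    chain.reverse()
    return chain


# Hypothetical sets: two branches that share the same starting set "run".
parents = {"run": None, "branchA": "run", "branchB": "run", "branchA2": "branchA"}
print(path_to_root(parents, "branchA2"))
```

In a file, the same information would come from the 'parent' or 'parentName' attribute of each signal set group.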

Glossary

Experiment: In the context of this convention, any action that results in one or more signals that need to be stored. E.g. MMS2 Measurement, ReFresco Calculation, XMF Simulation, SHARK data analysis.
H5M: MARIN convention for structuring the content of an HDF5 container for Experiment Datasets.
Master Signal: Independent value range. E.g. the timestamp values of a time series signal that correspond with each sample value.
Signal: Measured, calculated or simulated variable; range of values.
Signal Set: Set of signals grouped by one or more common properties. E.g. origin, experiment run, sample rate, sample count, timestamp.
Slave Signal: Dependent value range. E.g. the sample values at each corresponding timestamp of a time series signal, or the RAO value at certain values on several frequency axes.
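
The master / slave relation can be illustrated with a small in-memory model. This is a sketch only: the class `Signal`, its fields and the example signals "time" and "speed" are hypothetical, and the `bases` field merely mirrors the signal attribute of that name in chapter 3.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Signal:
    name: str
    unit: str
    values: List[float]
    # Master signals this signal depends on; empty for a master signal.
    bases: List["Signal"] = field(default_factory=list)

    def is_master(self) -> bool:
        # Assumption: a signal without bases is an independent value range.
        return not self.bases


time = Signal("time", "s", [0.0, 0.1, 0.2])
speed = Signal("speed", "m/s", [1.0, 1.2, 1.1], bases=[time])

# Each slave sample pairs with the master value at the same index.
pairs = list(zip(speed.bases[0].values, speed.values))
```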

H5M Structure

The H5M content is structured as depicted in the diagram below.
Each tree node item is explained in the following paragraphs of this chapter. The location of an item is defined by its address in the HDF5 file, called its path. All available attributes are also specified per item.
[Image: H5M file structure diagram]

Root : Group "/"

The root group contains:

  • attributes that identify the H5M format used.
  • zero or more HDF groups, one for each set of signals.
  • attributes that identify the provenance of the file.

Attributes:

| Name | Role | Data type | Exists? (A / O / NS) | Description | Convention / example |
|---|---|---|---|---|---|
| name | i | utf8 | A | Name of the convention; constant. | "H5M" |
| description | i | utf8 | A | Description of the format; constant. | "HDF5 MARIN Datasets File" |
| version | a | utf8 | A | H5M format version number. | "0.1" |
| documentation | i | utf8 | A | Where to find this convention. | mods.marin.nl/dispay/H5M/convention_0_1 |
| hdf5Version | i | utf8 | A | Version of HDF5. | "1.8.19" |
| libraryName | i | utf8 | A | Name of the library that performed the actual writing. When signal sets are appended, only the last editor is tracked. | "pymarin", "Marin.Experiments.IO.H5M" |
| libraryVersion | i | utf8 | A | Version of this library. | e.g. "7.0.1" |
| applicationName | i | utf8 | NS | Name of the application that wrote this file. | "SHARK", "SHARK: some vistrail.vt" |
| applicationVersion | i | utf8 | NS | Version of the application used. | "0.0.20" |
| systemName | i | utf8 | O | Name of the system running the application. This is not necessarily the system that performed the experiment (e.g. measurement). | "LP3138", "MMS2" |
| systemVersion | i | utf8 | O | Version or configuration of this system. | (tbd) |
| dateTimeOfCreation | i | iso_fmt | A | Date and time at which the file was created. | ISO 8601, "2017-09-27 T21:13:00.012345" |
| userName | i | utf8 | NS | Name of user / author. | "user123" |
| notes | i | utf8 | NS | Any additional notes at file level. | (free text) |
| writeErrors | i | utf8[] | O | Specifies any errors that occurred while writing the file. |  |
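
A writer could assemble the root attributes as a plain dictionary before handing them to an HDF5 library. The sketch below makes two assumptions that are not confirmed by this chapter: that "A" marks attributes that must always be present, and that `datetime.isoformat` is an acceptable stand-in for the iso_fmt layout defined in section 4.3.3. The function names are hypothetical.

```python
from datetime import datetime
from typing import Dict, List

# Attributes marked "A" in the root attribute table (assumed: always present).
REQUIRED_ROOT_ATTRIBUTES = (
    "name", "description", "version", "documentation", "hdf5Version",
    "libraryName", "libraryVersion", "dateTimeOfCreation",
)


def build_root_attributes(library_name: str, library_version: str,
                          hdf5_version: str, created: datetime) -> Dict[str, str]:
    """Collect the always-present root attributes as strings."""
    return {
        "name": "H5M",
        "description": "HDF5 MARIN Datasets File",
        "version": "0.1",
        "documentation": "mods.marin.nl/dispay/H5M/convention_0_1",
        "hdf5Version": hdf5_version,
        "libraryName": library_name,
        "libraryVersion": library_version,
        # Assumption: ISO 8601 text without time zone information.
        "dateTimeOfCreation": created.isoformat(timespec="microseconds"),
    }


def missing_root_attributes(attrs: Dict[str, str]) -> List[str]:
    """List the required attribute names that are absent from `attrs`."""
    return [key for key in REQUIRED_ROOT_ATTRIBUTES if key not in attrs]


attrs = build_root_attributes("pymarin", "7.0.1", "1.8.19",
                              datetime(2017, 9, 27, 21, 13, 0, 12345))
print(missing_root_attributes(attrs))
```

A reader can run the same check on the attributes it finds at "/" to decide whether a file claims to follow this convention.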

 

 

Signal Set : Group

The signal set group contains a set of signals that logically form a single group. Such signals share common properties like number of samples and sample rate. Which properties are common has to be determined by the consumer of the data, either a human or an application.
Name: HDF Group name of the set of signals.
Path: "/<Signal set name>"

Attributes:

| Name | Role | Data type | Exists? (A / O / NS) | Description | Convention / example |
|---|---|---|---|---|---|
| type | a | utf8 | NS | Type of signals in the set. Value NS maps to "General". | "General", "Frequency", "Time" |
| rawName | i | utf8 | O | Original signal set name, if it had to be renamed to be usable as a valid HDF5 name. |  |
| description | i | utf8 | NS | Additional description of the set. | (tbd) |
| parent | s | obj_ref | O | Link to the parent signal set. | (tbd) |
| parentName | s | utf8 | O | For information only: the name of the parent signal set. |  |
| dataScale | a | float64 | A | Scale of the signal values in the set with respect to full scale. | 1.0 (full scale), 23.456 (model scale), 0.023456 (larger than life) |
| waterDensityFactor | a | float64 | NS | Water density factor to be used for scaling data values to full scale. 'Not specified' is preferred over a default value. | 1.432 |
| stepSize | i | float64 | A | Step size of the master signal if the set has exactly one master signal for all signals and that master is equidistant; otherwise NaN. In case of a time master this is the inverse of the sample rate. |  |
| dateTimeRecordingStart | a | iso_fmt | A | Date and time of the first sample of the time signal of the measurement or simulation. Required for time series only. | (tbd) |
| dateTimeRecordingEnd | a | iso_fmt | O | Date and time of the last sample of the time signal of the measurement or simulation. (May disappear in future versions.) | (tbd) |
| projectNo | a | int32 | A | Project number. | 80220 |
| projectSubNo | i | int32 | NS | Sub-number. | 368 |
| programNo | a | int32 | A | Test programme number. | 1 |
| source | a | utf8 | A | Name of the source application or facility. Within one project each 'programNo' is a fixed combination with the 'source'. | "SMB", "ReFresco" |
| categoryNo | a | int32 | A | Number of the test category used. | 2 |
| testNo | a | int32 | A | Number of the test setup used. | 3 |
| experimentNo | a | int32 | A | Number of the experiment settings used. | 4 |
| measurementNo | a | int32 | A | Number of the actual measurement or experiment execution. | 2 |
| modelScale | a | float64 | A | Scale of the model in this project. (Unrelated to the signal values in the set.) | 23.456 |
| notes | i | utf8 | NS | Any additional information about this signal set. | (free text) |
| writeErrors | i | utf8[] | O | Specifies any errors that occurred while writing the signal set (group). |  |
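
The rule for the signal set attribute 'stepSize' (a value only when there is exactly one equidistant master signal, NaN otherwise) could be evaluated as sketched below. The function names and the tolerance are assumptions; the convention does not state how strictly "equidistant" is to be judged.

```python
import math
from typing import Sequence


def step_size(master: Sequence[float], rel_tol: float = 1e-9) -> float:
    """Return the constant spacing of `master`, or NaN if it is not equidistant."""
    if len(master) < 2:
        return math.nan
    first = master[1] - master[0]
    # Compare every later spacing with the first one.
    for previous, current in zip(master[1:], master[2:]):
        if not math.isclose(current - previous, first,
                            rel_tol=rel_tol, abs_tol=1e-12):
            return math.nan
    return first


def set_step_size(masters: Sequence[Sequence[float]]) -> float:
    """Apply the 'one and only one master signal' condition for a signal set."""
    if len(masters) != 1:
        return math.nan
    return step_size(masters[0])


print(set_step_size([[0.0, 0.5, 1.0, 1.5]]))
```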

Signal : Dataset

The signal is a dataset containing the samples of a measurement, simulation, calculation or post-processing step. A signal maps to an experiment variable.
Name: HDF Dataset name; must be unique in the set. It is the HDF-safe name of the signal. For reference, the original, potentially HDF-unsafe name is provided with the data in the 'signalSource' attribute.
Path: "/<set name>/<signal name>"
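
As an illustration of the name and path rules, the sketch below derives an HDF-safe name and the dataset path. In HDF5, "/" separates path elements and "." refers to the current group, so neither can serve as a link name on its own. The replacement character "_" and both function names are assumptions, not part of this convention.

```python
from typing import Optional, Tuple


def hdf_safe_name(raw_name: str) -> Tuple[str, Optional[str]]:
    """Return (safe_name, raw_name_to_keep); the second item is None if unchanged."""
    # "/" is the HDF5 path separator, so replace it (assumed replacement: "_").
    safe = raw_name.replace("/", "_")
    # An empty name or "." cannot be used as a link name.
    if safe in ("", "."):
        safe = "_"
    return safe, (raw_name if safe != raw_name else None)


def signal_path(set_name: str, signal_name: str) -> str:
    """Build the dataset path "/<set name>/<signal name>"."""
    return f"/{set_name}/{signal_name}"


safe, raw = hdf_safe_name("Fx/Fy")
print(signal_path("run 1", safe), raw)
```

When the second item is not None, it is the value a writer would keep for reference, for example in the 'rawName' attribute.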
Additional properties of the signal are added as attributes. Below are the common attributes. In sub sections domain or source specific attributes can be found.
Attributes:

| Name | Role | Data type | Exists? (A / O / NS) | Description | Convention / example |
|---|---|---|---|---|---|
| rawName | i | utf8 | O | Original signal name, if it had to be reformatted to be usable as an HDF5 name. |  |
| unit | a | unit_str | A | Name of the unit of the signal. This defines the quantity of the experiment variable. | "m/s", "-", "rad" |
| signalType | a | prop_str | NS | Defines the experiment variable, simulation property or kind of quantity. | "velocity", "angular velocity", etc. |
| description | i | utf8 | A | Description of the signal, in more detail than the name. E.g. the kind of quantity or experiment variable; whether it is an absolute value or a delta. | (tbd) |
| timeOffset | a | float64 | NS | The time offset in seconds between the real start time and the moment the timestamp is created. |  |
| order | a | int32 | O | In case of frequency data, specifies whether it is a first or second order effect or otherwise. | 1: first order. 2: second order. -1: not applicable. |
| position | a | float64[3] | NS | Location of the variable in the specified coordinate system. | [0.0, 0.0, 0.0] |
| direction | a | float64[3] | NS | Direction of the signal or experiment variable in the specified coordinate system. In case of a rotation it is the direction of the axis of rotation. | Roll in ACK: [1.0, 0.0, 0.0]; sway in ACK: [0.0, 1.0, 0.0] |
| referenceSystem | a | utf8 | NS | Specifies in which coordinate system position and direction are given. | "ACK", (todo: complete list) |
| signalSource | i | utf8 | O | Holds information about the sensor. Depends on the system used. The value might be interpretable by a tool corresponding to the data source. Also contains the original signal name from the measurement system. | (tbd) |
| channelNo | i | int32 | O | If no sensor information is provided, holds the channel number in case of measured data. | 123 |
| bases | s | obj_ref[] | O | List of object references to all datasets that are a master signal to this signal. | [objRef("time")] |
| baseNames | s | utf8[] | O | Names of the base signals (informational only; not intended for rebuilding the data model). |  |
| notes | i | utf8 | A | Any additional information about this signal. | (free text) |
| writeErrors | i | utf8[] | O | Specifies any errors that occurred while writing the signal (dataset). Mostly value errors. |  |

 

Note: There is no explicit sample rate property. Sample rate is a specific attribute of equidistant, time-based signals. If the sample rate needs to be visible, it can be specified in the 'description' attribute of the signal set group.
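
Since the signal set attribute 'stepSize' is the inverse of the sample rate for a time master, a reader can derive the rate instead of relying on a dedicated property. The sketch below assumes the step size is expressed in seconds, so the result is in Hz; the function names and the wording of the generated text are hypothetical.

```python
import math
from typing import Optional


def sample_rate_hz(step_size: float) -> Optional[float]:
    """Return 1 / stepSize, or None when stepSize is NaN or not positive."""
    if math.isnan(step_size) or step_size <= 0.0:
        return None
    return 1.0 / step_size


def describe_rate(step_size: float) -> str:
    """Format a short text that could be placed in a 'description' attribute."""
    rate = sample_rate_hz(step_size)
    if rate is None:
        return "sample rate: not equidistant"
    return f"sample rate: {rate:g} Hz"


print(describe_rate(0.01))
```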

...