TVB standard h5 formats

TVB has standardized classes that hold data. We call them DataTypes and they form a simple ontology that models data in TVB.

In code, DataTypes are traited classes meant for storing data that is not temporary.

We can store DataTypes on disk in TVB specific file formats. These formats contain h5 datasets and attributes that closely map to a DataType class.

The neotraits h5 APIs give you the tools to construct this mapping from a DataType to an h5 file.

Quick example

Volume is a traited class that holds permanent data, a DataType.

from tvb.basic.neotraits.api import HasTraits, Attr, NArray


class Volume(HasTraits):
    origin = NArray(dtype=float, label="Volume origin coordinates")
    voxel_size = NArray(label="Voxel size")
    voxel_unit = Attr(str, label="Voxel Measure Unit", default="mm")

We define how to store a Volume in h5 by creating an H5File class. Such classes are Serializers: each one defines a specific h5 file format and is responsible for storing to and reading from that format.

from tvb.core.neotraits.h5 import H5File, DataSet, Scalar


class VolumeH5(H5File):

    def __init__(self, path):
        super(VolumeH5, self).__init__(path)
        self.origin = DataSet(Volume.origin, self)
        self.voxel_size = DataSet(Volume.voxel_size, self)
        self.voxel_unit = Scalar(Volume.voxel_unit, self)

Accessors

On initialization, the VolumeH5 class creates data Accessors. self.origin is a DataSet Accessor. It knows how to read and write a numpy float array to an h5 dataset named origin. The voxel_unit string is serialized by a Scalar Accessor.

>>> import numpy
>>> vol_h5 = VolumeH5('vol.h5')
>>> vol_h5
<VolumeH5("vol.h5")>
>>> vol_h5.origin.store(numpy.array([0.1, 0.2, 0.3]))
>>> vol_h5.voxel_unit.store('mm')
>>> vol_h5.origin.load()
array([0.1, 0.2, 0.3])

An Accessor knows how to serialize a traited attribute to h5. Typically the traited attribute argument for an Accessor comes from a DataType as in the examples above.

You can store all Accessors at once from a DataType. Each defined Accessor will store its corresponding DataType attribute.

>>> vol = Volume(origin=numpy.array([0.1, 0.2, 0.3]), voxel_unit='mm')
>>> vol_h5 = VolumeH5('vol.h5')
>>> vol_h5.store(vol)
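
To make the store-all behaviour concrete, here is a minimal sketch in plain Python that mimics the pattern, with a dictionary standing in for the h5 file. The names ToyAccessor, ToyFile, and ToyVolume are hypothetical and are not part of TVB. The real H5File writes to disk and deals with types and metadata that this toy ignores.

```python
class ToyAccessor:
    """Hypothetical stand-in for an Accessor: stores one named field."""

    def __init__(self, field_name, storage):
        self.field_name = field_name
        self._storage = storage  # a dict standing in for the h5 file

    def store(self, value):
        self._storage[self.field_name] = value

    def load(self):
        return self._storage[self.field_name]


class ToyFile:
    """Hypothetical stand-in for an H5File subclass."""

    def __init__(self):
        self.storage = {}
        self.origin = ToyAccessor("origin", self.storage)
        self.voxel_unit = ToyAccessor("voxel_unit", self.storage)

    def iter_accessors(self):
        # Every instance attribute that is an accessor takes part in store/load.
        for value in vars(self).values():
            if isinstance(value, ToyAccessor):
                yield value

    def store(self, datatype):
        # Each accessor stores the datatype attribute that shares its field name.
        for accessor in self.iter_accessors():
            accessor.store(getattr(datatype, accessor.field_name))

    def load_into(self, datatype):
        # The reverse direction: copy every stored field back onto the datatype.
        for accessor in self.iter_accessors():
            setattr(datatype, accessor.field_name, accessor.load())


class ToyVolume:
    def __init__(self, origin=None, voxel_unit=None):
        self.origin = origin
        self.voxel_unit = voxel_unit


toy_file = ToyFile()
toy_file.store(ToyVolume(origin=[0.1, 0.2, 0.3], voxel_unit="mm"))

restored = ToyVolume()
toy_file.load_into(restored)
print(restored.origin, restored.voxel_unit)
```

The point of the design is that the file class, not the DataType, owns the list of fields that make up the format.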

Independent Accessors

You can create a new Attr when creating the Accessor. This allows you to define h5 formats that do not map one to one with a DataType, or even create formats that have no corresponding DataType.

These are independent Accessors. They require a name argument. They are ignored by H5File.store and H5File.load_into as they are not connected to a datatype.

class IndependentH5(H5File):
    def __init__(self, path):
        super(IndependentH5, self).__init__(path)
        # name is required if Attr does not come from a HasTraits class
        self.scalar_int = Scalar(Attr(int), self, name='scalar_int')
        self.array_float = DataSet(NArray(), self, name='floating_leaves')

DataSet

This Accessor writes to h5 datasets. Like all Accessors it has load and store methods that read and write whole numpy arrays.

Alongside those methods it supports partial reads and stores. This is intended for large on-disk datasets.

To read only a subset of a dataset use slicing:

>>> file = IndependentH5('test.h5')
>>> file.array_float[0, 20: 40]
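
For reference, an index expression like the one above reaches the object as a tuple holding an integer and a slice object; that is standard Python behaviour. The sketch below uses a hypothetical ToyDataSet over a nested list to show how such a key can be used to return only the requested part. How TVB's DataSet handles the key internally is not shown here.

```python
class ToyDataSet:
    """Hypothetical 2-D dataset backed by a list of rows."""

    def __init__(self, rows):
        self._rows = rows
        self.last_key = None

    def __getitem__(self, key):
        self.last_key = key  # remember what Python passed in
        row_index, column_slice = key
        # Only the selected row is touched; only the requested columns are returned.
        return self._rows[row_index][column_slice]


rows = [list(range(100)), list(range(100, 200))]
dataset = ToyDataSet(rows)

part = dataset[0, 20:40]
print(dataset.last_key)  # (0, slice(20, 40, None))
print(len(part))         # 20
```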

You might want to append to a dataset, increasing its size. To do that you must specify which dimension is the flexible one. You can grow a dataset along only one dimension; the rest of the shape is fixed.

class StreamyH5(H5File):
    def __init__(self, path):
        super(StreamyH5, self).__init__(path)
        # name is required because this NArray does not come from a HasTraits class
        self.array_int = DataSet(
            NArray(dtype=int),
            self,
            name='array_int',
            expand_dimension=1
        )

Then, to append new data:

>>> file = StreamyH5('large.h5')
>>> file.array_int.append(numpy.eye(42, dtype=int))
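
The rule that only one dimension may grow can be illustrated with a small plain-Python sketch. ToyGrowingDataSet is hypothetical and works on non-empty 2-D nested lists. It is meant to show the shape constraint only, not how TVB or h5 implement resizing.

```python
class ToyGrowingDataSet:
    """Hypothetical 2-D dataset that can grow along one dimension only."""

    def __init__(self, expand_dimension):
        if expand_dimension not in (0, 1):
            raise ValueError("this toy only supports 2-D data")
        self.expand_dimension = expand_dimension
        self.rows = []

    @property
    def shape(self):
        n_rows = len(self.rows)
        n_cols = len(self.rows[0]) if self.rows else 0
        return (n_rows, n_cols)

    def append(self, block):
        block_shape = (len(block), len(block[0]))
        if not self.rows:
            # The first block fixes the size of the non-growing dimension.
            self.rows = [list(row) for row in block]
            return
        fixed_dimension = 1 - self.expand_dimension
        if block_shape[fixed_dimension] != self.shape[fixed_dimension]:
            raise ValueError("block does not match the fixed dimension")
        if self.expand_dimension == 0:
            self.rows.extend(list(row) for row in block)  # add rows
        else:
            for row, extra in zip(self.rows, block):      # add columns
                row.extend(extra)


dataset = ToyGrowingDataSet(expand_dimension=1)
dataset.append([[1, 2], [3, 4]])
dataset.append([[5], [6]])       # same number of rows, one more column
print(dataset.shape)             # (2, 3)

rejected = False
try:
    dataset.append([[7, 8, 9]])  # a single row: the fixed dimension differs
except ValueError:
    rejected = True
print(rejected)                  # True
```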

References

Many times a DataType will contain references to other DataTypes. TVB h5 files will not recursively store these.

Instead we just record a unique identifier for those referenced DataTypes, and we store them to their own h5 files.

The abaz Reference Accessor in FooFile records a UUID that points to the h5 file that contains the serialized BazDataType:

class BazDataType(HasTraits):
    scalar_str = Attr(str)


class FooDatatype(HasTraits):
    abaz = Attr(field_type=BazDataType)


class BazFile(H5File):
    def __init__(self, path):
        super(BazFile, self).__init__(path)
        self.scalar_str = Scalar(BazDataType.scalar_str, self)


class FooFile(H5File):
    def __init__(self, path):
        super(FooFile, self).__init__(path)
        self.abaz = Reference(FooDatatype.abaz, self)

Note

Serializing object graphs is not the job of these APIs. Instead, they focus on defining a clear h5 file format and on reading from and storing to that format only, not on the formats of the dependent DataTypes.
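
The idea of recording an identifier instead of nesting the referenced object can be sketched with the standard uuid module. ToyBaz, ToyFoo, store_baz, and store_foo are hypothetical names, and the sketch assumes each object carries a gid attribute. Dictionaries stand in for h5 files, so this is an illustration of the layout rather than TVB's storage code.

```python
import uuid


class ToyBaz:
    def __init__(self, scalar_str):
        self.gid = uuid.uuid4()
        self.scalar_str = scalar_str


class ToyFoo:
    def __init__(self, abaz):
        self.gid = uuid.uuid4()
        self.abaz = abaz


def store_baz(baz):
    # The Baz "file" holds Baz's own data.
    return {"gid": baz.gid.hex, "scalar_str": baz.scalar_str}


def store_foo(foo):
    # The Foo "file" records only the identifier of the referenced Baz.
    return {"gid": foo.gid.hex, "abaz": foo.abaz.gid.hex}


baz = ToyBaz("hello")
foo = ToyFoo(baz)

# One dictionary per object stands in for one h5 file per DataType.
files = {baz.gid: store_baz(baz), foo.gid: store_foo(foo)}

# Resolving the reference is a separate lookup by identifier.
referenced_gid = uuid.UUID(files[foo.gid]["abaz"])
print(files[referenced_gid]["scalar_str"])  # hello
```

Nothing from the Baz object other than its identifier ends up in the Foo record, which is what keeps each format independent of the others.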

Inheritance

An H5File can inherit from another one. Just make sure you call super().__init__(path) to retain the superclass Accessors. In the resulting h5 file the inheritance hierarchy is flattened.
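
The effect of calling, or forgetting, the superclass initializer can be shown with a plain-Python sketch. The classes below are hypothetical and use a dictionary of field names in place of real Accessors.

```python
class ToyBaseFile:
    """Hypothetical base format with one field."""

    def __init__(self):
        self.fields = {}
        self.fields["origin"] = "dataset"


class ToyDerivedFile(ToyBaseFile):
    """Hypothetical derived format that adds one field."""

    def __init__(self):
        super().__init__()  # keeps the fields declared by the base class
        self.fields["voxel_unit"] = "scalar"


class ToyForgetfulFile(ToyBaseFile):
    def __init__(self):
        # super().__init__() is missing, so the base fields are never declared.
        self.fields = {"voxel_unit": "scalar"}


print(sorted(ToyDerivedFile().fields))    # ['origin', 'voxel_unit']
print(sorted(ToyForgetfulFile().fields))  # ['voxel_unit']
```

In the derived case both fields sit side by side in one flat mapping, with no trace of which class declared which.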

Reference

class tvb.core.neotraits.h5.H5File(path: str)

An H5-based file format. This class implements reading and writing to a specific h5-based file format. A subclass of this defines a new file format.

KEY_WRITTEN_BY = 'written_by'
close()
determine_datatype_from_file()
static determine_type(path: str) -> Type[HasTraits]
classmethod file_name_base()
static from_file(path: str) -> H5File
gather_references(datatype_cls=None)
get_class_path()
static get_metadata_param(path, param)
static h5_class_from_file(path: str) -> Type[H5File]
is_new_file = False
iter_accessors() -> Generator[Accessor]
iter_datasets()
load_generic_attributes() -> GenericAttributes
load_into(datatype: HasTraits) -> None
read_subtype_attr()
store(datatype: HasTraits, scalars_only: bool = False, store_references: bool = True) -> None
store_generic_attributes(generic_attributes: GenericAttributes, create: bool = True) -> None
store_metadata_param(key, value)

class tvb.core.neotraits.h5.DataSet(trait_attribute: NArray, h5file: H5File, name: str = None, expand_dimension: int = -1)

A dataset in a h5 file that corresponds to a traited NArray.

append(data: ndarray, close_file: bool = True, grow_dimension: int | None = None) -> None

Method to be called when it is necessary to write slices of data for a large dataset, e.g. TimeSeries. Metadata for such datasets is written only at file close time; see the H5File.close method.

get_cached_metadata()

Returns cached properties of this dataset, such as min, max, and mean. This cache is useful for large, expanding datasets, when we want to avoid loading the whole dataset just to compute a max.

load() -> ndarray
property shape: Tuple[int]
store(data: ndarray) -> None

class tvb.core.neotraits.h5.Scalar(trait_attribute: Attr, h5file: H5File, name: str = None)

A scalar in an h5 file that corresponds to a traited attribute. Serialized as a global h5 attribute.

load() -> str | int | float
store(val: str | int | float) -> None

class tvb.core.neotraits.h5.Reference(trait_attribute: Attr, h5file: H5File, name: str = None)

A reference to another h5 file. Corresponds to a contained datatype.

load() -> UUID
store(val: HasTraits) -> None

The reference is stored as a gid in the metadata. The val parameter is a datatype or a uuid.UUID gid.

class tvb.core.neotraits.h5.Json(trait_attribute, h5file, name=None, json_encoder=None, json_decoder=None)

A Python JSON-like data structure accessor. This works with simple Attr(list), Attr(dict), and List(of=…).

load()
store(val)

Stores a JSON value in the h5 metadata.

class tvb.core.neotraits.h5.SparseMatrix(trait_attribute: Attr, h5file: H5File, name: str = None)

Stores and loads a scipy.sparse csc or csr matrix in h5.

constructors = {'csc': <class 'scipy.sparse._csc.csc_matrix'>, 'csr': <class 'scipy.sparse._csr.csr_matrix'>}
get_metadata()
load()
store(mtx: spmatrix) -> None