TVB standard h5 formats¶
TVB has standardized classes that hold data. We call them DataTypes and they form a simple ontology that models data in TVB.
In code DataTypes are traited classes meant for storing data that is not temporary.
We can store DataTypes on disk in TVB specific file formats. These formats contain h5 datasets and attributes that closely map to a DataType class.
The neotraits h5 APIs give you the tools to construct this mapping from DataType to h5 file.
Quick example¶
Volume is a traited class that holds permanent data, a DataType.
from tvb.basic.neotraits.api import HasTraits, Attr, NArray

class Volume(HasTraits):
    origin = NArray(dtype=float, label="Volume origin coordinates")
    voxel_size = NArray(label="Voxel size")
    voxel_unit = Attr(str, label="Voxel Measure Unit", default="mm")
We define how to store a Volume in h5 by creating a H5File class. Such classes are Serializers: each one defines a specific h5 file format and is responsible for storing to and reading from that format.
from tvb.core.neotraits.h5 import H5File, DataSet, Scalar

class VolumeH5(H5File):
    def __init__(self, path):
        super(VolumeH5, self).__init__(path)
        self.origin = DataSet(Volume.origin, self)
        self.voxel_size = DataSet(Volume.voxel_size, self)
        self.voxel_unit = Scalar(Volume.voxel_unit, self)
Accessors¶
On initialization the VolumeH5 class creates data Accessors.
self.origin is a DataSet Accessor. It knows how to read and write a numpy float array to a h5 dataset named origin.
The string voxel_unit is serialized by a Scalar Accessor.
>>> import numpy
>>> vol_h5 = VolumeH5('vol.h5')
>>> vol_h5
<VolumeH5("vol.h5")>
>>> vol_h5.origin.store(numpy.array([0.1, 0.2, 0.3]))
>>> vol_h5.voxel_unit.store('mm')
>>> vol_h5.origin.load()
array([0.1, 0.2, 0.3])
An Accessor knows how to serialize a traited attribute to h5. Typically the traited attribute argument for an Accessor comes from a DataType as in the examples above.
You can store all Accessors at once from a DataType. Each defined Accessor will store its corresponding DataType attribute.
>>> vol = Volume(origin=numpy.array([0.1, 0.2, 0.3]), voxel_unit='mm')
>>> vol_h5 = VolumeH5('vol.h5')
>>> vol_h5.store(vol)
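To make the idea of storing all Accessors at once more concrete, here is a minimal plain-Python sketch of a file object that walks its Accessors and stores the matching attribute of a DataType. It is not TVB's implementation: ToyAccessor, ToyFile, ToyVolume and the dict used as backing storage are hypothetical stand-ins for the real H5File machinery.

```python
class ToyAccessor:
    """Hypothetical stand-in for an Accessor: stores one named value."""

    def __init__(self, name, storage):
        self.name = name
        self.storage = storage  # a dict standing in for the h5 file

    def store(self, value):
        self.storage[self.name] = value

    def load(self):
        return self.storage[self.name]


class ToyFile:
    """Hypothetical stand-in for a H5File subclass."""

    def __init__(self):
        self.storage = {}
        self.origin = ToyAccessor("origin", self.storage)
        self.voxel_unit = ToyAccessor("voxel_unit", self.storage)

    def accessors(self):
        # Collect every ToyAccessor that was set as an instance attribute.
        return [a for a in vars(self).values() if isinstance(a, ToyAccessor)]

    def store(self, datatype):
        # Each accessor stores the datatype attribute that has the same name.
        for accessor in self.accessors():
            accessor.store(getattr(datatype, accessor.name))


class ToyVolume:
    """Hypothetical stand-in for a DataType."""

    def __init__(self, origin, voxel_unit):
        self.origin = origin
        self.voxel_unit = voxel_unit


toy_file = ToyFile()
toy_file.store(ToyVolume(origin=[0.1, 0.2, 0.3], voxel_unit="mm"))
```

After the single store call, every accessor of toy_file can load the value it wrote, which mirrors the relationship between a DataType and its H5File described above.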
Independent Accessors¶
You can create a new Attr when creating the Accessor. This allows you to define h5 formats that do not map one to one with a DataType, or even create formats that have no corresponding DataType.
These are independent Accessors. They require a name argument. They are ignored by H5File.store and H5File.load_into as they are not connected to a datatype.
class IndependentH5(H5File):
    def __init__(self, path):
        super(IndependentH5, self).__init__(path)
        # name is required if Attr does not come from a HasTraits class
        self.scalar_int = Scalar(Attr(int), self, name='scalar_int')
        self.array_float = DataSet(NArray(), self, name='floating_leaves')
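The rule that independent Accessors are skipped by a bulk store can be pictured with a small plain-Python sketch. This is only an illustration under assumptions: SketchAccessor, its bound flag, SketchFile and SketchDataType are hypothetical names, and TVB may decide which Accessors are independent in a different way.

```python
class SketchAccessor:
    """Hypothetical accessor; bound=False marks it as independent."""

    def __init__(self, name, storage, bound=True):
        self.name = name
        self.storage = storage  # a dict standing in for the h5 file
        self.bound = bound

    def store(self, value):
        self.storage[self.name] = value


class SketchFile:
    def __init__(self):
        self.storage = {}
        self.title = SketchAccessor("title", self.storage)
        self.scalar_int = SketchAccessor("scalar_int", self.storage, bound=False)

    def store(self, datatype):
        # Only accessors tied to a datatype attribute take part in a bulk store.
        for accessor in vars(self).values():
            if isinstance(accessor, SketchAccessor) and accessor.bound:
                accessor.store(getattr(datatype, accessor.name))


class SketchDataType:
    title = "demo"


sketch = SketchFile()
sketch.store(SketchDataType())            # writes only 'title'
after_bulk_store = dict(sketch.storage)   # snapshot before the explicit store
sketch.scalar_int.store(42)               # independent values are stored explicitly
```

The design consequence is that values held by independent Accessors must always be written and read through the Accessor itself.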
DataSet¶
This Accessor writes to h5 datasets. Like all Accessors, it has load and store methods that read and write whole numpy arrays.
Alongside those methods, it supports partial reads and stores, which are intended for large on-disk datasets.
To read only a subset of a dataset use slicing:
>>> file = IndependentH5('test.h5')
>>> file.array_float[0, 20:40]
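Why a partial read helps with large on-disk data can be shown with an analogy that uses only the Python standard library. The sketch below uses a flat binary file rather than h5, and write_values and read_slice are hypothetical helpers. It only demonstrates the general idea of seeking to the requested range instead of loading everything.

```python
import os
import struct
import tempfile


def write_values(path, values):
    # Store the values as consecutive little-endian float64 items.
    with open(path, "wb") as f:
        f.write(struct.pack("<%dd" % len(values), *values))


def read_slice(path, start, stop):
    """Read items start..stop-1 without loading the rest of the file."""
    item_size = struct.calcsize("<d")
    count = stop - start
    with open(path, "rb") as f:
        f.seek(start * item_size)          # jump straight to the first item
        raw = f.read(count * item_size)    # read only the requested bytes
    return list(struct.unpack("<%dd" % count, raw))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "values.bin")
    write_values(path, [float(i) for i in range(100)])
    part = read_slice(path, 20, 40)
```

Only 20 of the 100 stored values are read from disk here, which is the same benefit slicing a DataSet is meant to give.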
You might want to append to a dataset, increasing its size. To do that, you must declare which dimension is the flexible one. A dataset can grow along only one dimension; the rest of the shape is fixed.
class StreamyH5(H5File):
    def __init__(self, path):
        super(StreamyH5, self).__init__(path)
        self.array_int = DataSet(
            NArray(dtype=int),
            self,
            # name is required because this NArray does not come from a HasTraits class
            name='array_int',
            expand_dimension=1
        )
Then, to append new data:
>>> file = StreamyH5('large.h5')
>>> file.array_int.append(numpy.eye(42, dtype=int))
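The rule that only one dimension may grow can be written down as a small shape calculation. This is a sketch of the general rule, not TVB code: shape_after_append is a hypothetical helper, and raising ValueError on a mismatch is an assumption about how a violation would be reported.

```python
def shape_after_append(current_shape, new_shape, expand_dimension):
    """Return the dataset shape after appending a block of new_shape.

    Only expand_dimension may differ; every other dimension must match.
    """
    if len(current_shape) != len(new_shape):
        raise ValueError("rank mismatch")
    for axis, (cur, new) in enumerate(zip(current_shape, new_shape)):
        if axis != expand_dimension and cur != new:
            raise ValueError(
                "dimension %d is fixed at %d, got %d" % (axis, cur, new)
            )
    grown = list(current_shape)
    grown[expand_dimension] += new_shape[expand_dimension]
    return tuple(grown)


# Appending a 42x42 block to a 42x42 dataset along dimension 1.
shape = shape_after_append((42, 42), (42, 42), expand_dimension=1)
```

With expand_dimension=1, a block whose first dimension is not 42 would be rejected, because that dimension belongs to the fixed part of the shape.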
References¶
A DataType will often contain references to other DataTypes. TVB h5 files do not store these recursively.
Instead, we record a unique identifier for each referenced DataType and store the referenced DataTypes in their own h5 files.
The abaz Reference Accessor in FooFile records a UUID that points to the h5 file containing the serialized BazDataType:
class BazDataType(HasTraits):
    scalar_str = Attr(str)

class FooDatatype(HasTraits):
    abaz = Attr(field_type=BazDataType)

class BazFile(H5File):
    def __init__(self, path):
        super(BazFile, self).__init__(path)
        self.scalar_str = Scalar(BazDataType.scalar_str, self)

class FooFile(H5File):
    def __init__(self, path):
        super(FooFile, self).__init__(path)
        self.abaz = Reference(FooDatatype.abaz, self)
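The non-recursive storage of references can be sketched with the standard uuid module. Everything here is a hypothetical illustration: the files dict stands in for a folder of h5 files, and the gid attribute, the file-name pattern and the helper functions are assumptions, not TVB's actual layout.

```python
import uuid

files = {}  # hypothetical stand-in for a folder of h5 files, keyed by file name


class ToyBaz:
    def __init__(self, scalar_str):
        self.gid = uuid.uuid4()  # unique identifier of this object
        self.scalar_str = scalar_str


class ToyFoo:
    def __init__(self, abaz):
        self.gid = uuid.uuid4()
        self.abaz = abaz


def store_baz(baz):
    files["baz_%s" % baz.gid.hex] = {"scalar_str": baz.scalar_str}


def store_foo(foo):
    # Only the identifier of the referenced object is written, not its content.
    files["foo_%s" % foo.gid.hex] = {"abaz": foo.abaz.gid.hex}


baz = ToyBaz("hello")
foo = ToyFoo(baz)
store_baz(baz)  # the referenced object goes to its own file
store_foo(foo)

# Reading the reference back yields a UUID, not a ToyBaz instance.
stored_ref = uuid.UUID(files["foo_%s" % foo.gid.hex]["abaz"])
```

Resolving stored_ref back into an object is a separate step that needs a way to locate the file for that identifier, which is why each file format stays independent of the others.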
Note
Serializing object graphs is not the job of these APIs. Instead, they focus on defining a clear h5 file format and on reading and writing that format only, not the formats of the dependent DataTypes.
Inheritance¶
A H5File can inherit from another one. Just make sure you call the superclass __init__ to retain the superclass Accessors. In the resulting h5 file the inheritance hierarchy is flattened.
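The effect of calling, or forgetting, the superclass initializer can be seen in a plain-Python sketch. The class names and the accessor_names list are hypothetical; they only mimic how Accessors created in __init__ accumulate into one flat set.

```python
class BaseSketchFile:
    def __init__(self):
        self.accessor_names = []
        self.add("origin")

    def add(self, name):
        self.accessor_names.append(name)


class DerivedSketchFile(BaseSketchFile):
    def __init__(self):
        super().__init__()  # keeps 'origin' from the base class
        self.add("voxel_size")


class ForgetfulSketchFile(BaseSketchFile):
    def __init__(self):
        # The superclass initializer is not called, so 'origin' is never created.
        self.accessor_names = ["voxel_size"]


flat = DerivedSketchFile().accessor_names
lost = ForgetfulSketchFile().accessor_names
```

The derived file ends up with one flat list that contains both names, with no trace of which class contributed which entry, while the forgetful variant silently loses the inherited one.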
Reference¶
- class tvb.core.neotraits.h5.H5File(path: str)[source]¶
A H5 based file format. This class implements reading and writing to a specific h5 based file format. A subclass of this defines a new file format.
- KEY_WRITTEN_BY = 'written_by'¶
- is_new_file = False¶
- load_generic_attributes() GenericAttributes [source]¶
- store(datatype: HasTraits, scalars_only: bool = False, store_references: bool = True) None [source]¶
- store_generic_attributes(generic_attributes: GenericAttributes, create: bool = True) None [source]¶
- class tvb.core.neotraits.h5.DataSet(trait_attribute: NArray, h5file: H5File, name: str = None, expand_dimension: int = -1)[source]¶
A dataset in a h5 file that corresponds to a traited NArray.
- append(data: ndarray, close_file: bool = True, grow_dimension: int | None = None) None [source]¶
Method to be called when it is necessary to write slices of data for a large dataset, e.g. TimeSeries. Metadata for such datasets is written only at file close time; see the H5File.close method.
- get_cached_metadata()[source]¶
Returns cached properties of this dataset, like min max mean etc. This cache is useful for large, expanding datasets, when we want to avoid loading the whole dataset just to compute a max.
- property shape: Tuple[int]¶
- class tvb.core.neotraits.h5.Scalar(trait_attribute: Attr, h5file: H5File, name: str = None)[source]¶
A scalar in a h5 file that corresponds to a traited attribute. Serialized as a global h5 attribute.
- class tvb.core.neotraits.h5.Reference(trait_attribute: Attr, h5file: H5File, name: str = None)[source]¶
A reference to another h5 file. Corresponds to a contained datatype.
- load() UUID ¶
- class tvb.core.neotraits.h5.Json(trait_attribute, h5file, name=None, json_encoder=None, json_decoder=None)[source]¶
A Python JSON-like data structure accessor. This works with simple Attr(list), Attr(dict) and List(of=…).
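The cached metadata returned by DataSet.get_cached_metadata (min, max, mean and so on) can be pictured as running statistics that are updated on every append, so the whole dataset never has to be reloaded. The sketch below is an assumption about the general technique. RunningStats is a hypothetical class, and this page does not show how TVB actually computes or stores these values.

```python
class RunningStats:
    """Hypothetical running min / max / mean over appended chunks."""

    def __init__(self):
        self.count = 0
        self.total = 0.0
        self.minimum = None
        self.maximum = None

    def update(self, chunk):
        # Fold each new value into the cached statistics.
        for value in chunk:
            self.count += 1
            self.total += value
            self.minimum = value if self.minimum is None else min(self.minimum, value)
            self.maximum = value if self.maximum is None else max(self.maximum, value)

    @property
    def mean(self):
        return self.total / self.count if self.count else None


stats = RunningStats()
stats.update([1.0, 5.0])  # first appended chunk
stats.update([3.0])       # second appended chunk
```

Each update costs time proportional to the appended chunk only, which is what makes such a cache attractive for large, expanding datasets.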