Data Items
Data
- class dynamicdl.data.dataentry.DataEntry(items: list[DataItem] | DataItem)
Bases: object
A collection of DataItem objects that together hold all items required for one entry in the dataset. DataEntry objects are mostly created and handled by internal merging processes and are not meant to be instantiated by users.
- Parameters:
items (list[DataItem] | DataItem) – A (list of) data items which are to be batched together
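The batching a DataEntry performs can be pictured with a toy model. The sketch below is not dynamicdl's implementation: ToyType, ToyItem and batch_items are hypothetical stand-ins, and we assume that a redundant type (such as BBOX_CLASS_NAME) collects several values per entry, while any other type must carry one consistent value.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Union


@dataclass(frozen=True)
class ToyType:
    """Hypothetical stand-in for a DataType: a name plus a redundancy flag."""
    name: str
    redundant: bool = False


@dataclass
class ToyItem:
    """Hypothetical stand-in for a DataItem."""
    delimiter: ToyType
    value: Any


def batch_items(items: Union[List[ToyItem], ToyItem]) -> Dict[str, Any]:
    """Group items into one entry keyed by type name (toy model only)."""
    if isinstance(items, ToyItem):  # accept a single item, as DataEntry does
        items = [items]
    entry: Dict[str, Any] = {}
    for item in items:
        key = item.delimiter.name
        if item.delimiter.redundant:
            # redundant types (e.g. several bounding boxes) accumulate values
            entry.setdefault(key, []).append(item.value)
        elif key in entry and entry[key] != item.value:
            # assumption: a non-redundant type may not hold two different values
            raise ValueError(f"conflicting values for {key}: {entry[key]!r} vs {item.value!r}")
        else:
            entry[key] = item.value
    return entry


IMAGE_NAME = ToyType("image_name")
BBOX_CLASS_NAME = ToyType("bbox_class_name", redundant=True)

entry = batch_items([
    ToyItem(IMAGE_NAME, "img_001"),
    ToyItem(BBOX_CLASS_NAME, "cat"),
    ToyItem(BBOX_CLASS_NAME, "dog"),
])
print(entry)  # {'image_name': 'img_001', 'bbox_class_name': ['cat', 'dog']}
```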
- class dynamicdl.data.dataitem.DataItem(delimiter: DataType, value: Any)
Bases: object
The DataItem class represents a value associated with a particular DataType. DataItem objects are usually created and handled by internal processes, but they can also be used to instantiate Static objects with specific values.
Example: my_static = Static('my_image_set_name', DataItem(DataTypes.IMAGE_SET_NAME, 'my_set'))
The above example creates a Static that contains the value my_set as an image set name for its hierarchical children to inherit.
- Parameters:
delimiter (DataType) – The type of the DataItem.
value (Any) – The value associated with the DataType; it must be compatible with the DataType's token type.
- class dynamicdl.data.datatype.DataType(desc: str, token_type: Token, doc: str | None = None)
Bases: object
DataType is a container class for storing relevant dataset items. Token type options can be found in the tokens module. Warning: DataType instances persist throughout program execution and can be accessed through the static dict DataType.types.
- Parameters:
desc (str) – The purpose of the DataType. This should be unique for every new object.
token_type (Token) – The token type of the DataType.
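The persistence warning on DataType describes a class-level registry. A minimal sketch of that pattern follows; RegistryType and ToyIDToken are hypothetical names, and raising on a repeated desc is our assumption about how the uniqueness rule might be enforced, not documented library behavior.

```python
from typing import Dict, Optional


class RegistryType:
    """Hypothetical DataType-like class with a program-wide registry."""

    # Class-level dict shared by all instances; entries live until the
    # interpreter exits, mirroring the persistence described for DataType.types.
    types: Dict[str, "RegistryType"] = {}

    def __init__(self, desc: str, token_type: object, doc: Optional[str] = None):
        # Assumption: enforce the "desc should be unique" rule by raising.
        if desc in RegistryType.types:
            raise ValueError(f"a type with desc {desc!r} already exists")
        self.desc = desc
        self.token_type = token_type
        self.doc = doc
        RegistryType.types[desc] = self


class ToyIDToken:
    """Placeholder token class for this example only."""


my_id = RegistryType("my_id", ToyIDToken())
print(RegistryType.types["my_id"] is my_id)  # True
```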
- class dynamicdl.data.datatypes.DataTypes
Bases: object
The DataTypes class contains static DataType presets. Below is a description of all presets currently available:
- ABSOLUTE_FILE = Represents the **absolute** filepath of an entry image only. This DataType is automatically generated in `Image` and `File` type objects when parsing, but can also be used to parse data. All valid values under `ABSOLUTE_FILE` must be a valid filepath on the user's filesystem. `RELATIVE_FILE` is currently not supported, but may be in future versions. [GENERAL]
- ABSOLUTE_FILE_SEG = Represents the **absolute** filepath of an entry segmentation mask only. This DataType is also automatically generated in `Image` and `File` type objects when parsing, but can also be used to parse data. All valid values under `ABSOLUTE_FILE_SEG` must be a valid filepath on the user's filesystem. `RELATIVE_FILE_SEG` is currently not supported, but may be in future versions. [GENERAL]
- BBOX_CLASS_ID = The ID (parsed to int) complement for `BBOX_CLASS_NAME`. Behaves just like its complement. [DETECTION]
- BBOX_CLASS_NAME = Represents the detection class name of an image entry. An image entry can have multiple detection classes; accepts parsed strings. Its ID complement can be found under `BBOX_CLASS_ID`. Each detection class must have a one-to-one correspondence to a valid bounding box when in the same hierarchy. When in a different hierarchy, it expands naturally to fit the existing length, just like other redundant types. [DETECTION]
- CLASS_ID = The ID (parsed to int) complement for `CLASS_NAME`. Behaves just like its complement. [CLASSIFICATION]
- CLASS_NAME = Represents the classification class name of an image entry. An image entry can have only one class; accepts parsed strings. Its ID complement can be found under `CLASS_ID`. [CLASSIFICATION]
- GENERIC = A generic token with no significance that can be used as a wildcard token for parsing. Can represent anything, and any type. [GENERAL]
- GENERIC_INT = Same as `GENERIC`, except accepts only integer types. [GENERAL]
- GENERIC_QUANTITY = Same as `GENERIC`, except accepts only numeric types (i.e. float and int). [GENERAL]
- GENERIC_WORD = Same as `GENERIC`, except accepts only one word, i.e. no spaces allowed. [GENERAL]
- HEIGHT = The height of the bounding box. Must be accompanied with `WIDTH` or else has no effect. Can be used as an alternative to defining `YMAX` or `YMIN`. [DETECTION]
- IMAGE_ID = The ID (parsed to int) complement for `IMAGE_NAME`. Behaves just like its complement. [GENERAL]
- IMAGE_NAME = Represents an identifier token for image entries via a string description. As of 0.1.1-alpha, all `IMAGE_NAME` entries must be unique, as this DataType serves as the sole identifier for image entries. Accepts parsed strings. Its ID complement can be found under `IMAGE_ID`. [GENERAL]
- IMAGE_SET_ID = Represents the ID of an image set. This includes any valid integers. The named complement of this DataType is `IMAGE_SET_NAME`; see it for details. [GENERAL]
- IMAGE_SET_NAME = Represents the name of an image set. This includes any valid strings, but is not meant to store the ID of the image set; see `IMAGE_SET_ID`. Image sets are used to allocate specific entries to a group which can be split when dataloading. Most commonly, image set names will be `train`, `val`, or `test`. [GENERAL]
- POLYGON = Should not be instantiated by the user, as there is no way to parse it. Instead, it is automatically created for every `SegmentationObject` that wraps `X` and `Y` objects. This DataType is used internally for parsing. [SEGMENTATION]
- SEG_CLASS_ID = The ID (parsed to int) complement for `SEG_CLASS_NAME`. Behaves just like its complement. [SEGMENTATION]
- SEG_CLASS_NAME = Represents the segmentation class name of an image entry. An image entry can have multiple segmentation classes; accepts parsed strings. Its ID complement can be found under `SEG_CLASS_ID`. Each segmentation class must have a one-to-one correspondence to a valid segmentation object when in the same hierarchy. When in a different hierarchy, it expands naturally to fit the existing length, just like other redundant types. [SEGMENTATION]
- WIDTH = The width of the bounding box. Must be accompanied with `HEIGHT` or else has no effect. Can be used as an alternative to defining `XMAX` or `XMIN`. [DETECTION]
- X = A segmentation polygon x-coordinate. Used to define the vertices of a polygon for segmentation tasks. Each `X` coordinate must be paired with a corresponding `Y` coordinate to form a valid vertex. [SEGMENTATION]
- X1 = A bounding box x-coordinate. Can be in any order as long as it forms a valid bounding box with `X2`, `Y1`, and `Y2`. [DETECTION]
- X2 = A bounding box x-coordinate. Can be in any order as long as it forms a valid bounding box with `X1`, `Y1`, and `Y2`. [DETECTION]
- XMAX = The maximum x-coordinate in the bounding box. Must be accompanied with `YMAX` or else has no effect, and must be accompanied either with `XMIN` or `WIDTH` and their y-counterparts. [DETECTION]
- XMID = The midpoint x-coordinate in the bounding box. Used to denote the horizontal center of the bounding box. Must be accompanied with `YMID` to define a central point, and with either `XMIN` or `XMAX` to fill the bounding box. [DETECTION]
- XMIN = The minimum x-coordinate in the bounding box. Must be accompanied with `YMIN` or else has no effect, and must be accompanied either with `XMAX` or `WIDTH` and their y-counterparts. [DETECTION]
- Y = A segmentation polygon y-coordinate. Used to define the vertices of a polygon for segmentation tasks. Each `Y` coordinate must be paired with a corresponding `X` coordinate to form a valid vertex. [SEGMENTATION]
- Y1 = A bounding box y-coordinate. Can be in any order as long as it forms a valid bounding box with `X1`, `X2`, and `Y2`. [DETECTION]
- Y2 = A bounding box y-coordinate. Can be in any order as long as it forms a valid bounding box with `X1`, `X2`, and `Y1`. [DETECTION]
- YMAX = The maximum y-coordinate in the bounding box. Must be accompanied with `XMAX` or else has no effect, and must be accompanied either with `YMIN` or `HEIGHT` and their x-counterparts. [DETECTION]
- YMID = The midpoint y-coordinate in the bounding box. Used to denote the vertical center of the bounding box. Must be accompanied with `XMID` to define a central point, and with either `YMIN` or `YMAX` to fill the bounding box. [DETECTION]
- YMIN = The minimum y-coordinate in the bounding box. Must be accompanied with `XMIN` or else has no effect, and must be accompanied either with `YMAX` or `HEIGHT` and their x-counterparts. [DETECTION]
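The bounding box presets allow several equivalent combinations. As an illustration only, the hypothetical helper below normalizes the documented combinations (X1/X2 in either order, XMIN with XMAX or WIDTH, XMAX with WIDTH, XMID with XMIN or XMAX, and their y-counterparts) to (xmin, ymin, xmax, ymax). The lowercase dictionary keys are assumptions rather than dynamicdl field names, and the library's own conversion may differ.

```python
from typing import Dict, Tuple


def _axis(vals: Dict[str, float], lo: str, hi: str, mid: str, size: str,
          p1: str, p2: str) -> Tuple[float, float]:
    """Resolve one axis to (min, max) from whichever fields are present."""
    def has(*keys: str) -> bool:
        return all(vals.get(k) is not None for k in keys)

    if has(p1, p2):    # X1/X2 (or Y1/Y2) may be given in any order
        return min(vals[p1], vals[p2]), max(vals[p1], vals[p2])
    if has(lo, hi):    # XMIN + XMAX
        return vals[lo], vals[hi]
    if has(lo, size):  # XMIN + WIDTH
        return vals[lo], vals[lo] + vals[size]
    if has(hi, size):  # XMAX + WIDTH
        return vals[hi] - vals[size], vals[hi]
    if has(mid, lo):   # XMID + XMIN: the midpoint is equidistant from both edges
        return vals[lo], 2 * vals[mid] - vals[lo]
    if has(mid, hi):   # XMID + XMAX
        return 2 * vals[mid] - vals[hi], vals[hi]
    raise ValueError(f"not enough fields to resolve {lo}/{hi}")


def to_xyxy(vals: Dict[str, float]) -> Tuple[float, float, float, float]:
    """Hypothetical helper: return (xmin, ymin, xmax, ymax)."""
    xmin, xmax = _axis(vals, "xmin", "xmax", "xmid", "width", "x1", "x2")
    ymin, ymax = _axis(vals, "ymin", "ymax", "ymid", "height", "y1", "y2")
    return xmin, ymin, xmax, ymax


print(to_xyxy({"xmin": 10, "ymin": 20, "width": 30, "height": 40}))  # (10, 20, 40, 60)
print(to_xyxy({"x1": 50, "x2": 10, "y1": 5, "y2": 25}))              # (10, 5, 50, 25)
```

Resolving each axis independently keeps the x and y rules symmetric, which matches how the presets pair every x-field with its y-counterpart.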
- class dynamicdl.data.partialtype.ComboType(to: DataType, constructor: str, *datatypes: PartialType, preserve_all: bool = False)
Bases: DataType
The ComboType class is used to create a datatype that is composed of other types (PartialTypes). This is especially useful, for example, when one wishes to use an IMAGE_ID-like parameter that is not unique throughout the dataset. Instead, suppose that the combination of CLASS_NAME and IMAGE_ID forms a unique IMAGE_NAME. PartialType and its wrapper ComboType achieve this functionality. See the PartialType class for details.
- Parameters:
to (DataType) – The DataType to convert the fully initialized PartialType collection to.
constructor (str) – The template to apply when converting to the DataType. Each PartialType should be represented by a {} placeholder, in the order the PartialTypes appear in datatypes.
datatypes (PartialType) – The PartialTypes for which to make up the ComboType.
preserve_all (bool) – If True, preserves the data for each PartialType in the dataframe; otherwise, constructing the ComboType removes (pops) all PartialType data.
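Conceptually, constructing a ComboType fills each {} in constructor with the value of the matching PartialType, in order. The sketch below models this on a plain dictionary row; build_combo and the key names are hypothetical, and treating preserve_all=False as removing the PartialType keys is our reading of the parameter description, not the library's code.

```python
from typing import Any, Dict, Sequence


def build_combo(row: Dict[str, Any], to: str, constructor: str,
                parts: Sequence[str], preserve_all: bool = False) -> Dict[str, Any]:
    """Toy model of ComboType construction on one row of parsed data."""
    if not all(p in row for p in parts):
        return dict(row)  # the partial values were not found together
    out = dict(row)
    # each {} in the constructor is filled by the matching part, in order
    out[to] = constructor.format(*(row[p] for p in parts))
    if not preserve_all:
        for p in parts:
            out.pop(p)  # drop the partial data once it has been combined
    return out


row = {"class_name": "cat", "my_id": 7, "image_set_name": "train"}
print(build_combo(row, "image_name", "{}_{}", ["class_name", "my_id"]))
# {'image_set_name': 'train', 'image_name': 'cat_7'}
```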
- class dynamicdl.data.partialtype.PartialType(desc: str, token_type: Token, doc: str | None = None)
Bases: DataType
The PartialType class is used to create datatypes that make up part of another type. This is especially useful, for example, when one wishes to use an IMAGE_ID-like parameter that is not unique throughout the dataset. Instead, suppose that the combination of CLASS_NAME and IMAGE_ID forms a unique IMAGE_NAME. PartialType and its wrapper ComboType achieve this functionality.
Example:
    from dynamicdl.data.datatype import DataType
    from dynamicdl.data.datatypes import DataTypes
    from dynamicdl.data.partialtype import ComboType
    from dynamicdl.data.tokens import IDToken

    my_id_type = DataType('my_id', IDToken())
    my_image_name = ComboType(
        DataTypes.IMAGE_NAME,
        '{}_{}',
        DataTypes.CLASS_NAME,
        my_id_type,
        preserve_all=True
    )
    # ... other code
    # we can now place `my_id_type` and `DataTypes.CLASS_NAME` in our form when parsing the
    # dataset, and when they are found together they will automatically parse into
    # DataTypes.IMAGE_NAME!
Now, every IMAGE_NAME datatype will be constructed from the template {CLASS_NAME}_{ID} as we have specified. This no longer creates merge conflicts! Note that we created a new ID type that is not exactly IMAGE_ID, as IMAGE_ID is a unique token which should not be merged.
- Parameters:
desc (str) – The purpose of the DataType. This should be unique for every new object.
token_type (Token) – The token type of the DataType.
Tokens
- class dynamicdl.data.tokens.FilenameToken
Bases: UniqueToken
The FilenameToken class is a Token which checks for valid absolute filenames.
- class dynamicdl.data.tokens.IDToken
Bases: Token
Represents an ID. Items must be integers.
- class dynamicdl.data.tokens.QuantityToken
Bases: Token
Represents a numeric quantity. Can be int or float.
- class dynamicdl.data.tokens.RedundantIDToken
Bases: IDToken, RedundantToken
Represents a redundant ID.
- class dynamicdl.data.tokens.RedundantObjectToken
Bases: RedundantToken
Represents a segmentation object.
- class dynamicdl.data.tokens.RedundantQuantityToken
Bases: QuantityToken, RedundantToken
Represents a redundant numeric quantity.
- class dynamicdl.data.tokens.RedundantToken
Bases: Token
RedundantToken items are used when a data item stores multiple values per image or unique item, such as multiple bounding boxes or segmentation objects.
- class dynamicdl.data.tokens.Token
Bases: object
The Token class is the base class that carries important information into Data objects for data parsing functions. Subclasses of this class may have specific requirements for content.
For compatibility reasons, methods in Token implementations should not be static, but should also not use self. This may change in future versions.
- class dynamicdl.data.tokens.UniqueIDToken
Bases: IDToken, UniqueToken
Represents a unique ID.
- class dynamicdl.data.tokens.UniqueToken
Bases: Token
UniqueToken items are used when an identifier is a unique item pertaining to any property of an image or entry. Unique tokens serve as valid IDs for identifying each data entry in the dataset.
- class dynamicdl.data.tokens.WildcardIntToken
Bases: IDToken, WildcardToken
A wildcard that accepts only integers.
- class dynamicdl.data.tokens.WildcardQuantityToken
Bases: QuantityToken, WildcardToken
A wildcard that accepts only numeric quantities (int or float).
- class dynamicdl.data.tokens.WildcardToken
Bases: Token
The WildcardToken class represents a generic wildcard that can stand for anything and is never used as an identifier. Unlike other tokens, wildcard tokens are not affected by merge operations.
- class dynamicdl.data.tokens.WildcardWordToken
Bases: WildcardToken
Disallows spaces in the wildcard.
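To illustrate the kind of content requirements token subclasses impose, here is a hypothetical set of checkers mirroring IDToken (integers, with strings parsed to int), QuantityToken (int or float) and WildcardWordToken (no spaces). The class and method names are invented for this sketch and are not dynamicdl's Token interface; in line with the compatibility note on Token, transform is an instance method that does not use self.

```python
class ToyToken:
    """Hypothetical base: accept any value unchanged (like a generic wildcard)."""

    def transform(self, value):
        return value


class ToyIDToken(ToyToken):
    """Integers only; numeric strings such as '12' are parsed to int."""

    def transform(self, value):
        if isinstance(value, bool) or not isinstance(value, (int, str)):
            raise ValueError(f"not an integer ID: {value!r}")
        return int(value)  # int('1.5') raises ValueError


class ToyQuantityToken(ToyToken):
    """int or float; numeric strings are parsed, preferring int."""

    def transform(self, value):
        if isinstance(value, bool):
            raise ValueError(f"not a quantity: {value!r}")
        if isinstance(value, (int, float)):
            return value
        if isinstance(value, str):
            try:
                return int(value)
            except ValueError:
                return float(value)  # raises ValueError for non-numeric text
        raise ValueError(f"not a quantity: {value!r}")


class ToyWordToken(ToyToken):
    """A single word: non-empty and without whitespace."""

    def transform(self, value):
        text = str(value)
        if not text or any(ch.isspace() for ch in text):
            raise ValueError(f"not a single word: {value!r}")
        return text


print(ToyIDToken().transform("12"))         # 12
print(ToyQuantityToken().transform("2.5"))  # 2.5
print(ToyWordToken().transform("train"))    # train
```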
Module contents
The data module handles low-level data interaction, providing tokens and data objects to aid with parsing and processing.
User classes:
DataTypes
DataItem