Data Items

Data

class dynamicdl.data.dataentry.DataEntry(items: list[DataItem] | DataItem)

Bases: object

A collection of DataItem objects that holds all items required for a single entry in the dataset. DataEntry objects are handled by internal merging processes and are not meant to be instantiated by users.

Parameters:: items (list[DataItem] | DataItem) – A data item, or a list of data items, to be batched together.

apply_tokens(items: Iterable[DataItem]) → None

Apply new tokens (data items) to this data entry.

Parameters:: items (list[DataItem] | DataItem) – Additional items to associate with this data entry.

merge_inplace(other: Self) → None

Merge another data entry into this one, storing the result in this instance.

Parameters:: other (DataEntry) – The other data entry to merge into this instance.

class dynamicdl.data.dataitem.DataItem(delimiter: DataType, value: Any)

Bases: object

The DataItem class represents a value associated with a particular DataType. DataItem objects are usually created and handled by internal processes, but can also be used to instantiate Static objects with specific values.

Example:

my_static = Static('my_image_set_name', DataItem(DataTypes.IMAGE_SET_NAME, 'my_set'))

The above example creates a Static which contains the value my_set as an image set name for its hierarchical children to inherit.

Parameters:

delimiter (DataType) – The type of the DataItem.
value (Any) – The value associated with the DataType; it must be compatible with the DataType's token type.

add(item: Self) → None

Add another item's data to this item's data if both are redundant. Used by internal merging processes.

Parameters:: item (DataItem) – The item to add to this one.
Raises:: ValueError – If self or item is not redundant.

classmethod copy(first: Self) → Self

Create a new instance holding a shallow copy of first's data. Used by internal merging processes.

Parameters:: first (DataItem) – The item to copy.
Returns:: A shallow copy of the data item.
Return type:: DataItem

class dynamicdl.data.datatype.DataType(desc: str, token_type: Token, doc: str | None = None)

Bases: object

DataType is a container class for storing relevant dataset items. Token type options can be found in the tokens module. Warning: DataType instances persist throughout program execution and can be accessed via the static dict DataType.types.

Parameters:

desc (str) – The purpose of the DataType. This should be unique for every new object.
token_type (Token) – The token type of the DataType.
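The persistence warning above describes a class-level registry. The sketch below illustrates that pattern with a hypothetical stand-in class, RegistryType, which is not DynamicDL's code; keying the registry by desc and rejecting duplicate descriptions are assumptions made for the illustration, and token_type is simplified to a string.

```python
class RegistryType:
    """Hypothetical stand-in for DataType (not DynamicDL code).

    Every instance registers itself in a class-level dict, so it stays
    reachable for the rest of the program run.
    """

    # Shared by all instances; assumed here to be keyed by `desc`.
    types: dict[str, "RegistryType"] = {}

    def __init__(self, desc: str, token_type: str) -> None:
        # Assumption: this sketch rejects duplicates, since desc should be unique.
        if desc in RegistryType.types:
            raise ValueError(f"DataType {desc!r} already exists")
        self.desc = desc
        self.token_type = token_type  # simplified to a string in this sketch
        RegistryType.types[desc] = self


image_set = RegistryType("MY_IMAGE_SET", "wildcard")
# Later code can look the instance up by name without keeping a reference.
assert RegistryType.types["MY_IMAGE_SET"] is image_set
```

Because the dict lives on the class rather than on any instance, it outlives the local variable that created each entry, which is why the original warns that instances are persistent.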

verify_token(value: str) → bool

Verify that a given value is valid for the DataType. Calls on internal Token functions for validation.

Parameters:: value (str) – The value to check for compatibility with the DataType.
Return type:: bool

class dynamicdl.data.datatypes.DataTypes

Bases: object

The DataTypes class contains static DataType presets. Below is a description of all presets currently available:

ABSOLUTE_FILE = Represents the **absolute** filepath of an entry image only. This DataType is automatically generated in `Image` and `File` type objects when parsing, but can also be used to parse data. All values under `ABSOLUTE_FILE` must be valid filepaths on the user's filesystem. `RELATIVE_FILE` is currently not supported, but may be in future versions. [GENERAL]

ABSOLUTE_FILE_SEG = Represents the **absolute** filepath of an entry segmentation mask only. This DataType is also automatically generated in `Image` and `File` type objects when parsing, but can also be used to parse data. All values under `ABSOLUTE_FILE_SEG` must be valid filepaths on the user's filesystem. `RELATIVE_FILE_SEG` is currently not supported, but may be in future versions. [GENERAL]

BBOX_CLASS_ID = The ID (parsed to int) complement for `BBOX_CLASS_NAME`. Behaves just like its complement. [DETECTION]

BBOX_CLASS_NAME = Represents the detection class name of an image entry. An image entry can have multiple detection classes, and parsed strings are accepted. Its ID complement can be found under `BBOX_CLASS_ID`. Each detection class must have a one-to-one correspondence to a valid bounding box when in the same hierarchy. When in different hierarchies, it will, like other redundant types, expand naturally to fit the existing length. [DETECTION]

CLASS_ID = The ID (parsed to int) complement for `CLASS_NAME`. Behaves just like its complement. [CLASSIFICATION]

CLASS_NAME = Represents the classification class name of an image entry. Each image entry can have only one class, and parsed strings are accepted. Its ID complement can be found under `CLASS_ID`. [CLASSIFICATION]

GENERIC = A generic token with no significance that can be used as a wildcard token for parsing. Can represent anything, and any type. [GENERAL]

GENERIC_INT = Same as `GENERIC`, except accepts only integer types. [GENERAL]

GENERIC_QUANTITY = Same as `GENERIC`, except accepts only numeric types (i.e. float and int). [GENERAL]

GENERIC_WORD = Same as `GENERIC`, except accepts only one word, i.e. no spaces allowed. [GENERAL]

HEIGHT = The height of the bounding box. Must be accompanied by `WIDTH`, or else has no effect. Can be used as an alternative to defining `YMAX` or `YMIN`. [DETECTION]

IMAGE_ID = The ID (parsed to int) complement for `IMAGE_NAME`. Behaves just like its complement. [GENERAL]

IMAGE_NAME = Represents an identifier token for image entries via a string description. As of 0.1.1-alpha, all `IMAGE_NAME` entries must be unique, as this DataType serves as the sole identifier for image entries. Accepts parsed strings. Its ID complement can be found under `IMAGE_ID`. [GENERAL]

IMAGE_SET_ID = Represents the ID of an image set. This includes any valid integer. The named complement of this DataType is `IMAGE_SET_NAME`; see that entry for details. [GENERAL]

IMAGE_SET_NAME = Represents the name of an image set. This includes any valid strings, but is not meant to store the ID of the image set; see `IMAGE_SET_ID`. Image sets are used to allocate specific entries to a group which can be split when dataloading. Most commonly, image set names will be `train`, `val`, or `test`. [GENERAL]

POLYGON = Should not be instantiated by the user, as there is no way to parse it. Instead, it is created automatically whenever a `SegmentationObject` wraps `X` and `Y` objects. This DataType is used internally for parsing. [SEGMENTATION]

SEG_CLASS_ID = The ID (parsed to int) complement for `SEG_CLASS_NAME`. Behaves just like its complement. [SEGMENTATION]

SEG_CLASS_NAME = Represents the segmentation class name of an image entry. An image entry can have multiple segmentation classes, and parsed strings are accepted. Its ID complement can be found under `SEG_CLASS_ID`. Each segmentation class must have a one-to-one correspondence to a valid segmentation polygon when in the same hierarchy. When in different hierarchies, it will, like other redundant types, expand naturally to fit the existing length. [SEGMENTATION]

WIDTH = The width of the bounding box. Must be accompanied by `HEIGHT`, or else has no effect. Can be used as an alternative to defining `XMAX` or `XMIN`. [DETECTION]

X = A segmentation polygon x-coordinate. Used to define the vertices of a polygon for segmentation tasks. Each `X` coordinate must be paired with a corresponding `Y` coordinate to form a valid vertex. [SEGMENTATION]

X1 = A bounding box x-coordinate. Can be in any order as long as it forms a valid bounding box with `X2`, `Y1`, and `Y2`. [DETECTION]

X2 = A bounding box x-coordinate. Can be in any order as long as it forms a valid bounding box with `X1`, `Y1`, and `Y2`. [DETECTION]

XMAX = The maximum x-coordinate in the bounding box. Must be accompanied by `YMAX`, or else has no effect, and must also be accompanied by either `XMIN` or `WIDTH` along with their y-counterparts. [DETECTION]

XMID = The midpoint x-coordinate in the bounding box. Used to denote the horizontal center of the bounding box. Must be accompanied by `YMID` to define a central point, and by either `XMIN` or `XMAX` to fill the bounding box. [DETECTION]

XMIN = The minimum x-coordinate in the bounding box. Must be accompanied by `YMIN`, or else has no effect, and must also be accompanied by either `XMAX` or `WIDTH` along with their y-counterparts. [DETECTION]

Y = A segmentation polygon y-coordinate. Used to define the vertices of a polygon for segmentation tasks. Each `Y` coordinate must be paired with a corresponding `X` coordinate to form a valid vertex. [SEGMENTATION]

Y1 = A bounding box y-coordinate. Can be in any order as long as it forms a valid bounding box with `X1`, `X2`, and `Y2`. [DETECTION]

Y2 = A bounding box y-coordinate. Can be in any order as long as it forms a valid bounding box with `X1`, `X2`, and `Y1`. [DETECTION]

YMAX = The maximum y-coordinate in the bounding box. Must be accompanied by `XMAX`, or else has no effect, and must also be accompanied by either `YMIN` or `HEIGHT` along with their x-counterparts. [DETECTION]

YMID = The midpoint y-coordinate in the bounding box. Used to denote the vertical center of the bounding box. Must be accompanied by `XMID` to define a central point, and by either `YMIN` or `YMAX` to fill the bounding box. [DETECTION]

YMIN = The minimum y-coordinate in the bounding box. Must be accompanied by `XMIN`, or else has no effect, and must also be accompanied by either `YMAX` or `HEIGHT` along with their x-counterparts. [DETECTION]
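The coordinate presets above describe several interchangeable ways of specifying a bounding box. As an illustration only, the sketch below normalizes the documented combinations into (xmin, ymin, xmax, ymax). The helper name to_xyxy and the plain dict input are hypothetical and are not part of DynamicDL; the library's own conversion may differ.

```python
def to_xyxy(box: dict[str, float]) -> tuple[float, float, float, float]:
    """Normalize one box, keyed by DataTypes preset names, to (xmin, ymin, xmax, ymax).

    Hypothetical helper for illustration only; it covers just the
    combinations described in the presets above.
    """
    keys = set(box)
    # X1/X2/Y1/Y2: the two corners may be given in either order.
    if {"X1", "X2", "Y1", "Y2"} <= keys:
        x1, x2, y1, y2 = box["X1"], box["X2"], box["Y1"], box["Y2"]
        return min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2)
    # Midpoint plus one edge: the opposite edge mirrors across the midpoint.
    if {"XMID", "YMID", "XMIN", "YMIN"} <= keys:
        xmin, ymin = box["XMIN"], box["YMIN"]
        return xmin, ymin, 2 * box["XMID"] - xmin, 2 * box["YMID"] - ymin
    if {"XMID", "YMID", "XMAX", "YMAX"} <= keys:
        xmax, ymax = box["XMAX"], box["YMAX"]
        return 2 * box["XMID"] - xmax, 2 * box["YMID"] - ymax, xmax, ymax
    # Min corner plus either the max corner or WIDTH/HEIGHT.
    if {"XMIN", "YMIN", "XMAX", "YMAX"} <= keys:
        return box["XMIN"], box["YMIN"], box["XMAX"], box["YMAX"]
    if {"XMIN", "YMIN", "WIDTH", "HEIGHT"} <= keys:
        xmin, ymin = box["XMIN"], box["YMIN"]
        return xmin, ymin, xmin + box["WIDTH"], ymin + box["HEIGHT"]
    # Max corner plus WIDTH/HEIGHT.
    if {"XMAX", "YMAX", "WIDTH", "HEIGHT"} <= keys:
        xmax, ymax = box["XMAX"], box["YMAX"]
        return xmax - box["WIDTH"], ymax - box["HEIGHT"], xmax, ymax
    raise ValueError(f"Cannot form a bounding box from keys {sorted(keys)}")


print(to_xyxy({"X1": 50, "X2": 10, "Y1": 5, "Y2": 40}))  # (10, 5, 50, 40)
print(to_xyxy({"XMIN": 10, "YMIN": 20, "WIDTH": 30, "HEIGHT": 40}))  # (10, 20, 40, 60)
print(to_xyxy({"XMID": 25, "YMID": 25, "XMAX": 40, "YMAX": 30}))  # (10, 20, 40, 30)
```

The midpoint cases rely on the box being symmetric about its center, so a single edge fixes the opposite one; a lone XMID/YMID pair raises, matching the requirement that it be accompanied by an edge.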

class dynamicdl.data.partialtype.ComboType(to: DataType, constructor: str, *datatypes: PartialType, preserve_all: bool = False)

Bases: DataType

The ComboType class is used to create datatypes which are composed of other types. This is especially useful, for example, when one wishes to use an IMAGE_ID-like parameter that is not distinct throughout the dataset. Instead, suppose that the combination of CLASS_NAME and IMAGE_ID forms a unique IMAGE_NAME. PartialType and its wrapper ComboType achieve this functionality. See the PartialType class for details.

Parameters:

to (DataType) – The DataType to convert the fully initialized PartialType collection to.
constructor (str) – The template to apply when converting to the DataType. Each PartialType value is written as a {} placeholder, in the same order as datatypes.
datatypes (PartialType) – The PartialTypes that make up the ComboType.
preserve_all (bool) – If True, preserves the data for each PartialType in the dataframe; otherwise, constructing the ComboType removes all PartialType data.

construct(values: list[Any]) → Any

Construct the full datatype value.
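To make the constructor parameter concrete: each {} placeholder is filled positionally with the PartialType values, in the order the PartialTypes were passed. The sketch below models that substitution with str.format. The function name construct_value is hypothetical, and whether ComboType.construct itself uses str.format internally is an assumption.

```python
from typing import Any


def construct_value(constructor: str, values: list[Any]) -> str:
    """Fill each {} placeholder in `constructor` with the matching value.

    Hypothetical model of ComboType.construct: values are substituted in
    the same order as the PartialTypes passed to the ComboType.
    """
    expected = constructor.count("{}")
    if len(values) != expected:
        raise ValueError(f"Expected {expected} values, got {len(values)}")
    return constructor.format(*values)


print(construct_value("{}_{}", ["cat", 12]))  # cat_12
```

Checking the placeholder count up front turns a silently malformed name into an explicit error, which is one reasonable design choice for a sketch like this.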

class dynamicdl.data.partialtype.PartialType(desc: str, token_type: Token, doc: str | None = None)

Bases: DataType

The PartialType class is used to create datatypes which are composed of other types. This is especially useful, for example, when one wishes to use an IMAGE_ID-like parameter that is not distinct throughout the dataset. Instead, suppose that the combination of CLASS_NAME and IMAGE_ID forms a unique IMAGE_NAME. PartialType and its wrapper ComboType achieve this functionality.

Example:

from dynamicdl.data.datatype import DataType
from dynamicdl.data.datatypes import DataTypes
from dynamicdl.data.partialtype import ComboType
from dynamicdl.data.tokens import IDToken

my_id_type = DataType('my_id', IDToken())
my_image_name = ComboType(
    DataTypes.IMAGE_NAME,
    '{}_{}',
    DataTypes.CLASS_NAME,
    my_id_type,
    preserve_all=True
)
# ... other code
# we can now place `my_id_type` and `DataTypes.CLASS_NAME` in our form when parsing the
# dataset, and when they are found together they will automatically parse into
# DataTypes.IMAGE_NAME!

Now, every IMAGE_NAME value will be constructed from the template {CLASS_NAME}_{ID}, as specified. Because IDs that repeat across classes now yield distinct names, they no longer create merge conflicts. Note that we created a new ID type rather than reusing IMAGE_ID, as IMAGE_ID is a unique token which should not be merged.

Parameters:

desc (str) – The purpose of the DataType. This should be unique for every new object.
token_type (Token) – The token type of the DataType.
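Why the combined name avoids merge conflicts can be seen with a few toy records in which IDs restart within each class. The snippet below is a hypothetical illustration with made-up records, not DynamicDL code: keying entries on the raw ID makes distinct images collide, while keying on the {CLASS_NAME}_{ID} template keeps them apart.

```python
from collections import Counter

# Toy records: IDs restart at 1 within each class (made-up data).
records = [
    {"class_name": "cat", "id": 1},
    {"class_name": "cat", "id": 2},
    {"class_name": "dog", "id": 1},
]

# Keying on the raw ID: "1" appears twice, so two different images would
# be treated as the same entry and their data would conflict on merge.
raw_keys = Counter(str(r["id"]) for r in records)
collisions = [key for key, count in raw_keys.items() if count > 1]

# Keying on the combined template {CLASS_NAME}_{ID} gives unique names.
combined = ["{}_{}".format(r["class_name"], r["id"]) for r in records]

print(collisions)  # ['1']
print(combined)    # ['cat_1', 'cat_2', 'dog_1']
```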

Tokens

class dynamicdl.data.tokens.FilenameToken

Bases: UniqueToken

The FilenameToken class is a Token which checks for valid absolute filenames.

verify_token(token: Any) → bool

Checks whether the token is in a valid format in accordance with the identifier.
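A check in the spirit of FilenameToken might look like the sketch below, which accepts only strings that are absolute paths. Whether DynamicDL also requires the file to exist is not stated in this section, so existence is an optional flag here; the function name is_valid_absolute_filename and the must_exist parameter are hypothetical.

```python
import os
from typing import Any


def is_valid_absolute_filename(token: Any, must_exist: bool = False) -> bool:
    """Return True if `token` is a string holding an absolute file path.

    Hypothetical sketch; `must_exist` (an assumption, off by default)
    additionally requires a regular file at that path.
    """
    if not isinstance(token, str) or not os.path.isabs(token):
        return False
    return os.path.isfile(token) if must_exist else True


print(is_valid_absolute_filename("images/cat_1.png"))  # False
```

Using os.path.isabs keeps the check portable across POSIX and Windows path conventions.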

class dynamicdl.data.tokens.IDToken

Bases: Token

Represents an ID. Items must be integers.

transform(token: str) → Any

Transform the token from a string value to the token type.

verify_token(token: Any) → bool

Checks whether the token is in a valid format in accordance with the identifier.
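The split between transform and verify_token can be pictured with the minimal stand-in below, which mirrors IDToken's documented contract that items must be integers. The class name SketchIDToken and the exact parsing rule are assumptions, not DynamicDL's implementation. Following the Token guidance, the methods are ordinary instance methods even though they do not use self.

```python
from typing import Any


class SketchIDToken:
    """Hypothetical stand-in for IDToken: IDs must be integers."""

    def transform(self, token: str) -> Any:
        # Parse the raw string, e.g. "0042" -> 42; raises ValueError otherwise.
        return int(token)

    def verify_token(self, token: Any) -> bool:
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(token, int) and not isinstance(token, bool)


tok = SketchIDToken()
value = tok.transform("0042")
print(value, tok.verify_token(value), tok.verify_token("0042"))  # 42 True False
```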

class dynamicdl.data.tokens.QuantityToken

Bases: Token

Represents a numeric quantity. Can be int or float.

transform(token: str) → Any

Transform the token from a string value to the token type.

verify_token(token: Any) → bool

Checks whether the token is in a valid format in accordance with the identifier.
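Similarly, a quantity token must accept both int and float. The minimal sketch below assumes that integer-looking strings stay int and everything else is parsed as float; the real parsing rule is not documented in this section, and SketchQuantityToken is a hypothetical name.

```python
from typing import Any


class SketchQuantityToken:
    """Hypothetical stand-in for QuantityToken: value must be int or float."""

    def transform(self, token: str) -> Any:
        # Keep whole numbers as int ("12" -> 12); otherwise parse as float.
        try:
            return int(token)
        except ValueError:
            return float(token)

    def verify_token(self, token: Any) -> bool:
        # Accept numeric types only; bool is excluded even though it subclasses int.
        return isinstance(token, (int, float)) and not isinstance(token, bool)


tok = SketchQuantityToken()
print(tok.transform("12"), tok.transform("3.5"))  # 12 3.5
```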

class dynamicdl.data.tokens.RedundantIDToken

Bases: IDToken, RedundantToken

Represents a redundant ID.

transform(token: str) → Any

Transform the token from a string value to the token type.

verify_token(token: Any) → bool

Checks whether the token is in a valid format in accordance with the identifier.

class dynamicdl.data.tokens.RedundantObjectToken

Bases: RedundantToken

Represents a segmentation object.

transform(token: list) → Any

Transform the token from a list value to the token type.

verify_token(token: Any) → bool

Checks whether the token is in a valid format in accordance with the identifier.

class dynamicdl.data.tokens.RedundantQuantityToken

Bases: QuantityToken, RedundantToken

Represents a redundant numeric quantity.

transform(token: str) → Any

Transform the token from a string value to the token type.

verify_token(token: Any) → bool

Checks whether the token is in a valid format in accordance with the identifier.

class dynamicdl.data.tokens.RedundantToken

Bases: Token

RedundantToken items are used when a data item stores multiple values per image or unique item, such as multiple bounding boxes or segmentation objects.

transform(token: str) → Any

Transform the token from a string value to the token type.

class dynamicdl.data.tokens.Token

Bases: object

The Token class is the base class which carries important information into Data objects for data parsing functions. Subclasses of this class may have specific requirements for content.

Implementations of Token methods should not be static methods, but should also not use self, for compatibility reasons (this may change in the future).

transform(token: Any) → Any

Transform the token from a string value to the token type.

verify_token(token: Any) → bool

Checks whether the token is in a valid format in accordance with the identifier.

class dynamicdl.data.tokens.UniqueIDToken

Bases: IDToken, UniqueToken

Represents a unique ID.

class dynamicdl.data.tokens.UniqueToken

Bases: Token

UniqueToken items are used when an identifier is a unique item pertaining to any property of an image or entry. Unique tokens serve as valid IDs for identifying each data entry in the dataset.

class dynamicdl.data.tokens.WildcardIntToken

Bases: IDToken, WildcardToken

A wildcard that accepts only integers.

class dynamicdl.data.tokens.WildcardQuantityToken

Bases: QuantityToken, WildcardToken

A wildcard that accepts only quantities.

class dynamicdl.data.tokens.WildcardToken

Bases: Token

The WildcardToken class represents a generic wildcard which can stand for anything and is not used for any identifiers. Unlike other tokens, wildcard tokens are not affected by merge operations.
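The token families differ mainly in how they behave when two partial entries are merged. The sketch below is a simplified, hypothetical model of the rules described in this section: unique values identify an entry and must agree, redundant values accumulate, and wildcards are left out of the merge. It is not DynamicDL's merging code; the kinds mapping and the merge_entries function are illustrative names, and dropping wildcard values from the result is an assumption.

```python
from typing import Any

# Hypothetical mapping from DataType name to token family.
kinds = {
    "IMAGE_NAME": "unique",
    "BBOX_CLASS_NAME": "redundant",
    "GENERIC": "wildcard",
}


def merge_entries(first: dict[str, Any], second: dict[str, Any]) -> dict[str, Any]:
    """Merge two partial entries using simplified per-family rules."""
    merged: dict[str, Any] = {}
    for entry in (first, second):
        for key, val in entry.items():
            kind = kinds[key]
            if kind == "wildcard":
                continue  # assumed: wildcards are parse-time only and not merged
            if kind == "redundant":
                # Redundant values accumulate into a fresh list.
                merged.setdefault(key, []).extend(val)
            elif key in merged and merged[key] != val:
                # Unique values identify the entry, so they must agree.
                raise ValueError(f"Conflicting {key}: {merged[key]!r} vs {val!r}")
            else:
                merged[key] = val
    return merged


a = {"IMAGE_NAME": "img_1", "BBOX_CLASS_NAME": ["cat"], "GENERIC": "x"}
b = {"IMAGE_NAME": "img_1", "BBOX_CLASS_NAME": ["dog"]}
print(merge_entries(a, b))
# {'IMAGE_NAME': 'img_1', 'BBOX_CLASS_NAME': ['cat', 'dog']}
```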

class dynamicdl.data.tokens.WildcardWordToken

Bases: WildcardToken

Disallows spaces in the wildcard.

verify_token(token: str) → bool

Checks whether the token is in a valid format in accordance with the identifier.
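For the single-word wildcard, the check amounts to rejecting spaces. The minimal sketch below assumes that tabs and newlines count as spaces too and that the empty string is rejected; the original only mentions spaces, and is_single_word is a hypothetical name.

```python
from typing import Any


def is_single_word(token: Any) -> bool:
    """Return True if `token` is a non-empty string with no whitespace.

    Hypothetical sketch of a WildcardWordToken-style check; treating all
    whitespace like spaces and rejecting "" are assumptions.
    """
    return (
        isinstance(token, str)
        and token != ""
        and not any(ch.isspace() for ch in token)
    )


print(is_single_word("train"), is_single_word("my set"))  # True False
```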

Module contents

The data module handles low-level data interaction, providing tokens and data objects to aid with parsing and processing.

User classes:

DataTypes
DataItem