Parsing Structures
- class dynamicdl.parsing.alias.Alias(generics: list[Generic | DataType])
Bases:
object
Class used when a placeholder in Generic could be interpreted in multiple ways. For example, if IMAGE_NAME also contains CLASS_NAME and IMAGE_ID, Alias can extract all three tokens at once. When supplied to a Generic, an Alias counts as a single wildcard token ({}).
Example:
alias = Alias([ DataTypes.IMAGE_NAME, Generic('{}_{}', DataTypes.CLASS_NAME, DataTypes.IMAGE_ID) ])
Now we can use Generic(alias) as a valid generic and it will obtain all contained DataTypes.
- Parameters:
generics (list[Generic | DataType]) – The list of Generic type objects which can be used for alias parsing.
- Raises:
ValueError – There must be at least one item in the provided Generics
- match(entry: str) → tuple[bool, list[DataItem]]
Return a list of DataItems if matched successfully. Used for internal processing functions.
- Parameters:
entry (str) – The entry string to be matched to the alias pattern.
- Returns:
A boolean indicating success of the matching, and a list of the DataItems passed.
- Return type:
tuple[bool, list[DataItem]]
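The Alias behavior can be illustrated with a minimal sketch in plain Python. It assumes two things: every component of an alias must match the whole entry, and each {} wildcard becomes a (.+) capture group. `alias_match` and the (pattern, names) tuples that stand in for Generic and DataType are hypothetical, not DynamicDL's API.

```python
import re


def alias_match(
    entry: str, generics: list[tuple[str, tuple[str, ...]]]
) -> tuple[bool, list[tuple[str, str]]]:
    """Hypothetical Alias-style matcher: every component must match the entry."""
    if not generics:
        raise ValueError("There must be at least one item in the provided Generics")
    items: list[tuple[str, str]] = []
    for pattern, names in generics:
        # Assumption: each {} wildcard is turned into a greedy (.+) capture group.
        m = re.fullmatch(pattern.replace("{}", "(.+)"), entry)
        if m is None:
            return False, []
        items.extend(zip(names, m.groups()))
    return True, items


# A bare DataType behaves like the pattern '{}' with a single name.
alias = [("{}", ("IMAGE_NAME",)), ("{}_{}", ("CLASS_NAME", "IMAGE_ID"))]
print(alias_match("cat_042", alias))
# (True, [('IMAGE_NAME', 'cat_042'), ('CLASS_NAME', 'cat'), ('IMAGE_ID', '042')])
```

In this sketch, an entry such as cat042 fails because the second component cannot match it, so no tokens are extracted at all.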
- class dynamicdl.parsing.ambiguouslist.AmbiguousList(form: GenericList | list | Any)
Bases:
object
Ambiguous List. Used when an item may appear either as a single item or inside a list. This is primarily used for XML files.
Example:
<annotation> <box> <x1>1.0</x1> <x2>2.0</x2> <y1>3.0</y1> <y2>4.0</y2> </box> </annotation>
<annotation> <box> <x1>1.0</x1> <x2>2.0</x2> <y1>3.0</y1> <y2>4.0</y2> </box> <box> <x1>5.0</x1> <x2>6.0</x2> <y1>7.0</y1> <y2>8.0</y2> </box> </annotation>
Observe that each annotation in the XML above may contain one or more box tags. The XML parser cannot know this in advance. A tag that appears once is inferred to be a single value, so in the first annotation box is a dict with keys x1, x2, y1, y2, while in the second annotation box is a list of such dicts. In this case we use AmbiguousList so that the provided form accepts either shape. AmbiguousList behaves identically to GenericList when multiple objects are present. It exists as a separate class mainly so that dataset parsing errors that would otherwise go unnoticed can be detected.
- Parameters:
form (GenericList | list | Any) – Essentially a wrapper for GenericList: provide either the arguments used to instantiate a GenericList or a GenericList object itself.
- expand(path: list[str], dataset: Any) → dict[Static, Any]
Expand potential list into dict of statics.
- Parameters:
dataset (Any) – The dataset data, which is either a single value or a list of values following some format.
- Returns:
The parsed expansion of Static values. Single values are first converted to lists of length 1, and for consistency lists are then converted to dicts with int keys.
- Return type:
dict[int, Any]
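The ambiguity above can be reproduced with a small, hypothetical XML-to-dict converter built on xml.etree.ElementTree. Like many XML-to-dict parsers, it collapses a single child tag into a plain value. `expand_ambiguous` then sketches the normalization described for AmbiguousList: single values become length-1 lists, and lists become dicts with int keys. Both functions are illustrations, not DynamicDL code.

```python
import xml.etree.ElementTree as ET
from typing import Any


def to_dict(elem: ET.Element) -> Any:
    """Hypothetical converter: repeated child tags become lists, single ones do not."""
    children = list(elem)
    if not children:
        return elem.text
    out: dict[str, Any] = {}
    for child in children:
        value = to_dict(child)
        if child.tag not in out:
            out[child.tag] = value
        elif isinstance(out[child.tag], list):
            out[child.tag].append(value)
        else:
            out[child.tag] = [out[child.tag], value]
    return out


def expand_ambiguous(value: Any) -> dict[int, Any]:
    """Sketch of AmbiguousList normalization: always end up with int-keyed entries."""
    items = value if isinstance(value, list) else [value]
    return dict(enumerate(items))


one = to_dict(ET.fromstring("<annotation><box><x1>1.0</x1></box></annotation>"))
two = to_dict(ET.fromstring(
    "<annotation><box><x1>1.0</x1></box><box><x1>5.0</x1></box></annotation>"
))
print(one["box"])                    # {'x1': '1.0'}  (a dict)
print(two["box"])                    # [{'x1': '1.0'}, {'x1': '5.0'}]  (a list)
print(expand_ambiguous(one["box"]))  # {0: {'x1': '1.0'}}
```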
Generic type objects.
- class dynamicdl.parsing.generic.File(pattern: str | DataType | Alias, *data: DataType | Alias, ignore: list[str] | str | None = None, extensions: list[str] | str = '', disable_warnings: bool = False)
Bases:
Generic
A subclass of Generic that restricts pattern matching to valid files in the filesystem. During parsing, File objects must appear as keys in the filestructure format; otherwise, File behaves exactly like Generic. File additionally takes a list of valid extensions. In future versions, filetypes will be inferred from the corresponding value in the filestructure format.
- Parameters:
pattern (str | DataType | Alias) – The pattern to match against, containing wildcards of the {} format. The generic is matched against the entire string. Regular expressions compatible with the re module are allowed, except capture groups such as (.+), which raise an error. If a DataType or Alias is specified, data is overridden and has no effect.
data (DataType | Alias) – Tokens that correspond to data types which each {} matches to.
ignore (list[str] | str) – Values that match any item in ignore are not matched. Currently only str is supported; future versions will support Generic types.
extensions (list[str] | str) – Valid extensions to match. An extension is whatever follows the ., e.g. txt. Files without extensions are not allowed, but can instead be parsed as a Generic.
disable_warnings (bool) – Disables the warnings issued when pattern contains a . character. This may be useful when filenames legitimately contain . characters that do not mark the extension.
- match(entry: str) → tuple[bool, list[DataItem]]
Return the tokens' string values, given an entry string that follows the pattern.
- Parameters:
entry (str) – The entry string to be matched to the generic pattern.
- Returns:
A boolean indicating success of the matching, and a list of the DataItems passed.
- Return type:
tuple[bool, list[DataItem]]
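Below is a minimal sketch of File-style matching. It assumes that the extension is the text after the last . and that the {} pattern is matched against the remaining stem. `file_match` is a hypothetical helper. Treating the default extensions='' as "any extension" is also an assumption of this sketch, not documented behavior.

```python
from __future__ import annotations

import re
import warnings


def file_match(
    filename: str,
    pattern: str,
    names: tuple[str, ...],
    extensions: list[str] | str = "",
    disable_warnings: bool = False,
) -> tuple[bool, list[tuple[str, str]]]:
    """Hypothetical File-style matcher; not DynamicDL's implementation."""
    if "." in pattern and not disable_warnings:
        warnings.warn(f"pattern {pattern!r} contains '.', which may be confused with the extension")
    if isinstance(extensions, str):
        # Assumption: an empty string means "accept any extension".
        extensions = [extensions] if extensions else []
    stem, dot, ext = filename.rpartition(".")
    if not dot:
        return False, []  # no extension: such names could be parsed as a Generic instead
    if extensions and ext not in extensions:
        return False, []
    m = re.fullmatch(pattern.replace("{}", "(.+)"), stem)
    if m is None:
        return False, []
    return True, list(zip(names, m.groups()))


print(file_match("cat_001.txt", "{}_{}", ("CLASS_NAME", "IMAGE_ID"), extensions="txt"))
# (True, [('CLASS_NAME', 'cat'), ('IMAGE_ID', '001')])
```

Splitting on the last . is why a name such as a.b.png still matches '{}' in this sketch. It is also why a pattern that itself contains . is worth a warning.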
- class dynamicdl.parsing.generic.Folder(pattern: str | DataType | Alias, *data: DataType | Alias, ignore: list[str] | str | None = None)
Bases:
Generic
A subclass of Generic that restricts pattern matching to valid directories in the filesystem. During parsing, Folder objects must appear as keys in the filestructure format; otherwise, Folder behaves exactly like Generic.
- Parameters:
pattern (str | DataType | Alias) – The pattern to match against, containing wildcards of the {} format. The generic is matched against the entire string. Regular expressions compatible with the re module are allowed, except capture groups such as (.+), which raise an error. If a DataType or Alias is specified, data is overridden and has no effect.
data (DataType | Alias) – Tokens that correspond to data types which each {} matches to.
ignore (list[str] | str) – Values that match any item in ignore are not matched. Currently only str is supported; future versions will support Generic types.
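Folder differs from Generic only in which filesystem entries are candidates. As a sketch with hypothetical helper names, the function below scans a directory with os.scandir and applies a {} pattern to subdirectories only. A regular file whose name fits the pattern is therefore still skipped.

```python
import os
import re
import tempfile


def match_folders(root: str, pattern: str, names: tuple[str, ...]) -> dict[str, list[tuple[str, str]]]:
    """Hypothetical Folder-style scan: only directories are matched against the pattern."""
    regex = re.compile(pattern.replace("{}", "(.+)"))
    results: dict[str, list[tuple[str, str]]] = {}
    with os.scandir(root) as entries:
        for entry in entries:
            if not entry.is_dir():
                continue  # files are ignored even when their names fit the pattern
            m = regex.fullmatch(entry.name)
            if m is not None:
                results[entry.name] = list(zip(names, m.groups()))
    return results


def demo_folder_scan() -> dict[str, list[tuple[str, str]]]:
    """Build a throwaway tree with one directory and one file that both fit '{}_{}'."""
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "train_1"))
        open(os.path.join(root, "train_2"), "w").close()
        return match_folders(root, "{}_{}", ("IMAGE_SET_NAME", "IMAGE_SET_ID"))


print(demo_folder_scan())  # {'train_1': [('IMAGE_SET_NAME', 'train'), ('IMAGE_SET_ID', '1')]}
```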
- class dynamicdl.parsing.generic.Generic(pattern: str | DataType | Alias, *data: DataType | Alias, ignore: list[str] | str | None = None)
Bases:
object
The Generic class is a basic building block for representing wildcard-optional data. It can be used anywhere in the DynamicDL dataset format and provides the structure needed to interpret data items and tokens.
Example:
# example 1
gen = Generic('{}_{}', DataTypes.IMAGE_SET_NAME, DataTypes.IMAGE_SET_ID)
my_data_type = DataTypes.GENERIC
# example 2
Generic('{}', my_data_type)
# example 3
Generic(my_data_type)
# example 4
my_data_type
# example 5
Generic(
    '{}_{}',
    DataTypes.IMAGE_SET_NAME,
    DataTypes.IMAGE_SET_ID,
    ignore=[
        'invalid_line',
        '{}_invalidclasstype'
    ]
)
Above, example 1 allows items of the form “*_*” to be interpreted, where the first wildcard is interpreted as the image set name and the second as the image set id. The Generic class also accepts a DataType, which is meant to encapsulate the full wildcard; in other words, examples 2, 3, and 4 are functionally the same.
Generic also accepts an ignore keyword argument: a string or list of strings containing patterns, where anything that matches is ignored. These patterns accept regex syntax and also use {} as a valid wildcard, as illustrated in example 5.
- Parameters:
pattern (str | DataType | Alias) – The pattern to match against, containing wildcards of the {} format. The generic is matched against the entire string. Regular expressions compatible with the re module are allowed, except capture groups such as (.+), which raise an error. If a DataType or Alias is specified, data is overridden and has no effect.
data (DataType | Alias) – Tokens that correspond to data types which each {} matches to.
ignore (list[str] | str) – Values that match any item in ignore are not matched. Currently only str is supported; future versions will support Generic types.
- Raises:
LengthMismatchError – The number of {} wildcards must match the number of DataType or Alias values provided in data.
ValueError – (.+) and (.*) regex groups cannot be present in the pattern; use {} with an associated DataType instead.
- match(entry: str) → tuple[bool, list[DataItem]]
Return the tokens' string values, given an entry string that follows the pattern.
- Parameters:
entry (str) – The entry string to be matched to the generic pattern.
- Returns:
A boolean indicating success of the matching, and a list of the DataItems passed.
- Return type:
tuple[bool, list[DataItem]]
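The sketch below shows one plausible reading of the rules above in plain Python:

- each {} becomes a (.+) capture group, and the rest of the pattern is used as a regex;
- ignore patterns are checked first, with {} read as .+;
- the wildcard count must equal the number of data types.

`GenericSketch` is hypothetical. It raises ValueError where Generic documents LengthMismatchError, and it represents DataItems as (type, value) tuples.

```python
from __future__ import annotations

import re


class GenericSketch:
    """Hypothetical stand-in for Generic's {} wildcard matching; not DynamicDL code."""

    def __init__(self, pattern: str, *data: str, ignore: list[str] | str | None = None):
        if "(.+)" in pattern or "(.*)" in pattern:
            raise ValueError("use {} with an associated data type instead of (.+) or (.*)")
        if pattern.count("{}") != len(data):
            # Generic itself documents a LengthMismatchError for this case.
            raise ValueError("number of {} wildcards must equal the number of data types")
        self.data = data
        # Each {} becomes a capture group; everything else is treated as a regex.
        self.regex = re.compile(pattern.replace("{}", "(.+)"))
        if isinstance(ignore, str):
            ignore = [ignore]
        # In ignore patterns {} is still a wildcard, but nothing needs to be captured.
        self.ignore = [re.compile(p.replace("{}", ".+")) for p in ignore or []]

    def match(self, entry: str) -> tuple[bool, list[tuple[str, str]]]:
        if any(p.fullmatch(entry) for p in self.ignore):
            return False, []
        m = self.regex.fullmatch(entry)
        if m is None:
            return False, []
        return True, list(zip(self.data, m.groups()))


gen = GenericSketch(
    "{}_{}", "IMAGE_SET_NAME", "IMAGE_SET_ID",
    ignore=["invalid_line", "{}_invalidclasstype"],
)
print(gen.match("train_01"))      # (True, [('IMAGE_SET_NAME', 'train'), ('IMAGE_SET_ID', '01')])
print(gen.match("invalid_line"))  # (False, []) because of the ignore list
```

Without the ignore list, invalid_line would have matched '{}_{}' and produced bogus tokens. That is the situation example 5 guards against.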
- class dynamicdl.parsing.generic.ImageFile(pattern: str, *data: DataType | Alias, ignore: list[str] | str | None = None, extensions: list[str] | str | None = None, disable_warnings: bool = False)
Bases:
File
A subclass of File that restricts pattern matching to valid images in the filesystem. During parsing, ImageFile objects must appear as keys in the filestructure format; otherwise, ImageFile behaves exactly like File. Default image extensions are provided, but extensions can also be specified to restrict matching to a subset of them. In the future, this class may be deprecated in favor of automatic type inference.
- Parameters:
pattern (str | DataType | Alias) – The pattern to match against, containing wildcards of the {} format. The generic is matched against the entire string. Regular expressions compatible with the re module are allowed, except capture groups such as (.+), which raise an error. If a DataType or Alias is specified, data is overridden and has no effect.
data (DataType | Alias) – Tokens that correspond to data types which each {} matches to.
ignore (list[str] | str) – Values that match any item in ignore are not matched. Currently only str is supported; future versions will support Generic types.
extensions (list[str] | str) – Valid extensions to match. An extension is whatever follows the ., e.g. txt. Files without extensions are not allowed, but can instead be parsed as a Generic.
disable_warnings (bool) – Disables the warnings issued when pattern contains a . character. This may be useful when filenames legitimately contain . characters that do not mark the extension.
- class dynamicdl.parsing.genericlist.GenericList(form: list[Any] | Any)
Bases:
object
Generic list item. Items inside the list are expected to repeat modulo len(form): item i of the data follows form[i % len(form)].
Example:
{ "bounding_box": [ 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0 ] }
Suppose that we wish to parse the bounding boxes for this particular json file. Let each value represent X1, Y1, X2, Y2 as needed. Then we can parse the form as follows:
form = { "bounding_box": [ DataTypes.X1, DataTypes.Y1, DataTypes.X2, DataTypes.Y2 ] }
Suppose the format was changed to x, y pairs:
{ "bounding_box": [ [1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0] ] }
Its corresponding form:
form = { "bounding_box": [ [DataTypes.X1, DataTypes.Y1], [DataTypes.X2, DataTypes.Y2] ] }
During parsing, a standard Python list is always inferred to be a GenericList. When the list items correspond 1:1 with the form, GenericList still parses them correctly.
- Parameters:
form (list[Any] | Any) – The form to follow. Each entry in form must be a valid generic-like form, and all items inside the form list are combined into one object upon parsing. Further items in the list are expected to conform to the same scheme as the first group.
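The "repeat modulo len(form)" rule can be sketched as splitting the data into groups of len(form) items and merging each group into one record. `parse_generic_list` is hypothetical. It supports only one level of nesting and rejects data whose length is not a multiple of len(form); both are assumptions of this sketch, not documented behavior.

```python
from typing import Any


def parse_generic_list(form: list[Any], data: list[Any]) -> list[dict[str, Any]]:
    """Hypothetical GenericList-style parser: data item i follows form[i % len(form)]."""
    if len(data) % len(form) != 0:
        raise ValueError("data length must be a multiple of len(form)")
    records = []
    for start in range(0, len(data), len(form)):
        record: dict[str, Any] = {}
        for spec, value in zip(form, data[start:start + len(form)]):
            if isinstance(spec, list):
                # Nested form such as [X1, Y1] paired with a value like [1.0, 2.0].
                record.update(zip(spec, value))
            else:
                record[spec] = value
        records.append(record)
    return records


flat = parse_generic_list(["X1", "Y1", "X2", "Y2"], [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
pairs = parse_generic_list(
    [["X1", "Y1"], ["X2", "Y2"]], [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
)
print(flat == pairs)  # True: both layouts yield the same two boxes
```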
- class dynamicdl.parsing.impliedlist.ImpliedList(form: GenericList | list | Any, indexer: DataType, start: int = 0)
Bases:
object
The ImpliedList class pairs each object in a list with its index, so that the index of each item is associated with that item. This is especially useful for datasets such as YOLOv8, where a list of class names is provided and each name's index is inferred to be its class id.
- expand(path: list[str], dataset: list) → dict[Static, Any]
Expand implied list into dict of statics.
- Parameters:
dataset (list) – The dataset data, which must be a list of values following some format.
- Returns:
The parsed expansion of Static values. Single values are first converted to lists of length 1, and for consistency lists are then converted to dicts with int keys.
- Return type:
dict[int, Any]
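For the YOLOv8 case, the pairing amounts to enumerating the list. The sketch below is hypothetical and rests on three assumptions: the indexer is a class-id data type, start offsets the inferred index, and the list values are class names. Plain strings stand in for DataTypes.

```python
from typing import Any


def expand_implied(
    items: list[Any],
    indexer: str = "CLASS_ID",
    value_type: str = "CLASS_NAME",
    start: int = 0,
) -> dict[int, dict[str, Any]]:
    """Hypothetical ImpliedList-style expansion: each item's position becomes its id."""
    return {
        position: {indexer: start + position, value_type: item}
        for position, item in enumerate(items)
    }


names = ["person", "bicycle", "car"]  # e.g. the class-name list of a YOLO-style data file
print(expand_implied(names)[2])           # {'CLASS_ID': 2, 'CLASS_NAME': 'car'}
print(expand_implied(names, start=1)[2])  # {'CLASS_ID': 3, 'CLASS_NAME': 'car'}
```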
- class dynamicdl.parsing.namespace.Namespace(*names: str | Static | Generic)
Bases:
object
The Namespace class is a collection of str, Static, and Generic objects, any of which is a valid value in some given text.
- Parameters:
names (str | Static | Generic) – Arguments to be provided which are valid str/Static/Generic objects that are all viable in the same key type.
- match(entry: str)
Return the tokens' string values, given an entry string that follows the namespace patterns.
- Parameters:
entry (str) – The entry string to be matched to the namespace patterns.
- Returns:
A boolean indicating success of the matching, and a list of the DataItems passed.
- Return type:
tuple[bool, list[DataItem]]
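A Namespace can be pictured as trying each alternative in turn. In this hypothetical sketch, a plain str must match exactly, as a Static without data would. A (pattern, names) tuple plays the role of a Generic, and the first successful alternative wins. That ordering rule is an assumption, not documented behavior.

```python
import re
from typing import Union

# A str is an exact name; a (pattern, names) tuple stands in for a Generic.
Alternative = Union[str, tuple[str, tuple[str, ...]]]


def namespace_match(entry: str, *names: Alternative) -> tuple[bool, list[tuple[str, str]]]:
    """Hypothetical Namespace-style matcher: the first matching alternative wins."""
    for alt in names:
        if isinstance(alt, str):
            if entry == alt:
                return True, []
            continue
        pattern, data = alt
        m = re.fullmatch(pattern.replace("{}", "(.+)"), entry)
        if m is not None:
            return True, list(zip(data, m.groups()))
    return False, []


splits = ("train", "val", ("fold_{}", ("IMAGE_SET_ID",)))
print(namespace_match("val", *splits))     # (True, [])
print(namespace_match("fold_3", *splits))  # (True, [('IMAGE_SET_ID', '3')])
print(namespace_match("test", *splits))    # (False, [])
```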
- class dynamicdl.parsing.pairing.Pairing(form: Any, *paired: DataType)
Bases:
object
Pairing is a wrapper class used to specify when two or more non-unique data types should be associated together. It is most commonly used to pair an ID with a name.
- Parameters:
form (Any) – Any object that follows the DynamicData form as required. Since Pairing is only a wrapper class, form is parsed exactly as it would be without it.
paired (DataType) – Items which should be associated together.
- find_pairings(path: str | list[str], dataset: Any, pbar: tqdm | None = None, curr_path: list[str] | None = None, in_file: bool = True, depth: int = 0) → None
Similar to the expand methods of the other parsing classes. Finds the paired values and stores the data internally.
- Parameters:
dataset (Any) – The dataset data, which should follow the syntax of DynamicData data.
in_file (bool) – Distinguisher to check usage of either expand_generics or expand_file_generics.
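To illustrate what pairing provides, suppose one file lists class ids with their names while the annotations carry only one of the two fields. The hypothetical helpers below first collect (id, name) pairs from records that contain both fields. They then fill in whichever field is missing elsewhere. This sketch handles exactly two paired types and uses plain dicts in place of DynamicDL's internal structures.

```python
from typing import Any

Record = dict[str, Any]


def build_pairings(records: list[Record], paired: tuple[str, str]) -> dict[Any, Any]:
    """Collect first -> second mappings from records that carry both paired fields."""
    first, second = paired
    return {r[first]: r[second] for r in records if first in r and second in r}


def fill_pairs(records: list[Record], table: dict[Any, Any], paired: tuple[str, str]) -> list[Record]:
    """Complete records that carry only one of the paired fields."""
    first, second = paired
    inverse = {v: k for k, v in table.items()}
    filled = []
    for record in records:
        record = dict(record)  # copy so the input is left untouched
        if first in record and second not in record:
            record[second] = table[record[first]]
        elif second in record and first not in record:
            record[first] = inverse[record[second]]
        filled.append(record)
    return filled


classes = [{"CLASS_ID": 0, "CLASS_NAME": "cat"}, {"CLASS_ID": 1, "CLASS_NAME": "dog"}]
annotations = [{"CLASS_ID": 1, "X1": 4.0}, {"CLASS_NAME": "cat", "X1": 9.0}]
table = build_pairings(classes, ("CLASS_ID", "CLASS_NAME"))
print(fill_pairs(annotations, table, ("CLASS_ID", "CLASS_NAME")))
```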
- class dynamicdl.parsing.segmentationobject.SegmentationObject(form: GenericList | list)
Bases:
object
Object to represent a collection of polygonal coordinates for segmentation. Functionally serves the purpose of being a wrapper class for GenericList and should be instantiated when the only contents inside are DataTypes.X and DataTypes.Y items as well as non-data items. This class therefore provides a way to bundle together POLYGON data types with variable length points for handling thereafter.
- Parameters:
form (GenericList | list) – Either a GenericList object or a list from which a GenericList is created.
- expand(path: list[str], dataset: list[Any]) → tuple[dict[Static, Any], list]
Evaluate object by expanding and merging, and extracting the corresponding X, Y values which define the SegmentationObject.
- Parameters:
dataset (list[Any]) – The dataset data, which should follow the syntax of DynamicData data.
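The bundling step can be sketched in three parts: walk the data with the repeating form, keep only X and Y entries, and zip them into one polygon of variable length. `to_polygon` is hypothetical, and so is the "IGNORED" placeholder standing in for a non-data item.

```python
from typing import Any


def to_polygon(form: list[str], values: list[Any]) -> list[tuple[Any, Any]]:
    """Hypothetical SegmentationObject-style bundling of X/Y values into one polygon."""
    xs, ys = [], []
    for i, value in enumerate(values):
        kind = form[i % len(form)]  # the form repeats, as in GenericList
        if kind == "X":
            xs.append(value)
        elif kind == "Y":
            ys.append(value)
        # any other entry is a non-data item and is skipped
    if len(xs) != len(ys):
        raise ValueError("a polygon needs as many X values as Y values")
    return list(zip(xs, ys))


print(to_polygon(["X", "Y"], [10, 20, 30, 40, 50, 60]))       # [(10, 20), (30, 40), (50, 60)]
print(to_polygon(["X", "Y", "IGNORED"], [1, 2, 0, 3, 4, 0]))  # [(1, 2), (3, 4)]
```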
- class dynamicdl.parsing.static.Static(name: str, data: list[DataItem] | DataItem | DataType | None = None)
Bases:
object
Represents an object with a static name. Can contain data in the form of DataItem objects.
Example:
# instantiate a static with a specific data item
Static('specific_name', DataItem(DataTypes.SOME_TYPE, 'some_other_specific_data'))
# instantiate a static with the name inferred as a data type
Static('specific_name_as_data', DataTypes.SOME_TYPE)
- Parameters:
- match(entry: str) → tuple[bool, list[DataItem]]
Return status and DataItem objects (optional) if matched successfully. Used for internal processing functions. The return values are to be consistent with internal processing by Generics.
- Parameters:
entry (str) – The entry string to be matched to the static pattern.
- Returns:
A boolean indicating success of the matching, and a list of the DataItems passed.
- Return type:
tuple[bool, list[DataItem]]
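Here is a minimal sketch of the two constructor styles in the Static example. DataItems are modelled as hypothetical (type, value) tuples and DataTypes as plain strings. We read a bare data type as meaning that the static name itself becomes that type's value; that reading is an assumption. `StaticSketch` is not DynamicDL's class.

```python
from typing import Union

DataItem = tuple[str, str]  # hypothetical (data type, value) pair


class StaticSketch:
    """Hypothetical Static: matches one exact name and returns its attached data."""

    def __init__(self, name: str, data: Union[list[DataItem], DataItem, str, None] = None):
        self.name = name
        if data is None:
            self.items: list[DataItem] = []
        elif isinstance(data, str):
            # A bare data type: the static name is taken as that type's value.
            self.items = [(data, name)]
        elif isinstance(data, tuple):
            self.items = [data]  # a single DataItem
        else:
            self.items = list(data)

    def match(self, entry: str) -> tuple[bool, list[DataItem]]:
        if entry != self.name:
            return False, []
        return True, list(self.items)


print(StaticSketch("specific_name", ("SOME_TYPE", "some_other_specific_data")).match("specific_name"))
print(StaticSketch("train", "IMAGE_SET_NAME").match("train"))  # (True, [('IMAGE_SET_NAME', 'train')])
```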
Module contents
The dynamicdl.parsing module handles the objects used for DynamicDL format creation. These objects are to be used in the form when constructing the DynamicDL loader.
Classes:
Static
Generic
Folder
File
ImageFile
Alias
Namespace
GenericList
SegmentationObject
AmbiguousList
ImpliedList
Pairing