File Processing

class dynamicdl.processing.jsonfile.JSONFile(form: dict[str | DataType | Static | Generic | Alias, Any] | list[Any])[source]

Bases: DataFile

The JSONFile class represents an annotation object and has the most direct mapping from form to parsed data: the form follows the same dict/list structure as the JSON data itself.

Example:

{
    "images": [
        {
            "id": 0,
            "file_name": "sample.jpg"
        }
    ],
    "categories": [
        {
            "id": 0,
            "name": "my_class"
        }
    ],
    "annotations": [
        {
            "image_id": 0,
            "category_id": 0,
            "bbox": [1.0, 2.0, 3.0, 4.0]
        }
    ]
}

JSONFile({
    'images': [{
        'id': DT.IMAGE_ID,
        'file_name': Generic('{}.jpg', DT.IMAGE_NAME)
    }],
    'categories': Pairing([{
        'id': DT.BBOX_CLASS_ID,
        'name': DT.BBOX_CLASS_NAME
    }], DT.BBOX_CLASS_ID, DT.BBOX_CLASS_NAME),
    'annotations': [{
        'image_id': DT.IMAGE_ID,
        'category_id': DT.BBOX_CLASS_ID,
        'bbox': [DT.XMIN, DT.YMIN, DT.WIDTH, DT.HEIGHT]
    }]
})

Notice how the form passed to the JSONFile constructor mirrors the structure of the JSON data exactly, with placeholders such as DT.IMAGE_ID marking the positions where data items appear.
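To make the correspondence concrete, here is a minimal sketch of how a form that mirrors the data could be walked in parallel with parsed JSON. It uses only the standard library. The Field placeholder and the extract helper are hypothetical stand-ins, not part of DynamicDL, and the library's actual parsing logic may differ.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical placeholder marking where a data item lives in the form."""
    name: str


def extract(form, data, found=None):
    """Walk form and data in parallel, collecting the value at each Field."""
    if found is None:
        found = []
    if isinstance(form, Field):
        found.append((form.name, data))
    elif isinstance(form, dict):
        # Only keys named in the form are visited; extra data keys are ignored.
        for key, sub_form in form.items():
            if key in data:
                extract(sub_form, data[key], found)
    elif isinstance(form, list):
        if len(form) == 1:
            # A one-element list form applies to every element of the data list.
            for item in data:
                extract(form[0], item, found)
        else:
            # A longer list form is matched position by position.
            for sub_form, item in zip(form, data):
                extract(sub_form, item, found)
    return found


raw = '{"annotations": [{"image_id": 0, "category_id": 0, "bbox": [1.0, 2.0, 3.0, 4.0]}]}'
form = {
    "annotations": [{
        "image_id": Field("IMAGE_ID"),
        "bbox": [Field("XMIN"), Field("YMIN"), Field("WIDTH"), Field("HEIGHT")],
    }]
}
items = extract(form, json.loads(raw))
print(items)
```

The category_id key is present in the data but absent from this form, so it is simply skipped in the sketch.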

Parameters:

  • form (dict[str | DataType | Static | Generic | Alias, Any] | list[Any]) – The form which matches the data to be read from JSONFile.

parse(path: str, curr_path: list[str]) → dict[source]

Parses a file.

  • path (str): the path to the file.

class dynamicdl.processing.csvfile.CSVFile(form: Iterable[DataType | Static | Generic | Alias], header: bool = True)[source]

Bases: DataFile

Utility functions for parsing csv files.

Parameters:
  • form (Iterable[Union[DataType, Static, Generic, Alias]]) – A list of items, one per column, each describing how that column's data is parsed.

  • header (bool) – Whether the file includes a header row. If True, the first row is skipped. Default: True
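The column-wise form and the header flag can be illustrated with a small standard-library sketch. The parse_csv helper and the plain string column names are assumptions made for illustration; they are not DynamicDL API and do not reflect how CSVFile is implemented.

```python
import csv
import io


def parse_csv(text, form, header=True):
    """Map each column to the name at the same position in form."""
    rows = csv.reader(io.StringIO(text))
    if header:
        next(rows, None)  # skip the header row if one is present
    return [dict(zip(form, row)) for row in rows]


sample = "file,label\ncat_1.jpg,cat\ndog_7.jpg,dog\n"
records = parse_csv(sample, ["IMAGE_NAME", "CLASS_NAME"])
print(records)
```

Passing header=False to this sketch would treat the first row as data, which is why the flag has to match the file.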

parse(path: str, curr_path: list[str]) → dict[source]

Parses a file.

  • path (str): the path to the file.

class dynamicdl.processing.txtfile.TXTFile(form: dict[str | DataType | Static | Generic | Alias, Any] | list[Any], ignore_type: list[Generic | str] | Generic | str | None = None)[source]

Bases: DataFile

The TXTFile class is an annotation object for parsing .txt files, and more generally any annotation file stored as plain text (e.g. UTF-8 encoded). Its form resembles a nested dict structure. Use it with care: lines are told apart only by the form they match, so lines with different roles must take distinct forms to avoid ambiguity.

An example of a txt file that we want to parse:

imageset1
class1
image1
1.0 2.0 3.0 4.0
5.0 6.0 7.0 8.0
image2
2.0 3.0 5.6 2.43
image3
5.4 12.4 543.2 12.3
2.0 3.0 5.6 2.44
2.0 3.0 5.6 2.46
2.0 3.0 5.6 2.48
class2
image4
32.54 21.4 32.43 12.23
image5
imageset2
class1
image6
32.54 21.4 32.43 12.256

classes
class1 abc
class2 def
class3 ghi

Observe that each line can be distinctly classified in a hierarchical sense. That is, each individual line can be attributed to a single purpose.

TXTFile({
    Generic('imageset{}', DT.IMAGE_SET_ID): {
        Generic('class{}', DT.CLASS_ID): {
            Generic('image{}', DT.IMAGE_ID): [
                Generic('{} {} {} {}', DT.X1, DT.X2, DT.Y1, DT.Y2)
            ]
        }
    },
    'classes': Pairing([
        Generic('class{} {}', DT.CLASS_ID, DT.CLASS_NAME)
    ], DT.CLASS_ID, DT.CLASS_NAME)
})

Notice how the form inherits the natural structure of the file. Each Generic is distinct from the others, so no line is ambiguous. Written out hierarchically, the file looks as follows:

imageset1
    class1
        image1
            1.0 2.0 3.0 4.0
            5.0 6.0 7.0 8.0
        image2
            2.0 3.0 5.6 2.43
        image3
            5.4 12.4 543.2 12.3
            2.0 3.0 5.6 2.44
            2.0 3.0 5.6 2.46
            2.0 3.0 5.6 2.48
    class2
        image4
            32.54 21.4 32.43 12.23
        image5
imageset2
    class1
        image6
            32.54 21.4 32.43 12.256
classes
    class1 abc
    class2 def
    class3 ghi

Notice that this is exactly the structure reflected in the form used to parse the file. An ignore_type can also be specified, so that any line matching the given Generic or string is skipped.
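The idea of attributing every line to exactly one pattern, and skipping ignored lines, can be sketched with regular expressions. This is a simplified illustration under stated assumptions: to_regex and classify are hypothetical helpers, templates are tried in a flat order where the first match wins, and the real TXTFile presumably resolves lines through its nested form instead.

```python
import re


def to_regex(template):
    """Turn a '{}' template into a regex with one capture group per '{}'."""
    literal_parts = [re.escape(part) for part in template.split("{}")]
    return re.compile(r"(\S+)".join(literal_parts))


def classify(lines, templates, ignore=()):
    """Return (depth, captured values) per line; the first matching template wins."""
    compiled = [to_regex(t) for t in templates]
    skipped = [to_regex(t) for t in ignore]
    result = []
    for line in lines:
        if not line.strip() or any(p.fullmatch(line) for p in skipped):
            continue  # blank lines and ignored lines carry no data
        for depth, pattern in enumerate(compiled):
            match = pattern.fullmatch(line)
            if match:
                result.append((depth, match.groups()))
                break
        else:
            raise ValueError(f"unrecognized line: {line!r}")
    return result


text = """imageset1
---
class1
image1
1.0 2.0 3.0 4.0
image2
"""
levels = classify(
    text.splitlines(),
    ["imageset{}", "class{}", "image{}", "{} {} {} {}"],
    ignore=["---"],
)
print(levels)
```

Order matters in this sketch: "imageset{}" is listed before "image{}" because a line such as imageset1 would otherwise also match the shorter template, which is the kind of ambiguity the distinct-forms requirement guards against.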

Parameters:
  • form (dict[str | DataType | Static | Generic | Alias, Any] | list[Any]) – The form which matches the data to be read from TXTFile.

  • ignore_type (Optional[Union[list[Union[Generic, str]], Generic, str]]) – A single Generic/str, or a list of them. Any line that matches is skipped during parsing.

parse(path: str, curr_path: list[str]) → dict[source]

Parses a file.

  • path (str): the path to the file.

class dynamicdl.processing.xmlfile.XMLFile(form: dict[Static | Generic, Any])[source]

Bases: DataFile

The XMLFile class represents an annotation object and is similar to the JSONFile class in hierarchical structure and parsing. The key difference is that AmbiguousList is needed in place of GenericList: when several tags share a name they are parsed as a list, whereas a tag that occurs once is parsed as a single item. For this reason XMLFile interprets list objects in the form as AmbiguousList automatically; a GenericList must be instantiated explicitly if one is desired.
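The single-item versus list ambiguity can be reproduced with a short standard-library sketch. It assumes a dict-style conversion of XML like the one described above; to_dict and as_list are hypothetical helpers for illustration and are not DynamicDL API.

```python
import xml.etree.ElementTree as ET


def to_dict(element):
    """Convert an element to nested dicts; repeated child tags become lists."""
    children = list(element)
    if not children:
        return element.text
    result = {}
    for child in children:
        value = to_dict(child)
        if child.tag in result:
            # Second and later occurrences turn the entry into a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


def as_list(value):
    """Treat a lone item and a list of items uniformly."""
    return value if isinstance(value, list) else [value]


one = to_dict(ET.fromstring(
    "<annotation><object><name>cat</name></object></annotation>"
))
two = to_dict(ET.fromstring(
    "<annotation>"
    "<object><name>cat</name></object>"
    "<object><name>dog</name></object>"
    "</annotation>"
))
names_one = [obj["name"] for obj in as_list(one["object"])]
names_two = [obj["name"] for obj in as_list(two["object"])]
print(names_one, names_two)
```

Here one["object"] is a dict while two["object"] is a list, so code that handles both shapes needs a normalizing step such as as_list.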

The form follows the hierarchy of the file, just as in JSONFile. Here is a snippet from the Oxford-IIIT Pet Dataset:

<annotation>
    <folder>OXIIIT</folder>
    <filename>Abyssinian_1.jpg</filename>
    <source>
        <database>OXFORD-IIIT Pet Dataset</database>
        <annotation>OXIIIT</annotation>
        <image>flickr</image>
    </source>
    <size>
        <width>600</width>
        <height>400</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>cat</name>
        <pose>Frontal</pose>
        <truncated>0</truncated>
        <occluded>0</occluded>
        <bndbox>
            <xmin>333</xmin>
            <ymin>72</ymin>
            <xmax>425</xmax>
            <ymax>158</ymax>
        </bndbox>
        <difficult>0</difficult>
    </object>
</annotation>

The form names only the fields of interest and omits the extraneous information:

XMLFile({
    "annotation": {
        "filename": Generic("{}.jpg", DT.IMAGE_NAME),
        "object": AmbiguousList({
            "name": DT.BBOX_CLASS_NAME,
            "bndbox": {
                "xmin": DT.XMIN,
                "ymin": DT.YMIN,
                "xmax": DT.XMAX,
                "ymax": DT.YMAX
            }
        })
    }
})

Parameters:

  • form (dict[str | DataType | Static | Generic | Alias, Any] | list[Any]) – The form which matches the data to be read from XMLFile.

parse(path: str, curr_path: list[str]) → dict[source]

Parses a file.

  • path (str): the path to the file.

class dynamicdl.processing.yamlfile.YAMLFile(form: dict[Static | Generic, Any])[source]

Bases: DataFile

The YAMLFile class represents an annotation object and is similar to the JSONFile class in hierarchical structure and parsing.

The form follows the hierarchy of the file, just as in JSONFile. Here is a snippet from the Tomato Leaf Diseases Dataset:

train: ../train/images
val: ../valid/images
test: ../test/images

nc: 7
names: ['Bacterial Spot', 'Early_Blight', 'Healthy', 'Late_blight', 'Leaf Mold', 'Target_Spot', 'black spot']

roboflow:
  workspace: sylhet-agricultural-university
  project: tomato-leaf-diseases-detect
  version: 3
  license: Public Domain

Of particular interest is the names list: class IDs are not written out but implied by each name's position, so an ImpliedList is needed to set up the pairing between class ID and class name:

YAMLFile({
    'names': Pairing(
        ImpliedList([DT.BBOX_CLASS_NAME], indexer=DT.BBOX_CLASS_ID),
        DT.BBOX_CLASS_NAME, DT.BBOX_CLASS_ID
    )
})

Parameters:

  • form (dict[str | DataType | Static | Generic | Alias, Any] | list[Any]) – The form which matches the data to be read from YAMLFile.

parse(path: str, curr_path: list[str]) → dict[source]

Parses a file.

  • path (str): the path to the file.
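The pairing implied by the names list in the YAML snippet above can be sketched in plain Python. This is an illustration only: it assumes zero-based positions serve as class IDs and does not use or reproduce ImpliedList or Pairing themselves.

```python
# The names list from the snippet, as it would look once the YAML is loaded.
names = ['Bacterial Spot', 'Early_Blight', 'Healthy', 'Late_blight',
         'Leaf Mold', 'Target_Spot', 'black spot']

# Assumption: the position in the list serves as the class ID, starting at 0.
id_to_name = dict(enumerate(names))
name_to_id = {name: idx for idx, name in id_to_name.items()}

print(id_to_name[2], name_to_id['black spot'])
```

The list has seven entries, which agrees with the nc: 7 field in the same file.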

Image dummy classes.

class dynamicdl.processing.images.ImageEntry[source]

Bases: object

Arbitrary image file to be used as a value in the key-value pairing of DynamicDL file structure formats. It is a dummy object which provides absolute file and image data during processing, and serves as a marker object to recognize the presence of an image.

class dynamicdl.processing.images.SegmentationImage[source]

Bases: object

Arbitrary segmentation image file to be used as a value in the key-value pairing of DynamicDL file structure formats. It is a dummy object which provides absolute file and segmentation image map data during processing, and serves as a marker object to recognize the presence of a segmentation image.

Module contents

The dynamicdl.processing module handles file processing, including annotation files and image files. These are to be used in describing DynamicDL dataset formats as values following a File key indicator.

Classes:

  • CSVFile

  • JSONFile

  • TXTFile

  • XMLFile

  • YAMLFile

  • ImageEntry

  • SegmentationImage