udt-to-cvat v0.1.1
UDT to YOLO
Converts files in the Universal Data Tool format to the YOLOv1.1 format.
Note: We do a variation on the YOLO format. Each sample gets its own subdirectory, which is a valid YOLO dataset. We may change this in the future.
Usage
udt-to-yolo ./dataset.udt.json -o yolo-dir
YOLOv1.1 Format
There doesn't seem to be a formal spec for the YOLOv1.1 format, but the directory structure is simple enough that we can describe it based on the output of programs like CVAT.
The YOLOv1.1 format is a directory organized like this:
.
├── obj.data                  ini-like file with dataset stats and paths
├── obj.names                 labels, one per line
├── train.txt                 lists all the images
└── obj_train_data            directory containing images and bounding box txt files
    ├── frame_000000.PNG      image
    ├── frame_000000.txt      bounding boxes for the image with the same name
    ├── frame_000001.PNG      etc.
    ├── frame_000001.txt
    ├── frame_000002.PNG
    ├── frame_000002.txt
    ├── frame_000003.PNG
    └── frame_000003.txt
obj.data
Contains key-value pairs.
Key | Description |
---|---|
classes | Number of labels |
train | Path to train.txt (relative to "data", the main directory) |
names | Path to the label names file (relative to "data", the main directory) |
backup | Directory where Darknet saves trained weights during training |
classes = 3
train = data/train.txt
names = data/obj.names
backup = backup/
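Since obj.data is just `key = value` lines, it can be read with a few lines of code. The following is a minimal Python sketch, not part of this package; the function name `parse_obj_data` is hypothetical, and it assumes every meaningful line contains a single `=` separator as in the example above.

```python
def parse_obj_data(text: str) -> dict:
    """Parse the `key = value` lines of an obj.data file into a dict of strings."""
    entries = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split("=", 1)
        entries[key.strip()] = value.strip()
    return entries


sample = """classes = 3
train = data/train.txt
names = data/obj.names
backup = backup/
"""

config = parse_obj_data(sample)
num_classes = int(config["classes"])  # values are strings until converted
```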
obj.names
Lists every label that appears in the dataset, one per line. A label's position in this file is its index.
label1
label2
label3
train.txt
Paths to every image frame relative to "data" (main directory).
data/obj_train_data/frame_000000.PNG
data/obj_train_data/frame_000001.PNG
data/obj_train_data/frame_000002.PNG
data/obj_train_data/frame_000003.PNG
data/obj_train_data/frame_000004.PNG
data/obj_train_data/frame_000005.PNG
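To make the relationship between the three metadata files concrete, here is a small Python sketch that writes obj.data, obj.names and train.txt for a list of labels and image names. It is an illustration only, not how udt-to-yolo is implemented; `write_yolo_metadata` is a hypothetical helper, and the `data/` prefix and `backup/` value are copied from the examples above.

```python
import tempfile
from pathlib import Path


def write_yolo_metadata(root, labels, image_names):
    """Write obj.data, obj.names and train.txt under `root`."""
    root = Path(root)
    (root / "obj_train_data").mkdir(parents=True, exist_ok=True)

    # obj.names: one label per line, in index order.
    (root / "obj.names").write_text("".join(f"{label}\n" for label in labels))

    # train.txt: one image path per line, relative to "data".
    (root / "train.txt").write_text(
        "".join(f"data/obj_train_data/{name}\n" for name in image_names)
    )

    # obj.data: key-value pairs pointing at the two files above.
    (root / "obj.data").write_text(
        f"classes = {len(labels)}\n"
        "train = data/train.txt\n"
        "names = data/obj.names\n"
        "backup = backup/\n"
    )


# Demo in a temporary directory so nothing is left on disk.
with tempfile.TemporaryDirectory() as tmp:
    write_yolo_metadata(
        tmp,
        ["label1", "label2", "label3"],
        ["frame_000000.PNG", "frame_000001.PNG"],
    )
    obj_data_text = (Path(tmp) / "obj.data").read_text()
    names_text = (Path(tmp) / "obj.names").read_text()
    train_text = (Path(tmp) / "train.txt").read_text()
```

The images and their matching `.txt` files would then be copied into `obj_train_data` under the same names listed in train.txt.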
obj_train_data/frame_XXXXXX.PNG
Each frame of the video, or each image of the dataset.
obj_train_data/frame_XXXXXX.txt
Lists all the bounding boxes of the image, one per line. Each line has the form:
<label index (starting at 1)> <leftmost X position> <topmost Y position> <width> <height>
X, Y, width, and height are all expressed as fractions of the image dimensions: X and width are divided by the image width, Y and height by the image height. For example, suppose an image is 1000 pixels wide and 800 pixels high, and it has a bounding box that starts 100 pixels from the left and 200 pixels from the top, with a width of 250 pixels and a height of 300 pixels. If the box uses the second label, you would have the following YOLO line:
2 0.1 0.25 0.25 0.375
Here's the step in-between:
2 100/1000 200/800 250/1000 300/800
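The same arithmetic can be sketched in Python. This is an illustrative helper, not the package's own code; `to_yolo_line` and its parameter names are hypothetical. It follows the convention described in this section (one-based label index, top-left corner). Darknet-style tooling commonly expects a zero-based index and the box center instead, so check the convention against your target tool.

```python
def to_yolo_line(label_index, left, top, width, height, image_width, image_height):
    """Format one pixel-space box as a line of fractions of the image size."""
    values = (
        left / image_width,     # X as a fraction of the image width
        top / image_height,     # Y as a fraction of the image height
        width / image_width,    # box width as a fraction of the image width
        height / image_height,  # box height as a fraction of the image height
    )
    # Six decimal places, matching the style of typical annotation files.
    return f"{label_index} " + " ".join(f"{v:.6f}" for v in values)


line = to_yolo_line(2, 100, 200, 250, 300, image_width=1000, image_height=800)
print(line)  # 2 0.100000 0.250000 0.250000 0.375000
```

Apart from the trailing zeros, the printed values match the example line above.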
1 0.813552 0.562875 0.033875 0.035104
2 0.813552 0.562875 0.033875 0.035104
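Going the other way, an annotation line can be turned back into a pixel-space box once the image size and the contents of obj.names are known. The sketch below is an assumption-laden illustration: `parse_yolo_line` is a hypothetical name, and it assumes the one-based label index and top-left coordinates described in this section.

```python
def parse_yolo_line(line, image_width, image_height, labels):
    """Turn one annotation line back into a label name and a pixel-space box."""
    index_text, x, y, w, h = line.split()
    label_index = int(index_text)
    return {
        # One-based index as described in this section, hence the "- 1".
        "label": labels[label_index - 1],
        "left": float(x) * image_width,
        "top": float(y) * image_height,
        "width": float(w) * image_width,
        "height": float(h) * image_height,
    }


labels = ["label1", "label2", "label3"]  # contents of obj.names, in order
box = parse_yolo_line("2 0.1 0.25 0.25 0.375", 1000, 800, labels)
```

For the worked example above, this recovers the second label and a box of roughly 250 by 300 pixels starting 100 pixels from the left and 200 pixels from the top, up to floating-point rounding.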