Udt-to-cvat NPM

UDT to YOLO

Converts files in the Universal Data Tool format to the YOLOv1.1 format.

Note: We do a variation on the YOLO format. Each sample gets it's own subdirectory which is a valid YOLO dataset. We may change this in the future.

Usage

udt-to-yolo ./dataset.udt.json -o yolo-dir

YOLOv1.1 Format

There doesn't seem to be a formal spec for the YOLOv1.1 format, but the directory structure is simple enough that we can describe it based on the output of programs like CVAT.

The YOLOv1.1 format is a directory organized like this:

.
├── obj.data                 ini-like file with dataset stats and paths
├── obj.names                labels, each label has new line
├── train.txt                lists all the images
└── obj_train_data           directory containing images and bounding box txt files
    ├── frame_000000.PNG     image
    ├── frame_000000.txt     bounding boxes for image with same name
    ├── frame_000001.PNG     etc.
    ├── frame_000001.txt
    ├── frame_000002.PNG
    ├── frame_000002.txt
    ├── frame_000003.PNG
    └── frame_000003.txt

obj.data

Contains key-value pairs.

Key	Description
classes	Number of labels
train	path to train.txt (relative to "data", the main directory)
names	path to label names (relative to "data", the main directory)
backup	???

classes = 3
train = data/train.txt
names = data/obj.names
backup = backup/

obj.names

Each label that appears in the dataset.

label1
label2
label3

train.txt

Paths to every image frame relative to "data" (main directory).

data/obj_train_data/frame_000000.PNG
data/obj_train_data/frame_000001.PNG
data/obj_train_data/frame_000002.PNG
data/obj_train_data/frame_000003.PNG
data/obj_train_data/frame_000004.PNG
data/obj_train_data/frame_000005.PNG

obj_train_data/frame_XXXXXX.PNG

Each frame of the video, or each image of the dataset.

obj_train_data/frame_XXXXXX.txt

Lists all the bounding boxes of the image. Each line is a bounding box. The line represents the <label index (starting at 1)> <leftmost X position> <topmost Y position> <width> <height>.

The unit of the X, Y, Width and Height are all fractions of the image. So for example, if you have an image that is (1000 width, 800 height), and has a bounding box that starts at position (100px from left, 200px from top) with a width of 250 pixels and a height of 300 pixels. Let's say this box uses the second label. You would have the following YOLO line:

2 0.1 0.25 0.25 0.375

Here's the step in-between: 2 100/1000 200/800 250/1000 300/800

1 0.813552 0.562875 0.033875 0.035104
2 0.813552 0.562875 0.033875 0.035104

@ffprobe-installer/ffprobe cli-progress color-rgba download ffmpeg-extract-frames fluent-ffmpeg mkdirp node-ffprobe tmp yargs

@everything-registry/sub-chunk-3003 @infinitebrahmanuniverse/nolb-ud

0.1.1

5 years ago

0.1.0

5 years ago