data

Main data module for training in HAT, which contains datasets, transforms, samplers.

data

collates

Member – Summary
collates.CocktailCollate – CocktailCollate.
collates.collate_mmfusion_3d

dataloaders

Member – Summary
passthrough_dataloader.PassThroughDataLoader – Directly pass through input example.

datasets

Member – Summary
argoverse2_dataset.Argoverse2Base
argoverse2_dataset.Argoverse2PackedDataset
argoverse_dataset.Argoverse1Sampler – Sampler for argoverse dataset.
argoverse_dataset.Argoverse1Dataset – Argoverse dataset v1.
argoverse_dataset.Argoverse1Packer – Packer for converting argoverse dataset from csv format into lmdb format.
batch_transform_dataset.BatchTransformDataset – Dataset which uses different transforms in different epochs.
carfusion_keypoints_dataset.CarfusionPackData – Carfusion Dataset of packed lmdb format.
carfusion_keypoints_dataset.CarfusionCroppedData – Cropped Carfusion Dataset.
cityscapes.Cityscapes – Cityscapes provides the method of reading cityscapes data from target pack type.
culane_dataset.CuLaneDataset – CuLaneDataset provides the method of reading CuLaneDataset data from target pack type.
culane_dataset.CuLanePacker – CuLanePacker is used for converting Culane dataset to target DataType format.
culane_dataset.CuLaneFromImage – CuLane dataset which gets img data and gt lines from the data_path.
dataset_wrappers.RepeatDataset – A wrapper of repeated dataset.
dataset_wrappers.ComposeDataset – Dataset wrapper for multiple datasets with precise batch size.
dataset_wrappers.DistributedComposeRandomDataset – Dataset wrapper for multiple datasets with fair sample weights across multiple workers in a distributed environment.
dataset_wrappers.ComposeIterableDataset – Dataset wrapper built on ComposeDataset, shuffle, supporting multi workers.
dataset_wrappers.ResampleDataset – A wrapper of resample dataset.
dataset_wrappers.ConcatDataset – A wrapper of concatenated dataset with group flag.
dataset_wrappers.CBGSDataset – A wrapper of class sampled dataset.
dataset_wrappers.ChunkShuffleDataset – Dataset wrapper for chunk shuffle.
flyingchairs_dataset.FlyingChairs – FlyingChairs provides the method of reading flyingChairs data from target pack type.
flyingchairs_dataset.FlyingChairsFromImage – Dataset which gets img data from the data_path.
flyingchairs_dataset.FlyingChairsPacker – FlyingChairsPacker is used for converting FlyingChairs dataset to target DataType format.
imagenet.ImageNet – ImageNet provides the method of reading imagenet data from target pack type.
imagenet.ImageNetFromImage – ImageNet from image by torchvision.
mscoco.Coco – Coco provides the method of reading coco data from target pack type.
mscoco.CocoFromImage – Coco from image by torchvision.
kitti2d.Kitti2D – Kitti2D provides the method of reading kitti2d data from target pack type.
kitti3d.Kitti3DDetection – Kitti 3D Detection Dataset.
kitti3d.Kitti3D – Kitti3D provides the method of reading kitti3d data from target pack type.
mot17_dataset.Mot17Dataset – Mot17Dataset provides the method of reading Mot17 data from target pack type.
mot17_dataset.Mot17Packer – Mot17Packer is used for converting MOT17 dataset to target DataType format.
mot17_dataset.Mot17FromImage – Mot17FromImage which gets img data and gt from the data_path.
nuscenes_dataset.NuscenesMonoDataset
nuscenes_dataset.NuscenesBevDataset – Bev Dataset object for packed NuScenes.
nuscenes_dataset.NuscenesBevSequenceDataset
nuscenes_dataset.NuscenesLidarDataset – Lidar Dataset object for packed NuScenes.
nuscenes_dataset.NuscenesLidarWithSegDataset – Lidar Dataset object for packed NuScenes.
nuscenes_dataset.NuscenesFromImage – Read NuScenes from image.
nuscenes_dataset.NuscenesFromImageSequence
nuscenes_dataset.NuscenesMonoFromImage
nuscenes_map_dataset.NuscenesMapDataset – Dataset object for packed NuScenes.
occ3d_nuscenes_dataset.Occ3dNuscenesDataset – Occupancy Dataset object for packed NuScenes.
rand_dataset.RandDataset
rand_dataset.SimpleDataset
sceneflow_dataset.SceneFlow – SceneFlow provides the method of reading SceneFlow data from target pack type.
sceneflow_dataset.SceneFlowPacker – SceneFlowPacker is used for converting sceneflow dataset to target DataType format.
sceneflow_dataset.SceneFlowFromImage – SceneFlowFromImage which gets img data and gt from the data_path.
voc.PascalVOC – PascalVOC provides the method of reading voc data from target pack type.
voc.VOCFromImage – VOC from image by torchvision.

samplers

Member – Summary
dist_cycle_sampler_multi_dataset.DistributedCycleMultiDatasetSampler – In one epoch period, do cyclic sampling on the dataset according to iter_time.
dist_group_sampler.DistributedGroupSampler – Sampler that restricts data loading to a subset of the dataset.
dist_sampler.DistSamplerHook – The hook api for torch.utils.data.DistributedSampler.
dist_set_epoch_dataset_sampler.DistSetEpochDatasetSampler – Distributed sampler that supports set epoch in dataset.
dist_stream_sampler.DistStreamBatchSampler
selected_sampler.SelectedSampler – Distributed sampler that supports user-defined indices.

transforms

Member – Summary
common.ListToDict – Convert list args to dict.
common.DeleteKeys – Delete keys in input dict.
common.RenameKeys – Rename keys in input dict.
common.RepeatKeys – Repeat keys in input dict.
common.Undistortion – Convert a PIL Image or numpy.ndarray to
common.PILToTensor – Convert PIL Image to Tensor.
common.PILToNumpy – Convert PIL Image to Numpy.
common.TensorToNumpy – Convert tensor to numpy.
common.ToCUDA – Move Tensor to cuda device.
common.AddKeys – Add new key-value in input dict.
common.CopyKeys – Copy new key in input dict.
common.TaskFilterTransform – Apply transform on the assigned task.
common.RandomSelectOne – Select one of transforms to apply.
common.MultiTaskAnnoWrapper – Wrapper for multi-task anno generating.
common.ConvertDataType – Convert data type.
common.FixLengthPad
common.BgrToYuv444 – BgrToYuv444 is used for color format conversion.
common.BgrToYuv444V2 – BgrToYuv444V2 is used for color format conversion.
classification.ConvertLayout – ConvertLayout is used for layout conversion.
classification.OneHot – OneHot is used to convert layer to one-hot format.
classification.LabelSmooth – LabelSmooth is used for label smoothing.
classification.TimmTransforms – Transforms of timm.
classification.TimmMixup – Mixup of timm.
detection.Resize – Resize image & bbox & mask & seg.
detection.Resize3D – Resize 3D labels.
detection.RandomFlip – Flip image & bbox & mask & seg & flow.
detection.Pad
detection.Normalize – Normalize image.
detection.RandomCrop
detection.FixedCrop – Crop image with fixed position and size.
detection.PresetCrop – Crop image with preset roi param.
detection.RandomSizeCrop
detection.ToTensor – Convert objects of various python types to torch.Tensor and convert the img to yuv444 format if to_yuv is True.
detection.Batchify
detection.ColorJitter – Randomly change the brightness, contrast, saturation and hue of an image.
detection.RandomExpand – Randomly expand the image & bboxes.
detection.MinIoURandomCrop – Randomly crop the image & bboxes; the cropped patches have a minimum IoU requirement with the original image & bboxes, and the IoU threshold is randomly selected from min_ious.
detection.AugmentHSV – Randomly add color disturbance.
detection.IterableDetRoITransform – Iterable transformer based on rois for object detection.
detection.PadDetData
detection.DetAffineAugTransformer – Affine augmentation for object detection.
detection.DetInputPadding
detection.ToFasterRCNNData – Prepare faster-rcnn input data.
detection.ToLdmkRCNNData – Transform dataset to RCNN input need.
detection.ToMultiTaskFasterRCNNData – Convert multi-classes detection data to multi-task data.
detection.PadTensorListToBatch – List of image tensor to be stacked vertically.
detection.PlainCopyPaste – Copy and paste instances plainly.
detection.HueSaturationValue – Randomly change hue, saturation and value of the input image.
detection.RGBShift – Randomly shift values for each channel of the input image.
detection.MeanBlur – Apply mean blur to the input image using a fixed-size kernel.
detection.MedianBlur – Apply median blur to the input image using a fixed-size kernel.
detection.RandomBrightnessContrast – Randomly change brightness and contrast of the input image.
detection.ShiftScaleRotate – Randomly apply affine transforms: translate, scale and rotate the input.
detection.RandomResizedCrop – Torchvision's variant of cropping a random part of the input and rescaling it to some size.
detection.AlbuImageOnlyTransform – AlbuImageOnlyTransform used on img only.
detection.BoxJitter – Jitter box to simulate the box predicted by the model.
detection.DetYOLOv5MixUp – MixUp augmentation.
detection.DetMosaic – Mosaic augmentation for detection task.
detection.Mosaic – Mosaic augmentation for detection task.
detection.ToPositionFasterRCNNData – Transform person position dataset to RCNN input need.
detection.IterableDetRoIListTransform – Iterable transformer based on roi list for object detection.
faceid.RandomGray – Transform RGB or BGR format into Gray format.
faceid.JPEGCompress – Do JPEG compression to downgrade image quality.
faceid.SpatialVariantBrightness – Spatial variant brightness, Enhanced Edition.
faceid.Contrast – Randomly jitters image contrast with a factor.
faceid.GaussianBlur – Randomly add Gaussian blur on an image.
faceid.MotionBlur – Randomly add motion blur on an image.
faceid.RandomDownSample – First downsample and upsample to original size.
flashocc_transforms.ImageAugmentation – Augment PIL Images according to the given data_config.
flashocc_transforms.BevFeatureAug – Augment bev feature.
grid_mask.GridMask – Generate GridMask for grid masking augmentation.
keypoints.RandomPadLdmkData – RandomPadLdmkData is a class for randomly padding landmark data.
keypoints.AddGaussianNoise – Generate gaussian noise on img.
keypoints.GenerateHeatmapTarget – GenerateHeatmapTarget is a class for generating heatmap targets.
lidar.BBoxSelector – Filter out GT BBoxes.
lidar.Voxelization – Perform voxelization for points in multiple frames.
lidar.DetectionTargetGenerator – Create detection training targets.
lidar.DetectionAnnoToBEVFormat
lidar.ParsePointCloud – Parse point cloud from bytes to numpy array.
lidar.Point2VCS – Transform pointclouds from lidar CS to VCS.
multi_views.MultiViewsSpiltImgTransformWrapper – Wrapper split img transform for image inputs.
multi_views.MultiViewsImgTransformWrapper – Wrapper img transform for image inputs.
multi_views.MultiViewsImgResize – Resize PIL Images to the given size and modify intrinsics.
multi_views.MultiViewsImgCrop – Crop PIL Images to the given size and modify intrinsics.
multi_views.MultiViewsImgFlip – Flip PIL Images and modify intrinsics.
multi_views.MultiViewsImgRotate – Rotate PIL Images.
multi_views.MultiViewsGridMask – For grid masking augmentation.
multi_views.MultiViewsPhotoMetricDistortion
multi_views.BevBBoxRotation
multi_views.BevFeatureRotate – Rotate feat.
multi_views.BevFeatureFlip – Flip bev feature.
segmentation.SegRandomCrop – Random crop on data with gt_seg label, can only be used for segmentation
segmentation.ReformatLanePolygon
segmentation.PolygonToMask
segmentation.SegReWeightByArea – Calculate the weight of each category according to the area of each category.
segmentation.LabelRemap – Remap labels.
segmentation.SegOneHot – OneHot is used to convert layer to one-hot format.
segmentation.SegResize – Apply resize for both image and label.
segmentation.SegResizeAffine – Resize image & seg.
segmentation.SegRandomAffine – Apply random for both image and label.
segmentation.Scale – Scale input according to a scale list.
segmentation.FlowRandomAffineScale
segmentation.SegRandomCutOut – CutOut operation for segmentation task.
seq_transform.SeqRandomFlip – Flip image & bbox & mask & seg & flow for sequence.
seq_transform.SeqAugmentHSV – Randomly add color disturbance for sequence.
seq_transform.SeqResize
seq_transform.SeqPad
seq_transform.SeqToFasterRCNNData
seq_transform.SeqAlbuImageOnlyTransform
seq_transform.SeqBgrToYuv444 – BgrToYuv444 for sequence.
seq_transform.SeqToTensor – ToTensor for sequence.
seq_transform.SeqNormalize – Normalize for sequence.
seq_transform.SeqRandomSizeCrop – RandomSizeCrop for sequence.

gaze

Member – Summary
gaze.GazeYUVTransform – YUVTransform for Gaze Task.
gaze.GazeRandomCropWoResize – Random crop without resize.
gaze.Clip – Clip Data to [minimum, maximum].
gaze.RandomColorJitter – Randomly change the brightness, contrast, saturation and hue of an image.
gaze.GazeRotate3DWithCrop – Random rotate image, calculate ROI and random crop if necessary.

lidar_utils

Member – Summary
preprocess.DBFilterByDifficulty – Filter sampled data by difficulties.
preprocess.DBFilterByMinNumPoint – Filter sampled data by NumPoint.
lidar_transform_3d.ObjectSample – Sample GT objects to the data.
lidar_transform_3d.ObjectNoise – Apply noise to each GT object in the scene.
lidar_transform_3d.PointRandomFlip – Flip the points & bbox.
lidar_transform_3d.PointGlobalRotation – Apply global rotation to a 3D scene.
lidar_transform_3d.PointGlobalScaling – Apply global scaling to a 3D scene.
lidar_transform_3d.ShufflePoints – Shuffle Points.
lidar_transform_3d.PointCloudSegPreprocess – Point cloud preprocessing transforms for segmentation.
lidar_transform_3d.LidarMultiPreprocess – Point cloud preprocessing transforms for segmentation.
lidar_transform_3d.ObjectRangeFilter – Filter objects by point cloud range.
lidar_transform_3d.AssignSegLabel – Assign segmentation labels for lidar data.
lidar_transform_3d.LidarReformat – Reformat data.
sample_ops.DataBaseSampler

API Reference

class hat.data.collates.collates.CocktailCollate(ignore_id: int = -1, batch_first: bool = True, mode: str = 'train')

CocktailCollate.

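Callable class that collates batched data for the Cocktail (multimodal) algorithm. By default it processes a list of dict-type samples.

It first converts List[Dict[str, …]] into Dict[str, List]. It then applies pad_sequence to the training-related entries ‘images’, ‘audio’ and ‘label’, skips ‘tokens’, and uses default_collate for all other keys.

  • Parameters:
    • ignore_id – ID of the label to ignore. Defaults to -1, the value used in wenet. When label data is processed, -1 is used as the padding value.
    • batch_first – Whether the batch dimension comes first (array index 0) when batched data is processed. If batch_first is True, arrays are B x T x *; if batch_first is False, arrays are T x B x *.
    • mode – The mode in which to collate: train or calibration.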
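
The two steps described above can be sketched in plain Python. This is only an illustration, not HAT's implementation: `collate_sketch` and `pad_lists` are hypothetical names, nested lists stand in for tensors, only the batch_first=True layout is shown, the padding value 0 for ‘images’ and ‘audio’ is an assumption, and keys that would go through default_collate are simply passed through as lists.

```python
from typing import Any, Dict, List

# Keys padded to a common length, following the description above.
PAD_KEYS = ("images", "audio", "label")


def pad_lists(seqs: List[List[int]], pad_value: int) -> List[List[int]]:
    """Right-pad variable-length lists to the longest one (B x T layout)."""
    max_len = max(len(s) for s in seqs)
    return [s + [pad_value] * (max_len - len(s)) for s in seqs]


def collate_sketch(batch: List[Dict[str, Any]], ignore_id: int = -1) -> Dict[str, Any]:
    # Step 1: List[Dict[str, ...]] -> Dict[str, List].
    merged: Dict[str, List[Any]] = {}
    for sample in batch:
        for key, value in sample.items():
            merged.setdefault(key, []).append(value)

    # Step 2: pad the training-related keys, leave the rest untouched.
    out: Dict[str, Any] = {}
    for key, values in merged.items():
        if key == "label":
            out[key] = pad_lists(values, ignore_id)  # labels are padded with ignore_id
        elif key in PAD_KEYS:
            out[key] = pad_lists(values, 0)  # assumed padding value for inputs
        else:
            out[key] = values  # 'tokens' and other keys: passed through here
    return out


batch = [
    {"label": [1, 2, 3], "tokens": ["a"]},
    {"label": [4], "tokens": ["b", "c"]},
]
collated = collate_sketch(batch)
```

Padding labels with ignore_id lets a loss function mask out the padded positions by comparing against the same value.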

class hat.data.dataloaders.passthrough_dataloader.PassThroughDataLoader(data: Any, *, length: int, clone: bool = False)

Directly pass through input example.

  • Parameters:
    • data – Input data
    • length – Length of dataloader
    • clone – Whether clone input data
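
A pass-through loader of this kind can be pictured as an iterable that yields the same example a fixed number of times. The sketch below is an assumption about the behavior implied by the parameters, not the HAT source; `PassThroughSketch` is a hypothetical name and `copy.deepcopy` stands in for whatever cloning the real class performs.

```python
import copy
from typing import Any, Iterator


class PassThroughSketch:
    """Yield the same input example `length` times, optionally as a copy."""

    def __init__(self, data: Any, *, length: int, clone: bool = False):
        self.data = data
        self.length = length
        self.clone = clone

    def __len__(self) -> int:
        return self.length

    def __iter__(self) -> Iterator[Any]:
        for _ in range(self.length):
            # With clone=True every iteration gets an independent copy.
            yield copy.deepcopy(self.data) if self.clone else self.data


shared = PassThroughSketch({"x": [1, 2]}, length=3)
cloned = PassThroughSketch({"x": [1, 2]}, length=2, clone=True)
```

Cloning matters when downstream code mutates the example in place; without it, every iteration sees the same object.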

class hat.data.datasets.argoverse2_dataset.Argoverse2PackedDataset(data_path, split, transforms=None, pack_type='lmdb', input_dim=2, pack_kwargs: dict | None = None)

class hat.data.datasets.argoverse_dataset.Argoverse1Dataset(data_path: str, map_path: str, transforms: Callable | None = None, pred_step: int = 20, max_distance: float = 50.0, max_lane_num: int = 64, max_lane_poly: int = 9, max_traj_num: int = 32, max_goals_num: int = 2048, use_subdivide: bool = True, pack_type: str | None = None, pack_kwargs: dict | None = None)

Argoverse dataset v1.

  • Parameters:
    • data_path – The path of the parent directory of data.
    • map_path – The path of map data.
    • transforms – A function transform that takes input sample and its target as entry and returns a transformed version.
    • pred_step – Steps for traj prediction.
    • max_distance – Max distance for map range.
    • max_lane_num – Max num of lane vector.
    • max_lane_poly – Max num of lane poly.
    • max_traj_num – Max number of trajectories.
    • max_goals_num – Max number of goals.
    • use_subdivide – Whether use subdivide for goals generation.
    • pack_type – The pack type.
    • pack_kwargs – Kwargs for pack type.

class hat.data.datasets.argoverse_dataset.Argoverse1Packer(src_data_path: str, mode: str, target_data_path: str, num_workers: int, pack_type: str, **kwargs)

Packer for converting argoverse dataset from csv format into lmdb format.

  • Parameters:
    • src_data_path – The path of the parent directory of data.
    • mode – The name of the dataset directory.
    • target_data_path – The target path to store lmdb dataset.
    • num_workers – Number of workers for reading the original data. num_workers <= 0 packs in a single process; num_workers >= 1 packs with num_workers processes.
    • pack_type – The file type for packing.
    • **kwargs – Kwargs for Packer.

pack_data(idx)

Read the original data from the folder and process it.

  • Parameters: idx – Idx for reading.
  • Returns: Processed data for pack.

class hat.data.datasets.argoverse_dataset.Argoverse1Sampler(map_path: str, pred_step: int = 20, traj_scale: int = 50, max_distance: float = 50.0, max_lane_num: int = 64, max_lane_poly: int = 9, max_traj_num: int = 32, max_goals_num: int = 2048, use_subdivide: bool = True)

Sampler for argoverse dataset.

  • Parameters:
    • map_path – The path of map data.
    • pred_step – Steps for traj prediction.
    • traj_scale – Scale for traj feat. Needed for qat.
    • max_distance – Max distance for map range.
    • max_lane_num – Max num of lane vector.
    • max_lane_poly – Max num of lane poly.
    • max_traj_num – Max number of trajectories.
    • max_goals_num – Max number of goals.
    • use_subdivide – Whether use subdivide for goals generation.
    • pack_type – The pack type.
    • pack_kwargs – Kwargs for pack type.

class hat.data.datasets.batch_transform_dataset.BatchTransformDataset(dataset: Dataset, transforms_cfgs: List, epoch_steps: List)

Dataset which uses different transforms in different epochs.

  • Parameters:
    • dataset – Target dataset.
    • transforms_cfgs – The list of different transform configs.
    • epoch_steps – Effective epoch of different transforms.
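
One way to picture epoch-dependent transforms is a wrapper that looks up the active transform from the current epoch. The sketch below is an illustration under stated assumptions, not the HAT implementation: `EpochSwitchedDataset` and `set_epoch` are hypothetical names, transforms are plain callables rather than configs, and epoch_steps is assumed to list the ascending epochs at which the next transform takes over.

```python
from bisect import bisect_right
from typing import Any, Callable, List, Sequence


class EpochSwitchedDataset:
    """Apply transforms[i] while the epoch is inside the i-th epoch range."""

    def __init__(
        self,
        dataset: Sequence[Any],
        transforms: List[Callable[[Any], Any]],
        epoch_steps: List[int],
    ):
        # Assumption: N boundaries separate N + 1 transforms.
        assert len(transforms) == len(epoch_steps) + 1
        self.dataset = dataset
        self.transforms = transforms
        self.epoch_steps = epoch_steps
        self.epoch = 0

    def set_epoch(self, epoch: int) -> None:
        self.epoch = epoch

    def __len__(self) -> int:
        return len(self.dataset)

    def __getitem__(self, idx: int) -> Any:
        # Number of boundaries already reached selects the active transform.
        stage = bisect_right(self.epoch_steps, self.epoch)
        return self.transforms[stage](self.dataset[idx])


ds = EpochSwitchedDataset([1, 2, 3], [lambda x: x + 1, lambda x: x * 10], [5])
```

With these inputs, epochs 0 to 4 use the first transform and epoch 5 onward uses the second.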

class hat.data.datasets.carfusion_keypoints_dataset.CarfusionCroppedData(data_path: str, anno_json_file: str, transforms: list | None = None)

Cropped Carfusion Dataset. The car instances are cropped.

CarFusion is a car keypoints dataset; see http://www.cs.cmu.edu/~ILIM/projects/IM/CarFusion/cvpr2018/index.html

  • Parameters:
    • data_path – The path to the dataset.
    • anno_json_file – The path to the annotation JSON file in COCO format.
    • transforms – List of data transformations to apply. Defaults to None.

class hat.data.datasets.carfusion_keypoints_dataset.CarfusionPackData(data_path: str, transforms: List | None = None, pack_type: str | None = None, pack_kwargs: dict | None = None)

Carfusion Dataset of packed lmdb format.

CarFusion is a car keypoints dataset; see http://www.cs.cmu.edu/~ILIM/projects/IM/CarFusion/cvpr2018/index.html

  • Parameters:
    • data_path – The path to the packed dataset.
    • transforms – List of data transformations to apply.
    • pack_type – The type of packing used for the dataset; here it is “lmdb”.
    • pack_kwargs – Additional keyword arguments for dataset packing.

class hat.data.datasets.cityscapes.Cityscapes(data_path: str, transforms: list | None = None, pack_type: str | None = None, pack_kwargs: dict | None = None, color_space: str = 'bgr')

Cityscapes provides the method of reading cityscapes data from target pack type.

  • Parameters:
    • data_path – The path of packed file.
    • pack_type – The pack type.
    • transforms – Transforms applied to the cityscapes data before use.
    • pack_kwargs – Kwargs for pack type.
    • color_space – color space of data.

class hat.data.datasets.culane_dataset.CuLaneDataset(data_path: str, transforms: List | None = None, pack_type: str | None = None, pack_kwargs: dict | None = None, to_rgb: bool = True)

CuLaneDataset provides the method of reading CuLaneDataset data from target pack type.

  • Parameters:
    • data_path – The path of packed file.
    • transforms – Transforms applied to the data before use.
    • pack_type – The pack type.
    • pack_kwargs – Kwargs for pack type.
    • to_rgb – Whether to convert to rgb color_space.

class hat.data.datasets.culane_dataset.CuLaneFromImage(data_path: str, transforms: List | None = None, to_rgb: bool = False, train_flag: bool = False)

CuLane dataset which gets img data and gt lines from the data_path.

  • Parameters:
    • data_path – The path where the images and gt lines are stored.
    • transforms – List of transform.
    • to_rgb – Whether to convert to rgb color_space.
    • train_flag – Whether the data is used for training or testing.

class hat.data.datasets.culane_dataset.CuLanePacker(src_data_dir: str, target_data_dir: str, split_name: str, num_workers: int, pack_type: str, num_samples: int | None = None, **kwargs)

CuLanePacker is used for converting Culane dataset to target DataType format.

  • Parameters:
    • src_data_dir – The dir of original culane data.
    • target_data_dir – Path for packed file.
    • split_name – Split name of data, must be train or test.
    • num_workers – Num workers for reading data using multiprocessing.
    • pack_type – The file type for packing.
    • num_samples – The number of samples to pack. All samples are packed if num_samples is None.

pack_data(idx)

Read the original data from the folder and process it.

  • Parameters: idx – Idx for reading.
  • Returns: Processed data for pack.

class hat.data.datasets.dataset_wrappers.CBGSDataset(dataset)

A wrapper of class sampled dataset.

Implementation of the paper “Class-balanced Grouping and Sampling for Point Cloud 3D Object Detection”.

Balance the number of scenes under different classes.

  • Parameters: dataset – The dataset to be class sampled.

class hat.data.datasets.dataset_wrappers.ChunkShuffleDataset(dataset, chunk_size_in_worker=1024, drop_last=True, sort_by_str=False, seed=0)

Dataset wrapper for chunk shuffle.

Chunk shuffle will divide the entire dataset into chunks, then shuffle within chunks and shuffle between chunks.

  • Parameters:
    • dataset – datasets for shuffle.
    • chunk_size_in_worker – Chunk size for shuffle in each worker.
    • drop_last – Whether to drop the last.
    • sort_by_str – Whether to sort keys as strings. String order is the sort order used by lmdb.
    • seed – random seed for shuffle
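
The chunk shuffle described above (split into chunks, shuffle within each chunk, then shuffle the chunk order) can be sketched on a list of indices. This is an illustration only: `chunk_shuffle_indices` is a hypothetical helper, and treating drop_last as "discard a trailing incomplete chunk" is an assumption rather than documented behavior.

```python
import random
from typing import List


def chunk_shuffle_indices(
    num_samples: int, chunk_size: int, seed: int = 0, drop_last: bool = True
) -> List[int]:
    rng = random.Random(seed)  # seeded so the order is reproducible
    indices = list(range(num_samples))

    # Divide the index range into consecutive chunks.
    chunks = [indices[i:i + chunk_size] for i in range(0, num_samples, chunk_size)]
    if drop_last and chunks and len(chunks[-1]) < chunk_size:
        chunks.pop()  # assumed meaning of drop_last

    for chunk in chunks:
        rng.shuffle(chunk)  # shuffle within each chunk
    rng.shuffle(chunks)  # shuffle between chunks

    return [i for chunk in chunks for i in chunk]


order = chunk_shuffle_indices(10, 4)
```

Because samples stay grouped with their original neighbours, reads from a packed file remain mostly local while the order still varies with the seed.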

class hat.data.datasets.dataset_wrappers.ComposeDataset(datasets: List[Dict], batchsize_list: List[int])

Dataset wrapper for multiple datasets with precise batch size.

  • Parameters:
    • datasets – config for each dataset.
    • batchsize_list – batchsize for each task dataset.

class hat.data.datasets.dataset_wrappers.ComposeIterableDataset(datasets: List[Dict], batchsize_list: List[int], multi_sample_output: bool = True)

Dataset wrapper built on ComposeDataset, shuffle, supporting multi workers.

  • Parameters:
    • datasets – config for each dataset.
    • batchsize_list – batchsize for each dataset.
    • multi_sample_output – whether dataset outputs multiple samples at the same time.

class hat.data.datasets.dataset_wrappers.ConcatDataset(datasets: List, with_flag: bool = False, with_pack_flag: bool = False, record_index: bool = False, accumulate_flag: bool = False)

A wrapper of concatenated dataset with group flag.

Same as torch.utils.data.dataset.ConcatDataset, but additionally concatenates the group flags of all datasets.

  • Parameters:
    • datasets – A list of datasets.
    • with_flag – Whether to concatenate the datasets' flags. If True, the flags of all datasets are concatenated (all datasets must have a flag attribute in this case). Defaults to False.
    • with_pack_flag – Whether to concatenate dataset.pack_flag. If True, the pack_flag of all datasets is aggregated and concatenated (all datasets must have a pack_flag attribute in this case). Defaults to False. pack_flag identifies data belonging to different packs: data belonging to the same pack has the same pack_flag, and vice versa.
    • record_index – Whether to record the index. If True, record the index. Default to False.

class hat.data.datasets.dataset_wrappers.DistributedComposeRandomDataset(datasets: List[Dataset], sample_weights: List[int], shuffle=True, seed=0, multi_sample_output=False)

Dataset wrapper for multiple datasets with fair sample weights across multiple workers in a distributed environment.

Each dataset is split into (num_workers x num_ranks) parts.

  • Parameters:
    • datasets – list of datasets.
    • sample_weights – sample weights for each dataset.
    • shuffle – shuffle each dataset when set to True
    • seed – random seed for shuffle
    • multi_sample_output – whether dataset outputs multiple samples at the same time.
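
Splitting each dataset into (num_workers x num_ranks) parts can be illustrated with index sharding. The sketch below is an assumption about how such a split might look, not the HAT source: `shard_indices` is a hypothetical helper, and the strided assignment of samples to shards is one choice among several.

```python
from typing import List


def shard_indices(
    num_samples: int, rank: int, num_ranks: int, worker_id: int, num_workers: int
) -> List[int]:
    """Return the sample indices owned by one (rank, worker) pair."""
    num_shards = num_workers * num_ranks
    # Give every (rank, worker) pair a unique shard id in [0, num_shards).
    shard_id = rank * num_workers + worker_id
    # Strided split: shard k takes samples k, k + num_shards, ...
    return list(range(shard_id, num_samples, num_shards))


shards = [
    shard_indices(10, rank, 2, worker, 2)
    for rank in range(2)
    for worker in range(2)
]
```

Every sample lands in exactly one shard, so no two workers on any rank read the same sample.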

class hat.data.datasets.dataset_wrappers.RepeatDataset(dataset: Dataset, times: int)

A wrapper of repeated dataset.

Using RepeatDataset can reduce the data loading time between epochs.

  • Parameters:
    • dataset – The datasets for repeating.
    • times – Repeat times.
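
A repeated dataset can be sketched as a wrapper whose length is multiplied by `times` and whose indices wrap around. This is an illustration under that assumption, not the HAT implementation; `RepeatSketch` is a hypothetical name.

```python
from typing import Any, Sequence


class RepeatSketch:
    """Present `dataset` as if it were concatenated with itself `times` times."""

    def __init__(self, dataset: Sequence[Any], times: int):
        self.dataset = dataset
        self.times = times

    def __len__(self) -> int:
        return self.times * len(self.dataset)

    def __getitem__(self, idx: int) -> Any:
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        # Wrap the index back into the range of the original dataset.
        return self.dataset[idx % len(self.dataset)]


repeated = RepeatSketch(["a", "b", "c"], times=2)
```

One pass over the wrapper covers the underlying data `times` times, so the loader is rebuilt less often for the same amount of training.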

class hat.data.datasets.dataset_wrappers.ResampleDataset(dataset: Dict, with_flag: bool = False, with_pack_flag: bool = False, resample_interval: int = 1)

A wrapper of resample dataset.

ResampleDataset resamples the original dataset at a specific interval.

  • Parameters:
    • dataset – The datasets for resampling.
    • with_flag – Whether to use dataset.flag. If True, dataset.flag is resampled with resample_interval (the dataset must have a flag attribute in this case).
    • with_pack_flag – Whether to use dataset.pack_flag. If True, pack_flag is resampled with resample_interval (the dataset must have a pack_flag attribute in this case). Defaults to False. pack_flag identifies samples belonging to different packs: data belonging to the same pack has the same pack_flag, and vice versa.
    • resample_interval – resample interval.
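
Resampling at an interval can be sketched as keeping every resample_interval-th sample. The wrapper below is an illustration, not the HAT source: `ResampleSketch` is a hypothetical name, and starting at index 0 is an assumption.

```python
from typing import Any, Sequence


class ResampleSketch:
    """Keep every `resample_interval`-th sample of the wrapped dataset."""

    def __init__(self, dataset: Sequence[Any], resample_interval: int = 1):
        self.dataset = dataset
        # Indices into the original dataset that survive resampling.
        self.indices = list(range(0, len(dataset), resample_interval))

    def __len__(self) -> int:
        return len(self.indices)

    def __getitem__(self, idx: int) -> Any:
        return self.dataset[self.indices[idx]]


sparse = ResampleSketch(list(range(10)), resample_interval=3)
```

The same index list could be applied to a flag or pack_flag array so that the flags stay aligned with the kept samples.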

class hat.data.datasets.flyingchairs_dataset.FlyingChairs(data_path: str, transforms: list | None = None, pack_type: str | None = None, pack_kwargs: dict | None = None, to_rgb: bool = True)

FlyingChairs provides the method of reading flyingChairs data from target pack type.

  • Parameters:
    • data_path – The path of packed file.
    • transforms – Transforms applied to the data before use.
    • pack_type – The pack type.
    • pack_kwargs – Kwargs for pack type.
    • to_rgb – Whether to convert to rgb color_space.

class hat.data.datasets.flyingchairs_dataset.FlyingChairsFromImage(data_path: str, transforms: list | None = None, to_rgb: bool = True, train_flag: bool = False, image1_name: str = '_img1', image2_name: str = '_img2', image_type: str = '.ppm', flow_name: str = '_flow', flow_type: str = '.flo')

Dataset which gets img data from the data_path.

  • Parameters:
    • data_path – The path where the image and gt_flow is stored.
    • transforms – List of transform.
    • to_rgb – Whether to convert to rgb color_space.
    • train_flag – Whether the data is used for training or testing.
    • image1_name – The name suffix of image1.
    • image2_name – The name suffix of image2.
    • image_type – The image type of image1 and image2.
    • flow_name – The name suffix of flow.
    • flow_type – The flow type of flow.

class hat.data.datasets.flyingchairs_dataset.FlyingChairsPacker(src_data_dir: str, target_data_dir: str, split_name: str, num_workers: int, pack_type: str, num_samples: int | None = None, **kwargs)

FlyingChairsPacker is used for converting FlyingChairs dataset to target DataType format.

  • Parameters:
    • src_data_dir – The dir of original FlyingChairs data.
    • target_data_dir – Path for packed file.
    • split_name – Split name of data, such as train, val and so on.
    • num_workers – Num workers for reading data using multiprocessing.
    • pack_type – The file type for packing.
    • num_samples – The number of samples to pack. All samples are packed if num_samples is None.

pack_data(idx)

Read the original data from the folder and process it.

  • Parameters: idx – Idx for reading.
  • Returns: Processed data for pack.

class hat.data.datasets.imagenet.ImageNet(data_path: str, out_pil: bool = False, transforms: List | None = None, pack_type: str | None = None, pack_kwargs: dict | None = None)

ImageNet provides the method of reading imagenet data from target pack type.

  • Parameters:
    • data_path – The path of packed file.
    • transforms – Transforms applied to the data before use.
    • pack_type – The pack type.
    • pack_kwargs – Kwargs for pack type.

class hat.data.datasets.imagenet.ImageNetFromImage(transforms=None, *args, **kwargs)

ImageNet from image by torchvision.

The params of ImageNetFromImage are the same as those of torchvision.datasets.ImageNet.

class hat.data.datasets.mscoco.Coco(data_path: str, transforms: List | None = None, pack_type: str | None = None, pack_kwargs: dict | None = None)

Coco provides the method of reading coco data from target pack type.

  • Parameters:
    • data_path – The path of packed file.
    • transforms – Transforms applied to the data before use.
    • pack_type – The pack type.
    • pack_kwargs – Kwargs for pack type.

class hat.data.datasets.mscoco.CocoFromImage(*args, **kwargs)

Coco from image by torchvision.

The params of CocoFromImage are the same as those of torchvision.datasets.CocoDetection.

class hat.data.datasets.kitti2d.Kitti2D(data_path: str, transforms: List | None = None, pack_type: str | None = None, pack_kwargs: dict | None = None)

Kitti2D provides the method of reading kitti2d data from target pack type.

  • Parameters:
    • data_path – The path of LMDB file.
    • transforms – Transforms applied to the data before use.
    • pack_type – The pack type.
    • pack_kwargs – Kwargs for pack type.

class hat.data.datasets.kitti3d.Kitti3D(data_path: str, num_point_feature: int = 4, transforms: List | None = None, pack_type: str | None = None, pack_kwargs: dict | None = None)

Kitti3D provides the method of reading kitti3d data from target pack type.

  • Parameters:
    • data_path – The path of LMDB file.
    • transforms – Transforms applied to the data before use.
    • pack_type – The pack type.
    • pack_kwargs – Kwargs for pack type.

class hat.data.datasets.kitti3d.Kitti3DDetection(source_path: str, split_name: str, transforms: Callable | None = None, num_point_feature: int = 4)

Kitti 3D Detection Dataset.

  • Parameters:
    • source_path – Root directory where images are downloaded to.
    • split_name – Dataset split, ‘train’ or ‘val’.
    • transforms – A function transform that takes input sample and its target as entry and returns a transformed version.
    • num_point_feature – Number of feature in points, default 4 (x, y, z, r).

class hat.data.datasets.mot17_dataset.Mot17Dataset(data_path: str, sampler_lengths: List[int] = (1,), sample_mode: str = 'fixed_interval', sample_interval: int = 10, sampler_steps: List[int] | None = None, transforms: List | None = None, pack_type: str | None = None, pack_kwargs: dict | None = None, to_rgb: bool = True)

Mot17Dataset provides the method of reading Mot17 data from target pack type.

  • Parameters:
    • data_path – The path of packed file.
    • sampler_lengths – The length of the sequence data.
    • sample_mode – The sampling mode; only ‘fixed_interval’ and ‘random_interval’ are supported.
    • sample_interval – The sampling interval. If sample_mode is ‘random_interval’, the interval is randomly selected from [1, sample_interval].
    • sampler_steps – Sequence length changes according to the epoch.
    • transforms – Transforms applied to the data before use.
    • pack_type – The pack type.
    • pack_kwargs – Kwargs for pack type.
    • to_rgb – Whether to convert to rgb color_space.
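
The two sampling modes can be illustrated by generating the frame indices of one sequence. This is a sketch under stated assumptions, not the HAT implementation: `sample_frame_indices` is a hypothetical helper, and drawing a single interval per sequence in ‘random_interval’ mode is an assumption.

```python
import random
from typing import List, Optional


def sample_frame_indices(
    start: int,
    length: int,
    sample_mode: str = "fixed_interval",
    sample_interval: int = 10,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Return `length` frame indices beginning at `start`."""
    if sample_mode == "fixed_interval":
        interval = sample_interval
    elif sample_mode == "random_interval":
        rng = rng or random.Random()
        # Draw the interval from [1, sample_interval], both ends included.
        interval = rng.randint(1, sample_interval)
    else:
        raise ValueError(f"unsupported sample_mode: {sample_mode}")
    return [start + i * interval for i in range(length)]


fixed = sample_frame_indices(0, 3)
randomized = sample_frame_indices(5, 4, "random_interval", 10, random.Random(0))
```

A real dataset would also need to keep the last index inside the video, which this sketch leaves out.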

class hat.data.datasets.mot17_dataset.Mot17FromImage(data_path: str, sampler_lengths: List[int] = (1,), sample_mode: str = 'fixed_interval', sample_interval: int = 10, sampler_steps: List[int] | None = None, transforms: List | None = None, to_rgb: bool = True)

Mot17FromImage which gets img data and gt from the data_path.

  • Parameters:
    • data_path – The dir of mot17 data.
    • sampler_lengths – The length of the sequence data.
    • sample_mode – The sampling mode; only ‘fixed_interval’ and ‘random_interval’ are supported.
    • sample_interval – The sampling interval. If sample_mode is ‘random_interval’, the interval is randomly selected from [1, sample_interval].
    • sampler_steps – Sequence length changes according to the epoch.
    • transforms – List of transform.
    • to_rgb – Whether to convert to rgb color_space.

class hat.data.datasets.mot17_dataset.Mot17Packer(src_data_dir: str, target_data_dir: str, split_name: str, num_workers: int, pack_type: str, num_samples: int | None = None, **kwargs)

Mot17Packer is used for converting MOT17 dataset to target DataType format.

  • Parameters:
    • src_data_dir – The dir of original mot17 data.
    • target_data_dir – Path for packed file.
    • split_name – Split name of data, must be train or test.
    • num_workers – Num workers for reading data using multiprocessing.
    • pack_type – The file type for packing.
    • num_samples – the number of samples you want to pack. You will pack all the samples if num_samples is None.

pack_data(idx)

Read original data from the folder with some processing.

  • Parameters: idx – Idx for reading.
  • Returns: Processed data for pack.

class hat.data.datasets.nuscenes_dataset.NuscenesBevDataset(with_bev_bboxes: bool = True, with_ego_bboxes: bool = False, with_lidar_bboxes: bool = False, with_bev_mask: bool = True, secondary_transforms: Callable | None = None, map_path: str | None = None, line_classes=None, ped_crossing_classes=None, contour_classes=None, bev_size: Tuple | None = None, bev_range: Tuple | None = None, map_size: Tuple | None = None, need_lidar=False, need_mono_data=False, switch_steps=0, **kwargs)

Bev Dataset object for packed NuScenes.

  • Parameters:
    • with_bev_bboxes – Whether include bev bboxes.
    • with_bev_mask – Whether include bev mask.
    • map_path – Path to Nuscenes Map, needed if include bev mask.
    • line_classes – Classes of line. ex. road divider, lane divider.
    • ped_crossing_classes – Classes of ped crossing. ex. ped_crossing
    • contour_classes – Classes of contour. ex. road segment, lane.
    • bev_size – Size for bev using meter. ex. (51.2, 51.2, 0.2)
    • bev_range – range for bev, alternative of bev_size. ex.(-61.2, -61.2, -2, 61.2, 61.2, 10)
    • map_size – size for seg map.

class hat.data.datasets.nuscenes_dataset.NuscenesBevSequenceDataset(num_seq, **kwargs)

class hat.data.datasets.nuscenes_dataset.NuscenesFromImage(version, src_data_dir, split_name='train', transforms=None, with_bev_bboxes: bool = True, with_ego_bboxes: bool = False, with_lidar_bboxes: bool = False, with_bev_mask: bool = True, map_path: str | None = None, line_classes=None, ped_crossing_classes=None, contour_classes=None, bev_size: Tuple | None = None, bev_range: Tuple | None = None, map_size: Tuple | None = None, need_lidar: bool = False)

Read NuScenes from image.

  • Parameters:
    • version – Version for nuscenes.
    • src_data_dir – Path for data.
    • split_name – Split_name for dataset.(ex. “train”, “val”)

class hat.data.datasets.nuscenes_dataset.NuscenesFromImageSequence(num_seq, **kwargs)

class hat.data.datasets.nuscenes_dataset.NuscenesLidarDataset(num_sweeps: int, info_path: str | None = None, load_dim: int | None = 5, use_dim: List[int] | None = None, time_dim: int | None = 4, pad_empty_sweeps: bool | None = True, remove_close: bool | None = True, use_valid_flag: bool | None = False, with_velocity: bool | None = True, classes: List[str] | None = None, test_mode: bool | None = False, filter_empty_gt: bool | None = True, **kwargs)

Lidar Dataset object for packed NuScenes.

  • Parameters:
    • num_sweeps – Max number of sweeps. Default: 10.
    • load_dim – Dimension number of the loaded points. Defaults to 5.
    • use_dim – Which dimension to use.
    • time_dim – Which dimension to represent the timestamps. Defaults to 4.
    • pad_empty_sweeps – Whether to repeat keyframe when sweeps is empty.
    • remove_close – Whether to remove close points.
    • use_valid_flag – Whether to use use_valid_flag key.
    • with_velocity – Whether include velocity prediction.
    • classes – Classes used in the dataset.
    • test_mode – If test_mode=True, it will not randomly sample sweeps but select the nearest N frames.
    • filter_empty_gt – Whether to filter empty GT.

get_cat_ids(idx: int)

Get category distribution of single scene.

  • Parameters: idx – Index of the data_info.
  • Returns: For each category, if the current scene contains such boxes, store a list containing idx; otherwise, store an empty list.
  • Return type: list

class hat.data.datasets.nuscenes_dataset.NuscenesLidarWithSegDataset(num_sweeps: int, info_path: str | None = None, load_dim: int | None = 5, use_dim: List[int] | None = None, time_dim: int | None = 4, pad_empty_sweeps: bool | None = True, remove_close: bool | None = True, use_valid_flag: bool | None = False, with_velocity: bool | None = True, classes: List[str] | None = None, test_mode: bool | None = False, filter_empty_gt: bool | None = True, **kwargs)

Lidar Dataset object for packed NuScenes.

  • Parameters:
    • num_sweeps – Max number of sweeps. Default: 10.
    • load_dim – Dimension number of the loaded points. Defaults to 5.
    • use_dim – Which dimension to use.
    • time_dim – Which dimension to represent the timestamps. Defaults to 4.
    • pad_empty_sweeps – Whether to repeat keyframe when sweeps is empty.
    • remove_close – Whether to remove close points.
    • use_valid_flag – Whether to use use_valid_flag key.
    • with_velocity – Whether include velocity prediction.
    • classes – Classes used in the dataset.
    • test_mode – If test_mode=True, it will not randomly sample sweeps but select the nearest N frames.
    • filter_empty_gt – Whether to filter empty GT.

class hat.data.datasets.nuscenes_dataset.NuscenesMonoDataset(**kwargs)

class hat.data.datasets.nuscenes_dataset.NuscenesMonoFromImage(version, src_data_dir, split_name='val', transforms=None)

class hat.data.datasets.nuscenes_map_dataset.NuscenesMapDataset(pc_range: List[int], map_ann_file: str | None = None, queue_length: int = 4, bev_size: Tuple[int, int] = (200, 200), fixed_ptsnum_per_line: int = -1, padding_value: int = -10000, map_classes: Tuple[str] | None = None, map_path: str | None = None, aux_seg: any | None = None, test_mode: bool | None = False, filter_empty_gt: bool | None = True, use_lidar_gt: bool = True, add_canbus: bool = False, **kwargs)

Dataset object for packed NuScenes.

This dataset adds static map elements.

  • Parameters:
    • pc_range – Range of the point cloud.
    • map_ann_file – Path to the map annotation file.
    • queue_length – Length of the queue.
    • bev_size – Size of the BEV image. Default is (200, 200).
    • fixed_ptsnum_per_line – Fixed number of points per line. Default is -1.
    • padding_value – Value to use for padding. Default is -10000.
    • map_classes – Tuple of map classes. Default is None.
    • map_path – Path to the map. Default is None.
    • aux_seg – Auxiliary segmentation information. Default is None.
    • test_mode – Whether in test mode. Default is False.
    • filter_empty_gt – Whether to filter empty ground truth. Default is True.
    • use_lidar_gt – Whether to use LiDAR ground truth. Default is True.
    • add_canbus – Whether to add CAN bus data. Default is False.
    • **kwargs – Additional keyword arguments.

classmethod get_map_classes(map_classes: Sequence[str] | None = None)

Get class names of current dataset.

  • Parameters: map_classes – If classes is None, use default CLASSES defined by builtin dataset. If classes is a string, take it as a file name. The file contains the name of classes where each line contains one class name. If classes is a tuple or list, override the CLASSES defined by the dataset.
  • Returns: A list of class names.
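
The three accepted forms of `map_classes` can be sketched as follows. This is a minimal illustration of the rule described above, not the library code: `resolve_map_classes` and the `DEFAULT_CLASSES` values are hypothetical placeholders.

```python
import os
import tempfile

# Placeholder defaults; the real CLASSES are defined by the dataset itself.
DEFAULT_CLASSES = ("divider", "ped_crossing", "boundary")


def resolve_map_classes(map_classes=None):
    """Resolve class names from None, a file name, or a tuple/list."""
    if map_classes is None:
        return list(DEFAULT_CLASSES)
    if isinstance(map_classes, str):
        # Treat the string as a file name with one class name per line.
        with open(map_classes) as f:
            return [line.strip() for line in f if line.strip()]
    if isinstance(map_classes, (tuple, list)):
        return list(map_classes)
    raise ValueError(f"unsupported type: {type(map_classes)}")


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "classes.txt")
    with open(path, "w") as f:
        f.write("divider\nboundary\n")
    from_file = resolve_map_classes(path)
```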

class hat.data.datasets.occ3d_nuscenes_dataset.Occ3dNuscenesDataset(data_path: str, load_interval: int = 1, transforms: Callable | None = None, pack_type: str | None = None, pack_kwargs: dict | None = None)

Occupancy Dataset object for packed NuScenes.

  • Parameters:
    • data_path – packed dataset path.
    • load_interval (int , optional) – Interval of loading the dataset. It is used to uniformly sample the dataset. Defaults to 1.
    • transforms – A function transform that takes input sample and its target as entry and returns a transformed version.
    • pack_type – The pack type.
    • pack_kwargs – Kwargs for pack type.
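
Uniform sampling with `load_interval` can be pictured as keeping every n-th sample. The sketch below assumes sampling starts at index 0; `uniform_subsample` is a hypothetical helper, not part of HAT.

```python
def uniform_subsample(samples, load_interval=1):
    """Keep every `load_interval`-th sample, starting from the first one."""
    return samples[::load_interval]


kept = uniform_subsample(list(range(10)), load_interval=3)
```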

class hat.data.datasets.rand_dataset.RandDataset(length: int, example: Any, clone: bool = True, flag: int = 1)

class hat.data.datasets.rand_dataset.SimpleDataset(start: int, length: int, flag: int = 1)

class hat.data.datasets.sceneflow_dataset.SceneFlow(data_path: str, transforms: List | None = None, pack_type: str | None = None, pack_kwargs: dict | None = None)

SceneFlow provides the method of reading SceneFlow data from target pack type.

  • Parameters:
    • data_path – The path of packed file.
    • transforms – Transforms of data before using.
    • pack_type – The pack type.
    • pack_kwargs – Kwargs for pack type.

class hat.data.datasets.sceneflow_dataset.SceneFlowFromImage(data_path: str, data_list: str, transforms: List | None = None)

SceneFlowFromImage which gets img data and gt from the data_path.

  • Parameters:
    • data_path – The dir of sceneflow data.
    • data_list – The filelist of data.
    • transforms – List of transform.

class hat.data.datasets.sceneflow_dataset.SceneFlowPacker(src_data_dir: str, target_data_dir: str, split_name: str, num_workers: int, pack_type: str, num_samples: int | None = None, **kwargs)

SceneFlowPacker is used for converting sceneflow dataset to target DataType format.

  • Parameters:
    • src_data_dir – The dir of original sceneflow data.
    • target_data_dir – Path for packed file.
    • split_name – Split name of data, must be train or test.
    • num_workers – Num workers for reading data using multiprocessing.
    • pack_type – The file type for packing.
    • num_samples – the number of samples you want to pack. You will pack all the samples if num_samples is None.

pack_data(idx)

Read original data from the folder with some processing.

  • Parameters: idx – Idx for reading.
  • Returns: Processed data for pack.

class hat.data.datasets.voc.PascalVOC(data_path: str, transforms: List | None = None, pack_type: str | None = None, pack_kwargs: dict | None = None)

PascalVOC provides the method of reading voc data from target pack type.

  • Parameters:
    • data_path – The path of packed file.
    • transforms – Transforms of voc before using.
    • pack_type – The pack type.
    • pack_kwargs – Kwargs for pack type.

class hat.data.datasets.voc.VOCFromImage(size=416, *args, **kwargs)

VOC from image by torchvision.

The params of VOCFromImage are the same as those of torchvision.datasets.VOCDetection.

class hat.data.samplers.dist_cycle_sampler_multi_dataset.DistributedCycleMultiDatasetSampler(dataset: ComposeDataset, batchsize_list: List[int], num_replicas: int | None = None, rank: int | None = None, shuffle: bool = True, seed: int = 0)

In one epoch period, do cyclic sampling on the dataset according to iter_time.

  • Parameters:
    • dataset – compose dataset
    • num_replicas – same as DistributedSampler
    • rank – Same as DistributedSampler
    • shuffle – Whether to shuffle data
    • seed – random seed

class hat.data.samplers.dist_group_sampler.DistributedGroupSampler(dataset, samples_per_gpu: int = 1, num_replicas: int | None = None, rank: int | None = None, seed: int = 0, shuffle: bool = True)

Sampler that restricts data loading to a subset of the dataset.

Each batch data indices are sampled from one group in all of the groups. Groups are organized according to the dataset flags.

NOTE

Dataset is assumed to be of constant size and must have a flag attribute. Different numbers in the flag array represent different groups. For example, in an aspect ratio group flag there are two groups, in which 0 represents h/w >= 1 and 1 represents h/w < 1. The dataset flag must be a numpy array instance, its dtype must be np.uint8, and its length at axis 0 must equal the dataset length.

  • Parameters:
    • dataset – Dataset used for sampling.
    • samples_per_gpu – Number of samples for each gpu. Default is 1.
    • num_replicas – Number of processes participating in distributed training.
    • rank – Rank of the current process within num_replicas.
    • seed – random seed used in torch.Generator(). This number should be identical across all processes in the distributed group. Default: 0.
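
The grouping rule (every batch is drawn from a single flag group) can be sketched in plain Python. This is an illustration under assumptions, not the sampler's code: `group_batches` is a hypothetical helper, flags are given as a plain list instead of a numpy array, and padding short groups by repeating their indices is an assumed policy.

```python
import random
from collections import defaultdict


def group_batches(flags, samples_per_gpu=1, seed=0, epoch=0, shuffle=True):
    """Build batches whose indices all share the same flag value."""
    rng = random.Random(seed + epoch)
    groups = defaultdict(list)
    for idx, flag in enumerate(flags):
        groups[flag].append(idx)

    batches = []
    for flag in sorted(groups):
        indices = groups[flag]
        if shuffle:
            rng.shuffle(indices)
        # Assumed policy: repeat indices so the group divides evenly.
        remainder = len(indices) % samples_per_gpu
        if remainder:
            n = len(indices)
            pad = samples_per_gpu - remainder
            indices = indices + [indices[i % n] for i in range(pad)]
        for i in range(0, len(indices), samples_per_gpu):
            batches.append(indices[i:i + samples_per_gpu])
    if shuffle:
        rng.shuffle(batches)
    return batches


flags = [0, 0, 0, 1, 1]
batches = group_batches(flags, samples_per_gpu=2)
```

Seeding the generator with `seed + epoch` mirrors the requirement that the seed be identical across processes, so that every process derives the same batch order.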

set_epoch(epoch)

Sets the epoch for this sampler. When shuffle=True, this ensures all replicas use a different random ordering for each epoch. Otherwise, the next iteration of this sampler will yield the same ordering.

  • Parameters: epoch (int) – Epoch number.

class hat.data.samplers.dist_sampler.DistSamplerHook(dataset, num_replicas: int | None = None, rank: int | None = None, shuffle: bool = True, seed: int = 0, drop_last: bool = False)

The hook api for torch.utils.data.DistributedSampler. Used to get local rank and num_replicas before creating DistributedSampler.

  • Parameters:
    • dataset – compose dataset
    • num_replicas – same as DistributedSampler
    • rank – Same as DistributedSampler
    • shuffle – Whether to shuffle data
    • seed – random seed

class hat.data.samplers.dist_set_epoch_dataset_sampler.DistSetEpochDatasetSampler(dataset, num_replicas: int | None = None, rank: int | None = None, shuffle: bool = True, seed: int = 0, drop_last: bool = False)

Distributed sampler that supports set epoch in dataset.

  • Parameters:
    • dataset – compose dataset
    • num_replicas – same as DistributedSampler
    • rank – Same as DistributedSampler
    • shuffle – Whether to shuffle data
    • seed – random seed

set_epoch(epoch: int)

Sets the epoch for this sampler. When shuffle=True, this ensures all replicas use a different random ordering for each epoch. Otherwise, the next iteration of this sampler will yield the same ordering.

  • Parameters: epoch (int) – Epoch number.

class hat.data.samplers.dist_stream_sampler.DistStreamBatchSampler(dataset, batch_size=1, seed=0, skip_prob=0.5, max_skip_num=1, sequence_flip_prob=0.1, keep_consistent_seq_aug=True)

class hat.data.samplers.selected_sampler.SelectedSampler(indices_function: Callable, dataset: Dataset, *, num_replicas: int | None = None, rank: int | None = None, shuffle: bool = True, seed: int = 0, drop_last: bool = False)

Distributed sampler that supports user-defined indices.

  • Parameters:
    • indices_function – Callback function given by the user. It takes the dataset as input and returns a list of indices.
    • dataset – Dataset used for sampling.
    • num_replicas – Number of processes participating in distributed training. By default, world_size is retrieved from the current distributed group.
    • rank – Rank of the current process in num_replicas. By default, rank is retrieved from the current distributed group.
    • shuffle – If True (default), sampler will shuffle the indices.
    • seed – random seed used to shuffle the sampler if shuffle=True. This number should be identical across all processes in the distributed group. Default: 0.
    • drop_last – if True, then the sampler will drop the tail of the data to make it evenly divisible across the number of replicas. If False, the sampler will add extra indices to make the data evenly divisible across the replicas. Default: False.

WARNING

In distributed mode, calling the set_epoch() method at the beginning of each epoch before creating the DataLoader iterator is necessary to make shuffling work properly across multiple epochs. Otherwise, the same ordering will be always used.

set_epoch(epoch: int)

Sets the epoch for this sampler. When shuffle=True, this ensures all replicas use a different random ordering for each epoch. Otherwise, the next iteration of this sampler will yield the same ordering.

  • Parameters: epoch (int) – Epoch number.
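
How user-defined indices, per-epoch shuffling, and `drop_last` interact can be sketched without any distributed setup. The helper below is hypothetical and only imitates the behavior described above; in particular, seeding with `seed + epoch` and padding by repeating indices are assumptions modeled on the usual distributed-sampler convention.

```python
import math
import random


def shard_selected_indices(indices_function, dataset, num_replicas, rank,
                           shuffle=True, seed=0, epoch=0, drop_last=False):
    """Return the indices that one replica would iterate over."""
    indices = list(indices_function(dataset))
    if shuffle:
        # Same seed and epoch on every replica gives the same permutation.
        random.Random(seed + epoch).shuffle(indices)
    if drop_last:
        total = (len(indices) // num_replicas) * num_replicas
        indices = indices[:total]
    else:
        total = math.ceil(len(indices) / num_replicas) * num_replicas
        n = len(indices)
        indices += [indices[i % n] for i in range(total - n)]
    return indices[rank::num_replicas]


def even_positions(dataset):
    """Example callback: keep positions whose value is even."""
    return [i for i, value in enumerate(dataset) if value % 2 == 0]


rank0 = shard_selected_indices(even_positions, list(range(10)), 2, 0, shuffle=False)
rank1 = shard_selected_indices(even_positions, list(range(10)), 2, 1, shuffle=False)
```

Because the permutation depends on the epoch, skipping `set_epoch()` would leave the epoch unchanged and reproduce the same ordering every time, which is what the warning above describes.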

class hat.data.transforms.common.AddKeys(kv: Dict[str, Any])

Add new key-value in input dict.

Frequently used when you want to add dummy keys to data dict but don’t want to change code.

  • Parameters: kv – key-value data dict.

class hat.data.transforms.common.BgrToYuv444(affect_key: str = 'img', rgb_input: bool = False)

BgrToYuv444 is used for color format conversion.

NOTE

Affected keys: ‘img’.

  • Parameters: rgb_input (bool) – The input is rgb input or not.

class hat.data.transforms.common.BgrToYuv444V2(rgb_input: bool = False, swing: str = 'full')

BgrToYuv444V2 is used for color format conversion.

BgrToYuv444V2 is implemented by calling the rgb2centered_yuv functions, which have been verified to produce basically the same YUV output on J5.

NOTE

Affected keys: ‘img’.

  • Parameters:
    • rgb_input – The input is rgb input or not.
    • swing – “studio” for YUV studio swing (Y: -112~107, U, V: -112~112). “full” for YUV full swing (Y, U, V: -128~127). Default is “full”.
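
The "full" swing range can be illustrated with a centered conversion for a single pixel. This sketch assumes the common BT.601 full-range coefficients and a shift of -128 on Y; the coefficients actually used by rgb2centered_yuv are not stated here, so treat the numbers as an assumption. `rgb_to_centered_yuv_full` is a hypothetical name.

```python
def rgb_to_centered_yuv_full(r, g, b):
    """Convert one RGB pixel to centered full-swing YUV (assumed BT.601)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.168736 * r - 0.331264 * g + 0.5 * b
    v = 0.5 * r - 0.418688 * g - 0.081312 * b

    def clamp(x):
        # Keep every channel inside the documented -128~127 range.
        return max(-128, min(127, round(x)))

    # Y is shifted by -128; U and V are already centered around zero.
    return clamp(y - 128), clamp(u), clamp(v)


white = rgb_to_centered_yuv_full(255, 255, 255)
black = rgb_to_centered_yuv_full(0, 0, 0)
```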

class hat.data.transforms.common.ConvertDataType(convert_map: Dict | None = None)

Convert data type.

  • Parameters: convert_map – The mapping dict for to be converted data name and type. Only for np.ndarray and torch.Tensor.

class hat.data.transforms.common.CopyKeys(keys: List[str], split: str = '|')

Copy new key in input dict.

Frequently used when you want to cache keys to data dict but don’t want to change code.

  • Parameters: keys – Key list to copy.

class hat.data.transforms.common.DeleteKeys(keys: List[str])

Delete keys in input dict.

  • Parameters: keys – key list to delete.

class hat.data.transforms.common.ListToDict(keys: List[str])

Convert list args to dict.

  • Parameters: keys – keys for each object in args.

class hat.data.transforms.common.MultiTaskAnnoWrapper(sub_transforms: Dict[str, Any], unikeys: Tuple[str] = (), repkeys: Tuple[str] = ())

Wrapper for multi-task anno generating.

  • Parameters:
    • sub_transforms – The mapping dict for task-wise transforms.
    • unikeys – Keys of unique annotations in each task.
    • repkeys – Keys of repeated annotations for all tasks.

class hat.data.transforms.common.PILToNumpy

Convert PIL Image to Numpy.

class hat.data.transforms.common.PILToTensor

Convert PIL Image to Tensor.

class hat.data.transforms.common.RandomSelectOne(transforms: List, p: float = 0.5, p_trans: List | None = None)

Select one of transforms to apply.

  • Parameters:
    • transforms – list of transformations to compose.
    • p – probability of applying selected transform. Default: 0.5.
    • p_trans – list of possibility of transformations.
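
The selection logic can be sketched as below. It is a hedged illustration: `random_select_one` is a hypothetical function, and it assumes that `p` gates whether any transform is applied while `p_trans` weights the choice among them.

```python
import random


def random_select_one(data, transforms, p=0.5, p_trans=None, rng=None):
    """Apply one randomly chosen transform with probability `p`."""
    rng = rng or random.Random()
    if rng.random() >= p:
        return data  # no transform is applied this time
    # weights=None makes the choice uniform over the transforms.
    (chosen,) = rng.choices(transforms, weights=p_trans, k=1)
    return chosen(data)


def add_one(x):
    return x + 1


def times_ten(x):
    return x * 10


result = random_select_one(3, [add_one, times_ten], p=1.0, p_trans=[0.0, 1.0])
untouched = random_select_one(3, [add_one, times_ten], p=0.0)
```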

class hat.data.transforms.common.RenameKeys(keys: List[str], split: str = '|')

Rename keys in input dict.

  • Parameters: keys – key list to rename, in “old_name | new_name” format.
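
The “old_name | new_name” format can be illustrated with a short sketch. `rename_keys` is a hypothetical stand-in for the transform, and stripping whitespace around the separator is an assumption.

```python
def rename_keys(data, keys, split="|"):
    """Rename dict keys given as 'old_name | new_name' strings."""
    out = dict(data)
    for spec in keys:
        old, new = (part.strip() for part in spec.split(split))
        out[new] = out.pop(old)
    return out


renamed = rename_keys({"img": 1, "label": 2}, ["label | gt_label"])
```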

class hat.data.transforms.common.RepeatKeys(keys: List[str], repeat_times: int)

Repeat keys in input dict.

  • Parameters:
    • keys – key list to repeat.
    • repeat_times – keys repeat times.

class hat.data.transforms.common.TaskFilterTransform(task_name: str, transform: Callable)

Apply transform on the assigned task.

  • Parameters: task_name (str) – Name of the assigned task.

class hat.data.transforms.common.TensorToNumpy

Convert tensor to numpy.

class hat.data.transforms.common.ToCUDA(device: int | None = None)

Move Tensor to cuda device.

  • Parameters: device (int , optional) – The destination GPU device idx. Defaults to the current CUDA device.

class hat.data.transforms.common.Undistortion

Convert a PIL Image or numpy.ndarray to an undistorted PIL Image or numpy.ndarray.

class hat.data.transforms.classification.ConvertLayout(hwc2chw: bool = True, keys: List | None = None)

ConvertLayout is used for layout conversion.

NOTE

Affected keys: ‘img’.

  • Parameters:
    • hwc2chw (bool) – Whether to convert hwc to chw.
    • keys (list)

class hat.data.transforms.classification.LabelSmooth(num_classes: int, eta: float = 0.1)

LabelSmooth is used for label smooth.

NOTE

Affected keys: ‘labels’.

  • Parameters:
    • num_classes (int) – Num classes.
    • eta (float) – Eta of label smooth.
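
Label smoothing is commonly written as a blend of the one-hot target with a uniform distribution. The sketch below uses that common formulation, (1 - eta) * one_hot + eta / num_classes; whether LabelSmooth uses exactly this variant is an assumption, and `smooth_labels` is a hypothetical helper.

```python
def smooth_labels(label, num_classes, eta=0.1):
    """Return a smoothed target vector for an integer class label."""
    one_hot = [1.0 if i == label else 0.0 for i in range(num_classes)]
    return [(1.0 - eta) * v + eta / num_classes for v in one_hot]


smoothed = smooth_labels(label=1, num_classes=4, eta=0.1)
```

The smoothed vector still sums to 1, with the true class keeping most of the probability mass.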

class hat.data.transforms.classification.OneHot(num_classes: int)

OneHot is used for converting labels to one-hot format.

NOTE

Affected keys: ‘labels’.

  • Parameters: num_classes (int) – Num classes.

class hat.data.transforms.classification.TimmMixup(*args, **kwargs)

Mixup of timm.

NOTE

Affected keys: ‘img’, ‘labels’.

  • Parameters: args are the same as timm.data.Mixup.

class hat.data.transforms.classification.TimmTransforms(*args, **kwargs)

Transforms of timm.

NOTE

Affected keys: ‘img’.

  • Parameters: args are the same as timm.data.create_transform.

class hat.data.transforms.detection.AlbuImageOnlyTransform(albu_params: List[Dict])

AlbuImageOnlyTransform used on img only.

Composed by list of albu ImageOnlyTransform.

  • Parameters: albu_params – List of albu image-only transforms.

Examples:

dict(
    type="AlbuImageOnlyTransform",
    albu_params=[
        dict(
            name="RandomBrightnessContrast",
            p=0.3,
        ),
        dict(
            name="GaussNoise",
            var_limit=50.0,
            p=0.5,
        ),
        dict(
            name="Blur",
            p=0.2,
            blur_limit=(3, 15),
        ),
        dict(
            name="ToGray",
            p=0.2,
        ),
    ],
)

check_transform(transform)

Check transform is ImageOnlyTransform.

Only ImageOnlyTransform is supported for now.

class hat.data.transforms.detection.AugmentHSV(hgain: float = 0.5, sgain: float = 0.5, vgain: float = 0.5, p: float = 1.0)

Randomly add color disturbance.

Convert RGB img to HSV, and then randomly change the hue, saturation and value.

NOTE

Affected keys: ‘img’.

  • Parameters:
    • hgain (float) – Gain of hue.
    • sgain (float) – Gain of saturation.
    • vgain (float) – Gain of value.
    • p (float) – Prob.

class hat.data.transforms.detection.BoxJitter(exp_ratio: float = 1.0, exp_jitter: float = 0.0, center_shift: float = 0.0)

Jitter box to simulate the box predicted by the model.

Usually used in tasks that use ground truth boxes for training.

  • Parameters:
    • exp_ratio – Ratio of the expansion of box. Defaults to 1.0.
    • exp_jitter – Jitter of expansion ratio. Defaults to 0.0.
    • center_shift – Box center shift range. Defaults to 0.0.

class hat.data.transforms.detection.ColorJitter(brightness: float | Tuple[float] = 0.5, contrast: float | Tuple[float] = (0.5, 1.5), saturation: float | Tuple[float] = (0.5, 1.5), hue: float = 0.1)

Randomly change the brightness, contrast, saturation and hue of an image.

The main differences from ColorJitter in torchvision are support for detection and dict input; the default settings have also been changed to the most common settings.

NOTE

Affected keys: ‘img’.

  • Parameters:
    • brightness (float or tuple of float (min, max)) – How much to jitter brightness.
    • contrast (float or tuple of float (min, max)) – How much to jitter contrast.
    • saturation (float or tuple of float (min, max)) – How much to jitter saturation.
    • hue (float or tuple of float (min, max)) – How much to jitter hue.

class hat.data.transforms.detection.DetAffineAugTransformer(target_wh, flip_prob, scale_type='W', inter_method=10, use_pyramid=True, pyramid_min_step=0.7, pyramid_max_step=0.8, pixel_center_aligned=True, center_aligned=False, rand_scale_range=(1.0, 1.0), rand_translation_ratio=0.0, rand_aspect_ratio=0.0, rand_rotation_angle=0.0, norm_wh=None, norm_scale=None, resize_wh: Tuple[int, int] | List[int] | None = None, min_valid_area=8, min_valid_clip_area_ratio=0.5, min_edge_size=2, clip_bbox=True, keep_aspect_ratio=False, complete_boxes: bool = False)

Affine augmentation for object detection.

  • Parameters:
    • resize_wh – Resize input image to target size, by default None
    • complete_boxes – Using the unclipped boxes, by default False.
    • **kwargs – Please see get_affine_image_resize() and ImageAffineTransform

class hat.data.transforms.detection.DetMosaic(img_scale: Tuple[int, int] = (640, 640), center_ratio_range: Tuple[float, float] = (0.5, 1.5), bbox_clip_border: bool = True, pad_val: float = 114.0, p: float = 1.0, use_cached: bool = True, max_cached_images: int = 40, random_pop: bool = True, max_refetch: int = 15)

Mosaic augmentation for detection task.

  • Parameters:
    • img_scale – Image size after mosaic pipeline of a single image. The size of the output image is four times that of a single image. The output image comprises 4 single images. Default: (640, 640).
    • center_ratio_range – Center ratio range of mosaic output. Defaults to (0.5, 1.5).
    • bbox_clip_border – Whether to clip the objects outside the border of the image. In some dataset like MOT17, the gt bboxes are allowed to cross the border of images. Therefore, we don’t need to clip the gt bboxes in these cases. Defaults to True.
    • pad_val – Pad value. Defaults to 114.
    • p – Probability of applying this transformation. Defaults to 1.0.
    • use_cached – Whether to use cache. Defaults to True.
    • max_cached_images – The maximum length of the cache. The larger the cache, the stronger the randomness of this transform. As a rule of thumb, providing 10 caches for each image suffices for randomness. Defaults to 40.
    • random_pop – Whether to randomly pop a result from the cache when the cache is full. If set to False, use FIFO popping method. Defaults to True.
    • max_refetch – The maximum number of retry iterations for getting valid results from the pipeline. If the number of iterations is greater than max_refetch, but results is still None, then the iteration is terminated and raise the error. Defaults to 15.
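
The cache behavior governed by `max_cached_images` and `random_pop` can be sketched as follows. `update_cache` is a hypothetical helper, and evicting exactly one entry once the cache exceeds its limit is an assumption about the policy.

```python
import random


def update_cache(cache, item, max_cached_images=40, random_pop=True, rng=None):
    """Append `item`, then evict one entry if the cache is over its limit."""
    rng = rng or random.Random()
    cache.append(item)
    if len(cache) > max_cached_images:
        # Random eviction, or FIFO (oldest first) when random_pop is False.
        index = rng.randrange(len(cache)) if random_pop else 0
        cache.pop(index)
    return cache


fifo_cache = []
for i in range(5):
    update_cache(fifo_cache, i, max_cached_images=3, random_pop=False)

random_cache = []
for i in range(5):
    update_cache(random_cache, i, max_cached_images=3, rng=random.Random(0))
```

With FIFO popping the cache always holds the most recent entries, while random popping keeps a mix of older and newer ones.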

get_indexes(dataset)

Create indexes of selected images in dataset.

mix_img_transform(data)

Do data transform.

class hat.data.transforms.detection.DetYOLOv5MixUp(alpha: float = 32.0, beta: float = 32.0, p: float = 1.0, use_cached: bool = True, max_cached_images: int = 20, random_pop: bool = True, max_refetch: int = 15)

MixUp augmentation.

  • Parameters:
    • alpha – parameter of beta distribution to get mixup ratio. Defaults to 32.
    • beta – parameter of beta distribution to get mixup ratio. Defaults to 32.
    • p – Probability of applying this transformation. Defaults to 1.0.
    • use_cached – Whether to use cache. Defaults to True.
    • max_cached_images – The maximum length of the cache. The larger the cache, the stronger the randomness of this transform. As a rule of thumb, providing 10 caches for each image suffices for randomness. Defaults to 20.
    • random_pop – Whether to randomly pop a result from the cache when the cache is full. If set to False, use FIFO popping method. Defaults to True.
    • max_refetch – The maximum number of iterations. If the number of iterations is greater than max_refetch, but gt_bbox is still empty, then the iteration is terminated. Defaults to 15.
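
The role of `alpha` and `beta` can be sketched with the standard library's beta distribution. The example blends two flat lists of pixel values; `mixup_pixels` is a hypothetical helper, and blending as ratio * a + (1 - ratio) * b is the usual MixUp formulation, assumed here.

```python
import random


def mixup_pixels(img_a, img_b, alpha=32.0, beta=32.0, rng=None):
    """Blend two equally sized pixel lists with a Beta(alpha, beta) ratio."""
    rng = rng or random.Random()
    ratio = rng.betavariate(alpha, beta)
    mixed = [ratio * a + (1.0 - ratio) * b for a, b in zip(img_a, img_b)]
    return mixed, ratio


mixed, ratio = mixup_pixels([0.0, 100.0], [100.0, 0.0], rng=random.Random(0))
```

With alpha = beta = 32 the distribution is concentrated around 0.5, so the two images contribute roughly equally.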

get_indexes(dataset)

Create indexes of selected images in dataset.

mix_img_transform(data)

Do data transform.

class hat.data.transforms.detection.FixedCrop(size: Tuple[int] | None = None, min_area: int = -1, min_iou: int = -1, dynamic_roi_params: Dict | None = None, discriminate_ignore_classes: bool | None = False, allow_smaller: bool = False)

Crop image with fixed position and size.

NOTE

Affected keys: ‘img’, ‘img_shape’, ‘pad_shape’, ‘layout’, ‘before_crop_shape’, ‘crop_offset’, ‘gt_bboxes’, ‘gt_classes’.

inverse_transform(inputs: Tensor, task_type: str, inverse_info: Dict)

Inverse operation of the transform, used to map the prediction to the original image.

  • Parameters:
    • inputs (array) – Prediction
    • task_type (str) – detection or segmentation.
    • inverse_info (dict) – The transform keyword is the key, and the corresponding value is the value.

class hat.data.transforms.detection.HueSaturationValue(hue_range: Tuple[float, float] = (-20, 20), sat_range: Tuple[float, float] = (-30, 30), val_range: Tuple[float, float] = (-20, 20), p: float = 0.5)

Randomly change hue, saturation and value of the input image.

Used for uint8 np.ndarray, RGB image input. Unlike AugmentHSV, this transform uses addition to shift value. This transform is the same as albumentations.augmentations.transforms.HueSaturationValue.

  • Parameters:
    • hue_range – range for changing hue. Default: (-20, 20).
    • sat_range – range for changing saturation. Default: (-30, 30).
    • val_range – range for changing value. Default: (-20, 20).
    • p – probability of applying the transform. Default: 0.5.

class hat.data.transforms.detection.IterableDetRoIListTransform(target_wh, flip_prob, img_scale_range=(0.5, 2.0), roi_scale_range=(0.8, 1.25), min_sample_num=1, max_sample_num=1, center_aligned=True, inter_method=10, use_pyramid=True, pyramid_min_step=0.7, pyramid_max_step=0.8, pixel_center_aligned=True, min_valid_area=8, min_valid_clip_area_ratio=0.5, min_edge_size=2, rand_translation_ratio=0, rand_aspect_ratio=0, rand_rotation_angle=0, reselect_ratio=0, clip_bbox=True, rand_sampling_bbox=True, resize_wh=None, keep_aspect_ratio=False, roi_list=None, append_gt=False, complete_boxes=False)

Iterable transformer based on roi list for object detection.

  • Parameters:
    • resize_wh (list/tuple of 2 int , optional) – Resize input image to target size, by default None
    • roi_list (ndarray , optional) – Transform the specified image region
    • append_gt (bool , optional) – Append the groundtruth to roi_list
    • complete_boxes (bool , optional) – Using the unclipped boxes, by default False.
    • **kwargs – Please see AffineMatFromROIBoxGenerator and ImageAffineTransform

class hat.data.transforms.detection.IterableDetRoITransform(target_wh, flip_prob, img_scale_range=(0.5, 2.0), roi_scale_range=(0.8, 1.25), min_sample_num=1, max_sample_num=1, center_aligned=True, inter_method=10, use_pyramid=True, pyramid_min_step=0.7, pyramid_max_step=0.8, pixel_center_aligned=True, min_valid_area=8, min_valid_clip_area_ratio=0.5, min_edge_size=2, rand_translation_ratio=0, rand_aspect_ratio=0, rand_rotation_angle=0, reselect_ratio=0, clip_bbox=True, rand_sampling_bbox=True, resize_wh=None, keep_aspect_ratio=False, complete_boxes=False)

Iterable transformer based on rois for object detection.

  • Parameters:
    • resize_wh (list/tuple of 2 int , optional) – Resize input image to target size, by default None
    • complete_boxes (bool , optional) – Using the unclipped boxes, by default False.
    • **kwargs – Please see AffineMatFromROIBoxGenerator and ImageAffineTransform

class hat.data.transforms.detection.MeanBlur(ksize: int = 3, p: float = 0.5)

Apply mean blur to the input image using a fixed-size kernel.

Used for np.ndarray.

  • Parameters:
    • ksize – maximum kernel size for blurring the input image. Default: 3.
    • p – probability of applying the transform. Default: 0.5.

class hat.data.transforms.detection.MedianBlur(ksize: int = 3, p: float = 0.5)

Apply median blur to the input image using a fixed-size kernel.

Used for np.ndarray.

  • Parameters:
    • ksize – maximum kernel size for blurring the input image. Default: 3.
    • p – probability of applying the transform. Default: 0.5.

class hat.data.transforms.detection.MinIoURandomCrop(min_ious: Tuple[float] = (0.1, 0.3, 0.5, 0.7, 0.9), min_crop_size: float = 0.3, bbox_clip_border: bool = True, repeat_num: int = 50)

Random crop the image & bboxes, the cropped patches have minimum IoU requirement with original image & bboxes, the IoU threshold is randomly selected from min_ious.

NOTE

Affected keys: ‘img’, ‘gt_bboxes’, ‘gt_classes’, ‘gt_difficult’.

  • Parameters:
    • min_ious (tuple) – Minimum IoU threshold for all intersections with bounding boxes.
    • min_crop_size (float) – Minimum crop’s size (i.e. h,w := a*h, a*w, where a >= min_crop_size).
    • bbox_clip_border (bool) – Whether clip the objects outside the border of the image. Defaults to True.
    • repeat_num (int) – Max repeat num for finding available bbox.
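
The crop search can be sketched as a rejection loop: pick an IoU threshold, propose patches, and accept the first patch whose IoU with every box meets the threshold. This is a simplified illustration with hypothetical helpers (`iou`, `sample_crop`); box filtering and clipping after the crop are left out.

```python
import random


def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def sample_crop(img_w, img_h, boxes, min_ious=(0.1, 0.3, 0.5, 0.7, 0.9),
                min_crop_size=0.3, repeat_num=50, rng=None):
    """Return a patch (x1, y1, x2, y2), or None if no valid patch is found."""
    rng = rng or random.Random()
    min_iou = rng.choice(min_ious)
    for _ in range(repeat_num):
        # Patch size is a * w, a * h with a >= min_crop_size.
        w = rng.uniform(min_crop_size * img_w, img_w)
        h = rng.uniform(min_crop_size * img_h, img_h)
        x1 = rng.uniform(0, img_w - w)
        y1 = rng.uniform(0, img_h - h)
        patch = (x1, y1, x1 + w, y1 + h)
        if all(iou(patch, box) >= min_iou for box in boxes):
            return patch
    return None


patch = sample_crop(100, 100, [(0, 0, 100, 100)], min_ious=(0.05,),
                    rng=random.Random(0))
```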

class hat.data.transforms.detection.Mosaic(image_size: int = 512, degrees: int = 10, translate: float = 0.1, scale: float = 0.1, shear: int = 10, perspective: float = 0.0, mixup: bool = True)

Mosaic augmentation for detection task.

  • Parameters:
    • image_size – Image size after mosaic pipeline. Default: (512, 512).
    • degrees – Rotation degree. Defaults to 10.
    • translate – translate value for warpPerspective. Defaults to 0.1.
    • scale – Random scale value. Defaults to 0.1.
    • shear – Shear value for warpPerspective. Defaults to 10.
    • perspective – perspective value for warpPerspective. Defaults to 0.0.
    • mixup – Whether use mixup. Defaults to True.

class hat.data.transforms.detection.Normalize(mean: float | Sequence[float], std: float | Sequence[float], raw_norm: bool = False, split_transform: bool = False)

Normalize image.

NOTE

Affected keys: ‘img’, ‘layout’.

  • Parameters:
    • mean – mean of normalize.
    • std – std of normalize.
    • raw_norm (bool) – Whether to open raw_norm.

class hat.data.transforms.detection.PadTensorListToBatch(pad_val: int = 0, seg_pad_val: int | None = 255)

List of image tensors to be stacked vertically.

Used for lists of tensors with different shapes.

  • Parameters:
    • pad_val – Values to be filled in padding areas for img. Default to 0.
    • seg_pad_val – Value to be filled in padding areas for gt_seg. Default to 255.
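
One way to picture padding differently shaped images into a single batch is the NumPy sketch below. It is not the HAT code: the helper name `pad_to_batch` is hypothetical, arrays stand in for tensors, and padding on the bottom and right edges is an assumption.

```python
import numpy as np


def pad_to_batch(imgs, pad_val=0):
    """Pad CHW arrays to the largest height and width, then stack to (N, C, H, W)."""
    max_h = max(im.shape[1] for im in imgs)
    max_w = max(im.shape[2] for im in imgs)
    batch = np.full(
        (len(imgs), imgs[0].shape[0], max_h, max_w), pad_val, dtype=imgs[0].dtype
    )
    for i, im in enumerate(imgs):
        # Assumption: each image sits in the top-left corner of its slot.
        batch[i, :, : im.shape[1], : im.shape[2]] = im
    return batch


a = np.ones((3, 2, 3), dtype=np.float32)
b = np.ones((3, 4, 2), dtype=np.float32)
batch = pad_to_batch([a, b], pad_val=0)
```

A segmentation label would be padded the same way, with `seg_pad_val` in place of `pad_val`.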

class hat.data.transforms.detection.PlainCopyPaste(min_ins_num: int = 1, cp_prob: float = 0.0)

Copy and paste instances plainly.

  • Parameters:
    • min_ins_num – Min instances num of the image after paste.
    • cp_prob – Probability of applying this transformation.

class hat.data.transforms.detection.PresetCrop(crop_top: int = 220, crop_bottom: int = 128, crop_left: int = 0, crop_right: int = 0, min_area: float = -1, min_iou: float = -1, truncate_gt: bool = True)

Crop image with preset roi param.

inverse_transform(inputs: Tensor, task_type: str, inverse_info: Dict)

Inverse operation of the transform, mapping the prediction back to the original image.

  • Parameters:
    • inputs – Prediction
    • task_type – detection or segmentation.
    • inverse_info – not used yet.

class hat.data.transforms.detection.RGBShift(r_shift_limit: Tuple[float, float] = (-20, 20), g_shift_limit: Tuple[float, float] = (-20, 20), b_shift_limit: Tuple[float, float] = (-20, 20), p: float = 0.5)

Randomly shift values for each channel of the input image.

Used for np.ndarray. This transform is the same as albumentations.augmentations.transforms.RGBShift.

  • Parameters:
    • r_shift_limit – range for changing values for the red channel. Default: (-20, 20).
    • g_shift_limit – range for changing values for the green channel. Default: (-20, 20).
    • b_shift_limit – range for changing values for the blue channel. Default: (-20, 20).
    • p – probability of applying the transform. Default: 0.5.

class hat.data.transforms.detection.RandomBrightnessContrast(brightness_limit: Tuple[float, float] = (-0.2, 0.2), contrast_limit: Tuple[float, float] = (-0.2, 0.2), brightness_by_max: bool = True, p=0.5)

Randomly change brightness and contrast of the input image.

Used for uint8 np.ndarray. This transform is the same as albumentations.augmentations.transforms.RandomBrightnessContrast.

  • Parameters:
    • brightness_limit – factor range for changing brightness. Default: (-0.2, 0.2).
    • contrast_limit – factor range for changing contrast. Default: (-0.2, 0.2).
    • brightness_by_max – If True adjust contrast by image dtype maximum, else adjust contrast by image mean.
    • p – probability of applying the transform. Default: 0.5.

class hat.data.transforms.detection.RandomExpand(mean: Tuple = (0, 0, 0), ratio_range: Tuple = (1, 4), prob: float = 0.5)

Random expand the image & bboxes.

Randomly place the original image on a canvas of ‘ratio’ x original image size filled with mean values. The ratio is in the range of ratio_range.

NOTE

Affected keys: ‘img’, ‘gt_bboxes’.

  • Parameters:
    • ratio_range (tuple) – range of expand ratio.
    • prob (float) – probability of applying this transformation
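
The canvas placement described above can be sketched with NumPy as follows. The helper name `random_expand`, the HWC layout, and the (x1, y1, x2, y2) box order are assumptions for illustration; the `prob` gate is omitted.

```python
import numpy as np


def random_expand(img, bboxes, mean=(0, 0, 0), ratio_range=(1, 4), rng=None):
    """Place an HWC image at a random offset on a larger mean-filled canvas."""
    rng = np.random.default_rng() if rng is None else rng
    h, w, c = img.shape
    ratio = rng.uniform(*ratio_range)
    new_h, new_w = int(h * ratio), int(w * ratio)
    canvas = np.empty((new_h, new_w, c), dtype=img.dtype)
    canvas[...] = np.asarray(mean, dtype=img.dtype)
    top = int(rng.integers(0, new_h - h + 1))
    left = int(rng.integers(0, new_w - w + 1))
    canvas[top : top + h, left : left + w] = img
    # Boxes move by the same offset as the pasted image.
    shifted = bboxes + np.array([left, top, left, top], dtype=bboxes.dtype)
    return canvas, shifted, (top, left)


img = np.ones((4, 6, 3), dtype=np.uint8)
boxes = np.array([[1, 1, 3, 2]], dtype=np.float32)
canvas, new_boxes, (top, left) = random_expand(
    img, boxes, rng=np.random.default_rng(0)
)
```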

class hat.data.transforms.detection.RandomFlip(px: float | None = 0.5, py: float | None = 0)

Flip image & bbox & mask & seg & flow.

NOTE

Affected keys: ‘img’, ‘ori_img’, ‘img_shape’, ‘pad_shape’, ‘gt_bboxes’, ‘gt_tanalphas’, ‘gt_seg’, ‘gt_flow’, ‘gt_mask’, ‘gt_ldmk’, ‘ldmk_pairs’.

  • Parameters:
    • px – Horizontal flip probability, range between [0, 1].
    • py – Vertical flip probability, range between [0, 1].
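
To make the effect on boxes concrete, here is a small NumPy sketch of a horizontal flip. It is an illustration only: the helper name `hflip` is hypothetical, boxes are assumed to be (x1, y1, x2, y2) in continuous pixel-edge coordinates, and the other affected keys (masks, flow, landmarks) are not handled.

```python
import numpy as np


def hflip(img, bboxes):
    """Horizontally flip an HWC image and its (x1, y1, x2, y2) boxes."""
    w = img.shape[1]
    flipped = img[:, ::-1].copy()
    out = bboxes.copy()
    # The old right edge becomes the new left edge, and vice versa.
    out[:, 0] = w - bboxes[:, 2]
    out[:, 2] = w - bboxes[:, 0]
    return flipped, out


img = np.arange(12).reshape(2, 6, 1)
boxes = np.array([[1.0, 0.0, 3.0, 2.0]])
flipped, flipped_boxes = hflip(img, boxes)
```

A random version would call this helper only when a uniform draw in [0, 1) falls below `px`; a vertical flip with `py` mirrors the y coordinates in the same way.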

class hat.data.transforms.detection.RandomResizedCrop(height: int, width: int, scale: Tuple[float, float] = (0.08, 1.0), ratio: Tuple[float, float] = (0.75, 1.3333333333333333), interpolation: int = 1, p: float = 1.0)

Torchvision’s variant of crop: crop a random part of the input and rescale it to the given size.

Used for np.ndarray. This transform is the same as albumentations.augmentations.transforms.RandomResizedCrop.

  • Parameters:
    • height – height after crop and resize.
    • width – width after crop and resize.
    • scale – range of the cropped size relative to the original size.
    • ratio – range of the cropped aspect ratio relative to the original aspect ratio.
    • interpolation – flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.
    • p – probability of applying the transform. Default: 1.

class hat.data.transforms.detection.Resize(img_scale: Sequence[int] | Sequence[Sequence[int]] | None = None, max_scale: Sequence[int] | Sequence[Sequence[int]] | None = None, multiscale_mode: str = 'range', ratio_range: Tuple[float, float] | None = None, keep_ratio: bool = True, pad_to_keep_ratio: bool = False, raw_scaler_enable: bool = False, sample1c_enable: bool = False, divisor: int = 1, rm_neg_coords: bool = True, split_transform: bool = False, split_trans_w: int = 256, split_trans_h: int = 256)

Resize image & bbox & mask & seg.

NOTE

Affected keys: ‘img’, ‘ori_img’, ‘img_shape’, ‘pad_shape’, ‘resized_shape’, ‘scale_factor’, ‘gt_bboxes’, ‘gt_seg’, ‘gt_ldmk’.

  • Parameters:
    • img_scale – See above.
    • max_scale – The max size of image. If the image’s shape > max_scale, the image is resized to max_scale.
    • multiscale_mode – Value must be one of “max_size”, “range” or “value”. This transform resizes the input image and bbox to the same scale factor. There are 3 multiscale modes:
      • ‘ratio_range’ is not None: randomly sample a ratio from the ratio range and multiply it with the image scale. e.g. Resize(img_scale=(400, 500), multiscale_mode='range', ratio_range=(0.5, 2.0))
      • ‘ratio_range’ is None and ‘multiscale_mode’ == “range”: randomly sample a scale from a range; the length of img_scale[tuple] must be 2, representing the small img_scale and the large img_scale. e.g. Resize(img_scale=((100, 200), (400, 500)), multiscale_mode='range')
      • ‘ratio_range’ is None and ‘multiscale_mode’ == “value”: randomly sample a scale from multiple scales. e.g. Resize(img_scale=((100, 200), (300, 400), (400, 500)), multiscale_mode='value')
    • ratio_range – Scale factor range like (min_ratio, max_ratio).
    • keep_ratio – Whether to keep the aspect ratio when resizing the image.
    • pad_to_keep_ratio – Whether to pad image to keep the same shape and aspect ratio when resizing the image to target shape.
    • raw_scaler_enable – Whether to enable the raw scaler when resizing the image.
    • sample1c_enable – Whether to sample one channel after resizing the image.
    • divisor – Width and height are rounded to multiples of divisor.
    • rm_neg_coords – Whether to remove negative coordinates.
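
The three multiscale modes listed for multiscale_mode can be sketched as a small scale sampler. This is an illustration rather than the HAT logic: the function name `sample_scale` is hypothetical, each dimension is sampled independently in “range” mode (an assumption), and the “max_size” mode, `max_scale`, and `keep_ratio` are not modeled.

```python
import random


def sample_scale(img_scale, multiscale_mode="range", ratio_range=None, rng=random):
    """Pick a target scale following the three modes described above."""
    if ratio_range is not None:
        # Mode 1: one base scale multiplied by a random ratio.
        ratio = rng.uniform(*ratio_range)
        return int(img_scale[0] * ratio), int(img_scale[1] * ratio)
    if multiscale_mode == "range":
        # Mode 2: img_scale holds exactly two scales (small, large).
        (a0, a1), (b0, b1) = img_scale
        first = rng.randint(min(a0, b0), max(a0, b0))
        second = rng.randint(min(a1, b1), max(a1, b1))
        return first, second
    if multiscale_mode == "value":
        # Mode 3: pick one of the listed scales.
        return tuple(rng.choice(img_scale))
    raise ValueError(f"unsupported multiscale_mode: {multiscale_mode}")


s_ratio = sample_scale((400, 500), ratio_range=(0.5, 2.0), rng=random.Random(0))
s_range = sample_scale(((100, 200), (400, 500)), "range", rng=random.Random(0))
s_value = sample_scale(
    ((100, 200), (300, 400), (400, 500)), "value", rng=random.Random(0)
)
```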

inverse_transform(inputs: ndarray | Tensor, task_type: str, inverse_info: Dict)

Inverse operation of the transform, mapping the prediction back to the original image.

  • Parameters:
    • inputs – Prediction.
    • task_type (str) – detection or segmentation.
    • inverse_info (dict) – The transform keyword is the key, and the corresponding value is the value.

class hat.data.transforms.detection.Resize3D(img_scale=None, multiscale_mode='range', ratio_range=None, keep_ratio=True, bbox_clip_border=True, backend='cv2', interpolation='nearest', override=False, cam2img_keep_ratio=False)

Resize 3D labels.

Different from 2D Resize, we accept img_scale=None and ratio_range is not None. In that case we will take the input img scale as the ori_scale for rescaling with ratio_range.

  • Parameters:
    • img_scale – Images scales for resizing.
    • multiscale_mode – Either “range” or “value”.
    • ratio_range – (min_ratio, max_ratio).
    • keep_ratio – Whether to keep the aspect ratio when resizing the image.
    • bbox_clip_border – Whether to clip the objects outside the border of the image.
    • backend (str) – Image resize backend, choices are ‘cv2’ and ‘pillow’.
    • interpolation (str) – Interpolation method, accepted values are “nearest”, “bilinear”, “bicubic”, “area”, “lanczos” for ‘cv2’ backend, “nearest”, “bilinear” for ‘pillow’ backend.
    • override (bool , optional) – Whether to override scale and scale_factor so as to call resize twice.

class hat.data.transforms.detection.ShiftScaleRotate(shift_limit: Tuple[float, float] = (-0.0625, 0.0625), scale_limit: Tuple[float, float] = (-0.1, 0.1), rotate_limit: Tuple[float, float] = (-45.0, 45.0), interpolation: int = 1, border_mode: int = 4, value: int | None = None, p: float = 0.5)

Randomly apply affine transforms: translate, scale and rotate the input.

Used for np.ndarray hwc img. This transform is the same as albumentations.augmentations.transforms.ShiftScaleRotate.

  • Parameters:
    • shift_limit – shift factor range for both height and width. Absolute values for lower and upper bounds should lie in range [0, 1]. Default: (-0.0625, 0.0625).
    • scale_limit – scaling factor range. Default: (-0.1, 0.1).
    • rotate_limit – rotation range. Default: (-45, 45).
    • interpolation – flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.
    • border_mode – flag that is used to specify the pixel extrapolation method. Should be one of: cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101. Default: cv2.BORDER_REFLECT_101
    • value – padding value if border_mode is cv2.BORDER_CONSTANT.
    • p – probability of applying the transform. Default: 0.5.

class hat.data.transforms.detection.ToFasterRCNNData(max_gt_boxes_num: int = 500, max_ig_regions_num: int = 500)

Prepare faster-rcnn input data.

Convert gt_bboxes (n, 4) & gt_classes (n, ) to gt_boxes (n, 5), gt_boxes_num (1, ), ig_regions (m, 5), ig_regions_num (m, ). If gt_ids exists, it will be concatenated into gt_boxes, expanding the gt_boxes array shape from nx5 to nx6.

Convert key img_shape to im_hw; convert image layout to chw.

  • Parameters:
    • max_gt_boxes_num (int) – Max gt bboxes number in one image, Default 500.
    • max_ig_regions_num (int) – Max ignore regions number in one image, Default 500.
  • Returns: Result dict with: gt_boxes (max_gt_boxes_num, 5 or 6), gt_boxes_num (1, ), ig_regions (max_ig_regions_num, 5 or 6), ig_regions_num (1, ), im_hw (2,); layout converted to “chw”.
  • Return type: dict
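
The fixed-size packing of boxes described above might look like the NumPy sketch below. The helper name `pack_gt_boxes` is hypothetical, and zero padding, truncation beyond `max_gt_boxes_num`, and the (x1, y1, x2, y2, class) column order are assumptions; ignore regions and gt_ids are left out.

```python
import numpy as np


def pack_gt_boxes(gt_bboxes, gt_classes, max_gt_boxes_num=500):
    """Stack boxes and classes into a zero-padded (max_gt_boxes_num, 5) array."""
    n = min(len(gt_bboxes), max_gt_boxes_num)
    gt_boxes = np.zeros((max_gt_boxes_num, 5), dtype=np.float32)
    gt_boxes[:n, :4] = gt_bboxes[:n]
    gt_boxes[:n, 4] = gt_classes[:n]
    # Record how many rows hold real boxes.
    gt_boxes_num = np.array([n], dtype=np.float32)
    return gt_boxes, gt_boxes_num


bboxes = np.array([[0, 0, 10, 10], [5, 5, 20, 30]], dtype=np.float32)
classes = np.array([1, 3], dtype=np.float32)
gt_boxes, gt_boxes_num = pack_gt_boxes(bboxes, classes)
```

A fixed first dimension lets samples with different numbers of boxes be stacked into one batch, with `gt_boxes_num` marking the valid rows.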

class hat.data.transforms.detection.ToLdmkRCNNData(num_ldmk=15, max_gt_boxes_num=1000, max_ig_regions_num=1000)

Transform the dataset into the input format RCNN needs.

This class is used to stack landmark with boxes, and typically used to facilitate landmark and boxes matching in anchor-based model.

  • Parameters:
    • num_ldmk – Number of landmark. Defaults to 15.
    • max_gt_boxes_num – Max gt bboxes number in one image. Defaults to 1000.
    • max_ig_regions_num – Max ignore regions number in one image. Defaults to 1000.

class hat.data.transforms.detection.ToMultiTaskFasterRCNNData(taskname_clsidx_map: Dict[str, int], max_gt_boxes_num: int = 500, max_ig_regions_num: int = 500, num_ldmk: int = 15)

Convert multi-classes detection data to multi-task data.

Each class will be converted to a detection task.

  • Parameters:
    • taskname_clsidx_map – {cls1: cls_idx1, cls2: cls_idx2}.
    • max_gt_boxes_num – Same as ToFasterRCNNData. Defaults to 500.
    • max_ig_regions_num – Same as ToFasterRCNNData. Defaults to 500.
    • num_ldmk – Number of human ldmk. Defaults to 15.
  • Returns: Result dict with: “task1”: FasterRCNNDataDict1, “task2”: FasterRCNNDataDict2, …
  • Return type: dict

class hat.data.transforms.detection.ToPositionFasterRCNNData(max_gt_boxes_num: int = 500, max_ig_regions_num: int = 500)

Transform the person position dataset into the input format RCNN needs.

This class is used to stack position label with boxes and camera type, and typically used to facilitate position label and boxes matching in anchor-based model.

class hat.data.transforms.detection.ToTensor(to_yuv: bool = False, use_yuv_v2: bool = True, split_transform: bool = False)

Convert objects of various python types to torch.Tensor and convert the img to yuv444 format if to_yuv is True.

Supported types are: numpy.ndarray, torch.Tensor, Sequence, int, float.

NOTE

Affected keys: ‘img’, ‘img_shape’, ‘pad_shape’, ‘layout’, ‘gt_bboxes’, ‘gt_seg’, ‘gt_seg_weights’, ‘gt_flow’, ‘color_space’.

  • Parameters:
    • to_yuv – If true, convert the img to yuv444 format.
    • use_yuv_v2 – If true, use BgrToYuv444V2 when converting img to yuv format.

class hat.data.transforms.faceid.Contrast(p: float = 0.08, contrast: float = 0.5)

Randomly jitters image contrast with a factor.

NOTE

Affected keys: ‘img’.

  • Parameters:
    • p – prob
    • contrast – How much to jitter contrast. The contrast jitter ratio range, [0, 1].

class hat.data.transforms.faceid.GaussianBlur(p: float = 0.08, kernel_size_min: int = 2, kernel_size_max: int = 9, sigma_min: float = 0.0, sigma_max: float = 0.0)

Randomly add Gaussian blur on an image.

NOTE

Affected keys: ‘img’.

  • Parameters:
    • p – prob
    • kernel_size_min – min size of Gaussian kernel
    • kernel_size_max – max size of Gaussian kernel
    • sigma_min – min sigma of Gaussian kernel
    • sigma_max – max sigma of Gaussian kernel

class hat.data.transforms.faceid.JPEGCompress(p: float = 0.08, max_quality: int = 95, min_quality: int = 30)

Do JPEG compression to downgrade image quality.

NOTE

Affected keys: ‘img’.

  • Parameters:
    • p – prob
    • max_quality – (0, 100] JPEG compression highest quality
    • min_quality – (0, 100] JPEG compression lowest quality

class hat.data.transforms.faceid.MotionBlur(p: float = 0.08, length_min: int = 9, length_max: int = 18, angle_min: float = 1, angle_max: float = 359)

Randomly add motion blur on an image.

NOTE

Affected keys: ‘img’.

  • Parameters:
    • p – prob
    • length_min – min size of motion blur
    • length_max – max size of motion blur
    • angle_min – min angle of motion blur
    • angle_max – max angle of motion blur

class hat.data.transforms.faceid.RandomDownSample(p: float = 0.2, data_shape: Tuple | None = (3, 112, 112), min_downsample_width: int = 60, inter_method: int = 1)

Downsample the image first, then upsample it back to the original size.

NOTE

Affected keys: ‘img’.

  • Parameters:
    • p – prob
    • data_shape – C, H, W
    • min_downsample_width – minimum downsample width
    • inter_method – interpolation method index

class hat.data.transforms.faceid.RandomGray(p: float = 0.08, rgb_data: bool = True, only_one_channel: bool = False)

Transform RGB or BGR format into Gray format.

NOTE

Affected keys: ‘img’.

  • Parameters:
    • p – prob
    • rgb_data – Whether the input data is in RGB format. If not, it should be in BGR format. Defaults to True.
    • only_one_channel – If true, the returned gray image contains only one channel. Defaults to False.

class hat.data.transforms.faceid.SpatialVariantBrightness(p: float = 0.08, brightness: float = 0.6, max_template_type: int = 3, online_template: bool = False)

Spatial variant brightness, Enhanced Edition. Powered by xin.wang@horizon.ai.

NOTE

Affected keys: ‘img’.

  • Parameters:
    • p – prob
    • brightness – Brightness ratio for this augmentation; the value is drawn from Uniform ~ [-brightness, brightness]. Default is 0.6.
    • max_template_type – Max number of template types in one process. Note that the selection process is repeated. Default is 3.
    • online_template – Whether the template is generated online or offline. “False” is recommended for speed. Default is False.

class hat.data.transforms.flashocc_transforms.BevFeatureAug(bda_aug_conf: Dict, is_train: bool = True)

Augment bev feature.

  • Parameters: bda_aug_conf – a dict including augmentation transform, e.g. bda_aug_conf = dict(rot_lim=(-0.0, 0.0), scale_lim=(1.0, 1.0), flip_dx_ratio=0.5, flip_dy_ratio=0.5). Its keys are:
    • rot_lim: Random rotation angle range.
    • scale_lim: The range of random scaling, in [0-1].
    • flip_dx_ratio: Probability for horizontal flip.
    • flip_dy_ratio: Probability for vertical flip.

sample_bda_augmentation()

Generate bda augmentation values based on bda_config.
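
A minimal sketch of how such values could be drawn from a bda_aug_conf-style dict is shown below. It uses only the standard library and is an assumption about the behavior, not the HAT source: the standalone function, its `rng` argument, the returned tuple, and the “no augmentation when not training” branch are all illustrative choices.

```python
import random


def sample_bda_augmentation(bda_aug_conf, is_train=True, rng=random):
    """Draw (rotate, scale, flip_dx, flip_dy) from a bda_aug_conf-style dict."""
    if not is_train:
        # Assumption: identity values outside training.
        return 0.0, 1.0, False, False
    rotate = rng.uniform(*bda_aug_conf["rot_lim"])
    scale = rng.uniform(*bda_aug_conf["scale_lim"])
    # Each flip is applied with its configured probability.
    flip_dx = rng.random() < bda_aug_conf["flip_dx_ratio"]
    flip_dy = rng.random() < bda_aug_conf["flip_dy_ratio"]
    return rotate, scale, flip_dx, flip_dy


conf = dict(
    rot_lim=(-0.0, 0.0), scale_lim=(1.0, 1.0), flip_dx_ratio=0.0, flip_dy_ratio=1.0
)
rotate, scale, flip_dx, flip_dy = sample_bda_augmentation(conf, rng=random.Random(0))
```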

class hat.data.transforms.flashocc_transforms.ImageAugmentation(data_config: dict, is_train: bool = False)

Augment PIL Images according to the given data_config.

  • Parameters:
    • is_train – if it is for training. default False.
    • data_config – Dictionary containing data augmentation transformations, such as resize, crop, flip, etc.

ego2img_add_post(ego2img, post_tran, post_rot)

Update image enhancement transformation to ego2img matrix.

  • Parameters:
    • ego2img – (4, 4)
    • post_tran – (3,)
    • post_rot – (3,3)
  • Returns: ego2img: (4, 4)

img_augs_transform(data: Dict, flip: bool | None = None, scale: float | None = None)

Img augmentation transform.

  • Parameters:
    • data
    • flip
    • scale
  • Returns:
    • imgs: (N_views, 3, H, W), N_views = 6 * (N_history + 1)
    • sensor2egos: (N_views, 4, 4)
    • ego2globals: (N_views, 4, 4)
    • intrins: (N_views, 3, 3)
    • post_rots: (N_views, 3, 3)
    • post_trans: (N_views, 3)
    • ego2img: (N_views, 4, 4)

img_transform(img, post_rot, post_tran, resize, resize_dims, crop, flip, rotate)

Image transform.

  • Parameters:
    • img – PIL.Image
    • post_rot – torch.eye(2)
    • post_tran – torch.eye(2)
    • resize – float, resize ratio.
    • resize_dims – Tuple(W, H), size after resize
    • crop – (crop_w, crop_h, crop_w + fW, crop_h + fH)
    • flip – bool
    • rotate – float
  • Returns:
    • img: PIL.Image
    • post_rot: Tensor (2, 2)
    • post_tran: Tensor (2, )

sample_augmentation(H: int, W: int, flip: bool | None = None, scale: float | None = None)

Sample augmentation.

  • Parameters:
    • H
    • W
    • flip
    • scale
  • Returns:
    • resize: resize ratio, float.
    • resize_dims: (resize_W, resize_H)
    • crop: (crop_w, crop_h, crop_w + fW, crop_h + fH)
    • flip: 0 / 1
    • rotate: Random rotation angle, float

class hat.data.transforms.grid_mask.GridMask(use_h: bool, use_w: bool, rotate: float = 1.0, offset: bool = False, ratio: float = 0.5, limit_d_ratio_min: float = 0.0, limit_d_ratio_max: float = 1.0, mode: int = 0, prob: float = 1.0)

Generate GridMask for grid masking augmentation.

  • Parameters:
    • use_h (bool) – if gen grid for height dim.
    • use_w (bool) – if gen grid for width dim.
    • rotate (float) – Rotation of grid mesh.
    • offset (bool) – if randomly add offset.
    • ratio (float) – black grid mask ratio.
    • limit_d_ratio_min (float) – min black add white mask ratio.
    • limit_d_ratio_max (float) – max black add white mask ratio.
    • mode (int) – 0 or 1, if use ~mask.
    • prob (float) – probability of occurrence.

class hat.data.transforms.keypoints.AddGaussianNoise(prob: float, mean: float = 0, sigma: float = 2)

Generate gaussian noise on img.

  • Parameters:
    • prob – Prob to generate gaussian noise.
    • mean – Mean of gaussian distribution. Defaults to 0.
    • sigma – Sigma of gaussian distribution. Defaults to 2.

class hat.data.transforms.keypoints.GenerateHeatmapTarget(num_ldmk: int, feat_stride: int, heatmap_shape: Tuple[int], sigma: float)

GenerateHeatmapTarget is a class for generating heatmap targets.

This class generates heatmap targets for a given number of landmarks using a Gaussian distribution.

  • Parameters:
    • num_ldmk – The number of landmarks.
    • feat_stride – The stride of the feature map.
    • heatmap_shape – The shape of the heatmap (height, width).
    • sigma – The standard deviation for the Gaussian kernel.
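
For intuition, Gaussian heatmap targets of this kind can be rendered with a few lines of NumPy. The sketch is not the HAT implementation: the function name `gaussian_heatmaps`, the (x, y) landmark order, and dividing image coordinates by `feat_stride` to reach the feature-map grid are assumptions.

```python
import numpy as np


def gaussian_heatmaps(ldmk, feat_stride, heatmap_shape, sigma):
    """Render one Gaussian heatmap per landmark.

    ldmk is (num_ldmk, 2) in image (x, y) pixels; heatmap_shape is (height, width).
    """
    h, w = heatmap_shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float32)
    # Map image coordinates onto the feature-map grid.
    centers = np.asarray(ldmk, dtype=np.float32) / feat_stride
    cx = centers[:, 0, None, None]
    cy = centers[:, 1, None, None]
    d2 = (xs[None] - cx) ** 2 + (ys[None] - cy) ** 2
    return np.exp(-d2 / (2.0 * sigma**2))


heatmaps = gaussian_heatmaps(
    [[8, 4], [0, 0]], feat_stride=4, heatmap_shape=(4, 4), sigma=1.0
)
```

Each map peaks at 1.0 on the landmark's grid cell and decays with distance at a rate set by `sigma`.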

class hat.data.transforms.keypoints.RandomPadLdmkData(size: Tuple[int], random: bool = True)

RandomPadLdmkData is a class for randomly padding landmark data.

  • Parameters:
    • size – The target size for padding.
    • random – Whether to apply random padding. Defaults to True.

class hat.data.transforms.lidar.BBoxSelector(category2id_map: Dict, vcs_range: Tuple[float, float, float, float], min_points_in_gt: int = 0)

Filter out GT BBoxes.

Support multiframe and multimodal data.

class hat.data.transforms.lidar.DetectionTargetGenerator(feature_stride: int, id2label: Dict, pc_range: Tuple[float, ...], feat_shape: Tuple[int, int], voxel_size: Tuple[float, float, float], to_bev: bool = True, max_objs: int = 500, min_gaussian_overlap: float = 0.1, min_gaussian_radius: float = 2.0, use_gaussian_reg_loss: bool = False)

Create detection training targets.

class hat.data.transforms.lidar.ParsePointCloud(dtype: ~numpy.dtype = <class 'numpy.float32'>, load_dim: int = 4, keep_dim: int = 4)

Parse point cloud from bytes to numpy array.

class hat.data.transforms.lidar.Point2VCS(shuffle_points: bool = False)

Transform pointclouds from lidar CS to VCS.

class hat.data.transforms.lidar.Voxelization(range: Tuple[float, ...], voxel_size: Tuple[float, float, float], max_points_in_voxel: int, max_voxel_num: int = 20000, voxel_key: str = 'voxel', nframe: int = 1)

Perform voxelization for points in multiple frames.

class hat.data.transforms.multi_views.BevFeatureFlip(prob_x: float, prob_y: float, bev_size: Tuple[float, float, float])

Flip bev feature.

  • Parameters:
    • bev_size – Size of bev view.
    • prob_x – Probability for horizontal.
    • prob_y – Probability for vertical.

class hat.data.transforms.multi_views.BevFeatureRotate(bev_size: Tuple[float, float, float], rot: Tuple[float, float] = (-0.3925, 0.3925))

Rotate feat.

  • Parameters:
    • bev_size – Size of bev view.
    • rot – Rotate radian.

class hat.data.transforms.multi_views.MultiViewsGridMask(**kwargs)

For grid masking augmentation.

class hat.data.transforms.multi_views.MultiViewsImgCrop(size: Tuple[int, int], random: bool = False)

Crop PIL Images to the given size and modify intrinsics.

  • Parameters:
    • size – Desired output size. If size is a sequence like (h, w), output size will be matched to this.
    • random – Whether choosing min x randomly.

class hat.data.transforms.multi_views.MultiViewsImgFlip(prob: float = 0.5)

Flip PIL Images and modify intrinsics.

  • Parameters: prob – Probability of flipping the image.

class hat.data.transforms.multi_views.MultiViewsImgResize(size: Tuple[int, int] | None = None, scales: Tuple[float, float] | None = None, interpolation: str = 'bilinear')

Resize PIL Images to the given size and modify intrinsics.

  • Parameters:
    • size – Desired output size. If size is a sequence like (h, w), output size will be matched to this.
    • scales – Scale range for random sampling.
    • interpolation – Desired interpolation. Default is ‘bilinear’.

class hat.data.transforms.multi_views.MultiViewsImgRotate(rot: Tuple[float, float])

Rotate PIL Images.

  • Parameters: rot – Rotate angle.

class hat.data.transforms.multi_views.MultiViewsImgTransformWrapper(transforms: Sequence[Module])

Wrapper img transform for image inputs.

  • Parameters: transforms – List of image transforms.

class hat.data.transforms.multi_views.MultiViewsSpiltImgTransformWrapper(transforms: Sequence[Module], numsplit: int = 3)

Wrapper split img transform for image inputs.

  • Parameters: transforms – List of image transforms.

class hat.data.transforms.segmentation.LabelRemap(mapping: Sequence)

Remap labels.

NOTE

Affected keys: ‘gt_seg’.

  • Parameters: mapping (Sequence) – Mapping from input to output.
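
Label remapping with a sequence is commonly done as a lookup table, sketched below in NumPy. It assumes that `mapping[i]` is the output label for input label `i`, which the description does not state explicitly; the helper name `remap_labels` is hypothetical.

```python
import numpy as np


def remap_labels(gt_seg, mapping):
    """Replace every label i in gt_seg with mapping[i] via array indexing."""
    lut = np.asarray(mapping)
    return lut[gt_seg]


gt_seg = np.array([[0, 1], [2, 1]])
remapped = remap_labels(gt_seg, mapping=[255, 0, 1])
```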

class hat.data.transforms.segmentation.Scale(scales: Real | Sequence, mode: str = 'nearest', mul_scale: bool = False)

Scale input according to a scale list.

NOTE

Affected keys: ‘img’, ‘gt_flow’, ‘gt_ori_flow’, ‘gt_seg’.

  • Parameters:
    • scales (Union[Real, Sequence]) – The scales to apply on input.
    • mode (str) – algorithm used for upsampling: 'nearest' | 'bilinear' | 'area'. Default: 'nearest'
    • mul_scale (bool) – Whether to multiply the scale coefficient.

class hat.data.transforms.segmentation.SegOneHot(num_classes: int)

SegOneHot converts the label to one-hot format.

NOTE

Affected keys: ‘gt_seg’.

  • Parameters: num_classes (int) – Num classes.
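
A one-hot conversion of a segmentation label can be sketched as follows. The helper name `seg_one_hot`, the (num_classes, H, W) output layout, and the float32 dtype are assumptions made for illustration.

```python
import numpy as np


def seg_one_hot(gt_seg, num_classes):
    """Turn an (H, W) integer label map into a (num_classes, H, W) one-hot array."""
    classes = np.arange(num_classes).reshape(-1, 1, 1)
    # Channel c is 1 where the label equals c and 0 elsewhere.
    return (gt_seg[None] == classes).astype(np.float32)


gt_seg = np.array([[0, 2], [1, 1]])
one_hot = seg_one_hot(gt_seg, num_classes=3)
```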

class hat.data.transforms.segmentation.SegRandomAffine(degrees: Sequence | float = 0, translate: Tuple = None, scale: Tuple = None, shear: Sequence | float = None, interpolation: InterpolationMode = InterpolationMode.NEAREST, fill: tuple | int = 0, label_fill_value: tuple | int = -1, rotate_p: float = 1.0, translate_p: float = 1.0, scale_p: float = 1.0)

Apply a random affine transformation to both image and label.

Please refer to RandomAffine for details.

NOTE

Affected keys: ‘img’, ‘gt_flow’, ‘gt_seg’.

  • Parameters:
    • label_fill_value (tuple or int , optional) – Fill value for label. Defaults to -1.
    • translate_p – Translate probability, range between [0, 1].
    • scale_p – Scale probability, range between [0, 1].

class hat.data.transforms.segmentation.SegRandomCrop(size, cat_max_ratio=1.0, ignore_index=255)

Random crop on data with gt_seg label; can only be used for segmentation tasks.

NOTE

Affected keys: ‘img’, ‘img_shape’, ‘pad_shape’, ‘layout’, ‘gt_seg’.

  • Parameters:
    • size (tuple) – Expected size after cropping, (h, w).
    • cat_max_ratio (float , optional) – The maximum ratio that single category could occupy.
    • ignore_index (int , optional) – When considering the cat_max_ratio condition, the area corresponding to ignore_index will be ignored.

get_crop_bbox(data)

Randomly get a crop bounding box.

class hat.data.transforms.segmentation.SegRandomCutOut(prob: float, n_holes: int | Tuple[int, int], cutout_shape: Tuple[int, int] | Tuple[Tuple[int, int], ...] | None = None, cutout_ratio: Tuple[int, int] | Tuple[Tuple[int, int], ...] | None = None, fill_in: Tuple[float, float, float] = (0, 0, 0), seg_fill_in: int | None = None)

CutOut operation for segmentation task.

Randomly drop some regions of image used in Cutout.

  • Parameters:
    • prob – Cutout probability.
    • n_holes – Number of regions to be dropped. If it is given as a list, the number of holes will be randomly selected from the closed interval [n_holes[0], n_holes[1]].
    • cutout_shape – The candidate shape of dropped regions. It can be tuple[int, int] to use a fixed cutout shape, or list[tuple[int, int]] to randomly choose shape from the list.
    • cutout_ratio – The candidate ratio of dropped regions. It can be tuple[float, float] to use a fixed ratio or list[tuple[float, float]] to randomly choose ratio from the list. Please note that cutout_shape and cutout_ratio cannot be both given at the same time.
    • fill_in – The value of pixel to fill in the dropped regions. Default is (0, 0, 0).
    • seg_fill_in – The labels of pixel to fill in the dropped regions. If seg_fill_in is None, skip. Default is None.
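
The interplay of `n_holes`, `cutout_shape`, `fill_in`, and `seg_fill_in` can be sketched in NumPy as below. This is an illustration under assumptions: the helper name `cutout` is hypothetical, `cutout_shape` is read as a single fixed (height, width), holes are clipped at the image border, and the `prob` gate and `cutout_ratio` option are not modeled.

```python
import numpy as np


def cutout(img, gt_seg, n_holes, cutout_shape, fill_in=(0, 0, 0),
           seg_fill_in=None, rng=None):
    """Drop rectangular regions from an HWC image and, optionally, its label."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = img.shape[:2]
    if isinstance(n_holes, int):
        count = n_holes
    else:
        # Closed interval [n_holes[0], n_holes[1]].
        count = int(rng.integers(n_holes[0], n_holes[1] + 1))
    img, gt_seg = img.copy(), gt_seg.copy()
    hole_h, hole_w = cutout_shape
    for _ in range(count):
        y1 = int(rng.integers(0, h))
        x1 = int(rng.integers(0, w))
        y2, x2 = min(y1 + hole_h, h), min(x1 + hole_w, w)
        img[y1:y2, x1:x2] = fill_in
        if seg_fill_in is not None:
            gt_seg[y1:y2, x1:x2] = seg_fill_in
    return img, gt_seg


img = np.full((8, 8, 3), 255, dtype=np.uint8)
seg = np.zeros((8, 8), dtype=np.uint8)
out_img, out_seg = cutout(
    img, seg, n_holes=(1, 2), cutout_shape=(2, 2), seg_fill_in=255,
    rng=np.random.default_rng(0),
)
```

Filling the label with an ignore value such as 255 keeps the dropped pixels out of the loss; leaving `seg_fill_in` as None keeps the label untouched.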

class hat.data.transforms.segmentation.SegReWeightByArea(seg_num_classes, lower_bound: int = 0.5, ignore_index: int = 255)

Calculate the weight of each category according to the area of each category.

For each category, the calculation formula of weight is as follows: weight = max(1.0 - seg_area / total_area, lower_bound)

NOTE

Affected keys: ‘gt_seg’, ‘gt_seg_weight’.

  • Parameters:
    • seg_num_classes (int) – Number of segmentation categories.
    • lower_bound (float) – Lower bound of weight.
    • ignore_index (int) – Index of ignore class.
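
The weight formula above can be worked through with a short NumPy sketch. Two points are assumptions rather than documented behavior: `total_area` is taken to exclude pixels labeled `ignore_index`, and those pixels receive weight 0. The helper name `reweight_by_area` is hypothetical.

```python
import numpy as np


def reweight_by_area(gt_seg, seg_num_classes, lower_bound=0.5, ignore_index=255):
    """Per-pixel weights from max(1.0 - seg_area / total_area, lower_bound)."""
    valid = gt_seg != ignore_index
    total_area = max(int(valid.sum()), 1)
    weights = np.zeros(gt_seg.shape, dtype=np.float32)
    for cls in range(seg_num_classes):
        mask = gt_seg == cls
        seg_area = int(mask.sum())
        # Large categories get small weights, floored at lower_bound.
        weights[mask] = max(1.0 - seg_area / total_area, lower_bound)
    return weights


gt_seg = np.array([[0, 0, 0, 1], [0, 0, 0, 255]])
weights = reweight_by_area(gt_seg, seg_num_classes=2)
```

In this example class 0 covers 6 of 7 valid pixels, so 1 - 6/7 falls below the bound and its weight is clamped to 0.5, while class 1 covers 1 of 7 and gets 6/7.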

class hat.data.transforms.segmentation.SegResize(size, interpolation=InterpolationMode.BILINEAR)

Apply resize for both image and label.

NOTE

Affected keys: ‘img’, ‘gt_seg’.

  • Parameters:
    • size – target size of resize.
    • interpolation – interpolation method of resize.

forward(data)

  • Parameters: img (PIL Image or Tensor) – Image to be scaled.
  • Returns: Rescaled image.
  • Return type: PIL Image or Tensor

class hat.data.transforms.segmentation.SegResizeAffine(img_scale: Sequence[int] | Sequence[Sequence[int]] | None = None, max_scale: Sequence[int] | Sequence[Sequence[int]] | None = None, multiscale_mode: str = 'range', ratio_range: Tuple[float, float] | None = None, keep_ratio: bool = True)

Resize image & seg.

NOTE

Affected keys: ‘img’, ‘img_shape’, ‘pad_shape’, ‘resized_shape’, ‘scale_factor’, ‘gt_seg’, ‘gt_polygons’.

  • Parameters:
    • img_scale – (height, width) or a list of [(height1, width1), (height2, width2), …] for image resize.
    • max_scale – The max size of image. If the image’s shape > max_scale, the image is resized to max_scale.
    • multiscale_mode – Value must be one of “range” or “value”. This transform resizes the input image and bbox to the same scale factor. There are 3 multiscale modes:
      • ‘ratio_range’ is not None: randomly sample a ratio from the ratio range and multiply it with the image scale. e.g. Resize(img_scale=(400, 500), multiscale_mode='range', ratio_range=(0.5, 2.0))
      • ‘ratio_range’ is None and ‘multiscale_mode’ == “range”: randomly sample a scale from a range; the length of img_scale[tuple] must be 2, representing the small img_scale and the large img_scale. e.g. Resize(img_scale=((100, 200), (400, 500)), multiscale_mode='range')
      • ‘ratio_range’ is None and ‘multiscale_mode’ == “value”: randomly sample a scale from multiple scales. e.g. Resize(img_scale=((100, 200), (300, 400), (400, 500)), multiscale_mode='value')
    • ratio_range – Scale factor range like (min_ratio, max_ratio).
    • keep_ratio – Whether to keep the aspect ratio when resizing the image.

inverse_transform(inputs: ndarray, task_type: str, inverse_info: Dict[str, Any])

Inverse operation of the transform, mapping the prediction back to the original image.

  • Parameters:
    • inputs – Prediction.
    • task_type – support segmentation only.
    • inverse_info – The transform keyword is the key, and the corresponding value is the value.

class hat.data.transforms.seq_transform.SeqAlbuImageOnlyTransform(albu_params: List[Dict])

class hat.data.transforms.seq_transform.SeqAugmentHSV(hgain: float = 0.5, sgain: float = 0.5, vgain: float = 0.5, p: float = 1.0)

Random add color disturbance for sequence.

class hat.data.transforms.seq_transform.SeqBgrToYuv444(affect_key: str = 'img', rgb_input: bool = False)

BgrToYuv444 for sequence.

class hat.data.transforms.seq_transform.SeqNormalize(mean: float | Sequence[float], std: float | Sequence[float], raw_norm: bool = False, split_transform: bool = False)

Normalize for sequence.

class hat.data.transforms.seq_transform.SeqRandomFlip(px: float | None = 0.5, py: float | None = 0)

Flip image & bbox & mask & seg & flow for sequence.

class hat.data.transforms.seq_transform.SeqRandomSizeCrop(min_size: int, max_size: int, **kwargs)

RandomSizeCrop for sequence.

class hat.data.transforms.seq_transform.SeqResize(img_scale: Sequence[int] | Sequence[Sequence[int]] | None = None, max_scale: Sequence[int] | Sequence[Sequence[int]] | None = None, multiscale_mode: str = 'range', ratio_range: Tuple[float, float] | None = None, keep_ratio: bool = True, pad_to_keep_ratio: bool = False, raw_scaler_enable: bool = False, sample1c_enable: bool = False, divisor: int = 1, rm_neg_coords: bool = True, split_transform: bool = False, split_trans_w: int = 256, split_trans_h: int = 256)

class hat.data.transforms.seq_transform.SeqToFasterRCNNData(max_gt_boxes_num: int = 500, max_ig_regions_num: int = 500)

class hat.data.transforms.seq_transform.SeqToTensor(to_yuv: bool = False, use_yuv_v2: bool = True, split_transform: bool = False)

ToTensor for sequence.

class hat.data.transforms.gaze.gaze.Clip(minimum=0.0, maximum=255.0)

Clip Data to [minimum, maximum].

  • Parameters:
    • minimum – The minimum number of data. Defaults 0.
    • maximum – The maximum number of data. Defaults 255.

class hat.data.transforms.gaze.gaze.GazeRandomCropWoResize(size=(192, 320), area=(0.08, 1.0), ratio=(0.75, 1.3333333333333333), prob: float = 1.0, is_train: bool = True)

Random crop without resize.

For more notes, refer to https://horizonrobotics.feishu.cn/docx/LKhddopAeoXJmXxa6KocbwJdnSg.

class hat.data.transforms.gaze.gaze.GazeRotate3DWithCrop(is_train=True, head_pose_type='euler z-xy degree', rand_crop_scale=(0.85, 1.0), rand_crop_ratio=(1.25, 2), rand_crop_cropper_border=5, rotate_type='pos_map_uniform', rotate_augm_prob: float = 1, pos_map_range_pitch=(-17, 17), pos_map_range_yaw=(-20, 20), pos_map_range_roll=(-20, 20), delta_rpy_range=([0, 0], [0, 0], [0, 0]), seperate_ldmk=False, seperate_ldmk_roll_range=(0, 0), crop_size=(256, 128), to_yuv420sp=True, standard_focal=600, cropping_ratio=0.25, rand_inter_type=False)

Random rotate image, calculate ROI and random crop if necessary.

Meanwhile, pos map is generated.

  • Parameters:
    • is_train – Whether to apply the 3D rotation augmentation in train mode or test mode. Defaults to True.
    • head_pose_type – Type of head pose. Defaults to “euler z-xy degree”.
    • rand_crop_scale – Scale of rand crop. Defaults to (0.85, 1.0).
    • rand_crop_ratio – Ratio of rand crop. Defaults to (1.25, 2).
    • rand_crop_cropper_border – Expanded pixel size. Defaults to 5.
    • rotate_type – 3D rotate augm type. Defaults to “pos_map_uniform”.
    • rotate_augm_prob – Prob to do 3d rotate augm. Defaults to 1.
    • pos_map_range_pitch – Rotate range in pitch dimension.
    • pos_map_range_yaw – Rotate range in yaw dimension.
    • pos_map_range_roll – Rotate range in roll dimension.
    • delta_rpy_range – _description_.
    • seperate_ldmk – _description_. Defaults to False.
    • seperate_ldmk_roll_range – _description_. Defaults to (0, 0).
    • crop_size – Crop size. Defaults to (256, 128).
    • to_yuv420sp – Whether transform to yuv420sp. Defaults to True.
    • standard_focal – Standard focal of camera. Defaults to 600.
    • cropping_ratio – Crop ratio used when calculating the crop ROI from the rotated face landmarks.
    • rand_inter_type – Whether use rand inter type. Defaults to False.

class hat.data.transforms.gaze.gaze.GazeYUVTransform(rgb_data=False, nc=3, equalize_hist=True, equalize_hist_method=None)

YUVTransform for Gaze Task.

This pipeline: bgr_to_yuv444 -> equalizehist -> yuv444_to_yuv444_int8.

  • Parameters:
    • rgb_data – Whether the input data is in RGB format.
    • nc – Output channels of data.
    • equalize_hist – Whether to do histogram equalization.
    • equalize_hist_method – Method for histogram equalization.

  • Inputs: data – Input tensor with (H x W x C) shape.

  • Outputs: out – Output tensor with the same shape as data.
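
To make the first and last pipeline stages concrete, here is a per-pixel sketch in plain Python. It assumes full-range BT.601 coefficients for the BGR to YUV444 conversion and assumes that the int8 step subtracts 128 from each channel; neither detail is confirmed by this reference, and the function names are hypothetical. Histogram equalization is omitted.

```python
from typing import Tuple


def _to_uint8(value: float) -> int:
    """Round and clamp a float to the 0..255 range."""
    return int(min(max(round(value), 0), 255))


def bgr_to_yuv444(b: int, g: int, r: int) -> Tuple[int, int, int]:
    """Convert one BGR pixel to YUV444 (assumed full-range BT.601)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.168736 * r - 0.331264 * g + 0.5 * b + 128.0
    v = 0.5 * r - 0.418688 * g - 0.081312 * b + 128.0
    return _to_uint8(y), _to_uint8(u), _to_uint8(v)


def yuv444_to_int8(yuv: Tuple[int, int, int]) -> Tuple[int, ...]:
    """Shift each channel from 0..255 to -128..127 (assumed convention)."""
    return tuple(c - 128 for c in yuv)


gray = bgr_to_yuv444(128, 128, 128)
print(gray, yuv444_to_int8(gray))
```

A neutral gray pixel carries no chroma, so under these assumptions both U and V sit at the midpoint before the shift and at zero after it.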

class hat.data.transforms.gaze.gaze.RandomColorJitter(brightness=0.5, contrast=(0.5, 1.5), saturation=(0.5, 1.5), hue=0.1, prob=0.5)

Randomly change the brightness, contrast, saturation and hue of an image.

For more notes, refer to https://horizonrobotics.feishu.cn/docx/LKhddopAeoXJmXxa6KocbwJdnSg.

class hat.data.transforms.lidar_utils.preprocess.DBFilterByDifficulty(filter_by_difficulty)

Filter sampled data by difficulties.

  • Parameters: removed_difficulties (list) – class difficulties

class hat.data.transforms.lidar_utils.preprocess.DBFilterByMinNumPoint(filter_by_min_num_points)

Filter sampled data by NumPoint.

  • Parameters: min_gt_point_dict (dict) – per-class threshold on the number of points
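
A minimal sketch of this kind of filter follows. The database layout (a dict from class name to a list of sample records) and the `num_points_in_gt` key are assumptions borrowed from common ground-truth database formats; HAT's actual record schema may differ.

```python
from typing import Dict, List


def filter_by_min_num_points(
    db_infos: Dict[str, List[dict]], min_gt_point_dict: Dict[str, int]
) -> Dict[str, List[dict]]:
    """Drop sampled objects that contain fewer points than the class threshold.

    Hypothetical helper: classes without a threshold are left unfiltered.
    """
    filtered = {}
    for name, infos in db_infos.items():
        threshold = min_gt_point_dict.get(name, 0)
        filtered[name] = [
            info for info in infos if info["num_points_in_gt"] >= threshold
        ]
    return filtered


db = {
    "car": [{"num_points_in_gt": 3}, {"num_points_in_gt": 40}],
    "pedestrian": [{"num_points_in_gt": 8}],
}
print(filter_by_min_num_points(db, {"car": 5, "pedestrian": 10}))
```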

class hat.data.transforms.lidar_utils.lidar_transform_3d.AssignSegLabel(bev_size: List[int] | None = None, num_classes: int = 2, class_names: List[int] | None = None, point_cloud_range: List[float] | None = None, voxel_size: List[float] | None = None)

Assign segmentation labels for lidar data.

Return segmentation labels.

  • Parameters:
    • bev_size – list of bev featuremap size.
    • num_classes – number of classes for segmentation.
    • vision_range – align gt with vision_range.
    • point_cloud_range – point cloud range.
    • voxel_size – voxel size.

class hat.data.transforms.lidar_utils.lidar_transform_3d.LidarMultiPreprocess(class_names: List[str], global_rot_noise: Tuple[float] = (0.0, 0.0), global_scale_noise: Tuple[float] = (0.0, 0.0), db_sampler: Dict | None = None, shuffle_points: bool = False, flip_both: bool = False, flip_both_prob: float = 0.5, drop_points_in_gt: bool = False)

Point cloud preprocessing transforms for segmentation.

  • Parameters:
    • class_names – list of class names.
    • global_rot_noise – rotate noise of global points.
    • global_scale_noise – scale noise of global points.
    • shuffle_points – whether to shuffle points.
    • flip_both – flip points and gt box.
    • flip_both_prob – probability of flipping points and gt box.
    • drop_points_in_gt – whether to drop points in gt boxes.

class hat.data.transforms.lidar_utils.lidar_transform_3d.LidarReformat(with_gt: bool = False, **kwargs)

Reformat data.

  • Parameters: with_gt – Whether to expand gt labels.

class hat.data.transforms.lidar_utils.lidar_transform_3d.ObjectNoise(gt_rotation_noise: List[float], gt_loc_noise_std: List[float], global_random_rot_range: List[float], num_try: int = 100)

Apply noise to each GT object in the scene.

  • Parameters:
    • gt_rotation_noise – Object rotation range.
    • gt_loc_noise_std – Object noise std.
    • global_random_rot_range – Global rotation to the scene.
    • num_try – Number of times to try if the noise applied is invalid.

class hat.data.transforms.lidar_utils.lidar_transform_3d.ObjectRangeFilter(point_cloud_range: List[float])

Filter objects by point cloud range.

  • Parameters: point_cloud_range – Point cloud range.
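
As an illustration of range filtering, the sketch below keeps only boxes whose centers fall inside the range. It assumes `point_cloud_range` is ordered as `[x_min, y_min, z_min, x_max, y_max, z_max]` and that each box starts with its center `(x, y, z)`; both are assumptions, and the real transform may test only the bird's-eye-view extent.

```python
from typing import List, Sequence


def filter_boxes_by_range(
    boxes: List[Sequence[float]], point_cloud_range: Sequence[float]
) -> List[Sequence[float]]:
    """Keep boxes whose center lies inside the point cloud range.

    Hypothetical helper; the range layout is an assumed convention.
    """
    x_min, y_min, z_min, x_max, y_max, z_max = point_cloud_range
    kept = []
    for box in boxes:
        x, y, z = box[:3]
        if x_min <= x <= x_max and y_min <= y <= y_max and z_min <= z <= z_max:
            kept.append(box)
    return kept


boxes = [
    (10.0, 0.0, -1.0, 4.0, 2.0, 1.5, 0.0),   # inside the range
    (80.0, 0.0, -1.0, 4.0, 2.0, 1.5, 0.0),   # x is beyond x_max
]
print(filter_boxes_by_range(boxes, [0.0, -40.0, -3.0, 70.4, 40.0, 1.0]))
```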

class hat.data.transforms.lidar_utils.lidar_transform_3d.ObjectSample(db_sampler: Callable, class_names: List[str], random_crop: bool = False, remove_points_after_sample: bool = False, remove_outside_points: bool = False)

Sample GT objects to the data.

  • Parameters:
    • db_sampler – Database sampler.
    • class_names – Class names.
    • random_crop – Whether to random crop.
    • remove_points_after_sample – Whether to remove points after sample.
    • remove_outside_points – Whether to remove outside points.

class hat.data.transforms.lidar_utils.lidar_transform_3d.PointCloudSegPreprocess(global_rot_noise: Tuple[float] = (0.0, 0.0), global_scale_noise: Tuple[float] = (0.0, 0.0))

Point cloud preprocessing transforms for segmentation.

  • Parameters:
    • global_rot_noise – rotate noise of global points.
    • global_scale_noise – scale noise of global points.

class hat.data.transforms.lidar_utils.lidar_transform_3d.PointGlobalRotation(rotation: float = 0.78)

Apply global rotation to a 3D scene.

  • Parameters: rotation – Range of rotation angle.
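
One plausible reading of "range of rotation angle" is that an angle is drawn uniformly from `[-rotation, rotation]` (in radians) and applied about the vertical z axis. The sketch below implements that reading in plain Python; the sampling rule, the axis, and the function names are assumptions, not confirmed details of HAT.

```python
import math
import random
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]


def rotate_points_z(points: List[Point], angle: float) -> List[Point]:
    """Rotate points counter-clockwise about the z axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]


def random_global_rotation(
    points: List[Point], rotation: float = 0.78, rng: Optional[random.Random] = None
) -> Tuple[List[Point], float]:
    """Sample an angle in [-rotation, rotation] (assumed) and rotate the scene."""
    if rng is None:
        rng = random.Random()
    angle = rng.uniform(-rotation, rotation)
    return rotate_points_z(points, angle), angle


scene = [(1.0, 0.0, 0.5), (0.0, 2.0, -1.0)]
rotated, angle = random_global_rotation(scene, rng=random.Random(0))
print(angle, rotated)
```

In a detection pipeline the same angle would also be applied to the box centers and added to the box yaw, so that labels stay aligned with the points.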

class hat.data.transforms.lidar_utils.lidar_transform_3d.PointGlobalScaling(min_scale: float = 0.95, max_scale: float = 1.05)

Apply global scaling to a 3D scene.

  • Parameters:
    • min_scale – Min scale ratio.
    • max_scale – Max scale ratio.
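
Global scaling can be sketched as drawing one factor uniformly from `[min_scale, max_scale]` and multiplying every coordinate by it. Uniform sampling and the function name are assumptions made for illustration.

```python
import random
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]


def random_global_scaling(
    points: List[Point],
    min_scale: float = 0.95,
    max_scale: float = 1.05,
    rng: Optional[random.Random] = None,
) -> Tuple[List[Point], float]:
    """Scale the whole scene by a single factor sampled from the given range.

    Hypothetical helper: uniform sampling is an assumption.
    """
    if rng is None:
        rng = random.Random()
    scale = rng.uniform(min_scale, max_scale)
    scaled = [(x * scale, y * scale, z * scale) for x, y, z in points]
    return scaled, scale


scene = [(1.0, 2.0, 3.0), (-4.0, 0.5, 0.0)]
scaled, scale = random_global_scaling(scene, rng=random.Random(0))
print(scale, scaled)
```

Box centers and box dimensions would be multiplied by the same factor, while yaw angles are unaffected by uniform scaling.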

class hat.data.transforms.lidar_utils.lidar_transform_3d.PointRandomFlip(probability: float = 0.5)

Flip the points & bbox.

  • Parameters: probability – The flipping probability.
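
The sketch below shows one common flip convention: mirror the scene across the x axis by negating `y`, and negate the box yaw to match. The flip axis, the `(x, y, z, dx, dy, dz, yaw)` box layout, and the function name are assumptions; this reference does not state which axis HAT flips.

```python
import random
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]
Box = Tuple[float, float, float, float, float, float, float]


def random_flip(
    points: List[Point],
    boxes: List[Box],
    probability: float = 0.5,
    rng: Optional[random.Random] = None,
) -> Tuple[List[Point], List[Box], bool]:
    """Flip points and boxes together with the given probability.

    Hypothetical helper: mirrors across the x axis (y -> -y, yaw -> -yaw).
    """
    if rng is None:
        rng = random.Random()
    if rng.random() >= probability:
        return points, boxes, False
    flipped_points = [(x, -y, z) for x, y, z in points]
    flipped_boxes = [
        (x, -y, z, dx, dy, dz, -yaw) for x, y, z, dx, dy, dz, yaw in boxes
    ]
    return flipped_points, flipped_boxes, True


pts = [(1.0, 2.0, 0.0)]
bxs = [(5.0, -3.0, 0.0, 4.0, 2.0, 1.5, 0.3)]
print(random_flip(pts, bxs, probability=1.0))
```

Flipping points and boxes in the same call keeps the annotations consistent with the augmented point cloud.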

class hat.data.transforms.lidar_utils.lidar_transform_3d.ShufflePoints(shuffle: bool = True)

Shuffle Points.

  • Parameters: shuffle – Whether to shuffle.