profiler

Profilers are widely used for performance analysis in HAT.

profiler

Member Summary

  • profilers.PassThroughProfiler – This class should be used when you don't want the (small) overhead of profiling.
  • profilers.SimpleProfiler – This profiler simply records the duration of actions (in seconds) and reports the mean duration of each action and the total time spent over the entire training run.
  • memory_profiler.GPUMemoryProfiler
  • memory_profiler.CPUMemoryProfiler
  • memory_profiler.StageCPUMemoryProfiler
  • model_profiler.BaseModelProfiler – Base class for defining the process of model analysis.
  • model_profiler.FeaturemapSimilarity – Compute the similarity of two models.
  • model_profiler.ProfileFeaturemap – Profile featuremap values with log or tensorboard.
  • model_profiler.CheckShared – Check whether the model has shared ops.
  • model_profiler.CheckFused – Check whether the model has unfused ops.
  • model_profiler.CompareWeights – Compare weights of float/qat/quantized models.
  • model_profiler.CheckDeployDevice – Check the deploy device (BPU or CPU) of a hybrid model.
  • model_profiler.ModelProfilerv2 – Run the model and save each op's info.
  • model_profiler.HbirModelProfiler – Run an hbir model and save each op's info.
  • quant_analysis.QuantAnalysis – Precision analysis for quantized models.

API Reference

This file is modified from pytorch-lightning.

Profilers help with checking if there are any bottlenecks in your code.

class hat.profiler.profilers.PassThroughProfiler(dirpath: str | Path | None = None, filename: str | None = None, auto_describe: bool = False, schedule: Callable[[int], ProfilerAction] | None = None, summary_interval: int = -1)

This class should be used when you don’t want the (small) overhead of profiling. The Trainer uses this class by default.

start(action_name: str)

Define how to start recording an action.

stop(action_name: str)

Define how to record the duration once an action is complete.

summary()

Create profiler summary in text format.

class hat.profiler.profilers.SimpleProfiler(dirpath: str | Path | None = None, filename: str | None = None, warmup_step: int = 1, use_real_duration: bool = False, auto_describe: bool = False, schedule: Callable[[int], ProfilerAction] | None = None, summary_interval: int = -1)

This profiler simply records the duration of actions (in seconds) and reports the mean duration of each action and the total time spent over the entire training run.

start(action_name: str)

Define how to start recording an action.

stop(action_name: str)

Define how to record the duration once an action is complete.

summary()

Create profiler summary in text format.
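The start/stop/summary pattern above can be sketched with a small stand-in class. Note that `MiniSimpleProfiler` is a hypothetical name used for illustration, not part of HAT; it only mimics the idea of recording action durations and reporting the mean per action:

```python
import time
from collections import defaultdict


class MiniSimpleProfiler:
    """Toy stand-in for the SimpleProfiler pattern: record the
    duration of named actions and report the mean per action."""

    def __init__(self):
        self.durations = defaultdict(list)
        self._starts = {}

    def start(self, action_name):
        # Remember when the action began.
        self._starts[action_name] = time.perf_counter()

    def stop(self, action_name):
        # Record the elapsed time since the matching start() call.
        begin = self._starts.pop(action_name)
        self.durations[action_name].append(time.perf_counter() - begin)

    def summary(self):
        # Text report: mean duration and call count per action.
        lines = []
        for name, ds in self.durations.items():
            lines.append(
                f"{name}: mean {sum(ds) / len(ds):.6f}s over {len(ds)} calls"
            )
        return "\n".join(lines)


prof = MiniSimpleProfiler()
for _ in range(3):
    prof.start("train_step")
    time.sleep(0.01)
    prof.stop("train_step")
print(prof.summary())
```

A real profiler adds warmup steps and file output on top of this core bookkeeping, but the start/stop pairing is the essential contract.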

Memory Profiling.

Helps profile GPU or CPU memory bottlenecks in the process of model training.

class hat.profiler.memory_profiler.CPUMemoryProfiler(dirpath: str | Path | None = None, filename: str | None = None, auto_describe: bool = False, schedule: Callable[[int], ProfilerAction] | None = None, summary_interval: int = -1)

start(action_name: str)

Define how to start recording an action.

stop(action_name: str)

Define how to record the duration once an action is complete.

summary()

Create profiler summary in text format.

class hat.profiler.memory_profiler.GPUMemoryProfiler(dirpath: str | Path | None = None, filename: str | None = None, record_snapshot: bool = False, snapshot_interval: int = 1, record_functions: Set[str] | None = None, auto_describe: bool = False, schedule: Callable[[int], ProfilerAction] | None = None, summary_interval: int = -1)

save_snapshots()

Dump all snapshots.

Snapshots are saved as a nested dict (dict of dict) keyed by step_id.

start(action_name: str)

Define how to start recording an action.

stop(action_name: str)

Define how to record the duration once an action is complete.

summary()

Create profiler summary in text format.

class hat.profiler.memory_profiler.StageCPUMemoryProfiler(profile_action_name: str, leaks: bool = True, dirpath: str | Path = None, filename: str = None, auto_describe: bool = False)

describe()

Log a profile report after the conclusion of the run.

profile(action_name: str)

Yield a context manager to encapsulate the scope of a profiled action.

Example:

with self.profile('load training data'):
    # load training data code

The profiler will start once you’ve entered the context and will automatically stop once you exit the code block.
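The behavior described here (start on entering the context, stop automatically on exit) can be sketched with `contextlib.contextmanager`. `MiniStageProfiler` is a hypothetical stand-in, not the HAT implementation:

```python
import time
from contextlib import contextmanager


class MiniStageProfiler:
    """Toy sketch of the profile() context-manager pattern:
    start on entry, stop automatically on exit."""

    def __init__(self):
        self.records = {}

    def start(self, action_name):
        # Store the negative start time; stop() adds the end time,
        # leaving the elapsed duration in records.
        self.records[action_name] = -time.perf_counter()

    def stop(self, action_name):
        self.records[action_name] += time.perf_counter()

    @contextmanager
    def profile(self, action_name):
        self.start(action_name)
        try:
            yield
        finally:
            # Stop even if the profiled block raises.
            self.stop(action_name)


prof = MiniStageProfiler()
with prof.profile("load training data"):
    time.sleep(0.01)
print(prof.records["load training data"])
```

The `try/finally` is what guarantees the "automatically stop once you exit the code block" behavior, including on exceptions.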

class hat.profiler.model_profiler.BaseModelProfiler

Base class for defining the process of model analysis.

class hat.profiler.model_profiler.CheckDeployDevice(print_tabulate: bool = True, out_dir: str | None = None)

Check the deploy device (BPU or CPU) of a hybrid model.

  • Parameters:
    • print_tabulate (bool , optional) – Whether print the result as tabulate. Defaults to True.
    • out_dir – path to save the result txt ‘deploy_device.txt’. If None, will save in the current directory. Default: None
  • Returns: A dict of model deploy info with schema:
    • KEY (str): module name
    • VALUE (Tuple): (deploy device (BPU or CPU), module type)

class hat.profiler.model_profiler.CheckFused(print_tabulate: bool = True)

Check whether the model has unfused ops.

Check for unfused modules in a model. NOTE: This function can only check for unfused modules. For the correctness of fusion, please use featuremap_similarity to compare the features of the fused and unfused models.

  • Parameters: print_tabulate (bool) – Whether print the result as tabulate. Default: True.
  • Returns: The qualified names of the modules that can be fused.
  • Return type: List[List[str]]

class hat.profiler.model_profiler.CheckShared(check_leaf_module: callable | None = None, print_tabulate: bool = True)

Check whether the model has shared ops.

Count called times for all leaf modules in a model.

  • Parameters:
    • check_leaf_module (callable , optional) – A function to check if a module is leaf. Pass None to use pre-defined is_leaf_module. Default: None.
    • print_tabulate (bool , optional) – Whether print the result as tabulate. Default: True.
  • Returns: The qualified name and called times of each leaf module.
  • Return type: Dict[str, int]
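The call-counting idea behind CheckShared can be sketched without torch: wrap each leaf callable so a counter ticks on every call, then run one forward pass; a count greater than 1 indicates a shared op. Both `count_calls` and the toy `forward` below are hypothetical illustrations, not HAT APIs:

```python
from collections import Counter


def count_calls(leaf_modules, forward):
    """Count how many times each named leaf callable is invoked
    during one forward pass. A count > 1 means the op is shared."""
    counts = Counter()

    def wrap(name, fn):
        def wrapped(*args):
            counts[name] += 1
            return fn(*args)
        return wrapped

    wrapped_modules = {n: wrap(n, f) for n, f in leaf_modules.items()}
    forward(wrapped_modules)
    return dict(counts)


# A toy "model" that calls conv1 twice (shared) and relu once.
def forward(m):
    x = m["conv1"](1)
    x = m["relu"](x)
    return m["conv1"](x)


counts = count_calls({"conv1": lambda x: x + 1, "relu": lambda x: x}, forward)
print(counts)  # {'conv1': 2, 'relu': 1}
```

The real implementation hooks nn.Module leaves instead of plain callables, but the return schema (qualified name to call count) matches the Dict[str, int] documented above.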

class hat.profiler.model_profiler.CompareWeights(similarity_func='Cosine', with_tensorboard: bool = False, tensorboard_dir: str | None = None, out_dir: str | None = None)

Compare weights of float/qat/quantized models.

This function compares the weights of each layer based on torch.quantization._numeric_suite.compare_weights. The weight similarity and atol are printed on the screen and saved in "weight_comparison.txt". If you want to see histograms of the weights, set with_tensorboard=True.

  • Parameters:

    • similarity_func – similarity computation function. Support “Cosine”, “MSE”, “L1”, “KL”, “SQNR” or any user-defined Callable object. If it is a user-defined object, it should return a scalar or tensor with only one number. Otherwise the result shown may be unexpected. Default: “Cosine”
    • with_tensorboard – whether to use tensorboard. Default: False
    • tensorboard_dir – tensorboard log file path. Default: None
    • out_dir – path to save the result txt and picture. If None, will save in the current directory. Default: None
  • Returns: A weight comparison dict with schema:

    • KEY (str): module name (e.g. layer1.0.conv.weight)
    • VALUE (dict): a dict of the corresponding weights in the two models:
      • "float": weight value in the float model
      • "quantized": weight value in the qat/quantized model

    Also a list of lists; each inner list holds one layer's weight similarity in the format [module name, similarity, atol(N scale)].

class hat.profiler.model_profiler.FeaturemapSimilarity(similarity_func: str | callable = 'Cosine', threshold: Real | None = None, devices: device | tuple | None = None, out_dir: str | None = None)

Compute the similarity of two models.

Compute the similarity of feature maps. The input models can be float/fused/calibration/qat/quantized models.

  • Parameters:
    • similarity_func – similarity computation function. Support “Cosine”, “MSE”, “L1”, “KL”, “SQNR”, or any user-defined Callable object. If it is a user-defined object, it should return a scalar or tensor with only one number. Otherwise the result shown may be unexpected. Default: “Cosine”
    • threshold – if the similarity value exceeds or falls below this threshold, the featuremap name will be shown in red. If threshold is None, it will be set to different values according to different similarity functions. Default: None
    • devices – which devices (cpu, gpu) to run the models on. It can be: None – run the models with the given inputs as-is; torch.device – both models and the given inputs will be moved to this specified device; tuple – a tuple of two torch.device objects; the two models will be moved to the specified devices separately, which may be used to compare the difference between CPU and GPU results.
    • out_dir – path to save the result txt and picture. If None, will save in the current directory. Default: None
  • Returns: A list of lists. Each inner list holds one layer's similarity info in the format [index, module name, module type, similarity, scale, atol, atol(N scale), single op error(N scale)]
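To make the "Cosine" and atol columns concrete, here is a pure-Python sketch of how such metrics are typically computed on flattened featuremaps. The function names and sample values are illustrative, not HAT APIs:

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two flat featuremaps, mirroring
    the default "Cosine" similarity function: 1.0 means identical
    direction, values near 0 mean unrelated outputs."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def max_abs_error(a, b):
    """atol-style metric: largest absolute elementwise difference."""
    return max(abs(x - y) for x, y in zip(a, b))


float_fm = [0.10, -0.52, 0.33, 0.97]   # e.g. float model output
quant_fm = [0.11, -0.50, 0.31, 0.95]   # e.g. dequantized qat output
print(cosine_similarity(float_fm, quant_fm))  # close to 1.0 when similar
print(max_abs_error(float_fm, quant_fm))
```

Reporting atol in multiples of the quantization scale (the "atol(N scale)" column) tells you how many quantization steps apart the two outputs are, which is often more interpretable than a raw float difference.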

class hat.profiler.model_profiler.HbirModelProfiler(show_table: bool = True, show_tensorboard: bool = False, prefixes: Tuple[str, ...] | None = None, types: Tuple[Type, ...] | None = None, with_stack: bool = False, force_per_channel: bool = False, out_dir: str | None = None)

Run hbir model and save each op info.

This function runs the hbir model and saves each op's info on disk, which can be shown in a table or in tensorboard.

  • Parameters:
    • show_table – whether to show each op's info in a table, which will also be saved in statistic.txt
    • show_tensorboard – whether to show each op's histogram in tensorboard.
    • prefixes – only show ops whose qualified names start with these prefixes
    • types – only show ops of the given types
    • with_stack – whether to show each op's location in the code
    • force_per_channel – whether to show data per channel in tensorboard
    • out_dir – directory to save op info and result files

class hat.profiler.model_profiler.ModelProfilerv2(show_table: bool = True, show_tensorboard: bool = False, prefixes: Tuple[str, ...] | None = None, types: Tuple[Type, ...] | None = None, with_stack: bool = False, force_per_channel: bool = False, out_dir: str | None = None)

Run model and save each op info.

This function runs the model and saves each op's info on disk, which can be shown in a table or in tensorboard.

  • Parameters:
    • show_table – whether to show each op's info in a table, which will also be saved in statistic.txt
    • show_tensorboard – whether to show each op's histogram in tensorboard.
    • prefixes – only show ops whose qualified names start with these prefixes
    • types – only show ops of the given types
    • with_stack – whether to show each op's location in the code
    • force_per_channel – whether to show data per channel in tensorboard
    • out_dir – directory to save op info and result files

class hat.profiler.model_profiler.ProfileFeaturemap(prefixes: Tuple = (), types: Tuple = (), device: device | None = None, preserve_int: bool = False, use_class_name: bool = False, skip_identity: bool = False, with_tensorboard: bool = False, tensorboard_dir: str | None = None, print_per_channel_scale: bool = False, show_per_channel: bool = False, out_dir: str | None = None, file_name: str | None = None, profile_func: callable | None = None)

Profile featuremap value with log or tensorboard.

Print the min/max/mean/var/scale of each feature profiled by get_raw_features by default. If with_tensorboard is set to True, the histogram of each feature will be shown in tensorboard, which is useful for seeing the data distribution.

If you want to get more info about features, you can define your custom profile functions to process the results of get_raw_features.

  • Parameters:
    • prefixes – get feature info by the prefix of the qualified name. Default: tuple().
    • types – get feature info by module type. Default: tuple().
    • device – which device to run the model on. Default: None
    • preserve_int – if True, record each op result as an int type. Default: False
    • use_class_name – if True, record the class name rather than the class type. Default: False
    • skip_identity – if True, do not record the results of Identity modules. Default: False
    • with_tensorboard – whether to use tensorboard. Default: False
    • tensorboard_dir – tensorboard log file path. Default: None
    • print_per_channel_scale – whether to print per-channel scales. Default: False
    • show_per_channel – show each featuremap per channel in tensorboard. Default: False
    • out_dir – path to save the result txt and picture. If None, will save in the current directory. Default: None
    • file_name – result file name. If None, will save the result and figure with the name 'statistic' (statistic.txt and statistic.html). Default: None
    • profile_func (callable , None) – your custom featuremap profiler function. Default: None
  • Returns: A list of lists. Each inner list holds one layer's statistics in the format [index, module name, module type, attr, min, max, mean, var, scale]
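The per-feature statistics listed above (min/max/mean/var) can be sketched in plain Python. `featuremap_stats` is a hypothetical helper for illustration, not a HAT function:

```python
def featuremap_stats(name, values):
    """Compute the min/max/mean/var statistics that a featuremap
    profiler typically reports for each profiled feature."""
    n = len(values)
    mean = sum(values) / n
    # Population variance, matching the usual tensor .var-style summary.
    var = sum((v - mean) ** 2 for v in values) / n
    return {
        "name": name,
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "var": var,
    }


stats = featuremap_stats("backbone.conv1", [0.5, -1.0, 2.0, 0.1])
print(stats)
```

Watching min/max per layer is a quick way to spot outlier activations that force a large quantization scale and hurt precision.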

class hat.profiler.quant_analysis.QuantAnalysis(model: Module, baseline_model_convert_pipeline, analysis_model_convert_pipeline, analysis_model_type, out_dir, post_process=None, dataloader=None, batch_transforms=None, num_steps=None, device_id=None, bad_case_input=None, analysis_pipeline=None)

Quantization models precision analysis.

This class helps analyze quantization precision errors between two models.

  • Parameters:
    • model – origin float model
    • baseline_model_convert_pipeline – baseline model convert pipeline
    • analysis_model_convert_pipeline – analysis model convert pipeline
    • analysis_model_type – the low-precision model type. Two types are supported:
      • "fake_quant" – the analysis_model can be a calibration/qat model and the baseline_model can be a float model or an int8/16 mixed-qconfig model with good precision.
      • "quantized" – the analysis_model must be a quantized model and the baseline_model must be a calibration/qat model.
    • out_dir – path to save the advisor and comparison results.
    • post_process – post-process function applied to the model output.
    • data_generator – an input dataloader or a custom iterable object. It must yield one piece of data per iteration. Examples:
      • a torch dataloader: data_generator = torch.utils.data.DataLoader()
      • a custom generator: data_generator = [x for x in [1, 2, 3]]
    • batch_transforms – batch transforms
    • num_steps – num of steps to find bad case
    • device_id – device id
    • bad_case_input – manually set bad case. If set, will skip auto find bad case process.
    • analysis_pipeline – analysis pipeline. It is a list of analysis operations to do. If None, will do default analysis pipelines.
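The data_generator contract above (yield one piece of data per iteration) can be illustrated with a plain generator, no torch required. The consumption loop below is a sketch of how a num_steps bound might cap iteration, not HAT's actual loop:

```python
def data_generator():
    """A minimal custom iterable matching the data_generator contract:
    yield one piece of input data per iteration."""
    for batch in [1, 2, 3]:
        yield batch


# Sketch of bounded consumption: stop after num_steps items,
# the way num_steps limits the bad-case search.
num_steps = 2
consumed = []
for step, data in enumerate(data_generator()):
    if step >= num_steps:
        break
    consumed.append(data)
print(consumed)  # [1, 2]
```

Any object that supports iteration works here, which is why the docs allow both a torch DataLoader and a simple list.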