profiler
Profilers widely used for performance analysis in HAT.
| Member | Summary |
|---|---|
| profilers.PassThroughProfiler | This class should be used when you don't want the (small) overhead of profiling. |
| profilers.SimpleProfiler | Records the duration of actions (in seconds) and reports the mean duration of each action and the total time spent over the entire training run. |
| memory_profiler.GPUMemoryProfiler | Profile GPU memory usage during model training. |
| memory_profiler.CPUMemoryProfiler | Profile CPU memory usage during model training. |
| memory_profiler.StageCPUMemoryProfiler | Profile CPU memory usage during a specific stage (action) of model training. |
| model_profiler.BaseModelProfiler | Base class for defining the process of model analysis. |
| model_profiler.FeaturemapSimilarity | Compute the similarity of two models. |
| model_profiler.ProfileFeaturemap | Profile featuremap values with log or tensorboard. |
| model_profiler.CheckShared | Check if the model has shared ops. |
| model_profiler.CheckFused | Check if the model has unfused ops. |
| model_profiler.CompareWeights | Compare weights of float/qat/quantized models. |
| model_profiler.CheckDeployDevice | Check the deploy device (BPU or CPU) of a hybrid model. |
| model_profiler.ModelProfilerv2 | Run a model and save each op's info. |
| model_profiler.HbirModelProfiler | Run an hbir model and save each op's info. |
| quant_analysis.QuantAnalysis | Precision analysis for quantized models. |
API Reference
This file is modified from pytorch-lightning.
Profilers help to check if there are any bottlenecks in your code.
class hat.profiler.profilers.PassThroughProfiler(dirpath: str | Path | None = None, filename: str | None = None, auto_describe: bool = False, schedule: Callable[[int], ProfilerAction] | None = None, summary_interval: int = -1)
This class should be used when you don’t want the (small) overhead of
profiling. The Trainer uses this class by default.
start(action_name: str)
Define how to start recording an action.
stop(action_name: str)
Define how to record the duration once an action is complete.
summary()
Create profiler summary in text format.
class hat.profiler.profilers.SimpleProfiler
This profiler simply records the duration of actions (in seconds) and
reports the mean duration of each action and the total time spent over
the entire training run.
start(action_name: str)
Define how to start recording an action.
stop(action_name: str)
Define how to record the duration once an action is complete.
summary()
Create profiler summary in text format.
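The start/stop/summary pattern above can be sketched in plain Python. This is an illustrative re-implementation of the idea, not HAT's actual SimpleProfiler:

```python
import time
from collections import defaultdict


class MiniSimpleProfiler:
    """Illustrative profiler: records action durations, reports means and total."""

    def __init__(self):
        self._start = {}
        self._durations = defaultdict(list)

    def start(self, action_name: str) -> None:
        # Remember when the action began.
        self._start[action_name] = time.perf_counter()

    def stop(self, action_name: str) -> None:
        # Record elapsed time for the completed action.
        self._durations[action_name].append(
            time.perf_counter() - self._start.pop(action_name)
        )

    def summary(self) -> str:
        # Mean duration per action plus total time over the run.
        lines = []
        total = 0.0
        for name, durs in self._durations.items():
            total += sum(durs)
            lines.append(f"{name}: mean {sum(durs) / len(durs):.6f}s")
        lines.append(f"total: {total:.6f}s")
        return "\n".join(lines)


prof = MiniSimpleProfiler()
prof.start("train_step")
time.sleep(0.01)
prof.stop("train_step")
print(prof.summary())
```

The real profilers additionally accept `dirpath`/`filename` arguments so the summary can be written to disk instead of only returned as text.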
Memory Profiling.
Helps profile GPU or CPU memory bottlenecks during model training.
class hat.profiler.memory_profiler.CPUMemoryProfiler(dirpath: str | Path | None = None, filename: str | None = None, auto_describe: bool = False, schedule: Callable[[int], ProfilerAction] | None = None, summary_interval: int = -1)
start(action_name: str)
Define how to start recording an action.
stop(action_name: str)
Define how to record the duration once an action is complete.
summary()
Create profiler summary in text format.
class hat.profiler.memory_profiler.GPUMemoryProfiler(dirpath: str | Path | None = None, filename: str | None = None, record_snapshot: bool = False, snapshot_interval: int = 1, record_functions: Set[str] | None = None, auto_describe: bool = False, schedule: Callable[[int], ProfilerAction] | None = None, summary_interval: int = -1)
save_snapshots()
Dump all snapshots.
Snapshots are stored as a nested dict (dict of dict) keyed by step_id.
start(action_name: str)
Define how to start recording an action.
stop(action_name: str)
Define how to record the duration once an action is complete.
summary()
Create profiler summary in text format.
class hat.profiler.memory_profiler.StageCPUMemoryProfiler(profile_action_name: str, leaks: bool = True, dirpath: str | Path | None = None, filename: str | None = None, auto_describe: bool = False)
describe()
Log a profile report after the conclusion of a run.
profile(action_name: str)
Yield a context manager to encapsulate the scope of a profiled action.
Example:

    with self.profile('load training data'):
        # load training data code

The profiler will start once you’ve entered the context and will
automatically stop once you exit the code block.
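The profile() context manager can be understood as a thin wrapper around start/stop. A minimal sketch with hypothetical names, not HAT's code:

```python
import time
from contextlib import contextmanager


class TimingProfiler:
    """Minimal profiler exposing the same profile() context-manager shape."""

    def __init__(self):
        self.durations = {}

    def start(self, action_name):
        self._t0 = time.perf_counter()

    def stop(self, action_name):
        self.durations[action_name] = time.perf_counter() - self._t0

    @contextmanager
    def profile(self, action_name):
        # Start on entry, stop on exit -- even if the body raises.
        self.start(action_name)
        try:
            yield
        finally:
            self.stop(action_name)


p = TimingProfiler()
with p.profile("load training data"):
    time.sleep(0.005)
print(f"{p.durations['load training data']:.4f}s")
```

Wrapping stop() in `finally` is what guarantees the "automatically stop once you exit the code block" behavior even when the profiled code raises an exception.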
class hat.profiler.model_profiler.BaseModelProfiler
Base class for defining the process of model analysis.
class hat.profiler.model_profiler.CheckDeployDevice(print_tabulate: bool = True, out_dir: str | None = None)
Check the deploy device (BPU or CPU) of a hybrid model.
- Parameters:
  - print_tabulate (bool, optional) – Whether to print the result as a tabulate. Defaults to True.
  - out_dir – path to save the result txt ‘deploy_device.txt’. If None, will save in the current directory. Default: None
- Returns:
  A dict of model deploy info with schema:
  - KEY (str): module name
  - VALUE (Tuple): (deploy device (BPU or CPU), module type)
class hat.profiler.model_profiler.CheckFused(print_tabulate: bool = True)
Check if the model has unfused ops.
Check unfused modules in a model.
NOTE: This function is only capable of checking unfused modules. For the
correctness of fusion, please use featuremap_similarity to compare
the features between the fused and unfused models.
- Parameters:
  - print_tabulate (bool) – Whether to print the result as a tabulate. Default: True.
- Returns:
The qualified name of modules that can be fused.
- Return type:
List[List[str]]
class hat.profiler.model_profiler.CheckShared(check_leaf_module: callable | None = None, print_tabulate: bool = True)
Check if the model has shared ops.
Count called times for all leaf modules in a model.
- Parameters:
  - check_leaf_module (callable, optional) – A function to check whether a module is a leaf. Pass None to use the pre-defined is_leaf_module. Default: None.
  - print_tabulate (bool, optional) – Whether to print the result as a tabulate. Default: True.
- Returns:
The qualified name and called times of each leaf module.
- Return type:
Dict[str, int]
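Counting called times for each leaf module, as CheckShared does, amounts to wrapping each module's call and incrementing a counter. A plain-Python sketch (HAT's implementation uses module hooks; the module names and forward structure here are made up for illustration):

```python
from collections import Counter


def count_calls(modules, inputs):
    """Run a toy forward pass per input and count how often each module is called."""
    counts = Counter()

    def wrap(name, fn):
        def wrapped(x):
            counts[name] += 1
            return fn(x)
        return wrapped

    wrapped = {name: wrap(name, fn) for name, fn in modules.items()}
    for x in inputs:
        # A "shared" module is one invoked more than once per forward pass.
        y = wrapped["conv"](x)
        y = wrapped["relu"](y)
        y = wrapped["conv"](y)  # conv is shared: called twice per input
    return counts


counts = count_calls({"conv": lambda x: x + 1, "relu": lambda x: max(x, 0)}, [1, 2])
print(counts)
```

A module whose count exceeds the number of forward passes is shared, which is exactly the condition this sketch makes visible.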
class hat.profiler.model_profiler.CompareWeights(similarity_func='Cosine', with_tensorboard: bool = False, tensorboard_dir: str | None = None, out_dir: str | None = None)
Compare weights of float/qat/quantized models.
This function compares the weights of each layer based on
torch.quantization._numeric_suite.compare_weights. The weight similarity
and atol will be printed on the screen and saved in “weight_comparison.txt”.
If you want to see histograms of the weights, set with_tensorboard=True.
- Parameters:
  - similarity_func – similarity computation function. Supports “Cosine”, “MSE”, “L1”, “KL”, “SQNR”, or any user-defined Callable object. If it is a user-defined object, it should return a scalar or a tensor with only one number; otherwise the result shown may be unexpected. Default: “Cosine”
  - with_tensorboard – whether to use tensorboard. Default: False
  - tensorboard_dir – tensorboard log file path. Default: None
  - out_dir – path to save the result txt and picture. If None, will save in the current directory. Default: None
- Returns:
  A weight comparison dict with schema:
  - KEY (str): module name (e.g. layer1.0.conv.weight)
  - VALUE (dict): a dict of the corresponding weights in the two models:
    - ”float”: weight value in the float model
    - “quantized”: weight value in the qat/quantized model
  Also a list of lists, where each inner list is one layer’s weight similarity in the format [module name, similarity, atol (N scale)].
class hat.profiler.model_profiler.FeaturemapSimilarity(similarity_func: str | callable = 'Cosine', threshold: Real | None = None, devices: device | tuple | None = None, out_dir: str | None = None)
Compute the similarity of two models.
Compute the similarity of feature maps. The input models can be float/
fused/calibration/qat/quantized models.
- Parameters:
  - similarity_func – similarity computation function. Supports “Cosine”, “MSE”, “L1”, “KL”, “SQNR”, or any user-defined Callable object. If it is a user-defined object, it should return a scalar or a tensor with only one number; otherwise the result shown may be unexpected. Default: “Cosine”
  - threshold – if the similarity value exceeds or falls below this threshold, the featuremap name will be shown in red. If threshold is None, it will be set to different values according to the similarity function. Default: None
  - devices – devices to run the models on (cpu, gpu). Can be:
    - None: run the models with the given inputs as-is
    - torch.device: both models and the given inputs will be moved to the specified device
    - tuple: a tuple of two torch.devices; the two models will be moved to the specified devices separately, which may be used to compare CPU and GPU result differences
  - out_dir – path to save the result txt and picture. If None, will save in the current directory. Default: None
- Returns:
  A list of lists, where each inner list is one layer’s similarity info in the format [index, module name, module type, similarity, scale, atol, atol (N scale), single op error (N scale)]
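The built-in similarity functions can be sketched in plain Python. These are the standard formulas only (illustrative on lists of floats; HAT's implementations operate on tensors):

```python
import math


def cosine(a, b):
    # Cosine similarity: dot(a, b) / (|a| * |b|); 1.0 means identical direction.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)


def mse(a, b):
    # Mean squared error: 0.0 means identical, larger means less similar.
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def l1(a, b):
    # Mean absolute error.
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Toy featuremaps standing in for float vs quantized model outputs.
float_fm = [0.1, 0.5, -0.3, 0.8]
quant_fm = [0.1, 0.5, -0.25, 0.75]
print(cosine(float_fm, quant_fm), mse(float_fm, quant_fm), l1(float_fm, quant_fm))
```

Note the direction difference: for "Cosine" a value near 1.0 is good, while for "MSE"/"L1" a value near 0.0 is good, which is why the threshold default depends on the chosen similarity function.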
class hat.profiler.model_profiler.HbirModelProfiler(show_table: bool = True, show_tensorboard: bool = False, prefixes: Tuple[str, ...] | None = None, types: Tuple[Type, ...] | None = None, with_stack: bool = False, force_per_channel: bool = False, out_dir: str | None = None)
Run an hbir model and save each op’s info.
This function runs an hbir model and saves each op’s info on disk; the info
can be shown in a table or in tensorboard.
- Parameters:
  - show_table – whether to show each op’s info in a table, which will also be saved in statistic.txt
  - show_tensorboard – whether to show each op’s histogram in tensorboard
  - prefixes – only show ops whose qualified name has one of these prefixes
  - types – only show ops of the given types
  - with_stack – whether to show the op’s location in code
  - force_per_channel – whether to show data per channel in tensorboard
  - out_dir – dir to save op info and result files
class hat.profiler.model_profiler.ModelProfilerv2(show_table: bool = True, show_tensorboard: bool = False, prefixes: Tuple[str, ...] | None = None, types: Tuple[Type, ...] | None = None, with_stack: bool = False, force_per_channel: bool = False, out_dir: str | None = None)
Run a model and save each op’s info.
This function runs a model and saves each op’s info on disk; the info can be
shown in a table or in tensorboard.
- Parameters:
  - show_table – whether to show each op’s info in a table, which will also be saved in statistic.txt
  - show_tensorboard – whether to show each op’s histogram in tensorboard
  - prefixes – only show ops whose qualified name has one of these prefixes
  - types – only show ops of the given types
  - with_stack – whether to show the op’s location in code
  - force_per_channel – whether to show data per channel in tensorboard
  - out_dir – dir to save op info and result files
class hat.profiler.model_profiler.ProfileFeaturemap(prefixes: Tuple = (), types: Tuple = (), device: device | None = None, preserve_int: bool = False, use_class_name: bool = False, skip_identity: bool = False, with_tensorboard: bool = False, tensorboard_dir: str | None = None, print_per_channel_scale: bool = False, show_per_channel: bool = False, out_dir: str | None = None, file_name: str | None = None, profile_func: callable | None = None)
Profile featuremap values with log or tensorboard.
By default, prints the min/max/mean/var/scale of each feature profiled by
get_raw_features. If with_tensorboard is set to True, the histogram of each
feature will be shown in tensorboard, which is useful for seeing the data
distribution. If you want more info about the features, you can define a
custom profile function to process the results of get_raw_features.
- Parameters:
  - prefixes – get features info by the prefix of the qualified name. Default: tuple().
  - types – get features info by module type. Default: tuple().
  - device – device to run the model on. Default: None
  - preserve_int – if True, record each op result as an int type. Default: False
  - use_class_name – if True, record the class name instead of the class type. Default: False
  - skip_identity – if True, will not record the result of Identity modules. Default: False
  - with_tensorboard – whether to use tensorboard. Default: False
  - tensorboard_dir – tensorboard log file path. Default: None
  - print_per_channel_scale – whether to print per-channel scales. Default: False
  - show_per_channel – show each featuremap per channel in tensorboard. Default: False
  - out_dir – path to save the result txt and picture. If None, will save in the current directory. Default: None
  - file_name – result file name. If None, will save the result and figure with the name ‘statistic’ (statistic.txt and statistic.html). Default: None
  - profile_func (callable, None) – your custom featuremap profiler function. Default: None
- Returns:
  A list of lists, where each inner list is one layer’s statistics in the format [index, module name, module type, attr, min, max, mean, var, scale]
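The per-feature statistics reported by default (min/max/mean/var) can be sketched as follows. This is an illustrative stand-in on a list of floats; HAT computes these over real featuremap tensors:

```python
def feature_stats(values):
    """min/max/mean/var over one featuremap, as ProfileFeaturemap reports per op."""
    n = len(values)
    mean = sum(values) / n
    # Population variance, i.e. mean squared deviation from the mean.
    var = sum((v - mean) ** 2 for v in values) / n
    return {"min": min(values), "max": max(values), "mean": mean, "var": var}


stats = feature_stats([0.0, 1.0, 2.0, 3.0])
print(stats)
```

A custom profile_func would receive the raw feature data and can compute any further statistics in the same fashion.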
class hat.profiler.quant_analysis.QuantAnalysis(model: Module, baseline_model_convert_pipeline, analysis_model_convert_pipeline, analysis_model_type, out_dir, post_process=None, dataloader=None, batch_transforms=None, num_steps=None, device_id=None, bad_case_input=None, analysis_pipeline=None)
Quantization models precision analysis.
This class helps analyze quantization precision errors between two models.
- Parameters:
- model – original float model
- baseline_model_convert_pipeline – baseline model convert pipeline
- analysis_model_convert_pipeline – analysis model convert pipeline
- analysis_model_type – the low precision model type. Supports two types:
  - "fake_quant": the analysis model can be a calibration/qat model and the baseline model can be a float model or an int8/16 mixed qconfig model with good precision.
  - "quantized": the analysis model must be a quantized model and the baseline model must be a calibration/qat model.
- out_dir – path to save the advisor and comparison results.
- post_process – post process function applied to the model output.
- dataloader – input dataloader or a custom iterable object. It must generate one piece of data per iteration. Examples:
  - a torch dataloader: dataloader = torch.utils.data.DataLoader()
  - a custom generator: dataloader = [x for x in [1, 2, 3]]
- batch_transforms – batch transforms
- num_steps – num of steps to find bad case
- device_id – device id
- bad_case_input – manually set a bad case. If set, the automatic bad-case search will be skipped.
- analysis_pipeline – analysis pipeline. It is a list of analysis operations to perform. If None, the default analysis pipeline will be used.
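The automatic bad-case search over num_steps can be sketched as a loop that compares baseline and analysis outputs and keeps the input with the largest error. This is illustrative only; the function and the toy "models" below are hypothetical, not HAT's API:

```python
def find_bad_case(baseline_model, analysis_model, data, num_steps, error_fn):
    """Return the input whose analysis output deviates most from the baseline."""
    worst_input, worst_error = None, float("-inf")
    for step, batch in enumerate(data):
        if step >= num_steps:
            break
        err = error_fn(baseline_model(batch), analysis_model(batch))
        if err > worst_error:
            worst_input, worst_error = batch, err
    return worst_input, worst_error


# Toy stand-ins: "baseline" is exact, "analysis" rounds its output,
# loosely simulating quantization error.
baseline = lambda x: x * 0.1
analysis = lambda x: round(x * 0.1, 1)
bad_input, err = find_bad_case(
    baseline, analysis, [1.23, 4.56, 7.89], num_steps=3,
    error_fn=lambda a, b: abs(a - b),
)
print(bad_input, err)
```

Setting bad_case_input corresponds to skipping this loop entirely and analyzing the given input directly.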