# Megatron-LM utilities

## MegatronLMPlugin[[accelerate.utils.MegatronLMPlugin]]

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>class accelerate.utils.MegatronLMPlugin</name><anchor>accelerate.utils.MegatronLMPlugin</anchor><source>https://github.com/huggingface/accelerate/blob/v1.11.0/src/accelerate/utils/dataclasses.py#L2215</source><parameters>[{"name": "tp_degree", "val": ": int = None"}, {"name": "pp_degree", "val": ": int = None"}, {"name": "num_micro_batches", "val": ": int = None"}, {"name": "gradient_clipping", "val": ": float = None"}, {"name": "sequence_parallelism", "val": ": bool = None"}, {"name": "recompute_activations", "val": ": bool = None"}, {"name": "use_distributed_optimizer", "val": ": bool = None"}, {"name": "pipeline_model_parallel_split_rank", "val": ": int = None"}, {"name": "num_layers_per_virtual_pipeline_stage", "val": ": int = None"}, {"name": "is_train_batch_min", "val": ": str = True"}, {"name": "train_iters", "val": ": int = None"}, {"name": "train_samples", "val": ": int = None"}, {"name": "weight_decay_incr_style", "val": ": str = 'constant'"}, {"name": "start_weight_decay", "val": ": float = None"}, {"name": "end_weight_decay", "val": ": float = None"}, {"name": "lr_decay_style", "val": ": str = 'linear'"}, {"name": "lr_decay_iters", "val": ": int = None"}, {"name": "lr_decay_samples", "val": ": int = None"}, {"name": "lr_warmup_iters", "val": ": int = None"}, {"name": "lr_warmup_samples", "val": ": int = None"}, {"name": "lr_warmup_fraction", "val": ": float = None"}, {"name": "min_lr", "val": ": float = 0"}, {"name": "consumed_samples", "val": ": list = None"}, {"name": "no_wd_decay_cond", "val": ": typing.Optional[typing.Callable] = None"}, {"name": "scale_lr_cond", "val": ": typing.Optional[typing.Callable] = None"}, {"name": "lr_mult", "val": ": float = 1.0"}, {"name": "megatron_dataset_flag", "val": ": bool = False"}, {"name": "seq_length", "val": ": int = None"}, {"name": "encoder_seq_length", "val": ": int = None"}, {"name": "decoder_seq_length", "val": ": int = None"}, {"name": "tensorboard_dir", "val": ": str = None"}, {"name": 
"set_all_logging_options", "val": ": bool = False"}, {"name": "eval_iters", "val": ": int = 100"}, {"name": "eval_interval", "val": ": int = 1000"}, {"name": "return_logits", "val": ": bool = False"}, {"name": "custom_train_step_class", "val": ": typing.Optional[typing.Any] = None"}, {"name": "custom_train_step_kwargs", "val": ": typing.Optional[dict[str, typing.Any]] = None"}, {"name": "custom_model_provider_function", "val": ": typing.Optional[typing.Callable] = None"}, {"name": "custom_prepare_model_function", "val": ": typing.Optional[typing.Callable] = None"}, {"name": "custom_megatron_datasets_provider_function", "val": ": typing.Optional[typing.Callable] = None"}, {"name": "custom_get_batch_function", "val": ": typing.Optional[typing.Callable] = None"}, {"name": "custom_loss_function", "val": ": typing.Optional[typing.Callable] = None"}, {"name": "other_megatron_args", "val": ": typing.Optional[dict[str, typing.Any]] = None"}]</parameters><paramsdesc>- **tp_degree** (`int`, defaults to `None`) --
  Tensor parallelism degree.
- **pp_degree** (`int`, defaults to `None`) --
  Pipeline parallelism degree.
- **num_micro_batches** (`int`, defaults to `None`) --
  Number of micro-batches.
- **gradient_clipping** (`float`, defaults to `None`) --
  Gradient clipping value based on global L2 Norm (0 to disable).
- **sequence_parallelism** (`bool`, defaults to `None`) --
  Enable sequence parallelism.
- **recompute_activations** (`bool`, defaults to `None`) --
  Enable selective activation recomputation.
- **use_distributed_optimizer** (`bool`, defaults to `None`) --
  Enable distributed optimizer.
- **pipeline_model_parallel_split_rank** (`int`, defaults to `None`) --
  Rank where encoder and decoder should be split.
- **num_layers_per_virtual_pipeline_stage** (`int`, defaults to `None`) --
  Number of layers per virtual pipeline stage.
- **is_train_batch_min** (`str`, defaults to `True`) --
  If both train and eval dataloaders are specified, this decides the `micro_batch_size`.
- **train_iters** (`int`, defaults to `None`) --
  Total number of iterations to train over all training runs. Note that either `train_iters` or `train_samples`
  should be provided when using `MegatronLMDummyScheduler`.
- **train_samples** (`int`, defaults to `None`) --
  Total number of samples to train over all training runs. Note that either `train_iters` or `train_samples`
  should be provided when using `MegatronLMDummyScheduler`.
- **weight_decay_incr_style** (`str`, defaults to `'constant'`) --
  Weight decay increment function. choices=["constant", "linear", "cosine"].
- **start_weight_decay** (`float`, defaults to `None`) --
  Initial weight decay coefficient for L2 regularization.
- **end_weight_decay** (`float`, defaults to `None`) --
  End of run weight decay coefficient for L2 regularization.
- **lr_decay_style** (`str`, defaults to `'linear'`) --
  Learning rate decay function. choices=['constant', 'linear', 'cosine'].
- **lr_decay_iters** (`int`, defaults to `None`) --
  Number of iterations for learning rate decay. If `None`, defaults to `train_iters`.
- **lr_decay_samples** (`int`, defaults to `None`) --
  Number of samples for learning rate decay. If `None`, defaults to `train_samples`.
- **lr_warmup_iters** (`int`, defaults to `None`) --
  Number of iterations to linearly warmup learning rate over.
- **lr_warmup_samples** (`int`, defaults to `None`) --
  Number of samples to linearly warmup learning rate over.
- **lr_warmup_fraction** (`float`, defaults to `None`) --
  Fraction of lr-warmup-(iters/samples) to linearly warmup learning rate over.
- **min_lr** (`float`, defaults to `0`) --
  Minimum value for the learning rate. The scheduler clips values below this threshold.
- **consumed_samples** (`List`, defaults to `None`) --
  Number of samples consumed, listed in the same order as the dataloaders passed to the `accelerator.prepare` call.
- **no_wd_decay_cond** (`Optional`, defaults to `None`) --
  Condition to disable weight decay.
- **scale_lr_cond** (`Optional`, defaults to `None`) --
  Condition to scale learning rate.
- **lr_mult** (`float`, defaults to `1.0`) --
  Learning rate multiplier.
- **megatron_dataset_flag** (`bool`, defaults to `False`) --
  Whether the dataset follows the Megatron-LM Indexed/Cached/MemoryMapped format.
- **seq_length** (`int`, defaults to `None`) --
  Maximum sequence length to process.
- **encoder_seq_length** (`int`, defaults to `None`) --
  Maximum sequence length to process for the encoder.
- **decoder_seq_length** (`int`, defaults to `None`) --
  Maximum sequence length to process for the decoder.
- **tensorboard_dir** (`str`, defaults to `None`) --
  Path to save tensorboard logs.
- **set_all_logging_options** (`bool`, defaults to `False`) --
  Whether to set all logging options.
- **eval_iters** (`int`, defaults to `100`) --
  Number of iterations to run for evaluation on the validation/test set.
- **eval_interval** (`int`, defaults to `1000`) --
  Interval between running evaluation on validation set.
- **return_logits** (`bool`, defaults to `False`) --
  Whether to return logits from the model.
- **custom_train_step_class** (`Optional`, defaults to `None`) --
  Custom train step class.
- **custom_train_step_kwargs** (`Optional`, defaults to `None`) --
  Custom train step kwargs.
- **custom_model_provider_function** (`Optional`, defaults to `None`) --
  Custom model provider function.
- **custom_prepare_model_function** (`Optional`, defaults to `None`) --
  Custom prepare model function.
- **custom_megatron_datasets_provider_function** (`Optional`, defaults to `None`) --
  Custom megatron train_valid_test datasets provider function.
- **custom_get_batch_function** (`Optional`, defaults to `None`) --
  Custom get batch function.
- **custom_loss_function** (`Optional`, defaults to `None`) --
  Custom loss function.
- **other_megatron_args** (`Optional`, defaults to `None`) --
  Other Megatron-LM arguments. Please refer to Megatron-LM for details.</paramsdesc><paramgroups>0</paramgroups></docstring>

Plugin for Megatron-LM that enables tensor, pipeline, sequence and data parallelism, as well as selective
activation recomputation and optimized fused kernels.




</div>
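The way `tp_degree`, `pp_degree` and `num_micro_batches` interact can be sketched in plain Python. The sketch assumes the usual Megatron-LM layout, in which the world size factors into tensor, pipeline and data parallel degrees. The helper `derive_parallel_layout` and its `world_size` and `micro_batch_size` arguments are hypothetical names used for illustration only; they are not part of Accelerate.

```python
def derive_parallel_layout(world_size, tp_degree, pp_degree, num_micro_batches, micro_batch_size):
    """Illustrative helper (hypothetical, not an Accelerate API).

    Assumes world_size = tp_degree * pp_degree * dp_degree, so the data
    parallel degree is whatever remains after model parallelism.
    """
    model_parallel_size = tp_degree * pp_degree
    if world_size % model_parallel_size != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by "
            f"tp_degree * pp_degree = {model_parallel_size}"
        )
    dp_degree = world_size // model_parallel_size
    # Each data parallel replica processes num_micro_batches micro-batches per step.
    global_batch_size = micro_batch_size * num_micro_batches * dp_degree
    return {"dp_degree": dp_degree, "global_batch_size": global_batch_size}


layout = derive_parallel_layout(
    world_size=8, tp_degree=2, pp_degree=2, num_micro_batches=4, micro_batch_size=2
)
print(layout)
```

A check like this can catch a mismatch between the number of launched processes and the requested parallelism degrees before any model is built.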

## MegatronLMDummyScheduler[[accelerate.utils.MegatronLMDummyScheduler]]

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>class accelerate.utils.MegatronLMDummyScheduler</name><anchor>accelerate.utils.MegatronLMDummyScheduler</anchor><source>https://github.com/huggingface/accelerate/blob/v1.11.0/src/accelerate/utils/megatron_lm.py#L391</source><parameters>[{"name": "optimizer", "val": ""}, {"name": "total_num_steps", "val": " = None"}, {"name": "warmup_num_steps", "val": " = 0"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **optimizer** (`torch.optim.optimizer.Optimizer`) --
  The optimizer to wrap.
- **total_num_steps** (int) --
  Total number of steps.
- **warmup_num_steps** (int) --
  Number of steps for warmup.
- ****kwargs** (additional keyword arguments, *optional*) --
  Other arguments.</paramsdesc><paramgroups>0</paramgroups></docstring>

Dummy scheduler that presents model parameters or param groups. It is primarily used to follow a conventional training
loop when the scheduler config is specified in the deepspeed config file.




</div>
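To make the warmup and decay settings more concrete, here is a minimal sketch of a schedule with linear warmup followed by constant, linear or cosine decay down to a minimum learning rate. It is an approximation written for illustration, not the scheduler that Megatron-LM or Accelerate actually runs. The function `sketch_lr` and its parameter names are hypothetical.

```python
import math


def sketch_lr(step, max_lr, min_lr=0.0, warmup_steps=0, decay_steps=1000, decay_style="linear"):
    """Illustrative learning rate schedule (hypothetical, not a library API)."""
    if decay_steps <= warmup_steps:
        raise ValueError("decay_steps must be larger than warmup_steps")
    # Linear warmup from 0 up to max_lr.
    if warmup_steps > 0 and step <= warmup_steps:
        return max_lr * step / warmup_steps
    if decay_style == "constant":
        return max_lr
    # Past the decay horizon the rate stays at the floor.
    if step > decay_steps:
        return min_lr
    progress = (step - warmup_steps) / (decay_steps - warmup_steps)
    if decay_style == "linear":
        coeff = 1.0 - progress
    elif decay_style == "cosine":
        coeff = 0.5 * (math.cos(math.pi * progress) + 1.0)
    else:
        raise ValueError(f"unknown decay_style: {decay_style!r}")
    return min_lr + coeff * (max_lr - min_lr)


for step in (0, 50, 100, 550, 1000):
    print(step, sketch_lr(step, max_lr=1.0, min_lr=0.1, warmup_steps=100, decay_style="cosine"))
```

In this sketch `warmup_steps` and `decay_steps` play roles loosely analogous to `warmup_num_steps` and `total_num_steps`; that mapping is an assumption made for the example.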

## MegatronLMDummyDataLoader[[accelerate.utils.MegatronLMDummyDataLoader]]

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>class accelerate.utils.MegatronLMDummyDataLoader</name><anchor>accelerate.utils.MegatronLMDummyDataLoader</anchor><source>https://github.com/huggingface/accelerate/blob/v1.11.0/src/accelerate/utils/megatron_lm.py#L175</source><parameters>[{"name": "**dataset_kwargs", "val": ""}]</parameters><paramsdesc>- ****dataset_kwargs** -- Megatron data arguments.</paramsdesc><paramgroups>0</paramgroups></docstring>

Dummy dataloader that presents model parameters or param groups. It is primarily used to follow a conventional training loop.




</div>

## AbstractTrainStep[[accelerate.utils.AbstractTrainStep]]

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>class accelerate.utils.AbstractTrainStep</name><anchor>accelerate.utils.AbstractTrainStep</anchor><source>https://github.com/huggingface/accelerate/blob/v1.11.0/src/accelerate/utils/megatron_lm.py#L428</source><parameters>[{"name": "name", "val": ""}]</parameters></docstring>
Abstract class for batching, forward pass and loss handler.

</div>
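The three responsibilities named above (batching, forward pass and loss handling) can be illustrated with a toy stand-in written in plain Python. `ToyTrainStep` does not subclass `AbstractTrainStep`, and its method names and signatures are assumptions chosen for this example; consult the linked source for the real interface.

```python
class ToyTrainStep:
    """Toy stand-in that groups batching, forward pass and loss handling.

    Hypothetical: method names and signatures are illustrative only.
    """

    def __init__(self, name):
        self.name = name

    def get_batch_func(self):
        # Batching: pull the next (inputs, targets) pair from an iterator.
        def get_batch(data_iterator):
            return next(data_iterator)

        return get_batch

    def get_loss_func(self):
        # Loss handling: mean squared error over plain Python floats.
        def loss_func(predictions, targets):
            return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

        return loss_func

    def get_forward_step_func(self):
        get_batch = self.get_batch_func()
        loss_func = self.get_loss_func()

        # Forward pass: run the model and return a loss closure bound to the targets.
        def forward_step(data_iterator, model):
            inputs, targets = get_batch(data_iterator)
            predictions = [model(x) for x in inputs]
            return predictions, lambda preds: loss_func(preds, targets)

        return forward_step


step = ToyTrainStep("toy")
forward_step = step.get_forward_step_func()
batches = iter([([1.0, 2.0], [2.0, 4.0])])
predictions, loss_fn = forward_step(batches, model=lambda x: 2.0 * x)
loss = loss_fn(predictions)
print(predictions, loss)
```

Returning the loss as a closure keeps the forward pass separate from the reduction, which is the kind of split a pipeline-parallel schedule can rely on.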

## GPTTrainStep[[accelerate.utils.GPTTrainStep]]

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>class accelerate.utils.GPTTrainStep</name><anchor>accelerate.utils.GPTTrainStep</anchor><source>https://github.com/huggingface/accelerate/blob/v1.11.0/src/accelerate/utils/megatron_lm.py#L587</source><parameters>[{"name": "accelerator", "val": ""}, {"name": "args", "val": ""}]</parameters><paramsdesc>- **args** (`argparse.Namespace`) -- Megatron-LM arguments.</paramsdesc><paramgroups>0</paramgroups></docstring>

GPT train step class.




</div>

## BertTrainStep[[accelerate.utils.BertTrainStep]]

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>class accelerate.utils.BertTrainStep</name><anchor>accelerate.utils.BertTrainStep</anchor><source>https://github.com/huggingface/accelerate/blob/v1.11.0/src/accelerate/utils/megatron_lm.py#L445</source><parameters>[{"name": "accelerator", "val": ""}, {"name": "args", "val": ""}]</parameters><paramsdesc>- **args** (`argparse.Namespace`) -- Megatron-LM arguments.</paramsdesc><paramgroups>0</paramgroups></docstring>

BERT train step class.




</div>

## T5TrainStep[[accelerate.utils.T5TrainStep]]

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>class accelerate.utils.T5TrainStep</name><anchor>accelerate.utils.T5TrainStep</anchor><source>https://github.com/huggingface/accelerate/blob/v1.11.0/src/accelerate/utils/megatron_lm.py#L719</source><parameters>[{"name": "accelerator", "val": ""}, {"name": "args", "val": ""}]</parameters><paramsdesc>- **args** (`argparse.Namespace`) -- Megatron-LM arguments.</paramsdesc><paramgroups>0</paramgroups></docstring>

T5 train step class.




</div>

## avg_losses_across_data_parallel_group[[accelerate.utils.avg_losses_across_data_parallel_group]]

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>accelerate.utils.avg_losses_across_data_parallel_group</name><anchor>accelerate.utils.avg_losses_across_data_parallel_group</anchor><source>https://github.com/huggingface/accelerate/blob/v1.11.0/src/accelerate/utils/megatron_lm.py#L1393</source><parameters>[{"name": "losses", "val": ""}]</parameters><paramsdesc>- **losses** (List[Tensor]) -- List of losses to average across data parallel group.</paramsdesc><paramgroups>0</paramgroups></docstring>

Average losses across data parallel group.




</div>
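As a rough illustration of what averaging across a data parallel group means, the sketch below simulates several ranks with plain Python lists and averages each loss position across them. It makes no distributed calls and is not the library's implementation; `average_losses_across_ranks` is a hypothetical name.

```python
def average_losses_across_ranks(per_rank_losses):
    """Average each loss position across simulated ranks (hypothetical helper).

    per_rank_losses holds one list of float losses per data parallel rank.
    """
    num_ranks = len(per_rank_losses)
    if num_ranks == 0:
        raise ValueError("at least one rank is required")
    num_losses = len(per_rank_losses[0])
    if any(len(losses) != num_losses for losses in per_rank_losses):
        raise ValueError("every rank must report the same number of losses")
    # Sum each loss position over ranks, then divide by the group size.
    return [
        sum(rank_losses[i] for rank_losses in per_rank_losses) / num_ranks
        for i in range(num_losses)
    ]


averaged = average_losses_across_ranks([[1.0, 4.0], [3.0, 8.0]])
print(averaged)
```

In a real distributed run the sum over ranks would come from a collective operation rather than a Python loop.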
