class accelerate.utils.MegatronLMPlugin

( tp_degree: int = None, pp_degree: int = None, num_micro_batches: int = None, gradient_clipping: float = None, sequence_parallelism: bool = None, recompute_activations: bool = None, use_distributed_optimizer: bool = None, pipeline_model_parallel_split_rank: int = None, num_layers_per_virtual_pipeline_stage: int = None, is_train_batch_min: str = True, train_iters: int = None, train_samples: int = None, weight_decay_incr_style: str = 'constant', start_weight_decay: float = None, end_weight_decay: float = None, lr_decay_style: str = 'linear', lr_decay_iters: int = None, lr_decay_samples: int = None, lr_warmup_iters: int = None, lr_warmup_samples: int = None, lr_warmup_fraction: float = None, min_lr: float = 0, consumed_samples: list = None, no_wd_decay_cond: typing.Optional[typing.Callable] = None, scale_lr_cond: typing.Optional[typing.Callable] = None, lr_mult: float = 1.0, megatron_dataset_flag: bool = False, seq_length: int = None, encoder_seq_length: int = None, decoder_seq_length: int = None, tensorboard_dir: str = None, set_all_logging_options: bool = False, eval_iters: int = 100, eval_interval: int = 1000, return_logits: bool = False, custom_train_step_class: typing.Optional[typing.Any] = None, custom_train_step_kwargs: typing.Optional[dict[str, typing.Any]] = None, custom_model_provider_function: typing.Optional[typing.Callable] = None, custom_prepare_model_function: typing.Optional[typing.Callable] = None, custom_megatron_datasets_provider_function: typing.Optional[typing.Callable] = None, custom_get_batch_function: typing.Optional[typing.Callable] = None, custom_loss_function: typing.Optional[typing.Callable] = None, other_megatron_args: typing.Optional[dict[str, typing.Any]] = None )

Source: https://github.com/huggingface/accelerate/blob/v1.11.0/src/accelerate/utils/dataclasses.py#L2215

Parameters:

- **tp_degree** (`int`, defaults to `None`) -- Tensor parallelism degree.
- **pp_degree** (`int`, defaults to `None`) -- Pipeline parallelism degree.
- **num_micro_batches** (`int`, defaults to `None`) -- Number of micro-batches.
- **gradient_clipping** (`float`, defaults to `None`) -- Gradient clipping value based on the global L2 norm (0 to disable).
- **sequence_parallelism** (`bool`, defaults to `None`) -- Enable sequence parallelism.
- **recompute_activations** (`bool`, defaults to `None`) -- Enable selective activation recomputation.
- **use_distributed_optimizer** (`bool`, defaults to `None`) -- Enable distributed optimizer.
- **pipeline_model_parallel_split_rank** (`int`, defaults to `None`) -- Rank where encoder and decoder should be split.
- **num_layers_per_virtual_pipeline_stage** (`int`, defaults to `None`) -- Number of layers per virtual pipeline stage.
- **is_train_batch_min** (`str`, defaults to `True`) -- If both train and eval dataloaders are specified, this decides the `micro_batch_size`.
- **train_iters** (`int`, defaults to `None`) -- Total number of iterations to train over all training runs. Note that either `train_iters` or `train_samples` should be provided when using `MegatronLMDummyScheduler`.
- **train_samples** (`int`, defaults to `None`) -- Total number of samples to train over all training runs. Note that either `train_iters` or `train_samples` should be provided when using `MegatronLMDummyScheduler`.
- **weight_decay_incr_style** (`str`, defaults to `'constant'`) -- Weight decay increment function. Choices: `'constant'`, `'linear'`, `'cosine'`.
- **start_weight_decay** (`float`, defaults to `None`) -- Initial weight decay coefficient for L2 regularization.
- **end_weight_decay** (`float`, defaults to `None`) -- End-of-run weight decay coefficient for L2 regularization.
- **lr_decay_style** (`str`, defaults to `'linear'`) -- Learning rate decay function. Choices: `'constant'`, `'linear'`, `'cosine'`.
- **lr_decay_iters** (`int`, defaults to `None`) -- Number of iterations for learning rate decay. If `None`, defaults to `train_iters`.
- **lr_decay_samples** (`int`, defaults to `None`) -- Number of samples for learning rate decay. If `None`, defaults to `train_samples`.
- **lr_warmup_iters** (`int`, defaults to `None`) -- Number of iterations to linearly warm up the learning rate over.
- **lr_warmup_samples** (`int`, defaults to `None`) -- Number of samples to linearly warm up the learning rate over.
- **lr_warmup_fraction** (`float`, defaults to `None`) -- Fraction of lr-warmup-(iters/samples) to linearly warm up the learning rate over.
- **min_lr** (`float`, defaults to `0`) -- Minimum value for the learning rate. The scheduler clips values below this threshold.
- **consumed_samples** (`List`, defaults to `None`) -- Number of samples consumed, in the same order as the dataloaders passed to the `accelerator.prepare` call.
- **no_wd_decay_cond** (`Optional`, defaults to `None`) -- Condition to disable weight decay.
- **scale_lr_cond** (`Optional`, defaults to `None`) -- Condition to scale the learning rate.
- **lr_mult** (`float`, defaults to `1.0`) -- Learning rate multiplier.
- **megatron_dataset_flag** (`bool`, defaults to `False`) -- Whether the dataset follows the Megatron-LM Indexed/Cached/MemoryMapped format.
- **seq_length** (`int`, defaults to `None`) -- Maximum sequence length to process.
- **encoder_seq_length** (`int`, defaults to `None`) -- Maximum sequence length to process for the encoder.
- **decoder_seq_length** (`int`, defaults to `None`) -- Maximum sequence length to process for the decoder.
- **tensorboard_dir** (`str`, defaults to `None`) -- Path to save TensorBoard logs.
- **set_all_logging_options** (`bool`, defaults to `False`) -- Whether to set all logging options.
- **eval_iters** (`int`, defaults to `100`) -- Number of iterations to run evaluation on the validation/test set for.
- **eval_interval** (`int`, defaults to `1000`) -- Interval between evaluation runs on the validation set.
- **return_logits** (`bool`, defaults to `False`) -- Whether to return logits from the model.
- **custom_train_step_class** (`Optional`, defaults to `None`) -- Custom train step class.
- **custom_train_step_kwargs** (`Optional`, defaults to `None`) -- Custom train step kwargs.
- **custom_model_provider_function** (`Optional`, defaults to `None`) -- Custom model provider function.
- **custom_prepare_model_function** (`Optional`, defaults to `None`) -- Custom prepare model function.
- **custom_megatron_datasets_provider_function** (`Optional`, defaults to `None`) -- Custom Megatron train_valid_test datasets provider function.
- **custom_get_batch_function** (`Optional`, defaults to `None`) -- Custom get batch function.
- **custom_loss_function** (`Optional`, defaults to `None`) -- Custom loss function.
- **other_megatron_args** (`Optional`, defaults to `None`) -- Other Megatron-LM arguments. Please refer to Megatron-LM.

Plugin for Megatron-LM to enable tensor, pipeline, sequence and data parallelism, as well as selective activation recomputation and optimized fused kernels.
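To make the interaction between `tp_degree`, `pp_degree` and `num_micro_batches` concrete, here is a minimal sketch in plain Python. It assumes the usual Megatron-LM convention that the data parallel degree is the world size divided by `tp_degree * pp_degree`, and that the global batch size is the micro-batch size times `num_micro_batches` times the data parallel degree. The helper `parallel_layout` and the example numbers are hypothetical and not part of Accelerate.

```python
def parallel_layout(world_size, tp_degree, pp_degree, micro_batch_size, num_micro_batches):
    """Derive the data parallel degree and global batch size.

    Illustrative helper only (an assumption for this example), not an Accelerate API.
    """
    model_parallel_size = tp_degree * pp_degree
    if world_size % model_parallel_size != 0:
        raise ValueError("world_size must be divisible by tp_degree * pp_degree")
    # GPUs left over after tensor and pipeline parallelism replicate the model.
    dp_degree = world_size // model_parallel_size
    # Each data parallel replica processes num_micro_batches micro-batches per step.
    global_batch_size = micro_batch_size * num_micro_batches * dp_degree
    return {"dp_degree": dp_degree, "global_batch_size": global_batch_size}


layout = parallel_layout(
    world_size=16, tp_degree=2, pp_degree=4, micro_batch_size=4, num_micro_batches=8
)
print(layout)
```

With 16 processes, the example leaves a data parallel degree of 2, so raising `tp_degree` or `pp_degree` at a fixed world size lowers the number of model replicas.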

class accelerate.utils.MegatronLMDummyScheduler

( optimizer, total_num_steps = None, warmup_num_steps = 0, **kwargs )

Source: https://github.com/huggingface/accelerate/blob/v1.11.0/src/accelerate/utils/megatron_lm.py#L391

Parameters:

- **optimizer** (`torch.optim.optimizer.Optimizer`) -- The optimizer to wrap.
- **total_num_steps** (`int`) -- Total number of steps.
- **warmup_num_steps** (`int`) -- Number of steps for warmup.
- `**kwargs` (additional keyword arguments, *optional*) -- Other arguments.

Dummy scheduler that wraps the optimizer and stores the step counts. It is primarily used to follow a conventional training loop when the learning rate scheduler is built by Megatron-LM from the plugin settings rather than in the training script.
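As an illustration of what `total_num_steps` and `warmup_num_steps` typically describe, the sketch below computes a learning rate with linear warmup followed by linear or cosine decay down to a minimum value. This is an assumption-laden toy, not Megatron-LM's scheduler: the function `lr_at_step` and its `max_lr`, `min_lr` and `decay_style` arguments are hypothetical names chosen to echo the plugin parameters.

```python
import math


def lr_at_step(step, max_lr, total_num_steps, warmup_num_steps=0, min_lr=0.0, decay_style="linear"):
    """Toy warmup-then-decay schedule (illustrative, not Megatron-LM's implementation)."""
    # Linear warmup from 0 up to max_lr.
    if warmup_num_steps > 0 and step < warmup_num_steps:
        return max_lr * step / warmup_num_steps
    # After the last scheduled step, stay at the floor.
    if step >= total_num_steps:
        return min_lr
    # Fraction of the decay phase completed so far, in [0, 1).
    progress = (step - warmup_num_steps) / (total_num_steps - warmup_num_steps)
    if decay_style == "linear":
        coeff = 1.0 - progress
    elif decay_style == "cosine":
        coeff = 0.5 * (1.0 + math.cos(math.pi * progress))
    elif decay_style == "constant":
        return max_lr
    else:
        raise ValueError(f"unknown decay_style: {decay_style}")
    return min_lr + coeff * (max_lr - min_lr)


# Sample the schedule at a few steps: warmup, peak, mid-decay, end.
samples = [lr_at_step(s, max_lr=1e-3, total_num_steps=100, warmup_num_steps=10) for s in (0, 5, 10, 55, 100)]
print(samples)
```

The learning rate rises during the first `warmup_num_steps` steps, peaks when warmup ends, and reaches `min_lr` at `total_num_steps`.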

class accelerate.utils.MegatronLMDummyDataLoader

( **dataset_kwargs )

Source: https://github.com/huggingface/accelerate/blob/v1.11.0/src/accelerate/utils/megatron_lm.py#L175

Parameters:

- `**dataset_kwargs` -- Megatron data arguments.

Dummy dataloader that stores Megatron-LM data arguments. It is primarily used to follow a conventional training loop.

class accelerate.utils.AbstractTrainStep

( name )

Source: https://github.com/huggingface/accelerate/blob/v1.11.0/src/accelerate/utils/megatron_lm.py#L428

Abstract class for batching, forward pass and loss handling.
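The three responsibilities named above (batching, forward pass, loss handling) can be sketched as a small abstract base class in plain Python. This is a hypothetical stand-in written for illustration: `TrainStepSketch`, `MeanSquaredErrorStep` and their method names are assumptions and do not reproduce the interface of `AbstractTrainStep`.

```python
from abc import ABC, abstractmethod


class TrainStepSketch(ABC):
    """Hypothetical train step: fetch a batch, run the forward pass, compute the loss."""

    def __init__(self, name):
        self.name = name

    @abstractmethod
    def get_batch(self, data_iterator):
        """Return the next batch from the iterator."""

    @abstractmethod
    def forward(self, batch):
        """Return model outputs for the batch."""

    @abstractmethod
    def loss(self, outputs, batch):
        """Return a scalar loss for the outputs."""

    def __call__(self, data_iterator):
        batch = self.get_batch(data_iterator)
        outputs = self.forward(batch)
        return self.loss(outputs, batch)


class MeanSquaredErrorStep(TrainStepSketch):
    """Toy subclass: a one-weight linear model scored with mean squared error."""

    def __init__(self, weight):
        super().__init__("mse")
        self.weight = weight

    def get_batch(self, data_iterator):
        return next(data_iterator)

    def forward(self, batch):
        return [self.weight * x for x in batch["inputs"]]

    def loss(self, outputs, batch):
        errors = [(o - t) ** 2 for o, t in zip(outputs, batch["targets"])]
        return sum(errors) / len(errors)


step = MeanSquaredErrorStep(weight=2.0)
data = iter([{"inputs": [1.0, 2.0], "targets": [2.0, 5.0]}])
loss_value = step(data)
print(step.name, loss_value)
```

Keeping the three pieces separate lets a model-specific subclass swap out only the part that differs, which is the role the GPT, BERT and T5 train step classes play for their respective models.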

class accelerate.utils.GPTTrainStep

( accelerator, args )

Source: https://github.com/huggingface/accelerate/blob/v1.11.0/src/accelerate/utils/megatron_lm.py#L587

Parameters:

- **args** (`argparse.Namespace`) -- Megatron-LM arguments.

GPT train step class.

class accelerate.utils.BertTrainStep

( accelerator, args )

Source: https://github.com/huggingface/accelerate/blob/v1.11.0/src/accelerate/utils/megatron_lm.py#L445

Parameters:

- **args** (`argparse.Namespace`) -- Megatron-LM arguments.

BERT train step class.

class accelerate.utils.T5TrainStep

( accelerator, args )

Source: https://github.com/huggingface/accelerate/blob/v1.11.0/src/accelerate/utils/megatron_lm.py#L719

Parameters:

- **args** (`argparse.Namespace`) -- Megatron-LM arguments.

T5 train step class.

accelerate.utils.avg_losses_across_data_parallel_group

( losses )

Source: https://github.com/huggingface/accelerate/blob/v1.11.0/src/accelerate/utils/megatron_lm.py#L1393

Parameters:

- **losses** (`List[Tensor]`) -- List of losses to average across the data parallel group.

Average losses across the data parallel group.
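The effect of averaging across the data parallel group can be simulated without any distributed setup. The sketch below is a pure-Python stand-in under stated assumptions: each inner list holds the losses computed on one data parallel rank, and the helper `average_losses_across_ranks` is a hypothetical name. In a real run the reduction would be performed by a collective operation such as `torch.distributed.all_reduce` over the data parallel process group, not by a Python loop.

```python
def average_losses_across_ranks(losses_per_rank):
    """Average each loss position over all ranks (simulated, single process)."""
    if not losses_per_rank:
        raise ValueError("need losses from at least one rank")
    num_ranks = len(losses_per_rank)
    num_losses = len(losses_per_rank[0])
    if any(len(rank_losses) != num_losses for rank_losses in losses_per_rank):
        raise ValueError("every rank must report the same number of losses")
    # Sum position i over all ranks, then divide by the group size.
    return [
        sum(rank_losses[i] for rank_losses in losses_per_rank) / num_ranks
        for i in range(num_losses)
    ]


# Two ranks, each reporting two losses (for example an LM loss and an auxiliary loss).
averaged = average_losses_across_ranks([[1.0, 4.0], [3.0, 8.0]])
print(averaged)
```

After the reduction every rank holds the same averaged values, so logging from any single rank reflects the whole data parallel group.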