Decision Transformer
Overview
The Decision Transformer model was proposed in Decision Transformer: Reinforcement Learning via Sequence Modeling by Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, and Igor Mordatch.
The abstract from the paper is the following:
We introduce a framework that abstracts Reinforcement Learning (RL) as a sequence modeling problem. This allows us to draw upon the simplicity and scalability of the Transformer architecture, and associated advances in language modeling such as GPT-x and BERT. In particular, we present Decision Transformer, an architecture that casts the problem of RL as conditional sequence modeling. Unlike prior approaches to RL that fit value functions or compute policy gradients, Decision Transformer simply outputs the optimal actions by leveraging a causally masked Transformer. By conditioning an autoregressive model on the desired return (reward), past states, and actions, our Decision Transformer model can generate future actions that achieve the desired return. Despite its simplicity, Decision Transformer matches or exceeds the performance of state-of-the-art model-free offline RL baselines on Atari, OpenAI Gym, and Key-to-Door tasks.
This version of the model is for tasks where the state is a vector.
This model was contributed by edbeeching. The original code can be found here.
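The following is a minimal sketch of a single forward pass, assuming a randomly initialized model built from the default configuration (state_dim=17, act_dim=4) and dummy tensors; in practice you would load a trained checkpoint with from_pretrained (for example, one of the edbeeching/decision-transformer-gym-* checkpoints on the Hub) and feed real environment observations. The model is conditioned on a desired return-to-go, past states, and past actions, and outputs a prediction for the next action.
>>> import torch
>>> from transformers import DecisionTransformerConfig, DecisionTransformerModel
>>> # Randomly initialized model with the default configuration (illustrative only)
>>> config = DecisionTransformerConfig()
>>> model = DecisionTransformerModel(config).eval()
>>> # One-timestep histories with a batch dimension of 1
>>> states = torch.randn(1, 1, config.state_dim)  # current observation
>>> actions = torch.zeros(1, 1, config.act_dim)  # no action taken yet
>>> returns_to_go = torch.full((1, 1, 1), 100.0)  # hypothetical desired return
>>> timesteps = torch.zeros(1, 1, dtype=torch.long)
>>> attention_mask = torch.ones(1, 1)
>>> with torch.no_grad():
...     outputs = model(
...         states=states,
...         actions=actions,
...         returns_to_go=returns_to_go,
...         timesteps=timesteps,
...         attention_mask=attention_mask,
...     )
>>> # Action predicted for the most recent timestep, shape (act_dim,)
>>> predicted_action = outputs.action_preds[0, -1]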
DecisionTransformerConfig
class transformers.DecisionTransformerConfig
( transformers_version: str | None = None architectures: list[str] | None = None output_hidden_states: bool | None = False return_dict: bool | None = True dtype: typing.Union[str, ForwardRef('torch.dtype'), NoneType] = None chunk_size_feed_forward: int = 0 is_encoder_decoder: bool = False id2label: dict[int, str] | dict[str, str] | None = None label2id: dict[str, int] | dict[str, str] | None = None problem_type: typing.Optional[typing.Literal['regression', 'single_label_classification', 'multi_label_classification']] = None state_dim: int = 17 act_dim: int = 4 hidden_size: int = 128 max_ep_len: int = 4096 action_tanh: bool = True vocab_size: int = 1 n_positions: int = 1024 n_layer: int = 3 n_head: int = 1 n_inner: int | None = None activation_function: str = 'relu' resid_pdrop: float = 0.1 embd_pdrop: float = 0.1 attn_pdrop: float = 0.1 layer_norm_epsilon: float = 1e-05 initializer_range: float = 0.02 scale_attn_weights: bool = True use_cache: bool = True bos_token_id: int | None = 50256 eos_token_id: int | list[int] | None = 50256 scale_attn_by_inverse_layer_idx: bool = False reorder_and_upcast_attn: bool = False add_cross_attention: bool = False )
Parameters
- state_dim (int, optional, defaults to 17) — The state size for the RL environment
- act_dim (int, optional, defaults to 4) — The size of the output action space
- hidden_size (int, optional, defaults to 128) — Dimension of the hidden representations.
- max_ep_len (int, optional, defaults to 4096) — The maximum length of an episode in the environment
- action_tanh (bool, optional, defaults to True) — Whether to use a tanh activation on action prediction
- vocab_size (int, optional, defaults to 1) — Vocabulary size of the model. Defines the number of different tokens that can be represented by the input_ids.
- n_positions (int, optional, defaults to 1024) — The maximum sequence length that this model might ever be used with.
- n_layer (int, optional, defaults to 3) — Number of hidden layers in the Transformer decoder.
- n_head (int, optional, defaults to 1) — Number of attention heads for each attention layer in the Transformer decoder.
- n_inner (int, optional) — Dimension of the MLP representations.
- activation_function (str, optional, defaults to "relu") — The non-linear activation function (function or string) in the decoder. For example, "gelu", "relu", "silu", etc.
- resid_pdrop (float, optional, defaults to 0.1) — The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
- embd_pdrop (float, optional, defaults to 0.1) — The dropout ratio for the embeddings.
- attn_pdrop (float, optional, defaults to 0.1) — The dropout ratio for the attention probabilities.
- layer_norm_epsilon (float, optional, defaults to 1e-05) — The epsilon used by the layer normalization layers.
- initializer_range (float, optional, defaults to 0.02) — The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
- scale_attn_weights (bool, optional, defaults to True) — Scale attention weights by dividing by sqrt(hidden_size).
- use_cache (bool, optional, defaults to True) — Whether or not the model should return the last key/values attentions (not used by all models). Only relevant if config.is_decoder=True or when the model is a decoder-only generative model.
- bos_token_id (int, optional, defaults to 50256) — Token id used for beginning-of-stream in the vocabulary.
- eos_token_id (Union[int, list[int]], optional, defaults to 50256) — Token id used for end-of-stream in the vocabulary.
- scale_attn_by_inverse_layer_idx (bool, optional, defaults to False) — Whether to additionally scale attention weights by 1 / (layer_idx + 1).
- reorder_and_upcast_attn (bool, optional, defaults to False) — Whether to scale keys (K) prior to computing attention (dot-product) and upcast attention dot-product/softmax to float() when training with mixed precision.
- add_cross_attention (bool, optional, defaults to False) — Whether cross-attention layers should be added to the model.
This is the configuration class to store the configuration of a DecisionTransformerModel. It is used to instantiate a Decision Transformer model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a similar configuration to that of the standard DecisionTransformer architecture. Many of the config options are used to instantiate the GPT-2 model that is used as part of the architecture.
Configuration objects inherit from PreTrainedConfig and can be used to control the model outputs. Read the documentation from PreTrainedConfig for more information.
Example:
>>> from transformers import DecisionTransformerConfig, DecisionTransformerModel
>>> # Initializing a DecisionTransformer configuration
>>> configuration = DecisionTransformerConfig()
>>> # Initializing a model (with random weights) from the configuration
>>> model = DecisionTransformerModel(configuration)
>>> # Accessing the model configuration
>>> configuration = model.config
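The defaults target a 17-dimensional state and 4-dimensional action space; the sketch below shows how the environment-specific fields would typically be set for a different environment, using hypothetical sizes that are illustrative rather than taken from the original docs.
>>> from transformers import DecisionTransformerConfig, DecisionTransformerModel
>>> # Hypothetical environment sizes (illustrative values)
>>> configuration = DecisionTransformerConfig(
...     state_dim=11,  # size of the observation vector
...     act_dim=3,  # size of the action vector
...     max_ep_len=1000,  # longest episode expected in the environment
... )
>>> model = DecisionTransformerModel(configuration)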
DecisionTransformerGPT2Model
[[autodoc]] DecisionTransformerGPT2Model - forward
DecisionTransformerModel
[[autodoc]] DecisionTransformerModel - forward
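As a rough illustration of how the forward method is typically used at evaluation time, the sketch below runs a short autoregressive rollout: at every step the latest predicted action is written into the history, a placeholder transition stands in for env.step on a real environment, the return-to-go is decremented by the received reward, and the extended sequences are fed back to the model. A real evaluation loop would load a trained checkpoint and also truncate the histories to a fixed context length.
>>> import torch
>>> from transformers import DecisionTransformerConfig, DecisionTransformerModel
>>> config = DecisionTransformerConfig()  # default state_dim=17, act_dim=4
>>> model = DecisionTransformerModel(config).eval()
>>> TARGET_RETURN = 100.0  # hypothetical desired episode return
>>> states = torch.randn(1, 1, config.state_dim)  # placeholder initial observation
>>> actions = torch.zeros(1, 1, config.act_dim)
>>> returns_to_go = torch.full((1, 1, 1), TARGET_RETURN)
>>> timesteps = torch.zeros(1, 1, dtype=torch.long)
>>> attention_mask = torch.ones(1, 1)
>>> for t in range(1, 5):
...     with torch.no_grad():
...         outputs = model(
...             states=states,
...             actions=actions,
...             returns_to_go=returns_to_go,
...             timesteps=timesteps,
...             attention_mask=attention_mask,
...         )
...     actions[0, -1] = outputs.action_preds[0, -1]  # act with the latest prediction
...     # Placeholder transition; a real loop would call env.step(...) here
...     next_state, reward = torch.randn(1, 1, config.state_dim), 1.0
...     # Extend the histories and decrement the return-to-go by the received reward
...     states = torch.cat([states, next_state], dim=1)
...     actions = torch.cat([actions, torch.zeros(1, 1, config.act_dim)], dim=1)
...     returns_to_go = torch.cat([returns_to_go, returns_to_go[:, -1:] - reward], dim=1)
...     timesteps = torch.cat([timesteps, torch.full((1, 1), t, dtype=torch.long)], dim=1)
...     attention_mask = torch.cat([attention_mask, torch.ones(1, 1)], dim=1)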