Trackio documentation

Track

Introduction

Trackio helps you organize your experiments within a project. A project is a collection of runs, where each run represents a single execution of your code with a specific set of parameters and results.

Initialization

To start tracking an experiment with Trackio, you first need to initialize a project with the init() function:

import trackio

trackio.init(project="my_project")
  • If the project already exists, it will be loaded.
  • If not, Trackio will create a new one.

In both cases, a new run is started automatically, ready for you to log data.

Naming your run

It’s a good idea to give each run a meaningful name for easier organization and later reference. You can set a name using the name parameter:

trackio.init(project="my_project", name="my_first_run")

If no name is provided, Trackio generates a default one.

Grouping runs

You can organize related runs into groups using the group parameter. This is particularly useful when you’re running multiple experiments with different configurations but want to compare them together:

# Group runs by experiment type
trackio.init(project="my_project", name="baseline_run_1", group="baseline")
trackio.init(project="my_project", name="augmented_run_1", group="augmented")
trackio.init(project="my_project", name="tuned_run_1", group="tuned")

Runs with the same group name can be grouped together in the sidebar, making it easier to compare related experiments. You can also group runs by any other configuration parameter (see Tracking Configuration below).
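When each group contains several runs (for example, one per random seed), it can help to build the init() arguments in one place so names and groups stay consistent. The following is a minimal sketch: build_run_specs and the seed-based naming scheme are hypothetical helpers, not part of Trackio. Only the project, name, and group keywords come from the examples above.

```python
def build_run_specs(project, groups, seeds):
    """Return one dict of init() keyword arguments per (group, seed) pair."""
    specs = []
    for group in groups:
        for seed in seeds:
            specs.append(
                {
                    "project": project,
                    "name": f"{group}_run_{seed}",  # hypothetical naming scheme
                    "group": group,
                }
            )
    return specs


specs = build_run_specs("my_project", ["baseline", "augmented"], seeds=[1, 2])
for spec in specs:
    print(spec)
```

Each dict can then be unpacked into a call such as trackio.init(**spec), followed by your training code and trackio.finish().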

Logging Data

Once your run is initialized, you can start logging data using the log() function:

trackio.log({"loss": 0.05})

Each call to log() automatically increments the step counter. If you want to log multiple metrics at once, pass them together:

trackio.log({
    "loss": 0.05,
    "accuracy": 0.95,
})
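Because every log() call advances the step counter, metrics that belong to the same training step are best sent in a single dictionary. The sketch below shows that pattern in a loop. It takes the logging function as a parameter so it runs without Trackio installed; run_training and the synthetic loss curve are illustrative assumptions, not part of the library.

```python
import math


def run_training(log_fn, num_steps):
    """Call log_fn once per step with all of that step's metrics in one dict."""
    for step in range(num_steps):
        # Synthetic stand-ins for values a real training loop would compute.
        loss = math.exp(-0.1 * step)
        accuracy = 1.0 - loss
        log_fn({"loss": loss, "accuracy": accuracy})


# Collect the logged dicts in a list to inspect what would be sent.
logged = []
run_training(logged.append, num_steps=5)
print(len(logged), sorted(logged[0]))
```

In a real script you would call trackio.init(...) first and pass trackio.log as log_fn.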

Logging tables

You can log tabular data using the Table class. This is useful for tracking structured results such as model predictions. Tables can include image columns using the Image class.

import pandas as pd

df = pd.DataFrame(
    {
        "prompt": ["Trackio", "Logging is"],
        "completion": ["is great!", "easy and fun!"],
        "reward": [0.123, 0.456],
    }
)
trackio.log(
    {
        # ... other metrics ...
        "texts": trackio.Table(dataframe=df),
    }
)
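Evaluation code often produces one dictionary per example rather than ready-made columns. The sketch below converts such records into the column layout used in the DataFrame above. records_to_columns is a hypothetical helper written for this illustration and uses only the standard library.

```python
def records_to_columns(records):
    """Turn a list of per-example dicts into a dict of equal-length columns."""
    if not records:
        return {}
    keys = list(records[0])
    columns = {key: [] for key in keys}
    for record in records:
        # Reject records with missing or extra fields so columns stay aligned.
        if set(record) != set(keys):
            raise ValueError("every record must have the same keys")
        for key in keys:
            columns[key].append(record[key])
    return columns


records = [
    {"prompt": "Trackio", "completion": "is great!", "reward": 0.123},
    {"prompt": "Logging is", "completion": "easy and fun!", "reward": 0.456},
]
columns = records_to_columns(records)
print(columns)
```

The result can be passed to pd.DataFrame(columns) and then to trackio.Table(dataframe=...) as in the example above.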

Logging images

You can log images using the Image class.

trackio.log({"image": trackio.Image(value="path/to/image.png", caption="Image caption")})

Images can be logged from a path, a numpy array, or a PIL Image.

Logging videos

You can log videos using the Video class.

import trackio
import numpy as np

# Create a simple video from numpy array
frames = np.random.randint(0, 256, (10, 3, 64, 64), dtype=np.uint8)  # upper bound is exclusive
video = trackio.Video(frames, caption="Random video", fps=30)
trackio.log({"my_video": video})

# Create a batch of videos
batch_frames = np.random.randint(0, 256, (3, 10, 3, 64, 64), dtype=np.uint8)  # upper bound is exclusive
batch_video = trackio.Video(batch_frames, caption="Batch of videos", fps=15)
trackio.log({"batch_videos": batch_video})

# Create video from file path
video = trackio.Video("path/to/video.mp4", caption="Video from file")
trackio.log({"file_video": video})

Videos can be logged from a file path or a numpy array.

Numpy array requirements:

  • Must be of type np.uint8 with RGB values in the range [0, 255]
  • Shape should be either:
    • (frames, channels, height, width) for a single video
    • (batch, frames, channels, height, width) for multiple videos (will be tiled into a grid)
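The requirements above can be checked before logging to catch shape or dtype mistakes early. This is a minimal sketch: describe_video_array is a hypothetical helper, not a Trackio function, and it does not check the channel count. It reads only the .shape and .dtype attributes, so a NumPy array can be passed directly. Stand-in objects are used here so the example runs without NumPy.

```python
from types import SimpleNamespace


def describe_video_array(array):
    """Classify an array-like by the dtype and shape rules listed above."""
    if str(array.dtype) != "uint8":
        raise ValueError(f"expected dtype uint8, got {array.dtype}")
    ndim = len(array.shape)
    if ndim == 4:
        return "single video"  # (frames, channels, height, width)
    if ndim == 5:
        return "batch of videos"  # (batch, frames, channels, height, width)
    raise ValueError(f"expected 4 or 5 dimensions, got {ndim}")


# Stand-in objects that mimic the two attributes of a NumPy array.
single = SimpleNamespace(shape=(10, 3, 64, 64), dtype="uint8")
batch = SimpleNamespace(shape=(3, 10, 3, 64, 64), dtype="uint8")
print(describe_video_array(single))
print(describe_video_array(batch))
```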

Logging audio

You can log audio using the Audio class.

import trackio
import numpy as np

# Generate a 1-second 440 Hz sine wave (mono)
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
wave = 0.2 * np.sin(2 * np.pi * 440 * t)
audio = trackio.Audio(wave, caption="A4 sine", sample_rate=sr, format="wav")
trackio.log({"tone": audio})

# Stereo from numpy array (shape: samples, 2)
stereo = np.stack([wave, wave], axis=1)
audio = trackio.Audio(stereo, caption="Stereo", sample_rate=sr, format="mp3")
trackio.log({"stereo": audio})

# From an existing file
audio = trackio.Audio("path/to/audio.wav", caption="From file")
trackio.log({"file_audio": audio})

Audio can be logged from a file path or a numpy array.

Numpy array requirements:

  • Shape should be either (samples,) for mono or (samples, 2) for stereo
  • sample_rate must be provided when logging from a numpy array
  • Values may be float or integer; floats are peak-normalized and converted to 16-bit PCM
  • format can be "wav" or "mp3" when logging from a numpy array (default "wav")
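To make "peak-normalized and converted to 16-bit PCM" concrete, the sketch below shows one common way to do such a conversion in plain Python. It illustrates the idea only: the function name, the scale factor of 32767, and the rounding are assumptions, and Trackio's actual conversion may differ in its details.

```python
def peak_normalize_to_pcm16(samples):
    """Scale floats so the loudest sample maps to the 16-bit PCM maximum."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        # Silence stays silent and avoids a division by zero.
        return [0 for _ in samples]
    return [int(round(s / peak * 32767)) for s in samples]


pcm = peak_normalize_to_pcm16([0.0, 0.1, -0.2])
print(pcm)
```

One consequence of peak normalization is that the overall amplitude of a float array (such as the 0.2 factor in the sine wave example) presumably does not change how loud the logged audio is.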

Logging GPU metrics

If you’re training on NVIDIA GPUs, you can log GPU metrics (utilization, memory, temperature, power, etc.). This requires the nvidia-ml-py package, which is automatically installed as part of the gpu extra:

pip install "trackio[gpu]"

Automatic logging (default):

When nvidia-ml-py is installed and an NVIDIA GPU is detected, GPU metrics are logged automatically in the background (every 10 seconds by default):

import trackio

# GPU logging is auto-enabled when nvidia-ml-py is installed and GPU is detected
trackio.init(project="my_project")

for step in range(100):
    # ... training code ...
    loss = 1.0 / (step + 1)  # placeholder: use the loss your training code computes
    trackio.log({"loss": loss})
# GPU metrics are logged automatically in the background

trackio.finish()

You can customize the interval or disable auto-logging:

# Custom interval
trackio.init(project="my_project", gpu_log_interval=5.0)

# Disable auto-logging
trackio.init(project="my_project", auto_log_gpu=False)

Manual logging:

You can also log GPU metrics manually at specific times using log_gpu():

import trackio

trackio.init(project="my_project", auto_log_gpu=False)

for step in range(100):
    # ... training code ...
    loss = 1.0 / (step + 1)  # placeholder: use the loss your training code computes
    trackio.log({"loss": loss})
    trackio.log_gpu()  # Log GPU metrics at current time

trackio.finish()

Logged metrics:

Per-GPU metrics (gpu/{i}/{metric}):

  • gpu/0/utilization - GPU utilization %
  • gpu/0/memory_utilization - Memory controller utilization %
  • gpu/0/allocated_memory - Memory allocated in GiB
  • gpu/0/total_memory - Total memory in GiB
  • gpu/0/memory_usage - Memory usage ratio (0-1)
  • gpu/0/temp - Temperature in Celsius
  • gpu/0/power - Power draw in watts
  • gpu/0/power_percent - Power as % of limit
  • gpu/0/power_limit - Power limit in watts
  • gpu/0/sm_clock - SM clock speed in MHz
  • gpu/0/memory_clock - Memory clock speed in MHz
  • gpu/0/fan_speed - Fan speed %
  • gpu/0/performance_state - Performance state (P0-P15)
  • gpu/0/energy_consumed - Energy consumed since run start in Joules
  • gpu/0/pcie_tx - PCIe transmit bandwidth in MB/s
  • gpu/0/pcie_rx - PCIe receive bandwidth in MB/s
  • gpu/0/throttle_thermal - Thermal throttling (0/1)
  • gpu/0/throttle_power - Power throttling (0/1)
  • gpu/0/throttle_hw_slowdown - Hardware slowdown (0/1)
  • gpu/0/throttle_apps - Application clock throttling (0/1)
  • gpu/0/corrected_memory_errors - ECC corrected errors
  • gpu/0/uncorrected_memory_errors - ECC uncorrected errors

Aggregated metrics:

  • gpu/mean_utilization - Mean GPU utilization across all GPUs
  • gpu/total_allocated_memory - Total memory used across all GPUs in GiB
  • gpu/total_power - Total power draw across all GPUs in watts
  • gpu/max_temp - Maximum temperature across all GPUs
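The relationship between the per-GPU keys and the aggregated keys can be sketched as follows. This is an illustration of the naming scheme under stated assumptions, not Trackio's implementation: aggregate_gpu_metrics is a hypothetical function, and the sample values are made up.

```python
def aggregate_gpu_metrics(metrics):
    """Derive the four aggregated keys from gpu/{i}/{metric} entries."""

    def per_gpu(name):
        # Collect the value of gpu/{i}/{name} for every GPU index present.
        values = []
        for key, value in metrics.items():
            parts = key.split("/")
            if len(parts) == 3 and parts[0] == "gpu" and parts[2] == name:
                values.append(value)
        return values

    utilization = per_gpu("utilization")
    return {
        "gpu/mean_utilization": sum(utilization) / len(utilization),
        "gpu/total_allocated_memory": sum(per_gpu("allocated_memory")),
        "gpu/total_power": sum(per_gpu("power")),
        "gpu/max_temp": max(per_gpu("temp")),
    }


# Made-up readings for a two-GPU machine.
sample = {
    "gpu/0/utilization": 80, "gpu/1/utilization": 60,
    "gpu/0/allocated_memory": 10.0, "gpu/1/allocated_memory": 12.0,
    "gpu/0/power": 250.0, "gpu/1/power": 150.0,
    "gpu/0/temp": 65, "gpu/1/temp": 71,
}
aggregated = aggregate_gpu_metrics(sample)
print(aggregated)
```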

Finishing a Run

When your run is complete, finalize it with finish(). This marks the run as completed and saves all logged data:

trackio.finish()
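If your training code can raise an exception, you may want finish() to run regardless so that logged data is saved. One way to arrange this is a small context manager, sketched below. tracked_run is a hypothetical helper, and the start and finish functions are passed in so the example runs without Trackio installed. Whether a crashed run should be marked as completed is a judgment call for your project.

```python
from contextlib import contextmanager


@contextmanager
def tracked_run(start_fn, finish_fn):
    """Start a run, then always finish it, even if the body raises."""
    start_fn()
    try:
        yield
    finally:
        finish_fn()


# Record the order of events with simple stand-in functions.
events = []
try:
    with tracked_run(lambda: events.append("init"), lambda: events.append("finish")):
        events.append("train")
        raise RuntimeError("simulated crash")
except RuntimeError:
    events.append("error handled")
print(events)
```

With Trackio, start_fn could be lambda: trackio.init(project="my_project") and finish_fn could be trackio.finish.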

Resuming a Run

If you need to continue a run (for example, after an interruption), you can resume it by calling init() again with the same project and run name, and setting resume="must":

trackio.init(project="my_project", name="my_first_run", resume="must")

This will load the existing run so you can keep logging data.

For more flexibility, use resume="allow". This will resume the run if it exists, or create a new one otherwise.
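The difference between the two modes can be summarized with a toy model. This is not Trackio code: resolve_run is a hypothetical function, and raising an error when resume="must" is used with a run that does not exist is an assumption, since the text above does not say what happens in that case.

```python
def resolve_run(existing_runs, name, resume):
    """Toy model of how the two resume modes treat a run name."""
    if resume == "must":
        if name not in existing_runs:
            # Assumption: "must" fails when there is nothing to resume.
            raise ValueError(f"run {name!r} does not exist")
        return "resumed"
    if resume == "allow":
        return "resumed" if name in existing_runs else "created"
    raise ValueError(f"unsupported resume mode in this sketch: {resume!r}")


existing = {"my_first_run"}
print(resolve_run(existing, "my_first_run", "must"))
print(resolve_run(existing, "my_first_run", "allow"))
print(resolve_run(existing, "brand_new_run", "allow"))
```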

Tracking Configuration

You can also record configuration parameters for your runs, such as hyperparameters or other experiment settings. Pass them to init() through the config parameter:

for batch_size in [16, 32, 64]:
    for lr in [0.001, 0.01, 0.1]:
        trackio.init(
            project="hyperparameter_tuning",
            name=f"lr_{lr}_batch_{batch_size}_run",
            config={
                "learning_rate": lr,
                "batch_size": batch_size,
            }
        )
        # ... your training code ...
        trackio.finish()

In the dashboard, you can then group by “learning_rate” or “batch_size” to more easily compare runs with different hyperparameters.
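With more than two hyperparameters, nested loops become unwieldy. The sketch below expands an arbitrary search space into config dictionaries and derives a run name from each one. expand_grid, run_name, and the resulting name format are hypothetical helpers for illustration, not part of Trackio.

```python
from itertools import product


def expand_grid(search_space):
    """Return one config dict per combination of the listed values."""
    keys = list(search_space)
    value_lists = [search_space[key] for key in keys]
    return [dict(zip(keys, combo)) for combo in product(*value_lists)]


def run_name(config):
    """Build a descriptive run name from a config dict (hypothetical format)."""
    return "_".join(f"{key}_{value}" for key, value in config.items()) + "_run"


configs = expand_grid(
    {"learning_rate": [0.001, 0.01, 0.1], "batch_size": [16, 32, 64]}
)
for config in configs:
    print(run_name(config), config)
```

Each config can then be passed to a call such as trackio.init(project="hyperparameter_tuning", name=run_name(config), config=config).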
