Sharp Monocular View Synthesis in Less Than a Second (ONNX Edition)

Project Page arXiv

This software project is a community contribution and is not affiliated with the original research paper:

Sharp Monocular View Synthesis in Less Than a Second by Lars Mescheder, Wei Dong, Shiwei Li, Xuyang Bai, Marcel Santos, Peiyun Hu, Bruno Lecouat, Mingmin Zhen, Amaël Delaunoy, Tian Fang, Yanghai Tsin, Stephan R. Richter and Vladlen Koltun.

We present SHARP, an approach to photorealistic view synthesis from a single image. Given a single photograph, SHARP regresses the parameters of a 3D Gaussian representation of the depicted scene. This is done in less than a second on a standard GPU via a single feedforward pass through a neural network. The 3D Gaussian representation produced by SHARP can then be rendered in real time, yielding high-resolution photorealistic images for nearby views. The representation is metric, with absolute scale, supporting metric camera movements.

This release includes fully validated ONNX versions of SHARP (FP32 and FP16), optimized for cross-platform inference on Windows, Linux, and macOS.

Rendered using Splat Viewer

Getting started

πŸš€ Run Inference

Use the provided inference_onnx.py script to run SHARP inference:

# Run inference with FP16 model (faster, smaller)
python inference_onnx.py -m sharp_fp16.onnx -i test.png -o test.ply -d 0.5
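
# Run inference with FP32 model (higher precision; produced by convert_onnx.py, see Model Conversion below)
python inference_onnx.py -m sharp.onnx -i test.png -o test.ply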

CLI Options:

  • -m, --model: Path to ONNX model file
  • -i, --input: Path to input image (PNG, JPEG, etc.)
  • -o, --output: Path for output PLY file
  • -d, --decimate: Decimation ratio 0.0-1.0 (default: 1.0 = keep all); see the sketch after this list
  • --disparity-factor: Ratio focal_length / image_width, which controls the metric depth scale (default: 1.0; see Model Input and Output below)
  • --depth-scale: Depth exaggeration factor (default: 1.0)
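
Decimation keeps only a fraction of the predicted Gaussians to shrink the output PLY. A minimal sketch of one plausible approach, uniform random subsampling over the five output arrays described under Model Input and Output below (the actual inference_onnx.py may select Gaussians differently, e.g. weighted by opacity):

import numpy as np

def decimate_gaussians(positions, scales, quaternions, colors, opacities,
                       ratio=0.5, seed=0):
    # positions (1, N, 3), scales (1, N, 3), quaternions (1, N, 4),
    # colors (1, N, 3), opacities (1, N): keep a random `ratio` fraction.
    n = positions.shape[1]
    keep = max(1, int(n * ratio))
    idx = np.random.default_rng(seed).choice(n, size=keep, replace=False)
    return (positions[:, idx], scales[:, idx], quaternions[:, idx],
            colors[:, idx], opacities[:, idx])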

Features:

  • Cross-platform ONNX Runtime inference (CPU/GPU)
  • Automatic image preprocessing and resizing
  • Gaussian decimation for reduced file sizes
  • PLY output compatible with all major 3D Gaussian viewers

Model Input and Output

πŸ“₯ Input

The ONNX model accepts two inputs:

  • image: A 3-channel RGB image in float32 format with shape (1, 3, H, W).

    • Values expected in range [0, 1] (normalized RGB).
    • Recommended resolution: 1536Γ—1536 (matches training size).
    • Aspect ratio preserved; input resized internally if needed.
  • disparity_factor: A scalar tensor of shape (1,) representing the ratio focal_length / image_width.

    • Use 1.0 for standard cameras (e.g., typical smartphone or DSLR).
    • Adjust to control depth scale: higher values = closer objects, lower values = farther scenes.
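
As a concrete illustration, here is a minimal preprocessing sketch assuming Pillow and NumPy (the provided inference_onnx.py handles this automatically, including aspect-ratio-preserving resizing; the plain square resize below is a simplification):

import numpy as np
from PIL import Image

def preprocess(path, size=1536):
    # Load, resize to the training resolution, and normalize RGB to [0, 1].
    img = Image.open(path).convert("RGB").resize((size, size), Image.BILINEAR)
    arr = np.asarray(img, dtype=np.float32) / 255.0  # (H, W, 3) in [0, 1]
    return arr.transpose(2, 0, 1)[None]              # (1, 3, H, W) NCHW

image = preprocess("test.png")
disparity_factor = np.array([1.0], dtype=np.float32)  # focal_length / image_width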

πŸ“€ Output

The model outputs five tensors that together parameterize a 3D Gaussian splat scene:

| Output | Shape | Description |
| --- | --- | --- |
| mean_vectors_3d_positions | (1, N, 3) | 3D positions in Normalized Device Coordinates (NDC): x, y, z. |
| singular_values_scales | (1, N, 3) | Scale parameters along each principal axis (width, height, depth). |
| quaternions_rotations | (1, N, 4) | Unit quaternions [w, x, y, z] encoding the orientation of each Gaussian. |
| colors_rgb_linear | (1, N, 3) | Linear RGB color values in range [0, 1] (no gamma correction). |
| opacities_alpha_channel | (1, N) | Opacity (alpha) values per Gaussian, in range [0, 1]. |

The total number of Gaussians N is approximately 1,179,648 for the default model.
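
Putting the pieces together, a minimal end-to-end sketch with ONNX Runtime, continuing from the preprocessing sketch above and assuming the input names listed there and that the five outputs are returned in the order of the table:

import onnxruntime as ort

session = ort.InferenceSession("sharp.onnx", providers=["CPUExecutionProvider"])
outputs = session.run(None, {"image": image, "disparity_factor": disparity_factor})

# Assumed output order: positions, scales, quaternions, colors, opacities.
positions, scales, quaternions, colors, opacities = outputs
print(positions.shape, opacities.shape)  # e.g. (1, N, 3) and (1, N)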

Model Conversion

To convert SHARP from PyTorch to ONNX, use the provided conversion script:

# Convert to FP32 ONNX (higher precision)
python convert_onnx.py -o sharp.onnx --validate

# Convert to FP16 ONNX (faster inference, smaller model)
python convert_onnx.py -o sharp_fp16.onnx -q fp16 --validate

Conversion Options:

  • -c, --checkpoint: Path to PyTorch checkpoint (downloads from Apple if not provided)
  • -o, --output: Output ONNX model path
  • -q, --quantize: Quantization type (fp16 for half-precision)
  • --validate: Validate converted model against PyTorch reference
  • --input-image: Path to test image for validation

Requirements:

  • PyTorch and ml-sharp source code (automatically downloaded)
  • ONNX and ONNX Runtime for validation
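
The --validate flag compares the exported graph against the PyTorch reference. A minimal sketch of the same idea, here comparing the FP32 and FP16 ONNX models on one image (the reported deviations are illustrative; the script's actual tolerances may differ):

import numpy as np
import onnxruntime as ort

feeds = {"image": image, "disparity_factor": disparity_factor}  # from the preprocessing sketch
fp32 = ort.InferenceSession("sharp.onnx", providers=["CPUExecutionProvider"]).run(None, feeds)
fp16 = ort.InferenceSession("sharp_fp16.onnx", providers=["CPUExecutionProvider"]).run(None, feeds)
# NOTE: if the FP16 graph declares float16 inputs, cast the feeds with .astype(np.float16) first.

for name, r, t in zip(["positions", "scales", "quaternions", "colors", "opacities"], fp32, fp16):
    # Worst-case deviation introduced by half precision.
    print(name, "max abs diff:", np.abs(r - t.astype(np.float32)).max())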

Citation

If you find this work useful, please cite the original paper:

@article{Sharp2025:arxiv,
  title      = {Sharp Monocular View Synthesis in Less Than a Second},
  author     = {Lars Mescheder and Wei Dong and Shiwei Li and Xuyang Bai and Marcel Santos and Peiyun Hu and Bruno Lecouat and Mingmin Zhen and Ama\"{e}l Delaunoy and Tian Fang and Yanghai Tsin and Stephan R. Richter and Vladlen Koltun},
  journal    = {arXiv preprint arXiv:2512.10685},
  year       = {2025},
  url        = {https://arxiv.org/abs/2512.10685},
}