Sharp Monocular View Synthesis in Less Than a Second (ONNX Edition)
This software project is a community contribution and is not affiliated with the original research paper:
Sharp Monocular View Synthesis in Less Than a Second by Lars Mescheder, Wei Dong, Shiwei Li, Xuyang Bai, Marcel Santos, Peiyun Hu, Bruno Lecouat, Mingmin Zhen, Amaël Delaunoy, Tian Fang, Yanghai Tsin, Stephan R. Richter and Vladlen Koltun.
We present SHARP, an approach to photorealistic view synthesis from a single image. Given a single photograph, SHARP regresses the parameters of a 3D Gaussian representation of the depicted scene. This is done in less than a second on a standard GPU via a single feedforward pass through a neural network. The 3D Gaussian representation produced by SHARP can then be rendered in real time, yielding high-resolution photorealistic images for nearby views. The representation is metric, with absolute scale, supporting metric camera movements.
This release includes fully validated ONNX versions of SHARP (FP32 and FP16), optimized for cross-platform inference on Windows, Linux, and macOS.
Rendered using Splat Viewer
Getting started
Run Inference
Use the provided inference_onnx.py script to run SHARP inference:
```bash
# Run inference with FP16 model (faster, smaller)
python inference_onnx.py -m sharp_fp16.onnx -i test.png -o test.ply -d 0.5
```
CLI Options:
- `-m, --model`: Path to ONNX model file
- `-i, --input`: Path to input image (PNG, JPEG, etc.)
- `-o, --output`: Path for output PLY file
- `-d, --decimate`: Decimation ratio 0.0-1.0 (default: 1.0 = keep all)
- `--disparity-factor`: Depth scale factor (default: 1.0)
- `--depth-scale`: Depth exaggeration factor (default: 1.0)
Features:
- Cross-platform ONNX Runtime inference (CPU/GPU)
- Automatic image preprocessing and resizing
- Gaussian decimation for reduced file sizes
- PLY output compatible with all major 3D Gaussian viewers
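For use outside the provided script, here is a minimal sketch of creating a cross-platform ONNX Runtime session (assuming the `onnxruntime` package is installed, or `onnxruntime-gpu` for CUDA; the input names follow the documentation below):

```python
# Minimal sketch: open a SHARP session that prefers GPU and falls back to CPU.
import onnxruntime as ort

session = ort.InferenceSession(
    "sharp_fp16.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
print([i.name for i in session.get_inputs()])  # expected: ['image', 'disparity_factor']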
Model Input and Output
📥 Input
The ONNX model accepts two inputs:
- `image`: A 3-channel RGB image in `float32` format with shape `(1, 3, H, W)`.
  - Values expected in range `[0, 1]` (normalized RGB).
  - Recommended resolution: `1536×1536` (matches training size).
  - Aspect ratio preserved; input resized internally if needed.
- `disparity_factor`: A scalar tensor of shape `(1,)` representing the ratio `focal_length / image_width`.
  - Use `1.0` for standard cameras (e.g., typical smartphone or DSLR).
  - Adjust to control depth scale: higher values = closer objects, lower values = farther scenes.
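To illustrate, a minimal preprocessing sketch under the assumptions above (note: the naive square resize shown here does not preserve aspect ratio, and the model also resizes internally, so this is illustrative only):

```python
# Sketch: prepare the two model inputs described above.
import numpy as np
from PIL import Image

img = Image.open("test.png").convert("RGB").resize((1536, 1536))
image = np.asarray(img, dtype=np.float32) / 255.0     # (H, W, 3), values in [0, 1]
image = image.transpose(2, 0, 1)[None]                # (1, 3, 1536, 1536)
disparity_factor = np.array([1.0], dtype=np.float32)  # focal_length / image_width
```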
📤 Output
The model outputs five tensors representing a 3D Gaussian splat representation:
| Output | Shape | Description |
|---|---|---|
| `mean_vectors_3d_positions` | `(1, N, 3)` | 3D positions in Normalized Device Coordinates (NDC): x, y, z. |
| `singular_values_scales` | `(1, N, 3)` | Scale parameters along each principal axis (width, height, depth). |
| `quaternions_rotations` | `(1, N, 4)` | Unit quaternions `[w, x, y, z]` encoding the orientation of each Gaussian. |
| `colors_rgb_linear` | `(1, N, 3)` | Linear RGB color values in range `[0, 1]` (no gamma correction). |
| `opacities_alpha_channel` | `(1, N)` | Per-Gaussian opacity (alpha) values in range `[0, 1]`. |
The total number of Gaussians N is approximately 1,179,648 for the default model.
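Continuing the sketches above, the outputs can be unpacked and decimated. The output order is assumed to match the table, and the opacity-based selection below is only one plausible reading of what `-d/--decimate` does:

```python
import numpy as np

# Run the session from the earlier sketch; output order assumed per the table above.
means, scales, quats, colors, alphas = session.run(
    None, {"image": image, "disparity_factor": disparity_factor}
)

# Illustrative decimation: keep the 50% most opaque Gaussians
# (the script's actual strategy may differ).
keep = int(0.5 * alphas.shape[1])
idx = np.argsort(alphas[0])[::-1][:keep]
means, scales, quats = means[:, idx], scales[:, idx], quats[:, idx]
colors, alphas = colors[:, idx], alphas[:, idx]
```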
Model Conversion
To convert SHARP from PyTorch to ONNX, use the provided conversion script:
```bash
# Convert to FP32 ONNX (higher precision)
python convert_onnx.py -o sharp.onnx --validate

# Convert to FP16 ONNX (faster inference, smaller model)
python convert_onnx.py -o sharp_fp16.onnx -q fp16 --validate
```
Conversion Options:
- `-c, --checkpoint`: Path to PyTorch checkpoint (downloads from Apple if not provided)
- `-o, --output`: Output ONNX model path
- `-q, --quantize`: Quantization type (`fp16` for half-precision)
- `--validate`: Validate converted model against PyTorch reference
- `--input-image`: Path to test image for validation
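For reference, the FP16 path can be reproduced with standard tooling. A minimal sketch assuming the `onnx` and `onnxconverter-common` packages (the script itself may implement this differently):

```python
# Sketch: convert an existing FP32 ONNX model to FP16.
import onnx
from onnxconverter_common import float16

model = onnx.load("sharp.onnx")
model_fp16 = float16.convert_float_to_float16(model)
onnx.save(model_fp16, "sharp_fp16.onnx")
```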
Requirements:
- PyTorch and ml-sharp source code (automatically downloaded)
- ONNX and ONNX Runtime for validation
Citation
If you find this work useful, please cite the original paper:
```bibtex
@article{Sharp2025:arxiv,
  title   = {Sharp Monocular View Synthesis in Less Than a Second},
  author  = {Lars Mescheder and Wei Dong and Shiwei Li and Xuyang Bai and Marcel Santos and Peiyun Hu and Bruno Lecouat and Mingmin Zhen and Ama\"{e}l Delaunoy and Tian Fang and Yanghai Tsin and Stephan R. Richter and Vladlen Koltun},
  journal = {arXiv preprint arXiv:2512.10685},
  year    = {2025},
  url     = {https://arxiv.org/abs/2512.10685},
}
```
Base model: `apple/Sharp`