Build Fails & Wheels Cause Runtime Error On Windows With RTX 3070 Ti (Ampere)
Introduction
The ComfyUI-Hunyuan3DWrapper node fails consistently on Windows with an RTX 3070 Ti (Ampere) GPU. The problems appear to be related to GPU architecture support in the prebuilt wheels and to build environment conflicts. This article documents the attempts made to resolve these issues, in the hope of helping to diagnose the cause.
Environment Setup
The environment setup is crucial in understanding the issues at hand. Here's a summary of the key components:
- OS: Windows 11 Pro
- GPU: NVIDIA GeForce RTX 3070 Ti (Ampere Architecture, Compute Capability 8.6)
- NVIDIA Driver: 576.02 (Studio)
- ComfyUI: Portable Windows version (latest update as of 2025-04-19, revision f3b09b9f)
- Python: Embedded Python 3.12.9 (with the include and libs folders copied from a full Python 3.12.9 installation)
- Build Tools: Visual Studio 2022 Community Build Tools (with C++ Desktop workload), Ninja installed via pip
- Installed CUDA Toolkits: v11.8, v12.1, v12.6, v12.8 (located in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA)
- PyTorch Versions Tried:
  - Stable cu121 builds (e.g., 2.3.1, 2.5.1, 2.6.0)
  - Nightly cu121 build (2.6.0.dev...)
  - Stable cu126 build (2.6.0+cu126)
Summary of Problems
There are three main problems encountered:
1. Runtime Error with Wheels
Using either of the provided wheels (custom_rasterizer-0.1-cp312-cp312-win_amd64.whl or custom_rasterizer-0.1.0+torch260.cuda126-cp312-cp312-win_amd64.whl) results in a runtime error when the Hy3DRenderMultiView node executes. The error log is as follows:
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from c10_cuda_check_implementation at C:\actions-runner\_work\pytorch\pytorch\pytorch\c10\cuda\CUDAException.cpp:43 (most recent call first):
[... Full DLL stack trace omitted for brevity ...]
Traceback (most recent call last):
File "F:\ComfyUI_windows_portable_test\ComfyUI_windows_portable\ComfyUI\execution.py", line 345, in execute
output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
File "F:\ComfyUI_windows_portable_test\ComfyUI_windows_portable\ComfyUI\execution.py", line 220, in get_output_data
return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
File "F:\ComfyUI_windows_portable_test\ComfyUI_windows_portable\ComfyUI\execution.py", line 192, in _map_node_over_list
process_inputs(input_dict, i)
File "F:\ComfyUI_windows_portable_test\ComfyUI_windows_portable\ComfyUI\execution.py", line 181, in process_inputs
results.append(getattr(obj, func)(**inputs))
File "F:\ComfyUI_windows_portable_test\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Hunyuan3DWrapper\nodes.py", line 501, in process
normal_maps, masks = self.render_normal_multiview(...)
File "F:\ComfyUI_windows_portable_test\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Hunyuan3DWrapper\nodes.py", line 538, in render_normal_multiview
normal_map, mask = self.render.render_normal(...)
File "F:\ComfyUI_windows_portable_test\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Hunyuan3DWrapper\hy3dgen\texgen\differentiable_renderer\mesh_render.py", line 456, in render_normal
rast_out, rast_out_db = self.raster_rasterize(...)
File "F:\ComfyUI_windows_portable_test\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Hunyuan3DWrapper\hy3dgen\texgen\differentiable_renderer\mesh_render.py", line 184, in raster_rasterize
findices, barycentric = self.raster.rasterize(pos, tri, resolution)
File "F:\ComfyUI_windows_portable_test\ComfyUI_windows_portable\python_embeded\Lib\site-packages\custom_rasterizer\render.py", line 31, in rasterize
findices, barycentric = custom_rasterizer_kernel.rasterize_image(pos[0], tri, clamp_depth, resolution[1], ...)
RuntimeError: CUDA error: no kernel image is available for execution on the device
...
This error occurs consistently across different stable and nightly PyTorch versions, suggesting that these wheels were not compiled with support for Ampere (Compute Capability 8.6).
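CUDA reports "no kernel image is available" when a binary contains neither machine code nor PTX that the current GPU can use. As a rough illustration of that rule, the sketch below models it in plain Python: precompiled code for sm_XY runs on devices with the same major version and an equal or higher minor version, and embedded PTX for compute_XY can be JIT-compiled for any device with an equal or higher compute capability. The function name can_run and the example architecture lists are hypothetical; the report does not state which architectures the wheels actually contain.

```python
from typing import Iterable, Tuple

Capability = Tuple[int, int]  # (major, minor), e.g. (8, 6) for an RTX 3070 Ti


def can_run(device: Capability,
            sass_archs: Iterable[Capability],
            ptx_archs: Iterable[Capability]) -> bool:
    """Return True if a binary with the given embedded code can run on `device`."""
    dev_major, dev_minor = device
    # Precompiled machine code (sm_XY) runs on devices with the same major
    # version and an equal or higher minor version.
    for major, minor in sass_archs:
        if major == dev_major and minor <= dev_minor:
            return True
    # Embedded PTX (compute_XY) can be JIT-compiled by the driver for any
    # device whose compute capability is equal or higher.
    for arch in ptx_archs:
        if tuple(arch) <= tuple(device):
            return True
    return False


rtx_3070_ti = (8, 6)
# Hypothetical wheel built only for sm_89, with no PTX embedded.
print(can_run(rtx_3070_ti, sass_archs=[(8, 9)], ptx_archs=[]))        # False
# Hypothetical wheel built for sm_80 plus compute_80 PTX.
print(can_run(rtx_3070_ti, sass_archs=[(8, 0)], ptx_archs=[(8, 0)]))  # True
```

Under this model, a wheel compiled only for a newer architecture than 8.6 and without PTX would produce exactly the error shown above. If the CUDA Toolkit's cuobjdump tool can read the compiled extension file, its --list-elf and --list-ptx options may show which architectures are really embedded.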
2. Runtime Error with Nightly + Generic Wheel
Following advice in issue #46, using the PyTorch Nightly (2.6.0.dev... +cu121) combined with the generic wheel (...0.1-cp312...whl) results in a different runtime error:
Traceback (most recent call last):
File "F:\ComfyUI_windows_portable_test\ComfyUI_windows_portable\ComfyUI\execution.py", line 345, in execute
output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
File "F:\ComfyUI_windows_portable_test\ComfyUI_windows_portable\ComfyUI\execution.py", line 220, in get_output_data
return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
File "F:\ComfyUI_windows_portable_test\ComfyUI_windows_portable\ComfyUI\execution.py", line 192, in _map_node_over_list
process_inputs(input_dict, i)
File "F:\ComfyUI_windows_portable_test\ComfyUI_windows_portable\ComfyUI\execution.py", line 181, in process_inputs
results.append(getattr(obj, func)(**inputs))
File "F:\ComfyUI_windows_portable_test\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Hunyuan3DWrapper\nodes.py", line 492, in process
self.render = MeshRender(...)
File "F:\ComfyUI_windows_portable_test\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Hunyuan3DWrapper\hy3dgen\texgen\differentiable_renderer\mesh_render.py", line 158, in __init__
import custom_rasterizer as cr
File "F:\ComfyUI_windows_portable_test\ComfyUI_windows_portable\python_embeded\Lib\site-packages\custom_rasterizer\__init__.py", line 32, in <module>
from .render import *
File "F:\ComfyUI_windows_portable_test\ComfyUI_windows_portable\python_embeded\Lib\site-packages\custom_rasterizer\render.py", line 25, in <module>
import custom_rasterizer_kernel
ImportError: DLL load failed while importing custom_rasterizer_kernel: The specified procedure could not be found.
This error indicates a runtime linking incompatibility between the generic wheel (likely built against stable PyTorch) and the PyTorch Nightly DLLs.
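One of the two wheel filenames carries a local version tag (torch260.cuda126) that appears to name the PyTorch and CUDA versions it was built against. As a hedged sketch, the helper below compares such a tag with a PyTorch version string before installation. The tag format is inferred from that single filename, and the function names are hypothetical; a matching tag does not guarantee that the DLLs are compatible, and an untagged wheel simply yields "unknown".

```python
import re
from typing import Optional


def wheel_build_tag(wheel_name: str) -> Optional[str]:
    """Extract a 'torchNNN.cudaNNN' local version tag, if the filename has one."""
    match = re.search(r"\+(torch\d+\.cuda\d+)", wheel_name)
    return match.group(1) if match else None


def torch_build_tag(torch_version: str) -> Optional[str]:
    """Turn a stable release string such as '2.6.0+cu126' into 'torch260.cuda126'."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)\+cu(\d+)", torch_version)
    if match is None:
        return None  # nightly/dev or CPU-only version strings do not match
    major, minor, patch, cuda = match.groups()
    return f"torch{major}{minor}{patch}.cuda{cuda}"


def wheel_matches(wheel_name: str, torch_version: str) -> Optional[bool]:
    """True/False when the wheel is tagged; None when it carries no tag."""
    wheel_tag = wheel_build_tag(wheel_name)
    if wheel_tag is None:
        return None
    return wheel_tag == torch_build_tag(torch_version)


tagged = "custom_rasterizer-0.1.0+torch260.cuda126-cp312-cp312-win_amd64.whl"
generic = "custom_rasterizer-0.1-cp312-cp312-win_amd64.whl"
print(wheel_matches(tagged, "2.6.0+cu126"))   # True
print(wheel_matches(tagged, "2.6.0+cu121"))   # False
print(wheel_matches(generic, "2.6.0+cu126"))  # None
```

In real use the second argument would come from torch.__version__. A nightly version string never matches a tagged wheel in this sketch, which is consistent with treating nightly builds as a separate binary target.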
3. Build Failures (Building from Source)
Attempts to build custom_rasterizer from source (pip install . or setup.py bdist_wheel within the Developer PowerShell) failed consistently:
- Without CUDA_HOME set: The build detects the system's CUDA 11.8 and fails because that version does not match the CUDA version of the installed PyTorch (cu121 or cu126):
RuntimeError:
The detected CUDA version (11.8) mismatches the version that was used to compile
PyTorch (12.1). Please make sure to use the same CUDA versions.
- With CUDA_HOME set to v12.1 or v12.6: The build proceeds further but then fails with C2373 header conflicts during compilation.
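The CUDA version mismatch above comes from the build picking a toolkit other than the one PyTorch was compiled with. With several toolkits installed side by side, one option is to choose the CUDA_HOME directory programmatically from the version PyTorch reports. The sketch below is a minimal illustration under that assumption; pick_cuda_home is a hypothetical helper, and in real use the version string would come from torch.version.cuda.

```python
import re
from pathlib import Path
from typing import Optional


def pick_cuda_home(cuda_root: Path, torch_cuda: str) -> Optional[Path]:
    """Return the toolkit directory under `cuda_root` named v<major>.<minor>."""
    if not cuda_root.is_dir():
        return None
    # Compare only major.minor, e.g. "12.6" -> (12, 6).
    wanted = tuple(int(part) for part in torch_cuda.split(".")[:2])
    for entry in sorted(cuda_root.iterdir()):
        match = re.fullmatch(r"v(\d+)\.(\d+)", entry.name)
        if entry.is_dir() and match and tuple(map(int, match.groups())) == wanted:
            return entry
    return None


# Default Windows install location from the environment list above.
root = Path(r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA")
# Prints the matching directory, or None if it is not installed.
print(pick_cuda_home(root, "12.6"))
```

This only addresses toolkit selection; it does not touch the compilation failure that follows once CUDA_HOME is set.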
Q&A: Build Fails & Wheels Cause Runtime Error on Windows with RTX 3070 Ti (Ampere)
Q: What is the issue with the ComfyUI-Hunyuan3DWrapper node on Windows with an RTX 3070 Ti (Ampere) GPU?
A: The ComfyUI-Hunyuan3DWrapper node has been plagued by persistent issues on Windows with an RTX 3070 Ti (Ampere) GPU. The problems seem to be related to GPU architecture support and build environment conflicts.
Q: What are the specific problems encountered?
A: There are three main problems encountered:
- Runtime Error with Wheels: Using either of the provided wheels (custom_rasterizer-0.1-cp312-cp312-win_amd64.whl or custom_rasterizer-0.1.0+torch260.cuda126-cp312-cp312-win_amd64.whl) results in a runtime error when the Hy3DRenderMultiView node executes.
- Runtime Error with Nightly + Generic Wheel: Using the PyTorch Nightly (2.6.0.dev... +cu121) combined with the generic wheel (...0.1-cp312...whl) results in a different runtime error.
- Build Failures (Building from Source): Attempts to build custom_rasterizer from source (pip install . or setup.py bdist_wheel within the Developer PowerShell) failed consistently.
Q: What are the possible causes of these issues?
A: The possible causes of these issues are:
- Lack of Ampere Support: The wheels appear to lack support for Ampere (Compute Capability 8.6).
- Build Environment Conflicts: The build process is blocked by environment detection issues and header conflicts.
- Runtime Linking Incompatibility: The generic wheel (likely built against stable PyTorch) and the PyTorch Nightly DLLs have a runtime linking incompatibility.
Q: What can be done to resolve these issues?
A: To resolve these issues, the following steps can be taken:
- Provide a Pre-Compiled Wheel: Provide a pre-compiled wheel for custom_rasterizer (compatible with Python 3.12 and recent PyTorch cu121 or cu126) that explicitly includes support for Compute Capability 8.6 (Ampere).
- Update the Build Process: Update the build process (setup.py) to correctly handle Windows environments with multiple CUDA Toolkits installed and resolve the C2373 header conflicts.
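Until such changes exist, a local build can at least be pinned to one toolkit and one target architecture through environment variables. The sketch below assembles such an environment. CUDA_HOME and TORCH_CUDA_ARCH_LIST are variables read by torch.utils.cpp_extension; the helper name build_env is hypothetical, and the approach assumes that the project's setup.py does not hard-code its own architecture flags. It does not address the C2373 header conflicts.

```python
import os
from typing import Dict, Mapping, Optional


def build_env(cuda_home: str,
              arch_list: str = "8.6",
              base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment pinned to one toolkit and one GPU arch."""
    env = dict(os.environ if base is None else base)
    env["CUDA_HOME"] = cuda_home
    env["CUDA_PATH"] = cuda_home
    # Read by torch.utils.cpp_extension when it chooses nvcc -gencode flags.
    env["TORCH_CUDA_ARCH_LIST"] = arch_list
    # Put the chosen toolkit's nvcc ahead of any other toolkit on PATH.
    env["PATH"] = os.pathsep.join(
        [os.path.join(cuda_home, "bin"), env.get("PATH", "")]
    )
    return env


env = build_env(r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6")
print(env["CUDA_HOME"], env["TORCH_CUDA_ARCH_LIST"])
# The build itself would then be started with this environment, for example
# via subprocess.run([...], env=env) from the custom_rasterizer source folder.
```

Passing the environment explicitly to the build process, rather than changing it system-wide, keeps the other installed toolkits usable for unrelated projects.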
Q: What is the current status of the ComfyUI-Hunyuan3DWrapper node?
A: The ComfyUI-Hunyuan3DWrapper node is currently not working on Windows with an RTX 3070 Ti (Ampere) GPU due to the persistent issues encountered.
Q: What is the next step to resolve these issues?
A: The next step is to investigate these issues further and find a fix for the build failures and runtime errors on Windows with an RTX 3070 Ti (Ampere) GPU.