Build Fails & Wheels Cause Runtime Error On Windows With RTX 3070 Ti (Ampere)


Introduction

The ComfyUI-Hunyuan3DWrapper node has persistent issues on Windows with an RTX 3070 Ti (Ampere) GPU. The problems appear to stem from missing GPU architecture support in the prebuilt wheels and from build environment conflicts. This article documents the attempts made to resolve them, in the hope that it helps diagnose the root cause.

Environment Setup

The failures depend on the exact combination of GPU, CUDA toolkit, and PyTorch build, so the key components are listed here:

  • OS: Windows 11 Pro
  • GPU: NVIDIA GeForce RTX 3070 Ti (Ampere Architecture, Compute Capability 8.6)
  • NVIDIA Driver: 576.02 (Studio)
  • ComfyUI: Portable Windows version (Latest update as of 2025-04-19, Revision f3b09b9f)
  • Python: Embedded Python 3.12.9 (with the include and libs folders copied from a full Python 3.12.9 installation)
  • Build Tools: Visual Studio 2022 Community Build Tools (with C++ Desktop workload), Ninja installed via pip.
  • Installed CUDA Toolkits: v11.8, v12.1, v12.6, v12.8 (Located in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA)
  • PyTorch Versions Tried:
    • Stable cu121 builds (e.g., 2.3.1, 2.5.1, 2.6.0)
    • Nightly cu121 build (2.6.0.dev...)
    • Stable cu126 build (2.6.0+cu126)

Summary of Problems

Three main problems were encountered:

1. Runtime Error with Wheels

Using either of the provided wheels (custom_rasterizer-0.1-cp312-cp312-win_amd64.whl OR custom_rasterizer-0.1.0+torch260.cuda126-cp312-cp312-win_amd64.whl) results in a runtime error when the Hy3DRenderMultiView node executes. The error log is as follows:

RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Exception raised from c10_cuda_check_implementation at C:\actions-runner\_work\pytorch\pytorch\pytorch\c10\cuda\CUDAException.cpp:43 (most recent call first):
[... Full DLL stack trace omitted for brevity ...]

Traceback (most recent call last):
  File "F:\ComfyUI_windows_portable_test\ComfyUI_windows_portable\ComfyUI\execution.py", line 345, in execute
    output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
  File "F:\ComfyUI_windows_portable_test\ComfyUI_windows_portable\ComfyUI\execution.py", line 220, in get_output_data
    return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
  File "F:\ComfyUI_windows_portable_test\ComfyUI_windows_portable\ComfyUI\execution.py", line 192, in _map_node_over_list
    process_inputs(input_dict, i)
  File "F:\ComfyUI_windows_portable_test\ComfyUI_windows_portable\ComfyUI\execution.py", line 181, in process_inputs
    results.append(getattr(obj, func)(**inputs))
  File "F:\ComfyUI_windows_portable_test\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Hunyuan3DWrapper\nodes.py", line 501, in process
    normal_maps, masks = self.render_normal_multiview(...)
  File "F:\ComfyUI_windows_portable_test\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Hunyuan3DWrapper\nodes.py", line 538, in render_normal_multiview
    normal_map, mask = self.render.render_normal(...)
  File "F:\ComfyUI_windows_portable_test\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Hunyuan3DWrapper\hy3dgen\texgen\differentiable_renderer\mesh_render.py", line 456, in render_normal
    rast_out, rast_out_db = self.raster_rasterize(...)
  File "F:\ComfyUI_windows_portable_test\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Hunyuan3DWrapper\hy3dgen\texgen\differentiable_renderer\mesh_render.py", line 184, in raster_rasterize
    findices, barycentric = self.raster.rasterize(pos, tri, resolution)
  File "F:\ComfyUI_windows_portable_test\ComfyUI_windows_portable\python_embeded\Lib\site-packages\custom_rasterizer\render.py", line 31, in rasterize
    findices, barycentric = custom_rasterizer_kernel.rasterize_image(pos[0], tri, clamp_depth, resolution[1], ...)
RuntimeError: CUDA error: no kernel image is available for execution on the device
...

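This error occurs consistently across different stable and nightly PyTorch versions. "No kernel image is available" means the compiled extension contains neither native code nor PTX that the GPU can use, which suggests that these wheels were not compiled with support for Ampere (Compute Capability 8.6).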
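The compatibility rule behind this error can be sketched in a few lines. This is a minimal illustration, not part of the wrapper. It assumes the usual CUDA rule that native code for sm_XY runs on devices with the same major version and an equal or higher minor version, and that embedded PTX for compute_XY can be JIT-compiled on any equal or newer device. The arch lists passed in are hypothetical, since the arch list actually used to build the wheels is not known. The helper names parse_arch and can_run are made up for this example.

```python
import re


def parse_arch(tag):
    """Split 'sm_86' or 'compute_80' into (kind, major, minor); None if unrecognized."""
    match = re.fullmatch(r"(sm|compute)_(\d+)(\d)", tag)
    if match is None:
        return None
    kind, major, minor = match.groups()
    return kind, int(major), int(minor)


def can_run(device_cc, arch_tags):
    """Return True if a binary built for arch_tags has code usable on device_cc."""
    dev_major, dev_minor = device_cc
    for tag in arch_tags:
        parsed = parse_arch(tag)
        if parsed is None:
            continue  # skip suffixed or unknown tags rather than guess
        kind, major, minor = parsed
        # Native code: same major version, minor not newer than the device.
        if kind == "sm" and major == dev_major and minor <= dev_minor:
            return True
        # PTX: the driver can JIT-compile it for any equal or newer device.
        if kind == "compute" and (major, minor) <= (dev_major, dev_minor):
            return True
    return False


if __name__ == "__main__":
    rtx_3070_ti = (8, 6)
    # Hypothetical arch lists, for illustration only.
    print(can_run(rtx_3070_ti, ["sm_70", "sm_75"]))
    print(can_run(rtx_3070_ti, ["sm_80", "compute_80"]))

    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        # Note: get_arch_list() describes the PyTorch build itself,
        # not the custom_rasterizer extension.
        cc = torch.cuda.get_device_capability(0)
        print(cc, torch.cuda.get_arch_list())
```

Run inside the embedded Python, the last lines print the device's compute capability and the architectures PyTorch itself was built for. That only confirms PyTorch covers the GPU. It says nothing about which architectures the custom_rasterizer wheel contains.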

2. Runtime Error with Nightly + Generic Wheel

Following the advice in issue #46, combining PyTorch Nightly (2.6.0.dev... +cu121) with the generic wheel (...0.1-cp312...whl) produces a different error, this time at import:

Traceback (most recent call last):
  File "F:\ComfyUI_windows_portable_test\ComfyUI_windows_portable\ComfyUI\execution.py", line 345, in execute
    output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
  File "F:\ComfyUI_windows_portable_test\ComfyUI_windows_portable\ComfyUI\execution.py", line 220, in get_output_data
    return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
  File "F:\ComfyUI_windows_portable_test\ComfyUI_windows_portable\ComfyUI\execution.py", line 192, in _map_node_over_list
    process_inputs(input_dict, i)
  File "F:\ComfyUI_windows_portable_test\ComfyUI_windows_portable\ComfyUI\execution.py", line 181, in process_inputs
    results.append(getattr(obj, func)(**inputs))
  File "F:\ComfyUI_windows_portable_test\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Hunyuan3DWrapper\nodes.py", line 492, in process
    self.render = MeshRender(...)
  File "F:\ComfyUI_windows_portable_test\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Hunyuan3DWrapper\hy3dgen\texgen\differentiable_renderer\mesh_render.py", line 158, in __init__
    import custom_rasterizer as cr
  File "F:\ComfyUI_windows_portable_test\ComfyUI_windows_portable\python_embeded\Lib\site-packages\custom_rasterizer\__init__.py", line 32, in <module>
    from .render import *
  File "F:\ComfyUI_windows_portable_test\ComfyUI_windows_portable\python_embeded\Lib\site-packages\custom_rasterizer\render.py", line 25, in <module>
    import custom_rasterizer_kernel
ImportError: DLL load failed while importing custom_rasterizer_kernel: The specified procedure could not be found.

This error indicates a runtime linking incompatibility between the generic wheel (likely built against stable PyTorch) and the PyTorch Nightly DLLs.
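A quick pre-install check can flag this kind of mismatch before the DLL fails to load. The sketch below compares the "+torch260.cuda126" style tag seen in one of the wheel names above with a PyTorch version string such as the value of torch.__version__. The function names, the three status strings, and the nightly version string in the usage note are assumptions made for this example. A matching tag is only a necessary condition, not proof of binary compatibility.

```python
import re


def wheel_build_tags(wheel_name):
    """Return (torch_tag, cuda_tag) from a '+torchXYZ.cudaABC' tag, or None."""
    match = re.search(r"\+torch(\d+)\.cuda(\d+)", wheel_name)
    return match.groups() if match else None


def check_wheel(wheel_name, torch_version):
    """Classify a wheel against a torch version as 'match', 'mismatch', or 'unknown'."""
    wheel = wheel_build_tags(wheel_name)
    if wheel is None:
        return "unknown"  # generic wheel: the name says nothing about its build
    parsed = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)(\.dev\d+)?\+cu(\d+)", torch_version)
    if parsed is None:
        return "unknown"  # unexpected version format
    major, minor, patch, dev, cuda = parsed.groups()
    if dev is not None:
        return "unknown"  # nightly builds may export different symbols
    return "match" if wheel == (major + minor + patch, cuda) else "mismatch"


if __name__ == "__main__":
    tagged = "custom_rasterizer-0.1.0+torch260.cuda126-cp312-cp312-win_amd64.whl"
    generic = "custom_rasterizer-0.1-cp312-cp312-win_amd64.whl"
    for name in (tagged, generic):
        print(name, "->", check_wheel(name, "2.6.0+cu126"))
```

With a hypothetical nightly version string such as "2.6.0.dev20250101+cu121", check_wheel returns "unknown" for both wheels. That fits the failure above: nothing in the generic wheel's name guarantees it was built against the installed nightly.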

3. Build Failures (Building from Source)

Attempts to build custom_rasterizer from source (pip install . or setup.py bdist_wheel within the Developer PowerShell) failed consistently:

  • Without CUDA_HOME set: The build detects the system's CUDA 11.8 and fails because it does not match the CUDA version of the installed PyTorch (cu121 or cu126).
RuntimeError:
The detected CUDA version (11.8) mismatches the version that was used to compile
PyTorch (12.1). Please make sure to use the same CUDA versions.
  • With CUDA_HOME set to v12.1 or v12.6: The build proceeds further but fails during C…

Q&A: Build Fails & Wheels Cause Runtime Error on Windows with RTX 3070 Ti (Ampere)

Q: What is the issue with the ComfyUI-Hunyuan3DWrapper node on Windows with an RTX 3070 Ti (Ampere) GPU?

A: The node fails consistently on Windows with an RTX 3070 Ti (Ampere) GPU. The problems appear to be related to GPU architecture support and build environment conflicts.

Q: What are the specific problems encountered?

A: Three main problems were encountered:

  1. Runtime Error with Wheels: Using either of the provided wheels (custom_rasterizer-0.1-cp312-cp312-win_amd64.whl OR custom_rasterizer-0.1.0+torch260.cuda126-cp312-cp312-win_amd64.whl) results in a runtime error when the Hy3DRenderMultiView node executes.
  2. Runtime Error with Nightly + Generic Wheel: Using the PyTorch Nightly (2.6.0.dev... +cu121) combined with the generic wheel (...0.1-cp312...whl) results in a different runtime error.
  3. Build Failures (Building from Source): Attempts to build custom_rasterizer from source (pip install . or setup.py bdist_wheel within the Developer PowerShell) failed consistently.

Q: What are the possible causes of these issues?

A: The likely causes are:

  1. Lack of Ampere Support: The wheels appear to lack support for Ampere (Compute Capability 8.6).
  2. Build Environment Conflicts: The build process is blocked by environment detection issues and header conflicts.
  3. Runtime Linking Incompatibility: The generic wheel (likely built against stable PyTorch) and the PyTorch Nightly DLLs have a runtime linking incompatibility.

Q: What can be done to resolve these issues?

A: Either of the following would resolve these issues:

  1. Provide a Pre-Compiled Wheel: Provide a pre-compiled wheel for custom_rasterizer (compatible with Python 3.12 and recent PyTorch cu121 or cu126) that explicitly includes support for Compute Capability 8.6 (Ampere).
  2. Update the Build Process: Update the build process (setup.py) to correctly handle Windows environments with multiple CUDA Toolkits installed and resolve the C2373 header conflicts.
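As a rough illustration of the second point, a build wrapper could pick the installed toolkit that matches PyTorch's CUDA version and pin the target architecture, rather than relying on auto-detection. The sketch below only prepares the environment. It assumes setup.py builds through torch.utils.cpp_extension, which reads CUDA_HOME and TORCH_CUDA_ARCH_LIST. The helper names pick_toolkit and build_env are hypothetical, and nothing here addresses the C2373 header conflicts.

```python
import os
from pathlib import PureWindowsPath

# Toolkit root as listed in the environment section above.
CUDA_ROOT = PureWindowsPath(r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA")


def pick_toolkit(torch_cuda, installed):
    """Return the installed toolkit folder (e.g. 'v12.6') matching torch's CUDA version."""
    wanted = "v" + torch_cuda
    return wanted if wanted in installed else None


def build_env(torch_cuda, installed, arch="8.6", base_env=None):
    """Return a copy of the environment with CUDA_HOME and the arch list pinned."""
    toolkit = pick_toolkit(torch_cuda, installed)
    if toolkit is None:
        raise RuntimeError(f"no installed CUDA toolkit matches PyTorch's CUDA {torch_cuda}")
    env = dict(os.environ if base_env is None else base_env)
    cuda_home = str(CUDA_ROOT / toolkit)
    env["CUDA_HOME"] = cuda_home
    # Compute Capability 8.6 covers the RTX 3070 Ti.
    env["TORCH_CUDA_ARCH_LIST"] = arch
    # Put the matching nvcc first so an older toolkit on PATH is not picked up.
    env["PATH"] = cuda_home + "\\bin;" + env.get("PATH", "")
    return env


if __name__ == "__main__":
    installed = ["v11.8", "v12.1", "v12.6", "v12.8"]  # toolkits from the environment list
    env = build_env("12.6", installed, base_env={"PATH": ""})
    print(env["CUDA_HOME"])
    print(env["TORCH_CUDA_ARCH_LIST"])
```

In practice the first argument would come from torch.version.cuda, and the resulting dictionary could be passed as env= to subprocess.run when invoking pip install . from the Developer PowerShell.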

Q: What is the current status of the ComfyUI-Hunyuan3DWrapper node?

A: The ComfyUI-Hunyuan3DWrapper node currently does not work on Windows with an RTX 3070 Ti (Ampere) GPU: both provided wheels fail at runtime, and building custom_rasterizer from source fails.

Q: What is the next step to resolve these issues?

A: The next step is to investigate these issues further and find a fix for the build failures and runtime errors on Windows with an RTX 3070 Ti (Ampere) GPU.