In my most recent project, I implemented a GPU path tracer and a bidirectional path tracer. I had already used OptiX for my bachelor thesis, and I thought it would be a good idea to use it for this as well. I mean, it was created for ray tracing, so it should be predestined for this kind of GPU application.
One of the biggest advantages for me was the reliable and fast acceleration structures, which are crucial for ray tracing. But OptiX proved to cause more pain than benefit due to several bugs. And the overall speed was not that great in the end, probably due to a lack of optimisation in my code, which in turn is at least partly caused by the lack of profilers (which would be available for CUDA).
But the big issue was really the bugs, which actively prevented me from implementing stuff. Some of the bugs are totally ridiculous and don’t get fixed.
I already encountered the first bug while coding for my bachelor thesis two years ago. Unfortunately I ignored it and chose OptiX again. There is a special printf function, rtPrintf, which takes the massively parallel architecture of the graphics card into account. For example, it is possible to limit the printing to just one pixel. But, ehm:
// the following line works:
rtPrintf("asdf\n");

// on the contrary the following two don't
rtPrintf("asdf\n");
rtPrintf("asdf\n");
The latter crashes with
OptiX Error: Invalid value (Details: Function “RTresult _rtContextLaunch2D(RTcontext, unsigned int, RTsize, RTsize)” caught exception: Error in rtPrintf format string: “”, [7995632])
The only difference really is whether rtPrintf is called once or twice with the same text. Sometimes it crashed even with a different text, and sometimes it worked with the same text. I reduced the code to a minimal example in order to eliminate possible stack corruption, but to no avail. This problem was reported on the NVIDIA forum in June 2013. It’s possible to use stdio’s printf in conjunction with some ‘if’ guards as a workaround, as proposed in the forum.
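To make the guard idea concrete, here is a minimal sketch of it. It is not the code from the forum thread, and it is plain host-side C++ so that it runs without OptiX: LaunchIndex, kDebugPixel, isDebugPixel, tracePixel and launch2D are all made-up names for illustration, and the nested loop only emulates a 2D launch. In real device code the index would come from OptiX's launch index variable instead, and the guarded printf calls would sit directly in the program.

```cpp
#include <cstdio>

// Hypothetical stand-in for the 2D launch index that a device program sees.
struct LaunchIndex {
    unsigned int x;
    unsigned int y;
};

// The one pixel we want debug output for (chosen arbitrarily for this sketch).
const LaunchIndex kDebugPixel = {3, 2};

// The 'if' guard: true only for the debug pixel.
bool isDebugPixel(LaunchIndex idx) {
    return idx.x == kDebugPixel.x && idx.y == kDebugPixel.y;
}

// Stand-in for a per-pixel program. Returns the number of lines it printed.
int tracePixel(LaunchIndex idx) {
    int printed = 0;
    if (isDebugPixel(idx)) {
        // Plain stdio printf instead of rtPrintf, called twice on purpose.
        std::printf("pixel (%u, %u): asdf\n", idx.x, idx.y);
        ++printed;
        std::printf("pixel (%u, %u): asdf\n", idx.x, idx.y);
        ++printed;
    }
    return printed;
}

// Emulates a width x height launch serially on the host.
int launch2D(unsigned int width, unsigned int height) {
    int printed = 0;
    for (unsigned int y = 0; y < height; ++y) {
        for (unsigned int x = 0; x < width; ++x) {
            printed += tracePixel({x, y});
        }
    }
    return printed;
}
```

Calling launch2D(8, 8) prints the two lines for pixel (3, 2) only. Without the guard, every pixel of the launch would print, which is exactly the flood that rtPrintf's pixel limit is meant to avoid.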
The second one is also connected to rtPrintf:
RT_PROGRAM void pathtrace_camera()
{
    BiDirSubPathVertex lightVertices[2];
    lightVertices[0].existing = false;
    lightVertices[1].existing = false;

    for(unsigned int i=0; i<2; i++) {
        sampleLightPath();
        if(!(lightVertices[i].existing))
            break;
    }

    // rtPrintf("something\n");
    sampleEye();
    output_buffer[launch_index] = make_float4(1.f, 1.f, 1.f, 1.f);
}
This code crashed on both tested operating systems (Windows 7 and Linux) and on two different computers (my own workstation and one from the uni) with:
OptiX Error: Unknown error (Details: Function “RTresult _rtContextLaunch2D(RTcontext, unsigned int, RTsize, RTsize)” caught exception: Encountered a CUDA error: Kernel launch returned (700): Launch failed, [6619200])
It runs fine, though, when the rtPrintf is not commented out. I reported it here. Again I made a minimal example; in the end the source was only two files of ~30 and ~60 lines, but it crashed reliably depending on whether the rtPrintf was commented out or not.
So later, whenever the program crashed, my first attempt at resolving the issue was to scatter rtPrintfs randomly around the code. This was also one of the solutions to another problem, presented in the next paragraphs.
But before I come to it, I quickly have to explain the compilation process. There are two steps. First, at “compile time”, the C++ code is turned into binaries and the OptiX source into intermediate .ptx files. These contain a kind of GPU assembly, which is then compiled at runtime by the NVIDIA GPU driver into the actual binaries executed on the device. This second step is triggered by a C++ function call, usually context->compile().
Now, the problem is that these context->compile() calls don’t always return. Sometimes they run until the host memory is full. Once it helped to spread calls to the aforementioned rtPrintf around the code; another time the fix was an extra RT_CALLABLE_PROGRAM construct (report with additional information here).
Generally, those problems are painful and demotivating, especially considering that NVIDIA doesn’t seem to care; at least they didn’t answer any of my reports. But the reason could also be that they are already phasing out OptiX due to little success. Anyway, it was certainly the last time I used OptiX, and I don’t recommend that anybody start with it.