Discussion:
[Interest] Crash with Qt application that use OpenGL
Xavier Bigand
2017-04-20 17:27:03 UTC
Permalink
Hi,


I think I have found a crash in the NVIDIA drivers that is triggered by
QtQuick.

On the computer on which our application crashes at startup in
nvoglv32.dll, Qt Creator crashes in the same way, which is why I suspect
that it comes from Qt.

It seems to be specific to the following configuration:
- Nvidia geforce 1060 or 1070 at least (can't reproduce with a 980 GTX)
- driver version 381.65

Other things that may have an impact
- Windows 10 64-bit, with the Creators Update


Does anyone have the same issue?

I will run some tests with a basic QtQuick application to be sure and
then possibly file a bug.
--
Xavier
Thiago Macieira
2017-04-20 17:35:18 UTC
Permalink
Post by Xavier Bigand
I think that I have found a crash in the nvidia drivers made by QtQuick.
Then the bug is in NVidia code. Report to them, please.
--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel Open Source Technology Center
Xavier Bigand
2017-04-20 19:04:56 UTC
Permalink
It is not necessarily an NVIDIA bug; it can come from bad parameters passed
to functions like glDrawElements. NVIDIA drivers don't check parameters
much, and passing wrong values can cause buffer overflows, ...
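
One way to test this hypothesis from the application side is to validate the index data before it reaches the driver. The following is only a minimal sketch, assuming 32-bit indices and a CPU-side copy of the index data; `indicesInRange` and `drawRangeValid` are hypothetical helpers, not part of OpenGL or Qt.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: true if every index refers to an existing vertex.
// An index >= vertexCount may make the driver read past the end of the
// vertex buffer.
bool indicesInRange(const std::vector<std::uint32_t>& indices,
                    std::size_t vertexCount)
{
    for (std::uint32_t index : indices) {
        if (index >= vertexCount)
            return false;
    }
    return true;
}

// Hypothetical helper: true if the (first, count) window handed to a
// glDrawElements-style call stays inside an index buffer that holds
// indexCount entries.
bool drawRangeValid(std::size_t first, std::size_t count,
                    std::size_t indexCount)
{
    return first <= indexCount && count <= indexCount - first;
}
```

In a debug build both checks could run right before the draw call, skipping the draw and logging when either returns false.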
_______________________________________________
Interest mailing list
http://lists.qt-project.org/mailman/listinfo/interest
--
Xavier
Sergio Martins
2017-04-20 19:23:10 UTC
Permalink
Post by Xavier Bigand
It is not necessary a bug from Nvidia, it can comes from bad
parameters to functions like glDrawElements.
Could be, but you'll have to prove it.

I suggest:
- Create a minimal-testcase which reproduces the problem
- Run apitrace on it [1] (Never used it on Windows, but there are
binaries for it)
- Re-run the trace under apitrace, hopefully it crashes
- Check if your hypothesis is correct (bad glDrawElements)
- If it's an nvidia bug: Report to Nvidia, attach trace
- If it's an nvidia bug: Open a Qt bug report so the card can be
blacklisted
- If it's not an nvidia bug: Report a regular Qt bug so it can be fixed,
attach trace



[1] - http://apitrace.github.io/#download

Regards,
--
Sérgio Martins | ***@kdab.com | Senior Software Engineer
Klarälvdalens Datakonsult AB, a KDAB Group company
Tel: Sweden (HQ) +46-563-540090, USA +1-866-777-KDAB(5322)
KDAB - The Qt, C++ and OpenGL Experts
Xavier Bigand
2017-04-20 21:01:37 UTC
Permalink
Ok, thank you for the suggestions, Sergio. I have never used apitrace
either.

The blacklist seems to be a good thing, even if it can't work for us: for
the moment we have our homebrew 3D engine, which calls OpenGL directly.
If you have tips for migrating our engine to ANGLE I would be interested,
because the last time I took a look I ran into build issues with GLEW and
other dependencies of ours that also include the OpenGL headers.

ANGLE seems to be the way to go to improve the compatibility of our
application on Windows; we have a few other issues that might be related
to OpenGL. I hope that Vulkan will get better support on Windows, as it
seems much better defined.
--
Xavier
Xavier Bigand
2017-04-21 09:47:22 UTC
Permalink
Here is the trace.

I don't know if it is an error from the Nvidia drivers or Qt.
--
Xavier
Till Oliver Knoll
2017-04-21 10:59:10 UTC
Permalink
Post by Xavier Bigand
It is not necessary a bug from Nvidia, it can comes from bad parameters to functions like glDrawElements. Nividia drivers don't check a lot the parameters and given wrong values can cause buffer overflows,...
Still a bug in nVidia code then. Again, an OpenGL driver is *not* supposed to crash, even with bad input. That's what the OpenGL standard mandates...

And buffer overflows, God forbid :) Report this to nVidia asap (you might get a reward for discovering a security hole ;)).

Cheers,
Oliver
Xavier Bigand
2017-04-21 12:51:39 UTC
Permalink
Hi Oliver,

All the other applications that crash at startup on the affected hardware
are made with Qt; games and all other applications run fine.
We finally found that it is specific to the latest NVIDIA driver, 381.65,
and at least the GeForce 1070 and 1060. We tested another computer with a
1080 and the same driver version, and it doesn't have the issue.

NVIDIA OpenGL drivers often crash on bad input, and you are right, it is
annoying because in that case we can't use Nsight or other tools to debug.

Many wrong OpenGL commands can produce a correct result depending on the
driver and hardware. This is the case with an FBO that is read and written
at the same time: it works on NVIDIA and AMD GPUs, but not on Intel ones.
I think that is because NVIDIA and AMD have code in their drivers that
fixes up some common wrong commands to make OpenGL easier to use.

I will file a bug with NVIDIA.

Thank you all.
--
Xavier
Viktor Engelmann
2017-04-21 13:09:52 UTC
Permalink
I'm not an OpenGL expert, so this might be completely unrelated, but
this reminded me of

http://lists.qt-project.org/pipermail/interest/2016-October/025009.html
--
Viktor Engelmann
Software Engineer

The Qt Company GmbH
Rudower Chaussee 13
D-12489 Berlin

***@qt.io
+49 151 26784521

http://qt.io
Geschäftsführer: Mika Pälsi, Juha Varelius, Mika Harjuaho
Sitz der Gesellschaft: Berlin
Registergericht: Amtsgericht Charlottenburg, HRB 144331 B
Xavier Bigand
2017-04-21 13:17:57 UTC
Permalink
This is not directly related, but it is why I thought it was a Qt bug.
It is so easy in OpenGL to use the API badly and still get a good result
that works on almost all configurations.

Unlike in Vulkan, there is no way to check whether the APIs are used
correctly, so when we test our engine we can't really expect it to run
everywhere just because it runs on our computers.

Anyway, I have reported the issue to NVIDIA.
--
Xavier
Till Oliver Knoll
2017-04-25 05:59:46 UTC
Permalink
[with reference to:
http://lists.qt-project.org/pipermail/interest/2016-October/025009.html]
Post by Xavier Bigand
This is not directly related, but it is why I thought it was a Qt
bug. It is so easy in OpenGL to use the API badly and still get a good
result that works on almost all configurations.
While it is not (directly) related to the nVidia driver crash that you
are observing it demonstrates nicely my prior point "a driver must not
crash" by example. Even if you feed the driver with bogus data.

When you look at the actual Qt bug report that resulted out of this:

https://bugreports.qt.io/browse/QTBUG-56234

"it will call API glVertexAttribPointer but input parameter pointer is a
pointer to system memory vertex data, which violates spec because
OpenGL 4.1 context needs to bind a vertex buffer before calling
glVertexAttribPointer, so AMD OGL driver will report the following error
" glVertexAttribPointer in a Core context called without a bound
Vertex Array Object [which is now required for Core Contexts]"

Yes, Qt uses the OpenGL API incorrectly here (for the simple reason that
the code in question was written against OpenGL 2, where it was perfectly
valid - just not with a "Core Context" anymore).

But note the two points:

1. "which violates spec" and
2. "AMD OGL driver will report the following error"

This is how a properly validating driver is supposed to behave :)
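
The validate-and-report behaviour can be illustrated with a toy model. This is only a sketch and not real OpenGL: `ToyCoreContext` and its members are hypothetical, and it encodes just the rule from the quoted report (no bound vertex array object, or a client-memory pointer with no bound buffer, yields an error instead of a crash). The two constants use the numeric values of the real `GL_NO_ERROR` and `GL_INVALID_OPERATION`.

```cpp
#include <cstdint>

// Toy model, NOT real OpenGL: it mimics only the error rule quoted above.
constexpr std::uint32_t kNoError = 0;                // value of GL_NO_ERROR
constexpr std::uint32_t kInvalidOperation = 0x0502;  // GL_INVALID_OPERATION

struct ToyCoreContext {
    std::uint32_t boundVertexArray = 0;  // 0 means "no VAO bound"
    std::uint32_t boundArrayBuffer = 0;  // 0 means "no VBO bound"
    std::uint32_t error = kNoError;
    int acceptedPointerCalls = 0;

    // Mimics glVertexAttribPointer in a core profile: invalid state is
    // recorded as an error and the command is ignored; nothing crashes.
    void vertexAttribPointer(const void* pointer)
    {
        const bool noVao = (boundVertexArray == 0);
        const bool clientMemory =
            (boundArrayBuffer == 0 && pointer != nullptr);
        if (noVao || clientMemory) {
            if (error == kNoError)
                error = kInvalidOperation;
            return;
        }
        ++acceptedPointerCalls;
    }

    // Mimics glGetError: returns the recorded error and clears it.
    std::uint32_t getError()
    {
        const std::uint32_t e = error;
        error = kNoError;
        return e;
    }
};
```

Passing a pointer to system memory with nothing bound leaves the model intact and makes `getError()` return the error code once, which is the behaviour the AMD driver shows in the bug report.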

Disclaimer: I am neither working for AMD nor nVidia and I acknowledge
that both drivers may have their quirks ;)
Post by Xavier Bigand
Unlike in Vulkan there is no way to check if APIs are correctly used,
so when we test our engine we can't really expect that it will run
everywhere just because it runs on our computer.
"There is no way" because with Vulkan most of what the driver did
previously - that includes input validation - is now the
responsibility of the application itself.

In fact, the application is now in control of the "command queue", and
as such also responsible that the "data types fit together" and the data
arrives in the proper order (or is otherwise synchronised with
"barriers"). The Vulkan driver only does the bare minimum, but if you
pass it a pointer which points into system RAM instead of GPU RAM (when
such one is expected) then Bad Things Happen(tm).

That's why people say you are "closer to the metal" - there is much much
much less between the GPU and your application. Namely much less
validation (which in OpenGL is required by the specs, see above).

Simply said: most of the code (logic) which was previously in the OpenGL
driver is now in your application (which knows much better about the
nature of the data it wants to render, and hence can also spare most of
the input validation which - again by the specs - a former OpenGL driver
always *had* to do).

So it is not quite correct when you say "there is no way to check if the
API is used correctly" - it just happens that it is YOUR application now
that is supposed to validate the data (or at least it has to know
exactly what it does ;)).


Cheers, Oliver

Tim Blechmann
2017-04-20 19:11:16 UTC
Permalink
Post by Thiago Macieira
Post by Xavier Bigand
I think that I have found a crash in the nvidia drivers made by QtQuick.
Then the bug is in NVidia code. Report to them, please.
broken opengl drivers are a known issue on windows. it may be reasonable
to add this driver/device combination to the qt opengl blacklist and
enforce the use of ANGLE.

while this device seems to be new and chances are good that bugs will be
fixed, not so recent devices won't receive driver updates despite still
being widely in use. chrome and mozilla have extensive driver
blacklists, while qt only blacklists very few devices by default.

so i guess it's a good idea to file a bug report for qt as well ...
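
For an application that cannot wait for an upstream blacklist entry, the driver/device match could be sketched roughly as follows. This is not how Qt implements its blacklist; `GpuInfo`, `BlacklistEntry`, `matches` and `chooseRenderer` are hypothetical names, and the device IDs and the plain "381.65" version string in the usage note are placeholders (only 0x10DE, the NVIDIA PCI vendor ID, is a real value).

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical description of a GPU as an application might query it.
struct GpuInfo {
    std::uint32_t vendorId;
    std::uint32_t deviceId;
    std::string driverVersion;
};

// Hypothetical blacklist entry; an empty driverVersion matches any version.
struct BlacklistEntry {
    std::uint32_t vendorId;
    std::vector<std::uint32_t> deviceIds;
    std::string driverVersion;
};

enum class Renderer { DesktopOpenGL, Angle };

// True if vendor, device and (optionally) driver version all match.
bool matches(const BlacklistEntry& entry, const GpuInfo& gpu)
{
    if (entry.vendorId != gpu.vendorId)
        return false;
    bool deviceListed = false;
    for (std::uint32_t id : entry.deviceIds) {
        if (id == gpu.deviceId)
            deviceListed = true;
    }
    if (!deviceListed)
        return false;
    return entry.driverVersion.empty() ||
           entry.driverVersion == gpu.driverVersion;
}

// Returns Angle as soon as one entry matches, DesktopOpenGL otherwise.
Renderer chooseRenderer(const std::vector<BlacklistEntry>& blacklist,
                        const GpuInfo& gpu)
{
    for (const BlacklistEntry& entry : blacklist) {
        if (matches(entry, gpu))
            return Renderer::Angle;
    }
    return Renderer::DesktopOpenGL;
}
```

With a single entry `{0x10DE, {0x0001, 0x0002}, "381.65"}`, a GPU reporting vendor 0x10DE, device 0x0001 and driver "381.65" is sent to ANGLE, while the same device with another driver version keeps desktop OpenGL.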
Till Oliver Knoll
2017-04-21 10:48:23 UTC
Permalink
Post by Xavier Bigand
Hi,
I think that I have found a crash in the nvidia drivers made by QtQuick.
On the computer on which our application crash at startup in the nvoglv32.dll QtCreator crash too in the same way, that it why I suspect that it comes from Qt.
Hold on... you have a "crash" caused by a given library (here: nvoglv32.dll) which makes several other applications go down, and you suspect the /other/ applications of being the culprit? What's the reasoning behind that?

Even if you throw bad OpenGL at a driver that driver is *never* supposed to "crash".

So why not at least try the same applications with another GPU/driver first?

Cheers
Till Oliver Knoll
2017-04-21 10:55:40 UTC
Permalink
Post by Xavier Bigand
...
- Nvidia geforce 1060 or 1070 at least (can't reproduce with a 980 GTX)
So you actually tried with another GPU (possibly different driver) and said yourself that you cannot reproduce this. So why suspect Qt/QML in the first place? I do not understand this logic...

Did you try upgrading (or even downgrading) the driver? That is, what one usually does first when encountering GPU driver related crashes? What was the outcome?

Cheers,
Oliver
Tim Blechmann
2017-04-21 13:59:15 UTC
Permalink
Post by Till Oliver Knoll
It seems to be specific to the following configuration: - Nvidia
geforce 1060 or 1070 at least (can't reproduce with a 980 GTX)
So you actually tried with another GPU (possibly different driver)
and said yourself that you cannot reproduce this. So why suspect
Qt/QML in the first place? I do not understand this logic...
guys, it is not a question whose fault it is. it is the question how to
improve the situation for the customers! broken opengl drivers on
windows are a real-world issue that we cannot ignore!
Post by Till Oliver Knoll
Did you try upgrading (or even downgrading) the driver? That is, what
one usually does first when encountering GPU driver related crashes?
What was the outcome?
a customer buys application A, which uses qt, which uses opengl. the
customer won't blame the vendor of the opengl driver, nor qt, but the
developer of the end-user application A.

the situation in the windows opengl driver land is quite unfortunate:
the hardware vendors won't provide good opengl drivers, users won't
update them (especially since the windows device manager reports drivers
as up-to-date when there are updated drivers on the download site of the
hardware vendor).

---------------------

one of the notoriously buggy drivers in the wild is intel's hd graphics
3000, which is the default gpu for many sandy bridge based laptops which
are still out in the wild.
we had severe issues with this driver on our qtquick applications. we
also got in touch with intel and got the reply: sorry, we don't support
this device anymore. likewise, lots of issues with amd/ati drivers.

---------------------

furthermore the current opengl driver check in qtbase only checks the
first gpu, so it is horribly broken on multi-gpu systems (again, a
real-world issue that some of our end-users are facing):
https://code.qt.io/cgit/qt/qtbase.git/tree/src/plugins/platforms/windows/qwindowsopengltester.cpp#n78
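
The difference between checking only the first gpu and checking all of them can be sketched like this. It is not the qtbase code; `Adapter` and both functions are hypothetical, and the `blacklisted` flag stands in for whatever driver check is actually performed.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical adapter record; 'blacklisted' stands in for a real check.
struct Adapter {
    std::string name;
    bool blacklisted;
};

// Inspects only the first adapter, like the behaviour criticised above.
bool firstAdapterBlacklisted(const std::vector<Adapter>& adapters)
{
    return !adapters.empty() && adapters.front().blacklisted;
}

// Inspects every adapter, so a problematic second gpu is not missed.
bool anyAdapterBlacklisted(const std::vector<Adapter>& adapters)
{
    return std::any_of(adapters.begin(), adapters.end(),
                       [](const Adapter& a) { return a.blacklisted; });
}
```

On a laptop with a clean integrated gpu listed first and a blacklisted discrete gpu listed second, the first function reports nothing while the second one flags the system. Whether one blacklisted adapter should force a fallback for the whole application is a policy choice that this sketch leaves open.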

cheers,
tim

ps: i had a chat with a mozilla developer some time ago: they never use
desktop opengl on windows, but only use ANGLE with a fallback to
software rendering, if the application crashes on customer's machines.
Ulf Hermann
2017-04-21 15:07:06 UTC
Permalink
Post by Tim Blechmann
ps: i had a chat with a mozilla developer some time ago: they never use
desktop opengl on windows, but only use ANGLE with a fallback to
software rendering, if the application crashes on customer's machines.
You can force Qt to use ANGLE by setting the QT_OPENGL environment
variable to QT_OPENGL=angle. Then detect whether it crashes, by registering
a signal handler or similar (sorry, Qt cannot do that for you). If it
crashed the last time, set QT_OPENGL=software. That gives you the same
behavior.
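
A rough sketch of that fallback logic follows. Instead of a signal handler it uses a marker file that is created at startup and removed on clean shutdown, which is a different technique from the one suggested above; the helper names and the marker path are hypothetical. The returned string is the value the application would export as QT_OPENGL before creating its application object.

```cpp
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical helper: true if the marker left by a previous run exists.
bool markerExists(const std::string& path)
{
    std::ifstream in(path.c_str());
    return in.good();
}

// Call before creating any window. If the marker from the previous run is
// still there, that run never reached markCleanExit(), so it is treated as
// a crash and software rendering is chosen.
std::string chooseQtOpenGL(const std::string& markerPath)
{
    const bool crashedLastTime = markerExists(markerPath);
    std::ofstream out(markerPath.c_str());  // (re)create the marker
    out << "running\n";
    return std::string(crashedLastTime ? "software" : "angle");
}

// Call on normal shutdown so the next start tries ANGLE again.
void markCleanExit(const std::string& markerPath)
{
    std::remove(markerPath.c_str());
}
```

In a Qt application the result could be passed to `qputenv("QT_OPENGL", ...)` at the top of `main()`. Note that after one clean run in software mode this sketch goes back to ANGLE; an application that wants to stay on software rendering would have to persist that decision separately.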

See also http://doc.qt.io/qt-5/windows-requirements.html for more details.

br,
Ulf