Can an application break the graphics card?

August 4, 2015

Quick note: I know this is a game development Q&A site, but I figure you guys, most of all, know and have experience with graphics cards, so I’m addressing this question to you. If you think this is completely off-topic, please refer me to a proper site/forum. Edit: Actually, it is gamedev-related: if bad code can result in the card overheating or breaking, then game developers should be aware of that and make sure their applications don’t do it.

This might seem like a weird or stupid question, but is it actually possible to write a graphics rendering application that can break the graphics card (in any way)?

The immediate reason that made me ask this question was (no surprise) my own broken graphics card. After the repair, the serviceman said that they had tested various apps (games) on it and it worked fine. But when I launched my own app (a deferred shading demo), it heated the card to over 100 degrees Celsius. So my card turned out not to be fixed after all, but what’s important here is that the problem seemed to occur only when running my own app.

I’ve played various GPU-demanding games on it (like Crysis) and often pushed it to the limit and beyond (settings so high that the games ran at 5 FPS), plus some benchmarks as well… So I’ve given my card, many times, so much workload that it couldn’t keep up (hence the low FPS), but it never reached dangerous temperatures. Yet my own application managed to achieve that (at least when v-sync was off). :P Since it was only my own app, I don’t think a bad cooling system was the culprit.

So I ask – do you think (or maybe know) whether or not it is possible to break the graphics card (in any way, not just by overheating) by some vicious code?

Update:

Joe Swindell said that overheating may be the problem (well, it definitely can break the card). But shouldn’t a proper cooling system prevent that from happening (under any circumstances)?

Boreal pointed out another problem. If I understand correctly, FPS is limited by both the CPU and the GPU (is that right?). So low FPS might signal either high CPU load or high GPU load. But again – shouldn’t a proper cooling system prevent the GPU from overheating even if the card is “used at 100% all the time”?

12 Responses to “Can an application break the graphics card?”

  1. I once had a GeForce 4 MX 440 graphics board and I wanted to play Prince of Persia: The Sands of Time. But the game didn’t launch because it couldn’t find the expected pixel shader support. This was a bit unexpected for me, because the later Prince of Persia: Warrior Within worked just fine.

    So, in the end I found 3D Analyzer (http://www.tommti-systems.com/main-Dateien/files.html), forced the game to run, and played it for several days. After a couple of days my video card broke – it didn’t display anything anymore. I had had the computer for only about 5-6 months, so I think that forcing the game to run this way actually broke my video card :(

  2. My personal experience:

    I used to have a Lenovo ThinkPad T61p with a Quadro FX570M, built around August 2008. This batch was known to have faulty GPUs that would fail sooner or later (the soldering was sub-optimal on some of the GPU pins).

    Everything was fine for about 5 years until I ran XCOM The Bureau on it (a game known for not really being optimized). The laptop got hot, fans at full speed, and after about an hour of gaming it froze – and not a usual freeze.

    Guess what? When I turned the laptop off and back on, it was dead, with the relevant BIOS beep codes indicating a video failure.

    To answer your question: yes (as others have pointed out), software can definitely break hardware if the hardware is not protected in some way; if, for instance, the GPU fan is turned off, then it will definitely blow up, with a 100% chance of success :D

    1. Installing a driver that doesn’t match the actual card can easily cause permanent damage. My friend has somehow managed to do it through repeated reinstallation of the OS and physical swapping of HDDs.

    2. Turning your PC on and off a lot of times. I’m not sure whether that can cause a failure, but it seems quite possible. Anyway, it doesn’t sound like a very software-driven way to do it.

    3. Manipulating the level of power in the system by turning power-consuming USB devices on and off (for example, an external HDD that doesn’t use its own power supply). Doing this always makes my keyboard and mouse unusable until the next restart, and it has (over 2 years of plugging 3 HDDs in and out every day) burned several cells in one of my RAM chips, which resulted in a BSOD every 10-20 minutes.

  3. Yes, I have broken a few. I don’t run grid GPU computing apps anymore. Some of them tend to break cards, especially when the machine goes into sleep mode, but in normal situations – when the fans are working or the cooling liquid is circulating – there shouldn’t be issues unless the cooling is undersized.

  4. One word answer: YES.

    Detailed answer: Yes, it can (in certain situations). Imagine you write a program that transfers huge amounts of data to your GPU in an infinite loop. It is quite likely to overheat. Now, isn’t it the cooling system’s responsibility to take care of that? Of course it is. But you should also remember that the cooling system has its own threshold levels. If the heat produced is beyond the operating range of your cooling system, then your cooling system is hardly of any use. I don’t know what your app does, but speaking from a programmer’s perspective, you can write programs that lead to this kind of situation.
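
    For concreteness, here is a minimal sketch of the kind of uncapped upload loop described above. It assumes OpenGL with GLFW and GLEW; the buffer size and the per-frame re-upload are purely illustrative, not taken from the original demo.

    ```cpp
    // Illustrative only: an uncapped loop that keeps re-uploading data to the GPU.
    #include <GL/glew.h>
    #include <GLFW/glfw3.h>
    #include <vector>

    int main() {
        if (!glfwInit()) return 1;
        GLFWwindow* window = glfwCreateWindow(1280, 720, "stress", nullptr, nullptr);
        if (!window) { glfwTerminate(); return 1; }
        glfwMakeContextCurrent(window);
        glewInit();
        glfwSwapInterval(0);                      // v-sync off: nothing caps the frame rate

        std::vector<float> data(1 << 22, 1.0f);   // ~16 MB of floats, re-sent every frame
        GLuint vbo;
        glGenBuffers(1, &vbo);
        glBindBuffer(GL_ARRAY_BUFFER, vbo);

        while (!glfwWindowShouldClose(window)) {
            // Deliberately wasteful: re-upload the whole buffer every single frame.
            glBufferData(GL_ARRAY_BUFFER, data.size() * sizeof(float),
                         data.data(), GL_STREAM_DRAW);
            glClear(GL_COLOR_BUFFER_BIT);
            glfwSwapBuffers(window);
            glfwPollEvents();
        }

        glDeleteBuffers(1, &vbo);
        glfwTerminate();
        return 0;
    }
    ```

    A real stress test would also issue heavy draw calls, but even this keeps the GPU and the bus busy with nothing limiting the frame rate.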

  5. It can if the card’s circuit shorts; however, this is very unlikely to happen because the system is isolated up to a certain high temperature. In some cases the thermal behaviour of the card can be disturbed if it is really close to another system, or if it is even touching another material that is not part of that system.

  6. It has happened in the wild.

    StarCraft II in 2010 had a problem where an uncapped frame rate on menu screens placed an unusual load on graphics cards, destroying cards from some vendors that had insufficient thermal protection (see the frame-cap sketch at the end of this answer).

    Design and manufacturing flaws in the GPU itself can also lead to the card destroying itself under load. G84/G86 mobile GPUs had solder joints that degraded under acceptable temperature loads and eventually broke. We also have the infamous Red Ring of Death of the Xbox 360, which had similar thermal problems with solder and expansion.

    All of the above are a mixture of hardware defects and insufficient thermal designs, amplified by software load.
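
    The usual mitigation for the StarCraft II case above is simply to cap the frame rate on menu screens. A minimal sketch, assuming GLFW is used for timing and a window/context already exists; the 60 FPS target is arbitrary:

    ```cpp
    // Sketch of a menu frame limiter; assumes a GLFW window/context already exists.
    #include <GLFW/glfw3.h>
    #include <chrono>
    #include <thread>

    void run_menu_loop(GLFWwindow* window) {
        const double target_frame_time = 1.0 / 60.0;   // cap menus at 60 FPS (arbitrary)

        while (!glfwWindowShouldClose(window)) {
            const double frame_start = glfwGetTime();

            // ... render the menu here ...
            glfwSwapBuffers(window);
            glfwPollEvents();

            // Sleep away the rest of the frame budget instead of spinning the GPU flat out.
            const double elapsed = glfwGetTime() - frame_start;
            if (elapsed < target_frame_time) {
                std::this_thread::sleep_for(
                    std::chrono::duration<double>(target_frame_time - elapsed));
            }
        }
    }
    ```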

  7. Your question is much more complex than what you wrote. I would say the general question is “can software break hardware?”, and the answer to that is a definite yes.

    Mind you, not all hardware can, even theoretically, be broken via software commands, but ultimately what software does is send electrical signals to very delicate hardware components. Usually, the more delicate a hardware component is, the more likely it is to be damaged when it is handled in a way it was not designed for.

    There are a lot of fun ways hardware can break, but let’s just consider overheating: processing work generates heat, and that heat has to go somewhere. Depending on the dissipating characteristics of your card, the airflow in the case, and the overall temperature in the room, the amount of heat removed from the system can be more or less than what is being generated by it.

    If you ask the video card to do work that generates more heat than can be efficiently dissipated, then the chip temperature will rise. If you keep it going, then the temperature will rise above the safe operating level, and the chip will break, lose its magic smoke, and probably even cause a fire hazard. You have just broken your video card, I hope you’re happy.

    Now, can you write software that does this? I would say most likely not. Any (user-level) program you write will not talk directly to the video card. There are lots of safeguards designed to prevent this situation, and they would all have to fail before your rendering program ends up burning down your house:

    1. Generally, heat dissipators and fans are designed so they can comfortably remove the maximum amount of heat the card will generate, even in poorly ventilated cases in hot climates (within the operating ranges specified by the manufacturer).

    2. If heat generation is greater than heat dissipation, the first line of defense is the driver. Most drivers check the core temperature of the GPU and, if it keeps rising, may limit the amount of work sent to the GPU to prevent generating more heat (an application can also watch the same sensor itself – see the sketch after this list).

    3. Should that fail, the firmware in the graphics card should detect that heat is dangerously building up, and will therefore reduce the clock speed in an attempt to reduce heat generation.

    4. If, after all that, heat is still building up, a thermal diode available in most modern CPUs and GPUs will shut down the video card entirely, and heat generation will stop.
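
    As an aside on point 2: an application does not have to rely blindly on the driver; it can watch the same temperature sensor itself and back off on its own. A minimal sketch using NVIDIA’s NVML, assuming an NVIDIA card and linking against the NVML library; the 90 °C threshold is an arbitrary example, not a vendor recommendation:

    ```cpp
    // Sketch of an application-side temperature check via NVML (NVIDIA only).
    // Build with something like: g++ temp_check.cpp -lnvidia-ml
    #include <nvml.h>
    #include <cstdio>

    int main() {
        if (nvmlInit() != NVML_SUCCESS) return 1;

        nvmlDevice_t device;
        unsigned int temp = 0;
        if (nvmlDeviceGetHandleByIndex(0, &device) == NVML_SUCCESS &&
            nvmlDeviceGetTemperature(device, NVML_TEMPERATURE_GPU, &temp) == NVML_SUCCESS) {
            std::printf("GPU core temperature: %u C\n", temp);
            if (temp > 90) {   // arbitrary threshold for this sketch
                // e.g. lower the resolution, cap the frame rate, or pause heavy work
                std::printf("Running hot - consider throttling the workload.\n");
            }
        }

        nvmlShutdown();
        return 0;
    }
    ```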

    So, if you want to break your expensive video card from a user-level application via overheating, then in addition to building a piece of software that stresses the system to its maximum, you would need:

    1. A faulty or broken heat dissipating system. Simply sticking your finger in the fan (always in the center, not in the blades) should do the trick. Difficulty: Easy

    2. A custom (or buggy) driver with throttling features disabled or broken. Difficulty: Normal

    3. A custom firmware with clock reducing features disabled or broken. Difficulty: Hard

    4. A broken thermal diode. If you’re constantly triggering the thermal diode, it may get damaged. Difficulty: Very hard

    … but not impossible! Feel free to try it*, but do make sure you keep your fire department’s phone number at hand when you do.

    *: This is sarcasm. I am in no way condoning the creation of a fire hazard, or any activities that may harm you, your family, your dog, or your community in any way. By reading this post you completely release me from any responsibility your actions may bring.

  8. Wolfgang Skyler:

    Yes, it can.

    • Overheating, as the obvious example, can be caused by extreme workloads, usually achieved through overclocking. This would be the easiest to cause purposefully.

    This can be avoided with a good cooling system. Enabling v-sync is also a good way to avoid it: v-sync prevents the GPU from rendering frames faster than the monitor can display them – frames that would normally just get dropped, never to be seen (see the sketch at the end of this answer).

    Fewer frames = less processing = less extreme workload.

    Keeping track of the GPU’s abilities is also important. I imagine the programmers at Crytek wrote their code to be ready for someone overestimating their graphics card’s abilities. If they did, I’m sure it’s a feature that has saved many a GPU, and has saved many unknowing GPU owners from frustration.

    • A little bit of corrupt (or improperly coded) data can cause a pointer to end up pointing somewhere it’s not supposed to, which can wreck all kinds of things. Though likely not permanent, it could cause varying degrees of failure in the card’s operation. Such a fault on the CPU is normally caught by the OS and avoided or, if it can’t be avoided, will invoke a BSOD (blue screen of death).

    Can be avoided with careful coding and double checking at run-time. (But there are always bugs. If there aren’t, it’s because they’re toying with you.)

    • The GPU is also going to have a driver, which adds another place where things can go wrong. Some data can be corrupted there, or there can be a bug, etc. On top of that, drivers in general run the risk of causing a BSOD – the OS’s fallback when something goes terribly wrong and it needs to run an emergency shutdown to try to minimize, or prevent, the damage. A carefully coded driver will (hopefully) not do this, but there’s always a chance of bugs – including in the emergency shutdown procedures.

    This can be avoided with careful coding and double checking at run-time.
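
    As a footnote to the first bullet, here is a minimal sketch of enabling v-sync, assuming OpenGL with GLFW; other APIs have equivalents, such as a swap-chain present interval:

    ```cpp
    // Sketch of enabling v-sync with GLFW/OpenGL: the swap blocks until the next
    // vertical blank, so the GPU never renders faster than the monitor refreshes.
    #include <GLFW/glfw3.h>

    int main() {
        if (!glfwInit()) return 1;
        GLFWwindow* window = glfwCreateWindow(1280, 720, "demo", nullptr, nullptr);
        if (!window) { glfwTerminate(); return 1; }
        glfwMakeContextCurrent(window);

        glfwSwapInterval(1);   // 1 = wait for one vertical blank per buffer swap

        while (!glfwWindowShouldClose(window)) {
            glClear(GL_COLOR_BUFFER_BIT);
            glfwSwapBuffers(window);   // blocks here under v-sync
            glfwPollEvents();
        }

        glfwTerminate();
        return 0;
    }
    ```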

  9. Andon M. Coleman:

    Even with VSYNC off, many games can fail to hit even 98% GPU utilization. The more actual gameplay they implement, the fewer frames they can stage and the more likely the GPU will go underutilized. Good multi-core optimized games can get significantly closer to 100% GPU utilization, but generally gameplay logic keeps the CPU busy enough with other tasks that it is not able to saturate the GPU with a full workload. Pure rendering applications can easily reach 100% GPU load, but games do a lot more than rendering.
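
    For anyone who wants to check those utilization figures on their own machine, here is a minimal sketch that samples GPU utilization with NVIDIA’s NVML while a game runs; it assumes an NVIDIA card, and other vendors expose similar counters through their own tools:

    ```cpp
    // Sketch: sample GPU and memory-controller utilization once a second via NVML.
    // Build with something like: g++ gpu_util.cpp -lnvidia-ml
    #include <nvml.h>
    #include <cstdio>
    #include <chrono>
    #include <thread>

    int main() {
        if (nvmlInit() != NVML_SUCCESS) return 1;

        nvmlDevice_t device;
        if (nvmlDeviceGetHandleByIndex(0, &device) == NVML_SUCCESS) {
            for (int i = 0; i < 30; ++i) {                 // watch for 30 seconds
                nvmlUtilization_t util;
                if (nvmlDeviceGetUtilizationRates(device, &util) == NVML_SUCCESS) {
                    std::printf("GPU %u%%, memory controller %u%%\n", util.gpu, util.memory);
                }
                std::this_thread::sleep_for(std::chrono::seconds(1));
            }
        }

        nvmlShutdown();
        return 0;
    }
    ```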

    On a side note, on my home machine my GPU generates significant EMI under high load, and it interferes with the cheap integrated audio on my motherboard. I can hear a high-pitched whine over the analog audio whose frequency varies with the load. I have come to enjoy that and consider it a feature rather than a design flaw: it makes profiling interesting, as I can actually hear the load level without having to sample a GPU performance counter. However, I suppose that if you have some device that is highly sensitive to EMI and inadequately shielded, this could be a problem – high GPU load could cause a failure in another device.

  10. It’s not the app’s responsibility to ensure the GPU doesn’t overheat, and it’s not the app’s fault if it does overheat.

    If the GPU doesn’t have proper cooling, then yes, running a 3D app can heat it up to dangerous levels. I don’t know why your app does it and Crysis doesn’t, but it means the card has inadequate cooling, and/or it has been messed with (overclocked, or other factory settings / drivers altered).

    Besides overheating, I’m not aware of any other way in which software could physically damage the chip it’s running on. That’s really not supposed to be possible; it would be a very serious failure of design.

  11. Overheating IS breaking your graphics card. Throwing a massive loop of data at it that it can’t handle will certainly, as you’ve seen, crash it and possibly damage your card permanently.
