Solving the thermal problems of the HD2 or other Snapdragon-powered devices

UPDATE -
As of June 14, 2011, it appears the only viable way to solve this problem is to reheat the CPU with specialized equipment. I am currently testing whether this can be done in a standard gas/electric oven (NOT A MICROWAVE !!). If successful, this method will solve the problem, and the following guide (CPU cooling system) may help prevent it from happening again. Building this cooling system alone is not enough and will not solve your problems.
Hello dear members of XDA
First of all, please excuse my English; I will try to explain myself as well as I can. It will be a long post. It could be boring, it could be scary or whatever you like, but bear with me on this one.
So.. I got myself an HTC (T-Mobile) HD2. A bit late in the game, but hell, it's still a good phone. I had dreamed of having one but couldn't afford it. Anyway, I finally got one. A second-hand broken one, damn it.
As I found out, the problem is pretty common: the damn thing restarts itself, and it's thermally related - the old CPU overheating problem. By searching the net I found out that it affects several HTC models. The HD2 has it, the Desire has it, the Nexus One has it, hell, even some Xperia models have it.. about half of the devices powered by anything from the Snapdragon series could have it.
The problem can easily be described as: phone hanging, restarts, the dreaded sequence of 7-8 short vibrations, phone locked, etc.
Mine was worse than what I've seen on the forums or with other people. It locked up for just about any reason. Taking pictures, browsing the menu, using GPS, the browser, 3G or Wi-Fi, watching a movie ... all ended in restarts or lock-ups after a couple of minutes. I found out that keeping the phone at 4-5 degrees Celsius would solve my problems in most cases, but anything above 10-15 degrees would make the thing go crazy.
Well, I'm passionate about electronics, development in this area, trying to solve problems and things like that. I'm also experienced in heat and semiconductor related problems. I also had a MacBook Air that suffered from core shutdown because of overheating (also a well-known problem for MBA rev 1.0) and managed to design an alternate cooling system that solved the problem. So I gave it a shot. I know there are many users with similar problems, and although I don't explicitly suggest they make these hacks to their phones.. this is one way to solve the problem if you bought your unit second hand or don't have some form of warranty.
So here we go.
Big fat warning!!! Don't attempt these things with your phone unless you are familiar with the concepts and the tools involved in the process. Also, there is a real risk of permanently damaging your phone. Not just real.. but big if you get something wrong.
The first step is to run some simple tests to determine the cause of the problem and how far it extends.
So, I used a multimeter with a K-type thermocouple probe to measure the temperature of various components of the phone during intensive use.
This is the back of the mainboard of my HD2. As you may notice, HTC placed a bluish thermal pad over one of the metallic shields covering the back components. I don't know what its purpose is, as the back casing in that area is made of plastic - no heat dissipation, or poor at best. Anyway, that's a good place for my probe. Some tape held the probe in position. Because we don't have perfect mechanical contact between the probe tip and the casing or chips, I expect the real temperatures to be 1 or 2 degrees Celsius higher than each measurement I describe later.
I then placed the battery over the back of the phone and secured it with some more tape and some toothpicks.
We're at 19.3 degrees. That's where we're going to start from.
There's a useful little app that allows users to overclock or stress test their phone's CPU. I found it here on XDA, and I'll use it for heat-making purposes.
As you can see.. we're already at 25.8 degrees after 5 minutes of testing.. not to mention that the primary heat-making suspect, the Qualcomm chipset, is on the other side. At 29.5 degrees.. the phone locked up. I repeated the experiment 2 more times and got exactly the same result.. at least the readings were consistent.
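As a rough illustration of what these readings imply, here is a minimal Python sketch that turns the two shield-side readings into an average heating rate and the margin left before the lock-up. The function name is made up for this example, and averaging over the 5 minutes assumes nothing about the actual shape of the curve, which wasn't logged.

```python
def heating_rate_c_per_min(start_c, end_c, minutes):
    """Average temperature rise in degrees Celsius per minute."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return (end_c - start_c) / minutes

# Shield-side probe readings from the post: 19.3 C at start, 25.8 C after 5 minutes.
rate = heating_rate_c_per_min(19.3, 25.8, 5.0)

# The lock-up was observed at 29.5 C on the same probe.
margin_c = 29.5 - 25.8

print(f"average rise: {rate:.2f} C/min")
print(f"margin left before lock-up: {margin_c:.1f} C")
```

Remember the probe offset mentioned earlier: the real component temperatures are probably 1 or 2 degrees higher than these readings.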
OK, I then removed the motherboard to take some readings from the actual CPU.
Same procedure.. next readings: at around 33.4 - 34.2 degrees (it varies) on the CPU itself, the phone will either restart or lock up. So you see how serious my problem is. Summer is coming, so I won't be able to use my phone...
Measures have to be taken.
Here's a small introduction to heat in semiconductors.
Well, simply put, a conductor (semiconductors act the same way) generates some amount of heat when an electric current is passed through it. This is because the small electrons moving along the conductor (in a simple way, that's the definition of any electric current) will occasionally collide with the atoms of the material they're passing through. In the collision the electron loses some amount of energy. That energy becomes heat. Also, heat itself can be described at an atomic level as the intensification of the naturally occurring random thermal motion of atoms. If they move a lot, if they are more agitated, the material is hotter. If they are more agitated, they are more likely to be hit by passing electrons. So a hot conductor is more likely to get even hotter because of that. There is a point where the heat generated makes the conductor's atoms prone to more hits from passing electrons, in something like a geometric progression. That's called thermal runaway. It tends to destroy electronics by overheating, melting or burning them up.
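The feedback loop described above can be sketched with a toy lumped model: heat output grows with temperature, while cooling is proportional to the rise above ambient. This is a minimal sketch under stated assumptions, not a model of the Qualcomm chip. Every number below (1 W base power, 2 % extra power per degree, the thermal resistances, the heat capacity) is hypothetical and picked only to show the two regimes; only the 19.3 degree ambient comes from the post.

```python
def simulate_temperature(p0_w, k_per_c, r_th_c_per_w,
                         heat_capacity_j_per_c=5.0, ambient_c=19.3,
                         dt_s=0.1, steps=20000, abort_c=150.0):
    """Euler integration of C * dT/dt = P(T) - (T - T_amb) / R_th.

    P(T) = p0 * (1 + k * (T - T_amb)) is heat output that grows with
    temperature, i.e. the positive feedback. Returns the final temperature,
    or the first value above abort_c if the model runs away.
    """
    temp_c = ambient_c
    for _ in range(steps):
        rise_c = temp_c - ambient_c
        power_w = p0_w * (1.0 + k_per_c * rise_c)   # heating term
        cooling_w = rise_c / r_th_c_per_w           # heat leaving to ambient
        temp_c += dt_s * (power_w - cooling_w) / heat_capacity_j_per_c
        if temp_c > abort_c:
            break
    return temp_c


def loop_gain(p0_w, k_per_c, r_th_c_per_w):
    """Below 1 the temperature settles; at 1 or above it runs away."""
    return p0_w * k_per_c * r_th_c_per_w


# Hypothetical "good cooling" case: low thermal resistance to ambient.
good = simulate_temperature(1.0, 0.02, 20.0)
# Hypothetical "poor cooling" case: same chip, three times the resistance.
poor = simulate_temperature(1.0, 0.02, 60.0)

print(f"gain {loop_gain(1.0, 0.02, 20.0):.1f}: settles near {good:.1f} C")
print(f"gain {loop_gain(1.0, 0.02, 60.0):.1f}: ran away past {poor:.0f} C")
```

The point of the sketch is that in this model a better heat path (smaller thermal resistance) does more than lower the temperature a little: it can move the system from the runaway regime back to a stable one.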
Back to our phones now. The CPU produces heat because of the effect described above. The heat in this case will either melt or break the small "balls" that make up the BGA matrix on which these chips are mounted. The small balls will either melt (in extreme cases) or expand with increasing temperature. However, it seems most of the new processors used by HTC are mounted in some epoxy resin that softens and melts at a higher temperature than the solder and flux used to attach those chips. So the actual chip will tend to stay fixed in a particular position, unable to expand or contract with temperature variations, but the balls of the BGA matrix underneath it will expand or contract with these variations. This could lead to a case where at least one of those balls (a couple of hundred in total) becomes "loose" or out of position, thus breaking the electrical contact it should have made. Hence our problems. At first, large amounts of heat must be applied in order to actually break the bond between the CPU and the board, but after that, once broken, the tiny links are very sensitive to temperature variations and will expand or contract freely.
Most users notice that at its core the problem seemed related to overheating (in the beginning), but over time its effects are degenerative.. phones seem to restart for no apparent reason. It's still overheating, but things get worse and worse as the chip and its connections become more sensitive to heat variations. Thus, even small variations now produce these problems - my CPU restarts at 34 degrees.. that sucks.
So, my only option was to try to reheat the CPU in an attempt to partially melt the broken "balls" in the BGA matrix so that hopefully.. I repeat, HOPEFULLY, they remake contact with the mainboard. A reball of this chip is not possible, as the resin placed around it by HTC doesn't melt at the normal temperature at which I could remove the chip itself, so heating it to even higher temperatures would risk killing the CPU long before the resin melts. Strange move by HTC to do things like this.
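To give a feel for the sizes involved, here is a minimal sketch of linear thermal expansion, dL = alpha * L * dT, applied to two materials that are bonded together. It is a generic illustration of differential expansion and not a measurement of the HD2 board: the 12 mm package side is a hypothetical figure, and the expansion coefficients are typical textbook values for an FR-4 board (in plane) and for bare silicon, assumed here only for illustration. A real chip package and HTC's resin will behave differently.

```python
def expansion_um(cte_ppm_per_c, length_mm, delta_t_c):
    """Linear expansion dL = alpha * L * dT, returned in micrometres."""
    return cte_ppm_per_c * 1e-6 * length_mm * 1000.0 * delta_t_c


PACKAGE_SIDE_MM = 12.0   # hypothetical package size, not measured
CTE_BOARD_PPM = 16.0     # assumed typical FR-4 in-plane value
CTE_CHIP_PPM = 2.6       # assumed value for bare silicon
DELTA_T_C = 40.0         # e.g. a swing from 20 C to 60 C

board_um = expansion_um(CTE_BOARD_PPM, PACKAGE_SIDE_MM, DELTA_T_C)
chip_um = expansion_um(CTE_CHIP_PPM, PACKAGE_SIDE_MM, DELTA_T_C)
mismatch_um = board_um - chip_um

print(f"board grows {board_um:.2f} um, chip grows {chip_um:.2f} um")
print(f"mismatch the solder balls must absorb: {mismatch_um:.2f} um")
```

A few micrometres sounds like nothing, but the solder joints have to absorb that difference on every heat cycle, which is one plausible way repeated heating and cooling can fatigue them.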
Anyway.. here goes nothing..
I placed the usual aluminium foil to protect the surrounding components from the heat generated by the rework station and the hot air used to heat up the CPU.
I preheated the CPU for about 10 minutes, from both sides of the board, then switched to heating it at 360 degrees. I applied even pressure on top of it after it was heated in order to tighten the space between it and the board, just a little bit. THIS IS VERY RISKY. It is normally not recommended because of the risk of damaging the BGA. In this case the resin would prevent me from moving the chip too much, so it's less risky. Not safe.. but less risky.
I let the board cool on its own for half an hour and repeated the temperature monitoring tests.
Now the maximum temperature before a restart had increased from 34 degrees on the CPU to about 42. It's not much, but it's a start. However, above this temperature.. the phone will still lock or restart.
I went for another round of reheating with the hot air station. After this, I got slightly better results. Some 2-3 degrees more. My lucky break was when I suspected thermal runaway in the CPU. So I tried to make some sort of heat sink for that chip using some mica foils for TO-220 transistors, some thermal grease and a bunch of aluminum and copper foils. My theory was that above a specific level the heat build-up accelerates, a point from which thermal runaway occurs. In my case, in the initial tests, even after the phone locked up and I manually restarted it (battery out - in), the temperature continued to increase even faster although the phone wasn't doing anything intensive.
The role of my "heat sink" would be to dissipate heat more rapidly and, in some manner, to press the CPU against the board.
After I placed the mica foils directly above the CPU, with thermal grease above and below them, I mounted the metal shield back over that area. On it, I placed some more silicone paste and some thick copper foil (taken from some broken laptops I have over here). It looks ugly but.. worth a shot:
After that I began making the rest of the heat sink using aluminum foil. I folded about 12 layers, placing... more thermal grease between each of them, and at the 6th-7th layer another round of mica crystal foil.
Here's the aluminum foil
I then pressed the foils very hard between two flat surfaces in order to remove the excess thermal grease.
I "anodized" the first layer (the one in contact with the cpu shielding) with some ferric chloride. Before that, the board looked like this:
After the logic board was mounted back, I remade all the connections and, after some preliminary tests, put the phone back together. It now looks like this:
I only have to re-attach the plastic sticker with the serial no. and IMEI.
Of course I then ran tests. I heated up the phone with a hair dryer to simulate a hot summer day. About 40 degrees, just to be sure. I then ran CPU stress tests and a full DivX movie (impossible in the past). In preliminary testing, I had indications that I had avoided the thermal runaway, with the CPU now running stable at 24 degrees (19.3 in the room - ambient temperature). No more heating up by itself to about 40 degrees and then restarting.
In the final testing, with the phone put together, I heated it with the hair dryer and reached 40 degrees. I started it and ran stress tests. No more lockups or restarts, not even a single one. However, with the phone put together I can't measure the temperature of its internal components. As far as I can feel, it gets warmer and heats up to some degree, but now the heat is spread all over its surface. For some reason it doesn't restart anymore.
I then tried a CPU stress test, WLAN connection, PC connection and browsing the net, all at the same time. NO RESTART. I watched a full 1 h 30 min movie at max playable quality; the phone was really hot (43-44 degrees at its surface) but still no problems.
It appears that for the moment I saved the phone. However, future behavior is still to be determined.
I'll get back with more testing in the following days, and eventually I hope to devise a general method for building heat sinks for phones (yeaah, ridiculous....) using combinations of metal and thermally conductive crystals. The idea is to find out if reheating the chip with a hot air station can be avoided (this involves the most risk). But the start is promising. By the time warranties expire and phones like the new Droids or WinMo 7 devices start to break from thermal problems, maybe I'll have some sort of more user-friendly solution.
EDIT June 4, 2011
Since I have a dead HD2 motherboard here, I tried to remove the CPU to expose the BGA soldering. Just for fun; there is no chance of a BGA reball, as there aren't any tools available for this particular chip. The resin prevents a proper removal. At about 450 degrees Celsius it was still kind of hard, so I had to forcefully remove the chip and break some of the BGA. The chip is very thin, kind of like a micro SD card. It heats up pretty quickly; the solder points underneath it melted in about 2-3 min at 370 degrees Celsius.
Here's how it looks.
This is the motherboard without the chip. The BGA matrix is broken; some balls were simply ripped out when I forcefully removed the chip.
This is the actual chip compared with a mini SD and a standard SD card.
...and this is the underside of the chip. Believe it or not, the chip is actually alive and its pins are OK. It cannot be used because it cannot be properly soldered to a board. Guess I'm gonna punch a hole through it and put it on my key chain, along with the laptop CPU that's already there.
In the following days I will experiment with the solder points and materials in order to try to come up with a safer method for reheating future boards with thermal problems. It seems this board died because of overheating and a short circuit across the center of the array, made by 3 solder balls that came into contact once they melted.

You sir are the man.
I personally do not have any overheating issues with my HD2. But there are so many people here on XDA that do. So your work will be greatly appreciated and followed here. From what you have posted, you may just be onto something that will be very useful not just to HD2 owners but to a large variety of smartphone and maybe even tablet owners, present and future. I too find the epoxy resin that HTC placed around the chip odd. It is almost like HTC did not want this part of the hardware to be replaceable, but wanted to make you buy a whole new main board. Corporate crap.
Anyway, please keep up the good work; I will be following this thread very attentively. So please do post back here with your future findings.

The epoxy resin is there to hold the CPU (which is the biggest and probably the heaviest part on the PCB) in its place, thus making it more resistant to mechanical forces (such as accidental drops, shocks, etc.).
It could also be that HTC wanted to prevent access to the CPU's I/O pins by making it impossible (or at least very difficult) to desolder it. This way it is difficult for "researchers/hackers" to chart a schematic diagram of the connections between the CPU, flash chip, RAM, etc., or to attach logic probes to the I/O lines, and that in turn makes it difficult to produce software hacks such as HardSPL, SIM unlocking, etc. I know that there are other methods to connect to the CPU (such as JTAG), but fewer options mean fewer chances to succeed.
B.r.:
d3m0n

This is an excellent post, and thank you for your testing and investigation. Do let us know how your unit holds up after a week of real-world testing.
The engineers at HTC should have incorporated a better CPU cooling solution in the HD2. Your testing and modifications are a wealth of information.

Thanks for the support.
The resin on the CPU has a higher melting point than the solder joints between the CPU and the motherboard. I guess the reason for its presence is that when the phone is hot (and HTC knows the CPU can get hot), the solder joints become mechanically weaker and could be broken more easily. In case of a mechanical failure of the BGA, however, the resin poses a problem, as the chip can't be desoldered safely or even reheated. I took a risk there.
I don't think HTC wanted to stop us from desoldering the chip because of JTAG pinout mapping. Even with access to the pins, it would be very hard to identify them without some form of CPU datasheet. The same goes for a CPLD, for example.
Anyway, at this time I still don't know whether it was the reheating or the cooling system that helped me solve this problem. So far, so good; not a single problem noticed yet. I can now reflash the phone; before the procedure, I got that vibration pattern during a flash. I've put Android on it and stress tested it even more, and I'm now trying to play some 720p video on it. It still heats up and feels as warm in the hand as before, but for whatever reason it doesn't restart anymore.
However... more testing is still to come.
On another note, I'm now working on a broken Eten M810 with some CPU problems. In this case, the CPU doesn't have that resin; however, the NAND memory does. Different brand, different choices.

As far as I have found out until now, the steps from a well-working HD2 to these problems are something like this:
1. Phone working OK. The mainboard (lower part of the device) heats up under some conditions - demanding programs etc. The battery can reach about 40-45 degrees max. without problems. The phone will restart or freeze (CPU halt) in either of these situations:
- Battery temperature exceeds 45 degrees and stays above this value for at least 5-10 minutes, long enough to trip the thermistor that measures the temperature in this area and is read over I2C. At that point, the phone will prevent further charging and restart or lock up. This is normal behavior.
- CPU exceeds 60-65 degrees (exact value still to be determined; I'm trying to get access to datasheets for similar chipsets). This produces a CPU halt. Depending on what you're doing, the halt will either reset the phone or simply lock it up. After a soft reset or an automatic restart, the user will probably be returned to the home screen with the phone still working. This is also normal behavior, related to the Qualcomm chip.
2. Phone starts to malfunction. This condition is triggered either by a large variation in temperature (a mainboard at low temperature quickly going to full load) or simply by sustained full load. All HTC HD2 revisions have the same type of soldering in the CPU area. Visually speaking (no conclusive data yet), the first revision used a bit more epoxy resin to secure the CPU in place. In the context of overheating and solder ball expansion, that's not quite a good thing. Some sort of thermal spike must occur in order to break the contact between the CPU and the motherboard. Warning: if your phone locks up and doesn't restart by itself, it's imperative that you disconnect the battery, because, as I measured, even with the phone locked the CPU keeps getting hotter, thermal runaway occurs and the temperature climbs to dangerous levels. I never let the phone do this for a long time, therefore I don't know how much further it will overheat, but it does and it will. In the initial stage of the problem, only extended heavy-load use can trigger it. A common case is keeping the phone on in the car and using it for GPS navigation in a hot summer. If the phone restarts before reaching either 45 degrees at the battery or 60-65 degrees at CPU level (the latter is harder to measure), then you certainly have problems, and they are just at the start.
3. Problems get worse. At this stage it is possible to notice the 7 short vibrations at boot time if the phone is warm or kept in a warm environment. You don't have to push it very hard; it only needs to be warm. The vibration pattern is an error code produced by the Qualcomm chipset itself, not sent by the bootloader, SPL or operating system. When it happens the CPU will lock up; however, file transfer (including NAND memory access, storage card access and basic operations) and other chipset functions will still work for some time. It appears only CPU processing is halted. So if this occurs when you boot the phone, it will lock up, but if it occurs while you are flashing a ROM, you might continue to see the progress bar still filling. The vibration pattern signals that physical damage to the Qualcomm chipset has occurred. There's no way around it; once it occurs it will never just.. heal by itself.
You will notice that the temperatures needed to induce a restart/lockup decrease with time (both battery and CPU).
4. Problem at its worst. The CPU can lock up even at 35-40 degrees (measured at its level). An ambient temperature of only 10-15 degrees is enough for the phone to experience problems. The CPU suddenly starts to produce lock-ups or hard faults, or simply works intermittently. The OS may give errors relating to ARM core failure or fatal errors regarding the execution of certain "lines" (related to code lines in the OS core programming). At this stage, the phone doesn't need to feel warm in your hands to produce these problems. This could trick some people into no longer relating this to thermal problems and looking for the cause or solution elsewhere. It's still related... but at its worst.
5. Total CPU collapse. If the phone locks up and remains locked in whatever screen or program it was running, like I've said before, it will still overheat. If a stage 4 phone is left overheating, chances are that more of the balls connecting the chipset to the motherboard will fail. If any ball needed to correctly initialise the chip or to power it on fails, then it's game over for that phone. It will simply stop working and never turn on. Other variants are that the phone will only start if placed in a freezer, or that it starts but never completes a boot sequence (either the OS or the bootloader.. or both could be unable to start).
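The stages above can be sketched as a small triage function. This is only one reading of the list, assuming the thresholds quoted in it (60 degrees as the lower bound of the normal CPU halt range, 40 degrees as the upper bound of the stage 4 range). The function name, its parameters and the tie-breaking order are made up for this example, and the 60-65 degree value is itself stated above as not yet confirmed.

```python
NORMAL_CPU_HALT_C = 60.0   # lower bound of the 60-65 C range quoted above
STAGE4_CPU_LOCK_C = 40.0   # upper bound of the 35-40 C range quoted above


def classify_stage(boots, cpu_lock_temp_c=None, seven_vibrations=False):
    """Rough stage (1-5) from the symptoms described in the list above.

    boots            -- False if the phone never completes a boot
    cpu_lock_temp_c  -- CPU temperature at which it locks or restarts,
                        or None if no lock-up has been observed
    seven_vibrations -- True if the 7 short vibrations have been seen
    """
    if not boots:
        return 5
    if cpu_lock_temp_c is not None and cpu_lock_temp_c <= STAGE4_CPU_LOCK_C:
        return 4
    if seven_vibrations:
        return 3
    if cpu_lock_temp_c is not None and cpu_lock_temp_c < NORMAL_CPU_HALT_C:
        return 2
    return 1


# The phone from the first post: locks at about 34 C on the CPU.
print(classify_stage(True, cpu_lock_temp_c=34.0, seven_vibrations=True))
```

Measuring the CPU temperature needs a probe on the board, so for most people the boot behavior and the vibration pattern will be the only inputs available.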

My phone started freezing after it fell on the ground, landing LCD-down on the tiles. There is no physical damage, the screen is in perfect condition and the touch screen works very well, but since it hit the tiles my phone has been freezing randomly.. It's been a month since the incident and I have tried a lot of ways to fix this. I even tried removing all the parts except the LCD and digitizer to see if there is something loose inside, but it's still not fixed.. I'm a noob at all this and don't really know the names of all the parts. The cause I have found for the freeze is a little pressure near the SIM card, where the main board is. Even a very light press from the back near the SIM card results in a freeze.. I can say this because it won't freeze if I don't touch that area. Every time it freezes I have to remove the battery cover and press the red button, and it will freeze again if I put the cover back after it boots (because putting the cover back definitely presses on that area.... so I have to be very careful putting it back). So can anyone help me with this? What could be the problem.. Sorry for my English.

An interesting observation. I have been in an air-conditioned room for the last 4 hrs and it really cools the HD2 down. Perhaps the glass digitizer conducts heat quite well rather than insulating.
During hot summer days I have hit 42C without any issues. I don't want to hit 45C, though; this HD2 is a beast.. Imagine staying at a 1100MHz o/c all the time... aiiii caramba... we could cook some eggs on it.

Yeah, I've experienced some of the freezes when the HD2 got hot, and it stayed like that until it cooled off. I've noticed that when I try to charge it in the car while running Google Maps, it tends not to charge fully as it should; instead it heats up after a while until it becomes unresponsive, and then I let it cool off again.
I hope it hasn't affected the CPU's connections much, but at this point I'll have to monitor its heat situation to prevent future disruptions.
Good work/thinking on the "cooling adapter". I've seen a similar approach on IBM's graphics cards, which fail for the same reason, and people would heat them up to reconnect the chip's connections to the PCB.

The back of the LCD display (the actual LCD, not the touchscreen) is made of metal, and on top of it there is some copper foil. So.. yes, cooling the screen helps cool the CPU. However, when I disassembled my HD2 I noticed that the actual CPU isn't in direct contact with the metallic back of the screen. So, although cooling the screen could help, it isn't very effective. I adapted my "cooling" pad to thermally connect the CPU to the back of the screen via that DIY aluminium and copper foil setup.
@ modex: if the phone is dropped, the BGA connection of either the Qualcomm chipset or the NAND memory (the 2 largest chips on board) could get damaged. As we know that the connection between the CPU (inside the Qualcomm chipset) and the motherboard can be faulty or become so over time, it's a prime suspect in your case. It is very difficult to predict the outcome if you send the phone in for repairs or have it reheated. I know of no service center that can effectively reball the CPU onto the motherboard (this means the chip would actually have to be removed, the connections remade and the chip resoldered).
My phone is still doing well, one week after my intervention. About 14 ROMs installed, running WP7, Android builds, custom WM 6.5, Ubuntu etc. Not a single restart or freeze. It does heat up, but the heat is now spread over its surface.
From what I can tell, the DIY heat sink helped more than the actual reheating of the chip with the hot air station. If in the future someone else without a warranty tries a similar hack on their phone, we will know for sure whether the problem can be solved by simply cooling down the CPU to some extent.

When I get back from my holiday I will try your method on my HD2, as it does all the freaky stuff even when doing nothing.
Today I noticed the 7 vibrations and got scared. I have an Artemis as a backup and am using it now, but after a week I will try the re-solder and report back.
Still, after a few years away from electronics, I brought my Hermes back to life (white screen due to a faulty front keyboard PCB). I had a TP2 and exchanged it for the HD2, and I want to see it fully working for the price I paid for it.
Regards

Very cool thread... reminds me of the days of Palm PDAs when people were more technically inclined. Will have to try this out as well.

Just one silly question.
I mended the PCB on my Hermes using a heating torch (butane powered).
Do you use the same sort of thing or a special one?

first of all heating up the qualcom chip is recomanded as a last resort option. however if you reheat it, pressing the chip to the board is VERRY dangerous, as it could permanently damage the BGA connection.
Here's some sort of guide on doing this. You will need a screwdriver, some 4-5 mica foil pads (you can get them from any electronic component store (get them for either TO3 or TO220 casing and cut them to the size of the cpu inside hd2) some good thermal grease (arctic silver or something for pc cpu's) an aluminum sheet for you to cut a piece of it.
* i don't recommend silicon thermal pads, use only mica crystal pads
* you can substitute the aluminum plate with aluminum/copper foil - the first is the one used for food wrapping)
* i don't recommend using anything beside a smd rework station (either hot air or infrared) to heat up the board. Although a heat gun can develop high temperatures, the air debit is to high (dangerous, you can blow up other components) and you will lack precise temperature control needed for this job.
1. Disassemble the phone following HTC's official videos. Completely remove the motherboard from the phone's casing.
2. Once you have the motherboard detached, remove all metallic shields on both sides. Normally these keep outside EM interference from getting in and messing with the electrical signals on the PCB. We can use them as part of the "cooling" system later.
3. OPTIONAL - effectiveness yet to be determined / great risk involved - use either a special oven (not a microwave!! it WILL kill the phone!) or an SMD rework station to preheat the mainboard. The temperature must be set at around 95-110 degrees Celsius. The board must be heated from both sides, or at least one side at a time, beginning with the side opposite the CPU. Let it preheat for at least 10 minutes.
3a. After preheating, cover the rest of the components (anything other than the CPU itself) with aluminum foil, then move on to the actual heating: switch first to 250 degrees and direct the air stream at the CPU itself (using a larger nozzle on the tip of the heating gun). After 2-3 minutes at 250 degrees, switch to 340-360 degrees and heat the chip for another 5 minutes. Move the heating gun around the surface of the chip and try to heat it evenly. If you have the guts and are crazy enough, take a knife with a larger blade and put the tip of the blade in the hot air stream in front of the CPU. Let it heat for a while, and keep heating the CPU as well. When the blade tip is hot enough, press the chip with it, starting from the center and then along each side. Apply even force on each press and try to keep the blade as parallel to the chip as possible. Don't press too hard: if you haven't killed the chip yet, that will kill it.
3b. Let the board cool down on its own, and try not to move or touch it while it cools.
4. Place a small amount of thermal grease on top of the CPU, then place 1-2 mica foil pads (depending on thickness) over it. Gently press the mica foil onto the CPU with one finger. Now put more thermal grease on top of the mica foil and fit the metallic shield over that area. If done correctly, the metallic shield should be in contact with the mica foil and the grease. Put all the shields back on the mainboard.
5. On the phone's casing, measure the back of the display and cut an aluminum sheet to exactly the same size. If the sheet you can find is too thick, polish it and place it in a solution of either caustic soda or ferric chloride. This will make it thinner, but you have to supervise the process: if you leave it too long, the sheet could dissolve completely. Check the sheet at short intervals (1 min) to see the progress. Always use gloves and eye protection, as both substances are dangerous (never mix them; use only one of them, whichever you can get or already have). Once done, you will have a thin, flexible aluminum sheet about 1mm thick.
6. Notice that there are some ribbons connecting the display to the motherboard, and other exposed metallic contacts. Before placing the aluminum sheet over the display's back, put some insulating tape over those metallic contacts to prevent any short circuit between them and the aluminum sheet. Next, place the aluminum sheet over the display's back. Be careful not to damage any connector or ribbon in the process.
7. Put more thermal grease on the CPU's metallic shield and check that the motherboard makes good thermal contact with the aluminum sheet you just placed over the display's back. If there is still some space between them, use another mica foil pad with thermal grease on both sides of it.
8. Reassemble the phone and run some tests to see if you get any improvement.
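As a rough planning aid, the optional heating stages from steps 3 and 3a can be written down as data and totalled. This is only a sketch: the stage names are made up for illustration, the temperatures are assumed to be degrees Celsius, and the preheat time is a minimum ("at least 10 minutes"), so the totals are lower bounds and not a validated reflow profile.

```python
# Heating stages taken from steps 3 and 3a above.
# Stage names are illustrative; temperatures are assumed to be Celsius.
# Each entry: (name, (temp_low, temp_high), (minutes_low, minutes_high))
STAGES = [
    ("preheat whole board", (95, 110), (10, 10)),   # "at least 10 minutes"
    ("first pass on the CPU", (250, 250), (2, 3)),
    ("second pass on the CPU", (340, 360), (5, 5)),
]


def total_minutes(stages):
    """Return the (shortest, longest) total heating time in minutes."""
    shortest = sum(minutes[0] for _, _, minutes in stages)
    longest = sum(minutes[1] for _, _, minutes in stages)
    return shortest, longest


def describe(stages):
    """Return one human-readable line per stage."""
    lines = []
    for name, (t_lo, t_hi), (m_lo, m_hi) in stages:
        temp = f"{t_lo}" if t_lo == t_hi else f"{t_lo}-{t_hi}"
        mins = f"{m_lo}" if m_lo == m_hi else f"{m_lo}-{m_hi}"
        lines.append(f"{name}: {temp} C for {mins} min")
    return lines


if __name__ == "__main__":
    for line in describe(STAGES):
        print(line)
    print("total minutes (min, max):", total_minutes(STAGES))
```

Under these assumptions the board spends at least 17-18 minutes under heat before the cool-down in step 3b.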
One more thing: this little project of ours is still in a "more to be seen/tested" state. As of now, only one device has been fixed by this method (mine), and it could have been simple luck. I don't know yet. More than a week later (strange weather too, about 20 degrees warmer outside than when I wrote the original post) the phone still works OK. It is now running overclocked at 1.3GHz with NAND Android.
@ januszgorlewski I remember the first time the phone vibrated 7 times. I didn't know about this problem, so I thought it was a feature of the WM6.5 Energy ROM.

Yep.. more than 2 weeks have passed, and after I completed all the tests I could think of the phone still works OK.
About 22-25 ROMs were flashed (WP7, WM6.5, Android, Ubuntu), and the phone was either used normally or heated with a hair dryer. At about 30 degrees ambient room temperature, I ran some 720p tests and managed to play sample videos until the battery died, then reran the videos while charging (charging generates more heat too).
In all those 2 weeks I had only 2 restarts, both in WP7 (can't remember which ROM version) and both while I was setting up the phone after the phone update. The phone was cold at the time, however. I couldn't produce any more restarts, either by heating the phone up or by running intensive apps on it. I guess those two were software related.
So.. I guess this problem is over.

This thread is awesome. I've opened my HD2 a few times in the past week to replace the LCD. Between the first time and opening it last night the LCM (LCD and touchscreen module) was slightly loose from the chassis so the screen was protruding a little and the front AP buttons were sunken. This was the case for a period of about a week, during which I noticed the phone would get very, very hot towards the bottom on the posterior surface, beneath the battery cover (around the area of the main board). Last night I properly assembled the whole device and it's now completely flush. The overheating doesn't seem to be occurring now.
I wasn't experiencing any restarts or lockups during the time it was overheating.
When I can get hold of the materials I'm definitely trying your heatsink. Thanks for sharing this.

Packaging Reliability - 7 vibration lock-ups
Thanks facdemol for your investigation and sharing.
I had been typing a lengthy description of what happened to me when my browser hung (annoying), so now only the short version.
My long-awaited, factory-fresh HD2 failed exactly as described within 2 months, in the winter season. I was lucky to get the original SPL back after storing the phone on the balcony at minus 5 and flashing from SD. Please mind that for me the issue was heavily accelerated with the HD2 plugged in.
Yesterday I got mine back from warranty repair with the main board swapped. Since I am now anxious about this happening again, I asked whether others had reported similar experiences, which was denied. After reading this I recommend starting a petition, as this is obviously a flawed thermal design. I work as a packaging engineer and have access to X-ray, ultrasound microscope (water bath only) and infrared imaging. Even though I don't have the time to start this petition myself, I would offer to help put some serious reliability research behind it.
So you could donate faulty hardware for inspection, as mine is still under warranty.
A few years back I had a good RMA experience with my Canon camera, which died in warm, humid conditions. After some research on the net I found the policy that all models would be replaced for exactly this failure, no matter when it occurred, as it was a design error (wrong material for the CCD attach in this case).
So please, people with thermally unstable Snapdragon devices, STAND UP and ask HTC to handle these mistakes seriously. They should replace devices even after the warranty has expired, if only they would admit it was their design flaw...
As for me, I will probably try to stress my repaired HD2 in order to trigger this failure again, and then I can opt for exchanging the device. But then, what device to buy? For the dual cores this might be even worse. Suppliers do not have long enough life cycles for their products to really do a good redesign.
Keep it up

I had the 7 vibrations while on a plane: just switched on, played solitaire for 5 min, then reset and 7 vibrations. Took the battery out, started it up, same 7 vibrations, about five times the same cycle.
Then I thought, OK, I am done now, it's a brick. PCs have a BIOS which can tell you by beeping what the hell the problem is; in this case, no idea... Then I found this thread.
@ facdemol I might try only the heatsink - reheating the CPU is not needed, right? But I will wait for the insurance exchange of my phone...

Sir, first of all I want to thank you for this excellent post. Can you suggest some other easily available material to use instead of mica foil pads? I have the same problem with my HD2, and its warranty has expired.

The mica crystal pads should be available at any electronic components store. If you can't find any, you could try substituting any other material made for a similar purpose. Use only thermal pads made for heat dissipation from semiconductors (mostly transistors) in electronics. However, from what I know or can test, the mica ones are superior to other designs or materials.
Also, good quality thermal paste is a MUST. Cheap ones tend to dry out or lose effectiveness over time.
@ profahmad - yes, the back of the LCD unit is metallic. It was not intended to provide heat dissipation, nor is it in direct contact with the heat-producing components, but it takes up some of the heat and spreads it over its surface. What I did was to deliberately use this piece of metal, along with the materials I used for the "heatsink", to improve thermal dissipation. The HD2 is built "on the edge", as you can see: if the display unit is removed or improperly mounted, losing the small cooling effect it had on the board is enough to provoke some of the thermal issues.
@januszgorlewski reheating is very risky without solid previous experience. Simply reheating the CPU didn't solve the problem for me; it only improved things a bit. The new heatsink did the trick, so I suspect you can skip reheating without much loss in effectiveness. However, I would need to experiment with more devices to know for sure the effect of each stage of my experiment.
@sqeeza yes, a petition could be filed. However, there are 20-30 topics in this area about the HD2 freezing or restarting, and most people don't know there is a thermal problem behind these events. If we advertise the problem and its cause to these people, they could run some simple tests to determine whether their phones are also suffering from it.

[BUG FIX] Phantom keypress and screen shot

I've been working on fixing this issue for awhile. Here's the deal:
The problem.
The four keys at the bottom of the phone are monitored by a melfas touchkey chip (http://www.melfas.com/english/touch/sensor.asp) that connects to the main processor via an I2C bus (http://en.wikipedia.org/wiki/i2c). The melfas chip generates an interrupt whenever one of the keys is touched or released. The processor then reads the key value from this chip over the i2c bus. The problem is that the touchkey chip is located right next to the 3G antenna. When the phone is accessing the 3G network the RF energy gets transferred to the interrupt and i2c clock and data lines, causing false interrupts to occur. The processor responds to the interrupt by reading the key value from the touchkey chip. The symptoms occur more frequently in low signal areas because the phone outputs a higher RF level in those situations, which causes more RF interference on the interrupt line.
Most of the time when a false interrupt has occurred the touchkey chip will return a value of zero for the key and the driver will recognize this as a bad key press and ignore it. Sometimes the RF interference on the i2c clock and/or data line causes a valid value to be returned and the driver reports a key press value to the application. In the case where the driver reports a ‘back’ key down, the software sees this as holding the back key down so when you press the power button you get a screen shot. The easiest way to cure this is to always press and release the back key before pushing the power button. This causes the software to see both a key down and key up event which cancels the screenshot mode.
This RFI-induced touchkey interrupt happens hundreds of times per second when the phone is using 3G. It produces lots of different symptoms, including applications that always seem to shut down. A wide variety of problems can be attributed to this failure. In addition, the processor spends a lot of time servicing these bogus interrupts, which takes CPU time away from the other applications. This can make the phone appear to be slow or even freeze up for short periods of time. There’s a good chance that most people have experienced this to some degree without realizing the root cause.
Solution one. Fix the driver.
Since this is a true hardware failure, a software solution is going to be less than perfect. After dozens of experiments rewriting the interrupt service routines in the driver I’ve settled on a combination of fixes. The first is to re-test the interrupt input line several times. In normal operation when you touch or release a button, the touchkey chip drives the interrupt line low and keeps it low until the driver reads data over the i2c interface. Since the RF interference is a sine wave and is being sampled it causes the interrupt line to go high and low at a fast rate. Sampling the line multiple times in software increases the chance of finding it in the high state. This is done both in the interrupt handler and then again in the interrupt thread. About 90% of the false interrupts are filtered out by testing the line in the handler. If the interrupt handler doesn’t find the line high after 10 samples, it masks the interrupt so that another falling edge doesn’t produce another interrupt. In testing I’ve noticed that the interrupt handler would run multiple times before the interrupt thread was even called. Once in a while, so many interrupts would get stacked up that the phone would just reboot. It was probably a stack or buffer overflow that wasn’t being handled. Remember, this interrupt would happen many hundreds of times a second. About 90% of the remaining false interrupts are filtered out by sampling this line in the thread. That leaves about 1% of the interrupts that need to be further tested. The second test is to read the data from the chip and discard anything that isn’t a valid key press value. This is easily done with a case statement. Finally, since occasionally a bogus valid value will get through, I set up a timer so that any key down event that doesn’t have a corresponding key up event within 3 seconds is canceled by calling the all_keys_up routine.
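To make the three filtering stages easier to follow, here is a minimal Python sketch of the logic described above. It is not the driver code (that is C, in the kernel): the key codes, function names and the `KeyTracker` class are all hypothetical, and the caller supplies the line-reading function and timestamps. The only figures taken from the post are the 10 samples, the 3 second timeout, the zero value returned on a false interrupt, and the roughly 90% filtered per sampling stage.

```python
VALID_KEYS = {1, 2, 3, 4}   # hypothetical codes for the four touchkeys
MAX_SAMPLES = 10            # the handler re-tests the line up to 10 times
KEY_TIMEOUT_S = 3.0         # key-down with no key-up is cancelled after 3 s


def line_stays_low(read_line, samples=MAX_SAMPLES):
    """Stage 1: re-sample the interrupt line.

    A real key event holds the line low until it is serviced. RF noise
    toggles the line, so reading it high even once marks the interrupt
    as false. `read_line` is any callable returning 0 (low) or 1 (high).
    """
    return all(read_line() == 0 for _ in range(samples))


def accept_key(value):
    """Stage 2: discard anything that is not a valid key value."""
    return value in VALID_KEYS


class KeyTracker:
    """Stage 3: cancel key-down events that never get a key-up."""

    def __init__(self, timeout=KEY_TIMEOUT_S):
        self.timeout = timeout
        self.down_since = {}            # key -> time of the key-down event

    def key_down(self, key, now):
        self.down_since[key] = now

    def key_up(self, key):
        self.down_since.pop(key, None)

    def check_timeout(self, now):
        """Release all keys if any has been held past the timeout.

        Clearing everything stands in for the all_keys_up routine
        mentioned in the post. Returns the keys that were released.
        """
        if any(now - t >= self.timeout for t in self.down_since.values()):
            released = sorted(self.down_since)
            self.down_since.clear()
            return released
        return []


def surviving_fraction(filter_rates):
    """Fraction of false interrupts left after each stage removes its share."""
    remaining = 1.0
    for rate in filter_rates:
        remaining *= (1.0 - rate)
    return remaining
```

With two sampling stages that each remove about 90%, `surviving_fraction([0.9, 0.9])` comes out at about 0.01, which matches the "about 1%" left for the value check and the timer to deal with.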
This combination all but eliminates the symptoms produced by this failure. The only drawback is that the processor still spends a considerable amount of time servicing the false interrupts. And, rarely, a phantom keypress does get through. In all, it’s a fairly good piece of duct tape and JB Weld.
During my experiments I used a copy of the kgb kernel. My version with the modified driver is in github at https://github.com/dmriley/kgb. If you want to try this yourself, be sure to use the ‘dev’ branch.
Solution two. Fix the hardware.
There are three signals that connect from the melfas touchkey chip to the processor. They are the two i2c lines: scl, which is the clock, and sda, which is the data. The third line is the interrupt. In troubleshooting this problem, I took my phone apart and put oscilloscope probes on the three lines. This allowed me to see the real cause of the problem. Since the interference is RFI (or EMI) the only real way to fix the problem is to either remove the RF or make the impedance of the signals much lower. Removing the RF is easy if you don’t need to use 3G. When the phone is using wifi (or no network connectivity at all) the problem does not exist. Also, when you are very close to a cell tower, the phone transmits at a much lower level. This lower level greatly reduces the RFI. Lowering the impedance is a little harder. I2C uses active pull down and passive pull up for the logic levels for both sda and scl. This means that the impedance is mostly governed by the pull up resistor. This resistor value is typically upwards of 1kohm and probably as high as 3kohms (I didn’t measure it in this phone). Since the impedance only needs to be lowered for the 3G frequencies of around 800MHz, a capacitor can be added from the signal source to signal ground. At 800MHz a 100pF cap is about 2 ohms (1/(2*pi*f*c)). That’s a couple of orders of magnitude lower than the pull up resistor alone, and much too low for the RF signal to induce any significant voltage on the line. The capacitance is also small enough not to interfere with the signal rise and fall times for the interrupt line. In the case of the interrupt line, the melfas chip drives the signal low and keeps it low until the interrupt is serviced. Discharging a 100pF cap with a 2mA driver takes well under a microsecond. This much delay is not noticeable when touching the key and is much less than the amount of time that the processor takes to service the interrupt.
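The numbers in this paragraph are easy to check. The sketch below works out the capacitor's reactance at 800MHz, how it compares with the 1k-3k pull-up range, and how long a 2mA driver needs to discharge it. The 3.3V logic swing and the constant-current discharge model are my assumptions for illustration; the post does not give the line voltage.

```python
import math


def cap_reactance_ohms(freq_hz, cap_farads):
    """Magnitude of a capacitor's impedance: 1 / (2 * pi * f * C)."""
    return 1.0 / (2.0 * math.pi * freq_hz * cap_farads)


def discharge_time_s(cap_farads, volts, current_amps):
    """Time for a constant current to move a capacitor by `volts`: C * V / I."""
    return cap_farads * volts / current_amps


F_3G = 800e6        # 3G frequency quoted in the post, in Hz
C_ADDED = 100e-12   # the added 100 pF cap, in farads
V_SWING = 3.3       # ASSUMED logic swing in volts (not stated in the post)
I_DRIVER = 2e-3     # 2 mA driver from the post, in amps

xc = cap_reactance_ohms(F_3G, C_ADDED)
ratio_1k = 1000.0 / xc     # pull-up of 1 kohm compared with the cap
ratio_3k = 3000.0 / xc     # pull-up of 3 kohm compared with the cap
t_discharge = discharge_time_s(C_ADDED, V_SWING, I_DRIVER)

if __name__ == "__main__":
    print(f"reactance at 800 MHz: {xc:.2f} ohm")
    print(f"pull-up / reactance: {ratio_1k:.0f}x to {ratio_3k:.0f}x")
    print(f"discharge time: {t_discharge * 1e9:.0f} ns")
```

Under these assumptions the reactance is just under 2 ohms, several hundred times lower than the pull-up, and the discharge takes a fraction of a microsecond, so the added delay is negligible next to the interrupt service time.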
Adding the cap to the interrupt line eliminates false interrupts. A chance does exist that a valid key event during 3G access could cause an incorrect key value to be returned due to RFI on the clock and data lines. The i2c protocol is designed to compensate for capacitive loading on the lines. Although it would cause the clock period to be stretched out significantly it would still only take milliseconds to read the key data from the chip. The difference would be imperceptible. To date I have only added the cap to the interrupt line and have yet to experience an invalid key press.
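For the clock and data lines, the thing to check is the rise time, because the pull-up resistor has to charge the added cap on every rising edge. Here is a rough estimate using the usual RC formula for the 30% to 70% rise that the I2C-bus specification measures. The rise-time limits (1000ns for standard mode, 300ns for fast mode) come from that specification; the pull-up values are the 1k-3k range guessed in the post, and the estimate ignores whatever capacitance the bus already has, which I don't know for this phone.

```python
import math

# Rise from 30% to 70% of the supply through an RC network:
# t = R * C * ln(0.7 / 0.3), about 0.847 * R * C
RISE_FACTOR = math.log(0.7 / 0.3)

# Maximum rise times from the I2C-bus specification, in seconds
MAX_RISE_STANDARD_S = 1000e-9   # standard mode (100 kHz)
MAX_RISE_FAST_S = 300e-9        # fast mode (400 kHz)


def rise_time_s(pullup_ohms, cap_farads):
    """Estimated 30%-70% rise time of a pulled-up line with capacitance C."""
    return RISE_FACTOR * pullup_ohms * cap_farads


C_ADDED = 100e-12   # the added 100 pF cap; existing bus capacitance ignored

if __name__ == "__main__":
    for pullup in (1000.0, 3000.0):     # ASSUMED pull-up range from the post
        t = rise_time_s(pullup, C_ADDED)
        print(f"{pullup:.0f} ohm pull-up: {t * 1e9:.0f} ns rise, "
              f"standard ok: {t <= MAX_RISE_STANDARD_S}, "
              f"fast ok: {t <= MAX_RISE_FAST_S}")
```

With these assumed values the cap alone adds roughly 85ns to 255ns of rise time, so it fits comfortably within the standard-mode limit and comes close to the fast-mode limit at the 3k end. The real margin depends on the bus speed and on the capacitance already on the lines.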
I’ll post pictures of cap mod.
Summary.
Most people will be satisfied using the software fix. I think that a couple of the kernel devs are incorporating some or most of the driver mods outlined in this document. Both comradesven (kgb dev) and ssewk2x aka Efpophis (glitch dev) were involved in the test and debug process. Much appreciation is given to both of them for the help that they gave me and for allowing me to use and hack up their code on github. Efpophis saved me hours of searching through code. Without their help, I’d still be unable to build a kernel.
UPDATE:30 Mar 2012
The phone had been working fine since the mod. I hadn't seen a screen capture or any of the other symptoms. Then, a couple of nights ago, while I was running maps on 3G (a data intensive app) the touchkey backlights started flashing rapidly like the phone was having a little seizure. And then it happened, the voice search popped up. A couple of debug kernels later I've come to the conclusion (and I'm never wrong) that the clock line (SCL) going to the melfas chip was being toggled by the same RF interference that was causing the false interrupts. A random clock along with random data was causing the chip to turn the backlights on and off as well as generate a false interrupt. I was able to reliably duplicate the problem in a couple of really low signal level areas (not hard to find when you live out in the boonies).
I tore the phone apart (again) today and added a 100pf cap to the scl line right next to the chip. I also added another cap in parallel with the 100pf on the interrupt line. I spent about 1/2 hour tonight running 3G data apps in the same location where the problem first appeared. So far, no problems and none of the debug messages have shown up on dmesg.
If anyone wants pics of the added cap I'll open it back up, no problem, otherwise if you look at this photo you can see which pin is scl (although I incorrectly labeled it SDC in the photo). http://forum.xda-developers.com/attachment.php?attachmentid=953824&d=1332117055
If anyone tries these mods I'd be real interested in your results.
Here are some pictures of the cap mod:
this is the open phone showing where the melfas touchkey circuit is:
View attachment 951774
Awesome, thanks for doing this for all of us. Phantom key press is really annoying
the cap. yeah, that's a normal size pen to show scale
View attachment 951812
on the board
View attachment 951821
with notes
View attachment 951820
the antenna problem
View attachment 951822
close up showing touchkey circuit. micro sd card for scale
View attachment 951834
my finger
View attachment 951836
back off
View attachment 951838
another view
View attachment 951837
BTW, I took these pictures with my son's fascinate
Wow, we're lucky to have someone as capable as yourself figure out this annoying issue! I've kinda kept up on your work, but seeing this breakdown and the photos is helpful in understanding the root cause of the problem. I do wonder sometimes how Samsung missed this issue in their testing, but at least we have custom kernels that implement your fixes and dramatically reduce the phantom presses!
Uuuhm...You're an awesome human being. Holy crap. -_-
That's some amazing work, thanks!
k_nivesout said:
.... I do wonder sometimes how Samsung missed this issue in their testing, but at least we have custom kernels that implement your fixes and dramatically reduce the phantom presses!
Yeah, it's a crying shame that Samy couldn't fork over the extra penny to keep this problem from happening in the first place.
sendan said:
Uuuhm...You're an awesome human being. Holy crap. -_-
That's some amazing work, thanks!
wasn't just me. had help from other members here. I didn't even know where to start looking when I first started. It's so cool that people are willing to do the level of work that the devs here do without expecting anything back.
Thanks so much for all the work, and the detail in your post. It is amazing the work everybody does here and the knowledge you pass on to us.
I do have a few questions
Would you mind sharing what kind of iron you used? Is that the bottom-most line on the board you soldered to? If so, did you have to scratch it or something first? Is it the farthest left line on the chip that was used? Do they make caps that size with leads coming off the 2 sides, and if so would that be an easier mod? Is there a positive and negative side to that capacitor?
I'm really thinking about doing this, if i decide to would you mind sending me 5 of your extra caps for a $10 donation?
Ditto on the $10.00
I did the mod at my workplace under a microscope. I used a metcal (http://www.okinternational.com/product_soldering/mx500) soldering iron but you could use just about any low wattage iron with a really fine tip.
There's four pins on each side of the melfas chip. One end of the cap is soldered right to the interrupt pin which is the closest to the corner. the other end is connected to the ground side of C2 via a solder bridge.
View attachment 953824
I doubt that they make caps that small with leads on them. You could look. It's not hard to make the solder bridge. Remember the scale that we're talking about here. That cap is 0.06 inches long by 0.03 inches wide. I wouldn't try to scratch the solder resist from the board because it's a flex circuit on top. Also, the cap is not polarized.
I bought a hundred of these caps for less than $6 including shipping. I'd feel terrible charging someone $10 for five. If you pm me your address I'll stick a couple in an envelope and send them. If you want to give away ten bucks, donate it to a charity like destiny rescue or UMCOR (http://new.gbgm-umc.org/umcor/about/financialinformation/).
Disclaimer:I've been working with parts this size for years and am pretty good at soldering. You risk dorking up your phone if you don't do this correctly. Only attempt if you are skilled at soldering. All information is presented "as is" and without warranty for fitness or use. Your mileage may vary. Void where prohibited, taxed or licensed.
What is the easiest way to implement the band-aid software fix?
I am on CSpire so there are not many proven custom roms out there.
The fix is in the kernel. I used the KGB kernel as the source for my build. You can download it from github and build your own. If you're running all stock (rom & kernel) you can mod the stock kernel.
I'm really not the expert here on choices. Maybe someone else could chime in.
Too tiny to solder so band-aid?
Excellent research, fix and documentation. I was going to follow the fix, but, when I finally got the phone disassembled, I saw that the bits were much too small for me to solder. And I'm an ex-electronics guy who's worked on surface mount stuff before, so I doubt amateurs will have much luck, either.
So the problem is that RFI is hopping onto the I2C and interrupt lines... Could we just block the RFI? Sure. A grounded piece of aluminum foil which covered the whole Melfas+lines area should do that. So I tried that. Worked great for the soft keys, but, for reasons not apparent to me, my phone would no longer do 3G (stuck in 1X). Perhaps because the big old piece of grounded foil in the middle of the 3G antenna soaked up too much signal?
How about not grounding the Aluminum foil? It wouldn't be tied to ground, so the potential of the Alu foil would wobble, but it might prevent enough RFI from reaching the I2C and interrupt lines.
I opened the phone back up and squished the Alu foil a bit so that it just covered the Melfas chip and the lines heading to the left, and so that it didn't touch what-I-think-is the ground plane right at the upper edge of the PCB. Now, the piece of Alu foil was a rectangle about 6mm wide and 3mm tall. Seems to prevent softkey misfires and my phone seems more responsive. Assuming the results hold, this is a 5 minute fix for the issue and it doesn't require anything more than a tiny screwdriver, a spot of aluminum foil and a moderately steady hand. Wish me luck!
That's great work. I tried that initially with some foil tape over the whole melfas chip without success. This was all documented in the github problem log but it got deleted when the ticket was closed out. In my basement where I was doing my testing, the signal strength is very low so it's a worst case scenario. Maybe the shield will work better if it's shaped just right. I'm not an RF guy so my shield was just a guess. Share some pics with us if you find a solid solution. The shield would be much easier to implement.
electric bill said:
I tried that initially with some foil tape over the whole melfas chip without success.
What was not successful about it? You still had phantom keypresses or you lost 3G?
Also, how did you ground the foil? I grounded it against what I thought was a ground plane. And I covered the entire L-shaped assembly (Melfas, lines and all).
[Stating the obvious...:] The idea of covering the Melfas chip and lines with foil assumes that the RFI is getting to the lines from above the chip+lines. The foil wouldn't do anything were the RFI hopping over from elsewhere. But AFAICT the top layer of the PCB is a ground plane and the signal lines head down into buried layers directly from the connector, so I'm not sure how else RFI could get to the I2C lines except from in the module...
My un-grounded foil seems to be an improvement, but not a fix, so I might try grounded-foil again and try to figure out why it killed my 3G.
Good to hear that you have a microscope; I still have 20/20 vision as a 40yo, but that's a tiny little area!
I gotta say that I am wildly disappointed in Samsung. If a few electronics-savvy folks poking around the interwebs can find root cause and propose multiple fixes, it's shocking that Samsung won't acknowledge it, much less fix it. I'm due a phone upgrade and I'd love to get an SGS III, but I really don't trust Samsung.
Yeah, I used what I thought was a ground pad and covered pretty much everything on that little flex board that has the chip on it. It didn't stop the problem. Also, I had a bunch of dmesg stuff in the driver so I could see every time that there was a "misfire" vs just seeing the actual symptoms. A shield could theoretically fix the problem, I'm just not an RF engineer so I went with what I know. With the microscope, it's pretty easy to add the caps. Without, it'd be kinda hard. It probably only took me 20 minutes or so to do the last one I did. The good news is, the cap fix does the trick 100%. We've been running it on three phones without a problem for a few months now.
I totally agree on Samsung's failure. That design defect should have been caught pretty early in development. Maybe these guys have never heard of a Peer Review . It's even sadder if they knew it might be a problem but decided to risk it to save 1/2 cent per phone.
I understand the corporate mentality of denying a problem exists (iphone signal loss is a good example). If they admit it, then they have to fix it and that would be very costly. I'm sure when they started to have a problem they did a cost analysis and decided that losing N number of customers was cheaper than actually fixing all the bad phones.
What made it even worse was trying to find info on the phone design. Samsung was completely unresponsive when I contacted them to get data sheets on the CPU and other info on the phone. It's as if they didn't want me to solve the problem. Come to think of it, they probably didn't want me to. Solving it verifies that the problem exists and isn't just user error.
Anyway, now with my phone fixed and the excellent AOKP ROM and Glitch kernel, I love my fassy.
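For anyone wanting to repeat the dmesg approach, here is a minimal sketch of how the log could be tallied afterwards. The message tag `touchkey: misfire` is an assumption for illustration (the actual text the modified driver printed isn't shown in this thread), and the sample lines are made up.

```python
import re
from collections import Counter

# Hypothetical tag: the real message printed by the modified driver is not shown.
MISFIRE_TAG = "touchkey: misfire"

# dmesg lines normally start with a bracketed uptime in seconds, e.g. "[  123.456789] ".
LINE_RE = re.compile(r"^\[\s*(\d+)\.\d+\]\s*(.*)$")


def misfires_per_minute(dmesg_text, tag=MISFIRE_TAG):
    """Return a Counter mapping uptime minute -> number of lines containing tag."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = LINE_RE.match(line)
        if match and tag in match.group(2):
            counts[int(match.group(1)) // 60] += 1
    return counts


# Made-up sample in the usual dmesg layout.
sample = """[   12.000100] touchkey: misfire on interrupt line
[   45.500000] wlan: link up
[   75.250000] touchkey: misfire on interrupt line
[   80.000000] touchkey: misfire on interrupt line"""

for minute, count in sorted(misfires_per_minute(sample).items()):
    print(f"minute {minute}: {count} misfire(s)")
```

Comparing the per-minute counts before and after a mod (foil, caps) gives a number to go on instead of just "it buzzes less".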
Yeah: dmesg would be lots better!
My foil status: decent. I'm getting a lot less buzzing, but I still do get **some** in low signal areas (my bedroom). So I'm happier.
Samsung's response: I'm not at all surprised. I used to be an FAE for Cirrus Logic and worked a lot with ARM processors (back in 2000-2003). I got ahold of some of Samsung's datasheets on their ARM processors and was staggered: the datasheet was about 4 pages long and was full of errors, inaccuracies or glossings-over. Our datasheets were 40 pages long and we had 200 page programming manuals available on the web. You got no love from Samsung unless you were looking to buy 5M chips.
Anyways, thanks for your research and help!
I'll be giving that kernel a shot!
Second cap
I finally got around to mod'ing our last phone. Actually, I was finally able to pry it from my teen's hands long enough to do the work. I think she sat home all afternoon and twitched.
Anyway, here's a pic of the two caps. One is on the interrupt line and the other is on the clock (SCL) line. I melted the insulation from a piece of real fine magnet wire to connect between the clock pin and the second cap. The other end of the second cap is just solder bridged to the same ground as the first cap.
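To get a feel for what an extra cap on SCL does to the bus, here is a rough sketch. It assumes the line is an ordinary open-drain I2C line charged through a pull-up resistor, so the 30%-to-70% rise time is R·C·ln(0.7/0.3), roughly 0.847·R·C. The pull-up, existing capacitance and added cap values in the example are hypothetical (this post does not give them); the 1000 ns and 300 ns limits are the I2C standard-mode and fast-mode maximum rise times.

```python
import math

# Maximum SCL/SDA rise times from the I2C specification, in seconds.
MAX_RISE_S = {"standard (100 kHz)": 1000e-9, "fast (400 kHz)": 300e-9}


def rise_time_s(pullup_ohms, capacitance_f):
    """30%-to-70% rise time of a line charged through a pull-up resistor."""
    return pullup_ohms * capacitance_f * math.log(0.7 / 0.3)


def within_spec(pullup_ohms, existing_f, added_f):
    """Map each bus mode to True if the rise time still meets its limit."""
    t = rise_time_s(pullup_ohms, existing_f + added_f)
    return {mode: t <= limit for mode, limit in MAX_RISE_S.items()}


# Hypothetical values, for illustration only: 2.2 kOhm pull-up,
# 20 pF of existing line capacitance, 100 pF added filter cap.
t = rise_time_s(2200, 20e-12 + 100e-12)
print(f"rise time: {t * 1e9:.0f} ns")
print(within_spec(2200, 20e-12, 100e-12))
```

The point of the check: a cap big enough to swamp RF pickup also slows the edges, so it has to stay small enough that the clock still rises within the limit for the bus speed in use.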

[Q] [Request] Overheating ROM

Long story short, my touchscreen has stopped responding. It was working yesterday for a while after squeezing the sides, so I don't think it's completely dead (yet). I had a similar experience many months ago from a central pressure mark caused by sleeping on it, which was completely resolved by overheating the device a few times.
With some difficulty I managed to turn on "Show touches" in developer options yesterday, and I can see that a touch is constantly registered at the upper left corner (no pressure marks). Today the screen is completely dead, and I can't get the phone to overheat to try and fix it.
So, rather ironically, can anyone suggest a ROM that is guaranteed to overheat the battery/phone (current ones seem to work too well), or any other way to get it to overheat (adb commands to overclock the CPU?), as a possible temporary fix?
I've always found WP ROMs to run at a ridiculously high temp.
wifi, charging, and screen on ... seems to always do it for me.
best of luck with your oh so odd request.
I think the problem is more likely the digitizer ribbon hidden under the hang-up button, which might be damaged.
But if you want to heat up your phone a little, I'd recommend Paranoid 1.9 (Jelly Bean). Older versions made my phone hot, especially with some games like GTA or Max Payne.
Hairdryer
What about just using a hairdryer?
You can control the temperature a lot better.
Spaqin said:
I think it might be problem more with digitizer ribbon hidden under hang up button which might be damaged.
But if you want to heat up your phone a little, I'd recommend Paranoid 1.9 (Jelly Bean). Older versions made my phone hot, especially with some games like GTA or Max Payne.
Pretty sure it is the ribbon as well, heat seems to be a temporary fix though, will take a look when i find my tools
flaep said:
What about just using a hairdryer?
You can control the temperature a lot better.
Have been using a halogen heater to heat it, which gives some temporary results, but I think (partly from experience) that having the heat generated by the device itself, i.e. by the battery heating the back of the screen, gives a better result.

How I fixed my Nexus 5X heat and boot loop issue.

Hi there,
I was really frustrated when my favourite Nexus 5X froze and bootlooped. Sometimes it never completed a boot at all; it just kept turning on and off. All I saw was the Google logo and, rarely, some of the boot animation, and then it turned off within a few seconds. I tried flashing 4-core kernels and almost all the stock ROMs (6, 7 and 8), but nothing worked! Strangely, I could get it to boot for a little while using the 7.1.2 image (N2G48C, Aug 2017), but that too turned off within 2-3 minutes. I tried everything I could on the software side, so I decided to open my phone. After watching a few YouTube videos I decided to try something similar to the copper heat sink mod, but I used an aluminium plate instead of copper. I cut it from some leftover aluminium fabrication scrap.
First, I removed the yellow net-like thermal pad from the phone very carefully, because I didn't want to destroy it. Then I applied thermal paste to both sides of the prepared aluminium plate and placed it exactly where the yellow thermal pad had been. I removed the outer covering of the motherboard CPU and cleaned all the chips with PCB alcohol. Then I applied thermal paste over the nearby chips. After that, I carefully applied thermal paste over the main CPU and fixed the yellow LG thermal pad above the CPU. I meticulously reattached the outer covering of the CPU and the other chips. (I didn't cut the metal covering, because I didn't want to damage any components of my phone.) Then I placed the motherboard above the aluminium plate that I had already fixed inside the phone (where the yellow thermal pad was previously). I closed my phone, reattached all the screws and turned it on.
Wow, it's working for me now! It is running all 6 cores without any issue. I believe the yellow thermal pad and aluminium plate combo is effectively tackling the heat issue. It might be working the same way a computer CPU's cooling works: thermal pad + heat sink. When I placed the yellow thermal pad above the chip and the aluminium plate where the thermal pad used to be, it worked perfectly. I also tried a combo of 2 aluminium plates without the thermal pad, but it put more stress on the CPU and never worked. Copper may be a great alternative, but for me aluminium was readily available and easy to cut with just a pair of scissors.
Now I don't want to update to Oreo or later, so I opened developer options and turned off automatic system updates, and also turned off all notifications. It's been 2 days since I modded my phone and it's still working without any problems, so I thought it would be nice to post it here.
hxxps:imgur.com/a/t0riR
hxxps:imgur.com/a/vfzth
hxxps:imgur.com/a/TjETN
hxxps:imgur.com/a/xMDVi
hxxps:imgur.com/a/Xty3e
hxxps:imgur.com/a/9BBAS
Thank you.
Can you take some pictures of the inside of the phone?
Sorry, I don't want to open my phone again and end up in another boot loop, but this video is very helpful.
hxxps:.youtube.com/watch?v=TTHAbarHebg
Main points to note:
1. I didn't use a copper plate; I used an aluminium plate.
2. I didn't cut the metal shield that covers the CPU, because I didn't want to damage my phone.
3. I didn't discard the yellow thermal pad like others did. Instead I fixed it above the CPU/RAM chip using thermal paste.
4. Everything else is almost the same.
Please be careful with the phone components; they are really fragile.
New mod: multi-layer heat sink.
I tried a small aluminium heat sink, and it worked fine without any issues, but when I played high-end games and the CPU temperature went above 60-65 degrees, it hung and shut down. It never bootlooped, though. After leaving the phone idle for a few minutes it started working again.
However, I decided to go a bit further and opened my phone again, but when I turned it on it bootlooped again, constantly turning on and off for a while. So I decided to find out why this happens and, ultimately, to find a solution to fix my phone.
This is how I fixed my Nexus 5X.
1. First of all, heat the CPU/RAM chip for a few seconds using a heat gun or any other suitable equipment. This is necessary because the phone should turn on and boot properly for a while once the procedure is complete. (It needs a start.) (Optional)
2. When it cools down, clean the top of the chips (I used PCB alcohol).
3. Apply thermal paste on top of the chips, especially the CPU/RAM chip. (Apply only the recommended quantity.)
4. I cut away the part of the metallic shield directly above the chip, so that there would be better airflow and heat sink contact.
5. There were a few old PC CPU fans at my home, so I chose one and removed the metallic sheath from beneath it. It was very easy to cut because it is so thin. (Optional; it provides better heat flow. Certain fans have a metal sheath under the heat sink, where we usually apply thermal compound.)
hxxs:imgur.com/a/kNNyW
6. I also found an old damaged graphics card at my home and carefully removed the thermal pad from its chip. I felt that its quality was pretty high compared to the one LG placed inside the phone. Still, I didn't discard the thin LG thermal pad.
After reshaping the thermal pad:
hxx:imgur.com/a/lwOks
7. First I placed the thin LG thermal pad above the chip, then I fixed the high quality thermal pad on top and closed the metallic shield. As you can see, the thermal pad and metallic shield are all at the same level.
hxx:imgur.com/a/jjUiW
8. I made a small aluminium heat sink like this (a 0.8 mm copper shim, or thinner, is recommended):
hxx:imgur.com/a/xMDVi
9. I applied a little thermal paste inside the plastic compartment and fixed the aluminium plate inside it (the yellow LG thermal pad was previously here).
10. Again I applied a little thermal paste on top of the aluminium plate and fixed the metallic sheath that I took from the CPU fan.
11. I applied a little thermal paste right on top of the metallic sheath and carefully placed the motherboard above it.
hxx. imgur.com/a/8KHC8
12. I reattached the screws and cover. (Do not apply too much pressure. This step is very important: it might damage the motherboard. If it doesn't fit inside, find out what's gone wrong! All materials need to be thin.) When I turned my phone on the first time it turned off suddenly, so I entered fastboot mode and left my phone like that for 5 minutes. (I wanted my phone to wake up.)
13. Then I turned it on again and it booted without any hassle. I played plenty of games; I could feel the heat, but never like before. It also never froze or shut down, even after crossing 60 degrees. (The temperature quickly drops back to 40 °C even when it crosses 50 or 60.)
14. Now my phone is fine, running all apps and all 6 cores. Moreover, it works like a new phone that has never experienced a boot loop in its life.
I think the cores shut themselves down due to heat. If the heat can pass through a high quality heat sink and thermal pads, the Nexus 5X will live again! And none of the cores will shut down again. If it ever shuts down, it should quickly recover and there will never be a boot loop problem. :good: That's what I learned from this issue.
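For anyone who wants numbers rather than "I could feel the heat": Linux kernels usually expose temperature sensors as files under /sys/class/thermal/thermal_zone*/temp. Below is a minimal sketch for reading them, e.g. from Termux on the phone. It assumes the values are in millidegrees Celsius, which is the mainline kernel convention; some vendor kernels use other units, so check against a known reading first. The 60 degree threshold comes from the post above, and the function names are only illustrative.

```python
from pathlib import Path

# The post reports hangs once the CPU passed roughly 60-65 degrees.
HOT_C = 60.0


def millideg_to_c(raw):
    """Convert a sysfs reading such as '61500\\n' to degrees Celsius.

    Assumes the kernel reports millidegrees Celsius.
    """
    return int(raw.strip()) / 1000.0


def read_zones(root="/sys/class/thermal"):
    """Return {zone_name: temperature_c} for every readable thermal zone."""
    readings = {}
    for temp_file in sorted(Path(root).glob("thermal_zone*/temp")):
        try:
            readings[temp_file.parent.name] = millideg_to_c(temp_file.read_text())
        except (OSError, ValueError):
            continue  # zone not readable, or not a plain integer
    return readings


def hot_zones(readings, limit_c=HOT_C):
    """Names of the zones at or above the limit, sorted."""
    return sorted(name for name, temp in readings.items() if temp >= limit_c)


if __name__ == "__main__":
    zones = read_zones()
    for name, temp in zones.items():
        print(f"{name}: {temp:.1f} C")
    print("at or above limit:", hot_zones(zones))
```

Logging this once a minute while gaming, before and after a heat sink mod, would show whether the mod really lowers the peak or only delays it.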
Performance update after mod (After continuous use);
hxxp.imgur.com/a/8QDD9
hxxp.imgur.com/a/yDVEi
hxxp.imgur.com/a/vaxEk
hxxp.imgur.com/a/Q4XeX
I tested Android 8, 7 and 6; all work, but I personally prefer 7.1.2.
Sorry, but I don't understand how you fixed it with that method. You haven't resoldered the chip or anything, so maybe this can only "prevent" a working device from bootlooping? Someone correct me if I'm wrong. Thanks.
No, I didn't replace any chip or resolder any crack. I have used my phone rigorously since this mod. The phone is working great, and I feel it is working better now than ever before.
All I used was a high quality thermal pad from a graphics card chip, a good thermally conductive metal and a heat sink. Probably the CPU is shutting down to protect itself from intense heat; if a thermal pad and heat sink combo can effectively regulate the heat, then the CPU will work. I am running all 6 cores without any issue, and I have watched 2 movies back to back and it still worked perfectly. If we remove the CPU fan from a working computer motherboard, the computer will not boot, or if it does boot it will quickly shut down or freeze, and slowly it will damage the CPU. The same problem is happening here too.
Even if there was a crack, I heated the chip and fixed it temporarily, then used this heat sink mod. The mod doesn't let the CPU fail again because it can now conduct the heat effectively to another medium, so the crack problem doesn't come back, and as a result the phone is working fine now.
Wow, so your phone was bootlooping, you tried these steps, and now it works permanently? That's amazing! Is your phone still working?

Seem to have fixed my screen issues.... by playing Mario Kart

My S7 (G930F) has had screen issues that are fairly typical to the model. When it gets a bit cold, the bottom three quarters of the screen go dim, then pink streaks start to appear from the bottom as it gets worse.
This can be fixed to a certain extent by disabling auto-brightness, then turning the brightness down until the screen looks normal again. Once the brightness is even across the whole screen, it can be turned back up again a little. Sometimes rebooting it helps.
It's been happening very regularly, and it's been so bad that if I'm outdoors in daylight, I need to turn the brightness down to the point where it's hard to see the screen. Even at a comfortable room temperature, it's likely to happen if the brightness is up high. The screen is usually going weird in the morning too, when I first pick it up, until it warms up a little.
As I understand it, it's caused by a bad solder joint somewhere on the screen assembly. It gets cold, the solder contracts a little, enough to cause a bad connection.
Anyway, last weekend I was playing Mario Kart Tour online for quite a while, to the point where the phone was really warm in my hands. Now, ever since, I've not had any screen problems at all. I've had auto-brightness turned on the whole time, it's definitely been exposed to low temperatures, down to maybe 5 Celsius. It was cold to the touch this morning, and it was fine. All the conditions where it would usually happen, the screen has been fine, all week, even with the brightness slider all the way up full for extended periods.
I'm guessing what's happened is similar to the crude but effective trick of fixing a faulty PlayStation 3 by putting the motherboard in the oven for a while, or blasting it with a heat gun, except in this case I've baked the phone from the inside.
A relatively graphically demanding game, played online, constantly processing touch and gyroscope input, means that everything that can get hot has got hot. GPU, CPU, radio. The copper heat dissipation pipe inside has spread the heat across the back of the screen assembly, et voila, the bad solder joint has been cooked just enough to make a better connection.
I guess that like baking a PS3 motherboard, it will ultimately be a temporary fix, but it will be interesting to see how long it works for, and if it will work again if the problems start again.
No. You didn't fix anything, you just avoided the problem temporarily. This also happened on my Galaxy S7 and it is due to a partial disconnection of the display flex. Over time, due to thermal expansion and contraction, the flex's contact becomes loose. Once it is loose, it partially disconnects when the device cools down, because the display and the motherboard contract and pull away from each other, increasing the distance between them. If you keep your device hot at all times it will have a reduced lifespan. Don't.
Indeed, you repositioned your flex by heating your phone because it is loose and it sits just right, but at any moment it can come off again.
There is only one permanent fix. You need to take your back panel off and let the device cool to room temperature (important!)
After that, disconnect the display flex completely and then plug it again and push it just tight enough to make a firm contact. Your display should work perfectly now no matter the temperature.
Note that this will remove your IP68 water resistance, but you shouldn't get your phone near water anyways.
My phone has been opened up, all the connectors carefully cleaned with isopropyl alcohol, then firmly pushed back into place.
in fact my phone is made from spare parts. the screen was leftover from a friend's S7, after i replaced their screen for them. they had already had it fixed once while it was under warranty, and the screen problems ultimately started again.
i kept hold of the screen and paired it with a mainboard i had left over from another S7. i wanted to establish if the problem really was the screen assembly, if it was just bad connections, and if it still behaved in the same way with another mainboard.
the problem, in this case, is most definitely the screen assembly. it has displayed exactly the same problems with two different mainboards, after both myself and the warranty repair service had worked on it.
also, if you read what i said properly, you'll see i'm definitely not suggesting keeping it hot all the time as a solution. i'm saying that it got really hot ONCE, for maybe 40 minutes or so, and ever since, it's worked perfectly for the longest period of time i've ever known it to go for without the display going weird. it's been down to 5°c or maybe lower since, i live in rural wales, and it's autumn.
If you read what I said properly, you should know that what you did is just a workaround that temporarily moves the connector back into the correct position through thermal expansion. You definitely didn't reflow anything, because RoHS solder only starts to soften at nearly 160°C, far beyond what the Exynos TMU will let anything on the board reach before it trips and shuts the entire motherboard down.
If the problem persists then try putting a small piece of rolled electrical tape above the display connector to make some extra pressure against the board.
yeah "read what i said again properly" was an unnecessarily dickish choice of words on my part. i attribute that to replying before finishing my coffee
so would you say from experience that the issue where the bottom half/three quarters of the screen goes dull is pretty much always down to the display connector, and a little extra pressure holding it in place will stop it happening again? i have read conflicting reports, with a few attributing it to a bad solder joint in the screen assembly.
i'm not too worried about opening the phone up again, once the original adhesive has been removed and replaced with the third party pre-cut stuff, it's a bit easier to open it up again.
Okay, no problem
We all get grumpy without our morning coffee, especially after fixing tens of AOSP build errors and having to restart our 7-hour build.
I had the exact same problem you had, the bottom three quarters of the display blacked out. I am pretty sure it is caused by a weak contact because, from observation alone:
1. The connection restored itself when the phone became hot.
2. After cooling down for a while, the display started to fail again.
3. When I artificially applied a CPU load, the phone heated up and the display connector fell back into place, reconnecting the bottom of the display and bringing the image back.
4. The display NEVER flickered when it felt hot to my hands.
Also, you mentioned that you exposed your phone to low temperatures, so that might have a cumulative effect over time on a loose display connector. I think they come kinda loose out of the factory or get loose after a few years. I myself opened up my phone with a hair dryer and a few guitar picks. It was hard and I ruined a student credential card, but eventually I let the phone cool down so that when I remade the contact it would hold together on a cold device. Then I detached the connector, plugged it back in, and voilà, it came back to life and confirmed my observation. As a result, I got a free S7 for myself to upgrade from my Huawei P9. The display hasn't failed for me since.
On second thought, you should probably avoid the electrical tape and use something that will resist higher temperatures without melting or being conductive. Or just avoid a shim altogether. I didn't need one.
You are pretty much returning the connection to its factory state, keeping it nice and tight for more time.
Good luck, it was quite the learning experience for me and my first time repairing a phone.
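The "artificially applying a CPU load" test mentioned above can be reproduced with almost anything that keeps a core busy. Here is a minimal sketch, assuming you can run Python on the device (e.g. in Termux); it is not what the poster actually used, which isn't stated. It spins a single core for a fixed time and then stops, so it warms the phone briefly for a diagnosis rather than keeping it hot.

```python
import time


def burn_cpu(seconds):
    """Keep one core busy for about `seconds`; return the number of loop passes."""
    deadline = time.monotonic() + seconds
    passes = 0
    x = 0.0001
    while time.monotonic() < deadline:
        x = (x * x + 0.5) % 1.0  # throwaway arithmetic, just to stay busy
        passes += 1
    return passes


if __name__ == "__main__":
    # Hypothetical duration: two minutes of load, then the script exits.
    print("loop passes:", burn_cpu(120))
```

If the display recovers while this runs and fails again once the phone cools, that points at a temperature-dependent contact rather than software.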
I've got some adhesive foam pads that are meant for use inside electronics, I will cut one of those down to size and give it a try if my screen problems reappear.
After a couple of weeks, the screen problems started to reappear. They've never got as bad as they were before, i haven't had to turn the screen brightness down to the point where it was hard to see, but it has been getting worse and more regular. I've just taken the back off my phone, disconnected the screen connector and pushed it firmly back into place, and added a layer of adhesive foam to it. all seems fine at the moment, i will see how it gets on out on a walk in daylight with auto-brightness on.

Very strange! Digitizer problem

So I dropped my Pixel 3a XL and did the repair myself. The phone worked perfectly before the drop, and after the repair it still works fine except for a couple of things. First, the blue colors appear a little washed out on the new screen. Second, if I press the power button, or the phone locks itself due to inactivity, the digitizer stops working once the phone is locked and it requires a hard reset. After the reset the phone works fine for hours and hours, as long as I don't press the power button or lock the phone. Immediately upon waking it, the touch screen stops working again and needs a reboot. This is a little annoying... I tested to make sure everything was working before I glued the phone shut, and now I will have to reopen it to check the connections.... I can't find another instance of this glitch anywhere. Anyone else come across it?
Try reformatting the phone
If that fails reflash the software.
Blown mobo perhaps.
Any impact that can damage the frame or display significantly can transmit enough G loading to destroy chipsets internally and/or fracture solder joints. The multilayered mobo's internal traces can also be damaged.
Firmware corruption indicates damage to the flash memory.
Add to all that the fact that out-of-circuit subassemblies, i.e. the mobo, are ESD sensitive. Full ESD protocols and safeguards should always be observed when working on them. It takes very little voltage to breach the couple-of-microns-thick insulation of semiconductor junctions. The internal connections going to the display do not have voltage snubbing circuits (or only very minimal ones at best) like the external ones, i.e. the USB-C port, do. You can't sense a few hundred volts of static electricity, but it's more than enough to destroy the unprotected mobo's I/Os.
I don't have high hopes for this mobo...
welcome blackhawk with the awesome technical business behind the issue.
nice
but he is right, components are sensitive to shock, unless you have some rugged phone or smth.
Depends on how high you dropped it and ground type etc etc
Maybe it's because you repaired it yourself.
