6-3-0 The Origin of Heat: Chemical Warfare at the Microscopic Interface and Invisible Killers

Goal: Not to "cool down" the heat, but to "remove" it.

This article focuses on the most easily overlooked, yet most failure-prone, "microscopic interface" in the Die → IHS path: TIM, pump-out effect, and PCM.

🎯 The Essence of Heat Dissipation: Not to "Cool Down" with Air, but to "Remove the Heat"

Before entering the battlefield, we must first clarify a commonly misunderstood concept. Most people assume heat dissipation means "increasing fan speed" or "enlarging the water radiator." This completely misses the point.

The operation of a cooling system is essentially a "heat relay race." The first leg is the chip core (Die), the second is the integrated heat spreader (IHS), and the third is the cold plate or heatsink. Between the chip (silicon) and the integrated heat spreader (copper), there exists a physical gap that is difficult to overcome.

Microscopic Reality: Surfaces are not flat; contact area is only 1% – 2%.

In the microscopic world, most "seemingly smooth" surfaces are actually separated by air.

If you examine a chip surface under an electron microscope, you'll find it's not flat at all. It's like the surface of the moon, full of craters, bumps, and rough textures. When a copper lid is pressed onto the chip, the two seemingly smooth surfaces make true contact over only 1% to 2% of their area. The remaining 98% or so is air.

Air is a thermal insulator (thermal conductivity approx. 0.026 W/mK).

Therefore, the first principle of Thermal Interface Materials (TIMs) is to "eliminate air and fill air gaps."

Air is a thermal insulator (with a thermal conductivity of only about 0.026 W/mK). If these microscopic air gaps are left untreated, heat piles up at the chip faster than it can escape, and the core temperature can shoot past 150°C within seconds and burn the chip out. To eliminate the air, we must fill these gaps with a substance that can flow and seal. That substance is the TIM (Thermal Interface Material).
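To put a number on why the air has to go, here is a minimal sketch using one-dimensional steady-state conduction, where the temperature drop across a flat layer is power × thickness / (conductivity × area). The 700 W load, 10 µm gap, and 800 mm² die area are illustrative assumptions, not figures from this article or any datasheet. Only the 0.026 W/mK (air) and 8 W/mK (a typical paste) conductivities come from the text. Real interfaces are partly in solid contact, so treat this as an upper-bound cartoon rather than a prediction.

```python
def layer_delta_t(power_w, thickness_m, conductivity_w_mk, area_m2):
    """Steady-state temperature drop across a flat layer (1-D conduction)."""
    resistance_k_per_w = thickness_m / (conductivity_w_mk * area_m2)
    return power_w * resistance_k_per_w

# Illustrative assumptions (hypothetical, not from the article)
POWER_W = 700.0   # heat load
GAP_M = 10e-6     # 10 micron gap
AREA_M2 = 8e-4    # 800 mm^2 die

# Conductivities quoted in the article
K_AIR = 0.026
K_TIM = 8.0

dt_air = layer_delta_t(POWER_W, GAP_M, K_AIR, AREA_M2)
dt_tim = layer_delta_t(POWER_W, GAP_M, K_TIM, AREA_M2)
print(f"air gap: {dt_air:.0f} K, TIM-filled: {dt_tim:.2f} K")
```

Under these assumptions the same gap costs roughly 300 times more temperature when it is filled with air than when it is filled with an 8 W/mK material, because the ratio depends only on the two conductivities.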

However, this is merely an elementary physics problem. In AI servers, we encounter a boss-level challenge known as the "Pump-out Effect."

📉 Invisible Killer: Pump-out Effect and Material Fatigue

The biggest difference between AI chips and traditional server CPUs lies in how violently their workload swings.

When large models are being trained, the GPU runs flat out and the core temperature quickly climbs to 100°C. When training ends or models are switched, the temperature can drop back to 40°C just as quickly. This "extreme heat → cool down → extreme heat" cycle can occur dozens, even hundreds, of times daily in data centers.

Key Factor: CTE Mismatch → Horizontal Shearing → TIM Extrusion.

The more frequent the thermal cycles and the larger the temperature difference, the more the pump-out effect resembles a "time bomb."

At this point, the physical phenomenon known as "Coefficient of Thermal Expansion (CTE)" begins to wreak havoc:

Silicon die: very low CTE, approximately 2.6 ppm/K (expands minimally when heated).
Copper integrated heat spreader: high CTE, approximately 17 ppm/K (expands significantly when heated).

Imagine this: when the temperature soars to 100°C, the copper lid on top expands more than the chip beneath it and "stretches" outwards. When the temperature drops back to 40°C, the lid "shrinks" back inwards. The result is a tiny but continuous "horizontal shearing motion" between the chip and the copper lid.
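The size of that shearing motion can be estimated from the linear expansion formula ΔL = α · L · ΔT. The sketch below assumes a hypothetical 30 mm die edge (15 mm from center to edge), free and uniform expansion of both parts, and the 100°C to 40°C swing described above. Real packages are mechanically constrained, so the actual slip will differ.

```python
def edge_slip_um(cte_top_ppm, cte_bottom_ppm, half_width_mm, delta_t_k):
    """Relative sliding at the die edge, in microns, for free expansion."""
    mismatch_per_k = (cte_top_ppm - cte_bottom_ppm) * 1e-6
    slip_mm = mismatch_per_k * half_width_mm * delta_t_k
    return slip_mm * 1000.0  # mm -> microns

CTE_COPPER_PPM = 17.0    # from the article
CTE_SILICON_PPM = 2.6    # from the article
HALF_WIDTH_MM = 15.0     # hypothetical: half of an assumed 30 mm die edge
DELTA_T_K = 100.0 - 40.0 # swing described in the article

slip = edge_slip_um(CTE_COPPER_PPM, CTE_SILICON_PPM, HALF_WIDTH_MM, DELTA_T_K)
print(f"edge slip per half-cycle: {slip:.2f} um")
```

With these inputs the lid edge moves about 13 microns relative to the die on every heat-up and again on every cool-down, which is on the same scale as the paste layer it is dragging across.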

This shearing motion is similar to squeezing toothpaste every day. The layer of thermal paste (TIM) sandwiched in between is gradually "pumped out" (extruded) from the chip's edges with each "breathing motion."

This is the "pump-out effect." Initially, you might not notice anything unusual. However, after three to six months of high-intensity AI computation, the thermal paste in the center of the chip will become progressively thinner, or even develop voids (dry-out). Subsequently, you will observe the GPU's temperature rising abnormally and its operating frequency being forcibly reduced (throttling). Ultimately, the thermal interface will completely fail, leading to the chip overheating and burning out during a computational surge.

Given the demanding seven-to-ten-year lifespan requirements for AI servers, this invisible killer is absolutely unacceptable.
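A quick piece of arithmetic shows what that lifespan means in thermal cycles. The cycle rates below (24 and 200 per day) are assumed stand-ins for the "dozens, even hundreds" mentioned earlier, and the sketch ignores downtime and leap days.

```python
DAYS_PER_YEAR = 365

def lifetime_cycles(cycles_per_day, years):
    """Total thermal cycles over a service life, assuming a constant daily rate."""
    return cycles_per_day * DAYS_PER_YEAR * years

low = lifetime_cycles(24, 7)     # assumed low end: 24 cycles/day for 7 years
high = lifetime_cycles(200, 10)  # assumed high end: 200 cycles/day for 10 years
print(f"{low:,} to {high:,} thermal cycles")
```

Even the low-end assumption lands in the tens of thousands of cycles, and each one is another squeeze on the paste.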

🧪 Reshaping KPIs: The Deception of Thermal Conductivity (W/mK)

In the past PC DIY market, when enthusiasts chose thermal paste, they primarily focused on the number displayed on the packaging: Thermal Conductivity. Was it 8 W/mK? Or 12 W/mK? A higher number was always better.

However, on the battlefield of AI data centers, thermal conductivity is no longer the sole KPI, nor is it even the most important KPI.

The true KPI for data centers: long-term stability and lifespan.

High thermal conductivity but short-lived = garbage. Moderate thermal conductivity but no displacement for five years = king.

A top-tier thermal paste with a thermal conductivity as high as 15 W/mK, if its chemical structure is unstable and it experiences severe "pump-out" or "dry-out" after just one month of continuous high-temperature operation, is considered garbage for a data center. Conversely, a material with a thermal conductivity of only 8 W/mK, but which can firmly adhere to the chip surface for five years without displacement or degradation under 100°C high temperatures and intense thermal cycling, is the true champion.
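The trade-off can be sketched with a toy model. It assumes both materials are applied at the same bond-line thickness, so thermal resistance scales as 1/k, and that the unstable paste's resistance compounds by 30% every 1000 hours while the stable one does not change. The 30% rate and the compounding behavior are assumptions chosen for illustration, not measured data.

```python
import math

def crossover_hours(k_fast, k_stable, growth_per_1000h):
    """Hours until the degrading paste's resistance exceeds the stable one.

    Assumes equal bond-line thickness (resistance proportional to 1/k) and
    resistance that compounds by `growth_per_1000h` every 1000 hours.
    """
    ratio = k_fast / k_stable
    return 1000.0 * math.log(ratio) / math.log(1.0 + growth_per_1000h)

# 15 W/mK vs 8 W/mK are the article's examples; 0.30 is a hypothetical rate
hours = crossover_hours(k_fast=15.0, k_stable=8.0, growth_per_1000h=0.30)
print(f"high-k paste falls behind after about {hours:.0f} hours")
```

Under these assumptions the 15 W/mK paste loses its advantage after roughly 2,400 hours, a little over three months of continuous operation, and is the worse material for the remaining years of service.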

This is why, in the AI server supply chain, traditional "physical cooling manufacturers" (those making fans and copper pipes) can only produce external components; the real custodians of the core lifeline are the "chemical giants" who manipulate molecular structures.

🏗️ The Philosophy of the Interface: A Dialectic from Liquid to Solid

To solve the pump-out effect, engineers have tried countless methods:

Use stickier glue? If it's too viscous, it won't flow into tiny voids, which raises thermal resistance.
Use metal soldering (Solder TIM)? Solder conducts heat well and doesn't flow, but it's too hard. Under CTE mismatch stress, the rigid "hard-on-hard" joint can crack the fragile silicon die outright.

We need a substance that is soft when applied (easy to install) and liquid at operating temperature (to fill gaps and conduct heat), yet stays solid and steadfast under the pulling forces of thermal expansion and contraction, resisting extrusion.

This sounds like magic, but in material science, it's called "Phase Change." It is this chemical formulation, balancing on the edge of solid and liquid states, that has become the critical foundation enabling NVIDIA and AMD to relentlessly push the boundaries of computational power.

🧙‍♂️ Molecular-Level Disguise: The Birth of Phase Change Material (PCM)

To solve the pump-out effect, engineers faced a dilemma:

For thermal conduction, the interface material must be "liquid" to flow into the micron-sized pits on the chip surface.
For longevity, the interface material must be "solid" to resist the shearing motion from thermal expansion and contraction, staying firmly in place without loss.

This seemingly insoluble contradiction was perfectly resolved by a special polymer called PCM (Phase Change Material).

The principle of PCM is fascinating, much like the liquid metal in the Terminator movies (though it's not electrically conductive). At room temperature (e.g., 25°C), it is a "solid film." This allows production line operators to easily and precisely apply it to the GPU chip like a sticker, without worrying about uneven application or overflow.

Phase change point (e.g., 45°C) is the "switch."

Above phase change point: Solid film → Viscous fluid, fills gaps, expels air, conducts heat
Below phase change point: Fluid → Re-solidifies, resists shearing, no pump-out

However, the moment the GPU begins computing and the core temperature breaks through 45°C (the phase change point), a miracle occurs. This solid film instantly "melts" and transforms into a viscous fluid. It permeates the tiniest gaps between the chip and the integrated heat spreader like water, thoroughly expelling air and establishing an ultimate thermal conduction pathway.

The most crucial magic happens when the GPU stops computing and cools down. Once the temperature drops below 45°C, the fluid re-solidifies. No matter how the chip expands, contracts, or shears, the PCM keeps switching between solid and liquid states. It acts like a living buffer pad that absorbs the mechanical stress and resists being "pumped out."
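The switching behavior described above can be written as a tiny state function. This is a simplified sketch: it treats the 45°C example as a sharp threshold, whereas real materials soften over a temperature range, and the `pcm_state` helper and the sample temperatures are hypothetical.

```python
PHASE_CHANGE_C = 45.0  # example phase change point used in the article

def pcm_state(temp_c, phase_change_c=PHASE_CHANGE_C):
    """Return the idealized PCM state at a given temperature."""
    return "viscous fluid" if temp_c > phase_change_c else "solid film"

# One hypothetical load cycle: idle -> training -> idle
cycle_temps_c = [25, 40, 60, 100, 40, 25]
states = [pcm_state(t) for t in cycle_temps_c]
for temp, state in zip(cycle_temps_c, states):
    print(f"{temp:>3} C: {state}")
```

The material is fluid only while the chip is hot enough to need gap filling, and it is back to a solid film by the time the chip has cooled to idle.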

🦅 The American Eagle's Moat: Honeywell and the Legendary PTM7950

In this field, one company is revered by hardware engineers: the US industrial giant Honeywell.

Within the AI server and high-end gaming laptop circles, one codename is held as gospel: "PTM7950." This is a high-performance phase change material exclusively developed by Honeywell. While its thermal conductivity (8.5 W/mK) may not be the highest on paper, it possesses a characteristic that makes all competitors despair: "ultra-long-life stability."

A counter-intuitive shock: thermal impedance falls instead of rising, because the PCM fits better with use.

This is also why it has become the default, trusted material for the H100 / B200 / MI300 generation.

According to internal tests by NVIDIA and major server manufacturers, ordinary thermal paste can experience a degradation of over 30% in thermal performance after 1000 hours of continuous high-temperature operation. However, after thousands of hours of extreme burn-in tests (Bake-in), the thermal impedance of PTM7950 actually "decreases instead of increases"! This is because, over time, the PCM becomes more densely packed, achieving a perfectly fitted state.

This is why NVIDIA's H100 and B200, and AMD's MI300, almost uniformly ship with Honeywell's products as their default TIM material. This is a deep "chemical formulation moat." Honeywell doesn't need to build wafer fabs; simply by selling this thin "magic sticker," it levies high tolls on global computing power.

🧪 Dow Chemical's Defense Line: Vertical Gap Filling with Thermal Gels

Beyond Honeywell, another chemical giant commanding the AI thermal management lifeline is Dow (Dow Chemical).

If Honeywell guards the most critical "bare die" surface, then Dow Chemical's strength lies in "Thermal Gel." On an AI motherboard, besides the GPU core, there are other heat-generating components like High Bandwidth Memory (HBM) and Voltage Regulator Modules (VRM). These components vary in height, creating inconsistent gaps with the heatsink.

Dow Chemical has developed the DOWSIL™ series of thermal gels, which offer excellent "vertical gap-filling capabilities" and "sag resistance." Even in vertically mounted server racks, these gels can cling to components like jelly, resisting years of gravity and high temperatures without dripping down like mucus and contaminating the circuit board. This is a crucial safety and regulatory guarantee for AI servers employing upright slot designs.

🇹🇼 Taiwan's Role: The Ammunition Depot's Logistics Crew

In this battle dominated by US chemical giants, what role do Taiwanese manufacturers play? We must honestly state: Taiwan does not hold formulation rights, but we possess strong logistics and equipment capabilities.

Our positioning in one sentence: Ammunition transporters + Launcher manufacturers.

The material moat lies in chemical formulations.

Taiwanese manufacturers' leverage lies in distribution channels, application/dispensing, and precision automation.

Wah Lee Industrial Corp. (3010): As Taiwan's largest high-tech material distributor, Wah Lee is the crucial bridge introducing these precious Honeywell and Dow materials to Quanta, Hon Hai, and TSMC.
Horn Teng (6693) / GPM (6640): These are the manufacturers we mentioned previously in the packaging equipment chapter. While the materials are invented by Americans, applying or dispensing these materials onto CoWoS chips with micron-level precision requires extremely sophisticated automated equipment. This is the battlefield for Taiwanese manufacturers.

Taiwanese manufacturers are the "ammunition transporters" and "launcher manufacturers" in this chemical war; although the profits may not be as rich as those of the original manufacturers, they represent an indispensable last mile in the supply chain.

🗺️ Strategic Conclusion: The Heat Finally Leaves the Chip

Using Honeywell's Phase Change Material (PCM), we have successfully transferred heat from the fragile bare die to the metal lid above it—the IHS (Integrated Heat Spreader). This battle was won through chemical formulation.

However, once the heat enters the IHS, the challenge has only just begun. This metal lid covering the chip must meet two extremely demanding conditions:

It must be impossibly flat: Because CoWoS packaging is extremely fragile, even slight warping or tolerance issues in the metal lid can directly crush the million-dollar HBM memory.
It must conduct heat incredibly fast: Facing heat loads of 1000W or even 2000W, ordinary copper lids are no longer sufficient to dissipate the heat quickly enough.

This requires a "precision cold forging" process that goes beyond traditional stamping and, further out, the next-generation technology of "micro-channels." In this extreme realm of metal processing, there is only one company in Taiwan that can satisfy both NVIDIA and AMD, and even TSMC seeks its cooperation.

EDGE Semiconductor Research
