The focus should be on how to make chips produce less heat in the first place.
Better-quality ISAs (not x86...),
Let's be clear, though: this is mainly about GPUs. Sure, CPUs are getting bigger and hotter, but not, I think, at the rate they're adding cores. So per-core power consumption is going down.
For GPUs, the hardware is so expensive that the up-front purchase cost far exceeds the lifetime energy cost. At least, when I ran the numbers for Hopper, it was something like 10x, even after factoring in cooling. So the incentives are all pushing power consumption ever higher.
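That claim is easy to sanity-check with a back-of-envelope calculation. All the inputs below are my own rough assumptions (a Hopper-class price around $30k, ~700 W board power, a 5-year life, $0.10/kWh, and a PUE of 1.5 for cooling overhead), not figures from anyone's actual deployment:

```python
# Back-of-envelope: up-front GPU purchase cost vs. lifetime energy cost.
# Every constant here is an illustrative assumption, not a quoted price.

PURCHASE_USD = 30_000   # assumed Hopper-class accelerator price
POWER_KW = 0.7          # assumed board power, run flat-out 24/7
YEARS = 5               # assumed service life
USD_PER_KWH = 0.10      # assumed electricity rate
PUE = 1.5               # facility overhead (cooling, power delivery)

hours = YEARS * 365 * 24
energy_kwh = POWER_KW * hours * PUE
energy_cost = energy_kwh * USD_PER_KWH

print(f"lifetime energy cost: ${energy_cost:,.0f}")
print(f"purchase / energy ratio: {PURCHASE_USD / energy_cost:.1f}x")
```

With these inputs the ratio lands in the mid single digits; push the price up or the electricity rate down and you get to 10x quickly. Either way, the qualitative point holds: the hardware dominates, so nobody is incentivized to save watts.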
Maybe, when Nvidia & TSMC don't have such a lock on the market, and if HBM ceases to be a limiting factor, it'll make sense to dial back frequencies a bit and just deploy more units. Unless and until that happens, these AI chips are going to be pushed ever further. IIRC, Nvidia is even talking about 600 kW per rack for the second-gen Rubin parts. So... buckle up.
compute units directly within the memory, etc
This is happening, too. I forget which, but one of the upcoming HBM standards will stack the DRAM dies directly on the compute dies.
Others, like Tenstorrent and Cerebras, have gone the path of integrating lots of SRAM in their compute dies and just scaling horizontally.