The March surprise Apple brought to its “Peek Performance” event was the introduction of the M1 Ultra and the Mac Studio that goes along with it. It is a surprise because one would expect such a chip and machine to be presented at the developers’ conference in June, and many were expecting a new M2 SOC instead. It is, however, a pleasant surprise, since Apple is releasing its most powerful SOC to date. But the question remains: how does it work, and is it as good as the competition?

Executive Summary:

  • M1 Ultra is Apple's latest SOC, released in March 2022, and makes its debut in the brand new Mac Studio
  • In basic terms, the M1 Ultra is two M1 Maxes joined in the middle, so you have double of everything: 20 compute cores, 2 Neural Engines, 64 graphics cores, 4 Media Engines and 800 GB/s of memory bandwidth
  • More impressive technical details: the two dies are micro-stitched together by a chip-to-chip bridge called UltraFusion, with chip-to-chip bandwidth of 2.5 TB/s
  • Based on patent findings, Apple is considering joining more chips in a 3D stacked style using such micro stitching, which is expected to debut in the updated Mac Pro.

Specs


M1 Ultra layout. As with all M1s in the series, the memory sits right next to the SOC.

On the surface, the M1 Ultra is two M1 Max chips conjoined like Siamese twins through a chip-to-chip connector that Apple calls UltraFusion. The connector itself is quite impressive: its 2.5 TB/s of chip-to-chip bandwidth is so fast that the two M1 Maxes can be logically seen as a single large M1 Ultra. But the technical prowess goes even further than that when you consider Apple's approach based on patent findings, which we will discuss later.

Having two or more physical CPUs, as in Symmetric Multiprocessing (SMP), is nothing new. However, dual physical processors come with many pitfalls, which is why the industry has generally moved to multicore designs. Apple could have soldered two M1 Max chips onto a single board and called it a day, but joining them with UltraFusion offers more performance than having two M1 Max chips in separate packages.


UltraFusion is the bridge that connects both M1 Maxes

Since the M1 Ultra is essentially two M1 Max chips stitched together, it has double of everything in the M1 Max: 20 compute cores (16 performance and 4 efficiency), 2 Neural Engines with a total of 32 neural cores, a whopping 64 graphics cores and 4 Media Engines, which Apple says allow you to play back up to 18 streams of 8K ProRes 422 video in Final Cut Pro. What is unique about the layout is how the graphics cores are laid out next to each other, so data for graphics processing stays bunched together instead of needing to travel from one end of the chip to the other.

The other impressive feat is the memory bandwidth, which stands at around 800 GB/s. At 6.4 GB/s per 8 bits of bus width (LPDDR5-6400), the M1 Ultra needs a 1,024-bit memory bus, or 16 channels of 64 bits each, to achieve the magic 800 GB/s, which is impressive. High-end Intel chips like the i9-12900K have only dual-channel memory and manage 76.8 GB/s, while the higher-end Xeon W-3375 has 8 memory channels of DDR4, which should translate to around 200 GB/s. With up to 128 GB of RAM available, the M1 Ultra can read through its entire memory about 6 times per second.
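The bandwidth figures above can be sanity-checked with a bit of arithmetic. The sketch below multiplies bus width by transfer rate to get a peak theoretical figure. The bus widths and memory speeds (LPDDR5-6400, DDR5-4800, DDR4-3200) are assumptions taken from public spec sheets rather than from Apple's presentation, and real-world throughput will be lower.

```python
# Peak theoretical memory bandwidth = bus width in bytes x transfers per second.
# Bus widths and transfer rates are assumptions from public spec sheets,
# not measurements.

def peak_bandwidth_gb_s(bus_width_bits: int, mega_transfers_per_s: int) -> float:
    """Return peak theoretical bandwidth in GB/s (1 GB = 10^9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * mega_transfers_per_s / 1000

# name -> (total bus width in bits, transfer rate in MT/s)
configs = {
    "M1 Ultra (1024-bit LPDDR5-6400)": (1024, 6400),
    "i9-12900K (2 x 64-bit DDR5-4800)": (128, 4800),
    "Xeon W-3375 (8 x 64-bit DDR4-3200)": (512, 3200),
}

for name, (bits, rate) in configs.items():
    print(f"{name}: {peak_bandwidth_gb_s(bits, rate):.1f} GB/s")

# Full passes over a 128 GB memory pool per second at Apple's quoted 800 GB/s.
sweeps_per_second = 800 / 128
print(f"Memory sweeps per second: {sweeps_per_second:.2f}")
```

The computed M1 Ultra figure lands at 819.2 GB/s, which Apple rounds down to 800 GB/s in its marketing.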


M1 Ultra features as described by Apple

The bandwidth feats of the M1 Ultra do not stop at memory. There is also the question of I/O bandwidth; after all, you need to get data in and out of the box. The M1 Ultra Mac Studio has 6 Thunderbolt 4 ports, and each port has its own controller. This means each Thunderbolt 4 port gets the full 40 Gb/s of bandwidth to itself, so the Thunderbolt ports alone total 240 Gb/s (about 30 GB/s). There is also 10Gb Ethernet, an HDMI 2.0 port and the legacy USB-A ports, which Apple rates at up to 5 Gb/s. This is the reason why the Mac Studio can support 4 6K monitors like the Pro Display XDR plus a single 4K TV.
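One thing to watch with I/O numbers is the unit: Thunderbolt 4 and Ethernet link rates are quoted in gigabits per second, not gigabytes. A minimal sketch of the conversion, assuming all six ports could be saturated at once (which real workloads rarely do):

```python
# Link rates are quoted in gigabits per second (Gb/s);
# dividing by 8 gives gigabytes per second (GB/s).

def gbit_to_gbyte(gbit_per_s: float) -> float:
    """Convert a rate in Gb/s to GB/s."""
    return gbit_per_s / 8

THUNDERBOLT_PORTS = 6     # M1 Ultra Mac Studio, as described in the article
THUNDERBOLT_GBIT = 40     # Thunderbolt 4 link rate per port
ETHERNET_GBIT = 10        # 10Gb Ethernet

# Aggregate assumes every port runs at its full link rate simultaneously.
thunderbolt_total_gbit = THUNDERBOLT_PORTS * THUNDERBOLT_GBIT

print(f"Thunderbolt total: {thunderbolt_total_gbit} Gb/s "
      f"= {gbit_to_gbyte(thunderbolt_total_gbit):.1f} GB/s")
print(f"Ethernet: {ETHERNET_GBIT} Gb/s = {gbit_to_gbyte(ETHERNET_GBIT):.2f} GB/s")
```

Even the 30 GB/s aggregate is a small fraction of the 800 GB/s memory bandwidth, so the SOC has plenty of room to feed every port.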

Despite handling a lot of bandwidth and processing power, the M1 Ultra is one cool SOC. Based on Apple’s presentation, the CPU consumes around 60 watts while the GPU consumes around 100 to 120 watts. Each Mac Studio comes with a 370 watt power supply, so there is a lot of headroom to power the M1 Ultra plus other peripherals that might be attached to it.
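To put the headroom in numbers, here is a rough sketch using only the wattages quoted above. It treats the 370 watt rating as fully available to the SOC and peripherals, which is a simplifying assumption; memory, storage, fans and conversion losses also take their share.

```python
# Rough power headroom using the figures quoted in the article.
# Assumption: the full PSU rating is available, ignoring conversion losses
# and the draw of memory, storage and fans.

PSU_WATTS = 370
CPU_WATTS = 60
GPU_WATTS_RANGE = (100, 120)   # low and high GPU estimates

def headroom(psu: int, cpu: int, gpu: int) -> int:
    """Watts left over after the CPU and GPU take their share."""
    return psu - (cpu + gpu)

light_gpu_headroom = headroom(PSU_WATTS, CPU_WATTS, GPU_WATTS_RANGE[0])
heavy_gpu_headroom = headroom(PSU_WATTS, CPU_WATTS, GPU_WATTS_RANGE[1])

print(f"Headroom: {heavy_gpu_headroom} to {light_gpu_headroom} watts")
```

Under these assumptions, roughly half of the power supply's rating is left over for everything else.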


The M1 family of SOC as of March 2022

Performance

We compared the M1 Ultra against Intel i9-12900K here.

Now that we have the details of the M1 Ultra, how does it compare to Intel's and Nvidia's latest and greatest? On the Intel side, we compare the M1 Ultra with the Intel i9-12900K. On the Nvidia side, we compare it with the Nvidia RTX 3090. Both are mentioned and compared in Apple's presentation. However, no concrete numbers were presented, so we have to investigate further.


Early benchmarks show the M1 Ultra spanking the Intel i9-12900K in multicore performance

But it is a different story in CineBench.

On the CPU side, the Intel i9-12900K performs better in single-core tasks, while the M1 Ultra shines in multithreaded tasks thanks to having more cores than Intel. Despite the advances in Intel's 12th generation Core processors, which manage to beat the M1 Max, Apple comes back with a vengeance with the M1 Ultra, which beats Intel's processor on multicore tasks. In single-threaded tasks like viewing a Blender render or opening large Adobe files, Intel still has the edge thanks to its single-core performance.


According to Apple, the M1 Ultra is more efficient and powerful at a given power usage than the Nvidia RTX 3090 ...

... but forgot to mention that the RTX 3090 would happily eat 400-600 watts to get the performance that it needs

On the GPU side, there is no contest: the Nvidia RTX 3090 is still the GPU to beat, and the M1 Ultra is no exception. But how would one explain Apple's graph from the “Peek Performance” event? In the graph, Apple shows that the M1 Ultra is more efficient at a given power usage. This is true. However, the graph only shows the RTX 3090's power usage up to around 300 watts. The thing about the RTX 3090 is that it happily eats electricity for breakfast. In a typical gaming rig, the RTX 3090 can easily draw 500 watts. Some manage to push 600 watts with overclocking and liquid cooling. Rumors are flying around that the RTX 40 series, which will replace the RTX 3090, will draw around 800 watts. So, the Nvidia approach is to throw more wattage at the problem. That's great if you just want a huge gaming rig and do not mind the noise and heat it generates, but Apple's approach is to get things done as efficiently as possible. There's a reason for this, as we will see when we explore what's next for the M1 Ultra.

So in conclusion, if your workflow fits in the Apple ecosystem, then the M1 Ultra is the most powerful chip from Apple to date. And based on the performance and feature set that Apple gives by going their own way, expect much better things ahead from Apple Silicon Division in the near future.


The optimized version of Adobe Premiere shows that Adobe has taken advantage of the media engine in the M1 professional chips. Data by Dave2D

But of course, the M1 Ultra has the most powerful integrated GPU to date, while PC builders prefer to rely on discrete solutions from Nvidia.

Other benchmarks show mixed results for the M1 Ultra, showing the importance of optimization. Data provided by Dave2D

Blender has been optimized for Apple Silicon, but benchmarks show that there is a lot of room for improvement. Data by Dave2D

What’s next


Die stitching, aka UltraFusion. It changes the game by improving yields for ultra large dies.

Now that we have reviewed the M1 Ultra chip, what's next for Apple? The prevailing rumor since the introduction of the original M1 chip is that Apple plans a family of SOCs, with the M1 for consumers and others for professionals in different configurations. The project codename is rumored to be Jade, and several Jade configurations came out of that project. According to the rumor, there are 4 configuration code names: Jade-Chop, Jade-C, Jade-2C and Jade-4C. Jade-Chop is now the M1 Pro, Jade-C is the M1 Max and Jade-2C is the M1 Ultra. Given how accurate the rumors have been so far, there is a high chance that Jade-4C will come out for the last Mac yet to be upgraded to Apple Silicon: the Mac Pro.


M1 line up predictions in 2021. Jade-2C is now M1 Ultra and Jade-4C should be the M1 Extreme

This hypothetical Jade-4C SOC will have 4 M1 Max dies in a single package. There is no word on how this will be done, but a patent filing offers a clue:

A multi-chip module (MCM) is generally an electronic assembly in which multiple dies are integrated on a substrate. Various implementations of MCMs include 2D, 2.5D and 3D packaging. Generally, 2D packaging modules include multiple dies arranged side-by-side on a package substrate. In 2.5D packaging technologies multiple dies are bonded to an interposer with microbumps. The interposer in turn is then bonded to a package substrate. The interposer may include routing to interconnect the adjacent dies. Thus, the dies in 2.5D packaging can be directly connected to the interposer, and are connected with each other through routing within the interposer. Generally, 3D packaging modules include multiple dies stacked vertically on top of each other. Thus, the dies in 3D packaging can be directly connected to each other, with the bottom die directly connected to a package substrate. The top die in a 3D package can be connected to the package substrate using a variety of configurations, including wire bonds, and through-silicon vias (TSVs) though the bottom die.

Based on this patent filing, one can infer that Apple is strongly considering, or already going ahead with, stacking M1 Max dies on top of each other and stitching them together using UltraFusion. Stacking dies is nothing new, but it is usually reserved for storage or memory chips, where the heat load is lower. Doing it with high-performance chips like the M1 Max, where things can get hot, requires very creative packaging. This explains why Apple focuses on the TDP of each chip, even on the M1 Pro and M1 Max, when it could take the easy route like Intel and Nvidia and bump up the clock speed and voltage, which in turn generates a lot of heat.


WoW: Wafer on Wafer; CoW: Chip on Wafer; KGD: Known Good Die. Apple's approach to chip manufacturing could theoretically improve its yield, allowing it to create huge dies that were previously prohibitively expensive.

A key point the patent also makes is the method of combining multiple chips into a single module by stitching them together. One might have thought that Apple would fabricate the entire M1 Ultra in one go and bin it down into M1 Max and M1 Pro, but the patent takes things a step further. It suggests that all of Apple's professional chips start life as an M1 Max and are then either stitched together to become an M1 Ultra or M1 Extreme, or binned down to an M1 Pro. This greatly improves manufacturing yield, since a monolithic M1 Ultra would be very big in the world of chip manufacturing: third-party die-shot estimates put the M1 Max at roughly 430 mm², so two of them come to around 860 mm², far larger than the roughly 119 mm² of the base M1.
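Why stitching smaller dies helps yield can be illustrated with the textbook Poisson yield model, where the chance of a die having zero defects falls exponentially with its area. This is only a sketch: the defect density and die areas below are hypothetical round numbers chosen for illustration, not figures from Apple or its foundry.

```python
import math

def poisson_yield(die_area_cm2: float, defects_per_cm2: float) -> float:
    """Fraction of dies with zero defects under a simple Poisson model."""
    return math.exp(-die_area_cm2 * defects_per_cm2)

# Hypothetical values for illustration only.
DEFECTS_PER_CM2 = 0.1              # assumed defect density
SMALL_DIE_CM2 = 4.0                # assumed area of one M1 Max-class die
BIG_DIE_CM2 = 2 * SMALL_DIE_CM2    # the same silicon as one monolithic die

small_yield = poisson_yield(SMALL_DIE_CM2, DEFECTS_PER_CM2)
monolithic_yield = poisson_yield(BIG_DIE_CM2, DEFECTS_PER_CM2)

# When only known good dies are stitched together, every good small die is
# usable, so the usable share of silicon equals the small-die yield.
print(f"Small die yield:      {small_yield:.1%}")
print(f"Monolithic die yield: {monolithic_yield:.1%}")
```

Under this model, the monolithic yield is the small-die yield squared, so doubling the die area always hurts. The sketch ignores losses in the stitching step itself and any salvage of partially defective dies through binning.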

So, based on these two pieces of information, there is some conjecture that the M1 Extreme destined for the Mac Pro will have 4 M1 Max dies stacked on top of each other in a single package. It will be interesting to see whether Apple decides to use the current Mac Pro chassis or a brand new one based on this design. The end result would be an Intel Xeon and AMD EPYC beating chip that boasts around 40 compute cores, 64 neural cores and a whopping 128 graphics cores, consuming around 120 watts for the compute cores and another 200-250 watts for the graphics cores. 370W sounds like a lot, but Apple managed to make the 205 watt TDP Intel Xeon W-3275M and 4 Radeon GPUs whisper quiet in the Mac Pro, so keeping things cool should not be a problem.
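The projected numbers above are what you get by doubling the M1 Ultra. A small sketch of that scaling follows; the `ChipSpec` type and the strictly linear scaling of power are illustrative assumptions, and a real four-die part may clock, bin or scale differently.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ChipSpec:
    """Headline specs for a chip (hypothetical helper type for this sketch)."""
    cpu_cores: int
    neural_cores: int
    gpu_cores: int
    cpu_watts: int
    gpu_watts_max: int

    def scaled(self, factor: int) -> "ChipSpec":
        # Assumption: everything, including power, scales linearly.
        return ChipSpec(
            cpu_cores=self.cpu_cores * factor,
            neural_cores=self.neural_cores * factor,
            gpu_cores=self.gpu_cores * factor,
            cpu_watts=self.cpu_watts * factor,
            gpu_watts_max=self.gpu_watts_max * factor,
        )

# M1 Ultra figures as quoted in the article (GPU at the 120 W upper estimate).
M1_ULTRA = ChipSpec(cpu_cores=20, neural_cores=32, gpu_cores=64,
                    cpu_watts=60, gpu_watts_max=120)

# A speculative four-die part is two M1 Ultras' worth of silicon.
m1_extreme_guess = M1_ULTRA.scaled(2)
print(m1_extreme_guess)
print("Total watts:", m1_extreme_guess.cpu_watts + m1_extreme_guess.gpu_watts_max)
```

Straight doubling gives 360 watts in total, a little under the 370 watt upper estimate above, which leaves some margin for the extra interconnect.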

There is also some contrary evidence suggesting that the final M1 series silicon slated for the Mac Pro will be laid out in a grid fashion. In other words, two M1 Ultra chips would be stitched together side by side to form the M1 Extreme, connected by an UltraFusion-type connection. This is easier to design than a 3D stacked layout, where you need to get creative to manage the heat generated by high-performance chips. Of course, at this point, one must take the leaked design with a grain of salt, as it could be true or could come from somebody with an active imagination and too much time on their hands.

Conclusion

The M1 Ultra is so far the best of what Apple can do when given a free hand in designing its own chips to power its devices, and for all intents and purposes, Apple is just getting warmed up. Granted, many people will never reach the limits of what the M1 Ultra has to offer, but it is good that Apple can show Intel, AMD and Nvidia that they have a new competitor on the block. After all, healthy competition brings the best out of everybody.

While we acknowledge that the M1 Ultra is not for everybody, we are excited at the prospect of what Apple will put in the Mac Pro. Of course, we will need to wait for WWDC for that to happen.
