For the last several years, I've wondered when the Intel era would finally be over. At Apple's Unleashed launch event this past Monday, I got an idea of how this might play out.
My point here is not that the next generation of personal computers is the Apple Silicon Mac. What you're likely to see emerge over the coming years is a system design that resembles the Mac's, regardless of which operating system you choose to use, and regardless of whether it's based on Apple's own designs or on chips from Qualcomm, NVIDIA, Samsung, Microsoft, or even Google.
Make sure you get on the bus
The dominant system architecture of the last four decades has been the x86 instruction set. But it isn't just the instruction set that has dominated; the other components found in PCs and x86 servers, including the various buses and support chips, have dominated as well.
PCs, big iron servers, and in fact virtually all computers since the 1960s have been designed around what's known as a bus architecture. A bus is like a nervous system: it transfers data between the different parts of a computer, including the CPU, GPU, system cache, and the specialized processors used for machine learning and other functions. The bus also moves data to and from main memory (RAM) and the video memory attached to the GPU, as well as the entire set of I/O components: keyboard, mouse, Ethernet, Wi-Fi, and Bluetooth.
If there is no bus, information does not move.
The growth of SoCs
The computer industry has come up with diverse bus designs, such as the various versions of PCI for I/O components, along with a variety of bus types designed specifically for video graphics. These are the primary building blocks of microcomputer design, regardless of which company manufactures the system. There are server and desktop versions, and there are laptop and mobile versions.
As we entered the era of mobile and embedded computing, however, we needed to put more system components on the chip itself, which is why we now refer to these designs as Systems on a Chip, or SoCs. The CPU, GPU, ML cores, main memory, video memory, and even primary storage can now be housed on a single chip.
There are a few benefits to doing it this way: reduced size, and the elimination of bottlenecks. If you need buses to move data from one part of the system to another and back, you have to switch interface technologies. In many cases there are fewer data lanes available to carry the data; it's like taking an off-ramp from an eight-lane expressway down to two lanes before you can merge onto another expressway heading in a different direction. If it all happens on the chip, the bottleneck doesn't have to exist.
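The expressway analogy can be put in rough numbers. This is a back-of-the-envelope sketch, and every figure in it is an illustrative assumption rather than a measured specification: it simply shows how much longer the same buffer takes to cross a narrow external bus than a wide on-chip fabric.

```python
# Illustrative only: estimate transfer time for a buffer over links of
# different bandwidths. All bandwidth numbers are assumptions for the sketch.

def transfer_time_ms(size_gb: float, bandwidth_gb_s: float) -> float:
    """Milliseconds to move size_gb of data over a link of bandwidth_gb_s."""
    return size_gb / bandwidth_gb_s * 1000

buffer_gb = 2.0               # e.g. a texture set handed from CPU to GPU

on_chip_fabric_gb_s = 200.0   # assumed unified-memory-class bandwidth
external_bus_gb_s = 16.0      # assumed PCIe-class bandwidth

print(f"on-chip fabric: {transfer_time_ms(buffer_gb, on_chip_fabric_gb_s):.1f} ms")
print(f"external bus:   {transfer_time_ms(buffer_gb, external_bus_gb_s):.1f} ms")
```

Under these assumed numbers, the same 2GB payload takes 10 ms on the wide fabric versus 125 ms on the narrow bus, which is the "off-ramp" penalty in miniature.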
There are limits to what you can accomplish with SoCs. There are only so many CPU and GPU cores you can place on one, and only so much memory you can fit on a single device. So while SoCs work well in smaller computers, they aren't suitable for high-end PC or server workloads; they can't expand to the biggest systems. If you want to scale up the desktop, to build a workstation with hundreds of gigabytes or even terabytes of RAM, the kind you see in the film and aerospace industries, you must be able to go beyond a single chip. The same is true in the data center and in hyperscale clouds designed for large-scale enterprise applications. The less data has to be copied over constrained, slow bus interfaces or across networks, the more efficient the system.
Enter M1 and UMA
With the latest Apple M1 Pro and M1 Max SoCs, many more transistors are located on the chip, which typically translates into greater speed. But what's really interesting about these chips is the speed of the memory bus.
For the past year, I've been wondering how Apple would scale this technology, and particularly how it would address the problem of increasing bus speeds between main memory (RAM) and the GPU. On desktop computers built on the Intel architecture, memory access is handled in a non-uniform manner, known as NUMA.
Intel systems typically use discrete memory, such as GDDR on the video card, with a high-speed memory bus interconnect between it and the GPU; the CPU then connects to main memory over a separate interconnect. The M-series, like the A-series, uses a Unified Memory Architecture (UMA), pooling RAM between the CPU, GPU, and machine learning cores (the Neural Engine). In essence, they all share the same memory.
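The practical difference can be sketched with a toy model in a few lines of Python. This is an analogy, not Apple's actual memory controller: the discrete path hands data to the "GPU" by copying it into a separate pool, while the unified path just shares a view of the same underlying buffer.

```python
# Toy model of discrete vs. unified memory (an analogy, not real hardware).

frame = bytearray(b"pixel data" * 4)   # stand-in for a framebuffer in RAM

# Discrete-memory path: an explicit copy into separate "video memory".
video_memory = bytes(frame)            # the full copy crosses the bus
assert video_memory is not frame       # two distinct buffers now exist

# Unified-memory path: a zero-copy view over the same underlying bytes.
gpu_view = memoryview(frame)           # no copy; same physical buffer
frame[0:5] = b"PIXEL"                  # a CPU-side write...
assert bytes(gpu_view[0:5]) == b"PIXEL"  # ...is immediately visible via the view
```

In the discrete case, the copy and the bus crossing happen on every hand-off; in the unified case, CPU-side writes are visible through the shared view with no transfer at all.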
And they share it fast: 200 gigabytes per second (GB/s) on the M1 Pro, and 400GB/s on the M1 Max. Then there are the fast GPU cores, up to 32 of them in the Max, which have extremely high-bandwidth communication with the fast CPU cores, of which there are up to 10 in both the M1 Pro and M1 Max. And that's before we even get to the specialized machine learning cores that also make use of this speedy bus.
Of course, the M1 Pro and M1 Max can be configured with more memory than previous models too, up to 64GB. The original M1 was already competitive on a per-core basis with the rest of the market. But if Apple intends to win the most professional-grade workloads, it needs this kind of bus bandwidth to make them fly.
Scaling Arm from M1 Macs to data centers and big iron
Here is where things become exciting.
The thing is, I really want to know what they'll do with the Mac Mini and the Mac Pro. I expect the upgraded Mini to use the same mainboard layout as the new MacBook Pro systems. The Pro, however, should be a beast of a system, possibly carrying multiple M1 Max chips. That means Apple will have had to devise a way to pool memory across multiple SoCs, something we haven't yet seen in the market because of bus latency issues.
One possible option is to turn the Pro into a desktop cluster composed of multiple Mini-like daughterboards connected by some kind of super-fast interconnect. The industry experimented with desktop clusters for scientific workloads 10-15 years ago, using Arm chipsets, Linux, and the open-source cluster management software used on Beowulf supercomputers. But they never gained traction, because it wasn't practical to design desktop applications to run as parallel workloads over a TCP/IP network.
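Why weren't TCP/IP desktop clusters practical? A latency budget makes it concrete. The numbers below are order-of-magnitude assumptions, not benchmarks, but they show how fine-grained parallelism drowns in network round-trip time while a shared-memory fabric barely notices.

```python
# Illustrative latency budget (assumed order-of-magnitude numbers) for a
# fine-grained parallel workload exchanging many small messages.

messages = 1_000_000             # small tasks exchanging intermediate results

tcp_ip_latency_s = 50e-6         # assumption: ~50 microsecond LAN round trip
shared_mem_latency_s = 100e-9    # assumption: ~100 nanosecond memory access

print(f"over TCP/IP:        {messages * tcp_ip_latency_s:.1f} s of pure latency")
print(f"over shared memory: {messages * shared_mem_latency_s:.1f} s of pure latency")
```

Under these assumptions, a million fine-grained exchanges cost about 50 seconds of pure waiting on the network versus a tenth of a second over a shared-memory fabric, which is why apps had to be coarsely partitioned to survive on a Beowulf-style cluster.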
It might now be feasible using the bus connectivity technology found in Apple's M1 SoCs. If you can connect a CPU die and a GPU die on the same mainboard and share memory between them at very high speeds, it should be possible to connect more CPU and GPU dies across multiple mainboards and daughterboards. Perhaps, then, a desktop cluster is what's coming for the Mac Pro.
It's all very fascinating, and we'll see the Mac adopt these new technologies first. But other tech giants are working on faster Arm-based SoCs as well. So while you might not be using a Mac in the coming years for your business or personal applications, your PC could very well look a lot like one on the inside.