During Intel’s COMPUTEX 2024 keynote presentation, there was a noticeable absence of any mention of the upcoming “Arrow Lake” desktop processors. Instead, the focus was entirely on Artificial Intelligence, Xeon server processors, and a comprehensive overview of the mobile-oriented Lunar Lake CPUs.
Intel revealed the technical specifications of its upcoming thin-laptop CPU, code-named “Lunar Lake.” The chip not only boasts a new integrated graphics architecture dubbed Xe2; its primary focus is enhanced power efficiency.
Building upon the previous Meteor Lake “Core Ultra” series, which actually offered a performance downgrade versus Intel’s previous generation, Lunar Lake seems to prioritize high-performance AI processing with a newly integrated NPU capable of facilitating Microsoft’s dastardly desires.
Lunar Lake features a sizable “Compute tile” integrating CPU cores, GPU, NPU, media engine, and display engine, alongside a “Platform Controller tile” consolidating interfaces like PCIe, USB, and storage.
These components are interconnected on the same package using Embedded Multi-die Interconnect Bridge (EMIB) and Foveros die-stacking technology.
Previously, there were discussions about Intel outsourcing Lunar Lake’s production to TSMC, its main competitor in the semiconductor sector. Now, it’s confirmed that Intel will indeed utilize TSMC processes for Lunar Lake. The main compute tile will leverage TSMC’s N3B processing node, while the platform controller tile will use TSMC’s N6 process.
Intel’s own fabrication is reserved exclusively for its Foveros packaging. For comparison, the current generation Meteor Lake CPUs, designed for mobile use, utilize a similar packaging, with the CPU core tile produced on Intel’s 4 node and the SoC and I/O tile using TSMC’s N6 process.
This development with Lunar Lake signifies a new chapter for Intel. Despite recently defrauding the United States government to acquire subsidies for domestic production, Intel has opted to source its CPUs from rival companies.
This move aims to enhance performance and efficiency, two areas where Intel has been trailing behind AMD’s “Zen” processors.
In contrast to Meteor Lake’s configuration, which spread these functions across four distinct dies, Lunar Lake presents a more streamlined design with fewer individualized tiles. This simplification in production complexity is noteworthy, although their tile-based packaging still diverges significantly from the traditional “chiplet” designs of rival companies. Nonetheless, Intel shouldn’t encounter significant hurdles in terms of mass production with Lunar Lake.
The primary innovations lie in Lunar Lake’s brand-new CPU core architecture, comprising four high-performance “P-cores” and four power-efficient “E-cores.” Both core microarchitectures have been upgraded from Meteor Lake, with the P-cores adopting the new “Lion Cove” architecture and the E-cores utilizing the “Skymont” architecture.
However, when compared to competitors who have succeeded in developing a compact core design based on the same Instruction Set Architecture (ISA), there remains a substantial disparity in actual performance capabilities between Intel’s Performance and Efficiency cores.
In contrast to Meteor Lake’s inclusion of two additional “LP E-cores” (low-power E-cores), Lunar Lake will not feature these LP E-cores. Intel clarifies that Lunar Lake’s design prioritizes power efficiency, enabling the E-cores to fulfill the roles initially designated for the LP E-cores, hence their exclusion. This simplification streamlines the architecture and reduces die area.
The Lion Cove P-cores in Lunar Lake operate at lower clock speeds than Meteor Lake’s “Redwood Cove,” yet they boast more execution units, resulting in an average 14% increase in IPC (Instructions Per Clock).
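Taken at face value, that trade-off can be sanity-checked with simple arithmetic: performance scales roughly as IPC × clock, so a 14% IPC uplift survives any clock reduction smaller than about 12%. A minimal sketch, where only the 14% figure comes from Intel’s slide and the clock values are hypothetical:

```python
# Rough model: performance ∝ IPC × clock frequency.
# The 14% IPC uplift is Intel's claim; the clock figures below are hypothetical.

ipc_gain = 1.14  # Lion Cove vs. Redwood Cove, per Intel's slide

# Break-even point: how far clocks can drop before the IPC gain is cancelled.
breakeven_clock_ratio = 1 / ipc_gain
print(f"Clocks may drop to {breakeven_clock_ratio:.1%} of the old "
      f"frequency before net performance regresses.")

# Hypothetical example: old core at 5.0 GHz, new core at 4.8 GHz.
old_perf = 1.00 * 5.0
new_perf = ipc_gain * 4.8
print(f"Net change in this example: {new_perf / old_perf - 1:+.1%}")
```

In other words, even a several-hundred-MHz clock regression could still yield a modest net single-thread gain, if the IPC claim holds.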
However, these performance figures, unlike those presented by AMD with their Zen 5 demonstration, are solely based on Intel’s own estimations and projections, as indicated on the slide under the “power” figure.
Intel has a history of making bold claims like this, without solid evidence to back them up. Take, for instance, their assertion of increased IPC across multiple generations of Skylake processors for mobile devices.
Such claims conveniently ignored the impact of software mitigations against critical security vulnerabilities such as Spectre and Meltdown. By choosing not to apply these patches, Intel may have avoided decreased I/O performance, but it raises questions about the accuracy of performance claims like Lunar Lake’s IPC increase.
During their hour-long presentation, Intel conspicuously avoided any mention of their forthcoming generation of desktop CPUs, known as Arrow Lake. Leaked Intel slides had previously suggested that these next-gen desktop processors, which also boast Lion Cove performance cores, would provide a lackluster performance increase and would be built on Intel’s own 20A process.
These slides speculated only a marginal increase in single-core performance figures, with the primary enhancement attributed to the inclusion of Skymont-based Efficiency cores, which bolster multi-threaded capabilities over Raptor Lake.
It’s worth noting that Lunar Lake diverges from Intel’s long-standing tradition by not supporting their “Hyper-Threading” technology. Hyper-Threading enables a single CPU core to handle two threads simultaneously. Although it can potentially boost throughput by up to 30%, it also escalates circuit complexity and power consumption, attributable to thread and process management, as well as additional security measures.
Consequently, Intel opted to omit Hyper-Threading in Lunar Lake. The P-cores in Lunar Lake are engineered to prioritize power efficiency, resulting in a 5% enhancement in performance per watt and a 15% improvement in power efficiency per area when compared to configurations with Hyper-Threading enabled.
However, performance per area is higher with Hyper-Threading enabled. Therefore, in products like Xeon, which must handle many threads, Lion Cove might still be used with Hyper-Threading. Even that seems unlikely, though, as Arrow Lake desktop processors, effectively based on the same core architecture, will not come with Hyper-Threading either.
An exciting feature of Lion Cove is its AI-driven power management system. It analyzes the workload and CPU temperature to finely regulate the power delivery to the CPU. Unlike earlier Intel CPUs, which adjusted clock speeds in increments of 100MHz, Lion Cove provides finer control at 16.67MHz increments. This allows the CPU to approach its maximum clock speeds more precisely.
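The benefit of the finer step size is easy to quantify: with 100MHz bins a core can sit up to nearly 100MHz below its true power/thermal ceiling, while 16.67MHz bins (one sixth of the old step) shrink that worst-case gap accordingly. A toy illustration, where the ceiling value is hypothetical and only the two step sizes come from Intel’s material:

```python
# Sketch: finer clock-step granularity lets a core sit closer to its
# power/thermal-limited ceiling. The 100 MHz and 16.67 MHz step sizes are
# Intel's figures; the example ceiling below is hypothetical.

def highest_reachable_clock(ceiling_mhz: float, step_mhz: float) -> float:
    """Largest multiple of step_mhz that does not exceed the ceiling."""
    return (ceiling_mhz // step_mhz) * step_mhz

ceiling = 4985.0  # hypothetical power-limited ceiling, in MHz

coarse = highest_reachable_clock(ceiling, 100.0)   # pre-Lion-Cove granularity
fine = highest_reachable_clock(ceiling, 16.67)     # Lion Cove granularity

print(f"100 MHz steps:   {coarse:.0f} MHz ({ceiling - coarse:.0f} MHz left on the table)")
print(f"16.67 MHz steps: {fine:.2f} MHz ({ceiling - fine:.2f} MHz left on the table)")
```

In this hypothetical case the coarse stepping strands 85MHz of headroom, while the fine stepping strands under 1MHz.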
This advancement could significantly benefit Intel, as maximizing frequencies will likely play a crucial role in their performance in benchmark reviews.
The Skymont E-cores have received upgrades, including a boost in the number of execution units. Compared to the LP E-core in Meteor Lake, single-thread floating-point performance has supposedly surged by up to 38%, and multi-thread integer performance by up to 68%.
Furthermore, it’s worth noting that Intel’s performance evaluations are purely hypothetical. Therefore, it’s prudent not to rely solely on any corporation’s performance figures presented at these conventions until the product is actually available for purchase and in the hands of consumers.
Interestingly, Intel claims that under low-power operation, the Skymont E-core can match the performance levels of the Raptor Cove performance cores featured in their 13th generation Core processors for desktops.
However, if you look at the bottom left-hand corner of these slides, you’ll notice that familiar disclaimer once more: these results are solely based on Intel’s internal projections and estimations, with a margin of error of 10%.
This revelation significantly dampens any excitement one might have for these performance figures, to be candid. It appears Intel is conjuring them out of thin air, reminiscent of the time they attempted to pass off VLC video footage as real-time gameplay to showcase their integrated graphics.
Lunar Lake also boosts efficiency via an improved “Thread Director,” which informs the OS of CPU core status and allocates tasks accordingly. In Meteor Lake, tasks were first allocated to the LP E-cores, with any remaining workload being shifted to the E-cores and P-cores.
However, with only two LP E-cores offering relatively low performance, the bulk of the workload fell on the E-cores and P-cores. Intel has enhanced the precision and effectiveness of its algorithms, providing them with finer granularity.
This allows for better allocation of the appropriate CPU core for a given task, addressing a longstanding concern among gamers.
Since the introduction of Intel’s hybrid processor designs, gamers have faced issues where, during critical gameplay moments, tasks were scheduled onto the processors’ efficient cores, leading to reduced performance and potential stuttering.
Intel attempted to address this issue with the Application Optimizer (APO), which primarily increased the frequency of the E-cores during gameplay for their 14th generation processors.
While the Application Optimizer (APO) did enhance performance to some degree, it failed to completely address the core issue. Furthermore, this solution created an additional problem for owners of 13th generation processors, who were unable to benefit from the APO despite having identical hardware.
In Meteor Lake, the frequent utilization of P-cores presented difficulties in minimizing power consumption. Despite the efficiency improvements brought by the Intel 4 process, the actual core performance of Meteor Lake CPUs was observed to be inferior to that of previous generation hardware. This performance gap allowed Ryzen rivals to maintain an edge over Meteor Lake in terms of efficiency.
Lunar Lake, on the other hand, will prioritize E-cores and utilize P-cores for overflow tasks. The E-core cluster in Lunar Lake is notably sizable and potent, enabling it to handle the majority of applications solely with E-cores.
For instance, with Microsoft’s video conferencing application “Teams,” Lunar Lake can efficiently run the application predominantly on E-cores, thereby minimizing the use of power-intensive P-cores. Consequently, Lunar Lake achieves a notable reduction in power consumption by 35% compared to Meteor Lake.
Lunar Lake is set to incorporate Intel’s latest “Xe2” microarchitecture for its integrated graphics solution. Xe2 represents Intel’s next-generation GPU architecture, focusing on enhancing efficiency, performance, and compatibility.
Notably, addressing performance and compatibility has been critical, as previous solutions like ARC Alchemist struggled in these areas, offering limited functionality beyond serving as the top AV1 encoder among all three graphics card vendors.
To enhance performance, efficiency, and compatibility, the Xe2 architecture introduces a revamped vector engine capable of delivering a 50% improvement in gaming performance compared to the Xe-LPG integrated graphics found in current Meteor Lake processors.
The architecture comprises eight “Vector Engines” dedicated to vector calculations and eight “XMX” matrix math engines specialized for AI processing. These components, alongside load/store units and an expanded 192 KB of combined L1 cache and shared local memory (SLM), constitute the “Xe Core.”
The Vector Engine supports SIMD16 instructions, enhancing computational efficiency.
The XMX Engine stands out as a pivotal element of Xe2. With a computational performance reaching 67 TOPS, this engine serves as an AI processing unit previously featured in Intel’s standalone GPU series, “Intel Arc,” but omitted in the integrated GPU of Meteor Lake. However, Lunar Lake marks its resurgence.
Intel points to the resurgence of the XMX Engine as a response to the surging market demand for AI-powered applications. This demand, resembling a burgeoning bubble that has only just begun to inflate, is underscored by mandates such as Microsoft’s requirement for vendors to amass 40 TOPS of AI power to meet Copilot specifications.
One might question the necessity of both a GPU and an NPU for AI processing in Lunar Lake. Intel addresses this by explaining that the requirements vary depending on the application. NPUs are suited for scenarios where power consumption must be minimized, such as noise reduction in audio and video processing. Conversely, GPU AI accelerators excel in tasks demanding substantial processing power, such as object detection in videos and image generation.
When the AI capabilities of the GPU, NPU, and CPU are combined, the platform achieves a performance level of 120 TOPS. However, workloads that effectively leverage all three types of cores, each with distinct characteristics, are not immediately apparent.
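The split of that 120 TOPS figure is only partially disclosed: Intel quotes 67 TOPS for the GPU’s XMX engines, which leaves the remainder for the NPU and CPU combined. A quick derivation, in which only the 67 and 120 TOPS figures come from Intel’s slides and the subtraction is my own:

```python
# Back-of-envelope split of Lunar Lake's claimed platform AI throughput.
# Only the 120 TOPS platform total and the GPU's 67 TOPS are Intel figures;
# the NPU + CPU remainder is simply derived by subtraction.

PLATFORM_TOPS = 120
GPU_XMX_TOPS = 67

npu_plus_cpu_tops = PLATFORM_TOPS - GPU_XMX_TOPS
print(f"GPU (XMX engines): {GPU_XMX_TOPS} TOPS")
print(f"NPU + CPU combined (derived): {npu_plus_cpu_tops} TOPS")

# The 40 TOPS Copilot requirement cited in the presentation, for scale:
COPILOT_TOPS = 40
print(f"GPU alone vs. the {COPILOT_TOPS} TOPS Copilot bar: "
      f"{GPU_XMX_TOPS >= COPILOT_TOPS}")
```

Note that this says nothing about how the individual 48-odd TOPS remainder is divided between NPU and CPU, which Intel’s keynote did not break down.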
Each “Render Slice” integrates four of these enhanced Xe Cores and four “RTUs” (Ray Tracing Units), alongside components like samplers, geometry engines, and rasterizers. Lunar Lake’s integrated GPU boasts two of these Render Slices.
Intel reports that the performance of Lunar Lake’s integrated GPU has seen a maximum improvement of 1.5 times compared to that of Meteor Lake. Furthermore, Lunar Lake consolidates the applications previously catered to by two processors, namely the high-performance version Meteor Lake-H and the low-power version Meteor Lake-U, into a single processor.
In Lunar Lake’s Compute tile, notable features include enhancements to both the media engine and display engine. The media engine introduces support for the next-generation media codec, “H.266/VVC,” through the integration of a new decoder.
H.266/VVC is touted for its ability to reduce file sizes compared to the emerging AV1 codec. Moreover, it introduces features like “Adaptive Resolution Stream,” which seamlessly adjusts video resolution based on available bandwidth, and facilitates the decoding of 360-degree panoramic videos.
In the Display Engine, a significant aspect is its compatibility with “Embedded DisplayPort 1.5” (eDP 1.5) as the connection interface for display panels. eDP 1.5 incorporates power-saving functionalities such as “Panel Self Refresh” to lower display panel power consumption and extends the display synchronization technology “Adaptive-Sync.”
While the support for cutting-edge codecs is commendable, what truly catches my attention about Lunar Lake is the substantial shift in system memory architecture. Much like Apple’s proprietary SoC “Apple M” series, Lunar Lake integrates main memory directly onto the SoC package.
This implementation holds the promise of reducing PCB footprint, slashing power consumption, and ramping up performance. However, the limited lineup of main memory capacities, restricted to only 16GB and 32GB, raises concerns.
However, when it comes to thin notebook PCs, the demand for memory capacities exceeding 32GB is likely limited, and Lunar Lake is specifically tailored for exactly such mobile applications, distinguishing it from other designs.
The embedded memory, with its varying speeds and capacities, will undoubtedly play a crucial role in enhancing CPU performance, particularly benefiting Lunar Lake’s integrated graphics, which are highly sensitive to system memory frequency. It also means, however, that users will be unable to upgrade or expand their system’s memory in the future.
Lunar Lake will incorporate LPDDR5X memory options of 16GB and 32GB, with speeds up to LPDDR5X-8533. Moving memory onto the package saves significant motherboard space, approximately 250 sq. mm. On top of this, Intel adds what it refers to as a “Memory side cache.”
This cache consists of a substantial 8MB SRAM pool strategically positioned amidst the Media Engine and P-cores. It is intricately connected to each section of the Compute tile via a high-bandwidth “Network-On-Chip Fabric” (NOC fabric). This configuration allows for direct, rapid, and energy-efficient data exchanges within the Memory side cache, bypassing the need to access the main memory.
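Why such a small 8MB pool matters can be illustrated with a standard hit-rate-weighted latency model: every request served from the SRAM avoids a full trip to LPDDR5X. A toy sketch, in which all latency and hit-rate numbers are hypothetical illustrations rather than Intel figures:

```python
# Toy model: effective memory access latency with a memory-side cache.
# effective = hit_rate * cache_latency + (1 - hit_rate) * dram_latency
# All numbers below are hypothetical; Intel has published none of them.

def effective_latency(hit_rate: float, cache_ns: float, dram_ns: float) -> float:
    """Hit-rate-weighted average access latency in nanoseconds."""
    return hit_rate * cache_ns + (1 - hit_rate) * dram_ns

no_cache = effective_latency(0.0, 20.0, 100.0)    # every access goes to DRAM
with_cache = effective_latency(0.6, 20.0, 100.0)  # 60% of accesses hit the SRAM

print(f"Without cache: {no_cache:.0f} ns per access")
print(f"With a 60% hit rate: {with_cache:.0f} ns per access")
```

The same weighted-average logic applies to energy per access, which is presumably Intel’s real motivation in a power-constrained mobile part.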
This setup bears resemblance to the concept of “Embedded DRAM” (eDRAM) found in certain Broadwell and Skylake processors, albeit on a smaller scale and with LPDDR5X memory.
Intel’s Broadwell CPU, which debuted in 2014, introduced a groundbreaking feature known as Embedded DRAM (eDRAM). This eDRAM served as a level 4 (L4) cache, positioned between the CPU cores and the main system memory (RAM).
With a substantial capacity of 128 MB, the eDRAM cache operated at a higher frequency compared to conventional system memory solutions available at the time. While the primary aim of the vast L4 cache was to enhance integrated GPU (iGPU) performance, it also proved beneficial even when the integrated graphics were disabled. In such cases, the main CPU cores could leverage the faster data access and throughput offered by the eDRAM.
The integration of eDRAM into Intel’s Broadwell CPUs played a crucial role in addressing memory latency issues, resulting in enhanced system responsiveness and performance, particularly in tasks like gaming. However, this innovative design came with a significant cost, as reflected in the atypical chiplet die configuration.
Intel’s decision to abandon this approach in subsequent architectures stemmed from their dominant position in the CPU market at the time. However, this move ultimately proved to be a double-edged sword. While Intel pioneered groundbreaking innovations in the consumer market, their subsequent stagnation with multiple generations of the Skylake architecture allowed competitors like AMD to not only catch up in terms of performance metrics but also surpass them.
AMD’s success in gaming performance, particularly with its innovative 3D-stacked “V-Cache” designs, illustrates the shifting dynamics in the CPU market and the importance of large pools of secondary cache, which only began to bear fruit as video games became far more demanding of graphics memory.
In addition to spending an entire hour discussing Lunar Lake CPUs, products built using their competitor’s foundries and not expected to be commercially available until 2025, Intel also announced a new range of Xeon processors. Surprisingly, this comes as AMD recently hit a record high in server and desktop market share.
With AMD’s EPYC processors effectively dominating Intel by offering more CPU cores in a more efficient package with comparable IPC figures, AMD has achieved a record 23.6% market share in the server ecosystem. As a result, more institutions are transitioning from Xeon to EPYC, with an increasing number of corporations making the switch.
AMD’s primary challenge in capturing more market share has been the limited availability of TSMC wafers. Now, with Intel choosing to leverage TSMC foundries for their processors, one has to wonder if this move is strategically aimed at reallocating production capacity away from Ryzen.
AMD has been steadily gaining market share, not only in desktop segments but crucially in mobile as well. This shift by Intel could potentially impact AMD’s growth trajectory by constraining their access to the necessary manufacturing resources.
Nevertheless, during their COMPUTEX presentation, Intel announced “Xeon 6,” a new lineup comprising two families. The E-core-powered Sierra Forest models with up to 144 cores are launching today, with 288-core models to follow next year.
Those efficiency-focused 288-core chips will eventually compete with AMD’s upcoming density-focused 192-core EPYC Turin models, which hit the market later this year.
The P-core Granite Rapids models, featuring up to 86 P-cores, are set to launch in Q3 of 2024. Early next year, Intel plans to release Granite Rapids models with up to 128 cores.
These will compete with AMD’s performance-focused Turin models, which will also arrive later this year with up to 192 cores. Intel will further expand its Xeon 6 lineup in Q1 of 2025 with variants optimized for network and edge applications.
Intel’s Sierra Forest chips are the first to utilize the new Intel 3 process node, offering significant power and performance improvements. These chips are specifically designed for web and scale-out containerized microservices environments, networking, content delivery networks, cloud services, and AI workloads.
Sierra Forest aims to deliver exceptional performance consistency with a strong emphasis on power efficiency, thanks to the Intel 3 process node and Crestmont cores.
Compared to the previous 5th generation “Xeon Scalable Processor,” the 6th generation Xeon chips boast a more modular design. While the “I/O die,” responsible for input/output management, remains unchanged, the “Compute die,” which houses the CPU, can be flexibly configured according to the required number of CPU cores.
I much prefer this design over Intel’s previous approach of essentially gluing two massive monolithic dies together to create one mega monstrosity, which ultimately got demolished by Zen 4-based EPYC processors.
Despite using vastly inferior “efficient cores,” Intel positions the 6700E models against AMD’s P-core-powered EPYC processors. AMD’s density-focused Bergamo platform offers 12 memory channels, whereas the 6700E series supports eight memory channels to reduce costs.
Intel presented a series of benchmarks highlighting its performance advantages across various workloads, with many comparisons focusing on the generational performance improvements expected from upgrading to these newer models.
But given how AMD EPYC processors have been absolutely dominating performance metrics and the market, coupled with Intel’s speculative performance figures for Lunar Lake with a 10% variance, it’s hard to believe that Intel’s self-produced benchmarks hold any tangible merit, even with 144 Meteor Lake E-cores.
On the other hand, the Xeon 6 P-cores are targeted at big data analysis, design work, AI learning, and inference, similar to traditional Xeon processors. These CPUs use the same “Redwood Cove” core architecture found in Meteor Lake’s Core Ultra P-cores.
While detailed performance metrics were not disclosed, Intel claims that in AI inference processing, these new Xeon 6 processors will achieve up to 3.7 times the performance of AMD’s EPYC server CPUs. Additionally, they are said to offer about 2 to 2.3 times the performance of the 5th generation Xeon Scalable Processor in terms of AI performance and HPC.
And that concludes an hour of Intel CEO Pat Gelsinger’s keynote address. The presentation focused primarily on their next-generation mobile processors, Lunar Lake, inadvertently revealing the dire state of Intel’s own foundries.
While Intel did announce some new Xeon processors that are expected to garner attention upon release, they are likely to continue trailing behind AMD’s considerable performance lead in this segment.
With Zen 5-based EPYC processors set to debut later this year, featuring various configurations of Zen 5 and Zen 5C compact cores that crucially share the same ISA, Intel is faced with a formidable challenge.
Intel’s Xeon 6 lineup, which includes models with up to 144 E-cores derived from Meteor Lake processors, seems to prioritize core count over actual performance throughput, similar to their hybrid desktop CPUs.
Even with the substantial performance differences between Crestmont and Redwood Cove cores, it’s doubtful that Intel’s 128 performance cores will effectively compete against an AMD EPYC processor with up to 192 cores, even if they are a mix of Zen 5 and Zen 5C cores.
Nowhere in the presentation was there any mention of Arrow Lake, which shares the same architectural foundation as Lunar Lake. Arrow Lake is expected to release later this year to compete against AMD’s recently announced Ryzen 9000 series.
Previous leaks suggest there is little to be enthusiastic about in the single-core performance improvements when transitioning from Raptor Lake to Arrow Lake, especially on the desktop. Instead of concrete details, Intel’s entire presentation seemed to rely on “trust me bro” performance metrics based entirely on hypothetical internal evaluations. This keynote likely suggests that Intel does not hold much confidence in its 15th generation desktop processors.