A Thermal Model for Integrated Circuits (ICs) for Power Delivery Estimation for Realistic Power Map Including the Hot Spots

Master's Degree Programme in Mechanical Engineering (Smart Systems)
Master's thesis
Author(s): Malik Quamrus Samawat
12.07.2024
Turku

The originality of this thesis has been checked in accordance with the University of Turku quality assurance system using the Turnitin Originality Check service.

Acknowledgement

I would like to express my gratitude to the company for allowing me to do my thesis with them. This opportunity has been enlightening and enjoyable, and it was also a valuable networking opportunity. I sincerely thank my manager, supervisor, and colleagues for their support throughout my thesis work. During this work I was introduced to the Finnish work culture, and I appreciate and cherish every moment I spent at the company.

I would like to thank Professor Wallace Moreira Bessa and Dr. Andrey Mityakov from the University of Turku for being my thesis supervisor and co-supervisor, respectively, and for guiding me during this thesis work and writing.

Master's thesis
Subject: Master's Degree Programme in Mechanical Engineering (Smart Systems)
Author(s): Malik Quamrus Samawat
Title: A Thermal Model for Integrated Circuits (ICs) for Power Delivery Estimation for Realistic Power Map Including the Hot Spots
Supervisor(s): Professor Wallace Moreira Bessa and Dr. Andrey Mityakov
Number of pages: 72 pages
Date: 12.07.2024

Abstract

The thermal behavior of integrated circuits (ICs) is usually represented by IC thermal models. Such a model is a simplification of the detailed IC package, and its main use case is in electronic design. IC thermal models help in predicting and analyzing heat dissipation within an IC and its surrounding environment. Heat dissipation inside an IC determines its temperature, which has a direct effect on its performance, reliability, and efficiency.
Although some heat dissipation is expected, more than the optimal amount adversely affects the performance and power consumption of the designed IC. Hence, it is crucial to study the thermal characteristics of ICs to estimate their lifespan and efficiency. IC thermal models and related simulations are used for power mapping and packaging in the design process. In practice, however, thermal models become available only at the later stages of the IC development process, when it is already too late to modify and optimize the design. Thermal models and simulations can instead be used to identify potential thermal challenges during the conceptual and early design phases of System-on-Chip (SoC) development, allowing research teams to address thermal issues before they become critical. A realistic thermal model gives a holistic view of the thermal landscape, which helps both the SoC R&D (Research and Development) and the BTS (Base Transceiver Station) unit PD (Product Development) teams make informed joint decisions on present and upcoming thermal challenges. It also helps to validate proposed thermal solutions and strategies and encourages engineers to adopt the most effective thermal management techniques. Eventually, this is cost-effective and saves considerable resources. The aim of this thesis is to provide a realistic and accurate power map of the package through system-level simulations at various stages of the chip design process, especially the earlier phases. The objectives include proposing a model that enhances the fidelity and reliability of system-level simulations by considering various scenarios. The research is carried out in two phases. Initially, a thermal model is proposed based on worst-case scenarios. As the research moves ahead, alternatives are explored, and the model is improved gradually.
Finally, an accurate thermal model for power delivery is proposed to provide a realistic power map including the hot spots, and the thermal model is fine-tuned for specific use cases. This thesis explores different IC designs to find realistic power mapping and packaging techniques through system-level simulations. It aims to provide practical methodologies for various stages of the IC design process through the implementation of the proposed thermal model. The proposed model provides a better understanding of the thermal dynamics of ICs. Furthermore, it links the IC design with BTS unit thermal optimization in the early phase and provides accurate hot spot information for dimensioning system-level thermal solutions. Thus, this approach helps engineers make informed decisions and optimize the efficiency and reliability of their designs in the early phases of IC development.

Key words: Heat, heat transfer, integrated circuits (IC), system-on-chips (SoC), thermal behaviours, thermal modelling techniques, thermal simulation tools, chip thermal model, thermal analysis, chip thermal profile.

Table of contents
1 Introduction
2 Heat – Generation, Dissipation & Transfer
2.1 Generation of Heat
2.2 Dissipation of Heat
2.3 Heat Transfer
2.3.1 Conduction
2.3.2 Convection
2.3.3 Radiation
2.3.4 Combined Mode of Heat Transfer
2.4 Boundary Conditions
3 Integrated Circuits (IC) and System-on-Chips (SoC)
3.1 Integrated Circuits (ICs)
3.1.1 Evolution of ICs
3.1.2 Types of ICs
3.2 System-on-Chip (SoC)
3.2.1 Evolution of SoC
3.2.2 SoC Architecture
3.2.3 SoC Design Flow
3.2.4 Why SoC?
4 Thermal Behaviours of ICs and their impact on SoC design
4.1 Thermal Properties of Semiconductors
4.2 Impact of Temperature on Semiconductor Devices
4.2.1 Electromigration
4.2.2 Reduced Carrier Mobility
4.2.3 Thermal Stress
4.2.4 Thermal Fatigue
4.3 Thermal Stress and SoC design
4.4 Thermal Management of SoCs
4.4.1 Heatsink
4.4.2 Fins
4.4.3 Thermal Vias
4.4.4 Internal Fans
4.4.5 Other Cooling Mechanisms
4.5 Thermal Analysis Trends in SoC Development
5 Thermal Modeling Techniques and Tools for Thermal Simulation
5.1 Thermal Modelling Techniques
5.1.1 Compact Thermal Models
5.1.2 Detailed Thermal Models
5.1.3 Thermal Models Derived through Numerical Methods
5.1.4 Dynamic or Transient Thermal Model
5.1.5 Electro-Thermal Co-Simulation
5.2 Software Tools for Thermal and Electro-thermal Modelling
6 Development of Chip Thermal Model Using Modeling Tool
6.1 Introduction to the Tool Used for Generating the Thermal Model and its Analysis
6.2 The IC Thermal Model & its Analysis
6.2.1 Power Profile Generation
6.2.2 Chip Thermal Model (CTM) Generation
6.2.3 Chip Package Level Thermal Analysis
7 Chip Package Level Thermal Analysis & Discussions about Generated Thermal Profiles
7.1 First Iteration of Package Level Thermal Analysis
7.2 Second Iteration of Package Level Thermal Analysis
7.3 Final Iteration of Package Level Thermal Analysis
7.4 Results & Discussions
7.5 Challenges
7.5.1 Power Profiles for IPs
7.5.2 Collecting Data
7.5.3 Communication
7.6 Limitations
7.7 Future Studies
8 Conclusion
References
Nomenclature
Appendices
Appendix 1 Approximate values of heat transfer coefficient

1 Introduction

Since the beginning of civilization, society has depended on machines and technology for advancement, and civilization has evolved through several industrial revolutions.
At the beginning, people mostly used mechanical machinery, which was later replaced by electronic equipment. In the present era of modernization and digitalization, there is hardly anyone who does not use at least one digital device regularly. As we move towards the future, we are learning to develop new technologies with our existing resources, but we are not limited by them: researchers and engineers are constantly searching for new solutions and upgrading existing ones while making the most of the available resources. The effect of major innovations can be seen in almost every field and industry today. In particular, because current cutting-edge technology is highly dependent on semiconductor devices such as diodes, transistors and ICs [1], major R&D is expected and carried out in the semiconductor industry. Semiconductor devices exploit the electrical properties of semiconductor materials to operate [1]. The most popular semiconductor device is the MOSFET [1], which is widely used to build the digital integrated circuits (ICs), or chips, found in digital devices. Since digital devices affect everyday life and play a key role in accelerating various businesses [2], the semiconductor industry has boomed over the years. Market research shows that semiconductor industry sales grew by approximately 20% in 2021 [2]. With the rapid growth of AI technology, electric vehicles, automation, and the need for advanced and reliable communication [2], researchers and designers of semiconductor devices are more active than ever in developing new solutions for the existing market and in capturing the future market. Since communication and networking solutions were digitalized, this sector has needed innovative, secure, and reliable solutions for its infrastructure. Since their introduction in the 1980s [3], mobile networks have progressed significantly.
The companies building this industry have become increasingly competitive and continuously invest in R&D to develop advanced solutions and maximize their market share. The mechanical part of the infrastructure used for networking solutions is large and requires a lot of work to upgrade, whereas changing the digital part, i.e., the ICs of these devices, requires much less installation effort. As most of the efficiency of their networking solutions comes from advanced ICs, it makes sense to invest in developing ICs that outperform those available on the market, or the company's own previous generation, in terms of efficiency, reliability, security, and speed. With this in mind, companies providing networking solutions spend a major portion of their resources on designing and developing their own chips. When IC design engineers develop a design, they place thousands of electronic components in it [4]. When the IC is connected to a device and runs, its components consume power, which produces heat inside the IC and raises its temperature. Since ICs are built from semiconductors, they are highly sensitive to the heat dissipated inside them [4]. Excessive heat inside the IC may cause malfunction or, in extreme cases, a complete breakdown of the device [4]. If the amount of heat dissipated inside the IC is not known in advance, the cooling system cannot be designed to mitigate its effect, which makes the IC and the system very unreliable. No matter how accurate or efficient a system is, customers in the communication and networking industry do not accept an unreliable one. Reliability is one of the principal requirements; hence, it is crucial for companies to know the thermal profile of the IC in order to design the cooling system and packaging that ensure the reliability and guarantee the longevity of their networking solutions.
To ensure that the temperature and heat dissipation inside the IC do not lead to a breakdown, design engineers study thermal models of the IC. Industry experts have pointed out that the thermal model usually becomes available only in the final phases of the chip development process. Even though this model helps engineers find the hot spots inside the IC, at this stage it is too expensive to redesign the IC to reduce heat generation. Engineers can only design the cooling system to minimize the effect, and this is also an expensive solution. Hence, the question arises: is it possible to develop a thermal model early enough to find potential hot spots, to take redesign decisions for the IC components that reduce their power consumption and thus their heat generation, and to help engineers understand the cooling needed to dissipate the generated heat, all before a prototype of the IC is built (the post-silicon stage)? This research aims to address this question and find a solution to it. The research is carried out in three phases. First, a thermal model is proposed based on worst-case scenarios; as the research moves ahead, alternatives are explored, and the model is improved gradually. Second, an accurate thermal model for power delivery is proposed to provide a realistic power map including the hot spots and to fine-tune the thermal model for specific use cases. Finally, a tunable model is introduced that can be used as a form of digital twin to map power in different real-life use cases from the very beginning of the IC development process. This thesis contains detailed information on my studies prior to starting the research, the work itself, and the results. Chapter 2 gives a general introduction to heat generation and the natural modes of heat transfer. Chapter 3 introduces ICs and SoCs. Chapter 4 discusses the thermal behavior of ICs and its impact on SoC design.
Chapter 5 discusses various thermal modeling techniques and the tools available to support thermal modeling. Chapter 6 describes the detailed process of developing the thermal model, while Chapter 7 discusses the analysis of the model and its outcome. The challenges faced during the research, its limitations and the scope for future work are also presented in Chapter 7. Finally, the work is summarized and concluded in Chapter 8.

AI Disclaimer: While writing and compiling this thesis, I used AI tools such as ChatGPT by OpenAI and Gemini by Google for grammar checking, advanced sentence formation, and searching and sorting relevant materials. I reviewed the output generated by these tools and made the necessary modifications before using it, and I take full responsibility for this publication and its contents.

2 Heat – Generation, Dissipation & Transfer

Everything in the world runs on multiple forms of energy, of which thermal energy is one of the dominant ones. A major part of the energy available in nature is in the form of heat. Heat is thermal energy in transit: it flows between two bodies because of the temperature difference between them.

2.1 Generation of Heat

Heat is generated through the friction of the particles, such as atoms and molecules, that make up matter [5, 6]. It is produced by various mechanisms, such as chemical reactions, rubbing of material surfaces against each other, passing electric current through matter, nuclear reactions, and mechanical work [6]. Under these external influences, the atoms and molecules of the matter vibrate more vigorously, and this kinetic energy manifests as heat [5, 6]. As a result, the temperature of the matter rises.

2.2 Dissipation of Heat

Once heat is generated, matter usually releases, or dissipates, it into the surrounding environment as it tends towards thermal equilibrium [5, 6]. The direction of heat dissipation is determined by the temperature difference.
Heat is dissipated from a hotter surface to a relatively cooler environment until the temperature difference between the media becomes zero [6].

2.3 Heat Transfer

Heat dissipation is caused by the transfer of heat from one medium to another. The rate of heat transfer needs to be studied when designing any equipment that generates heat during operation [7]. Heat transfer generally takes place through three modes: conduction, convection, and radiation [5, 6, 7].

2.3.1 Conduction

Conduction is a mode of heat transfer that occurs in both solids and fluids. Heat is transferred by conduction when materials at different temperatures are in physical contact with each other [7]. The rate equation for this mode of heat transfer is based on Fourier's law of heat conduction, which states: "The heat flow by conduction in any direction is proportional to the temperature gradient and area perpendicular to the flow direction and is in the direction of the negative gradient." [7]. The proportionality constant in this law is the thermal conductivity, and the law is written as:

$$Q = -kA\,\frac{dT}{dx} \qquad \text{(i)}$$

where Q is the heat flow, k is the thermal conductivity, A is the area perpendicular to the direction of heat flow, and dT/dx is the temperature gradient, i.e., the temperature difference dT across a material thickness dx. The minus sign indicates that heat flows from higher to lower temperature. The unit of thermal conductivity is watt per meter kelvin, W/(m·K) [7]. Since a temperature difference has the same value whether measured in degrees Celsius or in kelvin, the temperature scale does not affect the unit of thermal conductivity [7]; however, T usually denotes temperature in kelvin. Thermal conductivity is specific to each material and can vary with temperature and pressure. Good conductors have high thermal conductivity at room temperature.
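Equation (i) can be evaluated directly for a plane layer when the thermal conductivity is taken as constant and the temperature profile is linear, so that dT/dx becomes ΔT/Δx. The Python sketch below is only an illustration under those assumptions: the function names are hypothetical, and the example values (a silicon-like layer with k ≈ 150 W/(m·K), a 10 mm × 10 mm area and a 0.5 mm thickness) are assumed round numbers, not data from this work. It also expresses the layer as a conductive thermal resistance Δx/(kA), which corresponds to the L/k term of equation (iv) once it is divided by the area.

```python
def conduction_heat_flow(k: float, area: float, delta_t: float, thickness: float) -> float:
    """Magnitude of the steady heat flow through a plane layer, equation (i).

    Assumes constant k and a linear temperature profile, so dT/dx = delta_t / thickness.
    k         thermal conductivity, W/(m*K)
    area      area perpendicular to the heat flow, m^2
    delta_t   temperature difference across the layer, K (same value in degrees Celsius)
    thickness layer thickness in the direction of heat flow, m
    Returns the heat flow in W, directed from the hot side to the cold side.
    """
    if k <= 0 or area <= 0 or thickness <= 0:
        raise ValueError("k, area and thickness must be positive")
    return k * area * delta_t / thickness


def conductive_resistance(k: float, area: float, thickness: float) -> float:
    """Thermal resistance of the layer in K/W, so that Q = delta_t / R."""
    return thickness / (k * area)


if __name__ == "__main__":
    k_si = 150.0            # assumed conductivity, roughly bulk silicon near room temperature
    area = 10e-3 * 10e-3    # assumed 10 mm x 10 mm area, m^2
    thickness = 0.5e-3      # assumed 0.5 mm thickness, m
    q = conduction_heat_flow(k_si, area, 2.0, thickness)
    r = conductive_resistance(k_si, area, thickness)
    print(f"Q = {q:.1f} W, R = {r:.4f} K/W")
    # A 2 K difference written in kelvin (300.15 K - 298.15 K) gives the same heat flow.
    q_kelvin = conduction_heat_flow(k_si, area, 300.15 - 298.15, thickness)
    print(f"Q from kelvin input = {q_kelvin:.1f} W")
```

For these assumed values the layer conducts about 60 W for a 2 K difference, which illustrates why even thin, highly conductive layers matter when the power densities of an IC are high.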
The temperature dependence differs between conductors and insulators. In good conductors, as the temperature rises, lattice vibration at the atomic level increases and scatters the electrons unevenly, so the thermal conductivity decreases [7]. In insulators, by contrast, thermal conductivity increases with temperature; according to [7], this is because lattice vibration initiates the movement of electrons at the atomic level, which does not usually happen at room temperature.

2.3.2 Convection

Convection is typically seen in fluids, i.e., liquids and gases. In natural convection, the fluid near a hotter surface is heated and rises, pushing cooler fluid from above towards the warm surface. As this cooler layer is in turn warmed, it also rises and pushes a new layer of cooler fluid towards the surface. This continues until the whole system reaches thermal equilibrium. The rate equation for convective heat transfer is Newton's law of cooling, which incorporates all the parameters affecting the convection process into a single heat transfer coefficient [7]:

$$Q = hA\,(T_1 - T_2) \qquad \text{(ii)}$$

where Q is the heat flow, h is the convective heat transfer coefficient, A is the surface area, and $T_1$ and $T_2$ are the temperatures of the hotter surface and the cooler fluid, measured in kelvin. The unit of h is watt per square meter kelvin, W/(m²·K).

2.3.3 Radiation

Radiation, the third mode of heat transfer, requires no medium; however, the surfaces must be in line of sight of each other for radiative exchange to take place [7]. In radiation, heat is transferred in the form of electromagnetic waves or particles (photons) [7].
The rate equation for this mode of heat transfer is obtained from the Stefan-Boltzmann law, according to which the heat radiated by a surface is proportional to the fourth power of its absolute temperature [7]. The net heat exchanged between two surfaces is therefore:

$$Q = F\sigma A\,(T_1^4 - T_2^4) \qquad \text{(iii)}$$

where Q is the heat flow, F is a factor depending on the geometry and surface properties, σ is the Stefan-Boltzmann constant with a value of 5.67 × 10⁻⁸ W/(m²·K⁴), A is the surface area, and $T_1$ and $T_2$ are the absolute temperatures, in kelvin, of the hotter and cooler surfaces.

2.3.4 Combined Mode of Heat Transfer

It is easier to understand the modes of heat transfer separately, but in practice multiple modes take place at the same time [7]. Heat generation can also occur simultaneously with heat dissipation. Hence, all these factors need to be considered when designing any electromechanical system. Figure 1 represents a combined mode of heat transfer. Here, a wall receives heat on one side, from surroundings at temperature $T_1$, through convection and radiation, with convection and radiation coefficients $h_1$ and $h_{r1}$, respectively. The heat is conducted through the wall and is then transferred by convection and radiation to the surroundings at temperature $T_2$ on the other side, where the coefficients are $h_2$ and $h_{r2}$, respectively.

Figure 1. Combined mode of heat transfer. [7]

If k is the thermal conductivity of the wall material, L is the wall thickness and A is the area, the heat flow for the combined mode is:

$$Q = \frac{A\,(T_1 - T_2)}{\dfrac{1}{h_{r1} + h_1} + \dfrac{L}{k} + \dfrac{1}{h_{r2} + h_2}} \qquad \text{(iv)}$$

2.4 Boundary Conditions

Besides the heat transfer coefficients and thermal conductivity, solving the heat transfer equations in a computational fluid dynamics (CFD) analysis requires at least two boundary conditions to be defined for each coordinate direction along which heat transfer is significant [8].
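As a minimal illustration of why two conditions per direction are needed, consider steady one-dimensional conduction through a plane layer without internal heat generation. Its governing equation, $d^2T/dx^2 = 0$, is second order, so one condition is required at each end. The Python sketch below is a hypothetical example, not the modelling tool used later in this thesis: it fixes the temperature at $x = 0$ (a Dirichlet condition), applies a convection condition $-k\,dT/dx = h\,(T - T_\infty)$ at $x = L$, where $T_\infty$ is an assumed ambient fluid temperature, discretizes the layer with finite differences, and solves the resulting tridiagonal system with the Thomas algorithm. All numerical values are assumptions chosen for illustration; approximate values of the heat transfer coefficient are listed in Appendix 1.

```python
def thomas_solve(a, b, c, d):
    """Solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i] (a[0] and c[-1] are ignored)."""
    n = len(d)
    c_star = [0.0] * n
    d_star = [0.0] * n
    c_star[0] = c[0] / b[0]
    d_star[0] = d[0] / b[0]
    for i in range(1, n):  # forward elimination
        denom = b[i] - a[i] * c_star[i - 1]
        c_star[i] = c[i] / denom
        d_star[i] = (d[i] - a[i] * d_star[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d_star[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = d_star[i] - c_star[i] * x[i + 1]
    return x


def slab_temperatures(k, length, h, t_left, t_inf, nodes=21):
    """Steady 1D conduction in a plane layer without heat generation.

    x = 0:      Dirichlet condition, T = t_left
    x = length: convection condition, -k dT/dx = h (T - t_inf)
    Returns the nodal temperatures from x = 0 to x = length.
    """
    if nodes < 3:
        raise ValueError("need at least three nodes")
    dx = length / (nodes - 1)
    a = [0.0] * nodes
    b = [0.0] * nodes
    c = [0.0] * nodes
    d = [0.0] * nodes
    b[0], d[0] = 1.0, t_left                 # fixed surface temperature
    for i in range(1, nodes - 1):            # discrete d2T/dx2 = 0
        a[i], b[i], c[i] = 1.0, -2.0, 1.0
    # Convection node, one-sided difference: k (T[n-2] - T[n-1]) / dx = h (T[n-1] - t_inf)
    a[-1] = k / dx
    b[-1] = -(k / dx + h)
    d[-1] = -h * t_inf
    return thomas_solve(a, b, c, d)


def series_heat_flux(k, length, h, t_left, t_inf):
    """Closed-form heat flux in W/m^2 from the series resistances L/k and 1/h."""
    return (t_left - t_inf) / (length / k + 1.0 / h)


if __name__ == "__main__":
    k, length, h = 150.0, 0.5e-3, 100.0   # assumed: silicon-like layer, assumed h in W/(m^2*K)
    t_left, t_inf = 100.0, 25.0           # assumed temperatures, degrees Celsius
    temps = slab_temperatures(k, length, h, t_left, t_inf)
    q_fd = k * (temps[0] - temps[-1]) / length
    q_exact = series_heat_flux(k, length, h, t_left, t_inf)
    print(f"T(L) = {temps[-1]:.3f} C, q = {q_fd:.1f} W/m^2 (series resistance: {q_exact:.1f})")
```

Because the exact temperature profile in this case is linear, the finite-difference result reproduces the series-resistance heat flux up to rounding; this is the same structure as equation (iv) with a fixed surface temperature on one side and no radiation on the other. With internal heat generation or temperature-dependent conductivity, the profile is no longer linear and the grid resolution starts to matter.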
The following four boundary conditions are commonly used for heat transfer calculations:

Dirichlet Boundary Condition: prescribes a fixed temperature at the surface of the material [8].
Neumann Boundary Condition: prescribes the heat flux at the surface of the material. Through Fourier's law of heat conduction, this is equivalent to prescribing the temperature gradient at the surface multiplied by the thermal conductivity of the material [8].
Convection and Radiation Boundary Condition: relates the heat flux at the surface to the difference between the surface temperature and the ambient temperature through a heat transfer coefficient defined for the specific conditions of convective heat exchange [8]. A similar boundary condition can be derived for radiation [8].
Interface Boundary Condition: important when the body is made of multiple layers of different materials [8]. It can be applied only if two conditions are met: the temperature must be the same on both sides of the contact area, and the interface cannot store any energy, so the heat flux leaving one layer equals the heat flux entering the next [8].

The rate of heat transfer varies not only with the mode of heat transfer but also depending on whether the system is in a steady or a transient state. Other factors, such as the laws of thermodynamics, Newton's laws of motion, the law of conservation of mass and the rate equations for the different modes of heat transfer, also affect the heat dissipation rate of a system. Heat transfer is a complex area of physics that must be carefully considered for any electromechanical system, especially one as small as an IC.

3 Integrated Circuits (IC) and System-on-Chips (SoC)

We live in an era where we can hardly imagine our lives without electronics. Every day, from the moment we wake up until we go to bed, and even while we sleep, we depend on electronics directly or indirectly. From small devices such as smartwatches to huge satellites for the defence industry, electronic devices are used everywhere. Integrated circuits (ICs) are the main building blocks of this vast world of modern electronics.
This chapter briefly discusses ICs, their evolution over the years and their types. In addition, it explains SoCs, their design flow and their importance in modern digital devices.

3.1 Integrated Circuits (ICs)

ICs are made of semiconductor materials such as silicon. They are minute electronic chips fabricated with great precision in specialized fabrication facilities. A modern IC can contain billions of transistors connected to perform various functions. Although minute in structure, ICs are powerful enough to perform the full functionality of what were once gigantic electronic devices, and they have revolutionized and modernized today's world. Figure 2 shows an example of a complex IC package.

Figure 2. Integrated circuit (IC). [9]

The IC, or die, itself is very small and therefore needs a package with connectors to attach it to the device. The package typically has multiple pins, or leads, made of conducting material, through which it is connected to the rest of the device.

3.1.1 Evolution of ICs

ICs are widely used because of their miniature structure, diverse functionality, high efficiency, and cost-effectiveness. From toothbrushes to mobile phones, toy cars to airplanes, and video games to spaceships, almost all devices today use ICs. Depending on their requirements and design, ICs may act as processing units, amplifiers and/or memory units inside these devices. When ICs were first introduced in the late 1950s, they had only a few transistors [18]. However, they soon gained popularity, and intense research in this area made it possible to integrate thousands of transistors in one IC within a decade. As the capacity of ICs increased, Intel co-founder Gordon Moore saw their potential and observed that the number of transistors on a chip doubles approximately every two years [19]. This is popularly known as Moore's Law.
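Moore's observation corresponds to exponential growth, $N(t) = N_0 \cdot 2^{t/T_d}$, where $N_0$ is the initial transistor count and $T_d$ is a doubling period of about two years. The short Python sketch below only illustrates how quickly this compounds; the starting count of 1,000 transistors is a hypothetical round number rather than historical data, and the function names are illustrative.

```python
import math


def projected_transistor_count(initial_count: float, years: float,
                               doubling_period_years: float = 2.0) -> float:
    """Transistor count after `years` if it doubles every `doubling_period_years`."""
    return initial_count * 2 ** (years / doubling_period_years)


def years_to_reach(initial_count: float, target_count: float,
                   doubling_period_years: float = 2.0) -> float:
    """Inverse of the growth law: time needed to grow from initial_count to target_count."""
    return doubling_period_years * math.log2(target_count / initial_count)


if __name__ == "__main__":
    start = 1_000  # hypothetical starting transistor count
    print(f"after 20 years: {projected_transistor_count(start, 20):,.0f} transistors")
    print(f"years to reach one billion: {years_to_reach(start, 1e9):.1f}")
```

Ten doublings in 20 years multiply the count by 1,024, and growing from a thousand to a billion transistors takes about 40 years at this rate.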
Over the following decades, technological advances introduced large-scale and very-large-scale integration (VLSI), which made it possible to shrink ICs while increasing their capability by incorporating millions of components in them. Current technologies can accommodate billions of transistors, forming millions of components, within a chip with an area of a few square millimetres. Research is ongoing all over the world to make chips even smaller while maximizing their capacity and performance.

3.1.2 Types of ICs

ICs are used in different devices for different purposes. Based on their complexity and purpose, ICs can be classified into broad categories such as digital, analogue, mixed-signal, memory, and application-specific ICs (ASICs) [18, 12]. Digital ICs are mostly used in digital devices that need a lot of processing power. Such ICs contain many logic gates, multiplexers, flip-flops and other circuits fitted into an area of a few square millimetres. Analogue ICs, on the other hand, are used for continuous signal processing; they contain fewer logic gates but are quite complex to design [18]. Mixed-signal ICs combine digital and analogue circuitry and are used in communication devices, portable electronics, phones, cars, etc. Most of today's electronic devices are built with mixed-signal ICs. Memory ICs and ASICs are the most advanced forms of ICs [12]. Memory ICs function as data storage and are the building blocks of RAM (Random Access Memory) and ROM (Read-Only Memory). They are the largest in terms of transistor count, have the highest capacity and need the fastest tools for simulation. ASICs are customized ICs built to perform a specific task with the highest efficiency [18].

Silicon dies are extremely small and delicate to handle.
It is almost impossible to handle them directly. Hence, ICs are packaged: the package consists of a protective enclosure made of non-conducting materials, with connecting pins coming out of the enclosure so that users can connect the IC to a circuit board [18].

3.2 System-on-Chip (SoC)

The most popular gadgets today are generally small and offer high performance while consuming as little power as possible. To build such electronic devices, engineers need to integrate the functions of multiple ICs on a single chip. Such a chip must be capable of acting as a complete system on its own and should therefore include its own processing units, such as a CPU (Central Processing Unit) and a GPU (Graphics Processing Unit), video and audio processing, an embedded memory system, etc. [13]. Figure 3 shows a typical SoC. Each smaller block works as an individual unit; together, they make up a whole chip that replicates a system. Hence the name system-on-chip, or SoC.

Figure 3. System-on-Chip (SoC) diagram.

Compared with a device incorporating multiple ICs, a device with an SoC has much lower power consumption and a smaller die area [14]. It also offers higher performance at a lower cost. Thus, SoCs are popular in mobile devices, automotive systems, IoT (Internet of Things) devices, consumer and medical electronics, gaming consoles, etc.

3.2.1 Evolution of SoC

With the advancement of IC design and technology, very-large-scale integration (VLSI) techniques came into the industry. In the 1990s, several uniprocessors aimed at multimedia and communication applications were introduced using VLSI technology [15]. Chips were developed using VLIW (very long instruction word) processors, which enabled parallel processing and simplified programmability [15].
Continuous development in this field gave rise to the concept of the SoC as engineers started to integrate modems, Bluetooth, Wi-Fi and other components on the same chip. The Lucent Daytona was the first multiprocessor SoC, designed for wireless base stations, in which the same signal processing is performed on multiple data channels [16]. With the boom in wireless communication and cellular technology, SoCs needed to become more powerful and offer higher processing capability for the integrated graphical interfaces of mobile phones. Today, SoCs are designed for highly specialized applications such as industrial automation, machine learning (ML), AI (Artificial Intelligence), edge computing and many more.

3.2.2 SoC Architecture

SoCs may have a single-core or a multicore architecture [17]. In a single-core architecture, the SoC has only one CPU and one or more ASICs, whereas a multicore architecture implements multiple processing units inside the same SoC [17]. As mentioned earlier, each SoC includes multiple components that together make up a full system. Figure 4 [18] shows a typical SoC architecture with a CPU, DSP (Digital Signal Processor), memory units, storage, USB, UART, an RF (Radio Frequency) unit, etc.

Figure 4. A typical SoC architecture. [18]

Processors are the vital components of the SoC, and the DSP performs all the work related to data collection and processing [18]. Memory is used for data storage and is implemented in the form of SRAM (Static Random-Access Memory), DRAM (Dynamic Random-Access Memory) and ROM (Read-Only Memory). Encoders and decoders are used for encoding and decoding information [18], and RF units are important for data transfer and communication. A network interface card helps create a connection between the chip and the network [18]. Other peripheral devices or components may be included in the SoC or connected to it externally to ensure its smooth performance.
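The block-level view of an SoC in Figure 4 maps naturally onto a simple data structure, and attaching an estimated power and area to each block gives the first ingredient of a block-level power map. The Python sketch below is a hypothetical illustration only: the block names loosely follow Figure 4, but the power and area values are assumed round numbers, not data from the SoCs studied in this thesis, and the class and function names are illustrative. It shows how the average power density of a die can hide a local hot spot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """One functional block of an SoC (hypothetical values for illustration)."""
    name: str
    power_w: float   # assumed dissipated power, W
    area_mm2: float  # assumed footprint on the die, mm^2

    @property
    def power_density(self) -> float:
        """Power per unit area in W/mm^2; high values point to potential hot spots."""
        return self.power_w / self.area_mm2


def summarize(blocks: list[Block]) -> tuple[float, float, Block]:
    """Return total power (W), average power density (W/mm^2) and the densest block."""
    total_power = sum(b.power_w for b in blocks)
    total_area = sum(b.area_mm2 for b in blocks)
    densest = max(blocks, key=lambda b: b.power_density)
    return total_power, total_power / total_area, densest


EXAMPLE_BLOCKS = [  # hypothetical SoC floorplan summary
    Block("CPU", 4.0, 8.0),
    Block("DSP", 3.0, 4.0),
    Block("Memory", 1.5, 10.0),
    Block("RF", 1.0, 3.0),
    Block("USB/UART", 0.2, 1.0),
]

if __name__ == "__main__":
    total, average, densest = summarize(EXAMPLE_BLOCKS)
    print(f"total power {total:.1f} W, average density {average:.3f} W/mm^2")
    print(f"densest block: {densest.name} at {densest.power_density:.3f} W/mm^2")
```

With these assumed numbers the DSP reaches about twice the average power density even though the CPU draws the most power, which is the kind of information a uniform power assumption over the die would miss.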
Not all SoCs have the same architecture or components; these depend on the purpose and use case of the SoC being developed.

3.2.3 SoC Design Flow

SoC development involves several engineering roles and multiple steps. Each step needs to be managed and supervised by industry experts such as system architects, design engineers and verification engineers, and finally the manufacturer fabricates the die in special fabrication facilities. ANSYS, a multinational company providing simulation software, describes the SoC design flow in its blog [13]. According to [13], the functionalities of the SoC are first specified based on its intended use, and the architecture of the SoC is planned accordingly. Next, the logical design of the chip is developed in a hardware description language (HDL) and simulated to verify its functional behaviour. The design is then synthesized with special tools that break it down into transistors and interconnects. The next phase is to select appropriate transistor components and their physical locations on the silicon using implementation tools. Once everything is ready and validated, the finalized layout (graphic) files are sent to the manufacturer for production. After the SoC is fabricated, it is encapsulated in protective packaging and delivered for use. Figure 5 presents the SoC design flow as a flowchart.

Figure 5. SoC design flow

3.2.4 Why SoC?

SoCs have some limitations. Since an SoC is a single chip designed for a specific purpose and required to be highly efficient, it takes a lot of resources, effort and time from its developers and designers, which often delays the time to market. Furthermore, if one component in the chip fails, the whole chip becomes unusable; there is no way to remove or replace the faulty component and reuse the chip. However, these limitations are outweighed by the immense benefits SoCs bring. SoCs are extremely compact.
They take up minimal space, which is why today's consumer devices can fit into pockets. Moreover, SoCs are power efficient. Larger components consume more power, so replacing them with SoCs reduces power consumption substantially. Though the cost of production is high, a single SoC is much less expensive than multiple ICs used for the same purpose. A single SoC needs fewer connections and is thus more reliable and less likely to malfunction. Since all necessary components are embedded in a single chip, SoCs provide higher efficiency and greater speed, which together lead to better performance.

As SoC technology advances, the complexity of the design is also increasing, but the ability of designers to efficiently manage and create such designs is not increasing at the same pace. This creates a designer productivity gap [19], which manifests in multiple ways. More intricate designs need longer design time. The increased complexity leads to more errors and bugs, which must be debugged and require more resources. These issues raise the development cost. Measures are being taken to bridge this gap. Standardization of components and their reuse are encouraged to minimize the use of resources. Collaboration among stakeholders, designers and users is promoted to make the design process faster and to reduce the number of bugs and errors. Most importantly, new advanced tools are being developed for design, simulation and verification. These tools have been helping, and will continue to help, design engineers to efficiently create and implement more complex SoCs in the future.

4 Thermal Behaviours of ICs and their impact on SoC design

Each material behaves in a certain way when heat is applied to it. This phenomenon is known as the thermal behaviour of that specific material. At different temperatures, the same material may demonstrate different behaviour.
Such sensitivity of a material to temperature is known as its thermal characteristic. Specific heat, heat capacity, thermal conductivity, melting point, boiling point and thermal shock resistance are some examples of such properties. All matter is made of atoms and molecules, whose structure is described in detail in basic physics and chemistry. Each atom is formed of positively charged protons, negatively charged electrons, neutral neutrons and other subatomic particles [20, 24, 25]. Protons and neutrons form the nucleus at the centre, while the electrons remain towards the outer part of the atom. The electrons can move, but the nucleus is rigid. The electrons move around the positive nucleus in several bands or orbits [20, 24], and the electrical conductivity and related properties of materials are determined and explained by the movement of these electrons. Molecules are made of combinations of atoms, and a molecule is the smallest unit that holds the material properties of a substance [20]. Heat is usually conducted through the vibration of these molecules when energy in any form is applied [20, 26]. Thermal conductivity is the property that determines how readily heat is conducted through a material [21]. Materials with high thermal conductivity are good conductors of heat. Metals have highly mobile free electrons and therefore high thermal conductivity, so they are generally good conductors of both heat and electricity. In conductors, free electrons are detached from their parent atoms and can move freely through the material [20]. In metals, empty bands are available, and when electrons receive momentum they can easily jump to these empty bands, because the bands are close to one another. However, when heat is applied, the electrical conductivity of conductors decreases [20].
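The decrease in a metal's electrical conductivity with temperature is commonly approximated, over moderate temperature ranges, by a linear temperature coefficient of resistance, R(T) = R_ref[1 + α(T − T_ref)]. The sketch below is illustrative only: it assumes α ≈ 0.0039 1/K for copper near 20 °C and a hypothetical 1 Ω interconnect, and the function names are made up for this example.

```python
# Linear temperature-coefficient model of a metal conductor's resistance.
# ALPHA_CU is an assumed textbook value for copper near 20 °C, not a measured one.
ALPHA_CU = 0.0039  # 1/K


def resistance_at(r_ref_ohm: float, temp_c: float,
                  ref_temp_c: float = 20.0, alpha: float = ALPHA_CU) -> float:
    """R(T) = R_ref * (1 + alpha * (T - T_ref))."""
    return r_ref_ohm * (1.0 + alpha * (temp_c - ref_temp_c))


def conductance_ratio(temp_c: float, ref_temp_c: float = 20.0,
                      alpha: float = ALPHA_CU) -> float:
    """Conductance at temp_c relative to ref_temp_c (conductance = 1 / R)."""
    return 1.0 / (1.0 + alpha * (temp_c - ref_temp_c))


# Hypothetical 1-ohm copper interconnect heated from 20 °C to 100 °C.
r_hot = resistance_at(1.0, 100.0)
print(f"R(100 °C) = {r_hot:.3f} ohm, conductance ratio = {conductance_ratio(100.0):.3f}")
```

For these assumed values, heating the line from 20 °C to 100 °C raises its resistance by about 31 %, so its conductance drops to roughly three quarters of the room-temperature value.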
In some materials, the bands are far apart, and it is difficult to move electrons from one band to another even when additional energy is applied. Because their electrons cannot move freely, such materials hinder the flow of electricity and are categorized as insulators. Increasing temperature has little effect on such materials.

Semiconductors are materials that fall between conductors and insulators, and they are grouped based on their ability to conduct electricity. As the name suggests, semiconductors are typically not good conductors of electricity. However, under certain conditions, such as an electromagnetic field or external influences like heat or light, their electrical conductivity can be enhanced. In particular, the electrical conductivity of semiconductors is highly temperature dependent. Silicon, germanium and gallium arsenide are some examples of semiconductor materials. In semiconductor materials, the band gap is smaller, allowing electrons to jump into the next band if extra energy, often heat or light, is provided. Therefore, the electrical conductivity of semiconductors increases with temperature [20]. Adding impurities to semiconductors introduces new energy levels into the band structure, increasing electrical conductivity by raising the number of free charge carriers [20]. Semiconductors have a crystalline structure held together by covalent bonds, which are formed by shared electrons. When heat and/or light is applied, these electrons gain enough energy to move freely throughout the material, and they can be manipulated to move in a specific direction [22]. The electrical and physical properties of semiconductors vary with their purity, and it is quite common to introduce impurities into their crystal structure to change or manipulate such properties.
This process is called doping. Pure semiconductors are known as intrinsic materials, and doped ones are known as extrinsic materials [22].

4.1 Thermal Properties of Semiconductors

The electrical resistivity of semiconductor materials, i.e., the property that opposes the flow of current through them, decreases as the temperature rises and increases as the temperature drops [21]. Typically, their resistivity is lower than that of insulators but higher than that of conductors. These materials are also sensitive to thermal changes. At absolute zero, the electrons in a semiconductor stay in the lower band, known as the valence band. As the temperature rises, more electrons gain enough thermal energy to escape the pull of their atoms and jump to the higher band, known as the conduction band. As a result, the semiconductor becomes more conductive. This phenomenon is shown in Figure 6.

Figure 6. Movement of electrons due to thermal excitation in a semiconductor material [20].

ICs are small semiconductor chips integrated with innumerable electronic components built to perform various functions. When current flows through these components, each of them generates heat, because the moving electrons collide with the atoms of the material [23]. This heat affects the performance of the ICs, and their behaviour depends largely on these effects. As IC technology advances, the size of ICs has shrunk significantly while the number of logic gates has increased many times over. ICs have become more compact, with billions of transistors in them.
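The effect of shrinking dies can be illustrated with average power density, i.e. dissipated power divided by area. This is a minimal sketch with hypothetical numbers (a 50 W chip on 200 mm² and 100 mm² dies, and a 5 W block on 2 mm²); it only averages power over an area and ignores how heat spreads inside the package.

```python
def power_density_w_per_cm2(power_w: float, area_mm2: float) -> float:
    """Average power density in W/cm^2 (1 cm^2 = 100 mm^2)."""
    return power_w / (area_mm2 / 100.0)


# Hypothetical numbers: the same 50 W on a 200 mm^2 die and on a die half that size.
large_die = power_density_w_per_cm2(50.0, 200.0)
small_die = power_density_w_per_cm2(50.0, 100.0)
# Hypothetical hot spot: a 5 W block occupying only 2 mm^2 of the die.
hot_block = power_density_w_per_cm2(5.0, 2.0)
print(large_die, small_die, hot_block)
```

Halving the die area at constant power doubles the average density, and the small hypothetical block runs at 250 W/cm², ten times the large die's average. A single die-level average therefore hides exactly the hot spots that a realistic power map has to resolve.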
Although these ICs come with thermally enhanced packages to deal with the generated heat, designing such compact ICs with a higher concentration of components poses challenges such as scaling voltage to boost performance while limiting energy consumption, controlling the power dissipation of ICs and maintaining reliability [24]. For these minuscule ICs, packaging is also getting smaller, which results in poorer thermal properties [25].

4.2 Impact of Temperature on Semiconductor Devices

ICs are semiconductor devices that consist of a die attached to an insulating substrate, which is connected to a conducting base plate [25]. Power dissipation and energy losses in ICs generate heat at the component level [23]. This excess heat needs to be removed effectively to guarantee efficient and reliable performance of the IC. Depending on their use cases, electronic devices with ICs may need to be installed in extreme environments. ICs in such devices may behave unexpectedly outside their temperature limits, making the device completely unreliable. Altium explains these effects clearly in its blog [25]. Such devices suffer heat stress when they are subjected to pulsed loading. Because of rapid switching between on and off states, the die undergoes rapid temperature changes, which cause thermal stress on the IC. This stress can fracture various layers of the IC, changing its electrical properties. Examples include slower operating speed, increased propagation delays and, in extreme cases, total failure of the IC. In terms of reliability, excessive heating can adversely affect the dependability of an IC and its performance. Especially in delicate systems where precision is mandatory, the reliability of the device is a non-negotiable requirement. The reliability of an IC depends on its design, its components, its packaging and its cooling mechanisms [26].
To design the packaging and cooling system of an IC, it is important to know the temperature profile of the IC and its hotspots. Uncontrolled heat dissipation by the components on the chip can damage the reliability of the device in multiple ways. Some of the adverse effects of excessive IC temperature are discussed here.

4.2.1 Electromigration

Excessive heat results in electromigration. On its website, Synopsys [27] defines electromigration as “... the movement of atoms based on the flow of current through a material. If the current density is high enough, the heat dissipated within the material will repeatedly break atoms from the structure and move them. This will create both ‘vacancies’ and ‘deposits.’” Electromigration is a severe problem that may lead to open circuits and/or short circuits depending on the direction of migration. Thus, electromigration caused by thermal stress can put the reliability of ICs to the test. Thermal stress can also cause mechanical stress and fatigue in the IC package, leading to degradation and failure of the system.

4.2.2 Reduced Carrier Mobility

When current flows, the electrons (and holes) inside the IC move and carry the signal. How easily they move is described by the carrier mobility, which can be defined as the drift velocity of an electron or a hole in a metal or semiconductor per unit of applied electric field [28]. Higher mobility allows faster IC operation. Mobility is, however, affected by temperature: at higher temperatures, carriers scatter more often off the vibrating crystal lattice, so mobility drops and the overall operating speed of the IC slows down.

4.2.3 Thermal Stress

Thermal stress in an IC is closely related to its power consumption behaviour. A rise in internal temperature degrades the insulating materials inside the IC package. This leads to cracks, and current leaks through these cracks, creating unintended paths for electricity. This causes additional power consumption and malfunctions in the logic.
Thermal stress can crack the PCB and cause electricity to leak to other parts of the device [26]. Transistors are the building blocks of ICs, and their switching behaviour depends on the threshold voltage. A higher temperature lowers this threshold voltage, which makes the transistors switch earlier than expected and increases leakage current, resulting in higher power consumption. It also raises conflicts among the logic gates, as they may act before their turn, creating a potential race condition [29].

4.2.4 Thermal Fatigue

Due to cyclic heating, the plastic encapsulation and metal interconnects within the IC face continuous thermal stress. As discussed by V. Lakshminarayanan et al. in their research paper [26], because these materials have different coefficients of thermal expansion, they expand and contract at different rates. An increased temperature in the IC adds to this thermal stress, causing fatigue in the materials. This fatigue, a form of mechanical stress induced by thermal expansion, can damage the device completely over time.

Excessive temperature is one of the major causes of semiconductor device malfunction and failure. The lifetime of an IC can be predicted by understanding its thermal profile. To ensure the reliability of the chip, it is crucial to estimate the effect of temperature on the IC well before it goes into commercial production.

4.3 Thermal Stress and SoC design

Multiple processors and components are implemented on a SoC to provide enhanced speed and better performance. However, such a performance boost comes at the cost of higher power consumption, higher power density and recurring hotspots if the SoC is not designed carefully with all thermal factors in mind [30]. Furthermore, multiple IPs with varying components make it exceedingly difficult to achieve a uniform temperature profile across the chip.
The spatial temperature gradient, defined in Wikipedia [31] as “...a vector quantity with dimension of temperature difference per unit length”, degrades system reliability and performance [32]. SoCs are designed to perform multiple tasks concurrently, which may cause the temperature at a single point on the chip to vary over time [33]. Thus, it is crucial to take heat dissipation and its impact into account while designing a SoC.

While designing a SoC, designers should make sure to implement proper thermal management. Techniques such as heat spreaders, thermal vias, heat sinks and fans must be implemented in the design to dissipate the extra heat produced, especially by components that are critical for the chip. Incorporating advanced packaging technologies such as flip-chip and 3D integration can improve the thermal performance of the chip, as such packaging technologies help reduce thermal resistance and increase heat dissipation. Minimizing power consumption often helps to keep thermal stress in SoC devices in check. Techniques such as clock gating, which keeps unused parts of the SoC inactive, and dynamic voltage and frequency scaling (DVFS) can help minimize heat generation as well as optimize power consumption. The SoC's layout or floor plan must be made carefully to mitigate the effects of heating. Optimizing the architecture and layout means careful placement of the components and careful routing of signal and ground traces to minimize resistance. To minimize hot spots, heat-generating parts could be placed closer to the heat sinks. Special materials can also be used for such parts to improve their heat dissipation. Furthermore, as mentioned earlier, power gating and clock gating techniques can be implemented to reduce power consumption during idle periods.
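The benefit of DVFS and clock gating follows from the usual first-order model of CMOS switching power, P = αCV²f, where α is the activity factor, C the switched capacitance, V the supply voltage and f the clock frequency. The sketch below uses hypothetical values for one SoC block and ignores leakage power, which is itself temperature dependent.

```python
def dynamic_power_w(activity: float, capacitance_f: float,
                    vdd_v: float, freq_hz: float) -> float:
    """CMOS switching power P = a * C * V^2 * f; leakage is ignored."""
    return activity * capacitance_f * vdd_v ** 2 * freq_hz


# Hypothetical block: activity 0.2, 1 nF switched capacitance, 0.9 V, 2 GHz.
nominal = dynamic_power_w(0.2, 1e-9, 0.9, 2.0e9)
# DVFS operating point with both voltage and frequency lowered by 20 %.
scaled = dynamic_power_w(0.2, 1e-9, 0.9 * 0.8, 2.0e9 * 0.8)
# Clock gating an idle block drives its activity factor to (ideally) zero.
gated = dynamic_power_w(0.0, 1e-9, 0.9, 2.0e9)
print(f"nominal {nominal:.3f} W, DVFS {scaled:.3f} W, gated {gated:.3f} W")
```

Because power scales with V²f, lowering voltage and frequency together by 20 % cuts switching power to about 0.8³ ≈ 51 % of the nominal value, while clock gating removes the switching power of an idle block (its leakage remains).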
However, a well-planned chip layout and proper power management can only reduce the heat generated by the SoC; they cannot eliminate heat generation. Thus, it is essential to implement a proper cooling mechanism so that the SoC can dissipate the generated heat out of the package as fast as possible. To keep the SoC cool during operation, the rate of heat dissipation must be greater than or at least equal to the rate of heat generation by the chip. A proper cooling mechanism includes both passive and active cooling, designed specifically for each SoC based on its purpose. An improperly designed chip or a faulty cooling mechanism may generate and retain excessive heat, which can adversely impact the device's reliability. During operation, the rate of temperature increase inside the chip depends on the heat generation inside the chip, the heat capacity of the material and the rate at which the chip dissipates heat to the heat sink [34]. If heat is dissipated more slowly than it is generated, the chip temperature keeps rising until the two rates balance; if this equilibrium temperature is too high, the chip overheats. Hence, although heat dissipation management is a cumbersome process, its effect in improving the reliability of the IC is unparalleled [34].

4.4 Thermal Management of SoCs

As mentioned earlier, even very well-designed ICs and SoCs generate heat as they consume power. Other factors that determine the resulting temperature inside semiconductor devices include thermal resistance and thermal conductivity. These are closely connected to each other and need to be carefully considered when deciding on the internal construction of the SoC as well as its rate of heat dissipation to the ambient surroundings [35].
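The dependence of the chip temperature on heat generation, heat capacity and heat dissipation described above can be written as a single-node (lumped) model, C·dT/dt = P − (T − T_amb)/R, where P is the generated power, C the heat capacity and R the thermal resistance from the chip to ambient. The sketch below integrates it with the explicit Euler method; all values are hypothetical, and a real chip needs many such nodes rather than one.

```python
def simulate_lumped(power_w, r_th_c_per_w, c_th_j_per_c, t_amb_c, t_end_s, dt_s):
    """Explicit-Euler integration of C * dT/dt = P - (T - T_amb) / R.

    Returns a list of (time_s, temperature_c) samples, starting at ambient."""
    temp = t_amb_c
    time_s = 0.0
    samples = [(time_s, temp)]
    for _ in range(int(round(t_end_s / dt_s))):
        heat_out_w = (temp - t_amb_c) / r_th_c_per_w          # dissipation rate
        temp += dt_s * (power_w - heat_out_w) / c_th_j_per_c  # net heat raises T
        time_s += dt_s
        samples.append((time_s, temp))
    return samples


# Hypothetical chip: 20 W, R = 2 °C/W to ambient, C = 10 J/°C, 40 °C ambient.
trace = simulate_lumped(20.0, 2.0, 10.0, 40.0, t_end_s=200.0, dt_s=0.1)
equilibrium_c = 40.0 + 20.0 * 2.0  # where dissipation equals generation
print(f"T(200 s) = {trace[-1][1]:.2f} °C, equilibrium = {equilibrium_c:.1f} °C")
```

With these made-up values the time constant RC is 20 s. The temperature rises until the dissipation (T − T_amb)/R matches the 20 W input at T = T_amb + PR = 80 °C; a larger R or P raises that equilibrium, which is how an inadequate cooling path leads to overheating.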
As defined by Cadence [35], “Thermal resistance (θ) is analogous to electrical resistance, and indicates the resistance offered by the semiconductor device to the flow of heat generated internally to the ambient.” At the chip level, it is defined for junction-to-case (heat transfer from the chip to the package case) and junction-to-ambient (heat transfer from the chip to the ambient environment); at the package level, it is defined for junction-to-top (heat transfer from the chip to the top surface of the package) and junction-to-bottom (heat transfer from the silicon die to the bottom of the package and into the PCB), among others [35]. Thermal resistance is measured in °C/W [35, 36]. It is a good parameter for comparing the reliability of semiconductor devices from different manufacturers that use similar packaging styles [35]. Thermal conductivity, as defined in Chapter 2, is the rate of heat flow through a certain material.

Thermal management of the IC is an important factor in its design and development process. It is important to create a path for the generated heat to dissipate into the surroundings. This brings us to the cooling mechanism or system for the SoC. Typically, SoCs are cooled either passively or actively, and various techniques are used to implement passive and/or active cooling systems in the SoC package.

4.4.1 Heatsink

Passive cooling relies on the natural methods of heat transfer, i.e., conduction, convection and radiation [37]. A heat sink is an example of passive cooling and is widely used in SoC design [36, 37]. A heat sink is a metal object whose one side is attached to the surface of the SoC where the components are located [36]. This is the surface of the SoC where most of the heat is usually generated.
The metal surface of the heatsink ensures good thermal contact with the heat-generating surface of the chip [36], and the heatsink transfers heat from the source through its conductive body by conduction and from there to the ambient surroundings by convection [37]. Because the heatsink is made of metal, it has good thermal conductivity and usually a low thermal resistance for efficient heat dissipation [36].

4.4.2 Fins

The other end of the heatsink dissipates the conducted heat to the ambient environment through convection. The heat transferred by convection is proportional to the surface area in contact with the air (Newton's law of cooling, Q = hAΔT, where the heat transfer coefficient h has the unit W/(m²·K)). The surface area of the heatsink is thus increased by incorporating fins on the convection surface. Fins are thin metal plates arranged on the side of the heat sink that faces the ambient surroundings. Because of the fins' structure, the surface area increases many times over, increasing the system's rate of heat transfer.

4.4.3 Thermal Vias

PCBs have thermal dissipation pads with plated holes drilled into them. These holes are known as thermal vias and are used to maximize heat dissipation efficiency [38]. Thermal vias are arranged in grids and are a very common method of passive cooling for the SoC.

4.4.4 Internal Fans

Natural convection by heat sinks and fins has been in practice for a long time. However, its heat-dissipating ability is limited to a maximum of about 300 W [39]. To increase the rate of convection, fans can be placed around the system. They move the warm air away faster so that cool air takes its place. A balanced airflow can be a more efficient way to dissipate heat, and the limit can reach up to about 2,000 W [39]. This is a kind of active cooling mechanism.
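The thermal resistances defined above add in series like electrical resistances, so the junction temperature can be estimated as T_j = T_amb + P(θ_JC + θ_CS + θ_SA). For convection, Newton's law of cooling gives θ_SA = 1/(hA), which shows why fins (larger A) and fans (larger h) help. The sketch below uses hypothetical values throughout: θ_CS stands for an assumed case-to-sink interface, fins are assumed to give 8x the area with ideal fin efficiency, and h is taken as 10 W/(m²·K) for natural and 50 W/(m²·K) for forced convection.

```python
def convection_resistance(h_w_per_m2k: float, area_m2: float) -> float:
    """Sink-to-ambient resistance from Newton's law of cooling: R = 1 / (h * A), in °C/W."""
    return 1.0 / (h_w_per_m2k * area_m2)


def junction_temperature(power_w: float, t_amb_c: float, *resistances_c_per_w: float) -> float:
    """T_j = T_amb + P * (sum of series thermal resistances)."""
    return t_amb_c + power_w * sum(resistances_c_per_w)


# Hypothetical stack: junction-to-case 0.2 °C/W, case-to-sink interface 0.1 °C/W.
theta_jc, theta_cs = 0.2, 0.1
flat_plate = convection_resistance(10.0, 0.01)  # bare plate, natural convection
finned = convection_resistance(10.0, 0.08)      # fins: assumed 8x the wetted area
fan_cooled = convection_resistance(50.0, 0.08)  # forced air: assumed higher h

for label, theta_sa in [("flat plate", flat_plate), ("fins", finned), ("fins + fan", fan_cooled)]:
    tj = junction_temperature(15.0, 35.0, theta_jc, theta_cs, theta_sa)
    print(f"{label:10s} theta_SA = {theta_sa:5.2f} °C/W  T_j = {tj:6.1f} °C")
```

For these assumed numbers, a 15 W chip in 35 °C air reaches about 190 °C with a bare plate, about 58 °C with fins and about 43 °C with fins plus a fan.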
4.4.5 Other Cooling Mechanisms

Depending on the use case, SoCs may consume a lot of power and generate a large amount of heat, and heatsinks, fins and internal fans alone are then not enough to keep the system cool. Hence, other active cooling mechanisms such as synthetic jet air cooling and electrostatic fluid acceleration are used [39]. A cooled-air active cooling system may dissipate up to about 3,000 W, whereas a cooled-liquid active cooling system may dissipate up to about 8,000 W [39]. The fluids used in such systems are usually electrically non-conductive and have excellent thermal properties [37]. In most cases, multiple cooling mechanisms are combined to keep the internal temperature of the SoC at an optimum level. To design the appropriate cooling system for the chip, it is crucial to know the hot spots on the chip and its thermal profile.

4.5 Thermal Analysis Trends in SoC Development

Discussions with several industry experts from the SoC development sector have led me to believe that the reliability of the designed SoC is one of their most important concerns. As temperature heavily affects reliability, it is beneficial for developers to predict the temperature beforehand in order to predict the lifetime and reliability of the designed SoC. A rule of thumb in SoC development states that a 10 °C rise in temperature halves the lifetime of the SoC [26, 40, 41]. Hence, the earlier engineers can estimate the thermal behaviour of the designed SoC, the earlier they can improve the design to extend its lifetime and ensure reliability. Proper thermal analysis minimizes the risk of product failure. It also protects SoC development engineers from unwanted surprises at the end of the development process, when the prototype is being tested.
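The rule of thumb above can be written as L(T) = L_ref · 2^(−(T − T_ref)/10 °C), where L_ref is the lifetime at a reference temperature T_ref. The sketch below normalizes lifetime to 1.0 at a hypothetical 85 °C reference; it encodes only the rule of thumb, not a physics-based reliability model.

```python
def relative_lifetime(temp_c: float, ref_temp_c: float, halving_step_c: float = 10.0) -> float:
    """Lifetime relative to ref_temp_c if every halving_step_c of extra heat halves it."""
    return 2.0 ** (-(temp_c - ref_temp_c) / halving_step_c)


# Hypothetical reference: lifetime normalized to 1.0 at an 85 °C junction temperature.
for temp in (75.0, 85.0, 95.0, 105.0):
    print(f"{temp:.0f} °C -> {relative_lifetime(temp, 85.0):.2f} x reference lifetime")
```

Under this rule, a design that runs 10 °C cooler than the reference is expected to last twice as long, and one that runs 20 °C hotter only a quarter as long, which is why early temperature estimates matter.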
A very high temperature profile found during prototype testing can be quite damaging in terms of reliability, and rectifying the design at this phase can be very expensive. On the contrary, if this temperature profile can be obtained at an early stage of product development, design engineers may avoid such extremely high temperatures at the product level. Figure 7 represents a typical timeline for the SoC development process and the associated cost of redesigning the SoC. The early phases of the SoC development process consist of requirements analysis, architecture exploration, RTL design and netlist synthesis. These phases usually take place inside the company that is developing the SoC. Next, the design is shared with the silicon or SoC vendor for floor plan exploration and for placement and routing of the design on the chip using various tools. Once everything is finalized, the silicon vendor creates a prototype by fabricating the chip and then validates its performance through circuit tests. Once the chip passes the circuit tests, it is ready for delivery.

Figure 7. A typical timeline for the SoC development process and associated cost of redesigning [9].

Usually, the thermal models for the SoC become available from the silicon vendor during the placement and routing stage, and the vendor shares them with the company for feedback on the temperature profile. At this stage, if the temperature is too high, the engineers need to redesign the SoC to minimize the power consumption and heat dissipation of the respective components. This is the generally established practice. Nowadays, many design optimization tools are available on the market. Designers can employ such tools to simulate the thermal behaviour of their SoC at early stages and find potential hotspots. This allows them to modify and optimize their design before the chip is ready for production. Not all tools on the market are equally reliable.
However, these tools can guide design engineers to improve the thermal performance of the SoC and prevent overheating of the chip. They can also help ensure the reliability of the SoC under varying thermal conditions. Thermal stress, caused by temperature gradients, thermal cycling, etc., has a significant impact on the reliability and performance of contemporary ICs and SoCs [30]. Hence, thermal management is a key factor when designing any electronic device that includes semiconductor components. Such components are susceptible to heat; ironically, the heat they themselves generate creates significant challenges in maintaining the temperature of the core [25]. As the number of such components increases and the chip area shrinks, conventional power and temperature management techniques fail to paint a proper picture. Hence, engineers need proper tool support to predict the electro-thermal behaviour of ICs while designing a SoC.

5 Thermal Modeling Techniques and Tools for Thermal Simulation

As IC designers design their electrical circuit, a thermal system is also created in parallel. In the early days of IC development, the effect of this thermal system was negligible. However, in the current era of increased power and packaging densities, this thermal effect significantly affects the performance of ICs. Technology downscaling and increased power density raise the chip temperature, which, if left unchecked, may change device parameters, making the circuit faulty and unreliable [42]. Cooling of ICs in a system was previously handled by mechanical engineers, who designed the external cooling system for the IC packages or chips [42]. Nowadays, this is not enough. The heating issue must be addressed at the start of the IC design process, and design engineers need to make their circuits both electrically and thermally viable.
IC thermal design directly impacts the functionality, reliability and leakage power of the electronic equipment in which the IC will be used [24]. Even though most of the heat in ICs is produced by power dissipation in the silicon junctions, electromigration may also be caused by self-heating of the metal interconnect lines that are embedded in low-k dielectrics to connect the blocks of the chip [24]. Hence, IC-level thermal analysis is more crucial now than ever. To understand and analyse the thermal behaviour of ICs, simulating that behaviour is essential. Many researchers have therefore been drawn to this field and agree on the necessity of thermal simulations at the package level, PCB level, system level, etc. Carrying out such simulations during the mechanical and electrical design process is crucial to develop an optimum and cost-effective solution that meets the physical, electrical and thermal requirements of the designed IC [24].

5.1 Thermal Modelling Techniques

IC design and verification engineers depend on thermal modelling techniques and software tools to develop cost-efficient, reliable and high-performance cooling technologies for future electronic devices. Implementing and analysing such modelling techniques can help engineers make their designs more reliable, increase the longevity of the system, optimize the performance of the designed IC, prevent thermal runaway and optimize power delivery. Such techniques are also crucial for validating the design, designing the cooling system for the chip, and detecting hotspots in the die and mitigating their effect.

There are several thermal modelling techniques used in the IC development industry. Some of these techniques are purely analytical, whereas others are numerical. A few of the most commonly used techniques are elaborated here.
5.1.1 Compact Thermal Models

As discussed by several industry experts during this research, predicting heat formation and dissipation inside an IC has always been quite difficult. Because the heat conduction equation is complicated, it has generally not been possible to solve it analytically, even with tool support [43]. Hence, a better approach was needed, and researchers in this field introduced the compact thermal model. This model considers key temperature nodes and represents the thermal behaviour in an abstract form. It is based on the assumption that “...in the active part of the semiconductor device the uniform distribution of temperature occurs”, as described by K. Górecki et al. [43]. This implies that the internal temperature of an IC can be determined and discussed based on the temperatures of certain nodes inside the IC. Thus, a compact thermal model takes a behavioural approach to predict the temperature at the package level of the IC at only a few critical points, with less computational effort [44]. Figure 8 shows a typical compact thermal model. Here, the whole area is divided into four regions based on similar heat dissipation, calculated at the four highest points.

Figure 8. Example of a Compact Thermal Model.

Compact thermal models are more abstract and do not consider the detailed geometry or material properties of the original ICs. Such models usually use a network of thermal resistances, analogous to an electrical resistor network, to model heat flow in the IC [44]. This is a simple and efficient model for fast analysis during the initial exploration stages of the IC development process. Such models can help identify potential hotspots and map heat dissipation inside the IC during preliminary investigation. However, because they lack detail, such models can be much less accurate than some other thermal modelling techniques.
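Compact thermal models of this kind are commonly solved through the thermal-electrical analogy: temperature corresponds to voltage, dissipated power to current and thermal resistance to electrical resistance, so steady-state node temperatures follow from nodal analysis, G·(T − T_amb) = P. The sketch below assumes a hypothetical 2 × 2 partition of the die (loosely mirroring the four regions of Figure 8), one lateral resistance between neighbouring regions and one vertical resistance from each region to ambient. The values are made up, thermal capacitances are omitted, and this is not the model proposed in this thesis.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - acc) / m[r][r]
    return x


def compact_model_temperatures(powers_w, links, r_lateral, r_vertical, t_amb_c):
    """Steady-state node temperatures of a thermal resistor network.

    Each node connects to ambient through r_vertical and to its neighbours
    (pairs in `links`) through r_lateral; solves G * (T - T_amb) = P."""
    n = len(powers_w)
    g = [[0.0] * n for _ in range(n)]
    for i in range(n):
        g[i][i] += 1.0 / r_vertical
    for i, j in links:
        g[i][i] += 1.0 / r_lateral
        g[j][j] += 1.0 / r_lateral
        g[i][j] -= 1.0 / r_lateral
        g[j][i] -= 1.0 / r_lateral
    rise = solve_linear(g, list(powers_w))
    return [t_amb_c + dt for dt in rise]


# Hypothetical 2 x 2 partition (regions 0 1 / 2 3) with a hot spot in region 0.
links = [(0, 1), (0, 2), (1, 3), (2, 3)]
temps = compact_model_temperatures([10.0, 2.0, 2.0, 1.0], links,
                                   r_lateral=4.0, r_vertical=8.0, t_amb_c=40.0)
print([round(t, 1) for t in temps])
```

For these values the hot-spot region settles at about 78.8 °C and the opposite corner at about 64.4 °C. Setting the lateral conductances to zero would leave region 0 at 40 + 10 × 8 = 120 °C, which shows how much lateral heat spreading a compact model must capture to place hot spots realistically.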
Compact thermal models sacrifice accuracy to gain computational speed. The modelling technique is also very simple. Because of these advantages, compact thermal models are quite popular for analysing electronic networks and ICs.

5.1.2 Detailed Thermal Models

A detailed thermal model represents the physical geometry of the IC in as much detail as possible. Such a model is almost the same as the real package and requires tools that integrate electrical and mechanical design kits. If designed properly, such models should be independent of the boundary conditions and should be able to predict temperature accurately irrespective of the cooling mechanism [44]. Detailed thermal models are very complex and require a lot of time and computational resources. Hence, they are not feasible for ICs with multiple packages, and they are not commonly used unless the IC under development needs highly accurate thermal calculations.

5.1.3 Thermal Models Derived through Numerical Methods

Various numerical methods can be used to derive thermal models for IC design and development. These numerical methods are usually versatile and can handle complex geometries and material properties of ICs. They can also capture the temperature distribution throughout the IC area better than compact thermal models. Even though numerical methods are computationally expensive and require some expertise to set up the simulation and interpret its results, they have gained popularity among researchers and IC development engineers for their better accuracy and efficiency. At the industrial level, different companies use multiple numerical methods to develop supporting tools for thermal model generation. The most widely used methods include the finite element method, the finite difference method and the Fourier expansion method.

i. Finite Element Method (FEM): The finite element method, commonly known as FEM, is a numerical method for solving partial differential equations for mathematical modelling purposes [45]. Modelling a physical geometry gives rise to partial differential equations in space and time [46]. However, such continuous equations are usually impossible to solve analytically. Hence, they are transformed into discrete counterparts (a process known as discretization), and an approximate system of algebraic equations is formed from them. When solved, these equations provide an approximation of the real solution [46]. FEM is a method for solving these equations. To model the heat dissipation of the chip surface, attention needs to be paid to the geometry of the system, its domain, the material properties and the system's boundary, initial and loading conditions. As discussed in the book “Finite Element Method: A Practical Course” by G.R. Liu and S.S. Quek [47], modelling using FEM follows four steps:

• Modelling of Geometry: The real structures of physical components and domains are very complex and need to be broken down into the simplest forms of geometry. Usually, linear elements are used, and the structure is simplified by representing it with piecewise straight lines or triangular or quadrilateral flat surfaces [47]. The greater the number of linear elements, the better the representation of the structure, but a higher number of elements makes the geometry more complex and requires more computational power and time. In most cases, the results from an optimized geometry do not differ much in technical terms from those of a detailed geometry. A detailed, finely tuned geometry may look better, but appearance is not important in technical calculations. Hence, judgement needs to be applied to select the number of elements based on design requirements, available tools and resources [47].
Figure 9 shows an example of a one-dimensional finite element piecewise approximation. In FEM, a continuous function is approximated by piecewise linear functions over multiple elements, which can represent the whole structure irrespective of its shape and size [47].

Figure 9. One-dimensional finite element piecewise approximation example. [47]

Modern software tools are very efficient at creating properly optimized geometry for finite elements, and computer-aided design (CAD) tools are available to model and manipulate such geometry. Even so, practical knowledge and experience remain essential for modelling the geometry of any system. Expert engineers are therefore needed to operate these tools, make modelling decisions based on their requirements and guide the tools towards the right geometry.

• Meshing or Discretization: In the next step, the modelled geometry is broken down into small discrete pieces, each called an element or a cell. This process is known as meshing. Because engineering problems are usually complex, it is almost impossible to find a single function that describes the whole geometry. It is much easier to divide the geometry into small elements and approximate the solution within each element; assembling the contributions of all the elements then gives the solution for the whole domain [47]. Meshing is an important step in formulating the FEM equations [47] and is mostly done with commercially available software tools, which include dedicated meshing packages. The most common element shapes are triangles and quadrilaterals [47]. Figure 10 shows how FEM software generates a mesh and solves differential equations for modelling purposes.

Figure 10. Example of a mesh and its solution using FEM method in FEM software. [45]

• Specification of Material Properties: The physical properties of the materials must be specified to formulate the FEM equations. Depending on the kind of simulation, different material properties must be supplied to the tool for each material used in the physical system. For instance, the thermal conductivity of each material is required for a thermal analysis. Commercial databases list materials and their physical properties; this information can be taken from such a database and entered in the corresponding part of the software tool, which handles the rest. However, care must be taken that the properties of all constituent materials are entered correctly, since the accuracy of the solution depends on them.

• Specification of Boundary Conditions: The solution of the FEM equations can vary significantly with the boundary conditions of the system. The boundary conditions should therefore reflect the real-life conditions under which the system will operate. Feeding accurate boundary conditions to the tool yields more realistic simulation results for further analysis.

FEM is a practical and versatile method widely used in thermal modelling tools for ICs. It is a very powerful numerical technique that can solve extremely complex problems and handle irregular geometries and boundary conditions. Hence, it is popular in fields such as structural analysis, fluid flow, electromagnetics, heat transfer and transport phenomena [45].

ii. Finite Difference Method (FDM): As the name suggests, FDM is another numerical method for solving differential equations in modelling. In FDM, the derivatives in the governing equation are replaced by difference quotients evaluated on a grid of nodes, so the approximation lies in the discretized equation itself; the resulting algebraic equations are often solved iteratively [48]. The book by H. S. Govind Rao [48] compares FEM and FDM. Compared with FEM, FDM is less complex but needs a larger number of nodes to obtain a good result. Nodes are essential in FDM because the method yields values only at the nodes and provides no approximation between them. As a consequence, irregular geometry can only be approximated in a staircase-like form. Because of these limitations, FEM is more popular than FDM. However, since FDM is simpler and less time-consuming, it is often used for faster modelling of less critical problems.

iii. Fourier Expansion Method: The Fourier expansion method uses the fast Fourier transform (FFT) to compute the Fourier expansion. This is a very fast and effective procedure to implement in modelling tools. However, the algorithm is limited to a few boundary conditions on a rectangular surface [42], so it can be applied only to a limited number of cases. Hence, even though problem definition is easier with the Fourier expansion method, FEM is more widely accepted because of its versatility.

5.1.4 Dynamic or Transient Thermal Model

Transient thermal models take into account how heat and power dissipation evolve over time. Considering time is essential when building thermal models of ICs, because steady-state (stationary) models cannot show how the temperature of the IC changes in real time as power and heat vary [49]. A transient thermal model is required for transient analysis of heat dissipation on the chip. Such models are crucial for predicting the effect of dynamic loads on the chip and help in planning load management strategies [49]. Transient thermal simulations can be carried out with the numerical methods discussed above. However, due to limitations of time and available data, this topic is outside the scope of this thesis.
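The FEM and FDM procedures described in Section 5.1.3 can be compared on the simplest problem with a known answer: one-dimensional steady conduction in a slab with uniform heat generation and both ends held at a fixed temperature. The sketch below is only an illustration under these assumptions (all constants are hypothetical, not data from this thesis). The FEM part assembles two-node linear elements into a global system and solves it directly; the FDM part replaces T'' by a central difference and solves the node equations iteratively with Gauss-Seidel sweeps.

```python
# 1D steady conduction with uniform heat generation, solved with linear FEM and with FDM.
# Assumed model problem (illustration only):  -k T''(x) = q on [0, L],  T(0) = T(L) = T_S
# Analytic solution for comparison:            T(x) = T_S + q x (L - x) / (2 k)

K = 150.0     # thermal conductivity, W/(m K) (silicon-like, assumed)
Q = 1.0e10    # volumetric heat generation, W/m^3 (assumed)
L = 1.0e-3    # domain length, m (assumed)
T_S = 45.0    # fixed boundary temperature, degC (assumed)
N_EL = 10     # number of elements (FEM) = number of grid intervals (FDM)
H = L / N_EL  # element length / grid spacing


def analytic(x):
    return T_S + Q * x * (L - x) / (2.0 * K)


def thomas(a, b, c, d):
    """Direct solver for a tridiagonal system: a = sub-, b = main, c = super-diagonal."""
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        den = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / den
        dp[i] = (d[i] - a[i] * dp[i - 1]) / den
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def fem_linear():
    """Assemble 2-node linear elements, apply Dirichlet ends, solve for nodal values."""
    n = N_EL + 1
    diag, lower, upper, load = [0.0] * n, [0.0] * n, [0.0] * n, [0.0] * n
    ke = K / H          # element stiffness (K/H) * [[1, -1], [-1, 1]]
    fe = Q * H / 2.0    # element load (Q*H/2) * [1, 1]
    for e in range(N_EL):   # global assembly, element by element
        i, j = e, e + 1
        diag[i] += ke
        diag[j] += ke
        upper[i] -= ke      # entry (i, j)
        lower[j] -= ke      # entry (j, i)
        load[i] += fe
        load[j] += fe
    inner = range(1, n - 1)  # unknown interior nodes
    a = [lower[i] for i in inner]
    b = [diag[i] for i in inner]
    c = [upper[i] for i in inner]
    d = [load[i] for i in inner]
    d[0] -= lower[1] * T_S   # move the known boundary values to the right-hand side
    d[-1] -= upper[n - 2] * T_S
    return [T_S] + thomas(a, b, c, d) + [T_S]


def fem_value(nodal, x):
    """FEM defines the field between nodes too, through the linear shape functions."""
    e = min(int(x / H), N_EL - 1)
    xi = (x - e * H) / H
    return (1.0 - xi) * nodal[e] + xi * nodal[e + 1]


def fdm_gauss_seidel(tol=1e-10, max_iter=100_000):
    """Central differences: T[i] = (T[i-1] + T[i+1] + Q H^2 / K) / 2, iterated to convergence."""
    t = [T_S] * (N_EL + 1)   # end values are the boundary conditions
    for _ in range(max_iter):
        change = 0.0
        for i in range(1, N_EL):
            new = 0.5 * (t[i - 1] + t[i + 1] + Q * H * H / K)
            change = max(change, abs(new - t[i]))
            t[i] = new
        if change < tol:
            break
    return t


fem = fem_linear()
fdm = fdm_gauss_seidel()
for i in (0, 5, 10):
    x = i * H
    print(f"x = {x * 1e3:.1f} mm  FEM {fem[i]:.4f}  FDM {fdm[i]:.4f}  exact {analytic(x):.4f}")
print(f"FEM between nodes, x = 0.55 mm: {fem_value(fem, 0.55e-3):.4f} degC")
```

For this particular 1D problem both methods reproduce the analytic temperature at the nodes. The difference highlighted in the text still shows: FDM gives nothing between nodes, whereas the FEM result is a piecewise linear function defined everywhere (`fem_value`), lying slightly below the true parabola between nodes. On irregular 2D and 3D geometries the two methods diverge much more, as discussed above.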
5.1.5 Electro-Thermal Co-Simulation

Electro-thermal co-simulation captures the interaction between the electrical and thermal behaviour of the IC simultaneously. The temperature on a chip is usually not evenly distributed over its area: because different components on the chip dissipate different amounts of heat, one or more hotspots form [50]. Depending on the ambient conditions of the system, the thermal properties of the materials and other factors, heat spreads through the package by conduction and into the surroundings by convection and radiation [50]. Electro-thermal co-simulation gives engineers the most comprehensive analysis of power integrity and thermal management, because the electrical and thermal equations are solved together and their mutual coupling is captured, providing more accurate results [50]. It is also the most complex modelling technique: it requires specialized expertise with the tools, as well as high computational power and resources. Nowadays, many tools use this technique for electro-thermal modelling of ICs.

5.2 Software Tools for Thermal and Electro-thermal Modelling

Many software tools are currently available on the market to help IC development engineers with thermal and electro-thermal modelling. Such tools are essential for designing and developing cost-efficient, high-performance and reliable cooling solutions for ICs [24]. They support multi-scale modelling of electro-thermal effects, from die level through package level to system level [24], and can give a clear picture of heat dissipation and hotspots at both the pre-silicon and post-silicon stages of chip development. Several companies provide multiphysics engineering simulation tools, among them Siemens, ANSYS, COMSOL and Synopsys.
Flotherm, Icepak and Redhawk-SC Electrothermal are a few commonly used tools for thermal modelling and analysis of ICs. Siemens’s Flotherm and ANSYS’s Icepak are computational fluid dynamics (CFD) tools based on the finite volume method, while COMSOL Multiphysics uses FEM. ANSYS’s Redhawk-SC Electrothermal combines FEM with electro-thermal co-simulation.

With so many modelling techniques and tools available, choosing which one to use is an important design decision. The answer depends on factors such as the complexity of the IC, the required accuracy, the available computational resources and the design stage. For simpler designs, compact thermal models suffice: they require little computational effort and are usually good for initial exploration, so tools based on them are appropriate in these cases. Complex designs that need better accuracy, on the other hand, benefit from FEM or co-simulation. These methods need more computational power and are resource-intensive, so they are often best used during the later stages of the IC development process, with tools based on FEM and electro-thermal co-simulation.

Generating thermal models and analysing them is crucial for understanding the thermal behaviour of ICs and for designing an optimal and reliable cooling system for the package. Done manually, this is a complex and tedious process. Current mathematical models and software tools provide fast solutions for this purpose, so it makes sense to use them to the fullest to design and develop the best possible IC in terms of size, power consumption, efficiency and cost.

6 Development of Chip Thermal Model Using Modeling Tool

With the continuous research and development (R&D) activities in the various industries that use SoCs, the need for application-specific circuits with higher processing power, smaller area and better accuracy is growing.
New benchmarks are set frequently, and new chips soon surpass them, giving developers new milestones. Hence, most companies invest a large share of their resources in R&D. Since developing an efficient SoC is already very expensive, it is crucial to optimize its physical design during the development stage. To help engineers with this, various vendors offer new technologies and tools that predict, simulate and emulate IC behaviour as accurately as possible during the early stages of chip development. As SoC architectures, technologies and development processes evolve, the need to model and predict their thermal behaviour also grows. To ensure that performance is not compromised by aging or electromigration, chip-level thermal simulation of SoCs under development is crucial for finding potential reliability issues caused by unexpected hotspots [51]. Identifying such issues early enables design engineers to make the necessary architectural changes across the chip before it goes into mass production [51], saving the company considerable time, money and human resources. As mentioned in Chapter 3, interviews with several industry experts revealed that the common practice is to obtain the thermal profile of the SoC from the silicon vendor during place and route. By then, however, it is already too late to go back and redesign the SoC. A redesign is expensive and delays delivery of the chip [43]. This can affect the project timeline, and a competitor’s SoC could reach the market in the meantime, prompting clients to move to a different product rather than wait for the company’s product. That is why SoC designers need a thermal profiling model for their ICs at an early phase of development.
In this thesis, such a thermal model has been developed and analysed using a modelling tool. The thermal model is based on an example model of an SoC developed in-house by company (A). The necessary example data was collected from the company itself and from its silicon vendor. The tool used for modelling and analysis in this thesis is developed by the company’s approved tools vendor.

6.1 Introduction to the Tool Used for Generating the Thermal Model and its Analysis

The tool used in this thesis has been developed by a well-known simulation software company. Their support team fully cooperated with us in creating an early model of the chip’s physical design, to be used for early thermal analysis and for creating a thermal model for the system-level thermal analysis software from the same tools vendor. The vendor’s website describes the tool in technical detail [52]. There, it is presented as a multiphysics simulation platform providing a complete solution for layout extraction, signal and power integrity analysis, thermal profiling, etc. for chip packages and their interconnects. The tool supports detailed electro-thermal analysis for early design exploration, post-layout design verification, three-dimensional (3D) IC model assembly and optimization, accurate thermal modelling of 3DIC systems, static and transient thermal analysis, and more. It combines the best features of the vendor’s other thermal and mechanical tools to solve power integrity (PI), signal integrity (SI), thermal and stress problems of a heterogeneous system, and it can analyse up to a billion instances concurrently. According to the website [52], the tool is certified as a high-capacity electro-thermal solver for prototyping and multiphysics co-analysis, so it can be used for prototyping with early block estimates.
It has analytic engines that handle heterogeneous inputs seamlessly, and it can be integrated with other simulation and supporting tools from the same company. Because it supports multiple data formats, data files generated by other vendors’ tools can be used as inputs. The tool uses different solvers for the different physics and performs finite element analysis (FEA) based on FEM [53] for static and transient thermal analysis, mechanical stress, warpage, etc. [52]. It also supports flows from early prototyping to final signoff. In this thesis, we concentrate purely on thermal integrity analysis and on providing Chip Thermal Model (CTM) data for a system-level thermal analysis with simulation software developed by the same tools vendor. The tool accepts the package layout of the chip and the extracted netlist (used only for combined electro-thermal analysis). Based on these inputs, it can generate a chip thermal model (CTM) and run a thermal analysis using further related inputs. Finally, it reports the results of the thermal analysis and visualizes the thermal hotspots on the die. Figure 11 summarizes these steps.

Figure 11. A block diagram of the tool and its inputs/outputs. [52]

This tool, referred to as ‘the thermal analysis tool’ from now on, is a one-stop solution that covers all the functionality required for this thesis. Hence, we decided to proceed with it and compare its results with the lab results of the developed IC and the simulation results of the silicon vendor.

6.2 The IC Thermal Model & its Analysis

To replicate the thermal model, we selected a chip that is near the end of its development process. The IC has been designed and developed by company (A) and will be manufactured by the silicon vendor. It contains multiple company-specific IPs and some vendor-specific IPs.
Generation of the thermal model and its analysis consists of the following three steps:
I) Power profile generation
II) Chip thermal model (CTM) generation
III) Chip package level thermal analysis

Figure 12 shows the inputs and outputs of each step followed during thermal modelling and analysis of ICs.

Figure 12. Steps included for thermal modelling and analysis.

6.2.1 Power Profile Generation

The tool we used can generate the CTM on its own. To do so, however, it requires information about the size of the chip, the shape and location of all the IPs on the chip, and the power each of them consumes. This information can be generated using various simulation tools, their libraries and the RTL netlist of the designed IPs [51]. The generated file is called the power profile of the IC and is used as an input for the tool. Modern power profile generation tools allow accurate early power profiling using only the RTL description [51]. These tools analyse waveforms that represent the activity inside the designed IP under a specific scenario. They take into account factors such as clock gating and other physical effects to generate a realistic power profile, allowing engineers to concentrate on their own design and architecture [51]. Engineers can set up the tools to generate the power profile of each IP for best-case, worst-case or other real-life use-case scenarios. In our case, the design engineers had already generated the power profiles of the IPs in the IC for their worst-case scenario using power profiling tools, and they shared these files with us for CTM generation. We also received an additional power profile from the silicon vendor.

6.2.2 Chip Thermal Model (CTM) Generation

The power profiles of all the IPs can then be combined to obtain a system-level view of the IC.
This is how architects usually visualize and simulate the power and thermal behaviour of SoCs at the system level. The power profile can be combined with the physical data to simulate and verify power integrity, reliability, and steady-state and transient thermal behaviour [51]. The thermal analysis tool can perform all of these analyses, but in this thesis we only perform and discuss steady-state thermal analysis. A CTM represents the power density and its variation across the SoC in detail under a specific scenario [51]. Because it captures the power density, it can also capture the temperature-dependent variation of power if the right settings are applied when the model is generated [51]. In addition to the power profiles of the IPs on the die, generating a CTM requires further information such as the floorplan of the SoC, the materials used in the chip and their physical and thermal properties, the density of the materials and the manufacturing technology of the chip. Based on all this information, the thermal analysis tool can generate the CTM as accurately as possible. Given sufficiently accurate inputs, the resulting CTM can also serve as a prototype of the SoC. Figure 13 shows the generation of a CTM for an SoC under a specific scenario for system-level analysis.

Figure 13. Generation of CTM for a specific scenario for system level thermal analysis.

Here, the Layout Floorplan provides the floorplan of the die, including the shapes and positions of the IPs. Many foundries, such as TSMC, UMC, Intel and Samsung, offer semiconductor manufacturing technologies, each with its own design requirements. The Technology file provides this specific information to the tool. A chip is built up by adding layers of material on top of each other, and the material density of each layer varies with the design, vendor and technology.
This information is contained in the Material Density file. The Power Density Model contains the power profile of each IP.

Once the thermal analysis tool was installed, we collected the power profile data sheet from the design engineers of company (A). The data sheet contained the instance names, their respective hard macros (HMs) and their area coordinates. An HM is the physical representation of an IP; it may contain multiple IPs or just part of a larger IP. Each HM may be used once or several times in the same chip, and each usage has a separate name, known as an instance. The IC area is treated as a plane spanned by the x- and y-axes, and the IPs are placed at area coordinates within the region covered by the SoC. Hence, the area coordinates of the HMs are needed to find their position and area on the top surface of the die. The sheet also contained the power value of each HM.

a) The First Model:

We wrote a Python script that collects the necessary information from the data sheet and writes an input file in the required format. The input file contains the area coordinates of the IC and the name, area coordinates and power of each instance. Based on this input file, the thermal analysis tool generated the first physical model of the IC with its power density. Figure 14 shows a block diagram of the process, and figure 15 shows the power density model generated by the tool from the input file.

Figure 14. Block diagram of the power density model generation process.

The data sheet we received was in .xls format, which the thermal analysis tool cannot use directly. Furthermore, the data sheet contained a lot of other information irrelevant to our task. Hence, we needed to sort the data and feed the required information to the tool in the right format. The generated input file is in the format that the tool accepts.
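The conversion step can be sketched as follows. This is not the script used in the thesis: it assumes the .xls sheet has first been exported to CSV, and both the column names (`instance`, `x0`, `y0`, `x1`, `y1`, `power_w`) and the `DIE`/`INST` output syntax are hypothetical stand-ins, since neither the data sheet layout nor the tool's input format is given here. The point is the filtering: irrelevant columns are dropped, coordinates and power are converted to numbers, and degenerate boxes are rejected before anything reaches the tool.

```python
import csv
import io

# Hypothetical column names: the real data sheet layout and the tool's input syntax are
# not given in the thesis, so both the CSV columns and the output format are assumptions.
REQUIRED = ("instance", "x0", "y0", "x1", "y1", "power_w")


def read_instances(csv_text):
    """Keep only the columns needed for the power map and convert them to numbers."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [col for col in REQUIRED if col not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"data sheet is missing columns: {missing}")
    rows = []
    for rec in reader:
        x0, y0, x1, y1 = (float(rec[k]) for k in ("x0", "y0", "x1", "y1"))
        if x1 <= x0 or y1 <= y0:
            raise ValueError(f"degenerate box for instance {rec['instance']}")
        rows.append((rec["instance"], x0, y0, x1, y1, float(rec["power_w"])))
    return rows


def write_input_file(die_box, rows):
    """Render a hypothetical plain-text input: one DIE line, then one INST line per instance."""
    lines = ["DIE {} {} {} {}".format(*die_box)]
    for name, x0, y0, x1, y1, p in rows:
        lines.append(f"INST {name} {x0} {y0} {x1} {y1} {p}")
    return "\n".join(lines) + "\n"


# Extra columns such as hard_macro and owner are read but not written to the input file.
sheet = """instance,hard_macro,owner,x0,y0,x1,y1,power_w
cpu_0,HM_CPU,team_a,0,0,4,5,2.5
cpu_1,HM_CPU,team_a,4,0,8,5,2.5
"""
text = write_input_file((0, 0, 8, 10), read_instances(sheet))
print(text, end="")
```

Failing loudly on missing columns or zero-area boxes is a deliberate choice: a silently skipped row would later show up only as unexplained free space or a shifted hotspot in the model.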
As mentioned above, figure 15 is an example image showing what a physical model of the chip with its instances and their power profile may look like in the tool.

Figure 15. Example of a physical model of the chip with the instances and their power profile.

Here, the base represents the IC area, and the small rectangular blocks of different sizes represent the instances listed in the generated input file. The colour of each block represents the power of that instance: the colour shifts from blue to red as the power consumed by the block increases.

The first model we generated was incomplete. There was a lot of free space on the IC, and the floorplan we had did not match the generated layout. IC production is very expensive, so designers always try to cover as much of the die area as possible, and the size of the die usually depends on the optimum arrangement of the HM instances. The free space in the IC layout therefore indicated that the data sheet we received was missing some IPs, which made the model unusable for further analysis. Although the model was incomplete, it contained no errors or defects, so the thermal analysis tool allowed us to generate the CTM for it. The thermal analysis tool can generate two types of CTM: version 0 (v0) and version 1 (v1). For v0, only the physical model generated from the input file is needed; the tool takes care of other information such as the materials used in the die, their physical properties and their density. For v1, this information must be provided separately. Since it was not available to us, we generated CTMv0. Figure 16 is an example image of a CTMv0.

Figure 16. Example of a CTMv0 for the example model represented in figure 15.
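The two things read off the first model, free die area and per-instance power turned into a power density map, can be approximated in a few lines before a model is handed to the tool. The sketch below is a hypothetical pre-check, not a feature of the thermal analysis tool: it assumes the instance list from the input file (name, rectangle, power), spreads each instance's power uniformly over its rectangle and resamples it onto a coarse grid. A large free-area fraction, as in the first model, hints at missing IPs; the instances and die size are made-up values.

```python
# Hypothetical instance list: (name, x0, y0, x1, y1, power_W), coordinates in mm.


def free_area_fraction(die_w, die_h, instances):
    """Fraction of the die not covered by any instance (valid only if instances do not overlap)."""
    used = sum((x1 - x0) * (y1 - y0) for _, x0, y0, x1, y1, _ in instances)
    return 1.0 - used / (die_w * die_h)


def power_density_map(die_w, die_h, instances, nx, ny):
    """Spread each instance's power over an nx-by-ny grid in proportion to overlap area.

    Returns density[j][i] in W per unit area (W/mm^2 if coordinates are in mm).
    """
    dx, dy = die_w / nx, die_h / ny
    power = [[0.0] * nx for _ in range(ny)]
    for _, x0, y0, x1, y1, p in instances:
        area = (x1 - x0) * (y1 - y0)
        for j in range(ny):
            for i in range(nx):
                ox = min(x1, (i + 1) * dx) - max(x0, i * dx)  # overlap width with cell
                oy = min(y1, (j + 1) * dy) - max(y0, j * dy)  # overlap height with cell
                if ox > 0 and oy > 0:
                    power[j][i] += p * ox * oy / area
    return [[pw / (dx * dy) for pw in row] for row in power]


instances = [("ip_a", 0, 0, 5, 10, 10.0), ("ip_b", 5, 0, 10, 5, 10.0)]
print(f"free area: {free_area_fraction(10, 10, instances):.0%}")
density = power_density_map(10, 10, instances, nx=2, ny=2)
for row in reversed(density):  # print the top row first, as in a layout view
    print(" ".join(f"{d:.2f}" for d in row))
```

Here a quarter of the die is empty and the smaller instance, with the same power, has twice the power density; this density contrast, not the raw power per instance, is what drives hotspots in the thermal model.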
The CTM in figure 16 shows the power density in more detail and represents a thermal model of the IC. An accurate thermal model can be obtained by further analysis of an accurately generated CTM. However, since our first model was incomplete, we decided not to proceed with further analysis.

b) The Second Model:

We contacted the internal teams about the missing HMs and the other technology- and density-related information. We received an updated data sheet with the missing HMs, their instances and coordinates, and the internal team of company (A) shared the floorplan of the SoC with us. Comparing the new data sheet with the floorplan, we noticed that some IPs were still missing. After some investigation, we found that the missing IPs came from the silicon vendor and that information about them was available only from the vendor. We contacted the vendor and collected their updated data sheet as well. By combining both data sheets, we were finally able to generate an input file with all the instances, their power values and coordinates. We fed the updated input file to the tool and generated an updated physical model of the SoC with its power density. This model used the whole area of the die. However, CTM generation for it failed. Further investigation revealed multiple overlapping IPs. Where IPs overlap, the power in the overlapped areas cannot be determined, so CTM generation fails. Figure 17 is an example of a physical IC model with a few overlapping IPs. As the image is quite dark, the overlapping IPs are marked with red boxes for clarity.

Figure 17. Example of a physical model of a chip with overlapping IPs.

As in the simulation, IPs in a real SoC can never overlap each other. In some cases, the block boundaries of IPs may overlap, but their active areas can never cover one another.
Overlapping active areas would cause severe malfunction, because current could flow along unintended paths; this could make the SoC unreliable or even break the system entirely. Since this is one of the basics of IC design, it is highly unlikely that SoC architects would make such a mistake. Further investigation revealed the cause of the error: some of the IP blocks were rectilinear, but the thermal analysis tool accepts only rectangular blocks as IPs. The rectilinear IPs therefore had to be broken into rectangular blocks in the input file, which we expected to solve the issue.

c) The Final Model:

Breaking up the rectilinear blocks was quite complicated, because splitting them into rectangles was not enough on its own. As we had the power profile only for each whole block, splitting a block into smaller rectangles meant distributing its power properly among them. An error in the power distribution could completely change the power density of the SoC and shift the original hotspots, taking the simulation result far from the real-case scenario. After discussion within our team, we decided to distribute the power in proportion to the area of each smaller rectangle, using the following formula:

$$P_{sb} = \left( \frac{A_{sb}}{A_{IP}} \right) \cdot P_{IP} \qquad \text{(v)}$$

where $P_{sb}$ and $A_{sb}$ are the power and the area of the smaller block, respectively, and $P_{IP}$ and $A_{IP}$ are the power and the area of the whole IP that is broken into smaller rectangular units. Once the rectilinear IPs had been broken up and their power distributed, we modified the input file by replacing each rectilinear IP with its smaller rectangular blocks. With this input file, we could generate an accurate power density model of the chip. Figure 18 shows an example of a complete model with rectilinear IPs.

Figure 18. Example of a physical model of an SoC with power density with rectilinear IPs.
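The splitting step and equation (v) can be sketched as follows. The decomposition used here, cutting the outline into vertical slabs at every distinct x-coordinate, is one simple choice assumed for illustration (the thesis does not state how the blocks were split), and it assumes a simple rectilinear outline without holes. To show how overlaps can arise, the example also assumes, hypothetically, that a rectilinear IP entered as a single entry ends up as its bounding rectangle; the L-shaped IP, its neighbour and the 12 W power are made-up values.

```python
def slab_decompose(vertices):
    """Split a rectilinear polygon (no holes assumed) into rectangles using vertical slabs.

    vertices: corners in order; every edge must be horizontal or vertical.
    Returns rectangles as (x0, y0, x1, y1).
    """
    n = len(vertices)
    edges = [(vertices[k], vertices[(k + 1) % n]) for k in range(n)]
    horizontal = [(min(a[0], b[0]), max(a[0], b[0]), a[1]) for a, b in edges if a[1] == b[1]]
    xs = sorted({x for x, _ in vertices})
    rects = []
    for x0, x1 in zip(xs, xs[1:]):
        xm = 0.5 * (x0 + x1)
        # Horizontal edges crossing this slab, bottom to top: inside/outside alternates.
        ys = sorted(y for lo, hi, y in horizontal if lo < xm < hi)
        for y0, y1 in zip(ys[0::2], ys[1::2]):
            rects.append((x0, y0, x1, y1))
    return rects


def split_power(rects, p_ip):
    """Equation (v): P_sb = (A_sb / A_IP) * P_IP for every sub-rectangle."""
    areas = [(x1 - x0) * (y1 - y0) for x0, y0, x1, y1 in rects]
    a_ip = sum(areas)
    return [a / a_ip * p_ip for a in areas]


def overlapping_pairs(boxes):
    """Return name pairs of boxes whose interiors intersect (shared edges are allowed)."""
    hits = []
    for k, (na, ax0, ay0, ax1, ay1) in enumerate(boxes):
        for nb, bx0, by0, bx1, by1 in boxes[k + 1:]:
            if min(ax1, bx1) > max(ax0, bx0) and min(ay1, by1) > max(ay0, by0):
                hits.append((na, nb))
    return hits


# Hypothetical L-shaped IP of 12 W with a neighbouring IP sitting in its notch.
l_shape = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 3), (0, 3)]
neighbour = ("ip_n", 1, 1, 4, 3)

as_bbox = [("ip_L", 0, 0, 4, 3), neighbour]  # rectilinear IP entered as one rectangle
print("bounding box overlaps:", overlapping_pairs(as_bbox))

pieces = slab_decompose(l_shape)
powers = split_power(pieces, p_ip=12.0)
split_boxes = [(f"ip_L_{k}", *r) for k, r in enumerate(pieces)] + [neighbour]
print("pieces:", list(zip(pieces, powers)))
print("overlaps after split:", overlapping_pairs(split_boxes))
```

Because equation (v) distributes power by area, every piece keeps the power density of the original IP (P_sb / A_sb = P_IP / A_IP), so the hotspot pattern is preserved; the pieces also sum back to the original power, which is a useful check after editing the input file.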
At this stage, we already had all the other files containing information about the technology and material density of the chip, so we could generate CTMv1 instead of v0. The example CTM is shown in figure 19.

Figure 19. Example CTM of a physical model of an SoC with power density with rectilinear IPs.

The latest CTM accurately represents the SoC floorplan we received from the internal team of company (A). It contains all the IPs (both company-specific and silicon vendor-specific) without any overlap and includes all technical details of the materials, their properties and density. We therefore consider this CTM an accurate prototype representing the original SoC designed and developed by company (A). We decided to proceed to the system-level analysis of this model to generate an accurate thermal model, which is the main goal of this thesis.

6.2.3 Chip Package Level Thermal Analysis

Now that the chip-level thermal model based on the power profile of the designed IC is ready, we need to know where hotspots form when the chip operates under a given scenario. For this, the chip must be connected to a system. The chip (die) is too small to connect directly to a system, so it is mounted on a package. The package contains a PCB layer on which the die sits, and a heat sink is placed on top of the die to dissipate the excess heat the die produces during operation. The package is then placed in the system for which the chip was originally designed, and the system is installed and used in the real world. By combining the generated CTM, the package layout and the boundary conditions, the thermal analysis tool can perform an FEA and provide a high-resolution thermal profile of the designed IC. The package layout file (ODB++ format) and the boundary conditions come from the silicon vendor.
The package layout defines the specifications, the internal structure, the layers of the IC, the PCB structure, etc. However, it contains no information about the power map or floorplan of the chip. The boundary conditions define the ambient conditions under which the system will operate. Figure 20 represents the package level thermal analysis in the thermal analysis tool.

Figure 20. Package level thermal analysis.

The package level analysis of our CTM was quite extensive, as it needed multiple trials to produce an accurate thermal profile. Hence, the analyses of the final CTM and their benchmarking are discussed in the following chapter.

7 Chip Package Level Thermal Analysis & Discussions about Generated Thermal Profiles

Package level analysis is usually done to obtain the thermal profile of the chip, from which the temperature hotspots are identified. The IC for which we are building the thermal model has already been designed and is in the post-silicon phase; that is, it has been tested and verified and is ready for mass production. We therefore already have its thermal profile and hotspots, determined by the silicon vendor and tested and replicated within company (A). The question may then arise: why redo this with a new tool? The reason is that we are building a model based on the finished chip so that company (A) can later use it for new chips under development. The company wants an adjustable model for thermal profiling of SoCs at the early phases, one that gives a close estimate of the chip’s final heat generation before it is ready for fabrication. The tool we are using promises accurate results for such early-stage thermal profiling. It can also perform transient thermal analysis, which can provide a realistic behavioural model of the designed IC.
If, using this tool, we can replicate the post-silicon test results of the silicon vendor or the lab test results of the IC obtained by the company (A) itself, the thermal profiles generated by the tool can be considered verified. We may then use this model to generate thermal profiles of SoCs at the pre-silicon stage, i.e. during the early development stage of the chip. This will help engineers predict hotspots early enough to weigh and make important decisions regarding the architecture of the SoC and the design and properties of the heatsink. This will enable the company to save considerable resources and expenses, as the SoC development process becomes more efficient.

7.1 First Iteration of Package Level Thermal Analysis

We collected the package layout file and extracted the physical die from it using the thermal analysis tool. Next, to design the package, PCB and heatsink, we reached out to the company's (A) internal team. They had tested the physical chip in their lab by placing several temperature sensors on the chip. During the lab test, an ambient temperature of T (in °C) and a heat transfer coefficient of 100 W/(m²K) between the heatsink material and the surrounding air were used as boundary conditions. The heat transfer coefficient is applied at the outer surface between the heatsink and the ambient air. Based on this information, we built the package-level model using the package layout we received, our CTM and a built-in heatsink provided by the tool. We applied the known ambient temperature as the constant-temperature boundary condition, the heat transfer coefficient between the heatsink surface and the ambient air as the convection boundary condition, and the thermal conductivity of the heatsink material as the heat flux boundary for the CFD analysis. To run the FEA, we needed to define two meshing parameters: the maximum number of edges for meshing and the number of cutting layers for meshing.
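These settings control the tool's meshing, i.e. how finely the structure is discretised before the heat-flow equations are solved. To illustrate, very roughly, what a discretised steady-state solve does with a power map and a convection boundary, the sketch below uses a finite-difference scheme with Gauss-Seidel iteration, not the FEM used by the commercial tool, on a single-layer die: each cell conducts to its four neighbours, loses heat to ambient through a convection coefficient h on its top face, and the die edges are adiabatic. The function name, the single-layer simplification and all numbers in the example are assumptions for illustration only.

```python
def solve_steady_state(q, k, thickness, pitch, h, t_amb, tol=1e-6, max_iter=50_000):
    """Finite-difference steady state of k*t*lap(T) + q - h*(T - t_amb) = 0.

    q: ny-by-nx grid of power density in W/m^2 (e.g. from a CTM power map)
    k: in-plane thermal conductivity, W/(m K); thickness, pitch: m
    h: convection coefficient to ambient, W/(m^2 K); t_amb: ambient temperature
    Die edges are treated as adiabatic. Solved with Gauss-Seidel iteration.
    """
    ny, nx = len(q), len(q[0])
    g_lat = k * thickness          # conductance between neighbouring square cells, W/K
    g_amb = h * pitch * pitch      # convective conductance of one cell to ambient, W/K
    temp = [[float(t_amb)] * nx for _ in range(ny)]
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(ny):
            for i in range(nx):
                # Cell heat balance: sum g_lat*(T_n - T) + q*d^2 - g_amb*(T - T_amb) = 0,
                # solved for T using the latest neighbour values.
                num = q[j][i] * pitch * pitch + g_amb * t_amb
                den = g_amb
                for jj, ii in ((j - 1, i), (j + 1, i), (j, i - 1), (j, i + 1)):
                    if 0 <= jj < ny and 0 <= ii < nx:
                        num += g_lat * temp[jj][ii]
                        den += g_lat
                new = num / den
                max_change = max(max_change, abs(new - temp[j][i]))
                temp[j][i] = new
        if max_change < tol:
            return temp
    raise RuntimeError("Gauss-Seidel iteration did not converge")


if __name__ == "__main__":
    # Hypothetical 0.5 mm silicon die, 5 x 5 grid of 1 mm cells, h = 100 kW/(m^2 K).
    q = [[0.0] * 5 for _ in range(5)]
    q[2][2] = 2.0e6                # one hot cell dissipating 2 W
    t = solve_steady_state(q, k=150.0, thickness=0.5e-3, pitch=1e-3, h=1e5, t_amb=40.0)
    print(f"Tmax = {max(max(row) for row in t):.2f}")  # hottest cell is the powered one
```

With a uniform power density the lateral terms cancel and every cell settles at t_amb + q/h, which is a convenient check of the discretisation; a finer grid corresponds loosely to raising the edge count of a mesh, at the cost of longer run times.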
For our case, after discussion with the tool vendor and the company's (A) experts, we set the maximum number of edges for meshing to 500 and the number of cutting layers for meshing to 5. The tool's simulator calculates the temperature distribution on the chip based on the geometry, the dissipation pattern and the material properties [42]. Such simulators solve the complex heat dissipation equations using FEA, which is based on the FEM [42]. As explained in chapter 4, the tool breaks the structure down into small elements, applies the boundary conditions and solves the equations for the heat flow in these small volumes. It then combines the results into a profile of the whole surface. Figure 21 shows the setup of the PCB, IC and heatsink for running the static thermal analysis. The die is powered by the final CTMv1 we generated.

Figure 21. Setup for PCB, IC and heatsink for static thermal analysis.

After running the analysis, the Tmax (maximum temperature in the chip's thermal profile) we obtained was almost 200% of the Tmax from the lab test. The whole chip was completely red, showing extremely high temperatures over the entire die area, which is unrealistic and far off from the lab test results. Clearly, the analysis had failed, and we needed to dig deeper to find the cause of this large gap between the thermal profile from the lab test and the one from the thermal analysis of our CTM.

7.2 Second Iteration of Package Level Thermal Analysis

Since we had already perfected the CTM before moving to the analysis phase, we knew the problem could not be in the CTM. There had to be some miscommunication or missing data in the design of the package. The PCB is generated from the package layout received from the silicon vendor; if there were errors in that layout, they would have affected the IC's physical test results as well. In the physical assembly, the heatsink is the only part taken from the custom-built models provided by the tool.
Hence, we focused on perfecting the heatsink. We raised the issue with the company's (A) internal team. They reviewed their datasheet and found some discrepancies. The temperature inside the package around the IC is almost 35.3% higher than the ambient temperature inside the lab, and for this analysis the temperature inside the package should be used as the ambient temperature. Another misunderstanding concerned the value of the convective heat transfer coefficient: it should have been 100 kW/(m²K), not 100 W/(m²K). In practice, a heat transfer coefficient of 100 kW/(m²K) is unrealistic for air [Appendix 1]. However, following discussions with the silicon vendor and the company's (A) internal team, this value was chosen for simulation purposes only. Since the simulation does not model the entire cooling system with all the heatsink layers, fins and other active and passive cooling units, and uses only a very small heatsink area, the heat transfer coefficient must be scaled up enough to mimic the real physical conditions under which the chip will operate. Recall equation (ii) from chapter 2:

$$Q = hA\,(T_1 - T_2) \qquad \text{(ii)}$$

To ensure that the heat dissipation Q in the simulation matches that of the real physical chip, the other factors on the right-hand side of the equation must be adjusted. Since the temperature difference is fixed, reducing the heatsink surface area A to simplify the design requires scaling up the heat transfer coefficient h in inverse proportion, so that the product hA stays the same. Hence, although the value is too high to be achieved in practice, we used h = 100 kW/(m²K). We ran a new thermal analysis with the new, corrected boundary conditions, and this resulted in a much more believable thermal profile. Comparing this thermal profile with the one from the lab test, we noticed that the hotspot patterns roughly match each other.
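Equation (ii) can be turned into a small calculation showing why the simulated heat transfer coefficient has to be scaled up, and why the unit mix-up in the first iteration mattered so much. The sketch below assumes a hypothetical real heatsink with 0.25 m² of effective convective area at h = 40 W/(m²K) (within the gas-flow range of Appendix 1) and a hypothetical 1 cm² simulated heatsink surface; none of these numbers comes from the thesis data, and the function names are illustrative.

```python
import math


def convective_heat(h, area, t_surface, t_ambient):
    """Equation (ii): Q = h * A * (T1 - T2), in W."""
    return h * area * (t_surface - t_ambient)


def scaled_htc(h_real, area_real, area_sim):
    """h that keeps Q unchanged for the same (T1 - T2) when A shrinks to area_sim."""
    return h_real * area_real / area_sim


def temperature_rise(q_w, h, area):
    """Surface-to-ambient temperature difference needed to convect q_w watts."""
    return q_w / (h * area)


if __name__ == "__main__":
    h_real, a_real = 40.0, 0.25     # hypothetical full heatsink: W/(m^2 K), m^2
    a_sim = 1e-4                    # hypothetical simulated heatsink area: 1 cm^2
    h_sim = scaled_htc(h_real, a_real, a_sim)
    print(f"h_sim = {h_sim / 1e3:.0f} kW/(m^2 K)")  # h_sim = 100 kW/(m^2 K)
    q_real = convective_heat(h_real, a_real, 80.0, 60.0)
    q_sim = convective_heat(h_sim, a_sim, 80.0, 60.0)
    print(math.isclose(q_real, q_sim))  # True
    # Using 100 W/(m^2 K) instead of 100 kW/(m^2 K) on the same small area:
    ratio = temperature_rise(q_sim, 100.0, a_sim) / temperature_rise(q_sim, 1e5, a_sim)
    print(f"{ratio:.0f}x larger temperature difference across the convective surface")
```

In this single-surface picture the wrong unit inflates the required temperature difference a thousandfold; in the full package some heat also spreads into the PCB, so the error seen in the first iteration was much smaller, but still in the same direction.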
Both profiles show a similarly shaped high-temperature region, with the hottest spot in the same area. However, the Tmax in this case was 125% of the Tmax from the lab test, which is still well outside the acceptable range. Figure 22 shows an example of what the thermal profile of an IC should look like.

Figure 22. An example of a thermal profile of an IC.

In the middle, the darkest red oval area represents the hotspot. Moving away from the hotspot, the temperature decreases, and it is lowest in the greenish-blue region.

7.3 Final Iteration of Package Level Thermal Analysis

Since there was still a mismatch in Tmax, we resumed communication with the internal team of the company (A) and the silicon vendor. As we had reached out to the silicon vendor after the first thermal analysis, they had shared their thermal profile for this IC obtained with their own simulation tools. By this stage, the company's (A) team had carried out their own system-level analysis, and the outcome was close to the silicon vendor's thermal profile. The only difference was that, because they were performing a system-level analysis, they used the same ambient temperature as before (T). Using the same boundary conditions, the silicon vendor obtained a Tmax of 91% of the lab test value. The internal team of the company (A) also performed their system-level simulation, and their Tmax this time was approximately 90% of their initial lab test result. The results were in the same ballpark. Together, we discussed the steps we had followed to build our package and how it could be improved to come as close as possible to the current Tmax. For the system-level analysis, the company's internal team had followed the heatsink and PCB layering used by the silicon vendor. They have
This allows the heat sink to absorb the extra heat produced on the die to dissipate faster and thus speeding the cooling process. They have also used multiple layering with different materials for designing the PCB. The thermal analysis tool we are using allows manual construction along with automatic building from input files. So, we collected the composition, layer information and material properties for the PCB and heatsink in detail from the company's internal team. With this information, we manually built the PCB and heatsink using the thermal analysis tool and assembled them with our die to run the analysis. Figure 23 shows the block diagram of layers in the whole assembly of an example IC package. The PCB is divided into two layers. Some dielectric material is used between the PCB layers. The lower PCB layer is sitting on top of solder balls which acts as the base of the package. At the middle of the upper PCB layer sits two layers of microvias separated by core via at the middle. The die powered by our CTMv1 sits on top of the second microvia. The heatsink is sitting on top of the chip. To minimize the effect of surrounding free space between the upper PCB layer and heatsink, some stiffener is attached to the PCB layer with stiffener adhesive. Figure 23. Block diagram of the layers for an example package design. The analysis ran for about 16 minutes for two cycles giving a maximum temperature of 89% of the initial lab test results, which is on the same ballpark region of the Tmax obtained by the silicon vendor and system level simulation by the company (A) which are 91% and 90% respectively. The shape of the high heat region of the thermal profile obtained after the run matches the previously discussed shapes in the thermal profiles obtained from the silicon vendor and the company (A), showing the highest hotspot at the same area. 
Thus, we could conclude that the simulation results of the thermal analysis tool replicated the physical test results and the post-silicon simulation results of the die quite well.

7.4 Results & Discussions

Taking the maximum temperature obtained initially during the lab test of the designed IC as the reference, the three thermal profiles and their analysis results are collected and presented in Table 1.

Table 1. Comparison of Thermal Analysis Results

| | Silicon Vendor | System level simulation by Company A | Thermal Model Analysis of generated CTM |
|---|---|---|---|
| Power source | Actual chip | Chip model | Our final CTM |
| Package & PCB model | Own model | Replicated silicon vendor's model | Replicated silicon vendor's model |
| Ambient temperature (°C) | 35.3% more than T | T | 35.3% more than T |
| Heat transfer coefficient (kW/(m²K)) | 100 | 100 | 100 |
| Tmax in die (°C) | 91% of reference | 90% of reference | 89% of reference |

For the system-level analysis, the company's (A) team used the ambient temperature T, as this is the usual temperature of the environment where the system will be placed. The package sits inside the system, where the temperature is expected to be higher, about 35-36% higher than T. For the power source, each party used a different source, depending on its simulation method; in principle, however, the power source in every method is the designed IC, which we represented with our CTM. As the table shows, the Tmax values of all three thermal profiles are very close, differing by at most 2 percentage points, which is negligible. Even though the intensity varies slightly, the hotspots lie in the same region in all three thermal profiles. Hence, it can be concluded that our generated CTM and its analysis with the thermal analysis tool work. The process we followed is verified, and this modelling technique can be used for future ICs designed in the company (A) or, more generally, in the industry.

7.5 Challenges

This study has been exciting, and I learned a lot.
The results are satisfactory, and the objectives of the research have been met. However, reaching this point required overcoming several obstacles and facing several challenges. With the help of my team, I tried my best to overcome them as smoothly as possible, and this is what led to an adequate thermal model. Some of the most difficult challenges I faced during the thesis work are discussed here.

7.5.1 Power Profiles for IPs

Generating the power profile of each IP for a specific scenario is a long and tedious process. It is simpler to collect the power profiles for either the best-case or the worst-case scenario, so a decision had to be made about which power profile would be better for our study. The power profiles of the IPs at the best corner, i.e. the best-case scenario, would give an overly optimistic power map of the IC. On the other hand, those at the worst corner, i.e. the worst-case scenario, would give an overly pessimistic power map. We could obtain power profiles for these two extreme corners only. Faced with this situation, we had to consider the outcome in both cases and weigh the pros and cons. An overly optimistic simulation result would create too much confidence in the IC design, and when the physical chip deviates from it in the negative direction, it will already be too late to take steps to rectify the design. On the contrary, a pessimistic result would push the IC developers to design as carefully as possible. In the end, the results from the physical IC will still deviate from the simulation results, but in a favourable direction. Based on these considerations, we decided to go ahead with the worst-corner power profiles of the IPs.

7.5.2 Collecting Data

Multiple teams inside the company (A) are responsible for the development of the IC, and each of these teams has its own datasheet of the power numbers of the IPs.
We received datasheets with different power values at different points in time, and it was difficult to choose which power numbers to use. Furthermore, not all datasheets contained the same HMs or instances; for some HMs, a few instances were missing. In short, sorting the datasheets and combining the required information into one single file took considerable effort and time.

7.5.3 Communication

Since we needed to communicate with the company's (A) internal teams and with the silicon vendor to develop an accurate CTM and to build the package and PCB model correctly, it was difficult to get the right information at once. We reached the final CTM and the final thermal profile through several iterations, as each time some information turned out to be wrong or missing. This was frustrating at times. But now that we know what information is needed to generate the CTM and to perform the thermal analysis, this communication should go much more smoothly next time.

7.6 Limitations

Despite reaching an acceptable outcome, this study still has some limitations. These are as follows:

• The study is based on worst-corner power profiles of the IPs. In practice, not all IPs consume their maximum power at the same time. Hence, the CTM we generated using these power profiles is pessimistic.
• While generating the CTM, we did not consider the temperature factor. Between 25°C and 125°C, we considered the total power inside the chip to be constant, whereas in practice the power varies with temperature.
• Due to time and resource constraints, we performed only a static analysis of our CTM for this thesis. A transient analysis would have provided a more detailed and accurate thermal profile and electrothermal results.
• Our analysis is based on post-silicon data.
  To produce an accurate chip thermal model that could serve engineers during the early development phases, the same analysis needs to be done with pre-silicon data.

7.7 Future Studies

Studying thermal models is essential for SoC design engineers, and this research is only a starting point. There are many angles from which this study could be further enriched. As the temperature factor has not been included in this research, including it is one way to improve the generated thermal model. Transient analysis of the CTM is also important: the study can be continued with a transient thermal analysis to generate a more accurate model. We used an extremely high value for the heat transfer coefficient (h) between the heatsink material and the ambient air. In reality, because the heatsink area is much larger, the heat generated on the SoC can be dissipated fast enough with a lower heat transfer coefficient. However, when using the model at the pre-silicon stage, choosing the right value for h would be difficult, and this needs further research. Internal teams of the company (A) are trying to emulate the power profiles of the IPs at the pre-silicon stage. Once this information is available, the same analysis can be repeated using the pre-silicon data. This will give the thermal profile of the chip at an early stage, helping engineers find potential hotspots. Based on that, engineers would be able to design their IPs better or make informed decisions regarding the architecture, packaging or heatsink design of the IC. This field is broadly useful and offers much scope for research and development; the possible future studies in this area are too many to list.

8 Conclusion

To ensure the reliability of SoCs, knowing the thermal behaviour of the chip is crucial.
The current practice is to collect the thermal profile from the silicon vendor once the chip is ready after placement and routing. By then, however, it is already too late to make design changes to the associated IPs inside the SoC. Designing the cooling system and meeting the customer's initial requirements also become difficult if excessive heat generation is found at this stage of the SoC development process. Thermal models generated during the later stages of the SoC development process have other limitations too: each model is generated for a single case scenario, but the SoC might be used for an entirely different case in practice. Hence, it is also important to obtain the thermal profile of the SoC for as many scenarios as possible; only then can the reliability of the chip and the device be guaranteed. To address this issue, discussions with the company's (A) experts led to the idea of proposing a thermal model for the early stages of the SoC development process, and the research goals and research plan were formed based on this idea. Throughout this work, I have tried to stay as close as possible to the initial objectives and succeeded in building a working prototype of the thermal model of an existing chip designed by the company (A). During the research, the IC design and development process was studied. Existing research papers on various thermal modelling techniques for ICs, their analysis, successes, challenges and limitations were studied thoroughly. The identified gaps were addressed, and the chip thermal model (CTM) was then created using an existing EDA tool available on the market. The CTM was modelled by replicating an existing IC developed by the company (A) that is in the post-silicon stage. The CTM was then analysed using the same EDA tool after collecting the boundary conditions from the company (A) and their silicon vendor.
During the research, all relevant data was collected by communicating with the company's internal team and their silicon vendor's team. As the chip is already in the post-silicon stage, keeping the silicon vendor in the loop and referencing their model was important, because the physical die already exists and has been tested at system level in the lab, and the resulting thermal profile from the physical test is available. It was important to validate the designed thermal model by comparing it with the results obtained from the silicon vendor and the company (A).

The comparison showed that the hotspots and the maximum die temperature were in the same region and in the same ballpark, respectively. The hotspots of the SoC were at the same place in all three thermal profiles, and the shapes of the thermal profiles are very close to one another. The maximum die temperatures of the three models are also close, at 89%, 90% and 91% of the lab reference, differing by at most 2 percentage points, which is negligible. So, it can be concluded that the thermal profile obtained from the thermal model developed in this research successfully replicated the thermal profiles obtained from the silicon vendor and from the system-level simulation of the SoC at the company (A).

This research has some limitations. Due to the limited timeline of this thesis and limited resources at the company (A), generating the thermal model at a pre-silicon stage was not possible. While generating the CTM, temperature variation around the SoC was not considered. A transient analysis would give an even better thermal profile of the model, but this was not possible due to limitations of time and information. Furthermore, the CTM was generated using the worst-case power values of the IPs, whereas in practice all IPs will never be in their maximum power mode at the same time. This study is just the starting point towards developing an early thermal model for SoCs.
The limitations of this study have opened new research areas for the future. The study can be continued by including the temperature factor, and taken further by performing a transient analysis to generate a better thermal profile. In future, once the power consumption values of each IP at the pre-silicon stage are available, the same model can be used to generate a pre-silicon thermal profile, which will help engineers make design decisions early enough based on the simulated hotspots of the SoC.

References

[1] Semiconductor device. Wikipedia. [online]. Available: https://en.wikipedia.org/wiki/Semiconductor_device. [Accessed: 23.05.2024].
[2] The semiconductor decade: A trillion-dollar industry. McKinsey & Company. [online]. Available: https://www.mckinsey.com/industries/semiconductors/our-insights/the-semiconductor-decade-a-trillion-dollar-industry. [Accessed: 23.05.2024].
[3] A Beginners Guide to Mobile Communication Infrastructure. Packet Coders. [online]. Available: https://www.packetcoders.io/a-beginners-guide-to-mobile-wireless-communication-infrastructure/. [Accessed: 23.05.2024].
[4] Seda Ogrenci-Memik. Heat Management in Integrated Circuits: On-Chip and System-Level Monitoring and Cooling. 1st ed. Vol. 2. Stevenage: The Institution of Engineering and Technology, 2016. Print.
[5] Walker, Jearl, David Halliday, and Robert Resnick. Fundamentals of Physics. 8th ed. Hoboken, NJ: Wiley, 2008. Print.
[6] Giancoli, Douglas C. Physics for Scientists and Engineers with Modern Physics. 3. 4th ed. Upper Saddle River, NJ: Pearson Prentice Hall, 2008. Print.
[7] Kothandaraman, C. P. Fundamentals of Heat and Mass Transfer. Rev. 3rd ed. New Delhi: New Age International P Ltd., Publishers, 2006. Print.
[8] What is Boundary and Initial Conditions – Definition. Thermal Engineering. [online].
    Available: https://www.thermal-engineering.org/what-is-boundary-and-initial-conditions-definition/. [Accessed: 03.06.2024].
[9] Collected from an industrial expert who has permitted me to use the image.
[10] What is an Integrated Circuit (IC)? ANSYS Blog. ANSYS. [online]. Available: https://www.ansys.com/blog/what-is-an-integrated-circuit. [Accessed: 05.03.2024].
[11] S. Tyagi, "Moore's Law: A CMOS Scaling Perspective," 2007 14th International Symposium on the Physical and Failure Analysis of Integrated Circuits, Bangalore, India, 2007, pp. 10-15, doi: 10.1109/IPFA.2007.4378049.
[12] Integrated circuit. Wikipedia. [online]. Available: https://en.wikipedia.org/wiki/Integrated_circuit. [Accessed: 05.03.2024].
[13] System on a Chip: How Smaller, Faster Devices are Made. ANSYS Blog. ANSYS. [online]. Available: https://www.ansys.com/blog/what-is-system-on-a-chip. [Accessed: 05.03.2024].
[14] System on a chip. Wikipedia. [online]. Available: https://en.wikipedia.org/wiki/System_on_a_chip. [Accessed: 06.03.2024].
[15] W. Wolf, A. A. Jerraya and G. Martin, "Multiprocessor System-on-Chip (MPSoC) Technology," in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 27, no. 10, pp. 1701-1713, Oct. 2008, doi: 10.1109/TCAD.2008.923415.
[16] B. Ackland et al., "A single-chip, 1.6-billion, 16-b MAC/s multiprocessor DSP," in IEEE Journal of Solid-State Circuits, vol. 35, no. 3, pp. 412-424, March 2000, doi: 10.1109/4.826824.
[17] Ben Abdallah, Abderazek. Advanced Multicore Systems-on-Chip: Architecture, On-Chip Network, Design. Singapore: Springer, 2017. Print.
[18] Architecture of SoC. Geeks for Geeks. [online]. Available: https://www.geeksforgeeks.org/architecture-of-soc/. [Accessed: 04.06.2024].
[19] Sudeep Pasricha, Nikil Dutt. On-Chip Communication Architectures - System on Chip Interconnect. 1st ed. San Diego: Elsevier, 2008. Print.
[20] Chemistry. LibreTexts. [online]. Available: https://chem.libretexts.org/.
    [Accessed: 12.03.2024].
[21] Thermal conductivity and resistivity. Wikipedia. [online]. Available: https://en.wikipedia.org/wiki/Thermal_conductivity_and_resistivity. [Accessed: 12.03.2024].
[22] Why silicon still dominates the IC industry. Engineers Garage. [online]. Available: https://www.engineersgarage.com/ic-manufacturing-semiconductors-silicon-germanium-gallium-arsenide/. [Accessed: 12.03.2024].
[23] Lienig, Jens, and Hans Bruemmer. Fundamentals of Electronic Systems Design. Cham: Springer, 2017. Print.
[24] Garimella, S.V et al. “Thermal Challenges in Next-Generation Electronic Systems.” IEEE transactions on components and packaging technologies 31.4 (2008): 801–815. Web.
[25] IC Thermal Analysis: Thermal Management for Integrated Circuits. Altium. [online]. Available: https://resources.altium.com/p/thermal-management-integrated-circuits. [Accessed: 14.03.2024].
[26] V. Lakshminarayanan and N. Sriraam, "The effect of temperature on the reliability of electronic components," 2014 IEEE International Conference on Electronics, Computing and Communication Technologies (CONECCT), Bangalore, India, 2014, pp. 1-6, doi: 10.1109/CONECCT.2014.6740182.
[27] What is Electromigration? Synopsys. [online]. Available: https://www.synopsys.com/glossary/what-is-electromigration.html#:~:text=Definition,vacancies%27%20and%20%27deposits%27. [Accessed: 14.03.2024].
[28] Poncé, Samuel et al. “First-Principles Calculations of Charge Carrier Mobility and Conductivity in Bulk Semiconductors and Two-Dimensional Materials.” arXiv.org (2019): n. pag. Web.
[29] Yeap, Kim Ho, and Jonathan Sayago, eds. Integrated Circuits/Microchips. London: IntechOpen, 2020. Print.
[30] Iranfar, Arman et al. “TheSPoT: Thermal Stress-Aware Power and Temperature Management for Multiprocessor Systems-on-Chip.” IEEE transactions on computer-aided design of integrated circuits and systems 37.8 (2018): 1532–1545. Web.
[31] Temperature gradient. Wikipedia. [online].
    Available: https://en.wikipedia.org/wiki/Temperature_gradient#:~:text=The%20temperature%20spatial%20gradient%20is,%2C%20climatology%20and%20related%20fields). [Accessed: 14.03.2024].
[32] Coskun, Ayse K et al. Energy-Efficient Variable-Flow Liquid Cooling in 3D Stacked Architectures. Berlin, ACM/IEEE Press. Print.
[33] Chantem, Thidapat et al. “Enhancing Multicore Reliability through Wear Compensation in Online Assignment and Scheduling.” 2013 Design, Automation & Test in Europe Conference & Exhibition (DATE). NEW YORK: IEEE, 2013. 1373–1378. Web.
[34] McPherson, J.W. (2019). Heat Generation and Dissipation. In: Reliability Physics and Engineering. Springer, Cham. https://doi.org/10.1007/978-3-319-93683-3_18.
[35] The Thermal Resistance vs. the Thermal Conductivity of Semiconductor Devices. Cadence. [online]. Available: https://resources.system-analysis.cadence.com/blog/msa2021-the-thermal-resistance-vs-the-thermal-conductivity-of-semiconductor-devices. [Accessed: 06.06.2024].
[36] Thermal management (electronics). Wikipedia. [online]. Available: https://en.wikipedia.org/wiki/Thermal_management_(electronics). [Accessed: 06.06.2024].
[37] Active vs passive cooling: Thermal management of electronic devices. Arrow. [online]. Available: https://www.arrow.com/en/research-and-events/articles/thermal-management-of-electronics-active-vs-passive-cooling. [Accessed: 06.06.2024].
[38] Electronics Cooling Methods for PCB Thermal Management. Cadence. [online]. Available: https://resources.system-analysis.cadence.com/blog/msa2022-electronics-cooling-methods-for-pcb-thermal-management. [Accessed: 06.06.2024].
[39] Cooling Electronic Systems. Circuit Cellar. [online]. Available: https://circuitcellar.com/research-design-hub/cooling-electronic-systems/#:~:text=Electronic%20components%20can%20be%20cooled%20by%20conduction%2C%20convection,the%20heat%20is%20transferred%20through%20direct%20molecular%20collision. [Accessed: 06.06.2024].
[40] Semiconductor Lifetime: How Temperature Affects Mean Time to Failure. Jetcool. [online]. Available: https://jetcool.com/post/semiconductor-lifetime-how-temperature-affects-mean-time-to-failure-device-reliability/. [Accessed: 06.06.2024].
[41] Does a 10°C Increase in Temperature Really Reduce the Life of Electronics by Half? Electronics Cooling. [online]. Available: https://www.electronics-cooling.com/2017/08/10c-increase-temperature-really-reduce-life-electronics-half/. [Accessed: 06.06.2024].
[42] Szekely, V, M Rencz, and B Courtois. “Tracing the Thermal Behavior of ICs.” IEEE design & test of computers 15.2 (1998): 14–21. Web.
[43] Górecki, Krzysztof et al. “Compact Thermal Models of Semiconductor Devices – A Review.” International Journal of Electronics and Telecommunications 65.2 (2019): 151–158. Web.
[44] Compact thermal modeling in electronics design. Electronics Cooling. [online]. Available: https://www.electronics-cooling.com/2007/05/compact-thermal-modeling-in-electronics-design/. [Accessed: 20.03.2024].
[45] Finite element method. Wikipedia. [online]. Available: https://en.wikipedia.org/wiki/Finite_element_method. [Accessed: 20.03.2024].
[46] The Finite Element Method (FEM). Multiphysics Cyclopedia. Comsol. [online]. Available: https://www.comsol.com/multiphysics/finite-element-method. [Accessed: 26.03.2024].
[47] Liu, G.R, and S. S Quek. Finite Element Method: A Practical Course. 1st ed. San Diego: Elsevier Science, 2003. Web.
[48] Rao, H. S. Govinda. Finite Element Method vs. Classical Methods. 1st ed. New Delhi: New Age International P Ltd., Publishers, 2007. Print.
[49] Curatelli, F, and G.M Bisio. “Characterization of the Thermal Behaviour in ICs.” Solid-state electronics 34.7 (1991): 751–760. Web.
[50] Sercu, Jeannick, and Heidi Barnes. “Mesh Conforming Electro-Thermal Co-Analysis with Application to PCB Power Integrity.” 2017 IEEE International Symposium on Electromagnetic Compatibility & Signal/Power Integrity (EMCSI). IEEE, 2017.
    585–590. Web.
[51] K. Srinivasan et al., "An early system-level thermal analysis methodology for advanced electronic subsystems," 2018 34th Thermal Measurement, Modeling & Management Symposium (SEMI-THERM), San Jose, CA, USA, 2018, pp. 92-97, doi: 10.1109/SEMI-THERM.2018.8357358.
[52] RedHawk-SC Electrothermal. ANSYS. [online]. Available: https://www.ansys.com/products/semiconductors/ansys-redhawk-sc-electrothermal. [Accessed: 17.04.2024].
[53] What is Finite Element Analysis (FEA)? ANSYS. [online]. Available: https://www.ansys.com/simulation-topics/what-is-finite-element-analysis. [Accessed: 17.04.2024].

Nomenclature

A: area, m²
L: length, m
POW: power, W
Q: heat transfer rate, W
q: heat flux, W/m²
T: temperature, K
t: temperature, °C
σ: Stefan-Boltzmann constant, 5.67×10⁻⁸ W/(m²K⁴)
3D: Three dimensional
AI: Artificial Intelligence
ASICs: Application Specific Integrated Circuits
CAD: Computer aided design
CFD: Computational Fluid Dynamics
CPM: Chip power model
CPU: Central Processing Unit
CTM: Chip Thermal Model
DRAM: Dynamic Random-Access Memories
DSP: Digital Signal Processing
EDA: Electronic Design Automation
FDM: Finite Difference Method
FEA: Finite element analysis
FEM: Finite Element Method
GPU: Graphics Processing Unit
HDL: Hardware Description Language
HM: Hard macros
IC: Integrated Circuit
IoT: Internet of Things
IP: Intellectual Property
ML: Machine Learning
MOSFET: Metal-oxide semiconductor field-effect transistor
PCB: Printed Circuit Board
Pkg: Package
R&D: Research and Development
RAM: Random Access Memory
RF: Radio frequency
ROM: Read-only Memory
RTL: Register transfer level
SI: Signal integrity
SoC: System-on-Chip
SRAM: Static Random-Access Memories
Tmax: Maximum temperature, K
TSMC: Taiwan Semiconductor Manufacturing Company
UART: Universal Asynchronous Receiver/Transmitter
UMC: United Microelectronics Corporation
USB: Universal Serial Bus
VLIW: Very long instruction word
VLSI: Very Large-scale Integration

Appendices

Appendix 1 Approximate values of heat transfer coefficient

Table 2. Approximate values of heat transfer coefficient

| Conditions of heat transfer | h, W/(m²K) |
|---|---|
| Gases in free convection | 5–37 |
| Water in free convection | 100–1200 |
| Oil under free convection | 50–350 |
| Gas flow in tubes and between tubes | 10–350 |
| Water flowing in tubes | 500–1200 |
| Oil flowing in tubes | 300–1700 |
| Molten metals flowing in tubes | 2000–45000 |
| Water nucleated boiling | 2000–45000 |
| Water film boiling | 100–300 |
| Film-type condensation of water vapor | 4000–17000 |
| Dropsize condensation of water vapor | 30000–140000 |
| Condensation of organic liquid | 500–2300 |

Source: https://www.thermopedia.com/content/5263/heat_t_c_t1.gif