# A GLOBAL MULTIPLE-VALUED CLOCK APPROACH FOR HIGH-PERFORMANCE MULTI-PHASE CLOCK INTEGRATED CIRCUITS

| Approved by: |
|------------------------------|
| Dr. Mitchell A Thornton, Professor |
| Dr. Jennifer Dworak, Assistant Professor |
| Dr. Sukumaran Nair, Professor |

A Thesis Presented to the Graduate Faculty of Bobby B. Lyle School of Engineering

Southern Methodist University

in Partial Fulfillment of the Requirements for the degree of Master of Science in Computer Engineering with a Major in Computer Engineering

by

Rohit Pathiyam Menon

(Bachelor of Technology, Amrita Vishwa Vidyapeetham University, Tamil Nadu, INDIA)

December 17th, 2011

Menon, Rohit Pathiyam

B.Tech, Amrita Vishwa Vidyapeetham University, 2008

A Global Multi-Valued Clock Approach For High-Performance Multi-Phase Clock Integrated Circuits

Advisor: Professor Mitchell A Thornton

Master of Science conferred December 17th, 2011

Thesis completed November 20th, 2011

In today's high performance Integrated Circuit (IC) design world, millions of transistors are present on the IC chip to perform complex computing at higher clock frequencies. The disproportionate increase in clock uncertainties due to higher clock frequencies has made designing clock distribution networks in high performance integrated circuits increasingly difficult. Hence, it is sometimes the case that the high frequency external global clock with frequency f is split into N periodic multi-phase non-overlapping clock signals with frequency f/N. These N periodic multi-phase non-overlapping clock signals propagate on several independent clock interconnection lines and are distributed to a large number of disjoint sets of load circuits scattered over the entire IC chip, each set with its own clock distribution network.
The load circuits present on high performance IC chips comprise combinational logic elements and level-sensitive latches as clocked storage elements. Though designs with multiple phase clock distribution networks have shown enhanced throughput and performance, they also exhibit many drawbacks. The presence of different clock distribution networks requires a significant amount of metal interconnection resources, which becomes a bottleneck for future high performance microprocessors in reducing the size or area of the IC chip. Maintaining low skew among the constituent clock distribution networks adds another complex design constraint. Also, a significant amount of the total power consumed by the microprocessor is dissipated in the clock distribution networks.

This research work presents a new concept for overcoming the above mentioned problems by having a single clock distribution network instead of several independent clock distribution networks for multiple phase non-overlapping clock signals. A multi-valued (MV) clock signal propagates on this single clock distribution network to all the load circuits in the IC chip. The presence of an MV clock signal requires the design of a new set of level-sensitive latches as state holding elements, each designed to be transparent for a specific portion of the global MV clock signal. This new design can greatly reduce the usage of metal interconnection resources and hence ease the area constraint for future designs of high performance microprocessors. In addition, this new approach has also shown an increase in design clock frequency. Design comparison results supporting this new idea are presented in this research work.

# TABLE OF CONTENTS

| Section | Title |
|---------|--------------------------------------------------------------|
| | LIST OF TABLES |
| | LIST OF FIGURES |
| | ACKNOWLEDGEMENTS |
| 1 | INTRODUCTION |
| 1.1 | Clocking in Synchronous Systems |
| 1.2 | Clock Distribution in Modern Digital Integrated Circuits |
| 1.3 | Thesis Outline |
| 2 | BACKGROUND |
| 2.1 | Clock Distribution Networks |
| 2.1.1 | Buffered Clock Distribution Tree Networks |
| 2.1.2 | Symmetric H-Tree Clock Distribution Networks |
| 2.1.3 | Mesh Clock Distribution Networks |
| 2.2 | Multiple Phase Clock Distribution |
| 2.3 | Multiple Valued Logic |
| 3 | RESEARCH APPROACH |
| 3.1 | Standard Cell IC Implementation |
| 3.1.1 | Modified Literal Selection Gate |
| 3.1.2 | New Level-Sensitive Latches |
| 3.1.3 | Multi-Valued Logic Clock Generator |
| 3.2 | Binary Logic Implementation |
| 4 | EXPERIMENTAL RESULTS |
| 4.1 | Functional Simulation Results for New Level-Sensitive Latches |
| 4.2 | Functional Simulation Results for MVL Clock Generator |
| 4.3 | Comparison of Results for Binary and MVL Designs |
| 5 | CONCLUSIONS AND FUTURE RESEARCH |
| | REFERENCES |

# LIST OF TABLES

| Table | Title |
|-------|---------------------------------------------------------|
| 1 | D-latch characteristic table |
| 2 | Truth table for $J_i$ |
| 3 | Truth table for D-latch-0 |
| 4 | Truth table for D-latch-1 |
| 5 | Truth table for D-latch-2 |
| 6 | Truth table for D-latch-3 |
| 7 | Comparison of Binary and MVL Designs in FPGA technology |

# LIST OF FIGURES

| Figure | Title |
|--------|---------------------------------------------------------------------------|
| 1 | A Finite State Machine (FSM) Design |
| 2 | Buffered Clock Distribution Tree Network |
| 3 | H-Tree Clock Distribution Network |
| 4 | Mesh Clock Distribution Network |
| 5 | Multiple phase clock distribution system |
| 6 | Clock signal waveforms |
| 7 | D-latch logic symbol |
| 8 | High level block diagram of an IC chip with MVL clock signal distribution network |
| 9 | $\Phi_{clk}$ MVL Clock signal waveform |
| 10 | Literal Selection Gate $J_i$ |
| 11 | Transistor level structure of $J_i$ |
| 12 | D-latch logic circuit design |
| 13 | D-latch-0 logic symbol |
| 14 | HSPICE Simulation waveforms for D-latch-0 |
| 15 | D-latch-1 logic symbol |
| 16 | HSPICE Simulation waveforms for D-latch-1 |
| 17 | D-latch-2 logic symbol |
| 18 | HSPICE Simulation waveforms for D-latch-2 |
| 19 | D-latch-3 logic symbol |
| 20 | HSPICE Simulation waveforms for D-latch-3 |
| 21 | MVL Clock Generator |
| 22 | Clock Phase Sensitive Latches for a Binary-encoded MV Clock Signal |
| 23 | Functional Simulations of new level-sensitive latches using SystemVerilog HDL |
| 24 | Functional Simulations of MVL Clock Generator |

## **ACKNOWLEDGEMENTS**

First and foremost, I would like to express my sincere gratitude to my adviser, Dr. Mitch Thornton, for his excellent guidance, motivation and continuous support in making my research work a wonderful experience. I consider myself fortunate to have had his expertise for the successful completion of the research work and the writing of the thesis.

I would also like to thank the members of the Hardware Security Group at SMU: Dr. Jennifer Dworak, Dr. Theodore Manikas and fellow students, for their insightful comments and suggestions, which helped me clear many hurdles throughout the research.

My sincere thanks to Dr. Sukumaran Nair for readily agreeing to be on the thesis committee and for his valuable advice throughout my Master's degree program at SMU.

Last but not least, I would like to thank my parents for their unwavering support and love, which has always been my strength.
Dedicated to my dear parents

#### Chapter 1

#### INTRODUCTION

Clocking is an essential concept in the design of synchronous digital systems [1]. Clocking demands the utmost care during the design of a digital system, since clocking errors have proven to be very costly. In the early days, clocking was given less significance, as older computers and digital circuit designs had high tolerances to variations in the clock signal and less stringent timing requirements. The older computer designs were mainly built from discrete components or Large Scale Integration (LSI) chips and operated at frequencies in the range of a few Megahertz. The first electronic computer, the Electronic Numerical Integrator and Computer (ENIAC), operated at a frequency of 18 Kilohertz [3]. At such a low level of integration, it was not difficult to set the clock to a desired frequency. To ensure that the clock arrived at all the circuit components at approximately the same time, the length of the wires that supplied the clock signals, or the delay elements on the circuit, could easily be adjusted.

As design technology gradually evolved from LSI to Very Large Scale Integration (VLSI), achieving the desired clock frequency became a difficult task for designers. In VLSI circuit designs, the clock signals are generated and distributed internally on the Integrated Circuit (IC) chip. Also, the clocked storage elements (latches or flip-flops) present on the IC chip had to absorb much of the clock signal variation. Hence, for high speed VLSI circuit designs, much emphasis has been placed on clocking, since the desired clock frequency also started rising at an enormous rate, doubling every three years [18]. However, the uncertainties in a clock signal have not been scaling proportionally with the rate of clock frequency increase.
This disproportionate increase in clock uncertainties due to higher clock frequencies has made designing clock distributions in high-performance microprocessors increasingly difficult. Faster clocked storage elements that can absorb the clock skew on high frequency clock signals need to be designed for enhanced circuit performance. Hence, new methods of designing digital integrated circuits are required.

#### 1.1 Clocking in Synchronous Systems

In a synchronous digital system, a clock signal is used to define a time reference for the movement of data within the system [17]. A clock signal is vital to the proper operation of a synchronous system, as clock signals carry the greatest fanout, travel over the longest distances and operate at the highest speeds of any signals, control or data, within the entire system. As a result, the clock distribution system alone can consume up to 30-40% of the entire power of the IC chip [7,5]. Since the movement of data in a synchronous system is governed by a clock signal, the clock waveforms need to be very precise and sharp. With technology scaling, the dimensions of global interconnect lines have shrunk and the lines have thus become highly resistive, which degrades the waveforms on the clock signal lines present in the IC chip. Hence, the performance of a synchronous system depends heavily on the clock signal distributed over the entire chip.

A typical synchronous digital system comprises clocked storage elements (latches and flip-flops) and combinational logic, which together make up a Finite State Machine (FSM) as shown in Figure 1.

Figure 1. A Finite State Machine (FSM) Design

The functional requirements of the synchronous digital system are met by the logic gates present in the combinational logic circuit, whereas the timing requirements are met by the proper design of the clock distribution network and clocked storage elements, which affect the performance.
The delay components generally present in a synchronous system are memory storage elements, logic elements, and the clock distribution network. These delay components play a vital role in achieving the maximum levels of performance and reliability of a synchronous system. The 'Y' outputs of the FSM in Figure 1 are determined by both the 'X' inputs and the present state $(S_n)$. Also, the next state $S_{n+1}$ is a function of the data input signal as well as the present state $(S_n)$. Hence these signals can be represented as follows:

$$S_{n+1} = F(S_n, X), \qquad Y = F(X, S_n)$$

The clock signal and the functionality of the clocked storage elements determine the change from $S_n$ to $S_{n+1}$. It is thus clear how strongly a clock signal can affect the performance of a synchronous system. The next section describes the clock distribution methodologies adopted in modern microprocessor designs.

#### 1.2 Clock Distribution in Modern Digital Integrated Circuits

The design of clock distribution networks has great significance in synchronous digital integrated circuits. It is a great challenge for circuit designers to develop an efficient clock distribution network that can distribute a tightly controlled clock signal to all the synchronous clocked storage elements present on the IC chip. A typical clock distribution network can greatly affect the speed, physical die area and power dissipation of a digital integrated circuit. Both the design and the structural topology of the clock distribution network have to be considered while developing a clock distribution system. Various clock distribution methods have been developed, including buffered clock distribution trees, symmetric H-Trees and other compensation techniques that control clock skew by minimizing the impedances and capacitive loads between clock signal paths.
With the evolution of design technology for integrated circuits and the increasing clock frequency, many modern high performance microprocessors are now designed with multi-phase shifted clock signals. The high frequency external global clock input to the IC chip is phase shifted to generate *N* periodic multi-phase non-overlapping clock signals. Separate independent clock distribution networks are used for each of the *N* multi-phase clock signals propagating on the IC chip. The clocked storage elements used in designs with periodic multi-phase non-overlapping clock signals are level-sensitive latches. The movement of data between the clocked storage elements in such a digital system is controlled by the active phase of the clock, and hence the clock phase controls the transfer of information. The multiple phase shifted non-overlapping clock signals so generated are distributed to a large number of disjoint sets of sub-circuits scattered over the entire IC chip with the help of different stages of buffer circuits.

This design methodology has been shown to enhance throughput and performance, but it has its own drawbacks. The presence of a different clock distribution network for each of these clock signals requires a significant amount of metal interconnection resources, which greatly affects the chip cost and area constraint. Since one of the main objectives of current design methodologies is to reduce the size of the chip, the substantial metal interconnection resources required by multi-phase clock designs could be a hindrance. Hence, new methodologies for effective clock distribution networks with fewer metal interconnections, and hence less area occupied in digital integrated circuits, need to be developed, which is the main focus of this research work. The multi-phase clock distribution design in modern high performance microprocessors is described further in Chapter 2.
#### 1.3 Thesis Outline

This thesis presents a Multiple Valued Logic (MVL) clock distribution network that addresses the drawbacks mentioned in the previous section by reducing the metal interconnection resources required for multiple phase clock distribution networks in high performance designs. The work is organized into the following chapters. Chapter 2 gives an overall idea of how multi-phase non-overlapping clock signals are propagated in present high performance designs, and of the drawbacks of this approach. Chapter 3 summarizes our new approach, which uses MVL concepts to overcome the problem, and describes its advantages in detail. Chapter 4 discusses several experimental results comparing the new and existing approaches using Finite State Machines. Finally, Chapter 5 concludes our work and discusses the scope for future work.

#### Chapter 2

#### **BACKGROUND**

#### 2.1 Clock Distribution Networks

Clock distribution networks play a vital role, as the clock signals are used as the time reference for all the temporal operations in a synchronous digital system [2,8]. The clock signal as time reference ensures that correct data signals are made available for all the desired computations in the circuit. Since there could be a large number of data signals coming from different parts of the integrated circuit, it is imperative that a clock signal tightly controlled within the temporal limits be propagated to all the synchronous registers present on the IC chip for proper capture of the data. It is challenging for designers to distribute such a tightly controlled clock signal over the entire IC chip, as various factors affect the clock signal distribution. As design technology evolved to reduce chip dimensions, the number of interconnects used for clock distribution gained significant importance.
Scaling down the metal interconnects in proportion to the reduction of transistor feature sizes, while operating at high frequencies, was not an easy task for designers. The reduction in metal interconnection widths on the IC chip resulted in increased impedances, which in turn created a substantial delay on the signals propagating on the interconnects. This interconnect delay is of major concern, as it could lead to incorrect data capture by the synchronous registers driven by the affected signals. The mismatch in the arrival time of the clock signals at two different synchronous registers due to either the interconnect delay or nonequivalent interconnect lengths is called the *Clock Skew*. Also, the mismatch in the arrival time of the clock signals at two different synchronous registers due to different kinds of noise caused by several parallel interconnect lines distributing these clock signals is called the *Clock Jitter*. The clock skew and clock jitter can be minimized by designing an efficient clock distribution network.

Several methods have been developed for designing an efficient clock distribution network that improves the performance of a digital system. Some of the major clock distribution networks developed are Buffered clock distribution tree networks, Symmetric H-Tree clock distribution networks and Mesh clock distribution networks.

#### 2.1.1 Buffered Clock Distribution Tree Networks

A Buffered clock distribution tree network, as shown in Figure 2, consists of a clock source that generates the global clock signal for the VLSI circuit. The global clock signal is buffered at various points on the metal interconnects to provide the necessary amplification. The clock source generating the global clock is referred to as the *root* of the tree. The various paths driving registers present on the chip are referred to as *branches* and the registers as *leaves*.
The buffers present on each branch of this tree network can cause additional delays in the clock signal path, which is one of its drawbacks. Another drawback of the Buffered clock distribution tree is that the clock signal transition at each buffer input and output results in dynamic power loss, which greatly affects the performance of the circuit.

Figure 2. Buffered Clock Distribution Tree Network

#### 2.1.2 Symmetric H-Tree Clock Distribution Networks

The Symmetric H-Tree clock distribution network, as shown in Figure 3, adopts a planar symmetric H-Tree structure for the metal interconnects and buffers such that there is near zero clock skew for the clock signals arriving at the registers of each clock path from the global clock signal source. The global clock driver connected to the center of the main H-Tree structure drives the clock signals to the four corners of the structure. The clock signals at these four corners act as the input to the next level of the H-Tree hierarchy, eventually driving the registers. Each successive level of the H-Tree hierarchy has its width scaled down in the ratio 1:3. In this type of clock distribution network, the delay on each clock path from the global clock source to a clocked register is minimal, but variations in process parameters affecting the metal interconnect impedances can still create some delay. Also, there is significant thermal power loss in the interconnects of an H-Tree clock distribution network, which affects the performance of the circuit.

Figure 3. H-Tree Clock Distribution Network

#### 2.1.3 Mesh Clock Distribution Networks

The Mesh clock distribution network, as shown in Figure 4, is similar to the Buffered clock distribution tree network except that there are additional buffers present along the interconnect paths to reduce the interconnect resistance. The presence of these additional buffers also helps in better amplification of the clock signals and in circuit reliability.
However, the Mesh clock distribution network has the same drawbacks as the Buffered clock distribution tree, since the additional buffers result in more delay and power loss, affecting the circuit performance.

Figure 4. Mesh Clock Distribution Network

#### 2.2 Multiple Phase Clock Distribution

High performance integrated circuit designs [24] today, including Application Specific Integrated Circuits (ASIC), employ multiple phase clock systems [21] to clock all the sub circuits present in the IC chip. The sub circuits present in the chip comprise combinational logic elements and sequential logic storage elements (level-sensitive latches). A multiple phase clock system design is essential to clock these sub circuit logic elements in order to increase the performance and efficiency of circuit designs. Several clock lines carrying periodic clock pulses, phase shifted relative to one another, propagate on separate independent metal interconnection lines and are distributed over the entire chip to all the sub circuits at various locations. A typical multiple phase shifted clock system design is shown in Figure 5.

Figure 5. Multiple phase clock distribution system

Many prior designs use two-phase clocking schemes for master-slave latch based designs [6,9,10,21,23], but the clock skew between the master latch clock rising/falling edge and the corresponding slave latch clock falling/rising edge negatively affected the performance. Hence, most modern high performance circuit designs are implemented with multiple clock phases distributed to disjoint sets of sub circuits.

In Figure 5, a Phase Locked Loop (PLL) [11,22] receives the binary external clock input and generates a high frequency signal, which is then fed to a phase generation circuit. The phase generation circuit basically comprises a counter together with combinational logic that combines the outputs of the counter in several ways to generate N multiple phase shifted signals.
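
The counter-plus-decoder structure described above can be sketched behaviourally as follows. This is only a minimal illustration under stated assumptions, not the circuit of Figure 5: it assumes a modulo-N counter advanced once per cycle of the PLL output and a one-hot decode of the count. The function name `phase_generator` and its parameters are hypothetical.

```python
def phase_generator(n_phases, n_ticks):
    """Yield (phi_0, ..., phi_{N-1}) for each tick of the fast clock.

    Assumption: a modulo-N counter advances once per tick, and
    combinational logic decodes the count into one-hot phase signals.
    """
    count = 0
    for _ in range(n_ticks):
        # Combinational decode: phase k is high only while count == k.
        yield tuple(1 if count == k else 0 for k in range(n_phases))
        # Sequential part: the counter wraps around after N ticks.
        count = (count + 1) % n_phases


# Eight ticks of a four-phase generator (N = 4).
phases = list(phase_generator(n_phases=4, n_ticks=8))
```

In this sketch each phase is high for one tick out of every N, so it repeats at 1/N of the fast clock rate and never overlaps another phase.
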
One of the phase shifted clock signals generated by the phase generation circuit is fed back to the PLL. The feedback signal to the PLL ensures that the external clock input and the generated clock phase signals are synchronized for minimal clock skew in the clock system design. The N multiple phase shifted clock signals generated are represented by $\Phi_0, \Phi_1, \ldots, \Phi_{N-1}$ in Figure 5. The research work presented here is based on a quaternary logic system, defined in Section 2.3, and hence we assume N=4 for efficient comparison with the proposed idea. The clock signal waveforms for the external clock input and the four phase shifted clock signals $\Phi_0$, $\Phi_1$, $\Phi_2$ and $\Phi_3$ are shown in Figure 6.

Figure 6. Clock signal waveforms

The subcircuits present on the chip, shown in Figure 5, have level-sensitive latches, or D-latches, as sequential logic storage elements [13]. A level-sensitive latch or D-latch is a data storage element which has a data input signal (D), a clock/enable input signal (EN) and output signals (Q and Q'). When the EN input is at logic-1, the output Q reflects the logic level present on input D, and when the EN input is at logic-0, the previous D input value is latched at the output Q for computational logic operations in the design. The logic symbol and the characteristic table for a binary D-latch are shown in Figure 7 and Table 1 respectively.

Figure 7. D-latch logic symbol

Table 1. D-latch characteristic table

| EN/CLK | D | Q | Q' |
|--------|---|------------|-------------|
| 0 | X | $Q_{prev}$ | $Q'_{prev}$ |
| 1 | 0 | 0 | 1 |
| 1 | 1 | 1 | 0 |

Each of the four phase shifted clock signals $\Phi_0$, $\Phi_1$, $\Phi_2$ and $\Phi_3$ shown in Figure 6 is distributed to disjoint sets of sub circuits shown in Figure 5 with its own independent clock distribution network ($CDT_0$, $CDT_1$, $CDT_2$ and $CDT_3$).
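
The behaviour in Table 1, and the way each phase enables only its own set of latches, can be illustrated with a short behavioural model. This is a sketch under stated assumptions rather than a circuit description: the class name `DLatch`, the sample data stream, and the idealized one-hot phases (exactly one of $\Phi_0$ to $\Phi_3$ high per tick) are all hypothetical.

```python
class DLatch:
    """Behavioural model of the binary level-sensitive D-latch of Table 1."""

    def __init__(self, q=0):
        self.q = q  # stored value Q; Q' is its complement

    def step(self, en, d):
        """Apply one (EN, D) pair and return (Q, Q')."""
        if en == 1:
            self.q = d  # transparent: Q follows D
        # en == 0: opaque, Q keeps its previous value
        return self.q, 1 - self.q


# One latch per clock phase domain, all observing the same data stream.
latches = [DLatch() for _ in range(4)]
data = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical D values, one per tick

trace = []
for tick, d in enumerate(data):
    active = tick % 4  # assumed ideal non-overlapping phases
    phases = [1 if k == active else 0 for k in range(4)]
    trace.append([latch.step(en, d)[0] for latch, en in zip(latches, phases)])
```

Each row of `trace` holds the four Q outputs after one tick; only the latch whose phase is active can change, which is the property the separate networks $CDT_0$ to $CDT_3$ deliver in hardware.
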
The clock distribution network for each of the non-overlapping phase shifted clock signals could be of any type, including the designs explained earlier in this chapter: Buffered clock distribution tree networks, Symmetric H-Tree clock distribution networks and Mesh clock distribution networks. The presence of a separate clock distribution network for each of the non-overlapping phase shifted clock signals implies significant usage of metal interconnection resources, which increases the area occupied on the chip and hence the cost of the chip. Our research work addresses this problem by distributing a multiple valued clock signal over the entire chip to all the disjoint sets of sub circuits on a single clock distribution network instead of several independent clock distribution networks, as described in detail in Chapter 3.

#### 2.3 Multiple Valued Logic (MVL)

Multiple Valued Logic (MVL) can be defined as a non-binary logic system that can represent more than the two logic states, logic 0 and logic 1, of a binary system. Many applications of MVL have proved it to be a better solution for problems in binary logic systems. For example, a multiple output Boolean function can be simplified into a single output multiple valued function by representing the output with an MVL variable. The concept of MVL has been an area of research for many years in the Integrated Circuit (IC) design world [12, 14, 15]. With the evolution of IC design technology, present VLSI circuits are limited by the number of external pin connections for the integrated circuit (the pin-out problem) and by the number of metal interconnections inside the IC chip, as both significantly affect the size and cost of the IC chip. Circuits designed with MVL could be a potential solution to these problems in the existing design technology if the signals can assume multiple logic values rather than binary values.
However, the MVL circuits need to be realized in such a way that they are compatible with today's binary VLSI circuits, which poses a great challenge for circuit designers.

The research work presented here focuses on the application of MVL to the clock signal distribution network in VLSI circuits, which could reduce the number of metal interconnections compared to high performance designs with a separate clock distribution network for each of the phase shifted clock signals. The MV clock signal acts as a replacement for the N periodic multi-phase clock signals propagated in the integrated circuit. The single clock distribution network propagates the MV clock signal to all the clocked storage elements present in the high performance IC chip.

For the purpose of this research work, we have focused on a quaternary logic system for the new clock distribution network. A quaternary logic system is a four-valued logic system, which means that a signal can assume any of four valid logic values: logic-0, logic-1, logic-2 and logic-3. The binary level-sensitive latches are augmented with a modified literal selection gate, described in Chapter 3, inserted in-line between the MV clock signal input and the latch gate. There are four different modified literal selection gates, each of which produces a logic-1 for one of the four clock phase domains and a logic-0 otherwise.

#### Chapter 3

#### RESEARCH APPROACH

#### 3.1 Standard Cell IC Implementation

Our research work introduces the idea of having a single clock distribution network for an MV clock signal rather than, as in the existing approach explained in Chapter 2, a separate clock distribution network for each non-overlapping phase shifted clock signal. The single clock distribution network distributes the multi-valued logic (MVL) clock signal to all disjoint subsets of circuits, as shown in Figure 8.
The MVL clock signal, denoted as $\Phi_{clk}$ in Figure 8, is generated by an MVL clock generator. The MVL clock generator is an analog circuit generating an MVL signal with different voltage levels corresponding to different logic levels, and hence $\Phi_{clk}$ can be represented by a waveform as shown in Figure 9. Each subinterval of the $\Phi_{clk}$ period in the waveform shown in Figure 9 represents one of the four logic levels (logic-0, logic-1, logic-2 and logic-3) of a quaternary logic system.

The level-sensitive latches or D-latches present inside the subcircuits in the existing approach, shown in Figure 5, are designed for a binary logic system, which means the D-latch is transparent or latches the previous input depending on the binary value, logic-1 or logic-0, of the clock/enable input signal. However, since the clock/enable signal in the new approach shown in Figure 8 is an MVL signal, a change in latch design is essential for it to be compatible with the new MVL clock signal ($\Phi_{clk}$). The design change for the level-sensitive latches is implemented by inserting a modified literal selection gate in series with the latch gate or enable input, which is explained in the following section.

Figure 8. High level block diagram of an IC chip with MVL clock signal distribution network

Figure 9. $\Phi_{clk}$ MVL Clock signal waveform

#### 3.1.1 Modified Literal Selection Gate $(J_i)$

A Literal Selection Gate is a unary quaternary logic gate, denoted by $J_i$, designed with Field Effect Transistors (FET) [2]. The symbol i denotes the logic level for which the output of the $J_i$ gate has a non-zero value. For a quaternary logic implementation, i can take the values 0, 1, 2 and 3, representing the corresponding four logic levels. The $J_i$ logic symbols for the four different logic states are shown in Figure 10.

Figure 10. Literal Selection Gate, $J_i$

Previous work defines the non-zero output of a quaternary $J_i$ gate to be a logic 3 [19]. In our work, we have modified the transistor level structure of $J_i$ such that the non-zero output of $J_i$ is a standard binary logic 1, which allows for compatibility with existing multi-phase clock domain binary circuits. Hence, the modified literal selection gate $J_i$ is defined by the truth table in Table 2, where the first column gives the quaternary input level and the remaining columns give the output of each gate.

Table 2. Truth table for $J_i$

| Input | $J_0$ | $J_1$ | $J_2$ | $J_3$ |
|-------|-------|-------|-------|-------|
| 0 | 1 | 0 | 0 | 0 |
| 1 | 0 | 1 | 0 | 0 |
| 2 | 0 | 0 | 1 | 0 |
| 3 | 0 | 0 | 0 | 1 |

The fundamental building blocks of the transistor level designs are Field Effect Transistors (P-Channel Depletion and N-Channel Depletion), since they are highly reliable and inexpensive. Each Field Effect Transistor can assume a different threshold voltage level ($V_{th}$) based on the design requirement, which implies different doping levels for each transistor during fabrication. The transistors P1, P2, N1 and N2, present in all the transistor level designs of the modified literal selection gates shown in Figure 11, restrict the non-zero output value of the gate to a logic 1. V0, V1, V2 and V3 represent the four different voltage levels, with values 0 V, 1.1 V, 2.2 V and 3.3 V respectively. The threshold voltages of the transistors are chosen such that the structure shows the desired quaternary operation. The circuitry with the P1, P2, N1 and N2 transistors essentially represents a buffer circuit.

Figure 11. Transistor level structure of $J_i$: a) Literal Selection Gate, $J_0$; b) Literal Selection Gate, $J_1$; c) Literal Selection Gate, $J_2$; d) Literal Selection Gate, $J_3$

#### 3.1.2 New Level-Sensitive Latches

A typical CMOS voltage-mode D-latch circuit can be implemented as shown in Figure 12 [16].
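
As a behavioural aid for the latches introduced in this section, the modified literal selection gate of Table 2 and its placement in front of a binary latch enable input can be sketched as below. This is a logic-level illustration only and says nothing about the transistor structures of Figure 11. The names `LEVEL_VOLTS`, `quantize`, `literal_gate` and `mv_latch_step` are hypothetical, and the nearest-level decision rule in `quantize` is an assumption; the actual switching points are set by the transistor threshold voltages.

```python
# Nominal voltages V0..V3 for logic-0..logic-3, as given in the text.
LEVEL_VOLTS = (0.0, 1.1, 2.2, 3.3)


def quantize(volts):
    """Map a voltage to the nearest quaternary level (assumed ideal rule)."""
    return min(range(4), key=lambda lvl: abs(LEVEL_VOLTS[lvl] - volts))


def literal_gate(i, level):
    """Modified literal selection gate J_i: binary 1 iff the input level is i."""
    if i not in range(4) or level not in range(4):
        raise ValueError("i and level must each be 0, 1, 2 or 3")
    return 1 if level == i else 0


def mv_latch_step(i, clk_level, d, q_prev):
    """One evaluation of a binary D-latch whose enable is driven by J_i."""
    en = literal_gate(i, clk_level)
    return d if en == 1 else q_prev


# Rebuild Table 2: input level -> (J_0, J_1, J_2, J_3).
truth_table = {
    level: tuple(literal_gate(i, level) for i in range(4))
    for level in range(4)
}
```

Because exactly one $J_i$ output is 1 for any input level, decoding the quaternary clock with the four gates yields four non-overlapping binary enables, which is what lets a single MV clock line stand in for the four phase signals.
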
The data input signal (D) is fed to a transmission gate controlled by the clock/enable signal (EN). The EN input serves as the latch's gate input and is connected to the output of the modified literal selection gate. The output of the transmission gate is connected to a latch comprised of two inverters, where the topmost inverter serves as a *keeper logic* circuit.

Figure 12. D-latch logic circuit design

When EN = 1, the output Q takes the value present on the D input; when EN = 0, the previous D input value is held at the output Q by the *keeper logic* circuit shown in Figure 12. Since the EN signal is a MVL signal in the new approach, we introduce four new D-latches, namely D-latch-0, D-latch-1, D-latch-2 and D-latch-3, which are discussed in detail below. Depending on the particular clock distribution tree domain, one of these four new latches is used. Each new latch is implemented by connecting the appropriate modified literal selection gate in series with the gate input of the binary D-latch.

##### D-latch-0

The D-latch-0 logic circuit is transparent when the EN input signal is at logic-0 and holds the previous D input value for the remaining logic levels (logic-1, logic-2 and logic-3) on the EN input signal. To achieve this behavior, the output of a Literal Selection Gate $J_0$ is connected to the gate input of the D-latch. $J_0$ outputs logic-1 only when its input is logic-0. Hence, the output of $J_0$ is a binary logic-1 or logic-0 depending on the EN input value, which lets the latch operate as an ordinary binary latch. The logic symbol and truth table for D-latch-0 are shown in Figure 13 and Table 3 respectively.

Figure 13. D-latch-0 logic symbol

Table 3. Truth table for D-latch-0

| EN/CLK | D | Q | Q' |
|--------|---|------------|-------------|
| 0 | 0 | 0 | 1 |
| 0 | 1 | 1 | 0 |
| 1 | X | $Q_{prev}$ | $Q'_{prev}$ |
| 2 | X | $Q_{prev}$ | $Q'_{prev}$ |
| 3 | X | $Q_{prev}$ | $Q'_{prev}$ |

Transient analysis for the D-latch-0 logic circuit was performed using the HSPICE AVANWAVES visualization tool and is shown in Figure 14.

Figure 14. HSPICE Simulation waveforms for D-latch-0

##### D-latch-1

The D-latch-1 logic circuit is transparent when the EN input signal is at logic-1 and holds the previous D input value for the remaining logic levels (logic-0, logic-2 and logic-3) on the EN input signal. Similar to D-latch-0, the logic design for D-latch-1 is implemented with the output of Literal Selection Gate $J_1$ connected to the gate input of the latch. The logic symbol and truth table for D-latch-1 are shown in Figure 15 and Table 4 respectively.

Figure 15. D-latch-1 logic symbol

Table 4. Truth table for D-latch-1

| EN/CLK | D | Q | Q' |
|--------|---|------------|-------------|
| 0 | X | $Q_{prev}$ | $Q'_{prev}$ |
| 1 | 0 | 0 | 1 |
| 1 | 1 | 1 | 0 |
| 2 | X | $Q_{prev}$ | $Q'_{prev}$ |
| 3 | X | $Q_{prev}$ | $Q'_{prev}$ |

Transient analysis for the D-latch-1 logic circuit was performed using the HSPICE AVANWAVES visualization tool and is shown in Figure 16.

Figure 16. HSPICE Simulation waveforms for D-latch-1

##### D-latch-2

The D-latch-2 logic circuit is transparent when the EN input signal is at logic-2 and holds the previous D input value for the remaining logic levels (logic-0, logic-1 and logic-3) on the EN input signal. Similar to D-latch-0, the logic design for D-latch-2 is implemented with the output of Literal Selection Gate $J_2$ connected to the gate input of the latch. The logic symbol and truth table for D-latch-2 are shown in Figure 17 and Table 5 respectively.

Figure 17. D-latch-2 logic symbol

Table 5. Truth table for D-latch-2

| EN/CLK | D | Q | Q' |
|--------|---|------------|-------------|
| 0 | X | $Q_{prev}$ | $Q'_{prev}$ |
| 1 | X | $Q_{prev}$ | $Q'_{prev}$ |
| 2 | 0 | 0 | 1 |
| 2 | 1 | 1 | 0 |
| 3 | X | $Q_{prev}$ | $Q'_{prev}$ |

Transient analysis for the D-latch-2 logic circuit was performed using the HSPICE AVANWAVES visualization tool and is shown in Figure 18.

Figure 18. HSPICE Simulation waveforms for D-latch-2

##### D-latch-3

The D-latch-3 logic circuit is transparent when the EN input signal is at logic-3 and holds the previous D input value for the remaining logic levels (logic-0, logic-1 and logic-2) on the EN input signal. Similar to D-latch-0, the logic design for D-latch-3 is implemented with the output of Literal Selection Gate $J_3$ connected to the gate input of the latch. The logic symbol and truth table for D-latch-3 are shown in Figure 19 and Table 6 respectively.

Figure 19. D-latch-3 logic symbol

Table 6. Truth table for D-latch-3

| EN/CLK | D | Q | Q' |
|--------|---|------------|-------------|
| 0 | X | $Q_{prev}$ | $Q'_{prev}$ |
| 1 | X | $Q_{prev}$ | $Q'_{prev}$ |
| 2 | X | $Q_{prev}$ | $Q'_{prev}$ |
| 3 | 0 | 0 | 1 |
| 3 | 1 | 1 | 0 |

Transient analysis for the D-latch-3 logic circuit was performed using the HSPICE AVANWAVES visualization tool and is shown in Figure 20.

Figure 20. HSPICE Simulation waveforms for D-latch-3

#### 3.1.3 Multi-Valued Logic Clock Generator

In the new approach presented in our work, with a MV global clock signal for a multi-phase clock domain circuit, the PLL and phase generation circuit in Figure 5 is replaced by an analog circuit named the MVL clock generator. The MV clock signal, $\Phi_{clk}$, is generated by this MVL clock generator. A simple implementation of the MV clock generation circuit comprises a MV incrementing circuit (a quaternary full adder [4]) along with a MV registered output, as shown in Figure 21.
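
Behaviorally, the incrementer-plus-register structure of Figure 21 can be sketched as follows. This is only a minimal Python model under stated assumptions, not the analog circuit: the names `quaternary_increment`, `mvl_clock` and `LEVEL_VOLTS` are hypothetical, the adder is reduced to "add one modulo four", and the voltage mapping simply reuses the V0 to V3 values quoted in Section 3.1.1.

```python
from itertools import islice

RADIX = 4                              # quaternary logic: levels 0..3
LEVEL_VOLTS = (0.0, 1.1, 2.2, 3.3)     # V0..V3 from Section 3.1.1 (volts)


def quaternary_increment(level):
    """Stand-in for the MV incrementing circuit: add 1 modulo 4."""
    return (level + 1) % RADIX


def mvl_clock(reset_level=0):
    """Yield the register output Q: the reset value first, then one new
    value for each rising edge of the binary reference clock."""
    q = reset_level
    while True:
        yield q
        q = quaternary_increment(q)    # register captures the adder output


# Two full periods of the MV clock, as logic levels and as voltages.
levels = list(islice(mvl_clock(), 8))
volts = [LEVEL_VOLTS[q] for q in levels]
print(levels)
print(volts)
```

Each value in `levels` corresponds to one subinterval of the $\Phi_{clk}$ period, so one MV clock period spans four periods of the binary reference clock in this model.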
Other implementations are certainly possible and are a topic of further investigation. Such implementations will depend upon the target fabrication technology.

Figure 21. MVL Clock Generator

### 3.2 Binary Logic Implementation

When the implementation technology target is a commercially available programmable device such as an FPGA, or a standard cell ASIC with a binary logic library, existing logic cell structures must be employed. This restriction prevents the incorporation of the modified literal selection gate and the MV clock generation circuit described in the previous section. This section describes how the ideas presented in our work may be modified so that implementation on such devices is possible.

Most commercially available FPGAs contain resources to support a single binary clock distribution network. When a multi-phase clock domain design is required, the tools must route the different CDT networks using on-chip routing resources. This is often inefficient and requires the use of multiple programmable interconnects, which in turn can severely impact performance since the delay added by the programmable interconnects can be significant. For this reason, we are also interested in exploring FPGA target technologies that can take advantage of multi-phase clock domain designs without suffering undue clock signal delays caused by heavy use of programmable interconnects in the distribution of the multiple CDT networks.

Since the MV clock generator cannot be easily implemented on most available FPGAs, an intermediate approach is used in which the $N$ CDT networks of a traditional multi-phase clock domain IC are replaced by $\lceil \log_2 N \rceil$ CDTs that are routed within the FPGA. These CDT networks carry a binary-encoded version of the MV clock signal. For the example of a quaternary design, two binary signals, labeled A and B, are propagated to each storage cell and cycle through the values 00, 01, 10, and 11.
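
The encoding just described can be illustrated with a short sketch. It assumes, purely for illustration, that A is the most significant bit and B the least significant bit; the helper names `encoded_clock_lines` and `encode_phase` are hypothetical and do not come from the design flow used in this work.

```python
def encoded_clock_lines(n_phases):
    """Number of binary clock lines needed to encode n_phases phases,
    i.e. ceil(log2(n_phases)), computed with integer arithmetic."""
    if n_phases < 1:
        raise ValueError("at least one clock phase is required")
    return max(1, (n_phases - 1).bit_length())


def encode_phase(phase, width):
    """Return the bits of `phase`, most significant bit first.
    For width 2 the tuple is (A, B) under the assumed bit order."""
    return tuple((phase >> bit) & 1 for bit in reversed(range(width)))


width = encoded_clock_lines(4)                       # quaternary clock
sequence = [encode_phase(p, width) for p in range(4)]
print(width, sequence)
```

For a quaternary clock this gives two lines instead of four, and the saving grows with the number of phases (for example, three lines for eight phases).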
Each clock value is the binary-encoded representation of the global MV clock signal. The MV clock generator can then be implemented as a binary counter that cycles through the clock phase values.

Also, in commercially available FPGAs, it is not possible to insert a modified literal selection gate in series with each storage device. Instead of a modified literal selection gate, a binary decoder is inserted that receives the encoded global clock signal as input. The appropriate decoder output can then be connected to the latch gate input as shown in Figure 22. The decoder output used depends on the phase domain of the latch in the original design. Although an entire decoder is shown in Figure 22, this is for illustrative purposes only. Since each latch responds to a single clock phase, a more economical implementation is a two-input binary AND gate with input inverters that select the appropriate clock phase.

Figure 22. Clock Phase Sensitive Latches for a Binary-encoded MV Clock Signal

### Chapter 4

### EXPERIMENTAL RESULTS

The experiments carried out to support the new approach described in Chapter 3 are discussed in detail here. First, the design implementation and functional simulation of the new level-sensitive latches with the modified literal selection gates are described. Next, the functional simulation of the MV clock generator using SystemVerilog HDL is presented. Finally, a comparison is summarized between the RTL-level implementation of various multi-phase synchronous circuits using FPGA target technology and their equivalent MVL design implementation using binary logic.

### 4.1 Functional Simulation Results for New Level-Sensitive Latches

The new level-sensitive latches were designed with the appropriate modified literal selection gate connected to the gate input of a binary latch.
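
As a behavioral illustration of this composition, the sketch below models a literal selection gate $J_i$ following Table 2 and a D-latch-$i$ following Tables 3 to 6. It is a minimal Python model written for this text, not the SystemVerilog or HSPICE description used in the experiments; `literal_select` and `DLatch` are hypothetical names, and the initial output value of 0 is an assumption.

```python
def literal_select(i, x):
    """Modified literal selection gate J_i (Table 2):
    binary 1 if the quaternary input x equals i, otherwise binary 0."""
    if i not in range(4) or x not in range(4):
        raise ValueError("quaternary values must be 0, 1, 2 or 3")
    return 1 if x == i else 0


class DLatch:
    """D-latch-i: a binary level-sensitive latch gated by J_i."""

    def __init__(self, i, q=0):
        self.i = i      # logic level of EN for which the latch is transparent
        self.q = q      # stored output Q (initial value is an assumption)

    def step(self, en, d):
        """Apply one (EN, D) pair and return Q."""
        if literal_select(self.i, en):   # transparent: Q follows D
            self.q = d
        return self.q                    # otherwise Q holds its previous value


# Drive D-latch-2 with the MV clock sequence 0,1,2,3,0,1,2,3.
latch = DLatch(2)
en_seq = [0, 1, 2, 3, 0, 1, 2, 3]
d_seq = [1, 1, 1, 0, 0, 0, 0, 1]
trace = [latch.step(en, d) for en, d in zip(en_seq, d_seq)]
print(trace)
```

In this trace Q changes only in the two steps where EN is at logic-2, which is the behavior tabulated for D-latch-2 in Table 5.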
The modified literal selection gates were designed at transistor level and simulated using the HSPICE AVANWAVES visualization tool. In this experiment, multi-threshold FETs were used, allowing different threshold voltages to control the switching. The transistor level structural diagrams of the modified literal selection gates are shown in Figure 11, and the HSPICE simulation waveforms of the new level-sensitive latches with the modified literal selection gates are shown in Chapter 3 (Figures 14, 16, 18 and 20).

The functional simulations of these new level-sensitive latches were also performed using SystemVerilog HDL. SystemVerilog supports extended data types that allow non-binary, higher-radix discrete signals to be represented easily. The modified literal selection gates were modeled using a simple case construct in SystemVerilog HDL to simulate the new level-sensitive latches, and the simulation waveforms are shown in Figure 23.

Figure 23. Functional Simulations of new level-sensitive latches using SystemVerilog HDL: a) D-latch-0, b) D-latch-1, c) D-latch-2, d) D-latch-3

Each of the waveforms in Figure 23 validates the functionality of the new level-sensitive latches. The 'reset' signal resets the output 'q' of the latch to logic 0. The 'en' signal is the MVL signal, represented by the sequence 0,1,2,3,0,1,2,3..., given as input to the literal selection gate whose output is connected to the gate input of the binary level-sensitive latch. The output signal 'q' is transparent to the data input signal 'data' according to the type of modified literal selection gate connected to the latch.

### 4.2 Functional Simulation Results for MVL Clock Generator

The functional simulation of the MVL clock generator was also performed using SystemVerilog HDL.
For this simulation, the MVL clock generator circuit was modeled using the quaternary adder design [4], as shown in Figure 21. The 'Clk' signal supplied to the register in Figure 21 is a periodic binary pulse train with a clock period of 20 ns. The 'Reset' signal initializes the output Q of the register to logic 0. At every positive edge of the binary clock, the register stores the result of the quaternary full adder circuit. Hence, the output Q takes values in the sequence 0,1,2,3,0,1,2,3,0,1..., which acts as the MVL clock/enable signal controlling the D-latches with modified literal selection gates. The functional simulation waveform for the MVL clock generator over a simulation period of 60 ns is shown in Figure 24.

Figure 24. Functional Simulation of MVL Clock generator

### 4.3 Comparison of Results for Binary and MVL Designs

Several synchronous circuits were designed and implemented at RTL level using FPGA target technology, both with the multi-phase clock domain approach and with the new equivalent MVL approach described in Chapter 3. The designs produced by the two approaches were compared in terms of the number of programmable interconnect structures used and the performance in terms of clock frequency.

The synchronous circuit designs used for the comparison were Finite State Machine (FSM) controllers, which were designed using the Altera Quartus II (Subscription 5.0) tool. In the multi-phase clock domain approach, each FSM controller design with *n* different states is implemented with multiple binary non-overlapping clock distribution networks, each driving a subset of the level-sensitive latches. In the equivalent MVL approach, each FSM with *n* different states is implemented with a single MVL clock distribution network in which the level-sensitive latches are driven by decoders.
The synthesis of all MVL FSM controller designs using the Altera Quartus II tool was performed by representing the MVL clock signal logic values with their equivalent binary-encoded values [0:00, 1:01, 2:10, 3:11]. All FSM controller designs were mapped to an Altera Stratix FPGA, and the results obtained for this comparison are summarized in Table 7.

Table 7. Comparison of Binary and MVL Designs in FPGA technology

| FSM controller | No. of states | Interconnects of interest (Binary) | Interconnects of interest (MVL) | % Reduction | Worst path delay $t_{pd}$, Binary (ns) | Worst path delay $t_{pd}$, MVL (ns) | Increase in clock speed (relative reduction in $t_{pd}$) |
|---------------------|----|----|----|-----|-------|-------|-----|
| Counter | 4 | 20 | 5 | 75% | 14.87 | 8.55 | 43% |
| Vending Machine | 6 | 48 | 38 | 21% | 21.08 | 19.5 | 7% |
| State Machine S0 | 4 | 63 | 51 | 19% | 32.4 | 32 | 1% |
| Traffic Controller | 4 | 89 | 48 | 46% | 49.5 | 37.5 | 24% |
| Electronic Key Lock | 10 | 74 | 32 | 57% | 22.62 | 19.33 | 15% |

The results in Table 7 indicate that the MVL approach reduces the number of programmable interconnects by 19% to 75% compared to the binary counterparts of the same synchronous circuit designs. Since metal interconnections occupy a significant amount of area on the IC chip, the decrease in the number of interconnects seen in Table 7 helps to relax the area constraint for IC chips to a great extent.
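
The percentage entries of Table 7 can be re-derived from its raw entries, as in the following sketch. The data are copied from Table 7; the function names are hypothetical, and the frequency calculation rests on the assumption that the maximum clock frequency is the reciprocal of the worst path delay, which the text does not state explicitly.

```python
# Raw entries copied from Table 7 as (binary, MVL) pairs.
TABLE7 = {
    "Counter":             {"interconnects": (20, 5),  "t_pd_ns": (14.87, 8.55)},
    "Vending Machine":     {"interconnects": (48, 38), "t_pd_ns": (21.08, 19.5)},
    "State Machine S0":    {"interconnects": (63, 51), "t_pd_ns": (32.4, 32.0)},
    "Traffic Controller":  {"interconnects": (89, 48), "t_pd_ns": (49.5, 37.5)},
    "Electronic Key Lock": {"interconnects": (74, 32), "t_pd_ns": (22.62, 19.33)},
}


def pct_reduction(binary, mvl):
    """Relative reduction of the MVL value with respect to the binary value."""
    return 100.0 * (binary - mvl) / binary


def pct_frequency_gain(t_binary, t_mvl):
    """Relative frequency gain, ASSUMING f_max = 1 / t_pd."""
    return 100.0 * (t_binary / t_mvl - 1.0)


interconnect_pct = [round(pct_reduction(*r["interconnects"])) for r in TABLE7.values()]
delay_pct = [round(pct_reduction(*r["t_pd_ns"])) for r in TABLE7.values()]
freq_gain_pct = [round(pct_frequency_gain(*r["t_pd_ns"]), 1) for r in TABLE7.values()]

for name, ic, dl, fg in zip(TABLE7, interconnect_pct, delay_pct, freq_gain_pct):
    print(f"{name}: interconnects -{ic}%, t_pd -{dl}%, f_max +{fg}% (assumed 1/t_pd)")
```

The rounded interconnect reductions and the rounded relative reductions in $t_{pd}$ reproduce the two percentage columns of Table 7. Under the $1/t_{pd}$ assumption the corresponding frequency gains would be somewhat larger than the tabulated clock speed figures (about 74% rather than 43% for the Counter), so the tabulated values are best read as reductions in worst path delay.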
Hence, the results in Table 7 support the primary goal of this research work, which is to reduce clock interconnect usage by means of a MVL clock distribution network. The MVL designs also show an increase in clock frequency, and hence performance, compared to their binary versions.

The first FSM controller in Table 7, Counter, was initially designed and implemented with four multi-phase non-overlapping clock signals (Clk1, Clk2, Clk3 and Clk4) and four states (S0, S1, S2 and S3). Each of these four clock signals controls the transition from one particular state to the next. For example, Clk1 is the only signal that can drive the state from S0 to S1. Similarly, Clk2, Clk3 and Clk4 are the clock signals on which S1, S2 and S3, respectively, depend for their transition to the next state. The controller design with this approach was synthesized and mapped to a Stratix FPGA device to observe the design clock frequency and the number of interconnect structures used. The design was then changed by replacing the four clock signals with a single clock signal (Clk) that takes the binary-encoded values of the quaternary logic levels. This new design was again synthesized and mapped to the same Stratix FPGA device. The results obtained for the single clock signal design were compared with those obtained for the previous approach. The experiment was repeated for the other FSM controller designs shown in Table 7. The improvement observed across these designs supports the claim that designing with a single MVL clock signal distribution is preferable to designing with multiple clock signals.
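
The single-clock version of the Counter can be sketched behaviorally as follows, using the two-input AND gate with input inverters from Section 3.2 as the phase selector. This is a hypothetical Python illustration rather than the synthesized RTL: the names `phase_select` and `run_counter` are invented here, bit A is assumed to be the most significant bit of the encoded clock, and the mapping of Clk1 to Clk4 onto codes 00 to 11 is an assumption.

```python
def phase_select(phase, a, b):
    """Two-input AND with input inverters: returns 1 only when the
    encoded clock (a, b) equals `phase` (A assumed to be the MSB)."""
    want_a, want_b = (phase >> 1) & 1, phase & 1
    in_a = a if want_a else 1 - a    # inverter on A when the phase needs A = 0
    in_b = b if want_b else 1 - b    # inverter on B when the phase needs B = 0
    return in_a & in_b


def run_counter(clock_codes, state=0):
    """Four-state counter: state S_k advances to S_((k+1) mod 4) only
    while clock phase k is selected; otherwise the state is held."""
    visited = [state]
    for a, b in clock_codes:
        if phase_select(state, a, b):
            state = (state + 1) % 4
        visited.append(state)
    return visited


codes = [(0, 0), (0, 1), (1, 0), (1, 1)] * 2   # two periods of the encoded clock
print(run_counter(codes))             # starting in S0, in step with the clock
print(run_counter(codes, state=2))    # starting in S2: waits for its own phase
```

The second call shows the phase-sensitive behavior: a controller that starts in S2 holds its state until the encoded clock reaches the code assigned to that state, and then advances in step with the clock.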
### Chapter 5

### CONCLUSIONS AND FUTURE RESEARCH

The research work presented here introduces the idea of using modified literal selection gates, a MVL clock generator, and a single multiple-valued clock distribution network to implement multi-valued clock designs in place of multiple phase non-overlapping clock designs. The intent is to reduce the number of metal interconnections in the clock distribution network, which occupy a significant amount of area on an IC chip. The implementation of these ideas using custom VLSI or standard cell ASIC target technology was described, including a discussion of the supporting subcircuits with the new level-sensitive latches required. Functional validation of several designs was accomplished using SystemVerilog HDL, and a transistor-level design and simulation was carried out using the HSPICE simulator.

We also described how these ideas could be adapted for implementation in commercially available FPGA devices or ASICs based on binary-only logic cells, and how the subcircuits with new level-sensitive latches could be replaced using standard binary components, such as a modular counter in place of the master clock generator and a binary decoder in place of the modified literal selection gate. Several circuit design implementation experiments were performed, with results showing a significant reduction in area and increased performance in support of our new approach. This set of experimental results was obtained by using the Altera Quartus II EDA tool to implement several example circuits.

The results obtained in our research work bolster the idea of implementing single clock distribution network designs in which a multi-valued clock signal replaces the multiple phase clock distribution networks, and hence the approach may prove to be a turning point for the challenges faced by designers in reducing chip area without compromising circuit performance.
Though the idea presented here shows advantages in reducing the number of metal interconnections used and increasing circuit performance, the implementation of MVL designs is also exposed to certain risks. We have described the use of multi-threshold voltage-mode FET devices in Chapter 3 for implementing the MVL circuits required for our work. The proper functionality of the FET devices is determined by their threshold voltages. The threshold voltages of transistors are susceptible to variations in several factors such as temperature, gate potential, and the doping levels of the polysilicon. Hence, utmost care has to be taken during the fabrication of FET devices to ensure the desired functionality of MVL circuits. Also, the new level-sensitive latches designed in our work are transparent only for certain voltage levels present on the MV clock signal. Interconnect process variations could affect the voltage levels of the MV clock signal, resulting in improper functionality of the latches. Therefore, MVL designs need to be implemented such that they can tolerate the inevitable variations in voltage levels to some extent.

Our future efforts will concentrate on the detailed implementation of the MVL clock generation circuit using a suitable MV technology such as multi-threshold voltage-mode FET devices. We plan to evaluate the use of this result by generating custom MV clock generation subcircuits and modified literal selection cells. A sample standard cell multi-phase clock domain circuit will then be implemented using these new cells and compared to the original implementation.

#### REFERENCES

- [1] Eby G. Friedman, "Clock Distribution Networks in Synchronous Digital Integrated Circuits," Proceedings of the IEEE, Vol. 89, No. 5, pp. 665–692, May 2001.
- [2] D.M. Miller and M.A. Thornton, Multiple-Valued Logic: Concepts and Representations, Morgan & Claypool Publishers, San Rafael, CA, ISBN 10-1598291904, 2008.
- [3] Vojin G. Oklobdzija, Vladimir M. Stojanovic, Dejan M. Markovic and Nikola M. Nedovic, "Digital System Clocking: High-Performance and Low-Power Aspects," John Wiley & Sons, Inc., 2003. ISBN: 0-471-27447-X.
- [4] S. Datla, M.A. Thornton, L. Hendrix, and D. Henderson, "Quaternary addition circuits based on SUSLOC voltage-mode cells and modeling with SystemVerilog," Proc. of IEEE Int. Symposium on Multiple-Valued Logic, 2009, pp. 256–261.
- [5] S.H. Unger and C. Tan, "Clocking Schemes for High-Speed Digital Systems," IEEE Transactions on Computers, Vol. C-35, No. 10, October 1986.
- [6] M. C. Papaefthymiou and K. H. Randall, "TIM: A timing Package for two-phase, level-clocked circuitry," in Proc. ACM/IEEE Design Automation Conf., 1993, pp. 497–502.
- [7] P.E. Gronowski, et al., "High-performance microprocessor design," IEEE Journal of Solid-State Circuits, Volume 33, Issue 5, May 1998.
- [8] E. G. Friedman, "Clock distribution design in VLSI circuits: An overview," in Proc. IEEE Int. Symp. Circuits and Systems, May 1993, pp. 1475–1478.
- [9] M. Gong, H. Zhou, L. Li, J. Tao, and X. Zeng, "Binning Optimization for Transparently-Latched Circuits," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 30, No. 2, pp. 270–283, 2011.
- [10] D. Noice, R. Mathews, and J. Newkirk, "A clocking discipline for two-phase digital systems," in Proc. IEEE Int. Conf. Circuits and Computers, Sept. 1982, pp. 108–111.
- [11] Roland E. Best, Phase-Locked Loops: Design, Simulation and Applications (6th ed.), McGraw Hill, 2007. ISBN 978-0-07-149375-8.
- [12] G. Epstein, G. Frieder, and D.C. Rine, "The Development of Multiple-Valued Logic as Related to Computer Science," Computer, vol. 7, pp. 20–32, 1974.
- [13] C. Ebeling and B. Lockyear, "On the performance of level-clocked circuits," in Proc. Advanced Research in VLSI, Chapel Hill, NC, 1995, pp. 342–356.
- [14] K.C. Smith, "The Prospects for Multivalued Logic: A Technology and Application View," IEEE Trans. Computers, vol. 30, pp. 619–634, 1981.
- [15] S.L. Hurst, "Multiple-Valued Logic: Its Status and Its Future," IEEE Trans. Computers, vol. 33, no. 12, pp. 1160–1179, Dec. 1984.
- [16] Vasundara Patel and K S Gurumurthy, "Static Random Access Memory Using Quaternary Latch," International Journal of Engineering Science and Technology, Vol. 2(11), 2010, pp. 6371–6379.
- [17] K. D. Wagner, "Clock system design," IEEE Des. Test Comput., pp. 9–27, Oct. 1988.
- [18] V. G. Oklobdzija, "Clocking in multi-GHz environment," in Proc. 23rd IEEE Int. Conf. Microelectron., 2002, vol. 2, pp. 561–568.
- [19] S. Datla and M.A. Thornton, "Quaternary Voltage-Mode Logic Cells and Fixed-Point Multiplication Circuits," IEEE International Symposium on Multiple-Valued Logic (ISMVL), May 26-28, 2010, pp. 128–133.
- [20] P. E. Gronowski, W. J. Bowhill, R. P. Preston, M. K. Gowan, and R. L. Allmon, "High-performance microprocessor design," IEEE J. Solid-State Circuits, vol. 33, pp. 676–686, May 1998.
- [21] A. Ishii, C. E. Leiserson, and M. C. Papaefthymiou, "Optimizing two-phase, level-clocked circuitry," in Advanced Research in VLSI and Parallel Systems: Proceedings of the 1992 Brown/MIT Conference, pp. 246–264, 1992.
- [22] G.-C. Hsieh and J. C. Hung, "Phase-Locked Loop techniques: A survey," IEEE Transactions on Industrial Electronics, vol. 43, pp. 609–615, December 1996.
- [23] LSSD Rules and Applications, Manual 3531, Release 59.0, IBM Corporation, March 29, 1985.
- [24] Michael A.B. Jackson, Arvind Srinivasan, et al., "Clock Routing for High-Performance ICs," 27th ACM/IEEE Design Automation Conference, pp. 573–579, 1990.