# Global Multiple-Valued Clock Approach for High-Performance Multi-Phase Clock Integrated Circuits

Rohit P Menon Department of Computer Science and Engineering Southern Methodist University Dallas, Texas, USA rmenon@smu.edu

Mitchell A Thornton Department of Computer Science and Engineering Southern Methodist University Dallas, Texas, USA mitch@lyle.smu.edu

Abstract – Some high-performance digital integrated circuits use multi-phase clock distribution systems with level-sensitive latches as clocked storage elements. A set of N periodic non-overlapping binary clock signals propagates over the clock phase distribution networks, each driving a disjoint subset of level-sensitive latches, which provides enhanced throughput and performance. This performance enhancement can come at the cost of increased area since an individual distribution network is required for each clock phase. The method presented in this paper overcomes this problem by using a single global clock distribution network for a multi-valued (MV) clock signal in combination with level-sensitive latches designed to be transparent for a specific portion of the global MV clock signal. This approach is compatible with conventional binary logic since the only non-binary components required are the global clock generator and a modified literal selection gate, both of which can be implemented as small analog circuits. A purely binary implementation approach is also described where the MV clock signal is replaced by a binary encoded signal and the phase-sensitive latches are implemented through the inclusion of a decoding function. This approach allows for implementation of the method using commercially available FPGA devices or a standard cell library containing only binary logic cells.

### 1. INTRODUCTION

The concept of Multiple-Valued Logic (MVL) has been an area of research for many years in the Integrated Circuit (IC) design community [1,2,3,5]. MVL has several applications in Electronic Design Automation (EDA) tools for digital circuit simulation and synthesis. MVL circuitry has the potential to become a future logic technology for high-speed IC designs, as reduced chip area, increased performance, and reduced power dissipation are major requirements for future ICs. Some high-performance digital integrated circuits being produced today use multi-phase clock distribution systems [7,8,9]. The clock distribution networks used in multi-phase clock distribution systems require a significant amount of resources in terms of area since each clock phase requires an independent distribution network. Difficulties also arise in maintaining synchronization among the independent clock phases.

Clocking is an essential concept in the design of synchronous digital systems [10]. A synchronous system comprises storage elements and combinational logic that together make up a Finite State Machine (FSM) controller and a datapath. A typical clock signal has to be distributed to a large number of storage elements, and hence it usually has the highest fan-out of any node in a digital design. As a result, the clock distribution system alone can consume up to 30-40% of the power budget of the IC chip [4]. Clocking in digital systems continues to gain importance since the clock frequency is increasing rapidly, approximately doubling every three years. The increase in clock uncertainties due to higher clock frequencies has made designing clock distributions in high-performance microprocessors and other ICs increasingly difficult. Hence distributed multi-phase clock systems can play a vital role in high-performance circuit designs where independent clock networks with lower frequency non-overlapping clock signals are physically distributed to disjoint subsets of the clocked storage elements.

The clocked storage elements generally used in high-performance circuits are level-sensitive transparent latches [6] that provide high performance and low power consumption [13] as compared to flip-flops. Level-sensitive latches are attractive since they require fewer transistors to implement as compared to edge-sensitive storage devices. However, the transparent nature of latches increases the difficulty in meeting timing criteria as compared to the use of edge-sensitive circuits.

The method presented here implements the strategy of a multi-phase clocking architecture but utilizes a single clock distribution network instead of multiple clock distribution networks. The single clock distribution network carries a multi-valued logic clock signal that propagates to all clocked storage elements present in the high-performance IC. The single network may be implemented as a single conductor that distributes an MV clock signal, or as a set of log(N) conductors (that is, $\lceil \log_2 N \rceil$ conductors for N clock phases) distributing a binary encoded version of the MV clock signal.

For explanatory purposes, this paper focuses on a four-valued voltage-mode quaternary clock signal (i.e. four logic levels: logic 0, logic 1, logic 2 and logic 3). Binary level-sensitive latches are augmented to contain a modified literal selection gate inserted in-line between the MV clock signal input and the gating input of the latch. There are four different modified literal selection gates, one for each of the four clock phase domains. The output of the modified literal selection gate is a standard binary signal that produces a logic-1 for one of the four MV clock signal levels and a logic-0 otherwise.

The paper is organized as follows: In Section 2, the background of the approach is described and the application to custom or standard cell integrated circuits as a target technology is included. In Section 3, an alternative implementation of the approach is described that allows for the use of only binary logic elements through the use of a bundled set of log(N) conductors that distribute an encoded version of the MV clock signal. The approach of Section 3 does not require the implementation of custom MV subcircuits, but does require more clock distribution network area; however, it is reduced from N independent multi-phase binary clock signals to log(N) binary signals. In Section 4, experimental results supporting the technique are provided. Finally in Section 5, conclusions and plans for future work are discussed.

### 2. CUSTOM OR STANDARD CELL IC IMPLEMENTATION

A typical high-performance IC design with multiple phase clock signal distribution networks is depicted in Fig 1. An on-chip Phase Locked Loop (PLL) receives the binary external clock input to generate a stable high frequency global clock signal which is then input to a multi-phase clock generation circuit. The phase generation circuit produces each of the N lower frequency individual clock phase signals which are in turn distributed to a subset of latches in each phase domain. CDT<sub>i</sub> represents the individual distribution networks.



Fig 1. Block Diagram of Multi-phase Clocking

In Fig 1, the *N* phase-shifted clock signals are represented by $\Phi_0$, $\Phi_1$, ..., $\Phi_{N-1}$. In terms of quaternary logic, we consider four phase-shifted clock signals, represented by $\Phi_0$, $\Phi_1$, $\Phi_2$ and $\Phi_3$. An example of the clock signal waveforms for the external global clock input and the resulting four phase-shifted clock signals is shown in Fig 2. The four non-overlapping phase-shifted clock signals, $\Phi_0$, $\Phi_1$, $\Phi_2$ and $\Phi_3$, propagate to disjoint sets of sub-circuits over corresponding Clock Distribution Tree (CDT) networks. Each sub-circuit shown in Fig 1 is comprised of combinational logic along with sequential logic elements which are typically level-sensitive transparent latches.

Fig 2. Clock-phase Signal Waveforms
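As a rough illustration of non-overlapping phase signals such as those in Fig 2, the sketch below derives N binary phase waveforms from a count of global clock cycles. The function name `phase_signals` and the assumption that each phase is high for exactly one global clock cycle out of every N are illustrative choices, not details taken from the figure.

```python
def phase_signals(num_phases, num_cycles):
    """Return num_phases binary waveforms, one sample per global clock cycle.

    Assumption: phase i is high only in cycles where cycle % num_phases == i,
    so the phases never overlap.
    """
    return [
        [1 if cycle % num_phases == i else 0 for cycle in range(num_cycles)]
        for i in range(num_phases)
    ]


phases = phase_signals(4, 8)
for i, wave in enumerate(phases):
    print(f"Phi_{i}:", wave)

# Non-overlap check: exactly one phase is high in every global clock cycle.
assert all(sum(samples) == 1 for samples in zip(*phases))
```

Each waveform would drive its own distribution network in the conventional scheme, which is the cost the single MV clock network is meant to avoid.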

The idea of having a single clock distribution network instead of different multi-phase clock distribution networks is depicted in Fig 3. A Multiple-Valued Logic (MVL) clock signal, represented by $\Phi_{clk}$, is generated by an MVL clock generator and is propagated on a single clock distribution network to all the sub-circuits present in the IC chip. The waveform for the MVL $\Phi_{clk}$ signal is shown in Fig 4.



Fig 3. Block Diagram with Global MVL Clock Signal



Fig 4. Φ<sub>clk</sub> MVL Clock Signal Waveform

Each subinterval in the  $\Phi_{clk}$  period in the waveform shown in Fig 4 represents one of the four logic levels (logic 0, logic 1, logic 2, and logic 3) in a quaternary logic system. Since the level-sensitive latches present in Fig 1 are designed for binary logic, a change in the latch design is essential for it to be compatible with the new MVL clock signal ( $\Phi_{clk}$ ). The design change for the level-sensitive latches is implemented by inserting a modified literal selection gate in series with the latch gate or enable input.

### A. Modified Literal Selection Gate (*J<sub>i</sub>*)

A Literal Selection Gate is a unary quaternary logic gate, denoted by $J_i$ [5]. The subscript *i* denotes the desired logic level for which the output of the $J_i$ gate has a non-zero value. For a quaternary logic implementation, *i* can take values 0, 1, 2 and 3 representing the corresponding four logic levels. The $J_i$ logic symbols for the four different literal selection gates are shown in Fig 5.



Fig 5. Literal Selection Gates,  $J_i$ 

Previous work defines the non-zero output of a quaternary $J_i$ gate to be logic 3. In this paper, we have modified the structure of $J_i$ such that the non-zero output of $J_i$ is a standard binary logic 1, which allows for compatibility with existing multi-phase clock domain binary circuits. Specifying a logic 1 output indicates that the modified literal selection gate may contain appropriate level-shifting circuitry to enable compatibility with conventional binary logic. The modified literal selection gate, $J_i$, is defined by the truth table in Table 1, where the first column is the input logic level and the remaining columns give the output of each gate.

| Input | $J_0$ | $J_1$ | $J_2$ | $J_3$ |
|-----|-------------|-------|-------|-------|
| 0   | 1           | 0     | 0     | 0     |
| 1   | 0           | 1     | 0     | 0     |
| 2   | 0           | 0     | 1     | 0     |
| 3   | 0           | 0     | 0     | 1     |

Table 1. Truth Table for Modified  $J_i$ 
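The behavior in Table 1 can be sketched functionally as follows. This is a minimal behavioral model only, not a circuit description; the function name `modified_literal` and the `radix` parameter are hypothetical names introduced for illustration.

```python
def modified_literal(i, level, radix=4):
    """Behavioral model of the modified literal selection gate J_i.

    Returns binary 1 when the multi-valued input `level` equals `i`,
    and binary 0 for every other input level.
    """
    if not 0 <= i < radix:
        raise ValueError("gate index i must be in range(radix)")
    if not 0 <= level < radix:
        raise ValueError("input level must be in range(radix)")
    return 1 if level == i else 0


# Print the truth table: one row per input level, one column per gate.
print("Input  J0 J1 J2 J3")
for level in range(4):
    outputs = [modified_literal(i, level) for i in range(4)]
    print(f"{level:5d}  " + "  ".join(str(bit) for bit in outputs))
```

Because exactly one gate outputs 1 for any input level, the four gates partition the MV clock period into four non-overlapping enable windows.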

### B. Level-Sensitive Latches with MVL Gate Input

A level-sensitive latch or D-latch is a logic circuit that acts as a data storage element. A D-latch has a data input signal (D), a gate/enable signal (EN) and outputs (Q and Q'). The logic symbol and the characteristic table for a binary D-latch are shown in Fig 6 and Table 2 respectively.



Fig 6. D-Latch Logic Symbol

| EN | D | Q                 | Q'                 |
|----|---|-------------------|--------------------|
| 0  | X | Q<sub>prev</sub>  | Q'<sub>prev</sub>  |
| 1  | 0 | 0                 | 1                  |
| 1  | 1 | 1                 | 0                  |

Table 2. D-Latch Characteristic Table

A typical CMOS voltage-mode D-latch circuit can be implemented as shown in Fig. 7 [11]. The data input signal (D) is input to a transmission gate controlled by the enable signal (EN). The EN input serves as the latch's gate input and is connected to the output of the modified literal selection gate. The output of the transmission gate is connected to a latch comprised of two inverters where the topmost inverter serves as a *keeper logic* circuit.



Fig 7. Typical Voltage-Mode D-Latch Circuit

To generate the MV global clock signal for a multi-phase clock domain circuit, the PLL and phase generation circuit shown in Fig 1 are replaced by a subcircuit labeled MVL clock generator as shown in Fig 3. The individual CDT distribution networks are replaced with a single global MV clock signal distribution network and an appropriate modified literal selection circuit is inserted in series with the EN input of the transparent D-latches. Area savings result in a custom or standard cell implementation through the replacement of the multiple CDT networks with a single clock distribution network, since routing and via placement are less complex.

Depending on which CDT domain is used, each original D-latch is replaced with one of the four sub-circuits shown in Fig 8 comprising a modified literal selection gate in series with the D-latch EN input.



Fig 8. D-latches with Modified Literal Selection Gates

The overall characteristic tables for the four different subcircuits in Fig. 8 are shown in Table 3.

| EN/CLK | D | Q                | Q'                |
|--------|---|------------------|-------------------|
| 0      | 0 | 0                | 1                 |
| 0      | 1 | 1                | 0                 |
| 1      | X | Q<sub>prev</sub> | Q'<sub>prev</sub> |
| 2      | X | Q<sub>prev</sub> | Q'<sub>prev</sub> |
| 3      | X | Q<sub>prev</sub> | Q'<sub>prev</sub> |

a) D-latch-0

| EN/CLK | D | Q                | Q'                |
|--------|---|------------------|-------------------|
| 0      | X | Q<sub>prev</sub> | Q'<sub>prev</sub> |
| 1      | 0 | 0                | 1                 |
| 1      | 1 | 1                | 0                 |
| 2      | X | Q<sub>prev</sub> | Q'<sub>prev</sub> |
| 3      | X | Q<sub>prev</sub> | Q'<sub>prev</sub> |

b) D-latch-1

| EN/CLK | D | Q                | Q'                |
|--------|---|------------------|-------------------|
| 0      | X | Q<sub>prev</sub> | Q'<sub>prev</sub> |
| 1      | X | Q<sub>prev</sub> | Q'<sub>prev</sub> |
| 2      | 0 | 0                | 1                 |
| 2      | 1 | 1                | 0                 |
| 3      | X | Q<sub>prev</sub> | Q'<sub>prev</sub> |

c) D-latch-2

| EN/CLK | D | Q                | Q'                |
|--------|---|------------------|-------------------|
| 0      | X | Q<sub>prev</sub> | Q'<sub>prev</sub> |
| 1      | X | Q<sub>prev</sub> | Q'<sub>prev</sub> |
| 2      | X | Q<sub>prev</sub> | Q'<sub>prev</sub> |
| 3      | 0 | 0                | 1                 |
| 3      | 1 | 1                | 0                 |

d) D-latch-3

Table 3. Characteristic Tables for D-latch-i
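The characteristic tables for D-latch-i can be summarized by a short behavioral model: the latch is transparent only while the MV clock level equals its phase index and holds its state otherwise. The class name `PhaseLatch` and its `update` method are hypothetical names used for this sketch, which ignores all timing and electrical detail.

```python
class PhaseLatch:
    """Behavioral model of D-latch-i: a binary D-latch gated by J_i."""

    def __init__(self, phase, q=0):
        self.phase = phase  # MV clock level for which the latch is transparent
        self.q = q          # stored binary value

    def update(self, clk_level, d):
        """Apply one (EN/CLK, D) input pair and return (Q, Q')."""
        if clk_level == self.phase:  # J_i outputs 1: latch is transparent
            self.q = d
        # Otherwise J_i outputs 0 and the latch holds Q_prev.
        return self.q, 1 - self.q


latch = PhaseLatch(phase=2)
for clk_level, d in [(0, 1), (1, 1), (2, 1), (3, 0), (2, 0)]:
    q, q_bar = latch.update(clk_level, d)
    print(f"CLK={clk_level} D={d} -> Q={q} Q'={q_bar}")
```

In this sequence the stored value changes only on the two inputs with CLK equal to 2, matching the D-latch-2 table.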

### C. MV Clock Generation Circuit

The master MV clock generation circuit is the second additional circuit that must be implemented, replacing the traditional PLL and phase generation circuits. A simple implementation of the MV clock generation circuit consists of an MV incrementing circuit with an MV registered output, as shown in Fig. 9. Other implementations are certainly possible and are a topic of further investigation. Such implementations will depend upon the target fabrication technology.
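A purely functional sketch of an incrementer with a registered output is given below. It models only the level sequence on the MV clock line, not the circuit itself; the class name `MVClockGenerator` and the methods `reset` and `tick` are assumptions made for illustration, with `tick` standing in for one active edge of the binary clock that drives the register.

```python
class MVClockGenerator:
    """Functional model: a radix-valued register fed by a +1 incrementer."""

    def __init__(self, radix=4):
        self.radix = radix
        self.q = 0  # registered MV output, logic 0 after reset

    def reset(self):
        """Initialize the registered output to logic 0."""
        self.q = 0

    def tick(self):
        """Model one register update: store (Q + 1) modulo the radix."""
        self.q = (self.q + 1) % self.radix
        return self.q


gen = MVClockGenerator()
gen.reset()
sequence = [gen.q] + [gen.tick() for _ in range(9)]
print(sequence)  # [0, 1, 2, 3, 0, 1, 2, 3, 0, 1]
```

Each value in the sequence selects exactly one of the four latch phase domains.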



### 3. BINARY LOGIC IMPLEMENTATION

When the implementation technology target is a commercially available programmable device such as an FPGA or a standard cell ASIC with a binary logic library, existing logic cell structures must be employed. This restriction prevents the incorporation of the modified literal selection gate and the MV clock generation circuit as described in the previous section. This section describes how the method may be modified such that implementation using such devices is possible.

Most commercially available FPGAs contain resources to support a single binary clock distribution network. When a multi-phase clock domain design is required, the tools must route the different CDT networks using on-chip routing resources. This is often inefficient and requires the use of multiple programmable interconnects which in turn can severely impact performance since the delay added by the programmable interconnects can be significant. For this reason, we are interested in also exploring the use of FPGA target technologies that can take advantage of multi-phase clock domain designs while not suffering from undue clock signal delays due to the heavy use of programmable interconnects in the distribution of the multiple CDT networks. The modification described here is also applicable for standard cell designs where the inclusion of custom analog cells is not permitted.

Because the MV clock generator cannot be easily implemented on most available FPGAs, an intermediate approach is used where the N CDT networks of a traditional multi-phase clock domain IC are replaced by log(N) CDTs that are routed within the FPGA. The log(N) CDT networks transmit a binary-encoded version of the MV clock signal. For the quaternary design example, two binary signals, labeled A and B, are propagated to each storage cell and together cycle through the values 00, 01, 10, and 11. Each value is the binary-encoded representation of one level of the global MV clock signal. The MV clock generator can then be implemented as a binary counter that cycles through the clock phase values.
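To make the wiring trade-off concrete, the sketch below counts the encoded clock lines needed for N phases and lists the counter codes for the quaternary case. It assumes that log(N) means the base-2 logarithm rounded up to an integer; the function names `encoded_clock_lines` and `counter_codes` are hypothetical.

```python
def encoded_clock_lines(num_phases):
    """Number of binary lines needed to encode num_phases clock levels.

    Assumption: this is ceil(log2(num_phases)), computed with integer
    arithmetic to avoid floating-point rounding.
    """
    if num_phases < 1:
        raise ValueError("num_phases must be at least 1")
    return (num_phases - 1).bit_length()


def counter_codes(num_phases):
    """Binary codes that a counter-based clock generator cycles through."""
    width = max(encoded_clock_lines(num_phases), 1)
    return [format(phase, f"0{width}b") for phase in range(num_phases)]


for n in (2, 4, 8, 16):
    print(f"{n} phases: {n} binary CDTs vs {encoded_clock_lines(n)} encoded lines")

print(counter_codes(4))  # ['00', '01', '10', '11']
```

For the quaternary example this gives two lines in place of four, and the saving grows with the number of phases.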

Also, in commercially available FPGAs, it is not possible to insert a modified literal selection gate in series with each storage device. Instead of a modified literal selection gate, a binary decoding circuit is inserted that receives the encoded global clock signal as input and produces a binary enabling pulse as output. The appropriate decoder output can then be connected to the latch gate input as shown in Fig 10. The particular decoder output used is based upon the particular phase domain of the original design. Although an entire decoder is shown in Fig. 10, this is for illustrative purposes only. Since each latch responds to a particular clock phase only, a more economical implementation would be a two-input binary AND gate with input inverters that select the appropriate clock phase.



Fig 10. Clock Phase Sensitive Latches for a Binary-encoded MV Clock Signal
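The economical per-latch decoding described above, a two-input AND gate with input inverters, can be sketched as follows. The function name `phase_enable` is hypothetical, and treating A as the most significant bit of the encoded clock is an assumption, since the bit ordering of A and B is not stated.

```python
def phase_enable(phase, a, b):
    """Two-input AND with optional input inverters selecting one clock phase.

    (a, b) is the 2-bit encoded clock value; a is assumed to be the MSB.
    An input is inverted when the selected phase has a 0 in that bit position.
    """
    want_a = (phase >> 1) & 1
    want_b = phase & 1
    in_a = a if want_a else 1 - a  # inverter on A when the phase code has A = 0
    in_b = b if want_b else 1 - b  # inverter on B when the phase code has B = 0
    return in_a & in_b


# For each encoded clock value, show which phase domain is enabled.
for code in range(4):
    a, b = (code >> 1) & 1, code & 1
    enables = [phase_enable(phase, a, b) for phase in range(4)]
    print(f"AB={a}{b} -> enables for phases 0..3: {enables}")
```

Each latch needs only the one AND gate for its own phase, so the full decoder of Fig 10 is not replicated per storage element.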

### 4. EXPERIMENTAL RESULTS

Three different types of experiments were carried out to validate the approach described in this paper. The modified literal selection gates were designed and implemented at the transistor level, several multi-phase clock domain synchronous circuits were implemented and functionally validated at the RTL level, and several different circuits were designed and implemented using FPGA target technology based on the method presented in Section 3.

The modified literal selection gates were designed at the transistor level and simulated using HSPICE. In this experiment, variable threshold voltage FET models were used allowing for different threshold voltages to control the switching.

From a functional point of view, RTL descriptions of several multi-phase clock domain circuits were implemented and simulated. The SystemVerilog HDL is used in the functional simulations since it supports extended data types that allow for non-binary, higher-valued radix discrete signals to be easily represented. In these simulations, the MV clock generator circuit was modeled based on the structure shown in Fig 9 using the quaternary adder design reported in [12]. The modified literal selection gates were modeled using a simple case construct and all other circuitry was identical to that in the original multiple CDT network designs.

The 'Clk' signal supplied to the register in Fig 9 is a periodic binary pulse train. The 'Reset' signal initializes the output Q of the register to be logic 0. For every positive edge of the binary clock, the register stores the result of the quaternary full adder circuit. Hence, the output Q will take values in the sequence 0,1,2,3,0,1,2,3,0,1... which acts as the MVL clock/EN signal for the control of the D-latches with modified literal selection gates. All simulations of SystemVerilog models were performed using Synopsys Verilog Compiler Simulator (VCS) tool.

The FPGA-based implementation of this approach was evaluated by comparing multi-phase clock designs with N CDT networks against those implemented using log(N) CDT networks. The comparative study reported the number of internal interconnects required to implement both forms of the designs, and a timing analyzer was used to report the worst case path delay and overall improvement in maximum clock speed.

The circuit designs used for the comparison were Finite State Machine (FSM) controllers implemented using the Altera QuartusII (Subscription Edition 5.0) tool. Each FSM controller with n states was implemented in two forms: a binary circuit in which multiple non-overlapping binary clock distribution networks each drive a subset of the level-sensitive latches, and an equivalent MVL circuit in which the single MVL clock distribution network is implemented as log(N) CDT networks with latches driven by decoders.

We performed the synthesis of MVL circuits using the Altera QuartusII EDA tool by representing the MVL clock signal logic values with their binary-encoded equivalents  $(0 \leftarrow 00, 1 \leftarrow 01, 2 \leftarrow 10, \text{ and } 3 \leftarrow 11)$ . All designs were mapped to a StratixII Altera FPGA. The results obtained for this comparison are summarized in Table 4 in terms of number of interconnects required and clock frequency for both designs.

| Finite State Machine Controller       | Counter |      | Vending Machine |      | State Machine S0 |     | Traffic Controller |      | Electronic Key Lock |       |
|---------------------------------------|---------|------|-----------------|------|------------------|-----|--------------------|------|---------------------|-------|
| No. of States                         | 4       |      | 6               | 6 4 4 |                 | 1   | 10                 |      |                     |       |
| Design                                | Binary  | MVL  | Binary          | MVL  | Binary           | MVL | Binary             | MVL  | Binary              | MVL   |
| Interconnects of Interest             | 20      | 5    | 48              | 38   | 63               | 51  | 89                 | 48   | 74                  | 32    |
| % Reduction                           | 75%     |      | 21%             |      | 19%              |     | 46%                |      | 57%                 |       |
| Worst Path Delay, t<sub>pd</sub> (ns) | 14.87   | 8.55 | 21.08           | 19.5 | 32.4             | 32  | 49.5               | 37.5 | 22.62               | 19.33 |
| Increase in Clock Speed               | 43%     |      | 79%             |      | 1%               |     | 24%                |      | 15%                 |       |

Table 4. Comparison of Binary and MVL Designs in FPGA Technology

Table 4 indicates a significant reduction in the required number of interconnects for the MVL circuit designs as compared to their binary counterparts. Because programmable interconnect structures are a major source of area and delay in FPGA designs, these reductions translate into increased performance.

The counter circuit in Table 4 was initially designed with four multiple-phase shifted clock signals (Clk1, Clk2, Clk3 and Clk4) where each of the four clock signals drives one particular state to another. A design change was then made by replacing the four clock signals with a single clock signal (Clk), which assumes binary-encoded values of quaternary logic. Both designs were compared in terms of the number of interconnects and maximum clock frequency. Similarly, the comparison was performed on other controllers listed in Table 4.
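The percentage reductions in interconnect count reported in Table 4 can be reproduced from the binary and MVL counts. The sketch below assumes the reduction is computed relative to the binary design and rounded to the nearest integer; the dictionary and function names are illustrative.

```python
# (binary, MVL) interconnect counts taken from Table 4.
interconnects = {
    "Counter": (20, 5),
    "Vending Machine": (48, 38),
    "State Machine S0": (63, 51),
    "Traffic Controller": (89, 48),
    "Electronic Key Lock": (74, 32),
}


def percent_reduction(binary, mvl):
    """Reduction of the MVL count relative to the binary count, in percent."""
    return 100.0 * (binary - mvl) / binary


reductions = {
    name: round(percent_reduction(binary, mvl))
    for name, (binary, mvl) in interconnects.items()
}

for name, value in reductions.items():
    print(f"{name}: {value}%")
```

Under these assumptions the computed values are 75%, 21%, 19%, 46%, and 57%, in agreement with the "% Reduction" row.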

### 5. CONCLUSIONS AND FUTURE WORK

The results presented in this paper introduce the technique of using modified literal selection gates, an MV clock generator, and a single multiple-valued clock distribution network for the purpose of implementing multi-phase clock designs with improved area and performance characteristics. The implementation of these ideas using custom VLSI or standard cell ASIC target technology is described, including a discussion of the supporting new subcircuits required. Functional validations of several designs were accomplished using the SystemVerilog HDL, and a transistor-level design and simulation was carried out using the HSPICE simulator.

We also described how these ideas could be adapted for implementation in commercially available FPGA devices or ASICs based on binary-only logic cells, and how the new subcircuits could be replaced using standard binary components such as a modular counter in place of the master clock generator and a binary decoder in place of the modified literal selection gate. Experimental results show improved performance and reduction in area for a set of example circuits when these ideas are applied as compared to their binary counterparts. This set of experimental results was obtained by using the Altera QuartusII EDA tool to implement several example circuits.

Our future efforts will concentrate on the detailed implementation of the MV clock generation circuit using a suitable MV technology such as variable-threshold voltage-mode FET devices. We plan to evaluate the use of this result by generating custom MV clock generation subcircuits and modified literal selection cells. A sample standard cell multi-phase clock domain circuit will then be implemented using these new cells and compared to the original implementation.

#### REFERENCES

- [1] Epstein, G., Frieder, G., and Rine, D.C., "The Development of Multiple-Valued Logic as Related to Computer Science", *IEEE Computer*, vol. 7, pp. 20-32, 1974.
- [2] Smith, K.C., "The Prospects for Multivalued Logic: A Technology and Application View," *IEEE Trans Computers*, vol. 30, pp. 619-634, 1981.
- [3] Hurst, S.L., "Multiple-Valued Logic-Its Status and Its Future," *IEEE Trans. Computers*, vol. 33, no. 12, pp. 1,160-1,179, Dec. 1984.
- [4] Gronowski, P.E., et al., "High-performance microprocessor design," *IEEE Journal of Solid-State Circuits*, vol. 33, iss. 5, May 1998.
- [5] Miller, D.M. and Thornton, M.A., Multiple-Valued Logic: Concepts and Representations, Morgan & Claypool Publishers, San Rafael, CA, ISBN 10-1598291904, 2008.
- [6] Gong, M., Zhou, H., Li, L., Tao, J., & Zeng, X., "Binning Optimization for Transparently-Latched Circuits", *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, 30(2), 270-283, 2011.
- [7] Friedman, E.G., "Clock Distribution Networks in Synchronous Digital Integrated Circuits," *Proceedings of the IEEE*, vol. 89, no. 5, pp. 665–692, May 2001.
- [8] Jackson, M.A.B., Srinivasan, A., et al., "Clock Routing for High-Performance ICs," Proc. of ACM/IEEE Design Automation Conference, pp. 573-579, 1990.

- [9] Papaefthymiou, M.C. and Randall, K.H., "TIM: A timing Package for two-phase, level-clocked circuitry," in *Proc. ACM/IEEE Design Automation Conf.*, 1993, pp. 497–502.
- [10] Oklobdzija, V.G., Stojanovic, V.M., Markovic, D.M. and Nedovic, N.M., Digital System Clocking: High-Performance and Low-Power Aspects, John Wiley & Sons, Inc. ISBN: 0-471-27447-X, 2003.
- [11] Vasundara P. and Gurumurthy, K.S., "Static Random Access Memory Using Quaternary Latch", *International Journal of Engineering Science and Technology*, 2(11), 2010, 6371-6379.
- [12] Datla, S., Thornton, M.A., Hendrix, L., and Henderson, D., "Quaternary addition circuits based on SUSLOC voltage-mode cells and modeling with SystemVerilog," Proc. of *IEEE Int. Symposium on Multiple valued Logic*, pp. 256-261, 2009.
- [13] Ebeling, C. and Lockyear, B. "On the performance of level-clocked circuits," in Proc. Advanced Research in VLSI, Chapel Hill, NC, 1995, pp. 342–356.