# A 2.56-Gb/s Serial Wireline Transceiver That Supports an Auxiliary Channel in 65-nm CMOS

Xiaoran Wang, Student Member, IEEE, Tianwei Liu, Member, IEEE, Shita Guo, Member, IEEE, Mitchell A. Thornton, Senior Member, IEEE, and Ping Gui, Senior Member, IEEE

Abstract—In this article, an asynchronous serial transceiver that is capable of transmitting and receiving an auxiliary data stream concurrently with the primary data stream is described. The transceiver instantiates the auxiliary data stream by modulating the phase of the primary data without affecting the primary channel transmission and recovery mechanisms. Standard receiver interoperability is maintained since the auxiliary data appear as primary data jitter. Analysis of the proposed transceiver and considerations of the system parameters are included and can be used to determine how such an auxiliary channel is implemented. The proposed transceiver with the auxiliary channel can be widely used in many data communication applications such as for transmitting signatures for authentication or other control information, steganography, or additional data in an existing serial link. A prototype transceiver, implemented in a 65-nm CMOS process, demonstrates the proposed concept with an 80-Mb/s auxiliary channel in a 2.56-Gb/s asynchronous serial link.

Index Terms—Auxiliary channel, asynchronous serial transceiver, clock and data recovery (CDR), hardware security.

## I. INTRODUCTION

SECURING an information processing system at the application and system software layers is regarded as a necessary but incomplete defense against cybersecurity threats. Encryption is a commonly employed method for preventing unauthorized access to sensitive information [1], [2]. However, modifying or redesigning an existing system to include encryption at the hardware layer can add significant expense and cause compatibility issues with other systems and specifications, as well as interoperability issues with other contemporary versions of similar systems. Moreover, the mere presence of an encrypted channel sometimes provides an adversary with undesired information and encourages increased attacks [3], [4]. More emphasis is being placed on hardware security due to the emergence of exploits at these lower layers of data transmission and processing [5]–[7].

Manuscript received February 1, 2019; revised May 27, 2019 and June 24, 2019; accepted July 15, 2019. Date of publication August 12, 2019; date of current version December 27, 2019. (*Corresponding author: Ping Gui.*)

X. Wang is with the Integrated Circuits and Systems Laboratory, Department of Electrical and Computer Engineering, Southern Methodist University, Dallas, TX 75205 USA (e-mail: xiaoranw@smu.edu).

T. Liu was with the Department of Electrical and Computer Engineering, Southern Methodist University, Dallas, TX 75205 USA. He is now with Ambarella, Inc., Santa Clara, CA 95054 USA (e-mail: tianweil@smu.edu).

S. Guo was with Southern Methodist University, Dallas, TX 75205 USA. He is now with Texas Instruments, Dallas, TX 75243 USA (e-mail: s-guo@ti.com).

M. A. Thornton and P. Gui are with the Department of Electrical and Computer Engineering, Southern Methodist University, Dallas, TX 75205 USA, and also with the Darwin Deason Institute for Cyber Security, Southern Methodist University, Dallas, TX 75205 USA (e-mail: mitch@lyle.smu.edu; pgui@smu.edu).

Color versions of one or more of the figures in this article are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TVLSI.2019.2931478

Many security measures at the hardware level require an additional data channel with extra bandwidth to transmit authentication information or simply to increase redundancy. A competing issue is that most existing designs cannot support the additional channel required to implement these measures without an expensive redesign effort. Even if such redesign efforts are accomplished, the resulting more secure products may not be backward compatible with earlier or standard generations.

To meet these challenges, we devised and implemented a wireline transceiver for an asynchronous serial channel that provides additional data bandwidth through the inclusion of an auxiliary data channel and is interoperable with nonequipped transceivers in earlier-generation systems. An auxiliary channel at a lower layer generally provides increased bandwidth to support requirements of security and other system modifications. This technique is also a means for steganography since it allows for communications to be hidden whether they are actually encrypted or not [8], [9].

In modern information processing circuitry, it is increasingly common for information to be exchanged via asynchronous serial links [10]–[14]. Asynchronous serial transceivers are ubiquitous in today's devices and are used to interface between blocks within and between integrated circuits (ICs), and between packaged systems, typically using industry standards such as USB, MIPI, and PCIe [15].

Serial communication channels must have a bandwidth in excess of the signal bandwidth that they transmit to allow for reliable communications in accordance with Shannon's capacity theorem. Because signals are always transmitted in the presence of noise, practical channels are designed with a bandwidth margin to ensure reliable detection of the transmitted bitstream. It is desirable to use the available bandwidth of asynchronous serial channels efficiently. The trend in modern wireline transceivers is toward ever-larger data transmission bandwidth on the serial channel. However, besides increasing the transmitted data rate on a single channel, the available bandwidth can also be used to support an auxiliary channel that is capable of transmitting and receiving an auxiliary data stream in parallel with the primary channel on a single serial link.

1063-8210 © 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications\_standards/publications/rights/index.html for more information.

Fig. 1. Proposed asynchronous serial link with the auxiliary channel.

As shown in Fig. 1, the proposed transceiver system transmits the primary and auxiliary data streams through a single asynchronous serial channel and recovers both of them simultaneously at the receiver.

The proposed transmission scheme offers several benefits. For example, it can be applied in different ways to enhance hardware security, such as by using the auxiliary channel to carry authentication information for the simultaneously received primary data [16]. The auxiliary data transmission can also be regarded as a form of steganography, since the proposed method is backward compatible with standard transceivers on the primary data channel and the auxiliary data appear as jitter to a nonequipped transceiver [8], [9]. Alternatively, the auxiliary channel may be used for error correction or detection on the primary channel, thereby enhancing its data integrity and capacity without an increase in bandwidth [17]. Moreover, the additional auxiliary channel increases the total data throughput.

The primary contributions of this paper include the design, simulation, analysis, and prototype measurement of a new architecture for an asynchronous serial transceiver that supports an auxiliary channel. Potential applications supporting hardware-level security are also described. We emphasize that this additional channel is provided in a way that offers backward compatibility, interoperability with nonequipped designs, and minimal redesign of existing systems. The architecture of the transmitter and receiver is described in detail, guidelines for selecting the system parameters of the proposed scheme are provided, and the limits and impact of the additional auxiliary data channel are analyzed. The measurement results demonstrate the function and performance of the proposed transceiver.

The rest of this paper is organized as follows. Section II introduces the applications of the auxiliary channel built over the asynchronous serial link and the channel bandwidth margin that is used to create it. Section III presents the proposed transceiver architecture, together with a detailed analysis of the architecture and the design parameters. Section IV presents the measurement results. Finally, Section V concludes the paper.

## II. PHYSICAL LAYER AUXILIARY CHANNEL

#### A. Auxiliary Channel Over Asynchronous Serial Link

There are many applications for an auxiliary channel over a serial wireline link. As an example, various schemes involving data authentication at the physical level are desirable, provided they do not significantly increase cost. Data authentication can be accomplished by having the receiver verify the authenticity of the data source through signatures or other data transmitted over the auxiliary channel. One example is the increasingly common use of reconfigurable FPGAs, wherein configuration bitstreams are dynamically loaded from on-board serial memory. Such bitstreams could be accompanied by authenticating signatures that provide some level of security to the device being configured. If the auxiliary data stream contains sensitive information, it can be further secured through encryption.

Auxiliary channel data transmissions may also be used for purposes other than authentication. For example, link quality indices (LQI) or error detection and correction checksums could be transmitted along with the primary data [4].

An overall advantage of embedding a physical-layer auxiliary channel in an asynchronous serial link is that additional data can be transmitted in an existing design without adding another primary channel. Since transistors are relatively plentiful compared with on- and off-chip communication channels, it is advantageous to use existing communication channels rather than incorporate costly additional links. For example, high-speed data on the primary channel may be accompanied by lower bandwidth control or synchronization data on the auxiliary channel. In this case, keeping the auxiliary transmissions hidden may not be a primary goal; rather, the reduced cost and more efficient use of the available bandwidth could be the motivating factors.

#### B. Channel Bandwidth Margin

The Shannon–Hartley channel capacity theorem, $C = B \log_2(1 + \text{SNR})$, relates the capacity of a channel to its bandwidth and the signal-to-noise ratio (SNR) of the transmitted signal, where C represents the channel capacity in bit/s and B represents the bandwidth in Hz. Because all practical channels contain some amount of noise, communication systems must be implemented such that the theoretical channel bandwidth exceeds the bandwidth of the transmitted signal by some amount, leaving a bandwidth margin. It is this excess bandwidth margin that is exploited by the physical-layer auxiliary channel described in this paper. However, the achievable auxiliary channel bandwidth is limited not only by the transmission channel but also by the transceiver architecture, which is another key factor in determining the data rate of the auxiliary channel. In the proposed prototype transceiver, we implement the auxiliary channel at a relatively low bandwidth in a conventional serial wireline transceiver at a very small additional cost, as described in detail in Section III.
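For a quick numerical feel for the Shannon–Hartley relation, the following Python sketch evaluates $C = B \log_2(1 + \text{SNR})$. The 1.28-GHz bandwidth and 20-dB SNR in the example are hypothetical values chosen only to illustrate the arithmetic; they are not measured properties of the prototype's channel, and the helper names are illustrative.

```python
import math


def shannon_capacity(bandwidth_hz: float, snr_linear: float) -> float:
    """Shannon-Hartley capacity C = B * log2(1 + SNR), in bit/s."""
    if bandwidth_hz <= 0 or snr_linear < 0:
        raise ValueError("bandwidth must be positive and SNR non-negative")
    return bandwidth_hz * math.log2(1.0 + snr_linear)


def db_to_linear(snr_db: float) -> float:
    """Convert a power ratio from dB to a linear ratio."""
    return 10.0 ** (snr_db / 10.0)


# Hypothetical example: 1.28 GHz of usable bandwidth at 20 dB SNR.
capacity = shannon_capacity(1.28e9, db_to_linear(20.0))
margin = capacity - 2.56e9  # headroom above a 2.56-Gb/s primary stream
print(f"C = {capacity / 1e9:.2f} Gb/s, margin = {margin / 1e9:.2f} Gb/s")
```

With these assumed values the capacity comfortably exceeds 2.56 Gb/s; that difference is the kind of margin the auxiliary channel draws on, although, as noted above, the transceiver architecture rather than the channel sets the practical limit.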

## III. TRANSCEIVER IMPLEMENTATION AND ANALYSIS

The transceiver implementation described here is applicable to wireline baseband modulation systems. The implemented transceiver embeds the auxiliary data into the primary data using a form of phase modulation (PM): the phase of the primary data stream is conditionally delayed depending upon the auxiliary data values. The transceiver simultaneously recovers both the primary and the auxiliary data at the receiving end [18].

Fig. 2. Block diagram of the transmitter.

#### A. Transmitter Architecture

A serial transmitter transmitting data at N bits per second typically uses, as its final stage, a D flip-flop (DFF) clocked at N Hz to retime every data bit before it is sent out as serial data, so each data bit lasts exactly one clock period. In the proposed transmitter, the auxiliary data are used to modulate the clock of this DFF. This is implemented using a 2:1 multiplexer (MUX) whose two data inputs are clock CLK0 and its delayed version CLK1 and whose control input is driven by the auxiliary data, as shown in Fig. 2. Thus, each auxiliary data bit selects either CLK0 or CLK1 as the DFF clock, generating the modulated clock signal.

Fig. 3 depicts the waveforms of the clocks and data in the transmitter. CLK1 is delayed from CLK0 by a phase of $\Delta \varphi$. The instantaneous modulated clock is selected according to the auxiliary data bit. For example, when the auxiliary data bit is "0," CLK0 is used as the clock for the DFF and the modulated data stream is synchronized with the positive edge of CLK0. When the auxiliary data bit is "1," CLK1 is used for the DFF and the modulated data stream is synchronized with the positive edge of CLK1. In this way, the auxiliary data are translated into phase lead and phase lag in the modulated data stream by the MUX, which serves as a binary switch between CLK0 and CLK1.

Fig. 3(a) and (b) shows the transmitter timing diagrams with the auxiliary data changing from bit "0" to "1" and from "1" to "0," respectively. In the case from "0" to "1," the auxiliary data modulate the primary data from phase lead to phase lag. In the case from "1" to "0," the auxiliary data modulate the primary data from phase lag to phase lead.
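The MUX-based modulation can be captured in a small behavioral model. The Python sketch below is a hypothetical, idealized model (zero clock-to-Q delay, ideal clock edges) that computes when each primary bit is launched. It assumes each auxiliary bit spans 32 primary bits, which follows from the 2.56-Gb/s and 80-Mb/s rates used later in this section, and it uses the 135° offset chosen in Section III-C; the function name is illustrative.

```python
from typing import List

UI_PS = 1e12 / 2.56e9      # one primary bit period at 2.56 Gb/s (390.625 ps)
DELTA_PHI_DEG = 135.0      # CLK0-to-CLK1 phase offset (Section III-C)
BITS_PER_AUX = 32          # 2.56 Gb/s / 80 Mb/s primary bits per auxiliary bit


def launch_times_ps(n_primary: int, aux_bits: List[int],
                    ui_ps: float = UI_PS,
                    delta_phi_deg: float = DELTA_PHI_DEG,
                    bits_per_aux: int = BITS_PER_AUX) -> List[float]:
    """Ideal DFF launch instant of each primary bit.

    The MUX passes CLK0 when the current auxiliary bit is 0 and the delayed
    CLK1 when it is 1, so an auxiliary 1 shifts every launch edge by delta_phi.
    """
    delay_ps = delta_phi_deg / 360.0 * ui_ps
    times = []
    for k in range(n_primary):
        aux = aux_bits[k // bits_per_aux]
        times.append(k * ui_ps + (delay_ps if aux else 0.0))
    return times


t = launch_times_ps(64, [0, 1])
intervals = [b - a for a, b in zip(t, t[1:])]
print(f"normal bit: {intervals[0]:.1f} ps, "
      f"bit before the 0->1 switch: {intervals[31]:.1f} ps")
```

In this model the bit period is unchanged within an auxiliary bit; only the primary bit preceding a "0"-to-"1" switch is stretched by $\Delta\varphi$, and the one preceding a "1"-to-"0" switch is shortened by $\Delta\varphi$, which a receiver perceives as a phase step.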

To ensure correct data transmission, the modulated clock must not miss any sampling edges. A higher data rate leaves a shorter timing margin for the MUX selection process; thus, the timing of the MUX block must be carefully verified.



Fig. 3. Timing diagrams of the transmitter modulation scheme. (a) Auxiliary data changing from bit "0" to "1." (b) Auxiliary data changing from bit "1" to "0."

If the MUX is not properly designed, glitches may appear in the modulated clock when it switches from CLK0 to CLK1, or vice versa, while CLK0 and CLK1 are at different voltage levels, as shown in Fig. 4. These potential glitches are avoided by ensuring that the MUX select signal, the "auxiliary data," always transitions when CLK0 and CLK1 are at the same level. This is accomplished by internally synchronizing the "auxiliary data" to the positive edge of CLK1. (The transmitter is positive-edge triggered by CLK0 and CLK1.) Since CLK1 is a delayed version of CLK0 and the "auxiliary data" are triggered by the positive edge of CLK1, the auxiliary data transition only while both CLK0 and CLK1 are "1"; thus, the glitch problem described above does not occur.
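The MUX timing margin can be estimated with simple arithmetic. Assuming ideal 50% duty-cycle clocks (an assumption; the duty cycle is not stated here) and ignoring the flip-flop clock-to-Q and MUX delays, the select signal launched at the CLK1 rising edge must settle before CLK0 falls, so the window is $T/2 - (\Delta\varphi/360^\circ)\,T$. The Python helper below is hypothetical.

```python
def glitch_free_window_ps(ui_ps: float, delta_phi_deg: float) -> float:
    """Time after the CLK1 rising edge during which CLK0 and CLK1 are both high.

    Assumes 50% duty-cycle clocks: CLK0 is high on [0, T/2) and CLK1, delayed by
    d = delta_phi/360 * T, is high on [d, d + T/2). The MUX select (auxiliary
    data launched by the CLK1 rising edge) must settle before CLK0 falls at T/2.
    """
    if not 0.0 < delta_phi_deg < 180.0:
        raise ValueError("this simple model is valid only for 0 < delta_phi < 180 deg")
    delay_ps = delta_phi_deg / 360.0 * ui_ps
    return ui_ps / 2.0 - delay_ps


ui = 1e12 / 2.56e9  # 390.625 ps at 2.56 Gb/s
print(f"2.56 Gb/s, 135 deg: {glitch_free_window_ps(ui, 135.0):.1f} ps")
print(f"25.6 Gb/s, 135 deg: {glitch_free_window_ps(ui / 10, 135.0):.1f} ps")
```

At 2.56 Gb/s with $\Delta\varphi = 135^\circ$, this idealized window is about 48.8 ps; at ten times the data rate it shrinks to about 4.9 ps, which illustrates why the MUX timing must be verified carefully at higher rates.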

Fig. 5 shows the current-starved delay cell, which consists of four cascaded current-starved inverters. The bias voltages $V_{bp}$ and $V_{bn}$ adjust the current through the top and bottom transistors, thereby controlling the phase shift between CLK0 and CLK1.
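A first-order estimate shows how the bias current sets the delay. Assuming each current-starved inverter slews its load capacitance through $V_{DD}/2$ with a constant current (a textbook approximation, not a model of the fabricated cell), the four-stage delay is about $4 C_L V_{DD} / (2I)$. The 10-fF load and 1.2-V supply in the Python sketch below are hypothetical, and the mapping from $V_{bp}$ and $V_{bn}$ to current is not modeled.

```python
def stage_delay_s(c_load_f: float, vdd_v: float, i_bias_a: float) -> float:
    """First-order delay of one current-starved inverter: time for a constant
    bias current to slew the load capacitance through VDD/2."""
    return c_load_f * (vdd_v / 2.0) / i_bias_a


def bias_for_delay_a(target_s: float, n_stages: int,
                     c_load_f: float, vdd_v: float) -> float:
    """Invert the first-order model: bias current giving a target chain delay."""
    return n_stages * c_load_f * (vdd_v / 2.0) / target_s


# Hypothetical values: 10 fF load per stage and a 1.2-V supply.
target = 135.0 / 360.0 / 2.56e9   # 135-degree shift at 2.56 GHz, about 146 ps
i_bias = bias_for_delay_a(target, 4, 10e-15, 1.2)
print(f"bias ~ {i_bias * 1e6:.0f} uA for {target * 1e12:.0f} ps over 4 stages")
```

Because the chain has an even number of inverters, CLK1 keeps the polarity of CLK0; in this model, lowering the bias current lengthens the delay and therefore increases $\Delta\varphi$.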

#### B. Receiver Architecture

Typically, asynchronous serial protocols require a clock and data recovery (CDR) circuit at the receiver, as the timing is embedded within the transmitted data. Fig. 6 depicts the block diagram of the receiver architecture. It contains the main CDR circuit, which recovers the primary data and the embedded clock, and an auxiliary data recovery path, which extracts and demodulates the auxiliary data from the modulated phase of the primary data. The main CDR loop is composed of a bang–bang phase detector (BBPD), a charge pump followed by a low-pass filter (LPF), and an LC-tank voltage-controlled oscillator (VCO) [19]–[21].

Fig. 4. Case of potential glitches and incorrect data transition.

Fig. 5. Current-starved delay cell.

In particular, in the proposed receiver, the BBPD is shared by the main CDR loop and the auxiliary data recovery path. Fig. 7 shows the architecture and the timing diagram of the modified BBPD. A conventional Alexander BBPD uses three sample points and XOR gates to produce early or late signals, which may not be continuous during a leading or lagging phase [22]. In contrast, the modified BBPD detects the phase difference between the input data and the feedback clock and produces either a "1" to indicate a phase lag or a "0" to indicate a phase lead for the entire phase lag/lead period. This feature is essential for extracting the auxiliary data.

The error signal produced by the BBPD is sent to the charge pump in the main CDR loop and is also used in the auxiliary data path to demodulate the auxiliary data. In the main CDR loop, the error signal, via the charge pump, controls the VCO, which adjusts the phase of the recovered clock signal so that the primary data stream can be recovered. In the auxiliary data path, because the transmitter embeds the auxiliary bits "0" and "1" as phase lead and lag in the primary data stream, the BBPD error signal contains the demodulated phase information (lead/lag) used to recover the auxiliary data.

The timing diagram in Fig. 7 shows various signals in the BBPD for the case where the auxiliary data are modulated as phase lead or phase lag in the input data. Phase lead represents bit "0," and phase lag represents bit "1" in the auxiliary channel. The BBPD consists of four DFFs, DFF1 through DFF4, followed by a MUX. The input data are connected to the "D" inputs of DFF1 and DFF2, whereas the VCO feedback clock drives the clock inputs of these two DFFs as the sampling clock. Note that DFF1 is positive-edge triggered, whereas DFF2 is negative-edge triggered, as shown in the figure.

During the locking process of the modulated input data stream, the BBPD error signal generated through the negative feedback loop adjusts the VCO feedback clock so that it is locked to the average phase of the input data. Once lock is achieved, the VCO clock is locked to the midpoint of the two phases (phase lead and phase lag) of the input data. Signal $Q_1$, which is always triggered at the positive edge of the VCO clock, is the recovered primary data. Signal $Q_2$, triggered at the negative edge of the VCO clock, is either 180° leading or lagging with respect to $Q_1$, and this phase relation represents the auxiliary data bits. $Q_3$ is produced by sampling $Q_2$ on the positive edge of $Q_1$; thus, it is "1" for phase leading and "0" for phase lagging. $\overline{Q}_3$ is the complement of $Q_3$. On the other hand, $Q_4$ is produced by sampling $Q_2$ on the negative edge of $Q_1$; thus, it is "0" for phase leading and "1" for phase lagging. $\overline{Q}_3$ and $Q_4$ are almost identical except for a delay between them. The final error signal is produced by selecting either $\overline{Q}_3$ or $Q_4$, with $Q_1$ as the select signal, to ensure that bits "0" and "1" are extracted in a timely manner for the recovered auxiliary data. Although this is properly referred to as an error signal with respect to the phase of the primary channel, it also carries the auxiliary channel bitstream and is further processed to extract it.
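The decision rule described above can be illustrated with an idealized behavioral model. The Python sketch below collapses DFF1 through DFF4 and the output MUX into a single comparison (an assumption; it does not reproduce the gate-level circuit of Fig. 7) and uses the convention stated here that a lag produces "1" and a lead produces "0". It also computes the average BBPD output for Gaussian edge jitter, showing that with equally likely lead and lag phases the average is zero when the clock sits at the midpoint. The function names and jitter values are hypothetical.

```python
import math
from typing import List

DELTA_UI = 135.0 / 360.0   # modulated phase difference (0.375 UI)


def bbpd_decision(edge_ui: float, clock_ui: float) -> int:
    """Idealized modified-BBPD output: 1 if the data edge lags the clock
    (auxiliary bit "1"), 0 if it leads (auxiliary bit "0")."""
    return 1 if edge_ui > clock_ui else 0


def mean_bbpd_output(clock_ui: float, sigma_ui: float,
                     delta_ui: float = DELTA_UI) -> float:
    """Average BBPD output on a -1/+1 scale for Gaussian edge jitter and
    equally likely lead (edge at 0 UI) and lag (edge at delta_ui) phases."""
    def lag_minus_lead(mean_ui: float) -> float:
        # P(edge > clock) - P(edge < clock) for an edge centred on mean_ui
        return math.erf((mean_ui - clock_ui) / (sigma_ui * math.sqrt(2.0)))
    return 0.5 * (lag_minus_lead(0.0) + lag_minus_lead(delta_ui))


midpoint = DELTA_UI / 2
aux = [0, 1, 1, 0, 1]
edges = [DELTA_UI * b for b in aux]      # jitter-free edge positions in UI
recovered: List[int] = [bbpd_decision(e, midpoint) for e in edges]
print(recovered, mean_bbpd_output(midpoint, 0.05))
```

In this model, moving the clock later than the midpoint makes the average output negative and moving it earlier makes it positive, which is the restoring action that keeps the loop at the average phase.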

The CDR relies upon transitions in the received data stream to produce the error signal that controls the VCO. For this reason, a sufficient number of transitions, or pulse edges, must be present in the received asynchronous data stream. To guarantee this, most asynchronous serial data communication systems use encoding or scrambling schemes such as 8B10B encoding or pseudorandom binary sequence (PRBS) patterns. These schemes bound the number of consecutive identical bits, so that at least one signal transition occurs within a fixed number of subsequent bits; 8B10B, for instance, does so by inserting redundant bits. While such redundancy carries no information, it allows transmission speeds to be increased because transitions occur often enough for the receiver to maintain synchronization, thus enhancing overall throughput. PRBS7 data are used for both the primary and the auxiliary data streams in the proposed transceiver.
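The run-length and transition-density properties relied on here can be checked with a software PRBS7 generator. The Python sketch below is one common Fibonacci-LFSR form of PRBS7 (the generator used on chip is not described here, and the helper names are illustrative). Over one 127-bit period it yields 64 ones, a longest run of 7 ones and of 6 zeros, and 64 transitions, i.e., a transition density of about 0.5, consistent with the value of $\alpha$ used in Section III-C.

```python
from typing import List


def prbs7(n_bits: int, seed: int = 0x7F) -> List[int]:
    """Fibonacci-LFSR PRBS7 (period 127); the seed must be nonzero."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("an all-zero seed locks the LFSR")
    out = []
    for _ in range(n_bits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1   # feedback taps 7 and 6
        state = ((state << 1) | new_bit) & 0x7F
        out.append(new_bit)
    return out


def longest_run(bits: List[int], value: int) -> int:
    """Length of the longest run of `value` in a bit list."""
    best = cur = 0
    for b in bits:
        cur = cur + 1 if b == value else 0
        best = max(best, cur)
    return best


period = prbs7(127)
two_periods = prbs7(254)   # two periods, so no run is split at the wrap point
transitions = sum(period[i] != period[(i + 1) % 127] for i in range(127))
print("ones:", sum(period),
      "max 1-run:", longest_run(two_periods, 1),
      "max 0-run:", longest_run(two_periods, 0),
      "transition density:", round(transitions / 127, 3))
```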



Fig. 6. Block diagram of the receiver.



Fig. 7. Architecture and timing diagram of the BBPD.

High-frequency pulses are generated in the BBPD output as a result of noise coupled from the input serial stream and the VCO feedback clock. The BBPD compares each transition edge of the input serial stream with that of the VCO feedback clock to detect the phase lead or lag and produces the error signal. When the instantaneous random noise of the input serial stream and the VCO feedback clock is larger than the modulated phase difference $\Delta \varphi$, the modulated lead/lag information is buried in the instantaneous random jitter. At such moments, the phase error detected by the BBPD does not reflect the modulated phase information; thus, high-frequency pulses appear in the error signal. To fully recover the auxiliary data from the error signal, a second-order low-pass filter (second LPF) is employed in the auxiliary data recovery path to filter out these high-frequency pulses. With an appropriately chosen bandwidth (explained in Section III-D), the second LPF removes the high-frequency pulses from the error signal while preserving the recovered auxiliary data.
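The effect of random jitter on individual BBPD decisions, and why filtering restores the auxiliary bits, can be illustrated numerically. The Python sketch below is a simplified model under several assumptions: the clock sits midway between the lead and lag phases, so a single decision flips when the instantaneous Gaussian jitter exceeds $\Delta\varphi/2$ in the wrong direction; every primary bit carries an edge (real PRBS7 data have a transition in only about half the bits); and each auxiliary bit spans 32 primary bits. In place of the analog second-order LPF it uses an integrate-and-dump majority vote, which is a swapped-in digital technique rather than the circuit described here. The jitter values and helper names are hypothetical.

```python
import math
import random
from typing import List

DELTA_UI = 0.375      # 135-degree modulation depth in UI
BITS_PER_AUX = 32     # primary bits per auxiliary bit (2.56 Gb/s / 80 Mb/s)


def flip_probability(sigma_ui: float, delta_ui: float = DELTA_UI) -> float:
    """Chance that Gaussian edge jitter pushes an edge across a clock placed
    midway between the lead and lag phases (distance delta/2)."""
    return 0.5 * math.erfc((delta_ui / 2) / (sigma_ui * math.sqrt(2.0)))


def recover_aux(decisions: List[int], bits_per_aux: int = BITS_PER_AUX) -> List[int]:
    """Integrate-and-dump majority vote per auxiliary bit: a digital stand-in
    for the analog second-order LPF followed by a slicer."""
    out = []
    for start in range(0, len(decisions), bits_per_aux):
        block = decisions[start:start + bits_per_aux]
        out.append(1 if 2 * sum(block) > len(block) else 0)
    return out


def simulate(aux: List[int], sigma_ui: float, seed: int = 1) -> List[int]:
    """Noisy BBPD decisions for each auxiliary bit, then majority recovery."""
    rng = random.Random(seed)
    mid = DELTA_UI / 2
    decisions = []
    for b in aux:
        for _ in range(BITS_PER_AUX):   # assumes an edge in every primary bit
            edge = DELTA_UI * b + rng.gauss(0.0, sigma_ui)
            decisions.append(1 if edge > mid else 0)
    return recover_aux(decisions)


aux_bits = [1, 0, 0, 1, 1, 0, 1, 0]
print(f"per-decision flip probability at sigma = 0.1 UI: {flip_probability(0.1):.3f}")
print("recovered correctly:", simulate(aux_bits, sigma_ui=0.1) == aux_bits)
```

With $\sigma = 0.1$ UI, roughly 3% of the individual decisions flip in this model, yet averaging over 32 decisions per auxiliary bit is expected to recover the bit pattern, which mirrors the role of the second LPF.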

The auxiliary data rate is lower than the primary data rate in the proposed transceiver, but the two streams are synchronized with each other. The clock for sampling the auxiliary data is therefore a divided-down version of the faster clock in the transmitter or receiver (the fast clock being the VCO output for the receiver CDR).

A standard receiver that is not equipped to demodulate the auxiliary channel functions normally and recovers the primary bitstream without noticeable errors caused by the auxiliary channel, because the modulated phase difference at the transmitter stays within the tolerance required of the CDR in a standard receiver. In this case, the phase difference modulated by the auxiliary data appears as phase noise/jitter on the primary data channel.

#### C. System Design Parameters

To demodulate and recover the auxiliary data while minimizing the jitter in the recovered primary data, several transceiver parameters must be chosen appropriately, including the modulated phase difference at the transmitter, the parameters of the CDR loop, the data rate of the auxiliary channel, and the bandwidth of the second LPF.

To ensure that the main CDR loop operates reliably, the total jitter from the transmitter and receiver must be less than one unit interval (1 UI); otherwise, the feedback clock of the CDR would not sample the input data stream correctly, degrading the BER performance of the recovered primary data. To fully recover and demodulate the auxiliary data, the total jitter from the transmitter, the receiver, and the serial channel together must be less than the bounded deterministic jitter caused by the modulated phase difference $\Delta \varphi$ at the transmitter; otherwise, the embedded auxiliary data would be buried in the system's jitter. In our design, a jitter budget of <0.3 UI is allocated to the total jitter of the transceiver and the serial channel, and a budget of 0.3–0.5 UI is allocated to the phase difference $\Delta \varphi$. $\Delta \varphi$ is set to about 135° in our design, which corresponds to about 0.38 UI. A ±10% process, voltage, and temperature (PVT) variation of $\Delta \varphi$ (0.34–0.42 UI) remains within the jitter budget.

Fig. 8. Simulated eye diagram of the modulated primary data.

Fig. 8 shows the simulated eye diagram of the modulated primary data with two phases. Unlike a typical transceiver eye diagram, where the eye width is one UI, the output data of the proposed transmitter are phase modulated by the auxiliary data, so the eye diagram shows a phase offset between bits with phase lead and bits with phase lag. The blue arrowed line marks the transmitted eye with leading phase, and the black one marks the transmitted eye with lagging phase; both are 390 ps wide. Note the 0.38-UI (148 ps) phase difference between the two eyes, indicated by the green arrows. This phase difference corresponds to the phase difference $\Delta \varphi$ selected for the transmitter.
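The jitter-budget arithmetic above can be checked in a few lines of Python. The sketch uses the exact 135°/360° = 0.375 UI rather than the rounded 0.38 UI, so its results (for example, about 146.5 ps for $\Delta\varphi$) are slightly smaller than the rounded values quoted in the text; the constant names are illustrative.

```python
UI_PS = 1e12 / 2.56e9            # one unit interval at 2.56 Gb/s (390.625 ps)
DELTA_PHI_DEG = 135.0            # chosen transmitter phase offset
TRX_JITTER_UI = 0.3              # budget for transceiver plus channel jitter
DELTA_BUDGET_UI = (0.3, 0.5)     # budget allocated to the phase offset


def deg_to_ui(deg: float) -> float:
    """Express a phase offset in degrees as a fraction of one unit interval."""
    return deg / 360.0


delta_ui = deg_to_ui(DELTA_PHI_DEG)
pvt_low, pvt_high = 0.9 * delta_ui, 1.1 * delta_ui      # +/-10% PVT spread
within_budget = DELTA_BUDGET_UI[0] <= pvt_low and pvt_high <= DELTA_BUDGET_UI[1]
separable = TRX_JITTER_UI < pvt_low                      # jitter stays below the offset
print(f"delta_phi = {delta_ui:.3f} UI = {delta_ui * UI_PS:.1f} ps")
print(f"PVT range {pvt_low:.3f}-{pvt_high:.3f} UI, in budget: {within_budget}, "
      f"separable from jitter: {separable}")
```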

The data rate of the proposed transceiver could be increased, in which case the jitter budget and the modulated phase difference must scale with the data rate. For example, to make the data rate ten times faster, the jitter budget and modulated phase difference must be scaled to one-tenth of the absolute (picosecond) values used above. This is still achievable but requires more careful design using low-jitter techniques and precise phase control methods. Transmitter and receiver equalizers would also be needed if the channel loss is high at the higher data rate.

As mentioned in Section III-B, in a nonequipped standard receiver the auxiliary data appear as bounded deterministic jitter on the primary data. The jitter caused by the auxiliary data still falls within the jitter budget of the transceiver, so it does not adversely affect primary data recovery. Thus, the proposed auxiliary data are backward compatible with the nonequipped standard receiver.

Fig. 9. CDR loop linear model.

To select the remaining CDR parameters, the main CDR negative feedback loop is represented by the linear model shown in Fig. 9, from which the small-signal model of the CDR can be derived.

The LPF in the CDR loop contributes a pole at $\omega_P$ and a zero at $\omega_Z$. The open-loop transfer function, denoted by A(s), is given by

$$A(s) = \frac{\varphi_{\text{OUT}}}{\varphi_{\text{IN}}} = K_{\text{PD}} \times I_{\text{CP}} \times \frac{1 + \frac{s}{\omega_Z}}{s \times C_{\text{total}} \times \left(1 + \frac{s}{\omega_P}\right)} \times \frac{K_{\text{VCO}}}{s} \tag{1}$$

where $C_{\text{total}} = C_1 + C_2$, $\omega_Z = 1/(R \times C_1)$, $\omega_P = 1/(R \times C_2)$, $K_{\text{PD}}$ is the gain of the BBPD, $I_{\text{CP}}$ is the charge pump current, and $K_{\text{VCO}}$ is the gain of the VCO.

Typically, $C_1 \gg C_2$. Between the zero and the pole ($\omega_Z \ll \omega \ll \omega_P$), the magnitude of (1) evaluated at the unity-gain frequency $f_C$ simplifies to

$$\left|A(j2\pi f_C)\right| \approx K_{\rm PD} \times I_{\rm CP} \times \frac{R \times C_1}{C_{\rm total}} \times \frac{K_{\rm VCO}}{2\pi f_C} = 1 \tag{2}$$

where $f_C$ is the unity-gain bandwidth BW of the CDR open-loop transfer function. With $C_1/C_{\rm total} \approx 1$, solving (2) for $f_C$ gives

$$BW = f_C \approx \frac{K_{PD} \times I_{CP} \times R \times K_{VCO}}{2\pi}. \tag{3}$$

Fig. 10. BBPD output versus the input data phase difference. The blue curve represents the ideal gain of the BBPD, whereas the red curve represents the realistic gain of the BBPD.

The binary values "0" and "1" of the phase error signal produced by the BBPD provide sign information, indicating whether the data phase lags or leads the VCO feedback clock. The BBPD drives the charge pump, turning on either its charging or its discharging current: a BBPD output of "0" turns on the charging current, whereas "1" turns on the discharging current. Therefore, "-1" and "1" are used to represent the average amplitude of the BBPD output. As shown in Fig. 10, $K_{\rm PD}$ is the slope of the waveform as the average BBPD output changes from "-1" to "1." The blue waveform shows that an ideal PD has infinite gain in a noiseless environment. In practice, however, the slope of the BBPD output is limited by the random jitter of the input data, as shown by the red waveform in Fig. 10. The root-mean-square (rms) value of the random jitter $J_{\rm RMS}$ in the input data stream is denoted as $\sigma$. The practical $K_{\rm PD}$ is the slope of the red waveform multiplied by the data transition density coefficient $\alpha$, which can be written as

$$K_{\rm PD} = \alpha \left(\frac{2}{2\sigma}\right) \tag{4}$$

where $\alpha$ is the transition density of the input data, i.e., the average fraction of bit intervals that contain a transition edge. Typically, $\alpha = 0.5$ for PRBS input [23]. As mentioned above, a jitter budget of <0.3 UI is allocated to the total jitter of the transceiver and the serial channel. Therefore, the maximum peak random jitter $J_P$ cannot exceed 0.15 UI, and $\sigma$ can be calculated as

$$\sigma \approx \frac{J_P}{Q} = \frac{0.15\ \text{UI}}{14} = \frac{3}{280}\ \text{UI} \tag{5}$$

where Q is the scaling factor between peak jitter and rms jitter at a specified BER; for BER = $10^{-12}$, $Q \approx 14$. Since 1 UI corresponds to $2\pi$ rad, $\sigma = 3\pi/140$ rad, and with $\alpha = 0.5$, (4) gives

$$K_{\rm PD} \approx \frac{1}{2} \left( \frac{1 - (-1)}{2\sigma} \right) = \frac{1}{2\sigma} = \frac{70}{3\pi}\ \text{rad}^{-1}. \tag{6}$$

The bandwidth of the CDR is set to 1.54 MHz (2.56 GHz/1667), $K_{VCO}$ is set to $100 \times 2\pi\ \text{rad} \cdot \text{MHz/V}$, and $I_{CP}$ is set to $1\ \mu\text{A}$. With these parameters, resistor *R* is derived from (3) as

$$R = \frac{2\pi \times BW}{K_{PD} \times I_{CP} \times K_{VCO}} \approx 2.1\ \text{k}\Omega. \tag{7}$$

Capacitors $C_1$ and $C_2$ are set to 148 and 10 pF, respectively, placing the zero at 512 kHz and the pole at 7.6 MHz, as calculated in the following equations:

$$f_Z = \frac{1}{2\pi \times R \times C_1} = 512\ \text{kHz} \tag{8}$$

$$f_P = \frac{1}{2\pi \times R \times C_2} = 7.6\ \text{MHz}. \tag{9}$$
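The arithmetic in (4)–(9) can be reproduced with the short Python sketch below. It simply re-evaluates the equations with the values quoted in the text; the function names are illustrative. Following the text, the rounded value R = 2.1 kΩ is used for the zero and pole.

```python
import math

# Values quoted in Section III-C; R is rounded to 2.1 kOhm for (8)-(9), as in the text.
BW = 2.56e9 / 1667            # CDR bandwidth, about 1.54 MHz
K_VCO = 2 * math.pi * 100e6   # VCO gain, rad/s per volt
I_CP = 1e-6                   # charge-pump current, A
ALPHA = 0.5                   # PRBS transition density
Q = 14                        # peak-to-rms factor for BER = 1e-12, as used in (5)
J_P_UI = 0.15                 # peak random jitter, UI


def k_pd(j_p_ui: float = J_P_UI, q: float = Q, alpha: float = ALPHA) -> float:
    """BBPD gain per (4)-(6): alpha * (1 - (-1)) / (2 * sigma), sigma in radians."""
    sigma_rad = (j_p_ui / q) * 2 * math.pi      # 1 UI = 2*pi rad
    return alpha * 2 / (2 * sigma_rad)


def loop_resistor(bw_hz: float, kpd: float) -> float:
    """Solve (3) for R: R = 2*pi*BW / (K_PD * I_CP * K_VCO)."""
    return 2 * math.pi * bw_hz / (kpd * I_CP * K_VCO)


def zero_pole_hz(r: float, c1: float = 148e-12, c2: float = 10e-12):
    """Zero and pole frequencies from (8) and (9)."""
    return 1 / (2 * math.pi * r * c1), 1 / (2 * math.pi * r * c2)


kpd = k_pd()
r_exact = loop_resistor(BW, kpd)
f_z, f_p = zero_pole_hz(2.1e3)
print(f"K_PD = {kpd:.3f} 1/rad, R = {r_exact:.0f} ohm")
print(f"f_Z = {f_z / 1e3:.0f} kHz, f_P = {f_p / 1e6:.2f} MHz")
```

The unrounded resistor evaluates to roughly 2.07 kΩ, which the text rounds to 2.1 kΩ; with the rounded value, (8) and (9) give about 512 kHz and 7.58 MHz.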

#### D. Auxiliary Data Rate

The Shannon–Hartley theorem gives the theoretical bandwidth margin within the total channel capacity that can be exploited for an auxiliary channel. In a serial link design, however, the transceiver architecture sets the actual limits on the auxiliary data rate, and the performance of the recovered primary and auxiliary data depends on the auxiliary data rate.

The main CDR closed-loop transfer function H(s), which is also the jitter transfer function (JTF), is expressed in the following equation, and it has a low-pass frequency response:

$$H(s) = \frac{A(s)}{1 + A(s)} = \frac{K_{\rm PD}I_{\rm CP}K_{\rm VCO}R\left(s + \frac{1}{RC_1}\right)}{s^2 + (K_{\rm PD}I_{\rm CP}K_{\rm VCO}R)s + \frac{K_{\rm PD}I_{\rm CP}K_{\rm VCO}}{C_1}}. \tag{10}$$

The observed JTF (OJTF) shown in (11) has a high-pass frequency response, so jitter above the cutoff frequency can be observed at the output of the phase detector:

$$\text{OJTF} = 1 - H(s) = \frac{1}{1 + A(s)} = \frac{s^2}{s^2 + (K_{PD}I_{CP}K_{VCO}R)s + \frac{K_{PD}I_{CP}K_{VCO}}{C_1}}. \tag{11}$$

Because of the low-pass characteristic of the CDR loop, the VCO feedback clock tracks low-frequency (in-band) phase changes of the input data stream; accordingly, in-band phase noise remains on the recovered primary data, while out-of-band phase noise appears in the BBPD error signal. To keep the auxiliary data information intact in the error signal, the bandwidth of the main CDR loop must be smaller than the lowest frequency component of the auxiliary data spectrum. Otherwise, the feedback clock would track the phase changes of the input data stream, and the embedded phase information (the auxiliary data) would be lost.
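To see the low-pass JTF and high-pass OJTF side by side, the Python sketch below evaluates (10) and (11) at a few frequencies using the loop constants from Section III-C (with $C_{\rm total} \approx C_1$, as in (10)). The probe at 80/7 ≈ 11.4 MHz assumes the one-seventh rule for the lowest spectral component of the 80-Mb/s PRBS7 auxiliary stream, as stated in this section; the results are an illustration of the linear model, not measured data.

```python
import math

# Loop constants from Section III-C (copied from the text).
K_PD = 70 / (3 * math.pi)     # BBPD gain, 1/rad, from (6)
I_CP = 1e-6                   # charge-pump current, A
K_VCO = 2 * math.pi * 100e6   # VCO gain, rad/s per volt
R = 2.1e3                     # loop-filter resistor, ohm
C1 = 148e-12                  # loop-filter capacitor, F

K = K_PD * I_CP * K_VCO       # loop-gain product appearing in (10) and (11)


def jtf(f_hz: float) -> complex:
    """Closed-loop jitter transfer H(s) from (10), evaluated at s = j*2*pi*f."""
    s = 2j * math.pi * f_hz
    return K * R * (s + 1 / (R * C1)) / (s ** 2 + K * R * s + K / C1)


def ojtf(f_hz: float) -> complex:
    """Observed jitter transfer 1 - H(s) from (11)."""
    s = 2j * math.pi * f_hz
    return s ** 2 / (s ** 2 + K * R * s + K / C1)


for f in (10e3, 1.54e6, 80e6 / 7, 100e6):
    print(f"{f / 1e6:9.3f} MHz  |H| = {abs(jtf(f)):.3f}  |1-H| = {abs(ojtf(f)):.3f}")
```

In this model the OJTF magnitude is close to unity at the auxiliary stream's lowest component, so the auxiliary phase modulation passes to the BBPD error signal instead of being tracked out by the loop, while very slow phase wander (10 kHz) is almost entirely tracked.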

Random data, including encoded data such as PRBS or 8B/10B, have a wide frequency spectrum because the data stream contains runs of consecutive 0s and 1s. PRBS7 is used for both the primary and the auxiliary data in the proposed transceiver, and the lowest frequency component of PRBS7 is one-seventh of the data rate. In the implementation reported here, the lower limit of the auxiliary data rate is therefore set at seven times the CDR bandwidth, so the 1.54-MHz CDR bandwidth gives a lower bound of 7 × 1.54 MHz = 10.78 Mb/s.

The ratio between the primary and auxiliary data rates is also considered to set the upper bound of the auxiliary data rate. Because the BBPD compares the input data edges with the VCO feedback clock at every transition of the input data, the frequency of the unwanted high-frequency pulses in the error signal depends strongly on the primary data rate; these pulses span the range from the lowest to the highest frequency components of the primary data. To filter out these pulses, which are coupled from the noise of the primary data and the VCO feedback clock, the bandwidth of the second LPF must be lower than the lowest frequency component of the primary data. Since the auxiliary data must lie within the bandwidth of the second LPF, the upper limit of the auxiliary data rate is taken to be the lowest frequency component of the primary data. In the proposed transceiver, the primary data are a PRBS7 pattern, so the upper bound of the auxiliary data rate is one-seventh of 2.56 Gb/s, which is 365.7 Mb/s.

Fig. 11. Simulated jitter tolerance with SONET OC-192 mask.

Based on this analysis, the auxiliary data rate in the proposed transceiver is set to 80 Mb/s, which lies within the range derived above. The bandwidth of the second LPF is set to 40 MHz to filter out the high-frequency pulses while preserving the recovered auxiliary data.
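The bounds above can be checked numerically. This Python sketch applies the section's rule as stated (the lowest PRBS7 spectral component is one-seventh of the data rate) to both bounds, and also checks that the 40-MHz second-LPF bandwidth sits below the lowest primary-data component. The function name `aux_rate_bounds` is illustrative.

```python
PRIMARY_RATE = 2.56e9   # b/s, primary data rate
CDR_BW = 1.54e6         # Hz, main CDR loop bandwidth
AUX_RATE = 80e6         # b/s, chosen auxiliary data rate
LPF2_BW = 40e6          # Hz, second LPF bandwidth
RUN = 7                 # section's rule: lowest PRBS7 component = rate / 7

def aux_rate_bounds(primary_rate: float, cdr_bw: float, run: int = RUN):
    """Return (lower, upper) auxiliary-rate bounds in b/s.

    lower: the aux lowest component (rate / run) must exceed the CDR bandwidth.
    upper: the aux rate must not exceed the primary lowest component (rate / run).
    """
    lower = run * cdr_bw
    upper = primary_rate / run
    return lower, upper

lower, upper = aux_rate_bounds(PRIMARY_RATE, CDR_BW)
print(f"lower bound = {lower / 1e6:.2f} Mb/s")    # 10.78 Mb/s
print(f"upper bound = {upper / 1e6:.1f} Mb/s")    # 365.7 Mb/s
print("80 Mb/s inside bounds:", lower < AUX_RATE < upper)             # True
print("40-MHz LPF below primary lowest component:", LPF2_BW < upper)  # True
```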

In general, there is a tradeoff between the quality of the recovered data (BER or jitter) and the auxiliary data rate. A higher auxiliary data rate, which exploits more of the channel bandwidth margin, may degrade the BER of the recovered auxiliary data. Designers can therefore choose the auxiliary data rate based on the channel SNR and the BER requirements of the received data.

The auxiliary data also impose limitations on the primary data link, affecting its jitter tolerance (JTOL) and the jitter of the recovered data. The JTOL analysis in Section III-E and the jitter measurements in Section IV illustrate these limitations and impacts.

#### E. Impact of the Auxiliary Data on the Primary Data Channel: The Jitter Tolerance Analysis

To understand the impact of the auxiliary data on the performance of the primary data channel, we evaluate the JTOL of the primary data. JTOL can be evaluated using the following equation [24]:

$$\mathrm{JTOL}(s) = \frac{\text{Timing Margin}}{1 - \frac{\varphi_{\rm OUT}(s)}{\varphi_{\rm IN}(s)}}.$$
 (12)

JTOL is specified in units of peak-to-peak jitter amplitude (UI<sub>PP</sub>). The timing margin for the CDR input signal is 1 UI, and $\varphi_{OUT}(s)/\varphi_{IN}(s)$ is the CDR closed-loop transfer function H(s). Since H(s) has a low-pass response, the denominator $1 - H(s)$ in (12) is small at low frequencies, so the CDR can tolerate low-frequency jitter with amplitudes up to several times the timing margin.

The CDR JTOL with and without the auxiliary data is simulated by sweeping the frequency and amplitude of a sinusoidal jitter and recording, at each frequency, the maximum sinusoidal jitter amplitude at which the receiver still achieves the target BER of $10^{-12}$. As the JTOL simulation results in Fig. 11 show, at low jitter frequencies the JTOL with the auxiliary data is almost the same as without it, whereas at high jitter frequencies the auxiliary data degrade the JTOL by about 0.3 to 0.4 UI. This is because the main CDR loop has a high-pass OJTF, so low-frequency (in-band) jitter is tracked by the CDR feedback clock. Therefore, as long as the bounded deterministic jitter introduced by the auxiliary data is within the CDR timing margin, the auxiliary data do not affect the low-frequency JTOL. High-frequency (out-of-band) sinusoidal jitter, however, is not tracked by the CDR feedback clock, so it occupies part of the CDR timing margin in the input bitstream. As discussed in Section III-B, the auxiliary data also appear as high-frequency deterministic jitter in the CDR input bitstream and occupy part of the timing margin. Consequently, the high-frequency JTOL degradation caused by the auxiliary data is very close to the modulated phase difference $\Delta \varphi$ introduced by the auxiliary data, which is 0.38 UI.

Fig. 12. Die photograph of the transceiver.
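Equation (12) can be evaluated with a simple loop model to see the two JTOL regimes. The Python sketch below back-solves $R$ from (8), uses a hypothetical lumped gain $K$ (not a value from the paper), and shows JTOL falling from very large values at low frequency toward the 1-UI timing margin at high frequency. The last lines only restate the argument above: if the deterministic jitter from the auxiliary data consumes $\Delta\varphi = 0.38$ UI of the margin, the high-frequency plateau drops to about 0.62 UI. This is a back-of-the-envelope check, not a model of the full simulated curves in Fig. 11.

```python
import math

def ojtf_mag(f_hz, k, r, c1):
    """|1 - H(j*2*pi*f)| = |s^2 / (s^2 + K*R*s + K/C1)|, from Eq. (11)."""
    s = 1j * 2 * math.pi * f_hz
    return abs(s * s / (s * s + k * r * s + k / c1))

def jtol_ui(f_hz, k, r, c1, timing_margin_ui=1.0):
    """Eq. (12): JTOL = timing margin / |1 - H|, in UIpp."""
    return timing_margin_ui / ojtf_mag(f_hz, k, r, c1)

C1 = 148e-12                           # from the text
R = 1.0 / (2 * math.pi * 512e3 * C1)   # ~2.1 kOhm, back-solved from Eq. (8)
K = 4.6e3                              # HYPOTHETICAL lumped loop gain
DELTA_PHI_UI = 0.38                    # modulated phase difference, from the text

for f in (10e3, 100e3, 1e6, 10e6, 100e6):
    print(f"{f / 1e6:7.2f} MHz  JTOL = {jtol_ui(f, K, R, C1):10.2f} UIpp")

# Far above the loop bandwidth |1 - H| -> 1, so JTOL -> the 1-UI timing margin.
hf_plateau = jtol_ui(1e9, K, R, C1)
# The text's argument: deterministic jitter from the aux data occupies delta_phi.
hf_with_aux = hf_plateau - DELTA_PHI_UI
print(f"HF plateau {hf_plateau:.2f} UIpp -> about {hf_with_aux:.2f} UIpp with aux data")
```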

The SONET OC-192 mask is also plotted in Fig. 11 for comparison. The JTOL curves in both cases (with and without the auxiliary data) exceed the SONET OC-192 mask by a margin of more than 0.2 UI, so even the degraded JTOL is acceptable for achieving a BER of $10^{-12}$ for the recovered primary data. The auxiliary channel thus trades away some transceiver performance, such as JTOL, but still meets the communication standard and does not affect interoperability.

## IV. MEASUREMENT RESULTS

The prototype IC of the proposed asynchronous serial transceiver was fabricated in a 65-nm CMOS process; the die photograph is shown in Fig. 12. The core circuit occupies about $0.13 \text{ mm}^2$, of which $0.03 \text{ mm}^2$ is the transmitter and $0.1 \text{ mm}^2$ is the receiver.

Fig. 13 shows the measurement setup and connections. On the transmitter side, two synchronized signal generators (Agilent N5181A and PatternPro SDG 12072) produce three input signals: the TX clock, the TX PRBS primary data, and the TX PRBS auxiliary data. On the receiver side, the three recovered signals are connected to a Tektronix 73304D oscilloscope to measure the eye diagrams and jitter. All external connections use short-reach (less than 0.5 m) coaxial cables. The control codes are generated by an I2C master board.

The modulated data are transmitted differentially from the transmitter to the receiver through a pair of differential coaxial cables. Given the 2.56-Gb/s primary data rate and the jitter budget provided in Section III-C, the 0.38-UI (148-ps) modulated phase difference, including its ±10% variation (30 ps), remains within the jitter budget. The phase-delay mismatch of a pair of short-reach coaxial cables (~0.5 m in our measurement) typically does not exceed the margin of the PM employed in the proposed design (30 ps); thus, it does not adversely affect the function or performance of the PM and the demodulation.

Fig. 13. Measurement environment.

Fig. 14. Recovered clock signal.

Fig. 15. Eye diagram of the recovered primary data.
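The picosecond figures above follow from the unit interval at 2.56 Gb/s, as this short Python sketch shows. It reads "±10% variation (30 ps)" as the total spread from −10% to +10% around 148 ps; that reading is an assumption, though it reproduces the quoted value.

```python
PRIMARY_RATE = 2.56e9            # b/s
UI_PS = 1e12 / PRIMARY_RATE      # one unit interval in ps
DELTA_PHI_UI = 0.38              # modulated phase difference, from the text
VARIATION = 0.10                 # +/-10 % variation, from the text

delta_phi_ps = DELTA_PHI_UI * UI_PS
# Assumed reading: "30 ps" is the full spread from -10 % to +10 %.
spread_ps = 2 * VARIATION * delta_phi_ps

print(f"1 UI      = {UI_PS:.3f} ps")         # 390.625 ps
print(f"delta_phi = {delta_phi_ps:.1f} ps")  # 148.4 ps
print(f"spread    = {spread_ps:.1f} ps")     # 29.7 ps
```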

The transceiver operates from a 1.2-V supply. The core circuits consume 11.8 mW at the 2.56-Gb/s primary data rate. The CDR in the receiver covers primary data rates from 2.2 to 3.6 Gb/s, a 1.4-Gb/s range that is 54.7% of the nominal 2.56-Gb/s rate. Fig. 14 shows the recovered 2.56-GHz clock signal, with 31-ps total jitter when the auxiliary data are present on the serial link. Fig. 15 shows the eye diagram of the recovered 2.56-Gb/s PRBS7 primary data, with 48-ps total jitter when the auxiliary data are off and 70-ps total jitter when they are on. The bathtub curves of the recovered primary data are shown in Fig. 16: the eye opening width is 0.88 UI when the auxiliary data are off and 0.82 UI when they are on. Fig. 17 shows the eye diagram of the recovered 80-Mb/s auxiliary data, with 103-ps total jitter. Total jitter is measured at the target BER of $10^{-12}$.

Fig. 16. Bathtub of the recovered primary data.

Fig. 17. Eye diagram of the recovered auxiliary data.
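The 54.7% figure for the CDR range is consistent with normalizing the 1.4-Gb/s span to the nominal 2.56-Gb/s rate; normalizing to the 2.9-Gb/s center of the range would instead give about 48%. The Python sketch below also derives an energy efficiency from the reported core power, counting only the primary data rate; that metric is computed here for illustration and is not reported in the text.

```python
F_MIN, F_MAX = 2.2e9, 3.6e9   # b/s, CDR operating range from the text
NOMINAL = 2.56e9              # b/s, nominal primary data rate
POWER_W = 11.8e-3             # W, core power at 2.56 Gb/s

span = F_MAX - F_MIN
pct_of_nominal = 100 * span / NOMINAL
pct_of_center = 100 * span / ((F_MAX + F_MIN) / 2)
energy_pj_per_bit = POWER_W / NOMINAL * 1e12   # primary data rate only

print(f"span = {span / 1e9:.1f} Gb/s")          # 1.4 Gb/s
print(f"{pct_of_nominal:.1f}% of nominal")      # 54.7% of nominal
print(f"{pct_of_center:.1f}% of center")        # 48.3% of center
print(f"{energy_pj_per_bit:.2f} pJ/bit")        # 4.61 pJ/bit
```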

Although the auxiliary data degrade the jitter of the recovered primary data, the measurements in Figs. 15 and 16 show that the additional jitter they cause (22 ps, about 0.06 UI at 2.56 Gb/s) is relatively small and does not adversely affect the recovery of the primary data.
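The reported jitter and eye-width figures are mutually consistent, as this arithmetic sketch shows: subtracting each total-jitter value from 1 UI (390.625 ps) roughly reproduces the bathtub widths, and the 0.06-UI eye-width loss corresponds to about 23 ps, close to the 22-ps jitter increase. It also expresses the 103-ps auxiliary-data total jitter as a fraction of the 12.5-ns auxiliary unit interval. The subtraction assumes the eye widths and total jitter are taken at the same BER, which the text does not state explicitly for the bathtub curves.

```python
UI_PS = 1e12 / 2.56e9            # 390.625 ps unit interval at 2.56 Gb/s
AUX_UI_PS = 1e12 / 80e6          # 12500 ps unit interval at 80 Mb/s

tj_off_ps, tj_on_ps = 48.0, 70.0     # primary-data total jitter, aux off / on
eye_off_ui, eye_on_ui = 0.88, 0.82   # bathtub eye-opening widths, aux off / on
tj_aux_ps = 103.0                    # auxiliary-data total jitter

def eye_from_tj(tj_ps: float, ui_ps: float) -> float:
    """Eye opening (UI) left after removing total jitter from one UI."""
    return 1.0 - tj_ps / ui_ps

print(f"eye from TJ: off {eye_from_tj(tj_off_ps, UI_PS):.3f} UI, "
      f"on {eye_from_tj(tj_on_ps, UI_PS):.3f} UI")       # 0.877 UI, 0.821 UI
eye_loss_ps = (eye_off_ui - eye_on_ui) * UI_PS
tj_increase_ps = tj_on_ps - tj_off_ps
print(f"eye-width loss {eye_loss_ps:.1f} ps vs TJ increase {tj_increase_ps:.0f} ps "
      f"({tj_increase_ps / UI_PS:.3f} UI)")               # 23.4 ps vs 22 ps (0.056 UI)
print(f"aux TJ = {tj_aux_ps / AUX_UI_PS:.4f} UI")         # 0.0082 UI
```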

## V. CONCLUSION

A new asynchronous serial transceiver that supports an auxiliary channel, providing additional data transmission capability, is demonstrated in a 65-nm CMOS IC with a 0.13-mm<sup>2</sup> core. The prototype allows both its primary and auxiliary data streams, at 2.56 Gb/s and 80 Mb/s, respectively, to be recovered simultaneously with good jitter and BER performance. This article also analyzes the auxiliary data rate that can be achieved from the available channel bandwidth margin. The work supports security by offering an auxiliary channel and extra data bandwidth for potential security measures such as the inclusion of authentication data, additional support for encryption or other methods requiring another channel or more bandwidth, steganography, etc. A further novelty is that the auxiliary channel is provided in a way that offers backward compatibility, interoperability with nonequipped designs, and minimal redesign of existing systems.

## REFERENCES

- [1] S. Scott-Hayward, S. Natarajan, and S. Sezer, "A survey of security in software defined networks," *IEEE Commun. Surveys Tuts.*, vol. 18, no. 1, pp. 623–654, 1st Quart., 2016.
- [2] X. Guo, R. Dutta, and Y. Jin, "Eliminating the hardware-software boundary: A proof-carrying approach for trust evaluation on computer systems," *IEEE Trans. Inf. Forensics Security*, vol. 12, no. 2, pp. 405–417, Feb. 2017.
- [3] M. R. Stytz and J. A. Whittaker, "Software protection: Security's last stand?" *IEEE Security Privacy*, vol. 1, no. 1, pp. 95–98, Jan./Feb. 2003.
- [4] Z. Zhang, L. Njilla, C. A. Kamhoua, and Q. Yu, "Thwarting security threats from malicious FPGA tools with novel FPGA-oriented moving target defense," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 27, no. 3, pp. 665–678, Mar. 2019.
- [5] R. Ramirez and N. Choucri, "Improving interdisciplinary communication with standardized cyber security terminology: A literature review," *IEEE Access*, vol. 4, pp. 2216–2243, 2016.
- [6] M. Wolf and D. Serpanos, "Safety and security in cyber-physical systems and Internet-of-Things systems," *Proc. IEEE*, vol. 106, no. 1, pp. 9–20, Jan. 2018.
- [7] G. Gogniat, T. Wolf, W. Burleson, J.-P. Diguet, L. Bossuet, and R. Vaslin, "Reconfigurable hardware for high-security/ high-performance embedded systems: The SAFES perspective," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 16, no. 2, pp. 144–155, Feb. 2008.
- [8] S. Z. Goher, B. Javed, and N. A. Saqib, "Covert channel detection: A survey based analysis," in *Proc. High Capacity Opt. Netw. Emerg./Enabling Technol. (HONET)*, Dec. 2012, pp. 57–65.
- [9] S. Zander, G. Armitage, and P. Branch, "A survey of covert channels and countermeasures in computer network protocols," *IEEE Commun. Surveys Tuts.*, vol. 9, no. 3, pp. 44–57, 3rd Quart., 2007.
- [10] J. M. Khoury and K. Lakshmikumar, "High speed serial transceivers for data communication systems," *IEEE Commun. Mag.*, vol. 39, no. 7, pp. 160–165, Jul. 2001.
- [11] K. Lee and J.-Y. Sim, "Half-rate clock-embedded source synchronous transceivers in 130-nm CMOS," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 22, no. 10, pp. 2093–2102, Oct. 2014.
- [12] R. Navid *et al.*, "A 40 Gb/s serial link transceiver in 28 nm CMOS technology," *IEEE J. Solid-State Circuits*, vol. 50, no. 4, pp. 814–827, Apr. 2015.
- [13] M.-S. Chen, Y.-N. Shih, C.-L. Lin, H.-W. Hung, and J. Lee, "A fully integrated 40-Gb/s transceiver in 65-nm CMOS technology," *IEEE J. Solid-State Circuits*, vol. 47, no. 3, pp. 627–640, Mar. 2012.
- [14] J. Savoj *et al.*, "A low-power 0.5–6.6 Gb/s wireline transceiver embedded in low-cost 28 nm FPGAs," *IEEE J. Solid-State Circuits*, vol. 48, no. 11, pp. 2582–2594, Nov. 2013.
- [15] B. Casper and F. O'Mahony, "Clocking analysis, implementation and measurement techniques for high-speed data links—A tutorial," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 56, no. 1, pp. 17–39, Jan. 2009.
- [16] X. Wu, Z. Yang, C. Ling, and X.-G. Xia, "Artificial-noise-aided message authentication codes with information-theoretic security," *IEEE Trans. Inf. Forensics Security*, vol. 11, no. 6, pp. 1278–1290, Jun. 2016.
- [17] P. Reviriego, S. Liu, L. Xiao, and J. A. Maestro, "An efficient single and double-adjacent error correcting parallel decoder for the (24,12) extended Golay code," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 24, no. 4, pp. 1603–1606, Jun. 2016.
- [18] X. Wang, T. Liu, S. Guo, M. A. Thornton, and P. Gui, "A 2.56 Gbps asynchronous serial transceiver with embedded 80 Mbps secondary data transmission capability in 65 nm CMOS," in *Proc. IEEE Radio Freq. Integr. Circuits Symp. (RFIC)*, Jun. 2018, pp. 360–363.
- [19] K. Fukuda *et al.*, "A 12.3mW 12.5Gb/s complete transceiver in 65 nm CMOS," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2010, pp. 368–369.
- [20] S. Guo *et al.*, "A low-voltage low-power 25 Gb/s clock and data recovery with equalizer in 65 nm CMOS," in *Proc. IEEE Radio Freq. Integr. Circuits Symp. (RFIC)*, May 2015, pp. 307–310.
- [21] T. Liu, X. Wang, R. Wang, G. Wu, T. Zhang, and P. Gui, "A temperature compensated triple-path PLL with K<sub>VCO</sub> non-linearity desensitization capable of operating at 77 K," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 64, no. 11, pp. 2835–2843, Nov. 2017.

- [22] C. Sánchez-Azqueta, C. Gimeno, C. Aldea, S. Celma, and C. Azcona, "Bang-bang phase detector model revisited," in *Proc. IEEE Int. Symp. Circuits Syst. (ISCAS)*, May 2013, pp. 1761–1764.
- [23] A. Ghiasi, "Impact of transition density on CDR," in *Proc. IEEE 802.3bs Logic Ad Hoc Meeting*, Feb. 2017, pp. 1–9.
- [24] S. Palermo, "High-speed serial I/O design for channel-limited and power-constrained systems," in *CMOS Nanoelectronics: Analog and RF VLSI Circuits*, ch. 9. New York, NY, USA: McGraw-Hill, 2011.



Xiaoran Wang (S'15) received the B.S. degree in electrical engineering from the South China University of Technology, Guangzhou, China, in 2013, and the M.S. degree in electrical engineering from Southern Methodist University, Dallas, TX, USA, in 2015, where he is currently working toward the Ph.D. degree at the Department of Electrical Engineering.

His current research interests include analog/mixed-signal circuits for high-speed wireline links, including high-performance PLL, CDR, and transceiver design.



**Tianwei Liu** (S'14–M'16) received the B.S. degree in electrical engineering from Xi'an Jiaotong University, Xi'an, China, in 2002, the M.S. degree in electrical engineering from Shanghai Jiao Tong University, Shanghai, China, in 2005, and the M.S.E.E. degree from Southern Methodist University, Dallas, TX, USA, in 2016.

He was with Shanghai Huahong NEC Electronics Company, Ltd., Shanghai, from 2005 to 2007, where he was involved in PLL IP development. From 2007 to 2010, he was with Integrated Device Technology (IDT) Inc., San Jose, CA, USA, focusing on low-jitter frequency synthesizer design. From 2010 to 2013, he was with LSI Corporation, San Jose, CA, USA, focusing on high-speed SerDes design from 10 up to 28 Gb/s. He was with Analog Devices, Inc., Shanghai, China, from 2013 to 2014, focusing on JESD204B SerDes design for AD/DA interface circuits.



Shita Guo (S'14–M'15) received the B.S. and M.S. degrees in electrical engineering from the University of Science and Technology of China, Hefei, China, in 2006 and 2009, respectively, and the Ph.D. degree in electrical engineering from Southern Methodist University, Dallas, TX, USA, in 2015.

From 2009 to 2011, he was with Sychip Inc., Shanghai, China, where he was involved in RF integrated passive devices and wireless subsystems for mobile products. Since 2014, he has been with the High-Speed Interface Group, Texas Instruments Inc., Dallas, TX, USA. He has authored or co-authored over 20 peer-reviewed publications. His current research interests include RF/millimeter-wave integrated circuits (ICs) for wireless communication systems and high-performance analog and mixed-signal ICs for high-speed serial data links. Dr. Guo is a member of the IEEE Solid-State Circuits Society and the IEEE Circuits and Systems Society.



Mitchell A. Thornton (M'85-SM'99) received the B.S. degree in electrical engineering from Oklahoma State University, Stillwater, OK, USA, in 1985, the M.S. degree in electrical engineering from the University of Texas-Arlington, Arlington, TX, USA, in 1990, and the M.S. degree in computer science and Ph.D. degree in computer engineering from Southern Methodist University, Dallas, TX, USA, in 1993 and 1995, respectively.

He was a Senior Electronic Systems Engineer with E-Systems, Inc., Greenville, TX, USA, from 1986 to 1991. He was a Design Engineer with Cyrix Corporation, Richardson, TX, USA, from 1992 to 1993. He has served as a full-time Faculty Member with the University of Arkansas, Fayetteville, AR, USA, from 1995 to 1999, and Mississippi State University, Starkville, MS, USA, from 1999 to 2002, and currently holds the Cecil H. Green Chair of Engineering with Southern Methodist University where he is a Professor. He serves as the Technical Director and the Interim Associate Director of the Darwin Deason Institute for Cyber Security. He is a Licensed Professional Engineer in the states of Arkansas, Mississippi, and Texas. His current research interests include electronic design automation algorithms for synthesis and verification, computer arithmetic, disaster tolerant systems and modeling, and computer security hardware.

Dr. Thornton has served as a Chair of several committees within the IEEE.



Ping Gui (S'03-M'04-SM'09) received the Ph.D. degree from the University of Delaware, Newark, DE, USA.

She is currently a Professor with the Department of Electrical Engineering, Southern Methodist University, Dallas, TX, USA. Her current research interests include analog, mixed-signal, and RF/millimeter-wave ICs for a variety of applications including high-speed wireline transceiver design, wideband wireless communication using millimeter-wave, high-speed ADC/DAC design, and circuits and systems for extreme environments. Dr. Gui was a recipient of the CERN Scientific Associate Award from 2008 to 2010, the IEEE Dallas Section Outstanding Service Award in 2011, and the Gerald J. Ford Research Fellowship at SMU in 2015.