Generalized I/O timing analysis: post-layout timing analysis is a must in product development, if PCBs are to work not just in the lab, but in the field.
Timing analysis is conducted pre- and post-layout. In pre-layout calculations, we investigate possibilities (maximum number of devices, maximum data rate, etc.) and determine PCB trace-length constraints, which ensure minimum zero timing margins (FIGURE 1). In post-layout timing analysis, we verify if the finished board will work reliably (with positive timing margins) in all cases. The worst-case combination of the timing parameters must be considered, usually specified as a range of statistically possible values. By rearranging the variables, maximum clock frequency could be calculated. (1)
[FIGURE 1 OMITTED]
Bidirectional buses move data in two different directions. These are called read and write, although in some multi-master interfaces it is not clear which transaction is which, necessitating a separate timing analysis for each driver/receiver combination.
Separate setup and hold analyses must be performed for both reads and writes. If the setup analysis fails (setup violation), then when the capture flip-flop captures the signal on its input, it might not have finished transitioning to the new logic level. If the hold analysis fails (hold violation), then the capture flip-flop might capture the signal after it has started transitioning to the next logic value.
Analysis can be performed either between package-pins (based on data sheet parameters), or in a complete system-on-board and on-chip. If based on data sheet parameters, use the generalized equations; if detailed on-chip-timing data are available, use the timing graph approach (FIGURE 2).
[FIGURE 2 OMITTED]
Every system type has a D-flip-flop launching the data signal (data, address, or command; here simply called "data") to the bus at the edge of the reference signal (the signal connected to its CK pin, a clock or a strobe signal). Another D-flip-flop captures the data at a given rising or falling edge of the reference signal. Usually the two flip-flops are on different chips, and the data and clock paths contain PCB interconnects, as well as on-chip delays. The data signal usually is a group of signals, so the timing requirements have to be met for each.
Chip designers usually use I/O flip-flops (or in some cases, transparent latches) close to the I/O pins, and usually launch/capture the data to/from the on-chip core logic, as separate transactions from the on-board transactions (FIGURE 3). On a bidirectional interface, there is an I/O buffer and a launch and a capture flip-flop on the same chip, although only one of them is used in one bus transaction.
[FIGURE 3 OMITTED]
A strobe signal is not a free-running clock; it has an edge only when needed, while a clock signal toggles all the time when power is applied. Both types feed I/O flip-flop CK pins.
Signal states on a bus:
* Bit-N is stable/valid.
* Previous or next bit is stable/valid.
* Signal is in the process of transitioning (between thresholds), so it's not valid.
* On a bus, the data are only valid when all lines are valid.
There are timing parameters describing components, cells in components and interconnects on-chip and on-board.
Output guaranteed timing. Worst case late/early times before/until the signal at the output of a component or cell may become/stay valid, guaranteed by the chip manufacturer/designer at nominal load. The on-chip delays with the flip-flop timing can be combined together at the chip/die pin/pad (FIGURE 4). The D-type flip-flop generates the data on its Q output pin as the effect of the signal edge on its CK-pin. This takes time for the flip-flop, and this time is called clock-to-output delay (t_CK-Q).
[FIGURE 4 OMITTED]
The flip-flop output pin or I/O buffer has to drive an interconnection (on-chip or on-board), and it takes additional time until the output voltage reaches the new logic value (transition delay). This is included in the data sheet output timings and depends on the output loading, so it usually is specified at a test/reference-load condition. The nominal/test load has to be specified in the data sheet, for example 50 Ω and 50 pF in parallel to a VDD/2 voltage. If the loading condition is different from the nominal load, then extract the on-chip portion to handle it separately.
Output setup time (t_OSU). Maximum clock-to-output-valid delay (t_CK-Q_max) at nominal load (FIGURE 5). It describes the longest (statistically) time it takes for the signal at the output pin to settle at the new voltage level, so it is the maximum clock-to-output delay. This parameter has different names in the different vendor-data sheets (t_OSU, t_VAL_max, t_acc, etc.).
[FIGURE 5 OMITTED]
Output hold time (t_OH). Minimum clock-to-output-invalid delay (t_CK-Q_min) at nominal load. It describes the shortest time for the previous bit at the output pin to remain valid after the clock-edge has already ordered the transition to the new bit value, so it is the minimum clock-to-output delay. This parameter has different names in the different vendor-data sheets (t_OH, t_VAL_min, etc.).
Output skew (t_SK_Out). Maximum clock-to-output-valid delay-difference between two signals at two output pins. The skew can be specified in different ways: as a peak-to-peak value, or as a min/max value pair referencing one signal to the other signal (FIGURE 6).
[FIGURE 6 OMITTED]
Peak-to-peak: t_SK_Out = Max(t_OSU1, t_OSU2) - Min(t_OH1, t_OH2)
Min/Max: t_SK_Min = t_OH_data - t_OSU_Ref, which is usually negative, and t_SK_Max = t_OSU_data - t_OH_Ref
In some cases they specify: t_SK_Out = (t_OSU_Ref - t_OH_Ref) + (t_OSU_data - t_OH_data)
So, to use these parameters, we might have to transform them to a different form to fit in the timing calculations.
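As a minimal sketch of these transformations, the Python helpers below compute the peak-to-peak and min/max skew forms from the output setup/hold times of two signals. The function names and the numbers are illustrative assumptions, not data sheet values; the min/max form is taken as the difference between the data and reference output times.

```python
def skew_peak_to_peak(t_osu1, t_oh1, t_osu2, t_oh2):
    """Peak-to-peak output skew: latest 'valid' time minus earliest 'invalid' time."""
    return max(t_osu1, t_osu2) - min(t_oh1, t_oh2)


def skew_min_max(t_osu_data, t_oh_data, t_osu_ref, t_oh_ref):
    """Min/max skew of the data signal referenced to the reference signal."""
    t_sk_min = t_oh_data - t_osu_ref   # usually negative
    t_sk_max = t_osu_data - t_oh_ref
    return t_sk_min, t_sk_max


# Illustrative (made-up) values in ns.
t_sk_min, t_sk_max = skew_min_max(t_osu_data=3.0, t_oh_data=1.0,
                                  t_osu_ref=2.5, t_oh_ref=1.5)
print(t_sk_min, t_sk_max)  # -1.5 1.5
```

Keeping all values in one unit (here ns) avoids mistakes when the results are later placed in the timing budget.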
Worst-case times before/until the signal must become/stay valid at the input pin. The signal timing at the input pin must be at least this good, guaranteed by the system/board designer. These are originally the parameters of the D-type flip-flop capturing the data signal, but they can be combined with on-chip delays between the flip-flop pins and the die/package pins (FIGURES 7 AND 8). The D-type flip-flop captures the data on its D input pin as the effect of the signal edge on its CK-pin.
[FIGURE 7 OMITTED]
[FIGURE 8 OMITTED]
Input requirement (slew rate) derating. Some standards take into account the effect of the input signal slew rate. Usually they specify the input requirements at a given input/reference signal slew rate combination, and provide "derating tables" or formulas to calculate t_ISU and t_IH for other slew rates. The transistor inputs need a certain amount of charge to build up to make the transistor switch to the new logic level. For DDR2-SDRAMs, they specify AC thresholds as Vref+/-175mV. The required charge is determined by the chip designer. The AC threshold levels describe the voltage that the waveform has to reach at nominal slew rate to accumulate enough charge. If our slew rate is different, then the threshold level to cross will be different. Instead of recalculating the AC threshold levels for the given slew rate, the JEDEC memory standards (2) handle this by introducing the slew rate derating, by keeping the AC threshold constant and modifying the input timing requirements, compensating for the extra time it takes to finish accumulating the charge. Note that the slew rate does not change the input setup/hold requirements; we just apply this simplification instead of changing the threshold levels to make the analysis work easier. The AC/DC expansion creates four thresholds (high/low-AC/DC) instead of the original one for Vref based or differential I/O (or the original two in CMOS/TTL). Most standards don't use derating. (3)
Input setup requirement (t_ISU). The signal has to be already stable/valid at the D input pin of the flip-flop at least t_ISU before the edge on the CK pin, to make sure that the right data value is safely captured. This parameter has different names in the different vendor-data sheets: t_ISU, t_SETUP, t_DS. This parameter often is called "Setup Time," although some documents use the same name for something else: for example, the actual value of the arrival time.
Input hold requirement (t_IH). The signal has to stay stable/valid at the D input pin of the flip-flop at least t_IH after the capturing edge on the CK pin, to make sure the right data value is safely captured. This parameter has different names in the different vendor-data sheets: t_IH, t_HOLD, t_DH. This parameter often is called "Hold Time."
Input skew (t_SK_In). This specifies the maximum allowed skew/deviation between the signal/edge arrival times (signal becomes/stays valid) at two specified input pins on one chip. These can be data-data, reference-data, or reference-reference combinations.
Propagation delay (t_pd) describes interconnect delays (output-buffers with the interconnections) and/or combinatorial logic-path delays.
Interconnect delay is made of two big parts: transition delay (I/O-buffer + load related delay) and flight time (PCB trace length-related delay). A signal-integrity simulation measures the combination of these, but doesn't include flip-flop timings. Transition delay is the time needed for the signal to reach the new logic value and become valid. Flight-time delay is the time it takes for a signal to travel through the interconnection. For PCB design, we need length constraints that are directly proportional to flight-time.
To be able to determine accurate constraints, separate them from the transition delays. This can be done via signal integrity simulations with realistic load conditions. The length variation within a narrow range will have a negligible effect on transition delays. If two signals propagate on different layers with different effective dielectric constants, then they have different propagation velocity, so they should be propagation delay matched, which is different from simple length matching.
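The difference between length matching and propagation-delay matching can be sketched as follows, assuming flight time scales with the square root of the effective dielectric constant. The function name and the Dk values are hypothetical, chosen only for illustration.

```python
import math


def delay_matched_length(length_a, eps_eff_a, eps_eff_b):
    """Length on layer B that has the same flight time as length_a on layer A.

    Flight time is proportional to length * sqrt(eps_eff), so the matched
    length scales with sqrt(eps_eff_a / eps_eff_b).
    """
    return length_a * math.sqrt(eps_eff_a / eps_eff_b)


# Illustrative values: 50 mm on an inner layer (eps_eff 4.2) matched to an
# outer layer with a lower effective Dk (eps_eff 3.2).
outer_length = delay_matched_length(50.0, eps_eff_a=4.2, eps_eff_b=3.2)
print(f"{outer_length:.2f} mm on the outer layer")
```

The outer-layer trace comes out longer than 50 mm, because the signal travels faster there.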
In case of meander routing, which is common for length matching, the signal may propagate faster than it would on a straight PCB trace because of crosstalk between the meander segments. This is one of the reasons to do post layout verification with a proper 3D EM simulator.
Propagation delay is measured from simulation run time zero until the logic-threshold level crossing at the receiver chip pin. We have to simulate both the rising and the falling waveforms, at both "Fast" and "Slow" I/O buffer IBIS model setting. Then finally take the shortest time as t_pd_min, and the longest time as t_pd_max. Usually we measure these on an eye diagram, where most of the signal integrity effects can be included.
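A small sketch of this selection, assuming the four simulated delays have already been measured. The dictionary layout and the numbers are assumptions for illustration.

```python
def t_pd_window(delays):
    """Return (t_pd_min, t_pd_max) from delays keyed by (corner, edge).

    Expects the four combinations of "fast"/"slow" IBIS corner and
    "rise"/"fall" edge, all measured from simulation time zero.
    """
    expected = {(c, e) for c in ("fast", "slow") for e in ("rise", "fall")}
    missing = expected - set(delays)
    if missing:
        raise ValueError(f"missing simulations: {sorted(missing)}")
    return min(delays.values()), max(delays.values())


# Illustrative (made-up) measurements in ns.
measured = {
    ("fast", "rise"): 1.0,
    ("fast", "fall"): 1.1,
    ("slow", "rise"): 1.6,
    ("slow", "fall"): 1.5,
}
print(t_pd_window(measured))  # (1.0, 1.6)
```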
The chip data sheet output timing usually is a combination of the transition delay at the output pin at test-load condition and the on-chip delays and flip-flop timings (FIGURE 9). The test load has to be specified in the data sheet. If the loading condition is different from the test load, extract the on-chip portion of the data sheet parameters, because the real transition delay will have to replace the test-load-based one. This can be done by running a signal integrity simulation to get the test-load transition delays, and subtracting these from the data sheet values. This gives the on-chip parameters (t_OD_min/max) to put in the timing budget calculation separately.
[FIGURE 9 OMITTED]
To extract on-chip output delays:
1. Take t_OSU and t_OH from the chip data sheets.
2. Do a signal integrity simulation with test-load condition; to determine the min/max transition delays, use "Fast" IBIS buffer setting for min, "Slow" for max. Measure both rising and falling edges, using the earlier crossing for min, and the later for max. Measure it from simulation time zero until the signal crosses the new logic level threshold.
3. Calculate on-chip output delay portions:
t_OD_max = t_OSU - t_Transition_min
t_OD_min = t_OH - t_Transition_max
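The three steps above can be sketched in Python as below. The function name and the sample numbers are assumptions; the subtraction follows the two formulas exactly as written in step 3.

```python
def on_chip_output_delays(t_osu, t_oh, t_transition_min, t_transition_max):
    """Extract on-chip output delays from data sheet timing and test-load simulation.

    t_osu, t_oh: data sheet output setup/hold times at test load.
    t_transition_min/max: simulated test-load transition delays
    ("Fast" corner for min, "Slow" corner for max).
    Returns (t_od_min, t_od_max).
    """
    t_od_max = t_osu - t_transition_min
    t_od_min = t_oh - t_transition_max
    return t_od_min, t_od_max


# Illustrative (made-up) values in ns.
t_od_min, t_od_max = on_chip_output_delays(
    t_osu=5.0, t_oh=2.0, t_transition_min=0.5, t_transition_max=1.0)
print(t_od_min, t_od_max)  # 1.0 4.5
```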
To extract real transition delays:
1. Simulate waveforms at the receiver pin with realistic load condition using estimated trace lengths; measure the delays as t_pd_min/max (from Fast/Slow IBIS options).
2. Extract the internal chip delays (t_OD_min, t_OD_max) from the data sheet t_OSU/t_OH parameters.
3. Calculate flight time:
$$t_{\text{flight\_time}} = \frac{\text{Length} \cdot \sqrt{\varepsilon_{\text{r\_eff}}}}{c}$$
$\varepsilon_{\text{r\_eff}}$ is the average or effective Dk of the materials surrounding the trace on the PCB; c is the speed of light. For outer layers, the material is partly air, partly FR-4. An estimation for outer layers can be
$$\varepsilon_{\text{r\_eff}} = \frac{\varepsilon_{\text{r\_FR4}} + 1}{2} + \frac{\varepsilon_{\text{r\_FR4}} - 1}{2} \cdot \left(1 + 12 \cdot \frac{h}{w}\right)^{-1/2}$$
where h is FR-4 dielectric thickness measured to the ground plane and w is trace width. For inner layers with two different FR-4 materials, the estimation can be
$$\varepsilon_{\text{r\_eff}} = \frac{\varepsilon_{r1} \cdot h_2 + \varepsilon_{r2} \cdot h_1}{h_1 + h_2}$$
$\varepsilon_r$ should be specified at the signaling frequency (use f_knee = 0.5/t_rise). (4)
4. The transition delay:
t_transition_delay_x = t_pd_x - t_flight_time where "x" is min or max.
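The flight-time and transition-delay steps can be sketched as follows. This is only an estimate under stated assumptions: the outer-layer formula is the widely used microstrip approximation with the inverse square root term, lengths are in mm, times in ns, and all function names and sample values are hypothetical.

```python
import math

C_MM_PER_NS = 299.792458  # speed of light in vacuum, mm/ns


def eps_eff_outer(eps_r, h, w):
    """Effective Dk estimate for an outer-layer (microstrip) trace."""
    return (eps_r + 1) / 2 + (eps_r - 1) / 2 / math.sqrt(1 + 12 * h / w)


def eps_eff_inner(eps_r1, h1, eps_r2, h2):
    """Effective Dk estimate for an inner-layer trace between two dielectrics."""
    return (eps_r1 * h2 + eps_r2 * h1) / (h1 + h2)


def flight_time_ns(length_mm, eps_eff):
    """Flight time of a trace of the given length."""
    return length_mm * math.sqrt(eps_eff) / C_MM_PER_NS


def transition_delay(t_pd, t_flight):
    """Transition delay left after removing flight time from a simulated t_pd."""
    return t_pd - t_flight


# Illustrative values: 75 mm outer-layer trace, Dk 4.3, h = 0.1 mm, w = 0.15 mm.
eps = eps_eff_outer(4.3, h=0.1, w=0.15)
t_flight = flight_time_ns(75.0, eps)
print(f"eps_eff = {eps:.2f}, flight time = {t_flight:.3f} ns")
print(f"transition delay (max) = {transition_delay(1.2, t_flight):.3f} ns")
```

The same `transition_delay` call is made once with t_pd_min and once with t_pd_max to get both ends of the range.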
Combinatorial logic circuits also cause delays. These can be on-chip or on-board logic: for example, a byte-swapping CPLD on a VME bus. We can get the delay values from the on-chip STA report. From the STA point of view, combinatorial logic gates or multiplexers can be treated as delay elements (they have timing arcs). For STA, the actual functionality of the logic doesn't matter, since it is a worst-case analysis. A flip-flop D-to-Q arc is NOT combinatorial; it breaks timing paths, although CK-to-Q arcs can be included.
PCB skew is the propagation delay difference between two signals. If all traces have similar drivers and trace lengths, then the propagation delay and the transition delay will be very similar. This simplifies the problem to trace length matching. The matching is designed with an allowable tolerance as a constraint.
Errors and Jitters
Error and jitter parameters always decrease timing margins, so they have to be subtracted from the timing budgets. Here, they are called errors, and their sum is $\sum_m t_{\text{err}_m}$. Only half of each peak-to-peak value has to be taken into account in each of the setup and hold analyses, so t_err = t_err_pp/2.
Calculate crosstalk or power supply noise-related jitter from voltage noise in the following way:
t_err_pp = V_noise_pp/Slew_Rate
where slew rate is the victim signal slew rate in V/ns. The best method is a signal integrity simulation that injects all possible noise and crosstalk sources, and only the remaining ones are included in the timing calculation.
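As a quick sketch of this conversion, the helper below turns a peak-to-peak noise voltage into the peak-to-peak timing error and the half value used in each of the setup and hold budgets. The function name and numbers are illustrative assumptions.

```python
def noise_to_timing_error(v_noise_pp, slew_rate_v_per_ns):
    """Convert voltage noise on the victim signal into timing error.

    Returns (t_err_pp, t_err) in ns, where t_err = t_err_pp / 2 is the
    portion charged to each of the setup and hold analyses.
    """
    if slew_rate_v_per_ns <= 0:
        raise ValueError("slew rate must be positive")
    t_err_pp = v_noise_pp / slew_rate_v_per_ns
    return t_err_pp, t_err_pp / 2


# Illustrative values: 100 mV peak-to-peak noise on a 2 V/ns edge.
t_err_pp, t_err = noise_to_timing_error(0.1, 2.0)
print(f"t_err_pp = {t_err_pp * 1000:.0f} ps, t_err = {t_err * 1000:.0f} ps")
```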
The various types of errors:
* Clock-jitter. Some component data sheets include the maximum allowed input reference clock jitter in the data output timing uncertainty, while others specify output timing at ideal reference clock. (5)
* ISI (Inter Symbol Interference). At high speeds, the previous bit values have an effect on the location of the signal edge. This is best simulated in eye diagrams for long PRBS (Pseudo-Random Binary Sequence (5)) bitstreams.
* Crosstalk effects. Crosstalk has two effects: false detection on the settled signal, and speeding or slowing of a signal edge.
* Power supply noise. Supply-voltage noise shifts the output voltage and input threshold voltage levels. A signal integrity simulation with power plane model is needed to investigate, or simply change the thresholds by the noise amplitude/2.
* Vref noise. This shifts the detection thresholds apart.
* Capacitive load mismatch. The transition delay depends on load capacitance, among other parameters. Perform two simulations: one with and one without an extra capacitor connected at the load. Measure the t_pd difference between the two simulations to get the t_err_pp.
* Termination resistance mismatch. Perform a signal integrity simulation with ODT=off in the IBIS model selector, and manually connect external termination resistors with a value of nominal+/-tolerance. Take the t_pd difference as t_err_pp. (6)
* Duty cycle distortion (DCD). The clock is not perfectly symmetrical; high and low time durations therefore are not equal. It is t_err_pp = (t_high - t_low). Normally the nominal period is used and the duty-cycle distortion taken into account. Some standards (such as DDR memories) specify a parameter that includes all these as a t_HP clock-half-period, which is used as the analysis starting point.
* DLL-delay-error. In some interfaces, reference or data signal is delayed by an on-chip DLL circuit to align them to the correct capturing position. This delay has a tolerance, and the deviation from the ideal value is the DLL delay error.
* Propagation delay tolerance. The PCB trace lengths and on-chip delays can be constrained with a tolerance. To include a propagation delay in the budget, use its min/max values appropriately, or use its nominal value and include its tolerance as a t_err parameter.
In different data sheets or standards, the same timing parameters can have different names, or different parameters can have the same name. In other cases, the way they measure the timing parameter can be different. For example, some documents refer to Input Setup Requirement as "Setup Time," while some other documents call the actual value of arrival time or an Output Setup Time with the same name. In some data sheets, a parameter is referenced to the previous clock edge, while others reference it to the following clock edge. Before using a parameter, check how they are specified, and if it is necessary, then transform them to the way presented here:
$$t_x \rightarrow (t_{\text{bit}} - t_x) \pm t_{\text{bit}} \cdot \frac{k}{2}$$
where k is an integer.
In most data sheets, the skew between two signals is specified as the deviation between them, while other data sheets name a parameter as skew, even if that parameter describes the t_data_period - t_skew value, so basically the skew-less region of the bit-time.
Timing parameters can be specified at the chip-package pin, at the silicon-die-pad (large BGAs), or at the I/O flip-flops on the silicon. In case of the die-pad specification, they provide the routing lengths inside the package, called "package length."
Chip I/O timing parameters contain the flip-flop timing parameters and on-chip interconnect, buffer and clock-tree delays combined. In case of in-house chip/FPGA development, internal parameters might be specified separately.
For those interfaces where absolute delays are not important but relative delays are, they provide skew instead of t_OSU/t_OH values. This simplifies the board-level timing analysis, so the board designer doesn't have to worry about the on-chip signal relationships that the chip designers have already taken care of.
Chip/FPGA designers can set up timing constraints to produce a chip layout at least as good as the data sheet parameters describe. Constraints are based on standard values or are intuitively based on the target speed. If the chip and board are designed at the same company, then the chip/board-design constraints may be adjusted together to achieve best timing. (7)
In some interfaces, there can be different reference signals for the different signal-groups. To maintain the correct timing for some on-chip circuits, additional timing requirements have to be met in the board design between the different reference signals. Examples are the DDR-SDRAM memory clock-to-strobe matching or serial link lane-to-lane matching.
To get accurate propagation delays on interconnects including various signal integrity effects, perform signal integrity simulations using I/O buffer models (IBIS models). On the other hand, the timing analysis can be considered as the way to quantify SI simulation results. For a pre-layout simulation, the interconnect is modeled as a set of simplified transmission line models, while for the post-layout analysis, we extract interconnect information from the routed layout design. The result of the SI analysis is a set of waveforms. We have to measure times and voltages on these; this could be called "geometrical waveform analysis." Finally, supply the timing information to the timing calculations. We also have to determine signal slew rates, for input requirement derating.
These simulations and the IBIS I/O buffer models don't include on-chip and flip-flop timing, only the interaction between traces and I/O buffers. Because of this, the starting point on the waveform is the simulation time zero, and also the output buffer excitation starting time.
The IBIS models have a selector parameter: Fast/Slow/Typical. These are the speed corners related to silicon manufacturing, supply voltage and temperature. The speed does not vary (statistically) much on the same chip; it varies much more between different chips. Find the worst, but still realistic, speed combination of the signals.
Input logic level threshold/decision types:
* Single-ended V_IH/V_IL based (for example, CMOS or TTL). The logic value is high if the V_in>V_IH, low if V_in<V_IL; between them it is invalid.
* Single-ended Vref based (for example SSTL). The logic value is high if the V_in>Vref, low if V_in<Vref.
* Differential. There is a positive (P) and a negative (N) signal in a differential pair, and if the voltage on P is higher than on N, then it is logic high, otherwise low.
Data rate types:
* SDR (single data rate): Data are launched/captured at only one (usually rising) edge of the reference signal. Sometimes the capture edge is the rising edge, and the launch edge is the falling edge, but it is still SDR.
* DDR (dual data rate): Data are launched/captured at both rising and falling edges of the reference signal. This can be done by two separate flip-flops or by a DDR flip-flop.
* QDR (quad data rate): The data signal is launched/captured at both edges of the reference signal, but also half way between the edges.
Pre-layout: Get on-chip delays and realistic transition delays. Use estimated trace lengths for realistic transition delay extraction, reference signal t_pd and for relative t_pd rules. Use zero length and nominal load for t_OD extraction. Use the on-chip delay (t_OD) parameters in the timing calculations. Use the transition delay in the length calculations.
Post-layout: Get propagation delays, using exact trace lengths. Also, the on-chip delay (t_OD) parameters must be extracted using nominal load. The routing skew has to be at the extremes during the simulation.
The times on the waveforms have to be measured between points where the signal crosses the logic threshold voltage levels. AC and DC level offsets might be applied to these because of the capacitance of the input buffers (slew rate derating). Until the signal crosses the DC threshold level, it is considered stable and holding the previous bit value. The signal is stable/valid at the new bit value after it has crossed the AC level. Therefore, the AC levels on the data signals are used for setup analysis, and the DC levels for hold analysis, although it's the opposite for the reference signals. Most standards don't specify AC/DC expansion.
The geometrical timing measurements can be done as absolute time measurements (t_pd from simulation time zero), or as relative time measurements (between the reference signal and the data signal). If we do relative measurements, then we get an intermediate timing margin (t_SU_MAR1 and t_H_MAR1). In the timing calculations after the simulation, subtract the remaining parameters from this to get the final timing margins.
For t_pd_min measurements, use "fast" IBIS model setting and measure until the DC threshold, and take the smaller of the rising/falling values. In case of the t_pd_max, use "slow" IBIS setup until the AC threshold and take the greater of the rising/falling values. The IBIS model Fast/Slow settings have to be chosen based on what is realistic and worst case at the same time.
a) Two different chips drive the data and the reference signal. Assume that one of them is in the "fast" and the other one is in the "slow" corner worst case.
b) The data and reference signal are driven by the same chip. Assume negligible difference in speed capabilities. Set the IBIS models to "fast" for both or "slow" for both. Try both speeds and take the worst margins.
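Cases a) and b) can be sketched as a small corner-selection helper. Here `margin_fn` is a hypothetical callback that returns the timing margin for a given (data corner, reference corner) pair, for example by looking up simulation results; it is not part of any IBIS tool.

```python
def corner_pairs(same_chip):
    """IBIS corner pairs (data, reference) that are realistic and worst case."""
    if same_chip:
        # Same chip: negligible speed difference, both fast or both slow.
        return [("fast", "fast"), ("slow", "slow")]
    # Different chips: one in the fast corner, the other in the slow corner.
    return [("fast", "slow"), ("slow", "fast")]


def worst_margin(margin_fn, same_chip):
    """Evaluate the margin for every realistic corner pair and keep the smallest."""
    return min(margin_fn(data, ref) for data, ref in corner_pairs(same_chip))


# Illustrative (made-up) margins in ns for each corner pair.
margins = {
    ("fast", "fast"): 0.40, ("slow", "slow"): 0.25,
    ("fast", "slow"): 0.10, ("slow", "fast"): 0.55,
}
print(worst_margin(lambda d, r: margins[(d, r)], same_chip=True))   # 0.25
print(worst_margin(lambda d, r: margins[(d, r)], same_chip=False))  # 0.1
```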
Simulated eye diagrams wrap the timeline around, so every second bit appears to be delayed by only t_pd from simulation time zero. This way t_pd can be measured on multiple bits at the same time. If oscilloscope measurements are performed in the lab instead of simulation, there is no way to know where time zero is, so only relative times can be measured on those waveforms.
Geometrical timing measurements on waveforms. The t_pd_min is measured from simulation time zero until the signal's previous bit value goes invalid (earliest DC crossing). The t_pd_max is measured from simulation time zero until the signal actual bit value goes valid (latest AC crossing) (FIGURE 10). The t_SU_MAR1 is measured from the moment when the data signal goes valid (AC), until the reference signal goes invalid (DC). The t_H_MAR1 is measured from the moment when the reference signal goes valid (AC), until the data signal goes invalid (DC) with the actual bit value. The moments when these valid-invalid transitions happen depend on the input logic level decision thresholds and the AC/DC expansion.
[FIGURE 10 OMITTED]
With a DLL, the reference signal edge can be aligned within the data valid region at the capture flip-flop's pins. This might not be the case on the board traces if the DLL is on the receiving chip. Either do the analysis at the capture flip-flop on-chip, or do the analysis on the board by adding the DLL delay to the reference signal excitation.
If the device or standard specifies input requirement derating, then measure on the waveforms at realistic load and trace lengths. Best is to get the slew rate between the logic threshold levels. For Setup, measure between Vref (or nominal threshold) and V_AC crossing, while for Hold measure between V_DC and Vref.
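A minimal sketch of the slew rate measurement, assuming the threshold crossing times have been read from the simulated waveform. The function names, voltages and times are illustrative assumptions, not values from a standard.

```python
def slew_rate_v_per_ns(v_from, v_to, t_from_ns, t_to_ns):
    """Slew rate between two threshold crossings on a waveform, in V/ns."""
    if t_to_ns <= t_from_ns:
        raise ValueError("crossing times must be increasing")
    return abs(v_to - v_from) / (t_to_ns - t_from_ns)


def setup_slew(v_ref, v_ac, t_vref_cross, t_ac_cross):
    """Setup slew rate: from the Vref crossing to the V_AC crossing."""
    return slew_rate_v_per_ns(v_ref, v_ac, t_vref_cross, t_ac_cross)


def hold_slew(v_dc, v_ref, t_dc_cross, t_vref_cross):
    """Hold slew rate: from the V_DC crossing to the Vref crossing."""
    return slew_rate_v_per_ns(v_dc, v_ref, t_dc_cross, t_vref_cross)


# Illustrative rising edge: Vref = 0.9 V, V_AC = 1.1 V, crossings 0.25 ns apart.
print(f"{setup_slew(0.9, 1.1, 2.00, 2.25):.2f} V/ns")
```

The resulting slew rates are then used to look up the t_ISU and t_IH corrections in the derating table of the device or standard.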
Crosstalk to the data signals can increase or decrease transition time, and this way, the propagation delay. If the noise voltage is too high and it happens during sampling, then it may cause false detection. Crosstalk to clock or strobe signals increases the clock/strobe jitter. The simulation setup for signals within a group could be two drivers with maximum skew between them, switching in the same and opposite way. In case of different signal groups, the best is to set the skew between them to be t_rise for the simulation. The power supply noise affects simulation and requires an S-parameter model containing the interconnects and power planes, or a simple setup where the IBIS TX-buffer power pins have a DC-supply and an AC-noise voltage source in series. In all cases measure t_pd with and without aggressor; the difference is a t_err parameter.
Generalized Timing Equation
The generalized equation:
$$0 \le t_{\text{X\_MAR}} = t_{\text{AVAILABLE}} + \sum_i t_{\text{improving}_i} - \sum_k t_{\text{degrading}_k} - \sum_m t_{\text{err}_m}$$
"X" can be read-setup, read-hold, write-setup, write-hold. t_X_MAR is the timing margin; t_AVAILABLE is the time available for the analysis. t_err parameters are various signal integrity effects; they always decrease the timing margins, and are not delay-related.
Which timing parameters improve and which degrade the timing margin depends on the interface type. A bigger improving parameter increases the timing margin; a bigger degrading parameter decreases it. Before putting any parameter into the calculations, transform it if necessary, removing the negative sign if it has one. In rare cases, a data sheet parameter's negative sign really means negative, so engineering judgment is needed. For example, if a data sheet specifies t_ISU = -1 ns, it may mean that the chip actually tolerates data arriving 1 ns after the sampling edge.
When the data and the reference signal are going in opposite directions, in the setup analysis the data delay is degrading, and the reference signal delay is also degrading; for hold analysis the data are improving, and the reference signal is also improving. When they go in the same direction, then for setup the data are degrading and reference signal is improving, while for hold the data are improving and reference signal degrading.
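The generalized equation can be sketched as a small Python helper: available time, plus the improving parameters, minus the degrading and error parameters. The function name and the numbers are illustrative assumptions; the example mimics a write-setup case where data and reference travel in the same direction, so the reference delay is improving and the data delay is degrading.

```python
def timing_margin(t_available, improving=(), degrading=(), errors=()):
    """Timing margin: available time plus improving, minus degrading and error terms.

    All arguments are in the same time unit; a result below zero is a violation.
    """
    return t_available + sum(improving) - sum(degrading) - sum(errors)


# Illustrative (made-up) write-setup budget in ns.
margin = timing_margin(
    t_available=10.0,            # one clock period
    improving=[0.5],             # reference signal t_pd_min
    degrading=[1.0, 4.0, 1.0],   # t_ISU, t_OD_max, data t_pd_max
    errors=[0.25, 0.25],         # e.g. clock jitter and crosstalk, each t_err_pp/2
)
print(margin)  # 4.0
print("OK" if margin >= 0 else "setup violation")
```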
Instead of data sheet t_OSU/t_OH parameters and flight-time, use extracted t_OD and simulated t_pd_min/max at real load conditions. Use t_pd_max in the degrading list and t_pd_min in the improving list, unless the data and reference signals are driven by the same chip. In that case both are minimum or both are maximum.
DLL delays (usually t_bit/2) can be added to the available time and DLL-delay-error included as an error parameter, or can be included among the improving or degrading parameters with its absolute value. Both cases may need the "Shift Rule."
A few parameters are already included in the signal integrity analysis result, so in the calculation only the remaining ones must be considered. If relative timings on the waveforms were measured, then the final calculation looks like this:
$$0 \le t_{\text{X\_MAR}} = t_{\text{X\_MAR1}} + \sum_i t_{\text{improving}_i} - \sum_k t_{\text{degrading}_k} - \sum_m t_{\text{err}_m}$$
Maximum data rate calculation. The maximum data rate is the speed where zero is obtained for one of the margins:
$$t_{\text{required}} = \sum_i t_{\text{degrading}_i} - \sum_k t_{\text{improving}_k} + \sum_m t_{\text{err}_m}$$
Calculate the required time for both setup/hold and read/write, then take the biggest number. But only do those calculations where the t_AVAILABLE is not specified as zero in the table. The minimum bit period is t_bit_min = 2 * t_required_max. The maximum data rate is 1/t_bit_min.
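A minimal sketch of the maximum data rate calculation, assuming times in ns so that the rate comes out in Gbit/s per line. The function names and the sample budgets are hypothetical; only cases with a non-zero t_AVAILABLE should be passed in.

```python
def required_time(improving=(), degrading=(), errors=()):
    """Time needed for a zero margin: degrading minus improving plus errors."""
    return sum(degrading) - sum(improving) + sum(errors)


def max_data_rate(required_times):
    """Return (t_bit_min, max_rate) from the per-case required times."""
    t_bit_min = 2 * max(required_times)
    return t_bit_min, 1.0 / t_bit_min


# Illustrative (made-up) budgets in ns for four cases.
cases = [
    required_time(degrading=[0.30, 0.10], errors=[0.05]),   # read setup
    required_time(degrading=[0.25, 0.10], errors=[0.05]),   # read hold
    required_time(degrading=[0.35, 0.10], errors=[0.05]),   # write setup
    required_time(degrading=[0.30, 0.05], errors=[0.05]),   # write hold
]
t_bit_min, rate = max_data_rate(cases)
print(f"t_bit_min = {t_bit_min:.2f} ns, max data rate = {rate:.2f} Gbit/s")
```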
TABLE 1. The Parameters
Interface | Case | t_AVAILABLE | t_improving (list) | t_degrading (list)
Synchronous | RD-SU | T_clk | - | t_ISU_master, t_OD_max_slave, t_pd_data, t_pd_clk
Synchronous | RD-HOLD | 0 | t_OD_min_slave, t_pd_data, t_pd_clk | t_IH_master
Synchronous | WR-SU | T_clk | t_pd_clk | t_ISU_slave, t_OD_max_master, t_pd_data
Synchronous | WR-HOLD | 0 | t_OD_min_master, t_pd_data | t_IH_slave, t_pd_clk
Asynchronous | RD-SU | RD# pulse-width | - | t_ISU_master, t_OD_max_slave, t_pd_data, t_pd_strobe
Asynchronous | RD-HOLD | 0 | t_OD_min_slave, t_pd_data, t_pd_strobe | t_IH_master
Asynchronous | WR-SU | WR# pulse-width | t_pd_strobe | t_ISU_slave, t_OD_max_master, t_pd_data
Asynchronous | WR-HOLD | 0 | t_OD_min_master, t_pd_data | t_IH_slave, t_pd_strobe
Source synch. | RD-SU | T_bit/2 | - | t_ISU_master, (t_pd_data - t_pd_str), t_DQSQ_max
Source synch. | RD-HOLD | T_bit/2 | - | t_IH_master, (t_pd_str - t_pd_data), t_QHS
Source synch. | WR-SU | T_bit/2 | - | t_ISU_slave, (t_pd_data - t_pd_str), t_master_skew_max
Source synch. | WR-HOLD | T_bit/2 | - | t_IH_slave, (t_pd_str - t_pd_data), (-1 * t_master_skew_min)
Clock forwarding | RD-SU | T_clk | t_pd_clk | t_ISU_master, t_OD_max_slave, t_pd_data
Clock forwarding | RD-HOLD | 0 | t_OD_min_slave, t_pd_data | t_IH_master, t_pd_clk
Clock forwarding | WR-SU | T_clk | t_pd_clk | t_ISU_slave, t_OD_max_master, t_pd_data
Clock forwarding | WR-HOLD | 0 | t_OD_min_master, t_pd_data | t_IH_slave, t_pd_clk
Embedded clk | WR-SU | T_clk/2 | - | -
Embedded clk | WR-HOLD | T_clk/2 | - | -
Note 1: Most of the synchronous systems have t_AVAILABLE_hold = 0, t_AVAILABLE_setup = t_clk_period. There are systems where we launch the output at the falling edge and capture input at the rising edge, for which t_AVAILABLE_setup = t_AVAILABLE_hold = T_clk_period/2.
Note 2: If the clock is supplied by the master to the slave chip, then t_pd_clk is straightforward. If it is routed to both the master and the slave, then use t_pd_clk = t_pd_to_slave - t_pd_to_master. If it is known as a clock skew, then for the degrading list t_pd_clk = t_clk_skew_pp, and for the improving list t_pd_clk = -1 * t_clk_skew_pp.
Note 3: All the parameters here are receiver-chip design internal delays.
Note 4: The terminology here is from the DDR2-SDRAM standard. Instead of t_OSU/t_OH (which depends on t_OD), they specify their differences as skew. Instead of having one of the t_pd_data or t_pd_str as degrading and the other as improving parameter, we have +/-(t_pd_data_x - t_pd_str_x) as a degrading parameter, which is actually the skew from the data sheet.
To program an asynchronous bus interface, calculate the required time, then determine the number of system clock cycles needed by dividing t_required by t_clk_period and rounding up to the next integer. The result has to be programmed into the bus interface control registers.
Minimum strobe pulse width. This is the minimum duration of an RD#, WR# or ALE strobe pulse. Take the parameters for the WR# from the write setup parameter list in the spreadsheet. For the RD#, take the parameters from the read setup parameter list.
$$t_{pulse\_min} \ge \sum_i t_{degrading,i} - \sum_k t_{improving,k} + \sum_m t_{err,m}$$
Programmable hold time. Take the parameters for the write from the write hold parameter list in the spreadsheet. For the read calculation, take them from the read hold list.
$$t_{phold} \ge \sum_i t_{degrading,i} - \sum_k t_{improving,k} + \sum_m t_{err,m}$$
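As a minimal sketch of the two inequalities above and of the cycle rounding described for programming the interface, the following Python computes a required time from parameter lists and converts it into whole master clock cycles. The function names and every numeric value are hypothetical placeholders, not taken from the spreadsheet or from any data sheet.

```python
import math


def required_time(degrading, improving=(), errors=()):
    """Return sum(degrading) - sum(improving) + sum(errors), all in ns."""
    return sum(degrading) - sum(improving) + sum(errors)


def cycles_to_program(t_required_ns, t_clk_period_ns):
    """Round t_required / t_clk_period up to whole clock cycles (never below 0)."""
    return max(0, math.ceil(t_required_ns / t_clk_period_ns))


# Hypothetical parameter lists for a WR# pulse, in ns (illustrative only).
t_pulse_min = required_time(
    degrading=[5.0, 12.0, 1.5, 1.5],  # four degrading terms
    improving=[2.0],                  # one improving term
    errors=[0.5],                     # one error allowance
)
wr_pulse_cycles = cycles_to_program(t_pulse_min, t_clk_period_ns=10.0)
print(t_pulse_min, wr_pulse_cycles)
```

The same two helpers would serve for the programmable hold time by passing the hold parameter lists instead of the setup lists.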
Programmable setup time. This is used to satisfy the address setup requirements and to improve the data setup margins. On a typical asynchronous microcontroller or DSP bus, there can be two cases:
* The address is driven onto the bus at the same time as the CS# (chip select) and is not captured; it just asynchronously selects the data register inside the slave. The required time for the address setup is:
$$t_{su\_addr\_req} \ge t_{OD\_addr} + t_{pd\_addr} + t_{ISU\_addr} - t_{pd\_str} - t_{OD\_str} + \sum_m t_{err,m}$$
The strobe here is the RD# or WR# strobe. The value to be programmed is t_psu = t_su_addr_req.
* The case of the multiplexed address/data bus. The address is captured at the edge of an ALE (address latch enable) signal; then the master switches to data bus mode. We need to keep the ALE asserted for at least t_su_addr_req, then de-assert it and wait t_h_addr_req before asserting the RD#/WR# data strobe. Calculate the required master clock periods for both address setup and address hold; add them together to get the t_psu. We also have to program the ALE pulse width, which must be at least t_su_addr_req.
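The two address-setup cases above can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names, the 4 ns master clock and all delay values are hypothetical, and t_h_addr_req is simply given rather than derived.

```python
import math


def address_setup_required(t_od_addr, t_pd_addr, t_isu_addr,
                           t_pd_str, t_od_str, errors=()):
    """Required address setup time in ns, following the inequality above."""
    return (t_od_addr + t_pd_addr + t_isu_addr
            - t_pd_str - t_od_str + sum(errors))


def to_cycles(t_ns, t_clk_ns):
    """Whole master clock cycles needed to cover t_ns (never below 0)."""
    return max(0, math.ceil(t_ns / t_clk_ns))


T_CLK = 4.0  # hypothetical master clock period, ns

t_su = address_setup_required(
    t_od_addr=6.0, t_pd_addr=1.0, t_isu_addr=4.0,
    t_pd_str=1.0, t_od_str=5.0, errors=[0.5],
)

# Case 1: address not captured; program the setup time alone.
t_psu_cycles_simple = to_cycles(t_su, T_CLK)

# Case 2: multiplexed bus; round setup and hold separately, then add them.
t_h_addr_req = 3.0  # hypothetical address hold requirement, ns
ale_width_cycles = to_cycles(t_su, T_CLK)
t_psu_cycles_muxed = ale_width_cycles + to_cycles(t_h_addr_req, T_CLK)
print(t_su, t_psu_cycles_simple, ale_width_cycles, t_psu_cycles_muxed)
```

Rounding the setup and hold parts separately, rather than rounding their sum, keeps each requirement covered by its own whole number of clock periods.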
Complete System Timing Graphs
The data signal always propagates from the launch flip-flop to the capture flip-flop. The reference signal always propagates from a clock generator to both the launch and the capture flip-flops. If we add the segment of the reference signal from the clock generator to the launch flip-flop to the data path, we get two paths: the "data-path" and the "reference-path." Both may propagate through multiple circuit nets. The flip-flop timings have to be included in the paths. This way both signals propagate between the same two points in the system. The Start point is the clock generator, and the End point is the capture flip-flop CK pin. From the Start point to the End point there are exactly two valid paths through the system. The data path is the one going through the capture flip-flop's D-pin. The reference path goes through the capture flip-flop's CK pin without touching the D-pin. Both paths can be walked only in the direction of the signal propagation. Both paths may include any number of flip-flop CK-Q arcs. However, the segment between the launch and the capture flip-flop cannot include more flip-flops because they would break the timing paths.
To perform timing analysis with timing graphs, all on-chip delays must be known, which is usually not the case. If the chip data sheet specifies a skew between the data and reference signal, then we might transform it to on-chip delays. For example, set t_OD_ref=0 and set t_OD_data=t_skew. If some of the delays in the two paths are identical, they can be ignored in the analysis. Chip/board designers can take advantage of this by using identical (matched within a tolerance) trace lengths and identical I/O circuits in the two paths, to equalize delays or to simplify the STA (FIGURE 11).
[FIGURE 11 OMITTED]
We introduce two conventions here to simplify the graph-based analysis:
* Using a negative value for the input hold requirement timing parameter (t_IH → -1 * t_IH). This way the same end point can be set for both the setup and the hold analysis.
* Shift rule. If one of the setup/hold margins is negative, then shift them until both become positive. The logic design or protocol must expect the valid data in the previous or next period. The shift operation is simply t_SU_mar → t_SU_mar -/+ t_data_period and t_h_mar → t_h_mar +/- t_data_period; that is, the two margins are shifted in opposite directions. Some designs are based on this: Delaying by t_DLL has the same effect on STA as being early by t_data_period - t_DLL.
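The shift rule can be sketched as a small Python helper. This is an illustrative reading of the rule, not a definitive implementation: the function name, the sign convention of the returned shift count, and the choice to accept a margin of exactly zero are all assumptions.

```python
def apply_shift_rule(t_su_mar, t_h_mar, t_data_period):
    """Shift both margins by whole data periods, in opposite directions,
    until neither is negative.

    Returns (t_su_mar, t_h_mar, shift), where shift counts the periods
    added to the setup margin (a negative count means periods were added
    to the hold margin instead). Raises ValueError if no shift can work.
    """
    # The sum of the two margins does not change when shifting, so a
    # negative sum can never be repaired by the shift rule.
    if t_su_mar + t_h_mar < 0:
        raise ValueError("margins cannot both be made non-negative")
    shift = 0
    while t_su_mar < 0:
        t_su_mar += t_data_period
        t_h_mar -= t_data_period
        shift += 1
    while t_h_mar < 0:
        t_h_mar += t_data_period
        t_su_mar -= t_data_period
        shift -= 1
    if t_su_mar < 0:
        raise ValueError("no whole-period shift fixes both margins")
    return t_su_mar, t_h_mar, shift


# Hypothetical margins in ns with an 8 ns data period.
shifted = apply_shift_rule(-2.0, 9.0, 8.0)
print(shifted)
```

A ValueError here means the valid data window is too narrow for any whole-period shift, so the delays themselves have to change.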
To perform the analysis, write a delay budget for both paths from the Start point to the End point, to get t_del_data and t_del_ref. This is basically a weighted graph in which we have to determine the weight of the two paths. After this, we can calculate the timing margins as:
$$0 \le t_{SU\_MAR} = t_{bit} + t_{del\_ref} - t_{del\_data} - \sum_m t_{err,m}$$
$$0 \le t_{H\_MAR} = 0 + t_{del\_data} - t_{del\_ref} - \sum_m t_{err,m}$$
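A minimal Python sketch of this weighted-graph calculation follows. The node names, edge delays and error term are hypothetical. It also assumes that the setup analysis uses maximum data-path delays with +t_ISU on the capture flip-flop's D-to-CK arc, and that the hold analysis uses minimum delays with the negative t_IH convention on the same arc.

```python
def path_delay(path, delays):
    """Sum the edge weights along a path given as an ordered list of nodes."""
    return sum(delays[(a, b)] for a, b in zip(path, path[1:]))


# Both paths run from the Start point (clock generator) to the End point
# (capture flip-flop CK pin); only the data path passes through the D pin.
DATA_PATH = ["clkgen", "launch_ck", "launch_q", "capture_d", "capture_ck"]
REF_PATH = ["clkgen", "capture_ck"]

# Hypothetical edge delays in ns for the setup analysis.
setup_delays = {
    ("clkgen", "launch_ck"): 1.0,
    ("launch_ck", "launch_q"): 2.5,      # CK-to-Q, maximum
    ("launch_q", "capture_d"): 1.5,      # I/O buffers plus trace, maximum
    ("capture_d", "capture_ck"): 0.75,   # t_ISU
    ("clkgen", "capture_ck"): 1.25,
}
# Hold analysis: minimum data-path delays and -1 * t_IH on the last arc.
hold_delays = dict(setup_delays)
hold_delays.update({
    ("launch_ck", "launch_q"): 1.5,      # CK-to-Q, minimum
    ("launch_q", "capture_d"): 1.0,      # minimum
    ("capture_d", "capture_ck"): -0.5,   # -1 * t_IH
})

T_BIT = 10.0
T_ERR = [0.25]

t_su_mar = (T_BIT + path_delay(REF_PATH, setup_delays)
            - path_delay(DATA_PATH, setup_delays) - sum(T_ERR))
t_h_mar = (path_delay(DATA_PATH, hold_delays)
           - path_delay(REF_PATH, hold_delays) - sum(T_ERR))
print(t_su_mar, t_h_mar)
```

Storing the graph as a dictionary keyed by node pairs makes it easy to drop arcs that are identical in both paths, since they cancel in the two differences.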
For PCB traces, it is best to perform a signal integrity eye diagram analysis to get separate absolute propagation delays for the data and reference signals. The ISI/crosstalk/SSN effects can be included in the simulation for t_pd or can be separately taken into account as t_err in the final calculations.
For single data rate synchronous systems, if the falling edge is used to launch the data and the rising edge is used for capturing, there is an inverter in the system. An inverter on a clock signal is a delay element with half the clock period delay.
I/O Interface Types
Examples of synchronous systems are single-data-rate SDRAM memories and the PCI bus. The main feature of synchronous systems is that all data launch and capture happen on the edge of the same free-running clock (reference signal). In most cases the rising edge of the clock is used, although some systems make use of both edges; for example, they launch the data at the falling edge and capture at the rising edge, or they launch/capture at both rising/falling edges (double data rate). The former can be useful for slow buses with potential hold violations, because it balances the setup/hold margins (FIGURE 12).
[FIGURE 12 OMITTED]
The clock propagation delay to each flip-flop in the system should preferably be equal. This is why clock trace lengths are matched to every device (to minimize skew). If they are not matched, then one of the margins is reduced by the difference. In some cases the master chip provides the clock to the slave (for example, a DSP to the SDRAM), while in others there is a central clock source. In the former we can take the whole clock trace length as clock skew, since length_to_master = 0, while length_to_slave = clock_trace_length. Using a feedback clock could solve this.
The clock frequency where the setup margins reach zero is the maximum safe speed. If the two chips are too far apart, then the data path might have too much delay and cause a setup violation. Decreasing the clock frequency can help with that, but a hold violation can only be fixed by proper propagation delay arrangement, for example, by a minimum t_pd_data constraint.
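As a rough sketch of both limits, the Python below finds the clock period at which a synchronous setup budget reaches zero margin and the smallest data trace delay that keeps a hold budget non-negative. The function names and numbers are hypothetical. The hold calculation assumes t_IH and clock skew act as degrading terms while t_OD_min and t_pd_data act as improving terms, which may not match every interface.

```python
def max_clock_mhz(degrading_ns, improving_ns=(), errors_ns=()):
    """Clock frequency in MHz at which the setup margin reaches zero."""
    t_clk_min = sum(degrading_ns) - sum(improving_ns) + sum(errors_ns)
    if t_clk_min <= 0:
        raise ValueError("setup budget is not limited by the clock period")
    return 1000.0 / t_clk_min


def min_data_trace_delay_ns(t_ih, t_clk_skew, t_od_min, errors_ns=()):
    """Smallest t_pd_data (ns) that keeps the hold margin non-negative."""
    return max(0.0, t_ih + t_clk_skew + sum(errors_ns) - t_od_min)


# Hypothetical setup budget in ns: four degrading terms and one error term.
f_max = max_clock_mhz(degrading_ns=[1.5, 5.0, 1.0, 0.5], errors_ns=[2.0])

# Hypothetical hold budget in ns.
t_pd_data_min = min_data_trace_delay_ns(
    t_ih=1.0, t_clk_skew=0.5, t_od_min=1.0, errors_ns=[0.25],
)
print(f_max, t_pd_data_min)
```

Note that the clock period does not appear in the hold calculation at all, which is why slowing the clock cannot repair a hold violation.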
Examples of asynchronous (strobe-based) interfaces are the microcontroller or DSP peripheral buses, with separate RD# and WR# strobes. Sometimes there is also an address-latch (ALE) strobe, in the case of a multiplexed address/data bus. The main feature of these is that the data launch happens on the falling edge of the strobe signal, and capture on the rising edge. There is a master that generates the strobes. The master chip-level logic design is always synchronous: It schedules the strobes based on its internal clock, with programmable intervals. They usually have programmable "strobe pulse width," "setup time" and "hold time." The programmable setup time is when the slave is addressed/selected, but the strobe is not yet asserted. After a strobe deassertion, the write data may remain on the bus for the programmed hold time to improve the hold margins. Often the Output Setup Time of the slave is called "Access Time" (t_acc), because it includes the time it takes to access the data inside the slave device.
[FIGURE 13 OMITTED]
Source-synchronous interfaces. For both read and write, the actual device driving the data signals generates the reference signal (strobe). Examples of source-synchronous systems are the DDRx-SDRAM memory data bus and the Intel FSB address or data bus.
[FIGURE 14 OMITTED]
[FIGURE 15 OMITTED]
For a usual synchronous system, the biggest limitation comes from the setup timing budgets. This is either a maximum trace length at a given data rate, or a maximum data rate at a given bus length. Source-synchronous interfaces don't have this problem, since the data and the reference signal propagate in the same direction for all bus transactions. This way one of the propagation delays degrades the timing budget while the other improves it, so they compensate each other. The only length limitations come from signal integrity, transaction latency or on-chip back-end interface design.
[FIGURE 16 OMITTED]
[FIGURE 17 OMITTED]
[FIGURE 18 OMITTED]
The strobe signals are usually generated by flip-flops clocked by the same clock as used for the data flip-flops, or by a 90-degree delayed clock. The data sheets usually specify skew between the data and the strobe signals, since the t_pd difference has a direct effect on the I/O static timing.
To balance the setup and hold margins, they usually align the strobe by delaying it by t_bit/2. For DDR-SDRAM memories, this is always done inside the master (memory controller) by using a DLL delay circuit on-chip. For writes, it is delayed before the strobe enters the PCB; for reads it is delayed after it enters the master chip. So the alignment on the PCB is different for the two cases. (8), (9)
The typical timing parameters for DDR-SDRAM memory data buses are:
* Controller output: t_master_skew_min and t_master_skew_max.
* Controller/memory input: t_ISU is called t_DS, and t_IH is called t_DH.
* Memory output: t_DQSQ (skew, when the data is late), and t_QHS ("Data-Hold-Skew-Factor": when the data is early) or t_QH = t_bit/2 - t_QHS.
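A hedged Python sketch of a DDR read margin check using these parameters follows. It assumes the source-synchronous rows of Table 1: t_bit/2 is available for both setup and hold, and the degrading terms are the input requirement, the signed data-to-strobe trace delay difference and the memory output skew. The function name and all picosecond values are hypothetical, not data sheet figures.

```python
def half_bit_margin(t_bit, t_input_req, pcb_skew, chip_skew, errors=()):
    """Margin = t_bit/2 minus the degrading terms and the error terms."""
    return t_bit / 2 - (t_input_req + pcb_skew + chip_skew) - sum(errors)


# Hypothetical values in ps.
T_BIT = 2500
T_DS, T_DH = 400, 400           # controller input setup / hold requirements
T_DQSQ_MAX, T_QHS = 350, 450    # memory output skew parameters
T_PD_DATA, T_PD_STR = 1020, 1000
T_ERR = [100]                   # lumped ISI/crosstalk/jitter allowance

# Setup is degraded when the data trace is slower than the strobe trace;
# hold is degraded by the same difference with the opposite sign.
rd_su_margin = half_bit_margin(
    T_BIT, T_DS, T_PD_DATA - T_PD_STR, T_DQSQ_MAX, T_ERR)
rd_h_margin = half_bit_margin(
    T_BIT, T_DH, T_PD_STR - T_PD_DATA, T_QHS, T_ERR)

# t_QH as defined in the list above.
t_qh = T_BIT / 2 - T_QHS
print(rd_su_margin, rd_h_margin, t_qh)
```

Only the difference between the two trace delays enters the result, so lengthening the data and strobe traces together leaves both margins unchanged.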
Unidirectional synchronous interfaces are synchronous interfaces in which the data are always driven by the same device. Their analysis is the same as for normal synchronous systems, except that only write setup/hold analysis is needed. Examples are the ITU-R BT.656 parallel digital video interface and the DDRx-SDRAM memory address bus.
The data and reference signal are propagating in the same direction, just like for the source-synchronous interfaces, so there is no t_pd based maximum length or frequency limitation.
For higher speed interfaces, the transmitter can delay the I/O clock to balance the setup and hold margins. If there is no alignment, then the PCB design rules may specify an offset in the data-to-clock trace length matching.
Clock-forwarding interfaces are similar to the unidirectional synchronous interfaces, but there are two of them in opposite directions. The data bus can be bidirectional or two separate unidirectional buses. Examples are the XGMII interface and the AMD Athlon System Bus.
Embedded clock interfaces are the high-speed serial systems where the clocking information is embedded into the serialized data signal by means of encoding and scrambling. Examples are PCI-Express, SATA and XAUI. The board designer does not have to deal with the timing of these systems, although there are trace-length-related PCB design rules, based on signal integrity and lane-to-lane matching. The lanes have to be matched to ensure the static timing of the SERDES to core logic interfacing in the receiver. The data sheets should specify the maximum allowable lane-to-lane skew. (10), (11)
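A lane-to-lane matching rule of this kind can be checked with a few lines of Python. This is only a sketch: the per-millimeter propagation delay, the lane lengths and the skew limit are assumed values for illustration, and a real check would take the delay from the stack-up and the limit from the data sheet.

```python
def lane_skew_ps(lane_lengths_mm, t_pd_per_mm_ps):
    """Worst-case lane-to-lane skew from routed lengths and unit delay."""
    delays = [length * t_pd_per_mm_ps for length in lane_lengths_mm]
    return max(delays) - min(delays)


# Hypothetical routed lengths of a four-lane link, in mm.
LANE_LENGTHS_MM = [50, 52, 51, 54]
T_PD_PER_MM_PS = 7      # assumed propagation delay per mm of trace
MAX_SKEW_PS = 100       # assumed limit; take the real one from the data sheet

skew = lane_skew_ps(LANE_LENGTHS_MM, T_PD_PER_MM_PS)
lanes_ok = skew <= MAX_SKEW_PS
print(skew, lanes_ok)
```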
Ed.: Part two, which covers how timing and PCB trace lengths affect different real systems, and design tricks for tuning timing, will appear next month.
(1.) PCB Interconnect Timing Analysis Calculator, www.buenos.extra.hu/iromanyok/PCB_Timing_analysis.xls.
(2.) JESD79-xx, "DDR-SDRAM Memory Standards," jedec.org.
(3.) Successful DDR2 Design, Xilinx Xcell Journal, no. 56, p. 32.
(4.) Dielectric Constant Frequency Compensation Calculator, buenos.extra.hu/iromanyok/E_r_frequency_compensation.xls.
(5.) Digital Communications Test and Measurement, ISBN-13: 978-0-13-084788-1.
(6.) DDR SDRAM Point-to-Point Simulation Process, TN4611 Micron Semiconductors application note.
(7.) Digital Communications Test and Measurement, ISBN-13: 978-0-13-084788-1.
(8.) J. Bhasker and R. Chadha, Static Timing Analysis for Nanometer Designs, Springer, April 2009.
(9.) Xilinx DDR-SDRAM controller application notes: XAPP858, XAPP802, xilinx.com/support/documentation/applcation_notes.htm.
(10.) DDR SDRAM Point-to-Point Simulation Process, TN4611 Micron Semiconductors application note.
(11.) David Robert Stauffer et al, High Speed Serdes Devices and Applications, Springer, October 2008.
ISTVAN NAGY is with Bluechip Technology (bluechiptechnology.co.uk); firstname.lastname@example.org.
|Title Annotation:||SIGNAL INTEGRITY|
|Publication:||Printed Circuit Design & Fab|
|Date:||Oct 1, 2010|