|
PLLs and Elastic Clocks
-

In synchronous chip design, designers commonly use Phase Locked Loops to deal with
asynchrony across multiple voltage and clock domains. How is Elastix different from this?
PLLs are used to generate reliable clocks with very small jitter.
If two different clock domains receive their clock from the same source (PLL), the clocks will run
at the same frequency or at frequencies that are divided from the main frequency.
In these conditions, synchronization between clock domains is a problem that can be
solved by adjusting delays and reaching timing convergence. PLLs do not deal with asynchrony;
they simply avoid it. A major problem with this approach is timing convergence for
those systems that have many components, since the adjustment of delays is very complicated
when several iterations are required to obtain a physical layout that meets all timing constraints.
Moreover, different blocks may have different clock frequencies required to satisfy their
performance constraints (e.g. interfacing to 100 Mbps Ethernet or decoding video at 30 fps),
and these may be difficult to reduce to rational multiples of some base frequency.
When using Elastic Clocks, there are various ways to synchronize different modules.
One of them is using tightly-coupled handshakes that make the circuit run at the speed
of the slowest clock determined by the operating conditions at any time.
A different way is to use loosely-coupled handshakes that enable the modules to run at their natural
frequencies and only communicate when required through elastic channels.
In both cases, timing convergence is much easier since the Elastic Clocks must
only be adjusted to talk to their neighbors and there is no need to adjust to a global frequency.
Timing constraints are local and can be met in a modular way.
Loosely-coupled handshakes can also be used by synchronous systems,
but they incur significant communication latencies due to the re-synchronization flip-flops.
Elastix provides an efficient low-latency communication scheme for elastic SoCs.
-

In a multi-core system with different Elastic Clocks for each core, would providing separate
PLLs to each core be equivalent to using Elastic Clocks?
Elastic Clocks can adapt immediately and automatically to the variability of the operating
conditions (e.g. voltage and temperature) of each core. Therefore, the frequency of the
Elastic Clocks can vary continuously so that the cores can run at the best possible performance
at any time. In contrast, a PLL can only run at one frequency at a time. The lock time taken by a PLL
to switch from one frequency to another is of the order of hundreds of microseconds.
During this time, the system must be halted and no computations can be performed, thus
having a negative impact on performance.
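The cost of relocking can be put into rough numbers. The sketch below is illustrative only: the function name and the example figures (a 200 microsecond lock time, 100 frequency changes per second) are assumptions chosen for the example, not Elastix or PLL vendor data.

```python
def relock_overhead(lock_time_s: float, switches_per_s: float) -> float:
    """Fraction of wall-clock time lost while a PLL relocks.

    Assumes the system is fully halted for the whole lock time at
    every frequency switch, as described in the answer above.
    """
    if lock_time_s < 0 or switches_per_s < 0:
        raise ValueError("inputs must be non-negative")
    # Cap at 1.0: the system cannot be halted for more than all of the time.
    return min(1.0, lock_time_s * switches_per_s)


# Hypothetical example: 200 us lock time, 100 frequency changes per second.
overhead = relock_overhead(200e-6, 100)
print(f"time lost to relocking: {overhead:.1%}")
```

The overhead grows linearly with the switching rate, which is why fine-grained frequency changes are impractical with a PLL under this assumption.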
Variability
-

Elastix uses matched delays to design the Elastic Clocks. What are the required
delay margins to guarantee a correct behavior under PVT variations?
The matched delays determine the frequency of the Elastic Clocks. However, unlike
synchronous circuits, the frequency of an Elastic Clock changes according to
the operating conditions (voltage, temperature). Given that the variability of
the Elastic Clocks is highly correlated with the variability of the logic in the
circuit, the required delay margins must be calculated assuming that the operating
conditions for the clocks are similar to those of the logic.
For safe operation under variability, the matched delays must be longer than
the delays in the logic. To define these margins, Elastix uses the same timing
analysis tools as those used for the design of synchronous circuits.
Traditionally, timing analysis was performed using two corners (best and worst).
With the increase of variability, more accurate approaches considering on-chip
variability (OCV) have emerged. Several vendors are supporting multi-corner OCV
in their analysis tools. Recently, statistical timing analysis has also been incorporated.
Elastix uses an approach and a set of parameters that are similar to those used
for conventional timing analysis and synthesis in synchronous designs.
The matched delays are synthesized using SDC constraints that guarantee a correct behavior.
If, for example, a designer uses timing analysis with 10% OCV, the same
parameters are used for the corresponding elastic design.
Indeed, choosing the appropriate models for variability has a crucial impact
on performance and yield. By choosing conservative OCV values (e.g. 30%), a very
high yield might be obtained at the expense of losing performance. On the other
hand, if a 5% OCV were chosen, the performance of the circuit would increase
at the expense of yield. Any increase in performance can also be
translated into power savings. The most appropriate value for OCV will depend
on the specific priorities for each design.
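How an OCV setting turns into a delay margin can be sketched as follows. This is a simplified model, not the Elastix sign-off method: it assumes a symmetric derate in which the logic path may be slower by the OCV fraction while the matched delay may be faster by the same fraction. The function name and the 1.0 ns critical path are hypothetical.

```python
def min_matched_delay(logic_delay_ns: float, ocv: float) -> float:
    """Smallest nominal matched delay that still covers the logic.

    Simplified assumption: the logic may be slower by `ocv` while the
    matched delay may be faster by `ocv`, so we require
        matched * (1 - ocv) >= logic * (1 + ocv).
    """
    if not 0.0 <= ocv < 1.0:
        raise ValueError("ocv must be in [0, 1)")
    return logic_delay_ns * (1.0 + ocv) / (1.0 - ocv)


# Hypothetical 1.0 ns critical path under the OCV values mentioned above.
for ocv in (0.05, 0.10, 0.30):
    delay = min_matched_delay(1.0, ocv)
    print(f"OCV {ocv:.0%}: matched delay >= {delay:.3f} ns")
```

Under this model a larger OCV value lengthens the matched delay, and hence the Elastic Clock period, which is the performance cost of a higher yield.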
Voltage and frequency scaling
-

In synchronous chip design, designers can pre-scale and divide down the high-speed
system clock for different portions of the SoC that only need to run at some reduced
frequency in order for the total chip to compute correctly. In doing so, the
designer has the opportunity to reduce the power dissipation of the chip by
running some of the SoC blocks only at the reduced frequency that is necessary.
How is the Elastix technology different from that, and why would designers want
to use it when they can simply divide down their synchronous clock?
In synchronous SoCs, the frequencies of the components are usually derived from
the frequency of the master clock. In this way, synchronization can be implemented
reliably. However, only a limited spectrum of frequencies and voltages can be selected
since the frequencies of all components must be compatible for a robust synchronization.
Other schemes accept the synchronization between totally independent clocks
(for example, using asynchronous FIFOs or re-synchronization flip-flops).
In this case, the communication incurs significant latency penalties. Additionally,
the frequency and the voltage must be adjusted to accommodate the worst-case operating conditions.
Elastix provides a solution that enables each module to work at its natural frequency.
The frequency adapts dynamically to the operating conditions, which can be close
to the typical conditions during most of the running time. In case of voltage
scaling, there is no need to explicitly change the frequency of the system.
Instead, it changes automatically according to the applied voltage. Therefore,
voltage scaling does not require the intervention of external agents to adjust
the frequency, nor a safe scheme that changes the voltage before changing the
frequency (or vice-versa), shuts off the clock during frequency changes, and so on.
The fact that the voltage and the frequency can be fine-tuned at any time instant
makes elasticity suitable to reach near-optimal energy configurations for each application.
-

How would a circuit actually make decisions about voltage and frequency scaling?
The decisions about scaling can be made at the circuit level (hardware implementation)
or at the system level (software implementation). The criteria for making these
decisions may be diverse and will depend on the type of application and environment.
We can analyze two examples.
Let us assume that a device is producing data in a pipeline. The data is sent to
a FIFO and is received by a consumer. Depending on the relative speed of the
producer and the consumer, the FIFO may spend most of its time empty or full.
A voltage/frequency control circuit might decide to raise/lower the voltage depending
on the average occupancy of the FIFO. In this way, the speed and power of the producer
could be automatically adjusted to the speed of the consumer.
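The FIFO example above can be sketched as a single control step. This is a minimal illustration, not the Elastix control circuit: the function name, the occupancy thresholds, the voltage range, and the step size are all assumed values.

```python
def next_voltage(v_now: float, avg_occupancy: float,
                 v_min: float = 0.7, v_max: float = 1.0,
                 step: float = 0.01,
                 low: float = 0.25, high: float = 0.75) -> float:
    """One control step for the producer's supply voltage.

    avg_occupancy is the average FIFO fill level, from 0.0 (empty)
    to 1.0 (full). All thresholds and limits are hypothetical.
    """
    if avg_occupancy > high:
        # FIFO mostly full: the producer is ahead of the consumer, slow it down.
        v_next = v_now - step
    elif avg_occupancy < low:
        # FIFO mostly empty: the consumer is starving, speed the producer up.
        v_next = v_now + step
    else:
        # Occupancy is in the comfortable band: leave the voltage alone.
        v_next = v_now
    # Keep the result inside the allowed supply range.
    return min(v_max, max(v_min, v_next))


v = 0.90
for occupancy in (0.9, 0.9, 0.5, 0.1):
    v = next_voltage(v, occupancy)
    print(f"occupancy {occupancy:.1f} -> {v:.2f} V")
```

Because the Elastic Clock frequency follows the voltage, the controller only needs to act on the voltage; no separate frequency command appears in the loop.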
A second example could illustrate voltage/frequency scaling at the system level.
In an SoC, different components might be collaborating in the execution of a task.
This task might require the execution of several subtasks in different components.
The deadlines of each subtask could define the urgency with which each component
should run. With Elastic Clocks, each component could run at a different frequency
(and voltage) to meet the required deadlines just-in-time, thus consuming
near-optimal energy. The frequencies could be adjusted without any constraints
related to the frequencies of the other components.
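The system-level example can be illustrated by computing, for each component, the lowest frequency that still meets its deadline. The component names, cycle counts, and deadlines below are hypothetical, and the sketch assumes that the cycle count of each subtask is known in advance.

```python
def required_frequency_hz(cycles: int, deadline_s: float) -> float:
    """Lowest clock frequency that finishes `cycles` within the deadline."""
    if deadline_s <= 0:
        raise ValueError("deadline must be positive")
    return cycles / deadline_s


# Hypothetical subtasks: component name -> (cycles needed, deadline in seconds).
subtasks = {
    "dsp": (2_000_000, 0.010),
    "cpu": (5_000_000, 0.050),
}

for name, (cycles, deadline) in subtasks.items():
    f = required_frequency_hz(cycles, deadline)
    print(f"{name}: run at about {f / 1e6:.0f} MHz")
```

Each frequency is computed independently of the others, which mirrors the absence of cross-component frequency constraints described above.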
-

What are the differences between Adaptive Voltage Scaling (AVS) and Elastix
Voltage Scaling (EVS)?
With AVS, the circuit adapts its voltage to the operating conditions in such a way
that the delays of the logic can still be accommodated by the clock period.
The goal in EVS is similar: adjusting the voltage to achieve a given throughput.
However, the combination of elasticity with voltage scaling can provide superior
voltage reductions and new opportunities for system-level power management.
In AVS, the clock period is defined externally and is totally uncorrelated with
the delays of the logic. A guard band must be conservatively defined to prevent
the circuit from exceeding the clock period under any circumstances. In EVS, the
Elastic Clock delay is highly correlated with the internal logic of the circuit.
The guard band is already defined for the matched delays that generate the
Elastic Clock according to the existing correlation. No other guard bands are
required to synchronize with any external delay reference.
Thus power consumption is lowered through the use of smaller guard bands.
Additionally, and more importantly, the combination of Elastic Clocks with the
proprietary low-latency Elastix interconnects (FIFOs) enables the decoupling
of multiple modules working with different frequencies at SoC level. Hence,
all modules can run at their natural frequencies without incurring synchronization
latencies. The flexibility of interconnecting multiple modules at different
voltages and frequencies offers new opportunities to implement efficient power
management techniques.
-

Does Elastix have technology to reduce voltage if performance throughput
requirements go down? Essentially, can a circuit be slowed down below the
reference clock speed when needed?
The Elastix technology can be combined with voltage regulators to adjust the
voltage at any moment. The voltage can be scaled in such a way that the
frequency of the Elastic Clocks is adjusted according to the performance
requirements. The adjustment can be done automatically by hardware, in case
the performance requirements can be measured by specific circuits (e.g. measuring the traffic on a network).
Otherwise, the adjustment can be done at the system level by software
(e.g. by estimating the workload of the system). In either case, the adjustment
of the voltage is automatically transformed into an adjustment of the frequency.
In this way, DVFS can be directly implemented without requiring specific
actions to change the frequency of the system.
The response time to adapt the system to some specific performance requirements
is determined by the response time of the voltage regulator. This time can be
of the order of several microseconds.
Power consumption
-

How is power consumption distributed over time?
The Elastix technology combined with voltage scaling techniques can effectively
reduce the time-averaged power consumption. The Elastic Clocks keep track of the
operating conditions of the circuit and adjust the voltage to meet the required
performance at any time. Consequently, power consumption may vary over time but
the average power consumption is significantly reduced.
-

Is the power gain obtained across the entire lot? If not, what is the expected power
distribution in a lot?
Regardless of the technology used to design a circuit, inter-die variability
may have a non-uniform impact on different ICs. This phenomenon is not different
from what occurs in synchronous ICs. Also, in the case of elastic circuits, power
consumption will depend on the specific process parameters of each individual IC,
thus providing a variety of devices with different power and performance characteristics.
Some companies that produce a large number of ICs can afford to bin and classify
them according to their particular power and performance characteristics.
Most companies define the power and performance requirements and discard those ICs
not meeting them. These requirements are defined according to the desired yield.
In general, the distribution of power consumption of circuits using Elastic Voltage Scaling (EVS)
will be similar to the distribution of worst-case operating frequencies that
can be derived by performing at-speed testing (even though the power gain is
obtained automatically, without the need for binning), since EVS transforms a
performance gain into a power gain.
-

Is running a design at 200 MHz for a longer period of time better than running
the design at 1 GHz for a shorter period of time from the power point of view?
If the same voltage is used in both cases, then the dynamic power at 200 MHz
is 1/5th of the power at 1 GHz. However, the energy consumed to run the application
is the same. The static power (leakage) would be the same in both cases.
However, power can be significantly reduced (quadratically) if voltage scaling
is applied. If V1 and V2 are the two voltages used in each case, the dynamic power
reduction will be proportional to (V1/V2)^2 approximately. The reduction in voltage
will also result in a reduction of static power.
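The comparison can be checked numerically with the standard switching-power model P = C * V^2 * f. The sketch below is illustrative: the effective capacitance, the cycle count, and the lower voltage of 0.7 V (assumed to be sufficient to sustain 200 MHz) are hypothetical values, and leakage is left out.

```python
def dynamic_power(c_eff: float, v: float, f: float) -> float:
    """Switching power in watts: P = C_eff * V^2 * f."""
    return c_eff * v ** 2 * f


def dynamic_energy(c_eff: float, v: float, cycles: float) -> float:
    """Switching energy in joules for a fixed amount of work.

    E = P * t with t = cycles / f, so the frequency cancels out.
    """
    return c_eff * v ** 2 * cycles


C_EFF = 1e-9      # hypothetical effective switched capacitance (farads)
CYCLES = 1e9      # hypothetical length of the application (clock cycles)

for label, v, f in (("1 GHz at 1.0 V", 1.0, 1e9),
                    ("200 MHz at 1.0 V", 1.0, 200e6),
                    ("200 MHz at 0.7 V", 0.7, 200e6)):
    p = dynamic_power(C_EFF, v, f)
    e = dynamic_energy(C_EFF, v, CYCLES)
    print(f"{label}: {p:.3f} W, {e:.3f} J")
```

At equal voltage the two runs differ in power but not in dynamic energy; only the lower-voltage run reduces the energy, by the squared voltage ratio.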
-

How can the power gains of an elastic circuit be estimated?
Given that the transformations performed in a synchronous circuit to make it elastic
are very minor, the parameters to estimate power consumption are practically identical:
the total load capacitance and the switching activity. We must also consider that other
power reduction techniques (e.g., clock gating, multiple Vt, etc.) are orthogonal to the
Elastix technology, i.e., they can be applied in the same way as in a synchronous circuit.
The main source of power savings using the Elastix technology is voltage scaling.
Therefore, let us assume that we have an estimation of the power consumption of
the synchronous circuit working at a nominal voltage Vsync. The power consumption
of the elastic circuit working at a voltage Velx can be estimated by applying the
well-known equations for dynamic and leakage power.
For example, if Pdyn(V) = kV^2, and we know that Vsync = 1 V and Velx = 0.85 V, then we
will have a dynamic power reduction of 28% (calculated as 1 - 0.85^2). A similar calculation
could be done using the equation for leakage power. Similar numbers could also be obtained
by simulation using the same test benchmarks for the synchronous and elastic circuits.
Therefore, we can estimate a relative power reduction with regard to the synchronous
circuit based on the voltage reduction obtained by voltage scaling.
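The dynamic-power estimate above can be reproduced in a few lines. This sketch only covers the quadratic dynamic term Pdyn(V) = kV^2 from the text; the function name is an assumption, and leakage would need its own model.

```python
def dynamic_power_reduction(v_sync: float, v_elx: float) -> float:
    """Relative dynamic power saving when moving from v_sync to v_elx.

    Uses Pdyn(V) = k * V^2, so the constant k cancels in the ratio.
    """
    if v_sync <= 0:
        raise ValueError("v_sync must be positive")
    return 1.0 - (v_elx / v_sync) ** 2


saving = dynamic_power_reduction(v_sync=1.0, v_elx=0.85)
print(f"dynamic power reduction: {saving:.0%}")  # roughly 28%, as in the text
```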
-

What are the power gains obtained by using voltage scaling with the Elastix technology?
In our experience, if a chip normally works at conditions around the
typical corner, the power savings can be around 30-40% when compared to the synchronous
counterpart running with a nominal voltage to operate at worst-case conditions.
If a die manifests some process variability closer to the best-case corner, the power
savings will be larger. On the other hand, if the process variability delivers a die
closer to the worst-case corner, the power savings will be smaller. The Elastic Clocks
will allow every die to automatically run at the best voltage according to the process variability.
Variations in temperature will also have an impact on the dynamic variation of
the voltage. However, this impact will be much smaller than the one determined
by process variations.
Performance
-

Can Elastic circuits provide more performance in highly-parallel systems?
Definitely yes. Highly-parallel systems (e.g. many-core systems) have a large
amount of processing units. Even though each processing unit may be in a different
clock domain, a PLL typically provides the clock for all the components of the system.
The frequency of this PLL must be conservatively defined to account for the variability
of all processing units. With this approach, many of the "fast" units must run their
tasks at the speed determined by the "slow" units.
By making a system elastic, the processing units can run at their natural frequency
independently from the frequency of the other units. With this technology,
the average throughput of the system will have a significant improvement. This
feature may be especially interesting for systems showing a high degree of concurrency.
Two typical examples are multi-core systems running multiple threads
in data servers, or Graphics Processing Units (GPUs).
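The throughput argument can be illustrated with a toy model. It assumes fully independent tasks whose throughput is proportional to clock frequency, and the per-unit natural frequencies are hypothetical values.

```python
def sync_throughput(natural_freqs: list[float]) -> float:
    """All units share one clock, limited by the slowest unit.

    `natural_freqs` must be non-empty.
    """
    return len(natural_freqs) * min(natural_freqs)


def elastic_throughput(natural_freqs: list[float]) -> float:
    """Each unit runs at its own natural frequency."""
    return sum(natural_freqs)


# Hypothetical natural frequencies of four processing units, in GHz.
freqs = [0.9, 1.0, 1.1, 1.2]
gain = elastic_throughput(freqs) / sync_throughput(freqs) - 1.0
print(f"aggregate throughput gain: {gain:.1%}")
```

The wider the spread between "fast" and "slow" units, the larger the gain in this model; with identical units the two throughputs coincide.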
-

Can Elastic circuits guarantee a minimum performance from a specification point
of view? Would it be better and more predictable than the current worst-case performance?
Because of the margins required to design the matched delays, the guaranteed
worst-case performance could be slightly worse than the one in a synchronous design.
However, the amortized average performance is significantly better.
Elastic Clocks are reliable from the point of view of the internal synchronization
of the chip. However, Elastic Clocks may manifest some cycle-by-cycle non-deterministic timing.
For example, the frequency of an elastic circuit could oscillate by a small percentage
around an average frequency. This jitter potentially has a highly beneficial effect
on power integrity, reducing Vdd and ground noise, and electro-magnetic emission,
since it lowers the harmonics by spreading the edges over time.
Even though the cycle-by-cycle behavior can suffer some non-deterministic jitter,
an average frequency can still be guaranteed, by using an external reference clock
from which the Elastic Clocks can deviate by a bounded amount. As an example,
Elastic Clocks could run at an average frequency with an error smaller than 1%,
even though the phase jitter could be up to 3%.
Synchronous and asynchronous interfaces
-

Can elastic blocks communicate with synchronous blocks?
Yes, since a synchronous block can be regarded as an asynchronous one with
a fixed frequency handshake signal (the clock). The communication between two
blocks, one synchronous and the other elastic, can be done using two different schemes:
- A tightly-coupled handshake in which both blocks run at the same frequency
and the synchronous clock is treated as a normal handshake signal by the asynchronous
block. In this scheme, it is required that the asynchronous block has a worst-case
period not longer than the synchronous clock period.
- A loosely-coupled handshake in which both blocks can run at independent frequencies.
In this scheme, synchronization mechanisms are required to safely transmit data from
one block to the other (e.g. two flip-flop synchronizers or asynchronous FIFOs).
This scheme implies a communication latency penalty. In case of using asynchronous FIFOs,
the penalty is produced by the synchronization of the full/empty signals.
-

Can different elastic blocks communicate and work at different frequencies?
Yes, and this is one of the main values of the Elastix technology. Unlike in the
synchronous case, elastic blocks working at different frequencies can communicate
using Elastix FIFOs. This is one of the IPs of Elastix that provides low-latency
communication between different clock domains. This approach opens the door for
efficient power management techniques trading-off voltage and frequency without
synchronization constraints.
-

Can elastic circuits interface with standard buses?
Yes, in various ways. If the designer prefers to maintain the bus running synchronously,
a synchronous-to-asynchronous interface using synchronizers or FIFOs can be included.
A more efficient option is to make the bus elastic. From the design point of view,
the logic of a standard bus (including the arbiters, mux/demuxes, pipeline registers,
control, etc.) is not different from the logic of any synchronous circuit. In that respect,
the Elastix technology can be directly applied to the bus to obtain an elastic
version of the standard bus with the same behavior.
Design, verification and test
-

How different is the Elastix design flow from the classical ASIC design flows?
The Elastix design flow starts from a classical design flow to which a few minor
adjustments are added. Even though elasticity can be introduced at various levels
of the flow, usually this happens after placement has been performed. Up to that
point, the circuit is still genuinely synchronous.
The circuit is then de-synchronized by simple ECO transformations. Initially, some
latches are added in the logic to make the data flow elastic. Next, the clock trees
are synthesized. Finally, some elastic controllers are added to generate the
Elastic Clocks that will be connected to the clock trees. After de-synchronization,
the physical layout is synthesized using conventional EDA flows.
Throughout this process, the classical timing constraints are replaced by another
set of equivalent constraints related to the Elastic Clocks. These constraints
are preserved until timing sign-off.
Therefore, the Elastix design flow uses the same EDA tools as a conventional
design flow (RTL synthesis, clock tree synthesis, placement, routing, timing analysis, etc.).
The Elastix flow only incorporates ECO transformations on top of the existing flow.
-

Can the Elastic Clocks be clock gated?
Yes. Given the fact that the circuit is practically identical to its synchronous counterpart,
the same clock gating structures can be inherited and used in the elastic circuit.
Even though the Elastic Clocks continuously generate clock pulses (like a PLL),
these pulses can be intercepted and blocked before being transmitted to the sequential
elements of the circuit.
The fact that the original synchronous design already has clock gating structures
is transparent and orthogonal to the substitution of rigid clocks by Elastic Clocks.
-

Can Elastic Clocks take advantage of clock skew?
Elastic Clocks do naturally use a dynamically adjusted clock skew. The relationship
between the Elastic Clocks of neighboring blocks inherently uses clock skew to maximize
the performance of the circuit. Additionally, the elasticity of the clocks allows
reducing the requirements for buffer insertion to solve hold violations.
-

How are elastic circuits tested?
Elastic circuits are tested in the very same way as synchronous circuits.
The fact that the circuit looks like its synchronous counterpart makes it possible
to use the same test structures (e.g., scan chains, BIST) and patterns that were
initially designed for the synchronous circuit.
Additionally, the elastic circuit requires some specific logic to test the Elastic Clocks.
This adds a negligible amount of extra logic and a small set of extra test patterns.
-

How are the elastic circuits verified?
Elastix provides a complete flow to ensure the correctness of the circuit.
An elastic circuit can be validated using a simulation setup similar to the one of
synchronous circuits, reusing the same testbenches and assertions.
Elastix also provides a flow that guarantees full coverage to catch
both functional and timing-related bugs in the controllers and their interaction with
the datapath. This flow is based on formal verification tools and is performed in two steps:
- Functional verification, aimed at proving the functional equivalence of the circuit with
regard to the original synchronous one. Functional equivalence can be proven with existing
equivalence checking tools. Additionally, the handshake protocol implemented by the Elastic Clocks
is verified using Assertion-Based Verification tools, where the assertions guarantee the
correct generation of clock events.
- Timing sign-off, aimed at checking that data are captured at the sequential
elements without violating any setup/hold constraint. A set of SDC constraints
is generated and verified for each design.
|