Welcome to SOFA documentation!

Introduction

Skywater Opensource FpgA (SOFA) is a fully open-source embedded FPGA IP library, covering everything from the architecture description to production-ready layouts. As illustrated in Fig. 1, SOFA IPs are designed using the Skywater 130nm PDK, the OpenFPGA framework and Synopsys IC Compiler II. The design flow for each IP runs within 24 hours.

All the SOFA FPGAs are designed to interface with the Caravel SoC. We aim to empower embedded applications with a low-cost design approach and a high-density architecture.

24-hour FPGA IP development: from PDK to production-ready layout

HD FPGAs

Device Comparison

The High Density (HD) FPGAs are embedded FPGAs built with the Skywater 130nm High Density Standard Cell library (sky130_fd_sc_hd).

Logic capacity of High Density (HD) FPGA IPs

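============================  =======  =========  ========
Resource/Capacity             SOFA HD  QLSOFA HD  SOFA CHD
============================  =======  =========  ========
Look-Up Tables [1]            1152     1152       1152
Flip-flops                    2304     2304       2304
Soft Adders [2]               N/A      1152       1152
Routing Channel Width [3]     40       60         60
Max. Configuration Speed [4]  50 MHz   50 MHz     50 MHz
Max. Operating Speed [4]      50 MHz   50 MHz     50 MHz
User I/O Pins [5]             144      144        144
Max. I/O Speed [4]            33 MHz   33 MHz     33 MHz
Core Voltage                  1.8 V    1.8 V      1.8 V
============================  =======  =========  ========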

DC and AC Characteristics

Typical AC Characteristics

Typical AC characteristics for FPGA I/Os

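===============  ===========================================  ===  ===  =====
Symbol           Description                                  Min  Max  Units
===============  ===========================================  ===  ===  =====
V_in Overshoot   Maximum allowed overshoot voltage for V_in   TBD  TBD  V
V_in Undershoot  Maximum allowed undershoot voltage for V_in  TBD  TBD  V
I_VDD_core       Quiescent VDD_core supply current            TBD  TBD  mA
I_VDD_io         Quiescent VDD_io supply current              TBD  TBD  mA
===============  ===========================================  ===  ===  =====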

SOFA HD

Architecture

Floorplan

Fig. 5 shows an overview of the architecture of the embedded FPGA fabric. The FPGA follows a homogeneous architecture which contains only a single type of tile in the center fabric. I/O tiles are placed at the boundary of the FPGA to interface with GPIOs and RISC-V processors (see details in I/O Resources).

Tile-based FPGA architecture

Tiles

The FPGA architecture follows a tile-based organization to exploit fine granularity in physical design. Three types of tiles are built:

FPGA tile type and functionalities

====  ========  ================================================================
Type  Capacity  Description
====  ========  ================================================================
CLB   144       Each CLB tile consists of a Configurable Logic Block (CLB), an
                X-direction Connection Block (CBx), a Y-direction Connection
                Block (CBy) and a Switch Block (SB). This is the majority tile
                across the fabric, used to implement logic and registers.
IO-A  36        The type-A I/O is a low-density I/O tile designed mainly to
                interface with the GPIOs of the SoC. Each I/O-A tile consists
                of 1 digital I/O cell.
IO-B  12        The type-B I/O is a high-density I/O tile designed mainly to
                interface with the Wishbone interface and logic analyzer of
                the SoC. Each I/O-B tile consists of 9 digital I/O cells.
====  ========  ================================================================
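The tile counts above can be cross-checked against the device comparison table. The sketch below is illustrative only: it combines the tile capacities with the CLB composition described in the Configurable Logic Block section (8 logic elements per CLB, each with one fracturable LUT and two flip-flops). The function and constant names are hypothetical and not part of any SOFA tool.

```python
# Tile counts and per-tile contents as stated in this documentation.
TILES = {"CLB": 144, "IO-A": 36, "IO-B": 12}
LES_PER_CLB = 8                      # logic elements in each CLB
LUTS_PER_LE = 1                      # one fracturable 4-input LUT per LE
FFS_PER_LE = 2                       # two D-type flip-flops per LE
IO_CELLS_PER_TILE = {"IO-A": 1, "IO-B": 9}


def fabric_capacity(tiles=TILES):
    """Derive LUT, flip-flop and I/O counts from the tile counts."""
    logic_elements = tiles["CLB"] * LES_PER_CLB
    io_pins = sum(tiles[name] * cells for name, cells in IO_CELLS_PER_TILE.items())
    return {
        "luts": logic_elements * LUTS_PER_LE,
        "flip_flops": logic_elements * FFS_PER_LE,
        "io_pins": io_pins,
    }


if __name__ == "__main__":
    print(fabric_capacity())
```

Under these assumptions the derived figures agree with the 1152 LUTs, 2304 flip-flops and 144 user I/O pins listed in the device comparison.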

Routing Architecture

The routing architecture is based on uni-directional routing tracks, which are interconnected by routing multiplexers. Fig. 6 illustrates the detailed organization of the routing architecture.

Detailed routing architecture

The routing architecture consists of the following types of routing tracks:

  • Length-1 wires (L1 wires), which hop over 1 logic block (including I/O blocks)

  • Length-2 wires (L2 wires), which hop over 2 logic blocks (including I/O blocks)

  • Length-4 wires (L4 wires), which hop over 4 logic blocks (including I/O blocks)

Each tile includes two routing channels, i.e., the X-direction routing channel and the Y-direction routing channel, providing horizontal and vertical connections to adjacent tiles. Each routing channel consists of 40 routing tracks. See details in Table 5.

Routing track distribution of SOFA HD FPGA

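==========  ============================
Track type  Number of tracks per channel
==========  ============================
Length-1    4 (10%)
Length-2    4 (10%)
Length-4    32 (80%)
Total       40
==========  ============================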
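As a quick illustration of how the percentages translate into track counts, the sketch below splits a channel width according to the 10% / 10% / 80% shares listed in the table. The helper name and the rule of rejecting widths that do not divide evenly are assumptions made for this example, not behaviour of the SOFA or OpenFPGA tools.

```python
# Share of each wire length in a routing channel, in percent,
# taken from the track-distribution table.
TRACK_SHARE_PERCENT = {"Length-1": 10, "Length-2": 10, "Length-4": 80}


def track_distribution(channel_width):
    """Split a channel width into per-length track counts."""
    counts = {}
    for name, percent in TRACK_SHARE_PERCENT.items():
        tracks, remainder = divmod(channel_width * percent, 100)
        if remainder:
            # Assumption for this sketch: only evenly divisible widths are accepted.
            raise ValueError(f"{channel_width} tracks do not split evenly for {name}")
        counts[name] = tracks
    return counts


if __name__ == "__main__":
    for width in (40, 60):
        print(width, track_distribution(width))
```

The same helper applies to the 60-track channels of the other HD FPGAs, since the documented shares are identical.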

Scan-chain

There is a built-in scan-chain in the FPGA which connects the sc_in and sc_out ports of CLBs in a chain (see details in Scan Chain), as illustrated in Fig. 7.

When the Test_en signal is active, users can

  • overwrite the contents of all the D-type flip-flops in the FPGA by feeding signals to the SC_HEAD port

  • read back the contents of all the D-type flip-flops in the FPGA through the SC_TAIL port.
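The load and read-back behaviour described above can be pictured with a small behavioural model. This is a sketch under stated assumptions: the flip-flops form a single shift register from SC_HEAD to SC_TAIL, one bit moves per clock while test mode is active, and the class and function names are hypothetical. The real chain length, bit ordering and clocking are defined by the fabric netlist, not by this example.

```python
class ScanChain:
    """Behavioural model of flip-flops linked head-to-tail in one chain."""

    def __init__(self, length):
        self.ffs = [0] * length  # index 0 is nearest SC_HEAD

    def shift(self, sc_head_bit):
        """One clock in test mode: shift a bit in, return the bit leaving SC_TAIL."""
        sc_tail_bit = self.ffs[-1]
        self.ffs = [sc_head_bit] + self.ffs[:-1]
        return sc_tail_bit


def shift_sequence(chain, bits):
    """Feed a bit sequence into SC_HEAD and collect what appears on SC_TAIL."""
    return [chain.shift(bit) for bit in bits]


if __name__ == "__main__":
    chain = ScanChain(4)
    shift_sequence(chain, [1, 0, 1, 1])          # overwrite the flip-flops
    print(chain.ffs)
    print(shift_sequence(chain, [0, 0, 0, 0]))   # read back through SC_TAIL
```

In this model the bits read back on SC_TAIL appear in the same order in which they were fed to SC_HEAD, which is the usual property of a serial scan chain.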

Built-in scan-chain across FPGA

I/O Resources

Pin Assignment

The High-Density (HD) FPGA IP has 144 data I/O pins as shown in Fig. 8.

Among the 144 I/Os,

  • 29 external I/Os are accessible through the Caravel SoC’s General-Purpose I/Os (GPIOs).

  • 115 internal I/Os are accessible through the Caravel SoC’s logic analyzer and Wishbone interfaces, which are controlled by the RISC-V processor. See Debug Mode and Accelerator Mode for details.

Warning

Set all unused GPIOs to input mode so that the FPGA cannot drive spurious signals that might damage other SoC components.

Note

The connectivity of the 115 internal I/Os can be switched through a GPIO of Caravel SoC. As a result, the FPGA can operate in different modes.

Warning

The internal I/O pins will drive either the Wishbone interface or the logic analyzer, following the truth table of the mode-switch bit in Fig. 8.

I/O arrangement of FPGA IP

I/O arrangement of High-Density (HD) FPGA IP: switchable between logic analyzer and wishbone bus interface

External I/Os

A SOFA HD FPGA IP contains 37 external I/O pins, including 29 data I/Os and 8 control I/Os.

Full details are summarized in the following table.

SOFA HD FPGA I/O usage and sizes

=========  ======================================================================  ===========
I/O Type   Description                                                             No. of Pins
=========  ======================================================================  ===========
Data I/O   Datapath I/Os of FPGA fabric                                            29
CLK        Operating clock of FPGA core                                            1
PROG_CLK   Clock used by configuration protocol to program FPGA fabric             1
CCFF_HEAD  Input of configuration protocol to load bitstream                       1
CCFF_TAIL  Output of configuration protocol to read back bitstream                 1
TEST_EN    Activate the test mode of FPGA fabric                                   1
SC_HEAD    Input of built-in scan-chain to load data to flip-flops of FPGA fabric  1
SC_TAIL    Output of built-in scan-chain to read back flip-flops from FPGA fabric  1
IO_ISOL_N  Active-low signal to enable I/O datapath isolation from external ports  1
Total                                                                              37
=========  ======================================================================  ===========

Accelerator Mode

When the Wishbone interface is enabled, the FPGA can operate as an accelerator for the RISC-V processor. Fig. 9 illustrates the detailed I/O arrangement for the FPGA, where the Wishbone bus signals are connected to fixed FPGA I/O locations.

Note

Not all of the 115 internal I/Os are used by the Wishbone interface. In particular, I/O[21:29] are not connected.

Warning

The FPGA does not contain a Wishbone slave IP. Users have to implement a soft Wishbone slave when using the FPGA as an accelerator.

I/O arrangement of FPGA IP when interfacing wishbone bus

I/O arrangement of High-Density (HD) FPGA IP when interfacing wishbone bus

Debug Mode

When the logic analyzer interface is enabled, the FPGA can operate in debug mode, in which its internal signals can be read back through the registers of the RISC-V processor. Fig. 10 illustrates the detailed I/O arrangement for the FPGA, where the logic analyzer signals are connected to fixed FPGA I/O locations.

Note

The logic analyzer is 128 bits wide, of which 115 bits can drive or be driven by the FPGA I/Os. The other 14 bits are connected to internal nodes of the FPGA fabric to monitor critical signal activity for debugging purposes.

Warning

If the logic analyzer is not used, please configure both the management SoC and the FPGA as follows:

  • all the I/Os are set to input mode.

  • all the output ports are pulled down to logic ``0``.

I/O arrangement of FPGA IP when interfacing logic analyzer

I/O arrangement of High-Density (HD) FPGA IP when interfacing logic analyzer

Configurable Logic Block

Generality

Each Configurable Logic Block (CLB) consists of 8 Logic Elements (LEs) as shown in Fig. 11. All the pins of the LEs are directly wired to CLB pins without a local routing architecture. Feedback connections between LEs are implemented by the global routing architecture outside the CLBs.

Configurable Logic Block schematic

Multi-mode Logic Element

Physical Implementation

As shown in Fig. 12, each Logic Element (LE) consists of

  • a fracturable 4-input Look-Up Table (LUT)

  • two D-type Flip-Flops (FF)

Logic element schematic

Detailed schematic of a logic element

The LE can operate in different modes to map logic functions efficiently.

Operating mode: LUT4 + FF

The logic element can operate in the Look-Up Table (LUT) + Flip-flop (FF) mode, like many classical FPGA logic elements. As depicted in Fig. 13, the fracturable LUT operates as a single-output 4-input LUT and the upper FF is used to implement sequential logic.

The operating mode is designed to efficiently implement 4-input functions.

Logic element schematic

Resource usage of the logic element operating in LUT4 + FF mode (Grey blocks and lines are unused resources).

Operating mode: Dual-LUT3

The logic element can operate in the dual Look-Up Tables (LUTs) and Flip-flops (FFs) mode, like many modern FPGA logic elements. As depicted in Fig. 14, the fracturable LUT operates as two 3-input LUTs with shared inputs.

The operating mode is designed to efficiently implement two 3-input functions with shared input variables. A popular example is the adder function, where the carry logic can be mapped to the upper LUT3 and the sum logic can be mapped to the lower LUT3.
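To make the adder example concrete, the sketch below builds the two 8-entry truth tables that a pair of 3-input LUTs with shared inputs would hold for a 1-bit full adder: one for the carry and one for the sum. The truth-table indexing (input a as the least significant index bit) and all names are assumptions chosen for illustration; they do not describe the actual bitstream layout of the SOFA LUTs.

```python
def lut3_table(func):
    """Return the 8-entry truth table of a 3-input Boolean function.

    Entry i holds func(a, b, c) with i = a + 2*b + 4*c (illustrative ordering).
    """
    return [func(i & 1, (i >> 1) & 1, (i >> 2) & 1) for i in range(8)]


def lut3_eval(table, a, b, c):
    """Look up the output of a 3-input LUT for the given input bits."""
    return table[a + 2 * b + 4 * c]


# Both LUT3s see the same three inputs: operand bits a, b and the carry-in.
CARRY_LUT = lut3_table(lambda a, b, cin: (a & b) | (cin & (a ^ b)))  # upper LUT3
SUM_LUT = lut3_table(lambda a, b, cin: a ^ b ^ cin)                  # lower LUT3


def full_adder(a, b, cin):
    """Evaluate the 1-bit full adder through the two shared-input LUT3s."""
    return lut3_eval(SUM_LUT, a, b, cin), lut3_eval(CARRY_LUT, a, b, cin)


if __name__ == "__main__":
    print("carry table:", CARRY_LUT)
    print("sum table:  ", SUM_LUT)
```

Because both functions depend on the same three variables, a single logic element in this mode can produce one sum bit and one carry bit.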

Logic element schematic

Resource usage of the logic element operating in dual LUT3 + FFs mode (Grey blocks and lines are unused resources).

Operating mode: Shift-Register

As depicted in Fig. 15, the Flip-flops (FFs) can be connected through dedicated routing wires to implement high-performance shift registers.

The operating mode is designed to efficiently implement shift registers which are widely used in buffer logic, e.g., FIFOs.

Logic element schematic

Resource usage of the logic element operating in shift register mode (Grey blocks and lines are unused resources).

Scan Chain

There is a built-in scan-chain in the CLB where all the sc_in and sc_out ports of LEs are connected in a chain, as illustrated in Fig. 11. When the Test_en signal is active, users can read back the contents of all the D-type flip-flops of the LEs through the scan-chain. When the Test_en signal is inactive, the D-type flip-flops of the LEs operate in regular mode and propagate datapath signals from the LUT outputs.

Note

The scan-chains of the CLBs are connected in a chain at the top level. See details in Scan-chain.

Circuit Designs

I/O Circuit

As shown in Fig. 16, the I/O circuit used in the I/O tiles of the FPGA fabric (see Fig. 5) is a digital I/O cell with

  • An active-low I/O isolation signal IO_ISOL_N to set the I/O in input mode. This prevents unexpected output signals from damaging circuits outside the FPGA while the configurable memories are not yet properly initialized.

    Warning

    This feature may not be needed if the configurable memory cell has a built-in set/reset functionality!

  • Internal protection circuitry to ensure clean signals at all the SoC I/O ports. This prevents

    • the SOC_OUT port from outputting a random signal when the I/O is in input mode

    • the FPGA_IN port from being driven by a random signal when the I/O is in output mode

  • An internal configurable memory element to control the direction of the I/O cell

The truth table of the I/O cell is consistent with the GPIO cell of Caravel SoC (which requires an active-low signal to enable output directionality), where

  • When configuration bit (FF output) is logic 1, the I/O cell is in input mode

  • When configuration bit (FF output) is logic 0, the I/O cell is in output mode
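The behaviour described above can be summarized in a small behavioural model. It is only a sketch: the function name is hypothetical, and the choice to gate the blocked port to logic 0 is an assumption made here to represent "clean signals"; the actual gating value is determined by the cell schematic in Fig. 16.

```python
def io_cell(config_bit, io_isol_n, soc_in, fpga_out):
    """Behavioural sketch of the embedded I/O cell.

    config_bit: 1 selects input mode, 0 selects output mode.
    io_isol_n:  active-low isolation; 0 forces input mode.
    """
    input_mode = config_bit == 1 or io_isol_n == 0
    if input_mode:
        fpga_in = soc_in   # external signal reaches the fabric
        soc_out = 0        # assumed gating value while the output is blocked
    else:
        fpga_in = 0        # assumed gating value while the input is blocked
        soc_out = fpga_out # fabric drives the external port
    return {"input_mode": input_mode, "FPGA_IN": fpga_in, "SOC_OUT": soc_out}


if __name__ == "__main__":
    print(io_cell(config_bit=1, io_isol_n=1, soc_in=1, fpga_out=1))  # input mode
    print(io_cell(config_bit=0, io_isol_n=1, soc_in=1, fpga_out=1))  # output mode
    print(io_cell(config_bit=0, io_isol_n=0, soc_in=1, fpga_out=1))  # isolated
```

The third call shows the purpose of the isolation signal: even with a configuration bit that requests output mode, the cell stays in input mode while IO_ISOL_N is asserted.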

Schematic of embedded I/O cell used in FPGA

Fig. 17 shows example waveforms of how the I/O cell behaves:

  • when IO_ISOL_N is enabled or disabled

  • when it operates in input mode

  • when it operates in output mode

An example of waveforms of embedded I/O cell used in FPGA

Multiplexer

Routing multiplexers are built from the Skywater High-Density (HD) 2-input MUX cell, as shown in Fig. 18. The tree-like multiplexer design is applied to all the routing multiplexers in logic elements, connection blocks and switch blocks across the FPGA fabric.

Schematic of multiplexer design in SOFA HD FPGA

Note

Each routing multiplexer has a dedicated input which is connected to ground (GND) signal. When it is not used, the output will be driven by the ground, working as a constant generator.
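The tree structure and the constant GND input can be illustrated with a short behavioural model. This is a sketch under assumptions: the GND input is placed after the data inputs, unused leaves of the tree are also tied to GND, and one select bit is shared by all 2-input cells of a tree level. The names are hypothetical and the real select encoding is defined by the generated fabric.

```python
GND = 0


def mux2(a, b, sel):
    """2-input multiplexer cell: returns b when sel is 1, otherwise a."""
    return b if sel else a


def routing_mux(data_inputs, select_index):
    """Tree of mux2 cells with one extra input tied to GND.

    Bit k of select_index drives every mux2 on tree level k (assumed encoding).
    """
    inputs = list(data_inputs) + [GND]            # dedicated constant input
    levels = max(1, (len(inputs) - 1).bit_length())
    inputs += [GND] * (2 ** levels - len(inputs))  # pad unused leaves
    for level in range(levels):
        sel = (select_index >> level) & 1
        inputs = [mux2(inputs[i], inputs[i + 1], sel)
                  for i in range(0, len(inputs), 2)]
    return inputs[0]


if __name__ == "__main__":
    data = [1, 0, 1]
    for index in range(4):
        print(index, routing_mux(data, index))
```

Selecting the index of the GND input makes an unused multiplexer drive a constant 0, which is the constant-generator behaviour mentioned in the note.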

Timing Annotation

Configurable Logic Block

The path delays in Fig. 19 are listed in Table 7.

Schematic of a logic element used in SOFA HD FPGA

Path delays of logic element in the SOFA HD FPGA

=====================  =============
Path                   TT delay (ns)
=====================  =============
in0 -> LUT3_out[0]     0.85
in1 -> LUT3_out[0]     0.57
in2 -> LUT3_out[0]     0.30
in0 -> LUT3_out[1]     0.86
in1 -> LUT3_out[1]     0.59
in2 -> LUT3_out[1]     0.31
in0 -> LUT4_out        1.14
in1 -> LUT4_out        0.86
in2 -> LUT4_out        0.58
in3 -> LUT4_out        0.51
LUT3_out[0] -> A       0.56
LUT4_out[0] -> A       0.58
A -> out[0]            0.88
A -> FF[0]             0.56
FF[0] -> out[0]        0.88
LUT3_out[1] -> out[1]  0.89
LUT3_out[1] -> FF[1]   0.56
FF[1] -> out[1]        0.89
regin -> FF[0]         0.58
FF[0] -> FF[1]         0.56
=====================  =============
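One way to use these annotations is to add up the segments along a path through the logic element. The sketch below does this for a few SOFA HD entries from the table above. It assumes that segment delays simply add and that the LUT4_out and LUT4_out[0] labels in the table refer to the same node; both are assumptions of this example, and the helper names are hypothetical.

```python
# Subset of the SOFA HD logic-element path delays (TT corner, ns).
# "LUT4_out" and "LUT4_out[0]" are treated as one node (assumption).
DELAYS_NS = {
    ("in0", "LUT4_out"): 1.14,
    ("in3", "LUT4_out"): 0.51,
    ("LUT4_out", "A"): 0.58,
    ("A", "out[0]"): 0.88,
    ("A", "FF[0]"): 0.56,
    ("FF[0]", "out[0]"): 0.88,
}


def path_delay(nodes, delays=DELAYS_NS):
    """Sum the segment delays along a list of consecutive nodes."""
    return sum(delays[(src, dst)] for src, dst in zip(nodes, nodes[1:]))


if __name__ == "__main__":
    combinational = path_delay(["in0", "LUT4_out", "A", "out[0]"])
    registered = path_delay(["in0", "LUT4_out", "A", "FF[0]"])
    print(f"in0 to out[0]: {combinational:.2f} ns")
    print(f"in0 to FF[0]:  {registered:.2f} ns")
```

A lookup of a segment that is not in the dictionary raises a KeyError, which is a simple guard against summing a path that the table does not annotate.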

I/O Block

The path delays in Fig. 16 are listed in Table 8.

Path delays of I/O circuit in the SOFA HD FPGA

===================  =============
Path                 TT delay (ns)
===================  =============
SOC_IN -> FPGA_IN    0.11
FPGA_OUT -> SOC_OUT  0.11
===================  =============

Routing Architecture

The path delays in Fig. 6 are listed in Table 9.

Path delays of routing blocks in the SOFA HD FPGA

======  =============
Path    TT delay (ns)
======  =============
A -> B  1.61
A -> C  1.61
A -> D  1.61
B -> E  1.38
======  =============

QLSOFA HD

Architecture

Floorplan

QLSOFA HD FPGA shares the same floorplan as SOFA HD FPGA. See details in Floorplan.

Tiles

The FPGA architecture follows a tile-based organization to exploit fine granularity in physical design. Three types of tiles are built:

FPGA tile type and functionalities

====  ========  ================================================================
Type  Capacity  Description
====  ========  ================================================================
CLB   144       Each CLB tile consists of a Configurable Logic Block (CLB), an
                X-direction Connection Block (CBx), a Y-direction Connection
                Block (CBy) and a Switch Block (SB). This is the majority tile
                across the fabric, used to implement logic and registers.
IO-A  36        The type-A I/O is a low-density I/O tile designed mainly to
                interface with the GPIOs of the SoC. Each I/O-A tile consists
                of 1 digital I/O cell.
IO-B  12        The type-B I/O is a high-density I/O tile designed mainly to
                interface with the Wishbone interface and logic analyzer of
                the SoC. Each I/O-B tile consists of 9 digital I/O cells.
====  ========  ================================================================

Routing Architecture

The routing architecture shares the same principle as the SOFA HD routing architecture (See details in Routing Architecture).

Note

Different from SOFA HD, each routing channel consists of 60 routing tracks. See details in Table 11.

Routing track distribution of QLSOFA HD FPGA

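==========  ============================
Track type  Number of tracks per channel
==========  ============================
Length-1    6 (10%)
Length-2    6 (10%)
Length-4    48 (80%)
Total       60
==========  ============================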

Scan-chain

QLSOFA HD FPGA shares the same scan-chain design as SOFA HD FPGA. See details in Scan-chain.

I/O Resources

Pin Assignment

The QLSOFA HD FPGA IP has 144 data I/O pins as shown in Fig. 20.

Among the 144 I/Os,

  • 29 external I/Os are accessible through the Caravel SoC’s General-Purpose I/Os (GPIOs).

  • 115 internal I/Os are accessible through the Caravel SoC’s logic analyzer and Wishbone interfaces, which are controlled by the RISC-V processor. See Debug Mode and Accelerator Mode for details.

Warning

Set all unused GPIOs to input mode so that the FPGA cannot drive spurious signals that might damage other SoC components.

Note

The connectivity of the 115 internal I/Os can be switched through a GPIO of Caravel SoC. As a result, the FPGA can operate in different modes.

Warning

The internal I/O pins will drive either the Wishbone interface or the logic analyzer, following the truth table of the mode-switch bit in Fig. 20.

I/O arrangement of FPGA IP

I/O arrangement of QLSOFA HD FPGA IP: switchable between logic analyzer and wishbone bus interface

External I/Os

A QLSOFA HD FPGA IP contains 37 external I/O pins, including 27 data I/Os and 10 control I/Os.

Full details are summarized in the following table.

QLSOFA HD FPGA I/O usage and sizes

==========  ======================================================================  ===========
I/O Type    Description                                                             No. of Pins
==========  ======================================================================  ===========
Data I/O    Datapath I/Os of FPGA fabric                                            27
CLK         Operating clock of FPGA core                                            1
PROG_CLK    Clock used by configuration protocol to program FPGA fabric             1
RESET       Active-low reset for datapath flip-flops in the FPGA                    1
PROG_RESET  Active-low reset for configuration flip-flops in the FPGA               1
CCFF_HEAD   Input of configuration protocol to load bitstream                       1
CCFF_TAIL   Output of configuration protocol to read back bitstream                 1
TEST_EN     Activate the test mode of FPGA fabric                                   1
SC_HEAD     Input of built-in scan-chain to load data to flip-flops of FPGA fabric  1
SC_TAIL     Output of built-in scan-chain to read back flip-flops from FPGA fabric  1
IO_ISOL_N   Active-low signal to enable I/O datapath isolation from external ports  1
Total                                                                               37
==========  ======================================================================  ===========

Accelerator Mode

When the Wishbone interface is enabled, the FPGA can operate as an accelerator for the RISC-V processor. Fig. 21 illustrates the detailed I/O arrangement for the FPGA, where the Wishbone bus signals are connected to fixed FPGA I/O locations.

Note

Not all of the 115 internal I/Os are used by the Wishbone interface. In particular, I/O[21:29] are not connected.

Warning

The FPGA does not contain a Wishbone slave IP. Users have to implement a soft Wishbone slave when using the FPGA as an accelerator.

I/O arrangement of FPGA IP when interfacing wishbone bus

I/O arrangement of High-Density (HD) FPGA IP when interfacing wishbone bus

Debug Mode

When the logic analyzer interface is enabled, the FPGA can operate in debug mode, in which its internal signals can be read back through the registers of the RISC-V processor. Fig. 22 illustrates the detailed I/O arrangement for the FPGA, where the logic analyzer signals are connected to fixed FPGA I/O locations.

Note

The logic analyzer is 128 bits wide, of which 115 bits can drive or be driven by the FPGA I/Os. The other 14 bits are connected to internal nodes of the FPGA fabric to monitor critical signal activity for debugging purposes.

Warning

If the logic analyzer is not used, please configure both the management SoC and the FPGA as follows:

  • all the I/Os are set to input mode.

  • all the output ports are pulled down to logic ``0``.

I/O arrangement of FPGA IP when interfacing logic analyzer

I/O arrangement of High-Density (HD) FPGA IP when interfacing logic analyzer

Configurable Logic Block

Generality

Each Configurable Logic Block (CLB) consists of 8 Logic Elements (LEs) as shown in Fig. 23. All the pins of the LEs are directly wired to CLB pins without a local routing architecture. Feedback connections between LEs are implemented by the global routing architecture outside the CLBs.

Configurable Logic Block schematic

Multi-mode Logic Element

Physical Implementation

As shown in Fig. 24, each Logic Element (LE) consists of

  • a fracturable 4-input Look-Up Table (LUT)

  • two D-type Flip-Flops (FF)

Logic element schematic

Detailed schematic of a logic element

The LE can operate in different modes to map logic functions efficiently.

Operating mode: LUT4 + FF

The logic element can operate in the Look-Up Table (LUT) + Flip-flop (FF) mode, like many classical FPGA logic elements. As depicted in Fig. 25, the fracturable LUT operates as a single-output 4-input LUT and the upper FF is used to implement sequential logic.

The operating mode is designed to efficiently implement 4-input functions.

Logic element schematic

Resource usage of the logic element operating in LUT4 + FF mode (Grey blocks and lines are unused resources).

Operating mode: Dual-LUT3

The logic element can operate in the dual Look-Up Tables (LUTs) and Flip-flops (FFs) mode, like many modern FPGA logic elements. As depicted in Fig. 26, the fracturable LUT operates as two 3-input LUTs with shared inputs.

The operating mode is designed to efficiently implement two 3-input functions with shared input variables. A popular example is the adder function, where the carry logic can be mapped to the upper LUT3 and the sum logic can be mapped to the lower LUT3.

Logic element schematic

Resource usage of the logic element operating in dual LUT3 + FFs mode (Grey blocks and lines are unused resources).

Operating mode: Shift-Register

As depicted in Fig. 27, the Flip-flops (FFs) can be connected through dedicated routing wires to implement high-performance shift registers.

The operating mode is designed to efficiently implement shift registers which are widely used in buffer logic, e.g., FIFOs.

Logic element schematic

Resource usage of the logic element operating in shift register mode (Grey blocks and lines are unused resources).

Operating mode: Soft Adder

As depicted in Fig. 28, the 4-input LUT can implement 1-bit adder logic, where carry inputs and outputs are connected through dedicated carry-chain wires cin and cout across logic elements. This is more delay-efficient than implementing adders through the dual LUT3 mode (see details in Operating mode: Dual-LUT3).

The operating mode is designed to efficiently implement multi-bit adders.
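A multi-bit adder built this way is a ripple-carry chain: each logic element adds one bit pair and hands its carry to the next element over the cin/cout wires. The sketch below models that chaining at the behavioural level. The function names are hypothetical and the model says nothing about how the LUT contents are actually programmed.

```python
def adder_le(a, b, cin):
    """One logic element in soft adder mode: returns (sum bit, carry out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_add(x, y, width, cin=0):
    """Chain `width` adder logic elements through their cin/cout wires."""
    carry = cin
    result = 0
    for bit in range(width):
        s, carry = adder_le((x >> bit) & 1, (y >> bit) & 1, carry)
        result |= s << bit
    return result, carry  # final carry is the cout of the last element


if __name__ == "__main__":
    print(ripple_add(5, 3, width=4))
    print(ripple_add(15, 1, width=4))
```

The carry never leaves the dedicated chain between elements, which is why this mode avoids the general routing delay that an adder mapped to dual-LUT3 elements would incur on each carry hop.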

Logic element schematic

Resource usage of the logic element operating in soft adder mode (Grey blocks and lines are unused resources).

Scan Chain

There is a built-in scan-chain in the CLB where all the sc_in and sc_out ports of LEs are connected in a chain, as illustrated in Fig. 23. When the Test_en signal is active, users can read back the contents of all the D-type flip-flops of the LEs through the scan-chain. When the Test_en signal is inactive, the D-type flip-flops of the LEs operate in regular mode and propagate datapath signals from the LUT outputs.

Note

The scan-chains of the CLBs are connected in a chain at the top level. See details in Scan-chain.

Circuit Designs

I/O Circuit

QLSOFA HD FPGA shares the same I/O circuit design as SOFA HD FPGA. See details in I/O Circuit.

Multiplexer

QLSOFA HD FPGA shares the same multiplexer design as SOFA HD FPGA. See details in Multiplexer.

Timing Annotation

Configurable Logic Block

The path delays in Fig. 29 are listed in Table 13.

Schematic of a logic element used in QLSOFA HD FPGA

Path delays of logic element in the QLSOFA HD FPGA

=====================  =============
Path                   TT delay (ns)
=====================  =============
in0 -> LUT3_out[0]     0.85
in1 -> LUT3_out[0]     0.57
in2 -> B               0.60
B -> LUT3_out[0]       0.32
in0 -> LUT3_out[1]     0.90
in1 -> LUT3_out[1]     0.62
B -> LUT3_out[1]       0.33
in0 -> LUT4_out        1.17
in1 -> LUT4_out        0.89
in2 -> LUT4_out        1.21
in3 -> LUT4_out        0.79
LUT3_out[0] -> A       0.56
LUT4_out[0] -> A       0.58
A -> out[0]            0.88
A -> FF[0]             0.56
FF[0] -> out[0]        0.88
LUT3_out[1] -> out[1]  0.89
LUT3_out[1] -> FF[1]   0.56
FF[1] -> out[1]        0.89
regin -> FF[0]         0.58
FF[0] -> FF[1]         0.56
=====================  =============

I/O Block

The path delays of the I/O blocks in QLSOFA HD FPGA are the same as those of SOFA HD FPGA. See details in I/O Block.

Routing Architecture

The path delays in Fig. 6 are listed in Table 14.

Path delays of routing blocks in the QLSOFA HD FPGA

======  =============
Path    TT delay (ns)
======  =============
A -> B  1.44
A -> C  1.44
A -> D  1.44
B -> E  1.38
======  =============

SOFA CHD

Architecture

SOFA CHD FPGA shares the same architecture as QLSOFA HD FPGA. See full details in Architecture.

I/O Resources

The SOFA CHD FPGA IP shares the same I/O resource arrangement as the QLSOFA HD FPGA IP. See details in I/O Resources.

Configurable Logic Block

The SOFA CHD FPGA IP shares the same Configurable Logic Block (CLB) architecture as the QLSOFA HD FPGA IP. See details in Configurable Logic Block.

Circuit Designs

I/O Circuit

SOFA CHD FPGA shares the same I/O circuit design as SOFA HD FPGA. See details in I/O Circuit.

Multiplexer

Routing multiplexers are built from a few custom cells based on the Skywater High-Density (HD) PDK, as shown in Fig. 30. The multiplexer design follows a two-level structure, which is applied to all the routing multiplexers in logic elements, connection blocks and switch blocks across the FPGA fabric.

Schematic of multiplexer design in SOFA CHD FPGA

Each primitive in the two-level structure could be a 2/3/4-input custom cell, depending on the input size of the routing multiplexer. Each custom cell is built with input inverters and transmission-gates. For instance, Fig. 31 shows the transistor-level design of a 3-input custom cell.

Detailed schematic of a 3-input custom cell in SOFA CHD FPGA

Note

Each routing multiplexer has a dedicated input which is connected to ground (GND) signal. When it is not used, the output will be driven by the ground, working as a constant generator.

Timing Annotation

Configurable Logic Block

The path delays in Fig. 32 are listed in Table 15.

Schematic of a logic element used in SOFA CHD FPGA

Path delays of logic element in the SOFA CHD FPGA

=====================  =============
Path                   TT delay (ns)
=====================  =============
in0 -> LUT3_out[0]     0.86
in1 -> LUT3_out[0]     0.58
in2 -> B               0.16
B -> LUT3_out[0]       0.32
in0 -> LUT3_out[1]     0.91
in1 -> LUT3_out[1]     0.63
B -> LUT3_out[1]       0.34
in0 -> LUT4_out        1.20
in1 -> LUT4_out        0.92
in2 -> LUT4_out        0.78
in3 -> LUT4_out        0.52
LUT3_out[0] -> A       0.17
LUT4_out[0] -> A       0.18
A -> out[0]            0.48
A -> FF[0]             0.15
FF[0] -> out[0]        0.48
LUT3_out[1] -> out[1]  0.47
LUT3_out[1] -> FF[1]   0.16
FF[1] -> out[1]        0.37
regin -> FF[0]         0.15
FF[0] -> FF[1]         0.16
=====================  =============

I/O Block

The path delays of the I/O blocks in SOFA CHD FPGA are the same as those of SOFA HD FPGA. See details in I/O Block.

Routing Architecture

The path delays in Fig. 6 are listed in Table 16.

Path delays of routing blocks in the SOFA CHD FPGA

======  =============
Path    TT delay (ns)
======  =============
A -> B  0.81
A -> C  0.81
A -> D  0.81
B -> E  0.57
======  =============

Custom Cells

Skywater Custom Multiplexer Cells

Background

Traditionally, larger multiplexers are built using trees of smaller multiplexers as illustrated below:

Traditional Multiplexer Tree

Multiplexer trees incur large power and timing penalties that limit FPGA performance. FPGA fabrics instead use complementary pass-gate logic (CPL) to replace multiplexer trees with single-level multiplexers derived from inverters and transmission gates, as illustrated below:

Single Level FPGA Multiplexer

Single-level multiplexers are controlled by configuration SRAM cells, which enable high-impedance connections throughout the multiplexer and so remove the need for hierarchical multiplexer designs. CPL multiplexers therefore increase performance and reduce power consumption throughout FPGA fabrics. The standard cells required for CPL multiplexers are not commonly included in PDKs, so custom cells must be created to build FPGA multiplexer hierarchies. The remainder of this document describes the architecture and performance evaluation of our sky130_uuopenfpga_cc_hd_invmux2_1 and sky130_uuopenfpga_cc_hd_invmux3_1 custom cells, created with the Skywater 130nm PDK. The comparison uses 4-to-1 and 6-to-1 multiplexers built from our cells and from standard cells provided within the Skywater 130nm PDK.

SKY130_UUOPENFPGA_CC_HD_INVMUX2_1
  • Usage: 2-Input Transmission Gate Multiplexer with Unity Drive Strength Inverter Input
    • Pins:
      • Q1/Q2 - Inverter Input

      • S0/S1 - NMOS Select Input

      • S0B/S1B - PMOS Select Input

  • Schematic:

Sky130_uuopenfpga_cc_hd_invmux2_1 Schematic
  • Layout:

Sky130_uuopenfpga_cc_hd_invmux2_1 Layout
  • Comparison: To demonstrate the performance gains using CPL multiplexers, we built a 4-to-1 single-level multiplexer using our custom sky130_uuopenfpga_cc_hd_invmux2_1 cell along with a 4-to-1 multiplexer using the sky130_fd_sc_hd__mux2_1 as the root cell of the multiplexer tree.

The sky130_fd_sc_hd__mux2_1 multiplexer is built using a static CMOS structure with a single select input, whereas our cell uses a fractured select hierarchy. For the comparison, we tabulated power, area and timing values for the 4-to-1 multiplexer tree using Cadence ADE XL.

  • Power:
    • sky130_uuopenfpga_cc_hd_invmux2_1: 2.37 μW

    • sky130_fd_sc_hd__mux2_1: 3.03 μW

Our custom multiplexer provides a 22% reduction in power consumption.

  • Area:
    • sky130_uuopenfpga_cc_hd_invmux2_1: 33.78 μm²

    • sky130_fd_sc_hd__mux2_1: 33.78 μm²

Our multiplexer implementation requires the same area, neglecting interconnect overhead.

  • Timing:
    • sky130_uuopenfpga_cc_hd_invmux2_1: 211.1 ps

    • sky130_fd_sc_hd__mux2_1: 304.3 ps

Our custom multiplexer provides roughly a 31% reduction in propagation delay.

SKY130_UUOPENFPGA_CC_HD_INVMUX2_1 Cell Characterization

SKY130_UUOPENFPGA_CC_HD_INVMUX3_1
  • Usage: 3-Input Transmission Gate Multiplexer with Unity Drive Strength Inverter Input
    • Pins:
      • Q2/Q3 - Inverted Input

      • S0/S1/S2 - NMOS Select Input

      • S0B/S1B/S2B - PMOS Select Input

  • Schematic:

Sky130_uuopenfpga_cc_hd_invmux3_1 Schematic
  • Layout:

Sky130_uuopenfpga_cc_hd_invmux3_1 Layout
  • Comparison: To demonstrate the performance gains using CPL multiplexers, we built a 6-to-1 single-level multiplexer using our custom sky130_uuopenfpga_cc_hd_invmux3_1 cell along with a 6-to-1 multiplexer using the sky130_fd_sc_hd__mux4/2_1 as the root cells of the multiplexer tree.

For the comparison, we tabulated power, area and timing values for the 6-to-1 multiplexer tree using Cadence ADE XL.

  • Power:
    • sky130_uuopenfpga_cc_hd_invmux3_1: 2.96 μW

    • sky130_fd_sc_hd__mux2_1: 3.31 μW

Our custom multiplexer provides a 10.5% reduction in power consumption.

  • Area:
    • sky130_uuopenfpga_cc_hd_invmux3_1: 61.31 μm²

    • sky130_fd_sc_hd__mux2_1: 48.80 μm²

The Skywater standard-cell multiplexer requires about 20% less area.

  • Timing:
    • sky130_uuopenfpga_cc_hd_invmux3_1: 272.6 ps

    • sky130_fd_sc_hd__mux2_1: 374.2 ps

Our custom multiplexer provides over a 27% reduction in propagation delay.
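The percentages quoted in both comparisons follow from the tabulated values as a relative reduction, (reference − value) / reference. The short sketch below recomputes them from the numbers listed above; the helper name and the dictionary layout are hypothetical and exist only for this illustration.

```python
def reduction_percent(value, reference):
    """Relative reduction of `value` with respect to `reference`, in percent."""
    return 100.0 * (reference - value) / reference


# (custom cell, Skywater standard cell) pairs from the comparisons above.
MEASUREMENTS = {
    "invmux2 power (uW)": (2.37, 3.03),
    "invmux2 delay (ps)": (211.1, 304.3),
    "invmux3 power (uW)": (2.96, 3.31),
    "invmux3 delay (ps)": (272.6, 374.2),
}

if __name__ == "__main__":
    for name, (custom, standard) in MEASUREMENTS.items():
        print(f"{name}: {reduction_percent(custom, standard):.1f}% lower with the custom cell")
    # For area the standard cell is the smaller one, so the roles swap.
    print(f"invmux3 area: {reduction_percent(48.80, 61.31):.1f}% lower with the standard cell")
```

Note that the area comparison for the 3-input cell is taken relative to the custom cell, since there it is the standard-cell implementation that is smaller.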

SKY130_UUOPENFPGA_CC_HD_INVMUX3_1 Cell Characterization

Contacts

General Questions

Prof. Pierre-Emmanuel Gaillardon

pierre-emmanuel.gaillardon@utah.edu

Technical Questions about OpenFPGA

Prof. Xifan Tang

xifan.tang@utah.edu

Technical Questions about Physical Design

Ganesh Gore

ganesh.gore@utah.edu

Acknowledgment

Supported by DARPA PoSH program

For more information on the OpenFPGA see openfpga_doc or openfpga_github

For more information on the VPR architecture description language see xml_vtr

For more information on the Skywater 130nm PDK see skywater_pdk_github
