What Can FPGAs Really Do? Logic Gluing, Real-Time Control, Signal Processing, and SoC - Why FPGAs Outperform DSP and ASIC for Complex Algorithms
Beginners often ask: what is the true scope of FPGA applications? The answer has evolved over time. Early FPGAs served as glue logic, connecting disparate chips. Real-time control then gave them a practical role of their own, flexible protocol implementation made them highly adaptable, and signal processing moved them into high-end systems. Today, System-on-Chip (SoC) integration lets a single FPGA replace many other components. A deeper question remains: can an FPGA handle video compression (JPEG, H.264) efficiently compared with dedicated DSPs or ASICs? This article examines the FPGA processing models (parallel vs. pipelined) and explains why FPGAs suit complex algorithms, while also noting where they struggle (e.g., sequential file system management).
1. The Evolution of FPGA Applications
FPGAs have transitioned through several major application phases:
- Logic Gluing: In the 1990s, FPGAs connected different logic families and filled small gaps between ASICs. Example: Xilinx XC4000 series (now obsolete).
- Real-Time Control: Industrial automation, motor control (PWM generation, encoder feedback) using devices like Altera Cyclone IV (EP4CE15) or Xilinx Spartan-6 (XC6SLX9).
- Flexible Protocol Implementation: Implementing custom interfaces (I2C, SPI, UART, PCIe, JESD204B) that can be modified without a hardware respin. Modern examples: Xilinx Artix-7 (XC7A100T) with GTP transceivers (up to 6.6 Gb/s) for PCIe Gen2; PCIe Gen3 needs 8 GT/s lanes and therefore a faster family such as Arria 10.
- High-End Signal Processing: JPEG compression, H.264/H.265 encoding, radar beamforming, medical imaging. Devices like Xilinx Kintex-7 (XC7K325T) or Intel Arria 10 (10AX115) with integrated DSP slices.
- System-on-Chip (SoC): Hard ARM processors plus FPGA fabric, e.g., Xilinx Zynq-7000 (XC7Z020), Zynq UltraScale+ (XCZU9EG), Intel Cyclone V SoC (5CSXFC6), Arria 10 SoC.
2. Parallel vs. Sequential Processing: Why FPGAs Excel
Figure 1 (conceptual) shows a typical controller or processor (CPU, DSP) executing tasks sequentially. Software's inherent sequential nature means each output requires multiple steps (e.g., fetch, decode, execute, write). If one output takes 5 time units, four outputs take 20 units. Even with hardware accelerators like DMA, the coordination overhead limits speed.
Figure 2 illustrates FPGA parallel processing: multiple independent data paths operate simultaneously. The same four outputs can be completed in 5 time units - a 4x speedup. However, parallelism consumes more logic resources (area = money). Example: a 4-channel FIR filter implemented with 4 parallel multipliers uses 4x the DSP slices of a single-channel design.
Figure 3 shows pipelined processing: the classic compromise. Data flows through stages (e.g., stage1: input, stage2: multiply, stage3: accumulate, stage4: output). After the pipeline fills, one output emerges per clock cycle. In the four-output example, resource usage is roughly 1/4 of the fully parallel implementation, while sustained throughput stays close to it. For video compression (JPEG, H.264), pipelining is the ideal technique: the first output pays the pipeline fill latency, but every later output arrives at wire speed.
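The three models can be compared with a small timing sketch. It is only an illustration under stated assumptions: every task takes the same number of time units, the parallel design has one lane per output in flight, and the pipeline splits a task into one-unit stages. The function names are hypothetical and not taken from any vendor tool.

```python
def sequential_time(n_outputs, steps_per_output):
    """One processor finishes each output before starting the next."""
    return n_outputs * steps_per_output


def parallel_time(n_outputs, steps_per_output, n_lanes):
    """n_lanes identical data paths each work on one output at a time."""
    batches = -(-n_outputs // n_lanes)  # ceiling division
    return batches * steps_per_output


def pipelined_time(n_outputs, n_stages):
    """One data path split into n_stages one-unit stages.

    The first output appears after the pipeline fills (n_stages units);
    each further output follows one unit later.
    """
    if n_outputs == 0:
        return 0
    return n_stages + (n_outputs - 1)


if __name__ == "__main__":
    # Assumption: a 5-unit task, 4 parallel lanes, a 5-stage pipeline.
    for n in (4, 100, 1000):
        print(n,
              sequential_time(n, 5),
              parallel_time(n, 5, 4),
              pipelined_time(n, 5))
```

For four outputs the model gives 20, 5, and 8 time units. For long streams the fill latency becomes negligible and the pipeline approaches one output per time unit with a single data path.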
This is why FPGAs excel at real-time video coding. A Xilinx Zynq UltraScale+ can implement an H.264 encoder pipeline using its DSP48E2 slices (27x18 multipliers) and block RAM for line buffers, achieving 1080p60 encoding at under 10 W, which is out of reach for most general-purpose CPUs at that power budget and for many DSPs.
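To see what "1080p60" demands from a pipeline, the required pixel rate can be worked out directly. The sketch below counts active pixels only and assumes the pipeline accepts a fixed number of pixels per clock. Both helper names are illustrative.

```python
def active_pixel_rate(width, height, fps):
    """Active pixels per second for a given video format."""
    return width * height * fps


def min_clock_mhz(width, height, fps, pixels_per_clock=1):
    """Lowest clock (MHz) that keeps up with the active pixel rate.

    Assumption: the pipeline consumes `pixels_per_clock` pixels on every
    cycle and blanking intervals are ignored.
    """
    return active_pixel_rate(width, height, fps) / pixels_per_clock / 1e6


if __name__ == "__main__":
    print(active_pixel_rate(1920, 1080, 60))   # 124416000 pixels/s
    print(min_clock_mhz(1920, 1080, 60))       # about 124.4 MHz
    print(min_clock_mhz(1920, 1080, 60, 2))    # about 62.2 MHz
```

A one-pixel-per-clock pipeline therefore needs roughly 125 MHz for active video alone. The standard 1080p60 pixel clock, which includes blanking, is 148.5 MHz. Both are comfortably within the range of mid-range FPGA fabric.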
3. Case Study: JPEG and H.264 Compression
Traditional approaches for video compression include:
- Dedicated ASIC or ASSP: e.g., HiSilicon Hi3519, TI DM8148 (ARM+DSP). These are highly efficient, but a custom ASIC carries expensive NRE and long development cycles, and even off-the-shelf parts come with the caveat that "only experts can handle them", as one project manager noted. Cost is not just monetary - the engineer time needed to master a complex new environment is substantial.
- DSP-only: Single DSP (e.g., TI TMS320C6678) may struggle with full HD H.264 encoding due to sequential bottlenecks. Many designs use DSP+FPGA combination.
- FPGA-only: Completely viable. A search for "JPEG encoder FPGA" or "H.264 FPGA" yields many open-source and commercial cores. Vendors also provide reference designs and hardened codecs (e.g., the H.264/H.265 Video Codec Unit in Xilinx Zynq UltraScale+ EV devices).
FPGA compression designs use on-chip memory (Block RAM) for pixel buffers and transform coefficients, DSP slices for DCT/quantization, and state machines for entropy coding (CABAC/CAVLC). For example, an Arria 10 (10AX115S2F45I2SG) can implement a full H.264 baseline encoder consuming ~40% of logic and 60% of DSP slices, handling 1080p at 60fps with <15W.
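The transform-and-quantize stage mentioned above can be modeled in software before committing it to DSP slices. The following is a minimal floating-point reference sketch of a JPEG-style 8x8 DCT followed by uniform quantization. It is not a hardware description, real JPEG uses a per-coefficient quantization table rather than the single `step` assumed here, and H.264 replaces the DCT with an integer transform.

```python
import math

N = 8  # JPEG operates on 8x8 blocks


def dct_1d(v):
    """Orthonormal DCT-II of an N-element sequence."""
    out = []
    for k in range(N):
        scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        s = sum(v[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for n in range(N))
        out.append(scale * s)
    return out


def dct_2d(block):
    """Separable 2D DCT: transform every row, then every column."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d([rows[r][c] for r in range(N)]) for c in range(N)]
    # cols[c][r] holds the coefficient at (row r, column c); transpose back.
    return [[cols[c][r] for c in range(N)] for r in range(N)]


def quantize(coeffs, step):
    """Uniform quantization (illustrative stand-in for a JPEG table)."""
    return [[int(round(c / step)) for c in row] for row in coeffs]


if __name__ == "__main__":
    flat = [[16] * N for _ in range(N)]      # a constant gray block
    q = quantize(dct_2d(flat), step=16)
    print(q[0])  # only the DC term survives for a flat block
```

The row-then-column structure is what makes the transform pipeline-friendly: each 1D pass is a fixed sequence of multiply-accumulates, with a small transpose buffer (typically block RAM) between the two passes.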
4. Memory and Arithmetic: Internal Resources vs. External
Real-time processing depends heavily on memory. External DRAM (DDR3/DDR4) adds complexity and reduces speed due to access latency and bus contention. FPGAs leverage block RAM (BRAM) and UltraRAM (in UltraScale+) for low-latency, deterministic storage. For arithmetic:
- Addition, subtraction, multiplication: Use DSP slices or LUT-based logic. The Xilinx 7-series DSP48E1 supports 25x18 multiply-accumulate; Intel Arria 10 DSP blocks support 27x27 multiply.
- Square root, power, exponential: These are not native in FPGA logic. Use CORDIC algorithms (iterative) or lookup tables (LUT ROM). For video compression, sqrt is rarely needed; quantization tables are precomputed.
FPGA vendors provide IP cores for common math functions (e.g., Xilinx Floating-Point Operator and CORDIC, Intel ALTFP_SQRT and ALTSQRT). For custom designs, designers implement CORDIC or LUT-based approximations.
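As an illustration of why CORDIC maps well to FPGA logic, here is a minimal fixed-point sketch of the rotation-mode variant, which computes sine and cosine using only shifts, adds, and a small arctangent table. Square root uses a different (hyperbolic) CORDIC mode that is not shown. The word length, iteration count, and function name are assumptions chosen for readability, not values from any vendor core.

```python
import math

ITERATIONS = 16          # one shift-add stage per iteration
FRAC_BITS = 16           # fixed-point format: value * 2**16
ONE = 1 << FRAC_BITS

# Small ROM of arctan(2^-i) in fixed-point radians.
ATAN_TABLE = [int(round(math.atan(2.0 ** -i) * ONE)) for i in range(ITERATIONS)]

# CORDIC gain compensation, precomputed once (a constant in hardware).
_gain = 1.0
for _i in range(ITERATIONS):
    _gain /= math.sqrt(1.0 + 2.0 ** (-2 * _i))
K_FIXED = int(round(_gain * ONE))


def cordic_sin_cos(angle_rad):
    """Return (sin, cos) for an angle in [-pi/2, pi/2].

    Each loop iteration corresponds to one pipeline stage in hardware:
    two shifts, three add/subtract operations, and one table lookup.
    """
    x, y = K_FIXED, 0
    z = int(round(angle_rad * ONE))
    for i in range(ITERATIONS):
        if z >= 0:
            x, y, z = x - (y >> i), y + (x >> i), z - ATAN_TABLE[i]
        else:
            x, y, z = x + (y >> i), y - (x >> i), z + ATAN_TABLE[i]
    return y / ONE, x / ONE


if __name__ == "__main__":
    s, c = cordic_sin_cos(math.pi / 6)
    print(s, c)  # close to 0.5 and 0.866
```

No multiplier is needed inside the loop, which is why CORDIC is attractive when DSP slices are scarce. Unrolling the loop gives one result per clock at the cost of one stage of logic per iteration.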
5. What FPGAs Are NOT Good For
While it is true that "there is no job an FPGA cannot do", FPGAs are poorly suited to some tasks, specifically highly sequential, control-intensive algorithms with unpredictable data dependencies. Example: file system management (FAT32, exFAT on an SD card). Such tasks involve many conditional branches, scattered memory accesses, and sprawling state machines. A single huge state machine can be built in an FPGA, but it becomes extremely complex to design, debug, and maintain. For these tasks, a small microcontroller (ARM Cortex-M, RISC-V) or a soft-core processor (MicroBlaze, Nios II) inside the FPGA is a better approach.
Thus, FPGAs are best for dataflow-dominated algorithms (streaming video, digital filtering, FFT, packet processing) rather than control-dominated algorithms.
6. Recommended FPGA Part Numbers for Common Applications
| Application Domain | Recommended Families | Example Part Numbers | Key Features |
|---|---|---|---|
| Logic gluing / low-cost control | Intel Cyclone IV, Xilinx Spartan-6 | EP4CE15F23I7N, XC6SLX9-2FTG256I | Low LUT count, legacy I/O standards |
| Real-time motor control | Intel Cyclone V, Xilinx Artix-7 | 5CEFA7F23I7N, XC7A35T-1FTG256C | On-chip ADC on Artix-7 (XADC); PWM generators and encoder interfaces implemented in fabric |
| High-speed protocol bridge (PCIe, JESD204B) | Xilinx Kintex-7, Intel Arria 10 | XC7K325T-2FFG900I, 10AX115S2F45I2SG | SerDes up to 12.5 Gb/s (Kintex-7 GTX) or 17.4 Gb/s (Arria 10 GX); hard PCIe Gen2 (Kintex-7) or Gen3 (Arria 10) |
| Video compression (JPEG, H.264) | Xilinx Zynq UltraScale+, Intel Cyclone V SoC | XCZU9EG-2FFVB1156I, 5CSXFC6D6F31I7N | ARM cores + FPGA fabric, DSP slices, BRAM |
| Radar / medical imaging (high DSP density) | Xilinx Virtex UltraScale+, Intel Stratix 10 | XCVU13P-2FIGD2104E, Stratix 10 GX (1SG280HU2F50E2VG) | Thousands of DSP slices; high-bandwidth memory (HBM) available in the Virtex UltraScale+ HBM and Stratix 10 MX variants |
7. Cost-Benefit: FPGA vs. DSP vs. ASIC
- ASIC: Lowest per-unit cost at high volume (millions), highest NRE (mask sets, $10M+). Best for fixed-function mass production.
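- DSP: Moderate NRE, easier programming (C language), but the sequential bottleneck limits throughput for video. A single TI C66x core delivers about 40 GMAC/s (16-bit fixed point at 1.25 GHz), so the eight-core TMS320C6678 peaks near 320 GMAC/s; an FPGA like Arria 10 can exceed 1000 GMAC/s.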
- FPGA: No mask-set NRE (development time is the main up-front cost), flexible, high performance. Per-unit cost is higher than an ASIC but often lower than DSPs for the same throughput. Ideal for medium volume (100-100k units) or rapidly evolving algorithms.
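The throughput gap between a DSP and an FPGA comes from simple arithmetic: peak MAC rate is the number of multiply-accumulate units times the clock frequency. The sketch below uses assumed figures: 32 16-bit MACs per cycle per C66x core at 1.25 GHz, and 3036 18x19 multipliers (1518 DSP blocks) in an Arria 10 GX 1150 at an assumed 400 MHz fabric clock. Treat these as approximate inputs to check against the datasheets, not guaranteed results.

```python
def peak_gmacs(mac_units, clock_mhz):
    """Peak throughput in GMAC/s = MAC units * clock (MHz) / 1000."""
    return mac_units * clock_mhz / 1000.0


if __name__ == "__main__":
    # Assumed device figures; verify against the relevant datasheets.
    c66x_core = peak_gmacs(mac_units=32, clock_mhz=1250)
    c6678_chip = 8 * c66x_core
    arria10 = peak_gmacs(mac_units=3036, clock_mhz=400)

    print(c66x_core)   # 40.0 GMAC/s per core
    print(c6678_chip)  # 320.0 GMAC/s for eight cores
    print(arria10)     # about 1214 GMAC/s at the assumed clock
```

These are peak numbers. Sustained throughput on either device depends on keeping the MAC units fed, which is where on-chip memory bandwidth and pipelining matter.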
For the H.264 encoder example, an FPGA solution (e.g., Xilinx Zynq) may cost $50-150 per chip in volume, while an ASIC solution (if >500k units) could drop to $10-20. But the ASIC design effort (12-24 months) and risk may not be justified for a product with uncertain lifetime.
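The volume at which an ASIC starts to pay off can be estimated with a simple break-even calculation. The sketch assumes the only costs are a one-time ASIC NRE and a constant per-unit price for each option. The figures used ($10M NRE, $100 per FPGA, $15 per ASIC) are illustrative picks from the ranges quoted in this article, not market data.

```python
import math


def break_even_units(asic_nre, fpga_unit_cost, asic_unit_cost):
    """Smallest volume at which total ASIC cost is no higher than FPGA cost.

    Model: total_asic = asic_nre + n * asic_unit_cost
           total_fpga = n * fpga_unit_cost
    """
    saving_per_unit = fpga_unit_cost - asic_unit_cost
    if saving_per_unit <= 0:
        raise ValueError("ASIC must be cheaper per unit to ever break even")
    return math.ceil(asic_nre / saving_per_unit)


if __name__ == "__main__":
    n = break_even_units(asic_nre=10_000_000,
                         fpga_unit_cost=100,
                         asic_unit_cost=15)
    print(n)  # 117648 units under these assumptions
```

The model ignores engineering salaries, schedule risk, and the cost of a respin, all of which push the practical break-even volume higher than the raw number suggests.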