What Can FPGAs Really Do? Logic Gluing, Real-Time Control, Signal Processing, and SoC - Why FPGAs Outperform DSP and ASIC for Complex Algorithms

Beginners often ask: what is the true scope of FPGA applications? The answer has evolved over time. Early FPGAs were used for logic gluing (connecting disparate chips). Then real-time control gave FPGAs a practical role. Flexible protocol implementation made FPGAs highly adaptable. Signal processing elevated FPGAs into high-end systems. And now, System-on-Chip (SoC) integration allows FPGAs to potentially replace many other components. However, a deeper question remains: can an FPGA handle video compression (JPEG, H.264) efficiently compared to dedicated DSPs or ASICs? This article examines FPGA processing models (parallel vs. pipelined) and explains why FPGAs are excellent for complex algorithms, while also noting where they struggle (e.g., sequential file system management).

1. The Evolution of FPGA Applications

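FPGAs have transitioned through five major application phases: logic gluing, real-time control, flexible protocol implementation, signal processing, and System-on-Chip (SoC) integration.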

Key insight: FPGAs are not a magic solution for every problem, but for tasks requiring high throughput, low latency, and flexibility, they often beat sequential DSPs on performance and fixed-function ASICs on adaptability and time to market.

2. Parallel vs. Sequential Processing: Why FPGAs Excel

Figure 1 (conceptual) shows a typical controller or processor (CPU, DSP) executing tasks sequentially. Because software is inherently sequential, each output requires multiple steps (e.g., fetch, decode, execute, write). If one output takes 5 time units, four outputs take 20 units. Even with hardware assistance such as DMA, the coordination overhead limits speed.

[Processing flow diagram: Sequential processor takes T=5 per output, 4 outputs = 20T]

Figure 2 illustrates FPGA parallel processing: multiple independent data paths operate simultaneously. The same four outputs can be completed in 5 time units - a 4x speedup. However, parallelism consumes more logic resources (area = money). Example: a 4-channel FIR filter implemented with 4 parallel multipliers uses 4x the DSP slices of a single-channel design.

[Parallel processing diagram: 4 paths active concurrently, 4 outputs = 5T]

Figure 3 shows pipelined processing: the classic compromise. Data flows through stages (e.g., stage1: input, stage2: multiply, stage3: accumulate, stage4: output). After the pipeline fills, one output emerges per clock cycle. In this four-output example, resource usage is roughly 1/4 of the fully parallel implementation, while sustained throughput remains close to it. For video compression (JPEG, H.264), pipelining is usually the technique of choice: the first frame incurs extra latency while the pipeline fills, but subsequent frames are output at the input rate.

[Pipelined processing diagram: 4-stage pipeline, after filling, one output per cycle]
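The three models in Figures 1 to 3 can be compared with a small timing calculation. The sketch below is only an idealized model: the function names are made up for illustration, and it assumes each pipeline stage takes exactly one time unit and that there are no stalls.

```python
from math import ceil

def sequential_time(n_outputs, t_per_output):
    """One processor produces outputs one after another."""
    return n_outputs * t_per_output

def parallel_time(n_outputs, t_per_output, lanes):
    """`lanes` independent data paths each produce one output per t_per_output."""
    return ceil(n_outputs / lanes) * t_per_output

def pipelined_time(n_outputs, stages, t_stage=1):
    """Fill the pipeline once, then one output per stage time (idealized, no stalls)."""
    if n_outputs == 0:
        return 0
    return (stages + n_outputs - 1) * t_stage

if __name__ == "__main__":
    # Numbers from the figures: T = 5 per output, 4 outputs, 4 lanes or 4 stages.
    print("sequential:", sequential_time(4, 5))    # 20
    print("parallel:  ", parallel_time(4, 5, 4))   # 5
    print("pipelined: ", pipelined_time(4, 4))     # 7
    # For long streams the fill latency becomes negligible.
    print("pipelined, 1000 outputs:", pipelined_time(1000, 4))  # 1003
```

For only four outputs the pipeline fill time is visible (7 units), but for a long stream such as video the cost approaches one time unit per output.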

This is why FPGAs excel at real-time video coding. A Xilinx Zynq UltraScale+ can implement an H.264 encoder pipeline using its DSP48E2 slices (27x18 multipliers) and block RAM for line buffers, achieving 1080p60 encoding at under 10W - a power budget that is out of reach for a general-purpose CPU and for many DSPs.

3. Case Study: JPEG and H.264 Compression

Traditional approaches for video compression include software running on a general-purpose CPU or DSP, and dedicated ASIC codecs.

FPGA compression designs use on-chip memory (Block RAM) for pixel buffers and transform coefficients, DSP blocks for DCT/quantization, and state machines for entropy coding (CAVLC in the Baseline profile, CABAC in the Main and High profiles). For example, an Arria 10 (10AX115S2F45I2SG) can implement a full H.264 baseline encoder consuming ~40% of logic and 60% of DSP blocks, handling 1080p at 60fps at under 15W.
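A rough sizing calculation shows why 1080p60 fits comfortably in on-chip resources. The sketch below is a back-of-the-envelope estimate, not a design rule. It assumes 8-bit luma only, ignores blanking intervals and chroma, and treats each memory block as a flat pool of bits. The 20,480-bit block size corresponds to an Arria 10 M20K block, and the function names are hypothetical.

```python
def active_pixel_rate(width, height, fps):
    """Active pixels per second (blanking intervals ignored)."""
    return width * height * fps

def line_buffer_blocks(width, lines, bits_per_pixel, block_bits):
    """Memory blocks needed to hold `lines` video lines, rounded up.

    Assumption: every bit of a block is usable. Real block RAM has fixed
    width/depth configurations, so actual usage can be higher.
    """
    total_bits = width * lines * bits_per_pixel
    return (total_bits + block_bits - 1) // block_bits

if __name__ == "__main__":
    rate = active_pixel_rate(1920, 1080, 60)
    print("active pixel rate:", rate, "pixels/s")  # 124416000
    # One row of 16x16 macroblocks = 16 video lines of 8-bit luma.
    print("M20K blocks for 16 lines:", line_buffer_blocks(1920, 16, 8, 20480))  # 12
```

At one pixel per clock cycle, about 124 million active pixels per second implies a pipeline clock in the range of 125 to 150 MHz once blanking is included, which is well within reach of current FPGA fabric.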

Key takeaway: For complex streaming algorithms with strict real-time requirements, a suitably sized FPGA is usually sufficient - especially when the design is pipelined. The first output may have higher latency (acceptable in most systems), but subsequent outputs are continuous.

4. Memory and Arithmetic: Internal Resources vs. External

Real-time processing depends heavily on memory. External DRAM (DDR3/DDR4) adds complexity and reduces speed due to access latency and bus contention. FPGAs leverage block RAM (BRAM) and UltraRAM (in UltraScale+) for low-latency, deterministic storage.

For arithmetic, FPGA vendors provide IP cores for common math functions (e.g., Xilinx Floating-Point Operator, Intel's ALTFP_* floating-point IP cores). For custom designs, designers implement CORDIC or LUT-based approximations.
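To illustrate the CORDIC approach mentioned above, here is a minimal software model of rotation-mode CORDIC for sine and cosine. It is a sketch in floating point for readability. A hardware version would typically use fixed-point values, replace the multiplications by `2.0 ** -i` with bit shifts, and read the arctangent constants from a small ROM. The function name and the iteration count are illustrative choices.

```python
import math

def cordic_sin_cos(angle, iterations=16):
    """Approximate (sin, cos) of `angle` in radians, for |angle| <= pi/2.

    Rotation-mode CORDIC: rotate the vector (K, 0) by a sequence of
    fixed micro-angles atan(2^-i) until the residual angle z reaches zero.
    """
    # Micro-rotation angles; in hardware these would be a small lookup table.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]

    # Pre-compensate the gain that the micro-rotations introduce.
    gain = 1.0
    for i in range(iterations):
        gain /= math.sqrt(1.0 + 2.0 ** (-2 * i))

    x, y, z = gain, 0.0, angle
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0      # rotate toward z = 0
        shift = 2.0 ** -i                # a right shift by i bits in hardware
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * angles[i]
    return y, x  # (sin, cos)

if __name__ == "__main__":
    s, c = cordic_sin_cos(0.5)
    print("cordic:", s, c)
    print("math:  ", math.sin(0.5), math.cos(0.5))
```

Each iteration needs only adds, subtracts, and shifts, which is why CORDIC maps well to FPGA logic when multipliers are scarce. Each additional iteration adds roughly one bit of precision.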

5. What FPGAs Are NOT Good For

While it is often said that "there is no job an FPGA cannot do", there are tasks that FPGAs are poorly suited for: specifically, highly sequential, control-intensive algorithms with unpredictable data dependencies. Example: file system management (FAT32, exFAT on an SD card). Such tasks involve many conditional branches, scattered memory accesses, and a tangle of interdependent states. A single huge state machine can be built in an FPGA, but it becomes extremely complex to design, debug, and maintain. For these tasks, a small microcontroller (ARM Cortex-M, RISC-V) or a soft-core processor (MicroBlaze, Nios II) inside the FPGA is a better approach.

Thus, FPGAs are best for dataflow-dominated algorithms (streaming video, digital filtering, FFT, packet processing) rather than control-dominated algorithms.

6. Recommended FPGA Part Numbers for Common Applications

| Application Domain | Recommended Families | Example Part Numbers | Key Features |
|---|---|---|---|
| Logic gluing / low-cost control | Intel Cyclone IV, Xilinx Spartan-6 | EP4CE15F23I7N, XC6SLX9-2FTG256I | Low LUT count, legacy I/O standards |
| Real-time motor control | Intel Cyclone V, Xilinx Artix-7 | 5CEFA7F23I7N, XC7A35T-1FTG256C | On-chip ADC (Xilinx XADC), PWM generators built in fabric |
| High-speed protocol bridge (PCIe, JESD204B) | Xilinx Kintex-7, Intel Arria 10 | XC7K325T-2FFG900I, 10AX115S2F45I2SG | SerDes at 12.5 Gbps, PCIe hard IP (Gen2 on Kintex-7, Gen3 on Arria 10) |
| Video compression (JPEG, H.264) | Xilinx Zynq UltraScale+, Intel Cyclone V SoC | XCZU9EG-2FFVB1156I, 5CSXFC6D6F31I7N | ARM cores + FPGA fabric, DSP slices, BRAM |
| Radar / medical imaging (high DSP density) | Xilinx Virtex UltraScale+, Intel Stratix 10 | XCVU13P-2FIGD2104E, 1SG280HU2F50E2VG (Stratix 10 GX) | Thousands of DSP slices; high-bandwidth memory (HBM) on select variants (Virtex UltraScale+ HBM, Stratix 10 MX) |

7. Cost-Benefit: FPGA vs. DSP vs. ASIC

For the H.264 encoder example, an FPGA solution (e.g., Xilinx Zynq) may cost $50-150 per chip in volume, while an ASIC solution could drop to $10-20 per chip at volumes above roughly 500k units. But the ASIC design effort (12-24 months) and risk may not be justified for a product with an uncertain lifetime.
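The trade-off can be framed as a break-even volume: the number of units at which the ASIC's one-time engineering cost (NRE) is recovered by its lower unit price. The sketch below uses unit prices taken from the ranges above, but the NRE figure is purely hypothetical and chosen only to make the arithmetic concrete. Real NRE varies widely with process node and design complexity.

```python
from math import ceil

def break_even_units(asic_nre, fpga_unit_cost, asic_unit_cost):
    """Units needed before the ASIC's total cost drops below the FPGA's.

    Total cost model (simplified): FPGA = units * fpga_unit_cost,
    ASIC = asic_nre + units * asic_unit_cost. FPGA development cost and
    time-to-market effects are ignored.
    """
    saving_per_unit = fpga_unit_cost - asic_unit_cost
    if saving_per_unit <= 0:
        raise ValueError("ASIC unit cost must be lower than FPGA unit cost")
    return ceil(asic_nre / saving_per_unit)

if __name__ == "__main__":
    HYPOTHETICAL_NRE = 5_000_000  # assumption for illustration, not from the article
    units = break_even_units(HYPOTHETICAL_NRE, fpga_unit_cost=100, asic_unit_cost=15)
    print("break-even volume:", units, "units")  # 58824
```

The model leaves out the 12-24 month ASIC schedule and the risk of a respin, both of which push the practical break-even point higher than the pure arithmetic suggests.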
