Project 2: Creating Custom IP for ZYNQ's Processing System

An introduction to Zynq APSoC Design Flow


Introduction

This project presents a simple digital system that includes both a custom IP block in the FPGA, and control software running on the ARM. Vivado’s “IP Integrator” tool is introduced and used to define the hardware system.


Note: “IP block”, or intellectual property block, refers to a reusable logic circuit that belongs to some individual or entity. New, original IP blocks can be created in Vivado, and preexisting IP blocks can be downloaded and/or included using the Vivado IP Manager. Some of the downloadable IP is produced by Xilinx, and some by third parties; some IP is free and can be included in any design, and some requires a purchased license. Visit the Xilinx Intellectual Property page for more information.

The IP Integrator tool allows IP blocks to be created (or selected from the IP library) and attached to the ARM’s peripheral bus. Any newly created IP blocks can be defined and implemented using Vivado/Verilog. Once the entire hardware system has been defined, the ZYNQ/Blackboard can be configured, and the SDK tool can be used to write system software. Before starting a system design like this, you should be comfortable using the ARM/SDK and FPGA/Vivado systems independently.

Background

This project introduces all the steps required to configure the entire ZYNQ system. You are already familiar with configuring and using the ARM and the FPGA independently, so what remains is configuring these two major components to work together. The processor accesses FPGA circuits by reading and writing registers attached to the processor’s peripheral bus, so any FPGA circuits that will work with the processor must include such registers. In addition to the processor-accessible registers, the FPGA circuit can include as many features as desired.

As an example, consider Blackboard’s seven-segment display device (7sd). The 7sd is connected to pins on the FPGA, so the processor must write to registers in the FPGA to access the display. The registers could simply pass data through the FPGA, leaving the processor to deal with display timing by writing to individual bits in the anode register (i.e., the processor would write the 4-bit data values 0001, 0010, 0100, 1000 to the anode register in succession to turn on the digits, one at a time, at least 60 times per second). The processor would also need to write cathode data in step with the anode writes, so that the correct data is displayed in each digit (this is shown in the “simple” controller on the lower left). Of course, a better system design would put 7sd timing control in a hardware circuit, relieving the processor from having to control the display in real time. In this case, the processor would only need to interact with data registers, and the hardware circuit would take care of getting the right data to the cathodes at the right time (shown in the “typical” controller on the lower right).

Figure 1. Two different ways to build a 7-segment IP block: a simple method that forces the processor to control timing, and a better method that creates the required timing signals itself and only consumes display data.

The ARM uses a 32-bit address bus, giving it a 4 GByte address space. In the ZYNQ system, the lower 1 GByte of the address space is reserved for external DDR memory, the middle 2 GBytes are routed into the FPGA, and the upper 1 GByte is used for accessing on-chip peripherals (like the USB controller, Ethernet controller, interrupt controller, etc.). Thus, any 32-bit address whose two most significant bits are 01 or 10 targets the FPGA. Users can define an FPGA circuit to decode an address within that 2 GByte range and select a register for read or write access. (Note: Although it is possible to manually connect FPGA IP blocks to the system bus, it is far easier to let the IP Integrator tool create the needed bus connections - see the tutorial below.)

Figure 2. ARM’s main bus and IP blocks
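As a quick illustration of the address split described above, the C sketch below classifies an address by its top two bits. It models only the simplified three-region map from this section (the real ZYNQ address map has finer subdivisions that this ignores), and the names classify_address and addr_region_t are hypothetical.

```c
#include <stdint.h>

/* Simplified model of the ARM address map described above (an assumption that
 * ignores finer-grained regions): the top two address bits select the region. */
typedef enum {
    REGION_DDR = 0,        /* 0x00000000-0x3FFFFFFF: external DDR (top bits 00)      */
    REGION_FPGA = 1,       /* 0x40000000-0xBFFFFFFF: routed into the FPGA (01 or 10) */
    REGION_PERIPHERAL = 2  /* 0xC0000000-0xFFFFFFFF: on-chip peripherals (top bits 11) */
} addr_region_t;

static addr_region_t classify_address(uint32_t addr)
{
    uint32_t top2 = addr >> 30;      /* isolate address bits [31:30] */

    if (top2 == 0u) {
        return REGION_DDR;
    }
    if (top2 == 3u) {
        return REGION_PERIPHERAL;
    }
    return REGION_FPGA;              /* top2 is 1 (01) or 2 (10) */
}
```

For example, an address such as 0x43C00000 has 01 in its top two bits, so under this model it falls in the FPGA region.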

The ZYNQ uses the industry-standard “Advanced Microcontroller Bus Architecture” (AMBA) bus to connect the ARM and the FPGA. AMBA, an open standard introduced by ARM in 1996, is one of the most widely used buses for connecting microcontrollers to peripheral devices, and it is now recognized as the de facto standard embedded controller system bus. More recently (in 2010), the AXI4 bus interface standard was added to specify a simpler bus better suited for use inside programmable devices. The ZYNQ devices use the AXI4 bus to connect the ARM and FPGA. You can read more about the AMBA and AXI buses on the AMBA wiki page, in Xilinx user guide UG761, and in various other web resources. Xilinx has also produced an introductory ZYNQ book that you can download for free here: ZYNQ book

Figure 3. The ZYNQ book

In order to use the ZYNQ system effectively, you need to understand the AXI bus and how to configure it for use with custom IP blocks. This project illustrates how to set up a ZYNQ system that includes a custom IP block, and the next project looks more closely at the AXI bus.

The ZYNQ SoC includes many interfaces, ports, and peripheral circuits, all of which must be configured correctly for the chip to function in a given system. The system designer defines many of the required configuration settings when the circuit board is created, based on the physical devices available on the ZYNQ system board (for example, the amount and speed of external memory, the main clock frequency, which peripherals are connected to which pins and ports, etc.). These settings can be defined using a tool in the Vivado IDE and saved in a configuration file (a .tcl script) that can be applied at startup, so board users don’t need to recreate all the hardware-specific settings. Such a configuration file has been created for the Blackboard, and in the tutorial below you will download and apply it to configure the board.

ZYNQ System overview
GPIO reference manual
Verilog Primer
Xilinx ZYNQ documentation page

After applying the settings from the provided .tcl file, you can see an overview of the ZYNQ system and examine configuration details. Take a few moments to examine the system configuration, and note the IP blocks that make up the processing system. Many IP blocks connect to external pins through the MIO interface, and some connect to the FPGA via the EMIO interface. You can see the pin connections for the QSPI ROM, the UART, the SPI bus, etc., and you can check the default settings for some of the ports as well. You can also examine the system clock configuration: the main system clock input is set to 33.3333 MHz (if you check the ZYNQ schematic, you’ll find that this is the frequency of the oscillator on the Blackboard). The 33.33 MHz clock sets the base frequency from which all peripheral controller clocks are derived. Note that the DDR controller clock is set to 533.33 MHz, the QSPI module is configured at 200 MHz, and the first clock going to the FPGA is set at 100 MHz (the FPGA also has an independent, external clock input, also at 100 MHz). All of these settings can be changed, but if you change them for the Blackboard, your system will likely not work (so don’t).

When the ZYNQ system is programmed with a hardware definition file (.xsa), the settings in the .tcl script are an integral part of the overall chip configuration. But they are not the only part; more information can be added to further specify a given hardware system. In our case, we will define a new IP block and add it to the project, assign AXI bus addresses so the ARM can access the IP block, define the IP block’s behavior (the FPGA hardware configuration that forms the IP block), and then create a new hardware definition file that configures the entire ZYNQ hardware system. (Older Vivado releases produce an .hdf file for use with the SDK; newer releases produce an .xsa file for use with Vitis.) Once the hardware definition file is complete, it can be used to program the ZYNQ hardware system, and then the software tools can be used to write software for the configured hardware system.

Tutorial Part 1

Tutorial Part 2

The ZYNQ device also contains an Artix FPGA. So far, you’ve considered the FPGA from a relatively abstract, high-level perspective – you’ve written and implemented Verilog projects, and trusted that somehow, after running the “implement” app in the IDE, the FPGA would behave according to your design. In upcoming projects, you’ll need to know more about what resources are available in the FPGA. The reading list is a good start to learning more about the FPGA hardware.

Artix FPGA reading list

Requirements

1. Verify the Demo Project

Follow this TUTORIAL: First Zynq SoC Project, and then run the demo project on your Blackboard. If you configured everything correctly, you should see the demo program running: the first 4 non-RGB LEDs on the Blackboard should turn on and off at regular intervals.

2. Modify your LED controller to have 8 channels

Modify the IP core so that a single register controls 8 LED channels. Write a C test program to demonstrate that you can control all 8 LEDs independently.

3. Add enable functionality to your LED controller

Use a second AXI-connected register within the same IP core to enable and disable all of the LEDs. Use the first bit (bit 0) of the register as the enable bit and ignore all other bits in the register. When designing any IP with a memory map, you should document what is accessed at each address. It is also a good idea to note how peripherals are initialized at reset and how they respond to bus reads and writes.

An example of a detailed memory map for this requirement is given below.

Address	Register Name	Valid Bits	Reset Value	Description
BASE_ADDR + 0	LED_DATA	[7:0]	8'b00000000	When the LEDs are enabled, a 1 in a bit position drives the corresponding LED.
BASE_ADDR + 4	LED_CTL	[0:0]	1'b0	Setting bit 0 enables all LED channels; clearing it disables all channels.
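A memory map like this one translates naturally into C. The sketch below, assuming the two registers are 32 bits wide and laid out exactly as in the table, overlays a struct on the register block and uses compile-time checks to confirm the offsets. The names led_regs_t, LED_DATA_MASK, and LED_CTL_EN_MASK are hypothetical and not part of any Xilinx driver.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C view of the LED controller memory map in the table above.
 * Each member is one 32-bit register; the struct layout mirrors the offsets. */
typedef struct {
    volatile uint32_t led_data;   /* BASE_ADDR + 0: bits [7:0] drive the LEDs when enabled */
    volatile uint32_t led_ctl;    /* BASE_ADDR + 4: bit 0 = enable, other bits ignored      */
} led_regs_t;

/* Masks for the valid fields documented in the memory map */
#define LED_DATA_MASK   0xFFu     /* LED_DATA valid bits [7:0] */
#define LED_CTL_EN_MASK 0x1u      /* LED_CTL valid bit [0]     */

/* Compile-time checks that the struct matches the documented offsets */
_Static_assert(offsetof(led_regs_t, led_data) == 0x0, "LED_DATA must be at offset 0");
_Static_assert(offsetof(led_regs_t, led_ctl) == 0x4, "LED_CTL must be at offset 4");
```

On the board, you might point a led_regs_t pointer at your instance’s base address and then write, for example, LED_CTL_EN_MASK to its led_ctl member. The volatile qualifiers make the compiler perform every register access instead of caching values in CPU registers.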

4. Write C functions for your LED controller

Hardware IP cores designed to be modular can be written once and used in many designs. It is therefore a good idea to write similarly modular code that can be reused in different systems along with the hardware module. For every IP core you design, you should write code that controls the core’s basic functionality. Once you design an IP core and write code to control it, you can reuse both in any subsequent projects and systems. To make reuse easier, document both the core’s hardware behavior and its software well. For a software interface, it can be useful to note how a function might modify the state of the peripheral. Being detailed in both your hardware and software interfaces makes debugging and building larger systems easier.

Below is the start of a header file, led_ctl.h that defines the module base address, a macro to access the data register, and a function prototype. Make sure you set the base address for the IP block to that of your module instance in your block design.

#ifndef LED_CTL_H
#define LED_CTL_H

//make type uint32_t available
#include <stdint.h>

//Define base address used for register access
//(MAKE SURE TO SET THIS TO YOUR DESIGN's BASE ADDRESS)
#define LED_BASE 0x00000000


//define macros to access registers

//This macro allows accessing a memory address without pointer manipulation:
//dereference the address (LED_BASE (+0) )
//volatile forces every read/write to actually reach the hardware register
//make sure accesses are on 32-bit address boundaries;
//address offset must be a multiple of 4 (0,4,8,C)
#define LED_DATA (*((volatile uint32_t *)(LED_BASE+0x0)))

//function prototypes below will be made available to C source files that include this header:
uint32_t read_led_data(void);

#endif //LED_CTL_H

Provided below is a function that implements the prototype found in the header. Place it in a .c source file with a name that matches the header; for example, you could call it led_ctl.c

//include header so we can use its macros
#include "led_ctl.h"

uint32_t read_led_data(void)
{
	uint32_t val;

	//use macro to read from (LED_BASE) and store value into a variable.
	val = LED_DATA;

	//we only want data from the lower 8 bits of the register
	val &= 0xFF;

	return val;
}

Complete the example header and source files and implement the following functions:

uint32_t read_led_data(void) : (provided above) returns the contents of the LED data register (lower 8 bits) as a 32-bit value
void write_led_data(uint32_t data): writes the lower 8-bits of the passed parameter to the LEDs.
void clear_led(void) : sets the value of all LEDs to zero (does not change the enable bit)
void enable_led(void) : enables the output of all LEDs (does not change value of LED data reg)
void disable_led(void) : disables the output of all LEDs (does not change values)
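One possible way to implement these functions is sketched below. So that it can be compiled and tested on a PC, it ASSUMES a hypothetical RAM array (fake_led_regs) standing in for the two AXI registers; on the Blackboard you would keep the LED_BASE from your header instead. The LED_CTL macro at offset 4 follows the memory map above. Treat this as a sketch to compare against, not the required solution.

```c
#include <stdint.h>

/* ASSUMPTION (host testing only): a RAM array stands in for the AXI registers.
 * On the Blackboard, LED_BASE should be your IP instance's base address. */
static volatile uint32_t fake_led_regs[2];
#define LED_BASE ((uintptr_t)fake_led_regs)

/* Register access macros following the memory map: DATA at +0, CTL at +4 */
#define LED_DATA (*((volatile uint32_t *)(LED_BASE + 0x0)))
#define LED_CTL  (*((volatile uint32_t *)(LED_BASE + 0x4)))

uint32_t read_led_data(void)
{
    return LED_DATA & 0xFFu;          /* only bits [7:0] are valid */
}

void write_led_data(uint32_t data)
{
    LED_DATA = data & 0xFFu;          /* keep only the lower 8 bits */
}

void clear_led(void)
{
    LED_DATA = 0u;                    /* LED_CTL (enable bit) is untouched */
}

void enable_led(void)
{
    LED_CTL = LED_CTL | 0x1u;         /* read-modify-write: set bit 0 */
}

void disable_led(void)
{
    LED_CTL = LED_CTL & ~0x1u;        /* read-modify-write: clear bit 0 */
}
```

enable_led and disable_led use read-modify-write so that any bits you add to LED_CTL later are preserved; since only bit 0 is currently valid, writing 1 or 0 directly would also work.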

Write a program that uses your functions to control all 8 LEDs individually and to illustrate the enable/disable functionality. Use the Vitis debugger to step over your functions and observe the changes to the LEDs.

5. Control the 7-segment display using the ARM processor

Write an AXI-connected IP that allows you to write values to drive the anodes and cathodes of the seven-segment display. You can use one AXI register for holding the cathode values (7 plus the decimal point) and another one for the anodes.

With direct control over the segments and digit enables you can now write code for the processor to display multiple digits on the display.

The program below shows a control flow that could be used to display different information on each digit. To use it, you will need to complete it by accessing the right anode and cathode registers and by implementing a delay (you can use a software loop or a Zynq PS timer). Try different values for the delay period, and note what happens if the delay gets too long. Aim for the longest delay that still avoids visible flicker of the digits.

#include <stdint.h>

//program to cycle data between 7-segment display digits
int main(void)
{
	//8-bit value (7-seg plus dp) for each digit
	//This is a local array (not a memory mapped register)
	uint8_t cath_val[4];

	//variable holding which digit is being addressed
	int digit_num;

	//initial value of data to write to each digit
	//note these are arbitrary combinations of segments
	//(0b binary literals are a GCC extension, standard since C23)
	cath_val[0] = 0b11110000;
	cath_val[1] = 0b00001111;
	cath_val[2] = 0b11001100;
	cath_val[3] = 0b00110011;

	for(;;) //main loop
	{
		for(digit_num = 0; digit_num < 4; digit_num++)
		{
			//enable desired anode

			//write cathode values for this digit (cath_val[digit_num])

			//delay for some time
		}
	}
}
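The delay in the program above has an upper bound you can estimate. The sketch below assumes the roughly 60 Hz per-digit refresh figure from the Background section and a 4-digit display, and computes the longest time each digit can stay lit; it also produces the one-hot anode pattern (0001, 0010, 0100, 1000) described there. The function names are hypothetical, and the anode pattern assumes a 1 enables a digit, as in that description.

```c
#include <stdint.h>

/* Longest time (in microseconds) each digit may stay lit if every digit must be
 * refreshed at least refresh_hz times per second. */
static uint32_t max_digit_period_us(uint32_t refresh_hz, uint32_t num_digits)
{
    /* one full pass over all digits must finish within 1/refresh_hz seconds */
    return 1000000u / (refresh_hz * num_digits);
}

/* One-hot anode enable for a digit: 0 -> 0001, 1 -> 0010, 2 -> 0100, 3 -> 1000.
 * Assumes a 1 turns a digit on; invert the result if your anodes are active-low. */
static uint32_t anode_for_digit(int digit_num)
{
    return 1u << digit_num;
}
```

With 60 Hz and 4 digits, max_digit_period_us returns 4166, so each digit can stay lit for a little over 4 ms; much longer delays drop the refresh rate low enough that flicker becomes visible.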

Use the example code, or write your own program to show unique values on the 7-segment display. Make sure the values are not the same as the ones found in the example code.

This approach to controlling the 7-segment display has a few drawbacks. Displaying different data on all digits requires both the anode and cathode values to change faster than the eye can perceive (a timing constraint). In the example program above, the processor’s entire workload is devoted to constantly updating the display. In a program where the processor shares its time among multiple tasks, it may not be able to update the display quickly enough to prevent visible flicker.

Additionally, to keep changing the values of the display segments, the processor must continually access the AXI-connected registers. This can incur memory access penalties (due to the difference in speed between the processor and the AXI bus), and it consumes bus bandwidth that could be used by other AXI peripherals. In the case above, much of the data written over the bus is also redundant.

6. Design a more complete AXI-connected 7-segment display controller

Redesign your 7-segment controller IP so the processor is not required to control the timing of the anodes. Use AXI-connected registers to hold the data to be displayed on each digit’s cathode segments. Each digit needs 8 bits (7 segments plus the decimal point), so the entire display needs 32 bits. You could store each digit’s data in its own register or pack all four digits into a single register; both approaches have pros and cons. Your design does not need a register to control the anodes, because those will be decoded in hardware. Note that this requirement doesn’t ask you to decode segment data into BCD or hex digits; you just need control over turning individual segments on and off.
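If you choose the packed single-register layout, software needs to build and update 32-bit values from per-digit bytes. The sketch below assumes digit 0 occupies the least significant byte and digit 3 the most significant byte; that ordering, and the names pack_digits and set_digit, are hypothetical choices for illustration.

```c
#include <stdint.h>

/* Hypothetical packed layout: digit n's 8 segment bits occupy bits [8n+7:8n]
 * of one 32-bit register (digit 0 in the least significant byte). */
static uint32_t pack_digits(uint8_t d0, uint8_t d1, uint8_t d2, uint8_t d3)
{
    return (uint32_t)d0
         | ((uint32_t)d1 << 8)
         | ((uint32_t)d2 << 16)
         | ((uint32_t)d3 << 24);
}

/* Replace one digit's byte in a packed register value, leaving the others intact */
static uint32_t set_digit(uint32_t reg, unsigned digit, uint8_t segs)
{
    unsigned shift = 8u * (digit & 3u);        /* digit index 0-3 -> bit offset 0-24 */

    reg &= ~(0xFFu << shift);                  /* clear the old byte */
    return reg | ((uint32_t)segs << shift);    /* insert the new byte */
}
```

Packing lets a single bus write update all four digits at once, but changing one digit requires a read-modify-write (as in set_digit); separate per-digit registers avoid that at the cost of more registers in the IP.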

Your new design will remove the timing constraint from your software program and place it in hardware instead. Dedicated hardware not only offloads processing, it also allows more ‘reliable’ timing. Depending on the design, this can be critical; for example, a video display controller with timing implemented only in software might monopolize the processor. If the processor can’t keep up with the timing requirements of the video display, the screen output could be corrupted.

Write functions to control your IP core. Make sure to put them in their own header and source file pair (not in the LED controller’s source files); in general, each peripheral should have its own source file for its basic functions.

Using your functions, write a C program that displays different data on each digit and verify that your IP design works.

7. Display Numerical data on the 7-segment display using software

Design a program that counts a value up at a fixed rate. Display the count value (0-9999) as a decimal (base 10) number on the 7-segment display, and when the count reaches 9999, roll it over to zero. You can use software delays to make the counting visible. To display a decimal value on the display, you will need to do two things:

Extract individual decimal digits for display
Encode BCD values into pure segment data to be written to your IP’s memory-mapped registers
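The counting part of the program can be kept separate from the display part. The sketch below shows a rollover counter and a busy-wait software delay; both names are hypothetical, and the number of loop iterations needed for a visible delay depends on the processor clock and the compiler optimization level, so it has to be tuned by experiment.

```c
#include <stdint.h>

/* Advance the displayed count by one, wrapping from 9999 back to 0 */
static uint32_t next_count(uint32_t count)
{
    return (count >= 9999u) ? 0u : count + 1u;
}

/* Crude software delay: spin for 'loops' iterations.
 * The volatile loop variable keeps the compiler from removing the empty loop. */
static void sw_delay(uint32_t loops)
{
    volatile uint32_t i;

    for (i = 0u; i < loops; i++) {
        /* busy-wait */
    }
}
```

A main loop might call next_count, extract and encode the digits of the result, write the segment data to your IP’s registers, and then call sw_delay before the next step.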

Extracting BCD Digits from a binary value

Remember that processors store and operate on pure binary numbers rather than Binary Coded Decimal (BCD). BCD manipulation instructions exist on some processors, but they are almost never used by C compilers. Thus, to display a base-10 number on your display using a C program, you will have to process the original number.

If the data were to be displayed as hexadecimal, each digit could be extracted by shifting and masking. The following illustrates breaking up a 16-bit hex value into 4 separate digits stored in an array:

int a_value;
char hex_val[4];
a_value = 0xABCD;
hex_val[0] = a_value&0xF;	//extracts 0xD
hex_val[1] = (a_value>>4)&0xF;	//extracts 0xC
hex_val[2] = (a_value>>8)&0xF;	//extracts 0xB
hex_val[3] = (a_value>>12)&0xF;	//extracts 0xA

In the above code, each hex digit is moved into place by shifting by a multiple of four (each digit is 4 bits wide), and then the lowest 4 bits are ‘isolated’ by masking off all other bits. Interestingly, this is equivalent to dividing by a power of 16 and then taking the remainder of a division by 16: a right shift by n bits is a truncated division by 2^n, so shifts by 4, 8, and 12 bits divide by 16, 256, and 4096 (for non-negative values). Thus, behaviorally, this operation can also be done using the division and modulus operators. The code below illustrates how to accomplish this.

a_value = 0xABCD;
hex_val[0] = a_value % 16;		//extracts 0xD
hex_val[1] = (a_value/16) % 16;		//extracts 0xC
hex_val[2] = (a_value/256) % 16;	//extracts 0xB
hex_val[3] = (a_value/4096) % 16;	//extracts 0xA

Take a minute to convince yourself that this works the same as the previous code block. Now using a similar method devise a way to extract BCD digits from a binary value. You will need to accomplish a “shift and mask” on a decimal number. Think of how number bases work and what makes the above example code work for base 16.

This kind of programming technique isn’t exclusive to seven-segment displays; in fact, it is very common. For instance, when converting an integer to a string that represents it, digits are extracted, converted to their ASCII values, and then appended to a string. The standard C printf function uses similar techniques to print integers to stdout, for example.
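The sketch below illustrates the integer-to-string technique described in this paragraph: base-10 digits are peeled off with the remainder and division operators, converted to ASCII by adding '0', and then reversed into most-significant-first order. It is a simplified illustration of the idea (the function name u32_to_dec is hypothetical), not how any particular C library implements printf.

```c
#include <stdint.h>

/* Convert an unsigned 32-bit value to a decimal string.
 * buf must hold at least 11 chars (up to 10 digits plus the terminating '\0'). */
static char *u32_to_dec(uint32_t value, char *buf)
{
    char tmp[10];   /* digits collected least significant first */
    int n = 0;

    do {
        tmp[n++] = (char)('0' + (value % 10u));  /* lowest decimal digit -> ASCII */
        value /= 10u;                            /* "shift right" by one decimal place */
    } while (value != 0u);

    for (int i = 0; i < n; i++) {
        buf[i] = tmp[n - 1 - i];                 /* reverse into most-significant-first order */
    }
    buf[n] = '\0';
    return buf;
}
```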

Once you can extract individual BCD/hex digits, you can encode them into data to drive the cathodes of a 7-segment digit.

Decoding Numerical data to Digit Data

Since the cathodes of the individual digits are driven directly from the values stored in their data registers, to display numerical data you will need to convert an individual digit’s value into the necessary bit pattern. This could be done using if-else and switch structures, but that is not an optimal solution (the code is longer, and each conversion may take several comparisons). In our case, every input value has a unique, corresponding bit pattern. This one-to-one mapping is a perfect case for a lookup table (LUT). Lookup tables are fast and code-size efficient, but they are practical only when the set of possible inputs is small enough for the table to be stored. In our case, we are mapping a BCD number to 10 different bit patterns (for 0-9).

You can easily implement software lookup tables in your c program. Just define an array with constant data (in the example below, arbitrary values are used).

//10-entry constant array used as a lookup table (digits 0-9)
//(array is 8-bit wide and 10-entry deep; needs #include <stdint.h>)
//You have to fill in the table's data yourself!
static const uint8_t bcd_lut[10] = { 0b11110000, 0b00110011 /* , ...remaining 8 entries */ };

You can access this as any other array. Once populated with the correct segment data, it can be used to decode an integer value into the corresponding segment data to display it:

uint8_t val;
val = bcd_lut[5]; //read the 8-bit digit data for 5 into val

When using a variable as an input to the LUT, you should be careful to stay within the bounds of the array. An easy way to do this is to use the modulus operator, which provides the remainder of a division. Taking the index modulo the size of the array ensures that the accessed index is always within bounds; however, requested indexes greater than or equal to the mod value will ‘roll over’: 4%4==0, 5%4==1, 6%4==2, 7%4==3, 8%4==0, etc. (In C, % applied to a negative int gives a negative result, so this only works for non-negative indexes.) While this prevents accessing out-of-bounds indices, be aware of logic errors (what does it mean to access an out-of-bounds value in a LUT?). This method should be thought of as a safeguard against out-of-bounds access; you should always be cognizant of how you access arrays.

int ind;
ind = 4;
val = bcd_lut[ind%10]; //effectively accesses bcd_lut[4]
ind = 12;
val = bcd_lut[ind%10]; //accesses bcd_lut[2]

To display hex digits, you could create another LUT array containing 16 patterns (one for each possible digit). Note that the BCD digits are a subset of the hex digits, so you only need one LUT for both (BCD would only use entries 0-9, whereas hex would use 0-F).
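As an example of a combined hex/BCD LUT with a bounds check, the sketch below uses the common "gfedcba" encoding. This is an ASSUMPTION: it takes bit 0 as segment a through bit 6 as segment g, bit 7 as the decimal point, and a 1 as "segment on". The Blackboard’s cathode wiring and polarity may differ, so check the schematic and rearrange or invert (~) the entries as needed. Instead of wrapping the index with %, the hypothetical digit_to_segments function returns a blank pattern for out-of-range input, so a logic error shows up as a blank digit rather than a plausible-looking wrong one.

```c
#include <stdint.h>

/* Example 16-entry LUT shared by BCD (0-9) and hex (0-F) digits.
 * ASSUMPTION: bit 0 = segment a ... bit 6 = segment g, bit 7 = decimal point,
 * and a 1 turns a segment on. Check your board's cathode wiring and polarity. */
static const uint8_t seg_lut[16] = {
    0x3F, 0x06, 0x5B, 0x4F, 0x66, 0x6D, 0x7D, 0x07,  /* 0 1 2 3 4 5 6 7 */
    0x7F, 0x6F, 0x77, 0x7C, 0x39, 0x5E, 0x79, 0x71   /* 8 9 A b C d E F */
};

#define SEG_BLANK 0x00u  /* all segments off under the same assumption */

/* Explicit bounds check: out-of-range digits produce a blank digit */
static uint8_t digit_to_segments(unsigned digit)
{
    return (digit < 16u) ? seg_lut[digit] : (uint8_t)SEG_BLANK;
}
```

If your segments turn on with a 0 (active-low), return the bitwise complement of the table entry instead, and use 0xFF as the blank pattern.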

Challenges

1. Create a Memory-Mapped 7-segment display controller that decodes hexadecimal in hardware.

Design and implement an AXI-connected IP core that can display data held in its registers as hexadecimal numbers on the seven-segment display. Have four ‘data’ registers, one for each digit, each holding the 4-bit value to display on the corresponding digit. Use a fifth register as the ‘control’ register; use it to disable and enable individual digits of the display (regardless of the values held in the data registers).

2. Create a 7-segment display controller with both hexadecimal and ‘raw’ modes

Design a new IP or modify existing IP to create a 7-segment controller that can display either hexadecimal or ‘raw’ data on a digit-by-digit basis: each digit can be separately configured to show a hex digit decoded from a 4-bit value, or to drive its segments from the raw 8-bit data in a register. Allow each digit to be separately disabled and enabled (regardless of its configuration and value).

3. Create a 7-segment controller that Decodes BCD digits in hardware

Create a new IP or modify your existing 7-segment controller to have a BCD display mode. In this mode, the controller should have a single data register to which a binary value can be written. This value should be shown on the 7-segment display as a decimal number from 0-9999, so the data register should hold values from 0x0 to 0x270F.

4. Connect the XADC AXI IP to your block design

Add an XADC Wizard IP to your block design. In the block design, configure the XADC for single channel mode and to read from the internal temperature sensor. Write a program that configures the XADC to make conversions continually, then reads data from the ADC and shows it on the 7-segment display. Format the output data so you can see a rise in temperature on the display when you place your finger on top of the ZYNQ chip on your board.
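For the temperature display, the raw XADC reading has to be converted to degrees. The sketch below is based on the 7 series XADC temperature transfer function documented in Xilinx UG480, T(°C) = code × 503.975 / 4096 - 273.15, where code is the 12-bit conversion result; it ASSUMES the raw value is read as a 16-bit register with the 12-bit result in bits [15:4]. Reading that register (through the XADC Wizard’s AXI interface or a Xilinx driver) is not shown, and the function names are hypothetical. Check the formula against your XADC documentation before relying on it.

```c
#include <stdint.h>

/* Convert a raw 16-bit XADC temperature register value to degrees Celsius.
 * ASSUMPTION: the 12-bit conversion result is left-justified in bits [15:4]. */
static double xadc_raw_to_celsius(uint16_t raw16)
{
    double code = (double)(raw16 >> 4);          /* 12-bit ADC code */
    return code * 503.975 / 4096.0 - 273.15;     /* 7 series temperature transfer function */
}

/* Temperature in tenths of a degree, rounded to nearest (e.g. 250 for 25.0 C),
 * which fits on a 4-digit display and shows small changes. */
static int xadc_to_decidegrees(uint16_t raw16)
{
    double tenths = xadc_raw_to_celsius(raw16) * 10.0;
    return (int)(tenths >= 0.0 ? tenths + 0.5 : tenths - 0.5);
}
```

Showing the result in tenths of a degree gives enough resolution on a 4-digit display to see the temperature rise when you touch the chip.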