Project 1 VGA Display System

Verilog-based VGA controller that uses external IP


Introduction

This project presents a scanned (or raster) display system and a display system controller.

Virtually all of today’s displays are digital, whether they are simple systems that display relatively static alphanumeric characters, or complex graphic systems that display rapidly changing picture or video data. Digital displays decompose images into small “picture elements”, or pixels, that each drive a very small part of the display. If the pixels are bright enough and small enough, then the unaided human eye cannot discern them individually, and a smooth, continuous, and life-like image results. As pixels get larger, “pixelization”, or jagged edges and contours become more prominent, and the perceived image quality decreases. In practice, pixels that occupy less than about one-thirtieth of a degree in the human visual field cannot be perceived individually.

To create the appearance of smooth, fluid motion, pixels must be updated (or refreshed) faster than the response rate of the human eye. In practice, when pixels are updated at 60Hz or above, no flicker or motion artifacts will be perceived.

The collection of pixels displayed on a given monitor/display is called a “frame”. It is simpler to build a system that refreshes all pixels in a frame all the time, rather than attempting to update only the pixels that have changed. A typical refresh controller transfers 60 frames of data to the display each second, and this requires a continuous and high-bandwidth data stream for even moderate resolutions. Consider for example an older, 640 x 480 pixel VGA display. If each of the 300K+ pixels is defined by one byte of data to define brightness and/or color, then almost 20Mbytes of data must be moved to the display each second. In a 1080P display using 16-bit pixel data, 1920 x 1080 x 2 x 60 = 248Mbytes per second must be sent to the display (that’s a lot!).

Now consider a photograph displayed on a VGA display with 8-bit pixel data. Photograph data requires 640 x 480 = 307Kbytes, and each byte must be sent to the display every 16.7ms (1/60Hz ≈ 16.7ms). If a processor were used to move the data, it would need to read each byte from memory, and write each byte to the video port. Each memory read operation would require at least three CPU clock cycles, and each video port write would require three more. Further, each byte’s address would need to be calculated (by incrementing an address inside of a loop), and that would require at least a load instruction, a store instruction, a compare to a loop value, an increment, and a branch instruction. Each of these instructions would need to be fetched as well, so conservatively, perhaps another 12-14 CPU clock cycles. When accounting for all the required operations, at least 20 CPU clock cycles would be required to move one byte from memory to the display, so moving 20Mbytes each second would require 400M CPU cycles per second. That’s more bandwidth than many CPUs can support, and a huge fraction of any CPU’s bandwidth. And that’s just for servicing an old, low-resolution display. A more typical 1080P display would require more than 4G cycles per second, and our current generation of processors cannot run that fast. Clearly, the CPU cannot support the required data movement; instead, a special and dedicated video controller circuit is needed.

Background

All modern computer/information displays are raster displays, meaning the displayed image is created by writing one line at a time, pixel by pixel, from the top of the display to the bottom. One display frame includes all display lines, and typically 60 frames per second are sent to the display. As an example, a VGA display uses 480 lines with 640 pixels per line, and a 1080P display uses 1080 display lines, with 1920 pixels per line. Each pixel is defined by an 8-bit to 24-bit data word that defines the color and brightness (the more bits in the data word, the more colors that can be defined). A 1080P display using 24-bit color is typical; in this case, 1080x1920x60x3 (or about 373 million) displayable bytes are moved to the display every second.

Most modern displays use “full persistence” pixels, meaning pixels are continuously driven until overwritten by the next pixel. A full-persistence display allows pixels to use a constant illumination (which is simpler and more efficient), but it can also increase motion blur in video images.

Some display technologies, like plasma and CRTs, only illuminate pixels for a very brief period (less than 1ms per pixel). These low-persistence displays do not suffer from motion blur, but they must drive pixels much more brightly to saturate the human visual apparatus quickly so that the display appears continuous and flicker-free. Typically, the controller signals used for full-persistence and low-persistence displays are identical (only the display electronics are different).

Most displays can accommodate different resolutions. The display controller establishes the displayed resolution by controlling the frequency and timing of the system/pixel clock and two synchronization signals called “vertical sync” and “horizontal sync”. The controller must also send pixel data to the display at the precise time it is needed. The Vertical Sync (VS) signal marks the beginning of a new data frame. The VS frequency is typically 60Hz or greater, so that pixels are refreshed often enough to create the appearance of fluid motion. Frequencies up to about 240Hz are used by some displays to minimize motion blur; frame frequencies above 240Hz require very high speed signals that are challenging to produce and control.

The Horizontal Sync (HS) marks the beginning of a new line. Different displays from different eras have used various numbers of lines. Older, lower-resolution computer displays used 480 lines, older TVs used 525 lines, more recent “standard” displays use 1080 lines, 4K displays use 2160 lines, and 8K displays use 4320 lines. Every line needs to be displayed in every frame, so the HS frequency is the product of the frame frequency and the total number of lines.

The pixel clock marks when each pixel is sampled by the display, and provides the time base for creating the HS and VS signals. Typically, the HS signal is generated by counting some number of pixel clocks, and the VS signal is generated by counting some number of HS signal transitions.

Most display controllers generate more pixel clocks per line than there are pixels per line, and more lines per frame than are displayed in a frame. These “extra” clocks and lines provide time to move non-displayed, control-oriented data to the display device. Originally, when CRTs were used as displays, this extra time was needed to reset the cathode ray to the start of a line or frame. The time at the end of a display line or frame, after the last displayed pixel but before the sync pulse that reversed the electron beam direction, was called the “front porch”, and the time just before a new line or frame was displayed, after the electron beam had restarted but before it had locked into a stable, linear sweep, was called the “back porch”. Modern digital displays don’t use moving electron beams and so don’t require these “retrace” times, but these non-display times have been carried forward nonetheless. The figure below shows timings for older VGA displays and 1080P displays. A VGA CRT displays 307,200 pixels out of 420,000 pixel clocks per frame (73% of the controller’s bandwidth is used to transmit and display data), and a 1080P LCD displays 2,073,600 pixels out of 2,475,000 pixel clocks per frame (84% of the controller’s bandwidth).
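As a quick check on these figures, the sync and pixel clock frequencies can be worked out from the per-frame totals. The 800 x 525 and 2200 x 1125 totals used below are the commonly published timings for these two modes and are an assumption here (they are not stated in this document), although their products match the 420,000 and 2,475,000 pixel clocks per frame quoted above.

```latex
\begin{align*}
f_{HS} &= f_{VS} \times N_{\text{lines}}, \qquad
f_{\text{pixel}} = f_{HS} \times N_{\text{clocks per line}} \\[4pt]
\text{VGA:}\quad f_{HS} &= 60 \times 525 = 31.5\ \text{kHz}, \qquad
f_{\text{pixel}} = 31.5\ \text{kHz} \times 800 = 25.2\ \text{MHz} \\
\text{1080P:}\quad f_{HS} &= 60 \times 1125 = 67.5\ \text{kHz}, \qquad
f_{\text{pixel}} = 67.5\ \text{kHz} \times 2200 = 148.5\ \text{MHz}
\end{align*}
```

Run the other way, a 25MHz pixel clock with 420,000 clocks per frame gives a frame rate of about 25,000,000 / 420,000 ≈ 59.5Hz, slightly under the nominal 60Hz.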

Figure 1. Display Control Signals and Timings

Over the years, different displays with different resolutions have been marketed; in general, the higher the resolution, the higher the cost. Circuits to drive displays, like those found in computers, or set-top boxes, or commercial information displays, evolved to accommodate different display resolutions. Dating back to the 1970s, various standards were created so that display manufacturers and driver manufacturers could develop products independently. The Video Electronics Standards Association (VESA) began publishing standards in 1989; the High Definition Multimedia Interface (HDMI) standards were published in 2002; the Digital Visual Interface (DVI) working group was prominent for a few years in the early 2000s, and DisplayPort gained prominence in 2012. The original VESA standard dealt with the 15-pin VGA “analog” connector and standards that dominated early video interfaces, before giving way to the digital HDMI and DisplayPort standards. All of these standards deal with PC-connected equipment, but not smaller, embedded displays as might be found in ATMs, POS terminals, or other electronic equipment (those embedded displays typically have simple bus interfaces into their own internal memory). All standards set clock and sync signal timings, define signal voltages, rise/fall times, and relative signal timings, and the digital standards also define data protocols. As new technologies are invented, new standards will be created so those technologies can enter the marketplace.

In this project, we will create a VGA controller. VGA controllers are becoming outdated, but they are relatively simple and straight-forward to design. And since all the newer, more advanced display interfaces have their roots in the original VGA standard, creating a VGA controller is a good first step towards creating more advanced controllers. VGA displays and VGA connectors are becoming rare, so the Blackboard includes an HDMI port. The HDMI port can be driven from a VGA controller circuit using a “VGA to HDMI” IP block provided by Real Digital – one of the background topic documents provides details. A VGA controller design uses a 25MHz pixel clock to drive counters/comparators that produce horizontal and vertical sync signals, and to produce other timing signals that can be used to gate (or produce) video data. Again, one of the background topic documents provides details.

Debugging Tools

As designs get larger and more complex, the frequent use of simulation and visualization tools becomes more imperative. For larger designs, it is simply not possible to keep all relevant behaviors and interactions actively in your thoughts – you need a visualization tool to help.

Prior to implementing a design, you can simulate its behavior by writing a testbench and running the simulator. From your earlier experience, you are familiar with writing Verilog testbenches, and using the simulator to visualize and validate the behavior of your code.

After a design has been implemented, you can use a logic analyzer instrument to measure, record and visualize the time course of signals in the design. A logic analyzer attaches a probe to each signal to be measured, and it uses a high sample rate to record the signal’s behavior over time (logic analyzers can only measure/record whether a signal is at a logic 1 or logic 0, unlike an oscilloscope that uses an ADC to measure signal amplitude over time). Because logic analyzers use a high sample rate, they consume memory quickly, and so sample buffers are limited to perhaps hundreds (or thousands) of signal transitions (typically, the more signals recorded and the faster the rate, the shorter the time over which signals can be recorded). Xilinx provides an integrated logic analyzer tool (ILA) as an IP block that you can include in your design. The ILA can be used to sample any signal in your design, and the recorded signals can be viewed in the Xilinx tools. A background topic document provides details.

The ILA is a debugging tool to assist in tracking down problems encountered at runtime. It uses several resources in the FPGA, including block RAMs and CLBs. If the needed resources are already claimed by the design, then the ILA cannot be used. Because the ILA consumes FPGA resources, it may not be feasible to implement the ILA along with a large design. The ILA also increases the burden on the synthesis and implementation tools, and increases the time it takes to run these steps. For these reasons, it is always best to try to verify designs using the simulator before turning to the ILA.

In this project, you will also be asked to use preexisting “IP” (IP stands for Intellectual Property, which has become the accepted terminology for predesigned hardware or software design modules that can be incorporated into new designs). Many IP sources are available, including designs produced by third parties (often these must be purchased), designs produced by Xilinx (some of these are free), and of course your own designs. In this project, you will use IP from Real Digital to convert VGA signals to HDMI signals that can drive Blackboard’s HDMI port. In Vivado, IP blocks can be instantiated inside a Verilog source, or at a higher-level block-diagram design view. In this project, you will instantiate the HDMI IP block as a module in your Verilog source file. Later projects will introduce the block diagram method of instantiating IP.

Two more Vivado/Verilog features will also be introduced in this project: using parameters to create modules that can be reconfigured at synthesis time, without needing to modify the Verilog code; and using Xilinx’s “clock wizard” to more easily define system clocks.

Requirements

1. Create a Parameterized Counter

Parameters are data values included in a Verilog description that give direction to the synthesizer – they do not define or represent signals within a design. The use of parameters allows for more generic modules that are easier to customize and reuse later. For example, a parameterized shift-register module could be defined where the number of included flip-flops is defined as a “WIDTH” parameter. The register could be customized with 8 cascaded flip-flops by setting WIDTH = 8, and the exact same code could later be used to define a 64 flip-flop register by setting WIDTH = 64.

Design a parameterized binary counter module that counts from zero to a value given as a parameter, and then resets to zero. Include a count enable input cen that enables counting only when asserted. In the example module definition below, a second parameter WIDTH is defined because the designer may want the counter to include more bits than are needed to count to MAX_COUNT. When instantiating the module you must set WIDTH to at least the integer ceiling of log2(MAX_COUNT+1).

module bin_count #(parameter MAX_COUNT = 255, WIDTH = 8)
(
	input rst, clk, cen,
	output [WIDTH-1:0] val
);
	//module description here
endmodule
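One possible module body is sketched below. It is a minimal illustration rather than the required solution, and it makes two assumptions that the definition above does not state: rst is treated as an active-high synchronous reset, and val is declared as a reg so that it can be assigned inside an always block.

```verilog
// Sketch of a bin_count body. Assumed: synchronous, active-high reset.
module bin_count #(parameter MAX_COUNT = 255, WIDTH = 8)
(
	input rst, clk, cen,
	output reg [WIDTH-1:0] val
);
	always @(posedge clk) begin
		if (rst)
			val <= {WIDTH{1'b0}};       // reset has priority over counting
		else if (cen) begin
			if (val == MAX_COUNT)
				val <= {WIDTH{1'b0}};   // roll over to zero after MAX_COUNT
			else
				val <= val + 1'b1;      // otherwise count up by one
		end
		// when cen is low, val holds its current value
	end
endmodule
```

Because the comparison is against MAX_COUNT rather than the all-ones value, the same code works for limits that are not one less than a power of two, such as 823 or 600.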

Instantiate two counters as shown in the wrapper module:

module count_wrap
(
	input rst, clk,
	output [9:0] a_val,b_val, //driven by the counters, so these are outputs
	input en,
	output A,B	
);

	wire clkb;
	wire a_en,b_en;

	//define 10 bit wide counter which overflows at 823
	//called cntrA
	bin_count #(
		.MAX_COUNT(823),
		.WIDTH(10) 
	)
	cntrA(
		.rst(rst),
		.clk(clk),
		.cen(a_en),
		.val(a_val)
	);

	//10-wide counter up to 600
	bin_count #(
		.MAX_COUNT(600),
		.WIDTH(10)
	)
	cntrB(
		.rst(rst),
		.clk(clk),
		.cen(b_en),
		.val(b_val)
	);


	assign a_en = 1'b1; //counter A always enabled
	
	//counter B increments when cntA at max count
	assign b_en = (a_val==823);
	
	//start output on when counter reset/overflow
	//turn outputs off when respective counters are at half value
	assign A = (a_val<=412);
	assign B = (b_val<=300);

endmodule

Write a testbench to verify that your parameterized counters behave as desired. You will need to simulate the clock and reset inputs to the wrapper module. Each of the A and B outputs should be asserted when its counter resets or rolls over to zero, and de-asserted once the counter passes half of its maximum count. Counter B should only count up when counter A ‘overflows’ (goes from its max value to zero).
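A testbench along the following lines could serve as a starting point. It is only a sketch: it assumes that count_wrap and a completed bin_count are compiled together with it, that a_val and b_val are outputs of count_wrap, and that reset is active-high. The name count_wrap_tb and the 10 ns clock period are arbitrary choices.

```verilog
`timescale 1ns / 1ps

// Testbench sketch for count_wrap (needs count_wrap and bin_count).
module count_wrap_tb;
	reg rst, clk, en;
	wire [9:0] a_val, b_val;
	wire A, B;

	count_wrap dut (
		.rst(rst), .clk(clk), .en(en),
		.a_val(a_val), .b_val(b_val),
		.A(A), .B(B)
	);

	// Free-running clock: toggles every 5 ns, giving a 10 ns period
	initial clk = 1'b0;
	always #5 clk = ~clk;

	initial begin
		rst = 1'b1;                 // hold reset across a few clock edges
		en  = 1'b1;
		repeat (3) @(posedge clk);
		#1 rst = 1'b0;              // release reset just after an edge

		// Run long enough for counter A to roll over several times,
		// so counter B can be seen advancing once per rollover.
		repeat (5 * 824) @(posedge clk);
		$display("end: a_val=%0d b_val=%0d A=%b B=%b", a_val, b_val, A, B);
		$finish;
	end

	// Report every cycle in which counter A sits at its maximum count
	always @(posedge clk)
		if (!rst && a_val == 10'd823)
			$display("t=%0t: counter A at max, b_val=%0d", $time, b_val);
endmodule
```

In the waveform viewer, A should fall shortly after a_val passes 412, and b_val should change only on the clock edge that follows a_val reaching 823.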

2. Use the clocking wizard IP

Modify the design in part 1 to use an IP module produced by the “clock wizard” to supply the two counters with a 7 MHz clock (from an input 100 MHz clock). Simulate your updated design and verify that the counters now count at the correct rate. For the timescale of the simulation to match up, make sure your simulated clock has a 100MHz frequency (10 ns period, 5ns per half-cycle).

3. Use the Integrated Logic Analyzer

Modify your design so the a_val and b_val wires are internal to the module (only the A and B signals are used as module outputs). Implement the design on your board, assigning output A to LED0 and B to LED1. Assign a single switch input to enable/disable both LEDs.

When you program your board, you should see LED0 stuck on while LED1 is toggling rapidly. Does it work as expected? Can the behavior be verified as correct from visual inspection? LED0 is actually switching on and off too fast to see, but its behavior can be verified using the ILA. Set up the ILA to probe nets a_val, b_val, A, and B, and define triggers to verify the circuit is operating properly when the LEDs are both enabled and disabled.
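A rough calculation suggests why the two LEDs look so different. It assumes the 7 MHz clock from part 2 and the counter limits in the wrapper module (824 states for counter A, and 601 states for counter B, which advances once per rollover of counter A). The results are rounded.

```latex
\begin{align*}
f_A &= \frac{7\ \text{MHz}}{824} \approx 8.5\ \text{kHz} \\[4pt]
f_B &= \frac{7\ \text{MHz}}{824 \times 601} \approx 14.1\ \text{Hz}
\end{align*}
```

Output A is high for 413 of every 824 clocks, so LED0 is driven by a square wave of roughly 8.5 kHz at about 50% duty cycle, far too fast for the eye to follow. Output B repeats about 14 times per second, which is slow enough to be seen as rapid blinking on LED1.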

4. Create A VGA Sync Generator

Use your parameterized counter module to create a VGA-sync generator. The circuit should use a 25MHz pixel clock input to generate horizontal and vertical sync signals for a resolution of 640x480 (refer to the background material for details on VGA timing). The module should have two inputs (clock and reset) and three outputs (hsync, vsync, and video_active). Video active should be asserted whenever the counters are in the active display area range. You can add optional outputs to indicate the x and y coordinates of the current pixel (this will be helpful in later requirements).
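The structure described above might be sketched as follows. Several things here are assumptions rather than facts from this document: the porch and sync widths are the commonly published 640x480 at 60Hz values (check them against the background material), both sync outputs are driven active-low, the front porch is taken to precede the sync pulse and the back porch to follow it, and the names vga_sync, x and y are illustrative. The sketch also needs a completed bin_count module.

```verilog
// Sketch of a 640x480 sync generator built from two bin_count instances.
// Timing values and active-low sync polarity are assumed, not given here.
module vga_sync #(
	parameter H_VISIBLE = 640, H_FRONT = 16, H_SYNC = 96, H_BACK = 48,
	parameter V_VISIBLE = 480, V_FRONT = 10, V_SYNC = 2,  V_BACK = 33
)
(
	input clk, rst,            // clk is the 25MHz pixel clock
	output hsync, vsync,
	output video_active,
	output [9:0] x, y          // optional: current pixel column and line
);
	localparam H_TOTAL = H_VISIBLE + H_FRONT + H_SYNC + H_BACK; // 800
	localparam V_TOTAL = V_VISIBLE + V_FRONT + V_SYNC + V_BACK; // 525

	// The line counter advances only on the last pixel clock of a line
	wire line_end = (x == H_TOTAL - 1);

	bin_count #(.MAX_COUNT(H_TOTAL - 1), .WIDTH(10)) hcnt (
		.rst(rst), .clk(clk), .cen(1'b1), .val(x)
	);
	bin_count #(.MAX_COUNT(V_TOTAL - 1), .WIDTH(10)) vcnt (
		.rst(rst), .clk(clk), .cen(line_end), .val(y)
	);

	// Each sync pulse sits between its front and back porch (low = active)
	assign hsync = ~((x >= H_VISIBLE + H_FRONT) &&
	                 (x <  H_VISIBLE + H_FRONT + H_SYNC));
	assign vsync = ~((y >= V_VISIBLE + V_FRONT) &&
	                 (y <  V_VISIBLE + V_FRONT + V_SYNC));

	// Pixels are shown only inside the 640x480 visible region
	assign video_active = (x < H_VISIBLE) && (y < V_VISIBLE);
endmodule
```

Keeping the timing values as parameters means the same module could later be retargeted to another resolution by overriding them at instantiation, provided the counter width is increased to fit the larger totals.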

Connect the VGA timing signals from the sync generator to the Real Digital HDMI IP (this will convert the VGA sync signals into the signals that HDMI and DVI monitors use). The HDMI IP requires the VGA pixel clock as well as a “5x” pixel clock. You can generate the pixel clock and 5x clock using the clock wizard IP.

Drive the red (R), green (G), and blue (B) inputs of the HDMI IP with all 1’s to show a blank white screen on your display.

5. Display a Crosshair

In the 640x480 screen area, draw one vertical and one horizontal line that meet in the center of the screen to form a cross-hair pattern.

You will need to use the color inputs to the HDMI IP to do this. Each pixel is driven according to the values on the R, G and B inputs at the time the pixel is addressed by the horizontal and vertical counters, so you need to assert R, G, and/or B at just the right times to cause the crosshair pattern to be displayed. Note that if you are using a white background (that is, R, G, and B are always asserted), you will need to de-assert at least one of them to create a different color for the crosshair (for this requirement, your crosshair only needs to be a different color than the background).
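The color selection described above could be sketched as a small combinational module. The module and port names are hypothetical, x and y are assumed to come from the sync generator's optional coordinate outputs, and the 8-bit channel width is an assumption that should be checked against the HDMI IP's actual ports. Forcing the colors to zero outside the active area is also an assumption about what the IP expects.

```verilog
// Sketch: white background with a red crosshair through the screen center.
// Names and the 8-bit color width are assumed for illustration.
module crosshair_color #(parameter H_VISIBLE = 640, V_VISIBLE = 480)
(
	input [9:0] x, y,
	input video_active,
	output [7:0] red, green, blue
);
	// One-pixel-wide lines along the center column and the center row
	wire on_cross = (x == H_VISIBLE / 2) || (y == V_VISIBLE / 2);

	// Red stays on for every visible pixel. Dropping green and blue on
	// the crosshair makes those pixels red against the white background.
	assign red   = video_active ? 8'hFF : 8'h00;
	assign green = (video_active && !on_cross) ? 8'hFF : 8'h00;
	assign blue  = (video_active && !on_cross) ? 8'hFF : 8'h00;
endmodule
```

Thicker lines could be drawn by replacing each equality test with a small range comparison.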

6. Display a box on the screen

Render a static square on the screen with a dimension of 32 x 32 pixels. Set the location of the box through use of module parameters.
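One way to express the box test is a comparison of the current pixel coordinates against a parameterized rectangle, as in the sketch below. The names box_pixel, BOX_X, BOX_Y and BOX_SIZE are hypothetical, as are the default position values, and x and y are assumed to be the pixel coordinates from the sync generator.

```verilog
// Sketch: in_box is high while the current pixel lies inside the square.
// BOX_X and BOX_Y give the top-left corner; all names are illustrative.
module box_pixel #(parameter BOX_X = 100, BOX_Y = 100, BOX_SIZE = 32)
(
	input [9:0] x, y,
	output in_box
);
	// Lower bounds are inclusive and upper bounds exclusive, so the box
	// covers exactly BOX_SIZE pixels in each direction.
	assign in_box = (x >= BOX_X) && (x < BOX_X + BOX_SIZE) &&
	                (y >= BOX_Y) && (y < BOX_Y + BOX_SIZE);
endmodule
```

The in_box signal can then be used to select a box color in the same way the crosshair selects its color.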

Challenges

1. Make the Crosshair’s lines move

Make the vertical line of the crosshair slowly move left and right between the borders of the screen. You can also make the horizontal line move.

2. Make the box move

Make the box travel around the screen both vertically and horizontally. At reset have the box start at the location given as module parameters.

3. Control the box or crosshair

Use the pushbuttons on the Blackboard to control the movement of the box. You will need to limit the rate at which the object moves for the changes in position to be visible.
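A common way to limit the movement rate is to update the position only once per frame. The sketch below assumes a frame_tick input that pulses for one pixel clock per frame (for example, when the sync counters reach a chosen point in the vertical blanking interval), four button inputs that are already synchronized and debounced, and active-high synchronous reset. All names and default values are hypothetical.

```verilog
// Sketch: box position that moves at most one pixel per frame.
// frame_tick, the button names, and the defaults are assumed.
module box_position #(
	parameter START_X = 100, START_Y = 100, BOX_SIZE = 32,
	parameter H_VISIBLE = 640, V_VISIBLE = 480
)
(
	input clk, rst,
	input frame_tick,
	input btn_left, btn_right, btn_up, btn_down,
	output reg [9:0] box_x, box_y
);
	always @(posedge clk) begin
		if (rst) begin
			box_x <= START_X;       // start at the parameterized location
			box_y <= START_Y;
		end else if (frame_tick) begin
			// Horizontal move, stopping at the left and right borders
			if (btn_left && box_x > 0)
				box_x <= box_x - 1'b1;
			else if (btn_right && box_x < H_VISIBLE - BOX_SIZE)
				box_x <= box_x + 1'b1;

			// Vertical move, stopping at the top and bottom borders
			if (btn_up && box_y > 0)
				box_y <= box_y - 1'b1;
			else if (btn_down && box_y < V_VISIBLE - BOX_SIZE)
				box_y <= box_y + 1'b1;
		end
	end
endmodule
```

At roughly 60 frames per second this moves the box about 60 pixels per second while a button is held. Moving more than one pixel per tick, or ticking every few frames, changes the speed.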

4. Debounce the Button Inputs

Use a state machine to debounce the button press. The background information contains a document that can help you with this.
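As one illustration of the idea, the sketch below uses a two-state machine that accepts a new button level only after it has stayed different from the current output for a set number of clocks. It is not taken from the background document. The names, the two-flop synchronizer, the active-high synchronous reset, and the default of 250,000 clocks (about 10 ms at 25MHz) are all assumptions.

```verilog
// Sketch of a debouncer: the output changes only after the synchronized
// input has differed from it for STABLE_COUNT consecutive clocks.
module debounce #(parameter STABLE_COUNT = 250000, WIDTH = 18)
(
	input clk, rst,
	input btn_in,              // raw, asynchronous button signal
	output reg btn_out         // debounced level
);
	localparam S_IDLE = 1'b0, S_CHECK = 1'b1;

	reg [1:0] sync;            // two-flop synchronizer for btn_in
	reg state;
	reg [WIDTH-1:0] count;

	always @(posedge clk) begin
		if (rst) begin
			sync    <= 2'b00;
			state   <= S_IDLE;
			count   <= {WIDTH{1'b0}};
			btn_out <= 1'b0;
		end else begin
			sync <= {sync[0], btn_in};
			case (state)
				S_IDLE: begin
					count <= {WIDTH{1'b0}};
					if (sync[1] != btn_out)
						state <= S_CHECK;       // input differs: start timing
				end
				S_CHECK: begin
					if (sync[1] == btn_out)
						state <= S_IDLE;        // bounced back: ignore it
					else if (count == STABLE_COUNT - 1) begin
						btn_out <= sync[1];     // stable long enough: accept
						state   <= S_IDLE;
					end else
						count <= count + 1'b1;
				end
			endcase
		end
	end
endmodule
```

WIDTH must be large enough to hold STABLE_COUNT - 1, and the same module handles both the press and the release, since it compares the input against whatever level is currently on btn_out.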