Experiment 11 – Interrupts for I/O

Control the serial device with interrupts to free the processor.

Files

x11-interrupts:
`primes2.c`	Main program
`lib.c`, `lib.h`	Library with implementation of `printf`
`hardware.h`	Header file with layout of I/O registers
`startup.c`	Startup code
`device.ld`	Linker script
`Makefile`	Build script
`x11.geany`	Geany project file

Demonstration

The program primes2.c is functionally identical to the program primes.c from Experiment 10: it lists the first 500 primes on the serial port. Unlike the previous program, this one uses a hardware feature called interrupts to carry on searching for more primes while previous ones are being printed, so avoiding the gaps in the timing of the output that we saw earlier.

Look, no gap!

As in the previous experiment, the program times itself and at the end of the run prints two times:

9542+118milliseconds

The first time shows how long the program took to find the 500 primes; but because the program can store internally a queue of characters waiting to be output, when it has found the 500 primes, the program is not finished. The program must continue to run until all the characters have been output, and the time to do so is shown as the second number.

Cheating at the end

The LED is lit only during the first part of the program, so a logic analyser trace showing both the LED signal and the activity on the serial port reveals that characters continue to be output after the LED has been switched off.

micro:bit version 2

The identical program also works on the V2 board, but with slightly different timing.

Activity

Use the logic analyser to examine the timing of output signals from the program, and verify that there are no gaps, and that the output continues after the LED has gone out.
The program contains a buffer array called txbuf (for transmit buffer) whose size, a constant NBUF, should be a power of 2. Try decreasing the constant from 64 to 32 or 16 or 8 and use the logic analyser to see if gaps begin to appear in the output. Does the program continue to work correctly if NBUF is decreased to 2 or even 1?

Background

The primes program in the previous experiment produced delays in its output whenever the gap between one prime and the next became significant. It also does not make good use of the micro:bit's processor, because once it has found a prime and is printing it, the processor spends almost all of its time in a tiny loop

while (! UART.TXDRDY) { }

that does nothing useful. The program would work better if it could use this time to get ahead with the job of finding primes to print in the future, hoping to avoid future gaps in the output. This experiment introduces a scheme for organising the program so that such an overlap between computing and input/output is possible.

There are two components to the scheme, one software-based and the other hardware-based. The software component is to introduce a buffer array between the prime-finding and the output-producing parts of the program, a queue of characters waiting to be transmitted by the UART. When it has found a prime, the main part of the program uses printf to format a line of output, and a call to printf turns into a series of calls to a subroutine serial_putc. In the old program, it's serial_putc that contains the wasteful loop shown above; in the new program, serial_putc will add a character to the buffer array, to be sent to the UART later. Complementary to this is a part of the program that checks whether the UART is idle: if so, it can get a character out of the buffer and put it into the device register UART.TXD to start transmitting it. The hardware component of the scheme is a mechanism that will invoke this part of the program (the interrupt handler) whenever the UART has finished sending a character.

The existence of the buffer cannot deal in the long term with a mismatch in rates between the process of finding primes and the process of printing them: we know that eventually primes will become rare enough, and the division sums slow enough, that there will be inevitable gaps between printing one prime and printing the next. But the buffer can do a good job of dealing with variations in the rate of finding primes. If the main program finds several primes in quick succession, they can build up in the buffer and cover a later scarcity of primes without gaps appearing in the output.

Several details of this scheme remain to be filled in:

How to organise the buffer as a queue, so that it is simple and quick to add characters at the back of the queue as they are produced and to remove them from the front of the queue when the UART is ready to print them. We will use a structure called a circular buffer for this.
How to arrange that characters are removed from the queue and sent to the UART at the right time. For this, we will use a hardware mechanism called interrupts.
What should happen when the buffer becomes empty (because the prime-finding process hasn't found any more primes yet) or full (because so many primes have been found that the printing process needs to catch up).

Circular buffer

To store the characters waiting to be printed, we will use an array txbuf, together with several integer variables.

#define NBUF 64

static volatile int bufcnt = 0;
static int bufin = 0;
static int bufout = 0;
static volatile char txbuf[NBUF];

Note the use of the volatile keyword here, for reasons that will shortly be explained.

The variable bufcnt will always contain the number of slots in the array that are occupied, and the characters themselves are stored in txbuf[bufout], txbuf[bufout+1], and so on, up to txbuf[bufin-1].

A circular buffer

The trick is that if bufin < bufout, then the part of the array that is occupied wraps around from txbuf[NBUF-1] to txbuf[0]. This trick means that whenever the buffer is not full, we can put in another character, and when it is not empty, we can remove a character from the front of the queue, both without ever having to shift the other characters around in the array.

To add a character ch at the back of the queue (provided bufcnt < NBUF), we can use the function buf_put(ch).

void buf_put(char ch)
{
    assert(bufcnt < NBUF);
    txbuf[bufin] = ch;
    bufcnt++;
    bufin = (bufin+1)%NBUF;
}

The modulo operator % is used in the assignment to bufin so that once the index reaches the end of the array, it wraps around to zero. The C compiler can make this operation highly efficient if we choose NBUF to be a power of two.
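As a rough illustration of why a power of two helps: for a non-negative index, reducing modulo NBUF gives the same result as masking with NBUF-1, which costs a single AND instruction. The helper names next_mod, next_mask and same_for_all_indices below are made up for this sketch and are not part of the experiment's code.

```c
#include <assert.h>

#define NBUF 64     /* must be a power of two for the mask version to be valid */

/* Advance an index using the modulo operator, as buf_put does. */
static int next_mod(int i) { return (i + 1) % NBUF; }

/* Advance an index using a bit mask; this agrees with next_mod when
   NBUF is a power of two and i is non-negative. */
static int next_mask(int i) { return (i + 1) & (NBUF - 1); }

/* Check that the two versions agree for every valid index.
   Returns 1 if they do, 0 otherwise. */
static int same_for_all_indices(void)
{
    for (int i = 0; i < NBUF; i++) {
        if (next_mod(i) != next_mask(i))
            return 0;
    }
    return 1;
}
```

If NBUF were, say, 48, the mask version would give wrong answers, while the modulo version would still work but might need a slower division.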

To get a character out of the queue and store it in ch (provided bufcnt > 0), we can use a function buf_get().

char buf_get(void)
{
    assert(bufcnt > 0);
    char ch = txbuf[bufout];
    bufcnt--;
    bufout = (bufout+1)%NBUF;
    return ch;
}

Initially, we can make the buffer empty by setting bufin = bufout = bufcnt = 0.
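To see the wrap-around happen, here is a small sketch that can be run on an ordinary computer rather than the micro:bit. It copies buf_put and buf_get from above, but shrinks NBUF to 4 so that the indices wrap after only a few operations. The function wrap_demo and the variable wrapped are invented for this illustration.

```c
#include <assert.h>
#include <string.h>

#define NBUF 4                  /* tiny on purpose, so the indices wrap quickly */

static int bufcnt = 0, bufin = 0, bufout = 0;
static char txbuf[NBUF];
static int wrapped = 0;         /* set to 1 if bufin < bufout is observed */

static void buf_put(char ch)
{
    assert(bufcnt < NBUF);
    txbuf[bufin] = ch;
    bufcnt++;
    bufin = (bufin+1) % NBUF;
}

static char buf_get(void)
{
    assert(bufcnt > 0);
    char ch = txbuf[bufout];
    bufcnt--;
    bufout = (bufout+1) % NBUF;
    return ch;
}

/* wrap_demo: put three characters, take two, put two more, then drain.
   Returns the drained characters as a string. */
static const char *wrap_demo(void)
{
    static char out[NBUF+1];
    int n = 0;

    bufin = bufout = bufcnt = 0;                /* start with an empty buffer */
    buf_put('a'); buf_put('b'); buf_put('c');   /* fills slots 0, 1, 2 */
    buf_get(); buf_get();                       /* removes 'a' and 'b' */
    buf_put('d'); buf_put('e');                 /* slot 3, then slot 0 */
    wrapped = (bufin < bufout);                 /* now bufin = 1, bufout = 2 */

    while (bufcnt > 0)
        out[n++] = buf_get();
    out[n] = '\0';
    return out;
}
```

After the second batch of insertions the occupied region runs from txbuf[2] through txbuf[3] and on to txbuf[0], yet the characters still come out in the order they went in.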

Interrupts

Whenever UART.TXDRDY = 1 and bufcnt > 0, we can start the transmission of another character by retrieving it from the buffer and storing it into the device register UART.TXD. The question that remains is this: how can we arrange for the processor to notice that this condition is satisfied and do what is needed?

One solution would be to write a subroutine poll() that checks this condition, and call it frequently from the main program, perhaps just after each trial division as we test a number for being prime. We must arrange to call poll() frequently enough that delays in the output do not mount up unreasonably. In a simple program like this, that is not too bad a solution, because we can easily identify the time-consuming loops in the program, and arrange that each one contains a call to poll(). But in a more complex program this would become difficult, and in any case it spoils the independence between parts of the program. We could not call a library function, say, to test whether a number was prime without modifying the function to call poll() appropriately in each loop it contained.
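A polling routine along these lines might look roughly like the following sketch, which runs on an ordinary computer with a fake UART structure standing in for the real device registers. The names poll_uart and enqueue are invented here, the fake UART is assumed to start out idle with TXDRDY = 1, and deciding where the calls should go in the main program is left open.

```c
#include <assert.h>

#define NBUF 8

/* Fake UART for host-side experiments; assumed idle to begin with. */
static struct { int TXDRDY; char TXD; } UART = { 1, 0 };

static int bufcnt = 0, bufin = 0, bufout = 0;
static char txbuf[NBUF];

/* enqueue: add a character at the back of the queue */
static void enqueue(char ch)
{
    assert(bufcnt < NBUF);
    txbuf[bufin] = ch;
    bufin = (bufin+1) % NBUF;
    bufcnt++;
}

/* poll_uart: if the UART is idle and a character is waiting, start
   sending it.  Returns 1 if a character was started, 0 otherwise. */
static int poll_uart(void)
{
    if (UART.TXDRDY && bufcnt > 0) {
        UART.TXDRDY = 0;                /* clear the event */
        UART.TXD = txbuf[bufout];       /* start transmission */
        bufout = (bufout+1) % NBUF;
        bufcnt--;
        return 1;
    }
    return 0;
}
```

No interrupts are involved, so there is no need for volatile or for disabling interrupts; the price is that every slow loop in the program has to remember to call the routine.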

A better solution is to make the processor hardware insert a call to a subroutine like poll() whenever the UART becomes idle, and that is in essence what the interrupt mechanism does. Setting up an interrupt involves telling the UART to request an interrupt whenever the event UART.TXDRDY occurs, telling the interrupt controller to interrupt the processor whenever the UART requests an interrupt, and installing a subroutine to be called on each interrupt. We identify this subroutine by giving it the special name uart_handler, then include the following statements as part of the initialisation of the UART.

UART.INTENSET = BIT(UART_INT_TXDRDY);
enable_irq(UART_IRQ);

The interrupt handler plays the role of the subroutine poll() in the preceding discussion. When called, it tests whether the UART has finished and there is a character waiting in the queue, and if so, it starts the UART transmitting again. If there is no character waiting, then it cannot restart the UART, and instead it sets a flag txidle to indicate that the next character to be generated can be sent to the UART immediately. The interrupt handler must also clear the event by setting UART.TXDRDY = 0, or the interrupt mechanism will immediately call the handler again, and the program will grind to a halt.

void uart_handler(void)
{
    if (UART.TXDRDY) {
        UART.TXDRDY = 0;
        if (bufcnt == 0)
            txidle = 1;
        else
            UART.TXD = buf_get();
    }
}

Complementary to this is the following implementation of serial_putc, which puts characters into the buffer, and also deals properly with the cases where the UART is idle or the buffer is full.

void serial_putc(char ch)
{
    while (bufcnt == NBUF) pause();

    intr_disable();
    if (txidle) {
        UART.TXD = ch;
        txidle = 0;
    } else {
        buf_put(ch);
    }
    intr_enable();
}

A number of things are done carefully here.

If the buffer is full (bufcnt == NBUF) then we must revert to the method of waiting in a loop until things are safe to proceed; that is the purpose of the while loop on the first line. The function pause() potentially stops the processor until the next interrupt, saving power and also ensuring that the interrupt generates a response as quickly as possible. If the buffer gets full too often, that may be a sign that characters are on average being generated faster than they can be transmitted, or it may simply be that a larger buffer is needed.
If the UART is idle (txidle) then there is no need to put the character into the buffer: we can send it to the UART immediately. Setting txidle to zero means that any characters that immediately follow will be put in the buffer, and the interrupt handler will find them there when it has finished transmitting this one.
The code that handles the character is surrounded by calls to intr_disable() and intr_enable() that prevent any interrupts from happening between one call and the other. This is necessary so that the interrupt handler doesn't get called while the character is being put in the buffer. The need for this care is explained below.
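The interplay between serial_putc and uart_handler can be tried out on an ordinary computer with a sketch like the one below. It is only a simulation under several assumptions: the UART is a plain structure, intr_disable and intr_enable do nothing, txidle is assumed to start at 1, and a made-up function sim_pause stands in for pause() by pretending that the UART finishes its current character and raises its interrupt. The names sim_uart_done, sim_pause, send_all, wire and nsent are all invented for the illustration.

```c
#include <assert.h>
#include <string.h>

#define NBUF 4                          /* small, so that the buffer fills up */

/* Stand-in for the UART device registers (host-only, hypothetical). */
static struct { volatile int TXDRDY; volatile char TXD; } UART;

static volatile int bufcnt = 0;
static int bufin = 0, bufout = 0;
static volatile char txbuf[NBUF];
static volatile int txidle = 1;         /* assumed to start out idle */

static char wire[32];                   /* characters that have been "sent" */
static int nsent = 0;

static void intr_disable(void) { }      /* no real interrupts on the host */
static void intr_enable(void) { }

static void buf_put(char ch)
{
    assert(bufcnt < NBUF);
    txbuf[bufin] = ch;
    bufcnt++;
    bufin = (bufin+1) % NBUF;
}

static char buf_get(void)
{
    assert(bufcnt > 0);
    char ch = txbuf[bufout];
    bufcnt--;
    bufout = (bufout+1) % NBUF;
    return ch;
}

static void uart_handler(void)
{
    if (UART.TXDRDY) {
        UART.TXDRDY = 0;
        if (bufcnt == 0)
            txidle = 1;
        else
            UART.TXD = buf_get();
    }
}

/* sim_uart_done: pretend the UART has finished sending the character
   in TXD, record it, raise the event and deliver the interrupt. */
static void sim_uart_done(void)
{
    assert(nsent < (int) sizeof(wire));
    wire[nsent++] = UART.TXD;
    UART.TXDRDY = 1;
    uart_handler();
}

/* sim_pause: stands in for pause(), returning after the next interrupt */
static void sim_pause(void) { sim_uart_done(); }

static void serial_putc(char ch)
{
    while (bufcnt == NBUF) sim_pause();

    intr_disable();
    if (txidle) {
        UART.TXD = ch;
        txidle = 0;
    } else {
        buf_put(ch);
    }
    intr_enable();
}

/* send_all: queue a short string, then let the simulated UART drain
   the buffer.  Returns the number of characters that reached the wire. */
static int send_all(const char *s)
{
    nsent = 0;
    while (*s != '\0') serial_putc(*s++);
    while (!txidle) sim_uart_done();
    return nsent;
}
```

Calling send_all("hello world") sends the first character directly because the UART is idle, queues the next four, and then has to wait each time the four-slot buffer is full. In this simulation interrupts arrive only inside sim_pause, so it cannot reproduce the timing hazards of a real interrupt arriving at an arbitrary instruction.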

micro:bit version 2

On the V1, pause() is equivalent to a wfe instruction, and halts the processor until the next interrupt in the way described. On the V2, this instruction puts the processor into a low-power state from which it takes a long time to wake up, so pause() does nothing, returning immediately.

Handling interrupts

From a high-level point of view, an interrupt is like a subroutine call inserted into the program at the point where the interrupt is requested. The interrupt handler uart_handler does pretty much the same job as the subroutine poll() we discussed earlier, but there is no need to insert calls to it into the text of the program, so parts of the program that have nothing to do with driving the UART do not need to be modified, and the effect is as if calls to poll() were inserted just where they are needed. In a way that will be discussed in the next experiment, the processor arranges to preserve the values of all registers when the interrupt handler is called. For example, the statement count = count+1 or count++ in the main program might be translated by the C compiler into a short sequence of machine instructions:

ldr r0, =count
ldr r1, [r0]
adds r1, r1, #1
str r1, [r0]

An interrupt may occur at any point in this sequence, when the old or the new value of count is sitting in r1. But this causes no problems, because the interrupt mechanism arranges to preserve in r0 and r1 and all the other registers the values they had before the interrupt, so when the interrupt has finished, the interrupted calculation can proceed undisturbed.

This hardware mechanism deals well with the situation where the job being done by the interrupt handler is quite separate from the job being done by the interrupted main program: the interrupt handler has nothing to do with the variable count that stores the number of primes that the program has found. Things get more complicated when we think about variables such as txidle that are used in both the interrupt handler and subroutines like serial_putc() that are called by the main program. Because an interrupt can be requested at any time, several steps are needed to make this safe.

Variables such as bufcnt and txidle that are mentioned in both the main program and the interrupt handler are marked volatile. This prevents the compiler from looking at a loop such as

while (bufcnt == NBUF) pause();

and assuming that bufcnt doesn't change in the loop, and so does not need loading from memory in each iteration. On the contrary, bufcnt will eventually be reduced by the interrupt handler, and then the loop should terminate. This won't happen if the loop continues to look at a stale copy of the value in a register.

We know the statement bufcnt++ in buf_put, called from serial_putc, will be implemented by code similar to that shown for count++ earlier. If an interrupt happens in the middle of this code, then this statement will read the value of bufcnt before the interrupt, and set bufcnt to a value one greater after the interrupt has happened. The result of this is that the effect of the statement bufcnt-- in the interrupt handler will be lost, because when the interrupt handler returns, it is an old value of bufcnt that is incremented. The simple solution is to disable interrupts around the part of serial_putc that touches the shared variables, by putting intr_disable() before it and intr_enable() after it.
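The lost update can be reproduced deterministically on an ordinary computer by writing out the load, add and store steps of bufcnt++ by hand and placing a fake interrupt between them. This is only a model: the functions fake_interrupt, increment_interrupted and increment_protected are invented for the purpose, and the local variable r1 plays the part of the register.

```c
#include <assert.h>

static int bufcnt;      /* stands in for the shared counter */

/* The handler's effect on the counter: bufcnt-- */
static void fake_interrupt(void) { bufcnt--; }

/* bufcnt++ split into its machine-level steps, with an interrupt
   arriving just after the load.  Returns the final value of bufcnt. */
static int increment_interrupted(int start)
{
    bufcnt = start;
    int r1 = bufcnt;    /* ldr r1, [r0] */
    fake_interrupt();   /* handler runs here and stores start-1 */
    r1 = r1 + 1;        /* adds r1, r1, #1 */
    bufcnt = r1;        /* str r1, [r0]: overwrites the handler's update */
    return bufcnt;
}

/* The same steps with interrupts disabled: the interrupt is held
   pending and is taken only after the store has completed. */
static int increment_protected(int start)
{
    bufcnt = start;
    /* intr_disable() would go here */
    int r1 = bufcnt;
    r1 = r1 + 1;
    bufcnt = r1;
    /* intr_enable() would go here; the pending interrupt now runs */
    fake_interrupt();
    return bufcnt;
}
```

Starting from 5, one increment and one decrement ought to leave the counter at 5. The interrupted version ends with 6, so the program would believe the buffer holds one more character than it really does; the protected version ends with 5.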

Context

Interrupt-based control allows the program to make better use of resources, but there is still the limitation that the program must stop completely when the buffer is full and the program wants to add a character to it. It's possible that, even if the prime-printing task is held up because the buffer is full, there are other tasks that could continue to run, and we might need them to run so as (for example) to continue updating the display. For that, we will need to introduce an operating system that allows multiple processes that run independently, and we will do this in Part 3.

Tasks, events and interrupts

Peripherals on the Nordic chips have various device registers, but some are specially identified in the datasheet as tasks and events. Each task or event is a 32-bit register where only the bottom bit is meaningful. Setting that bit in a task triggers the peripheral to perform some action, and by reading the bit in an event register, the program can discover whether a specific event has happened, such as the completion of an action. Task and event registers behave in a uniform way across peripherals. For example, storing zero into an event register clears the bit; and most events can be configured to cause an interrupt when they occur, by setting the appropriate bit in a register INTEN associated with the peripheral.

It's possible to configure peripherals so that certain events trigger a task when they occur. For connections that are often used, there are specific shortcuts provided as a device register SHORTS for each device. For example, it's common to want to clear the counter associated with a hardware timer whenever the timer reaches a set limit, and there is a shortcut for that which we will use behind the scenes in Experiment 14 and others to provide a regular 'tick' signal. There is also a general mechanism, programmable peripheral interconnect (PPI), that connects events in one peripheral with tasks in another. We will use it in Experiment 19 to configure a timer so that it produces regular pulses of precise width on a GPIO pin to control a servo motor.

Challenges

Find out what buffer size is sufficient for the program to work.
How does the execution time of the program vary as the buffer size changes?
Implement a version of the program that uses a subroutine poll() instead of interrupts. How many calls to poll() must be inserted in the program for it to work perfectly?

Questions

What is the difference between the I/O registers UART.INTEN, UART.INTENSET and UART.INTENCLR?

The UART.INTEN register shows which events in the UART can cause an interrupt: you can write a value to the register to enable or disable individual interrupt sources, and you can read it to find out which sources are actually enabled. The Nordic chips have a scheme – mostly for convenience – that allows individual bits in this register to be set or cleared without having to read the register, modify it, and write back the result. If you write a value to the UART.INTENSET register containing one or more 1 bits, then the corresponding bits in UART.INTEN will be set to 1, and the other bits will remain unchanged. Similarly, writing a pattern of 1 bits to UART.INTENCLR clears the corresponding bits of UART.INTEN and leaves the others unchanged. The general scheme is described in Section 10.1.2 of the nRF51822 Reference Manual.
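The behaviour described in this answer can be modelled in a few lines of ordinary C. The sketch below is a software model, not code for the device: the variable inten and the functions write_inten, write_intenset, write_intenclr and read_inten are invented, and the bit numbers used in the example afterwards are chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define BIT(i) (1u << (i))

static uint32_t inten;          /* models the contents of INTEN */

/* Writing INTEN replaces the whole register. */
static void write_inten(uint32_t v) { inten = v; }

/* Writing INTENSET sets the bits that are 1 in v and leaves the rest. */
static void write_intenset(uint32_t v) { inten |= v; }

/* Writing INTENCLR clears the bits that are 1 in v and leaves the rest. */
static void write_intenclr(uint32_t v) { inten &= ~v; }

/* Reading INTEN shows which interrupt sources are enabled. */
static uint32_t read_inten(void) { return inten; }
```

For instance, starting from zero, write_intenset(BIT(7)) followed by write_intenset(BIT(2)) leaves both bits set, and a later write_intenclr(BIT(7)) clears one without disturbing the other. On the real device each of these is a single store, with no need to read the register first.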