Control Characters in the Serial Port on Arduino


We continue to delve into advanced use of the serial port on processors like Arduino. In this post, we will see how to add frame delimiters and control characters to our transmission systems to make them more robust.

Previously, we saw how to send bytes through the serial port as a convenient and “professional” way to communicate. In the previous post, we saw that we will often use one or more structures to define the message we want to send or receive.

Now we want to expand the frame (the bytes we send in the communication) by surrounding the data bytes with a series of elements that increase the “quality” of the communication. An example is to add a checksum function to check the integrity of the data, something we will see in the next post.

Another example, which is what we will see in this post, is to add frame delimiters. That is, a certain “signal” or mark that identifies the beginning and end of the communication. While we’re at it, we’d also like to be able to add certain control characters that have a special meaning.

As usual, all of this has already been invented and is called, precisely, control characters. In fact, we have been using them since the first communication post, every time we use ‘\n’ (line feed) or ‘\r’ (carriage return).

Here is a list of some of the available control characters with their hexadecimal value and their meaning.

Code	Hex	Alt.	Meaning
NUL	00	\0	Null
SOH	01		Start of Heading
STX	02		Start of Text
ETX	03		End of Text
EOT	04		End of Transmission
ENQ	05		Enquiry
ACK	06		Acknowledge
BEL	07	\a	Bell
BS	08	\b	Backspace
HT	09	\t	Horizontal Tabulation
LF	0A	\n	Line Feed
VT	0B	\v	Vertical Tabulation
FF	0C	\f	Form Feed
CR	0D	\r	Carriage Return
SO	0E		Shift Out
SI	0F		Shift In
DLE	10		Data Link Escape
DC1	11		Device Control One (XON)
DC2	12		Device Control Two
DC3	13		Device Control Three (XOFF)
DC4	14		Device Control Four
NAK	15		Negative Acknowledge
SYN	16		Synchronous Idle
ETB	17		End of Transmission Block
CAN	18		Cancel
EM	19		End of Medium
SUB	1A		Substitute
ESC	1B		Escape
FS	1C		File Separator
GS	1D		Group Separator
RS	1E		Record Separator
US	1F		Unit Separator
SP	20		Space
DEL	7F		Delete
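As a small illustration of the table, the escape sequences in the Alt. column are simply another way of writing the same byte values. The sketch below defines a few of the codes as constants (the constant names are our own choice, taken from the Code column) and checks at compile time that the escapes match, assuming an ASCII-based compiler such as the one used for Arduino.

```cpp
#include <stdint.h>

// A few control characters from the table, as named constants.
// The names are illustrative; only the values are standard ASCII.
const uint8_t STX = 0x02;  // Start of Text
const uint8_t ETX = 0x03;  // End of Text
const uint8_t ACK = 0x06;  // Acknowledge
const uint8_t LF  = 0x0A;  // Line Feed
const uint8_t CR  = 0x0D;  // Carriage Return
const uint8_t NAK = 0x15;  // Negative Acknowledge

// The escape sequences are the same bytes written differently.
static_assert('\0' == 0x00, "NUL is byte 0x00");
static_assert('\t' == 0x09, "HT is byte 0x09");
static_assert('\n' == 0x0A, "LF is byte 0x0A");
static_assert('\r' == 0x0D, "CR is byte 0x0D");
```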

In particular, the accepted control characters for the start and end of the frame are, respectively, 0x02 (STX) and 0x03 (ETX). Of course, we are not required to use these characters. In fact, sometimes you will see code on the Internet using ‘H’ (Header) as the beginning of a header. There is no rule that prevents using it, but, given that control characters exist, it is logical (and more hygienic) to use the standard.

The operation is simple. When sending a frame, we start by sending the STX character and finish by sending ETX. We are increasing the size of the frame by two bytes in exchange for better communication quality. The relative increase in frame size is smaller the more data we send.
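To make the frame layout and its cost concrete, here is a minimal sketch in plain C++ that wraps a payload between STX and ETX and computes the fraction of the frame taken up by the delimiters. The helpers `buildFrame` and `delimiterOverhead` are hypothetical names introduced for this illustration, not part of any Arduino library.

```cpp
#include <stddef.h>
#include <stdint.h>
#include <string.h>

const uint8_t STX = 0x02;  // Start of Text
const uint8_t ETX = 0x03;  // End of Text

// Hypothetical helper: copies `length` payload bytes into `frame`,
// surrounded by STX and ETX. Returns the total frame size, or 0 if
// `frame` (of `capacity` bytes) is too small to hold it.
size_t buildFrame(const uint8_t* payload, size_t length,
                  uint8_t* frame, size_t capacity)
{
  if (capacity < length + 2) return 0;

  frame[0] = STX;
  memcpy(frame + 1, payload, length);
  frame[length + 1] = ETX;
  return length + 2;
}

// Hypothetical helper: fraction of the frame used by the two delimiters.
float delimiterOverhead(size_t length)
{
  return 2.0f / (float)(length + 2);
}
```

For a 12-byte payload the delimiters take 2 of 14 bytes, roughly 14% of the frame, while for a 100-byte payload they take under 2%.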

Here is an example of sending an array of data with frame delimiters.

const char STX = '\x02';  // Start of Text
const char ETX = '\x03';  // End of Text

const int data[] = {0, 50, 100, 150, 200, 250};
const size_t dataLength = sizeof(data) / sizeof(data[0]);  // number of values
const size_t bytesLength = dataLength * sizeof(data[0]);   // size in bytes

void setup()
{
  Serial.begin(9600);
  Serial.write(STX);
  // Send the raw bytes of the array: the length is in bytes, not in elements
  Serial.write((const byte*)data, bytesLength);
  Serial.write(ETX);
}

void loop() 
{
}

An example of a receiver would be as follows.

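const char STX = '\x02';  // Start of Text
const char ETX = '\x03';  // End of Text

// Must match the sender: 6 values of type int
const size_t dataLength = 6;
int data[dataLength];
const size_t bytesLength = dataLength * sizeof(data[0]);

void setup()
{
  Serial.begin(9600);
}

void loop()
{
  // Wait for a complete frame: STX + data bytes + ETX
  if (Serial.available() >= (int)(bytesLength + 2))
  {
    // If the first byte is not STX it is discarded, and we try again
    if (Serial.read() == STX)
    {
      Serial.readBytes((byte*)data, bytesLength);

      if (Serial.read() == ETX)
      {
        //performAction();
      }
    }
  }
}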

However, control characters are nothing more than bytes. How secure are these delimiters? That is, is it possible to confuse them with a data byte containing 0x02 or 0x03? Is it possible that, after losing bytes, we misinterpret a data byte as a delimiter?

Indeed, that is the case: no system is completely robust. Adding frame delimiters improves the system, but it does not make it infallible. In fact, we are not even checking the integrity of the data. We are only trying to check whether we maintain a certain degree of “synchronization”.

For the delimiters to fail, after losing several bytes, the byte received in the position where a delimiter should be must happen to have the delimiter’s value. If we are working in an environment with many failures, delimiters alone will not be enough to filter out all the defects.

It may seem unlikely, but if we assume that a stray byte is equally likely to take any value, the probability of misinterpreting it as a given control code is 1/256. The combined probability of misinterpreting both the start and the end of the message at the same time is 1/65,536.
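Spelled out, these figures rest on two assumptions that real data rarely satisfies exactly: each stray byte is equally likely to take any of the 256 possible values, and the two positions are independent of each other.

```latex
P(\text{one delimiter matches by chance}) = \frac{1}{256} \approx 0.39\%
\qquad
P(\text{both match by chance}) = \frac{1}{256} \cdot \frac{1}{256} = \frac{1}{65\,536} \approx 0.0015\%
```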

The real advantage, though, is that delimiters provide a certain capacity for “resynchronization”. In a “normal” environment, when data is occasionally lost, the system can detect the failure and eventually recover synchronization.
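To make the resynchronization idea concrete, here is a minimal sketch of a receiver written as a small state machine that is fed one byte at a time. It is only one possible approach, written in plain C++ so it can be tried outside the board. The names `FrameParser` and `PAYLOAD_SIZE` are hypothetical, and the payload length is assumed to be fixed and known to both sides.

```cpp
#include <stddef.h>
#include <stdint.h>

const uint8_t STX = 0x02;  // Start of Text
const uint8_t ETX = 0x03;  // End of Text
const size_t PAYLOAD_SIZE = 4;  // assumed fixed payload length, in bytes

// Hypothetical fixed-length frame parser; feed it one byte at a time.
struct FrameParser
{
  enum State { WAIT_STX, READ_DATA, WAIT_ETX };

  State state = WAIT_STX;
  uint8_t payload[PAYLOAD_SIZE];
  size_t count = 0;

  // Returns true when `payload` holds a complete, well-delimited frame.
  bool feed(uint8_t b)
  {
    switch (state)
    {
      case WAIT_STX:
        // Out of sync: drop every byte until a start delimiter appears.
        if (b == STX) { count = 0; state = READ_DATA; }
        return false;

      case READ_DATA:
        payload[count++] = b;
        if (count == PAYLOAD_SIZE) state = WAIT_ETX;
        return false;

      case WAIT_ETX:
        // Accept the frame only if it closes with ETX. Either way,
        // the search for the next STX starts again.
        state = WAIT_STX;
        return b == ETX;
    }
    return false;
  }
};
```

In an Arduino sketch, loop() could pass each byte returned by Serial.read() to feed() while Serial.available() is greater than zero. If a byte is lost inside a frame, the ETX check fails and that frame is rejected. The following frame may be lost as well, because its STX was consumed by the failed check, but the parser locks onto the next STX after that, which is the “eventual” recovery described above.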

Of course, we can greatly improve the transmission process by adding a timeout or a checksum. We will see all of this in upcoming posts.

Download the code

All the code from this post is available for download on GitHub.