On the origins of serial communications and data encoding
by Nicolas Martin

Telegraphy and Morse Code
The first widely used character code for electronically processing textual data in the American West in the 19th century was Morse code, which was invented for transmitting messages via telegraph lines. The inventor of Morse code, American Samuel Finley Breese Morse (1791-1872), was also a renowned artist who studied painting in London, where he learned of research into electromagnetism by British scientists. It was on a return sea voyage to the U.S. in 1832 that he conceived his telegraph system, which started the movement toward the electronically networked world.

Morse did not invent the first telegraph system to be put into practical use. That honor belongs to two Britons: Sir Charles Wheatstone, a physicist and inventor, and Sir William Fothergill Cooke, an electrical engineer who installed the first railway telegraph system in England in 1837, the same year Samuel Morse invented the first American telegraph. Their system, however, was not a simple one. Morse's was simpler: it used a single wire for transmitting the signals and an “electromagnet” that attracted a small armature when an incoming signal was received. Morse demonstrated his system on May 24, 1844, on the first U.S. telegraph link, which ran between Baltimore, Maryland, and Washington, D.C. He sent the message “What hath God wrought!”

Morse invented the code he used to send his historic message in 1838. Like the binary system used in modern computers, it is based on combinations of two possible values — in the case of Morse code, a dot or a dash. However, unlike the character codes used in modern computers, the combinations of the two values used to represent characters in Morse code vary in length. The principle Morse used was to give the most frequently used letters the shortest possible patterns, which greatly reduced the length of a message. For example, the most frequently occurring English letter, 'E', is represented by just a dot; the second most frequently occurring letter, 'T', by just a dash. Interestingly, Morse found out the frequency of letters in English not by doing a study of texts, but by counting the individual pieces of type in each section of a printer's type box. The result of his labors was a highly efficient code that, with some modifications, is still in use to this day, 160 years after it was invented.
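The effect of giving frequent letters the shortest patterns can be sketched in a few lines of Python. This is only an illustration: the table holds a small subset of International Morse Code, and the function names `encode` and `symbol_count` are invented for this example.

```python
# A small subset of International Morse Code, written as dots and dashes.
MORSE = {
    "E": ".", "T": "-", "A": ".-", "N": "-.", "I": "..", "M": "--",
    "S": "...", "O": "---", "H": "....", "Q": "--.-", "Z": "--..",
}

def encode(word):
    """Return the Morse patterns for a word, one pattern per letter."""
    return " ".join(MORSE[ch] for ch in word.upper())

def symbol_count(word):
    """Count the dots and dashes needed to send a word."""
    return sum(len(MORSE[ch]) for ch in word.upper())

print(encode("tea"))        # - . .-
print(symbol_count("tea"))  # 4
print(symbol_count("zzq"))  # 12
```

Two three-letter words can differ by a factor of three in length on the wire, depending on how common their letters are.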

Because it was not yet clear to anyone that the code could be interpreted by people as it was received, an elaborate machine called a “telegraph register” was designed that used the electromagnet to emboss a strip of paper pulled past it by a clockwork mechanism. While current was flowing, the stylus marked the paper; when current was absent, it left spaces between the marks.

Morse code has evolved through several versions over time. Beginning as Early Morse Code, it developed into American Morse Code and then into International Morse Code, as shown here.

After Morse's inventions were put into practical use, other inventors contributed to the development of telegraph technology by creating hardware such as relays, which allowed telegraph signals to travel farther by giving them an electrical power boost. In addition, various schemes were developed to increase the utilization of the telegraph lines. These enabled “diplexing,” or the sending of two messages in the same direction at the same time; “duplexing,” or the sending of two signals in opposite directions at the same time; and “quadruplexing,” or the sending of four messages (two in each direction) at the same time. Moreover, the reception of the incoming signals was mechanized through the introduction of tape-reader machines, which allowed traffic to speed up to 400 words per minute by 1900.

Jean-Maurice-Émile Baudot and teleprinting
 
The next great step in telegraph technology was a primitive printing telegraph, or “teleprinter,” patented by Jean-Maurice-Émile Baudot (1845-1903) in France in 1874. Like Morse's telegraph, it involved the creation of a new character code, the 5-bit Baudot code, which was also the world's first binary character code for processing textual data.

Messages encoded in Baudot's code were printed out on narrow two-channel transmission tapes by operators who created them using a special five-key keypad. In later versions, typewriter keyboards that automatically generated the proper five-unit sequences were employed. Another interesting feature of Baudot's teleprinter system was that it was a “multiplex” system that allowed up to six operators to share a single telegraph line using a time division system. This led to a considerable increase in the transmission capacity of a telegraph line. Baudot's system proved to be fairly successful, and it remained in widespread use in the 20th century until it was displaced by the telephone, and, of course, personal computer communications.
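The idea of several operators sharing one line by taking turns can be sketched as a simple round-robin interleave. This is a loose illustration of time division in general, not a model of Baudot's actual distributor hardware; the function names and the sample code values are invented, and all operators are assumed to have the same number of characters queued.

```python
def multiplex(channels):
    """Interleave the operators' codes: one time slot per operator per rotation."""
    return [code for rotation in zip(*channels) for code in rotation]

def demultiplex(line, operator_count):
    """Recover each operator's codes by taking every n-th slot from the line."""
    return [line[i::operator_count] for i in range(operator_count)]

# Three hypothetical operators, each with three codes waiting to be sent.
operators = [[1, 2, 3], [10, 20, 30], [7, 8, 9]]
line = multiplex(operators)
print(line)                               # [1, 10, 7, 2, 20, 8, 3, 30, 9]
print(demultiplex(line, 3) == operators)  # True
```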

Baudot also left a portion of his name to posterity in the form of the “Baud rate,” which refers to the number of data signaling events that occur in a second.

Being a 5-bit character code, Baudot code has room for only 32 elements (2^5 = 32 code points). This is not enough to handle the letters of the Latin alphabet together with Arabic numerals and punctuation marks, so Baudot code employs a “locking shift scheme” to switch between two planes of 32 elements each (Fig. 3), which can be compared to shifting and locking into place the upper case letters on a mechanical typewriter. Like the subset of International Morse Code given above, Baudot code has codes for the upper case letters of the Latin alphabet, Arabic numerals, and punctuation marks. However, in addition, it has control codes, which are also a feature of the character codes used in today's personal computers.
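A locking shift can be sketched as follows. The code values here are placeholders chosen for illustration and are not the real Baudot assignments; `LTRS`, `FIGS`, and the two tables are assumptions of this example. The point is only that a shift code is sent once, when the plane changes, and stays in force until the next shift.

```python
# Hypothetical shift codes and tables; NOT the actual Baudot code assignments.
LTRS, FIGS = 31, 27
LETTERS = {ch: i + 1 for i, ch in enumerate("ABCDEFGHIJKLMNOPQRSTUVWXYZ")}
FIGURES = {ch: i + 1 for i, ch in enumerate("0123456789.,?-")}

def encode(text):
    """Encode text as 5-bit codes, emitting a shift only when the plane changes."""
    codes = []
    plane = LETTERS  # assume both ends start in the letters plane
    for ch in text.upper():
        if ch in plane:
            codes.append(plane[ch])
        elif ch in LETTERS:
            plane = LETTERS
            codes += [LTRS, LETTERS[ch]]
        elif ch in FIGURES:
            plane = FIGURES
            codes += [FIGS, FIGURES[ch]]
        else:
            raise ValueError(f"no code for {ch!r}")
    return codes

def decode(codes):
    """Decode 5-bit codes, tracking which plane is currently locked in."""
    planes = {
        LTRS: {code: ch for ch, code in LETTERS.items()},
        FIGS: {code: ch for ch, code in FIGURES.items()},
    }
    current = planes[LTRS]
    out = []
    for code in codes:
        if code in planes:
            current = planes[code]  # the shift locks until the next shift code
        else:
            out.append(current[code])
    return "".join(out)

print(encode("AB12C"))          # [1, 2, 27, 2, 3, 31, 3]
print(decode(encode("AB12C")))  # AB12C
```

Note that the same 5-bit value (2 or 3 in this example) stands for a letter or a figure depending on the most recent shift, which is how two planes fit into 32 code points.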

Hardware constraints forced Baudot to limit his character code to 5 bits, and hence to leave out the lower case Latin letters. A more complex code, even just a 6-bit code, would have necessitated a much more complex electromechanical device to transmit it, which would have been extremely difficult to fabricate using the technology of Baudot's time. After modifying Baudot's code to 55 elements, thus allowing for three places for national variants, the CCITT (Comité Consultatif International Télégraphique et Téléphonique [Consultative Committee on International Telephone and Telegraph]) in Geneva, Switzerland, standardized it in 1932 as a 5-bit code for teleprinters. It was given the designation “International Telegraph Alphabet No. 2.”

The code of all zeros or “spaces” is special, since it represents an interrupted or broken circuit. When this happens, it is desirable not to print anything at all, since nothing useful is being communicated.

Normally, when current is flowing, the line is said to be “marking”, and the machine is sitting there with the motor running, waiting for a character to arrive. To prepare the machine to receive a character, a special zero bit is sent, called the start bit. This bit starts the wheels of the machine turning, and the machine counts and records each of the five bits as they arrive on their fixed schedule. When all five bits have arrived, the bits are used to position a print head or piece of type, which strikes the paper through an inked ribbon, marking the character on the page. After the last bit has arrived, the line is kept in the one state (marking) long enough for the decoding mechanism to reset. In Baudot teleprinters, this was 1.42 times as long as the bit time. It became known as the stop bit, although it was required at the end of each character and really did not stop anything.
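The framing described above can be sketched as a list of line states with their durations in bit times. This is a minimal sketch: the function names are invented, and sending the least significant bit first is an assumption of the example rather than something stated in the text.

```python
START, STOP = 0, 1   # space (no current) and mark (current flowing)
STOP_LENGTH = 1.42   # length of the stop element, in bit times

def frame(code):
    """Return (level, duration) pairs for one 5-bit character."""
    if not 0 <= code < 32:
        raise ValueError("a character must fit in 5 bits")
    data = [(code >> i) & 1 for i in range(5)]  # LSB first (assumption)
    return [(START, 1.0)] + [(bit, 1.0) for bit in data] + [(STOP, STOP_LENGTH)]

def unframe(elements):
    """Check the start and stop elements and rebuild the 5-bit code."""
    levels = [level for level, _ in elements]
    if len(levels) != 7 or levels[0] != START or levels[-1] != STOP:
        raise ValueError("framing error")
    return sum(bit << i for i, bit in enumerate(levels[1:6]))

def duration(elements):
    """Total time on the line, in bit times."""
    return sum(length for _, length in elements)

elements = frame(0b10110)
print([level for level, _ in elements])  # [0, 0, 1, 1, 0, 1, 1]
print(unframe(elements))                 # 22
```

Because an idle line is marking, the receiver can always recognize the start of a character as the first transition from mark to space.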

With the start bit, five data bits, and 1.42 stop bits, the Baudot code took 7.42 bit times to print each character. Although the speed of different machines varied, a common standard was for a bit time to be 0.022 second, for a rate (in baud) of about 45.5 bit times per second. At this rate, a teleprinter could print just over six characters every second, so if you are printing five-letter words with spaces in between, that is just over 60 words every minute, a very respectable typing speed, and faster than almost all telegraph operators.
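The arithmetic behind these figures can be checked with a short calculation. The variable names are chosen for this example, and a "word" is taken to be five letters plus one space, as in the text.

```python
BIT_TIME_S = 0.022            # duration of one bit, in seconds
BITS_PER_CHAR = 1 + 5 + 1.42  # start bit + five data bits + stop element
CHARS_PER_WORD = 5 + 1        # five letters plus the following space

baud = 1 / BIT_TIME_S
chars_per_second = baud / BITS_PER_CHAR
words_per_minute = chars_per_second * 60 / CHARS_PER_WORD

print(round(baud, 2))              # 45.45
print(round(chars_per_second, 2))  # 6.13
print(round(words_per_minute, 1))  # 61.3
```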

After Baudot's code: the ASCII code
 
As a result of the rapid development and spread of communications and data processing technologies in the United States in the first half of the 20th century, it became apparent there was a need for a standard character code for interchanging data that could handle the full character set of an English-language typewriter. The American Standards Association (ASA, which later changed its name to the American National Standards Institute [ANSI]) began studying this problem in the late 1950s, and it eventually decided that a 7-bit code that did not require shifting in the manner of Baudot code would be sufficient. In 1963, ASA announced the American Standard Code for Information Interchange (ASCII), which originally seems to have been named the American National Standard Code for Information Interchange (ANSCII). However, ASCII, as it was announced in 1963, left many positions, such as those for the lower case Latin letters, unallocated. It wasn't until 1968 that the currently used ASCII standard of 32 control characters and 96 printing characters was defined. Moreover, in spite of the fact that ASCII was devised to avoid shifting, it included control characters for shifting, i.e., SHIFT IN (SI) and SHIFT OUT (SO) for Baudot-style locking shift, and ESCAPE (ESC) for non-locking shift. These control characters were later used to extend ASCII code into 8-bit codes with 190 printing characters.

RS-232 was originally adopted in 1960 by the Electronic Industries Association (EIA). The standard evolved over the years, and the third revision (RS-232C), issued in 1969, has remained the standard of choice for computers, especially in the PC world.


The information provided on this page was found in various places on the World Wide Web. To see the complete content of these articles and references, as well as copyright information, please refer to the following pages: