Learn Time Synchronization Across Switched Ethernets
Now and then you come across measurement problems that are tightly associated with the notion of synchronicity, meaning that things need to happen simultaneously. The usual things that need such synchronicity are data sampling and motion control. In the case of data sampling, you need to know the value of two different quantities measured at the same time (within a narrow tolerance). If the measurement sources are close together, this is fairly easy to accomplish, but if they are far apart and connected to different measurement nodes, it suddenly gets harder. The usual choices are:
- Use a special hardware signal on a separate cable between the controller and all nodes that need synchronization. If the nodes are far apart and the tolerances are tight, make sure that all cables that carry the synchronization signal have the same length.
- Add a local clock to each node and use the present automation network to keep them in synchronization. Tell each node how often the measurement sources should be sampled and require the node to timestamp each measurement.
We shall now take a look at hard synchronization requirements (maximum deviation 1 ms) and discuss the possibility of implementing synchronization in a multi-traffic switched Ethernet environment. Common to the solutions we will discuss is that they adhere to a standardized time protocol carried over the existing network. Such a step would significantly reduce cabling and transceiver costs, since costly dedicated (separate) links are used for this purpose today.
The Concept of Time Stamping
Let us start at the very beginning – the concept of time stamping:
Timestamping is the association of a data set with a time value. In this context, “time” may also include “date”.
Why would anybody want to time stamp anything? The closest example may be on your PC – whenever you create a document and save it, the document is automatically assigned a date-and-time value. This value enables you to look for
- Documents created on a certain date (for example last Monday)
- Documents created within a certain period (for example last half of 1998)
- The order in which a set of documents were created (for example the e-mails in your inbox).
If we just look at the examples above, we see that the accuracy we need for the time-stamping is about the same as we expect from our trusty old wristwatch. This again means “within a couple of minutes”, but as long as the clock does not stop it does not matter much how precise it is.
Let Us Synchronize Our Watches!
Now we know about time stamping on our PC. The next step is to connect the PC to a network, maybe even to the Internet, and start exchanging documents and e-mails. What happens if the clock in your PC (the clock that is used for time stamping) is wrong by a significant amount?
- If you have an e-mail correspondence with someone, a reply (which is time-stamped at the other end) might appear to be written before the question (which is time stamped at your end)
- If you collaborate on some documents, getting the latest version might be problematic.
Therefore, when several PCs are connected in any sort of network, the PC clocks are still accurate enough, but a new requirement is that they should be synchronized (show the same time at any given moment). Now, we could go around to each PC, look at our wristwatch, and set the PC clock to agree with it. The trouble is that this is a boring and time-consuming job and we should look for a better solution.
One solution is to elect one PC to be the “time reference”, which means that every other PC should get the current time from it at least once a day and set its clock to agree with that time. This solution works satisfactorily on a local area network (LAN), but all PC clocks will lag the time reference by the time it takes a clock value to travel from the time reference to the synchronizing PC. Except for very unusual cases, this lag is less than one second and thus good enough for office purposes.
Enter the Internet. Suddenly the synchronization problem escalates since two collaborating PCs may be located in different time zones (remember to compensate for that) and a synchronization message may take a long time to travel from one PC to the other. Fortunately, the Internet Network Time Protocol has a solution to both problems. This protocol involves sending a time-stamped time request message to a “timeserver”. This timeserver adds an arrival time stamp and a retransmit time stamp before returning the request message to the requesting PC. The requesting PC time stamps the message when it returns and uses all the timestamps in calculating the correct time. This protocol and its little brother, the Simple Network Time Protocol, can synchronize computers across the Internet with precision in the low milliseconds.
Stating the Problem – Why Network Synchronization Is Difficult
The delays from the time stamping of a time synchronization message in the message source node until it is time-stamped in the message destination node are:
- Message preparation delay
- Communication stack traversal delay (transmission)
- Network access delay
- Network traversal delay
- Communication stack traversal delay (reception)
- Message handling delay
Variations in the delays are due to:
- Real-Time Operating System (RTOS) scheduling unpredictability
- Network access unpredictability
- Network traversal time variations
Time stamping at the lowest stack level helps eliminate the stack delay variations and real-time OS scheduling unpredictability but introduces some complications in the implementation.
An NTP Time Protocol Implementation
The NTP/SNTP algorithm is based on a time client asking a time server for the current time. To do so, the client creates an NTP network packet and inserts its current time into it. The time server logs the time the packet arrives, processes it as fast as possible, and transmits the packet back to the time client, adding a timestamp just before the packet is transmitted. What we now have is a network packet containing three time stamps:
- t1: The (client) time the packet was generated in the client asking for the current time
- t2: The (server) time the packet arrived at the timeserver.
- t3: The (server) time the packet was updated and put into the transmission queue at the server.
In addition, the calculations require:
- t4: The (client) time the packet arrived back at the client.
From these four timestamps, we can calculate the best estimate for the difference between the time server clock and the time client clock: Δt = (t2 + t3)/2 − (t1 + t4)/2. We can also calculate an estimate for the one-way travel time of the message between client and server: τ = (t4 − t3 + t2 − t1)/2.
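As an illustration, here is a minimal sketch of this calculation in Python (the function name and the example values are illustrative, not taken from any real NTP implementation):

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Estimate clock offset and one-way path delay from the four
    NTP timestamps (all in seconds).

    t1: client time when the request was sent
    t2: server time when the request arrived
    t3: server time when the reply was sent
    t4: client time when the reply arrived
    """
    # Best estimate of (server clock - client clock), assuming the
    # network delay is the same in each direction.
    offset = (t2 + t3) / 2 - (t1 + t4) / 2
    # Round-trip time minus the server processing time, halved:
    # the estimated one-way delay.
    delay = ((t4 - t1) - (t3 - t2)) / 2
    return offset, delay

# Example: the server clock is 0.050 s ahead of the client and each
# network direction takes 0.010 s.
offset, delay = ntp_offset_and_delay(100.000, 100.060, 100.061, 100.021)
print(offset, delay)  # -> approximately 0.05 0.01
```

Both formulas assume the network delay is the same in each direction; an asymmetric path shows up directly as an offset error.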
Now, t2 and t4 can easily be determined down to microseconds (and perhaps even better) using hardware or software timestamps based on packet arrival interrupts. The other two have definite problems, however.
For full accuracy, t1 and t3 should be the times when the packet actually left the time client or timeserver, respectively. The problem is that this timestamp is not available until the packet has really left the time server or client, and by then it is, of course, too late to incorporate it into the packet. Therefore, the time synchronization inaccuracy for an NTP/SNTP setup is the variation in the delay between t1 and the time the packet leaves the time client, plus the variation in the delay between t3 and the time the packet leaves the timeserver.
Time client implementation issues
There are several ways of time stamping a network packet. We shall look at three of them and show that only the first two are suitable for accurate time synchronization:
- Hardware time stamping in the Ethernet controller.
- Software time stamping in an Interrupt Service Routine (ISR) outside the Real-Time Operating System (RTOS). This ISR should be connected to the Ethernet Interrupt Request signal and have a top hardware priority.
- Software time stamping in an Interrupt Service Routine (ISR) controlled by the RTOS (usually inside the Ethernet driver). This ISR is connected to the Ethernet Interrupt Request signal with a normal hardware priority.
Using any of these low-level time-stamping methods is considered an implementation issue and will not cause any incompatibility between a low-level time-stamping client and a standard high-level time-stamping server. In addition to low-level time stamping, the time client must consider the following aspects:
- The interval between time updates.
- The specifications of the local time-of-day clock concerning resolution, accuracy/stability, and the availability of drift and offset correction mechanisms.
- The usage of adaptive filtering and time stamp validation methods to remove network delay variations (a simple example is sketched below).
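As a simple illustration of the last point, here is a minimal Python sketch (hypothetical names, not from any real NTP client) of one common validation idea: trust the offset samples with the smallest round-trip delay, since they are the least likely to have been held up in a switch queue:

```python
def best_offset(samples, keep_fraction=0.25):
    """Pick a clock-offset estimate from (offset, delay) sample pairs.

    Samples taken during network congestion have a large round-trip
    delay and an unreliable offset, so keep only the fraction of
    samples with the smallest delay and average their offsets.
    """
    if not samples:
        raise ValueError("need at least one sample")
    ranked = sorted(samples, key=lambda s: s[1])  # sort by delay
    kept = ranked[:max(1, int(len(ranked) * keep_fraction))]
    return sum(offset for offset, _ in kept) / len(kept)

# Eight measurements; the ones with 1-3 ms delay are the trustworthy ones.
samples = [(0.0051, 0.012), (0.0049, 0.001), (0.0060, 0.025),
           (0.0050, 0.002), (0.0055, 0.018), (0.0070, 0.040),
           (0.0052, 0.009), (0.0048, 0.003)]
print(best_offset(samples))  # -> 0.00495 (from the two lowest-delay samples)
```

Production NTP clients use considerably more elaborate statistics, but the principle is the same: a small round-trip delay is the best available indicator of an uncongested, symmetric path.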
A New Time Synchronization Standard – IEEE 1588
In NTP the time client is the active part – it has to decide whether or not to synchronize and send out a synchronization request whenever it deems necessary. In the measurement and automation world, the clients are usually simple devices and are not relied upon to handle such tasks by themselves. The new time synchronization standard IEEE 1588 moves the responsibility for the synchronization to the time master, which is more in line with the traditional automation approach.
How IEEE 1588 Works
When an IEEE 1588 time server wants to synchronize the client clocks on a network, it broadcasts (or multicasts) the current time on the network for every client to pick up. Now each client will have the current time with an error dependent on the time it takes for the time message to pass from the time master across the network to the slave. A more advanced time server will create a timestamp at the actual time the time message has been successfully sent onto the network and then send out a “follow-up” message containing the time stamp in the original message and the new, corrected time stamp. This allows the clients to correct for the time server protocol stack delay and network access delay. It is also possible to ask the time clients to send their current time back to the time server, which allows the time server to calculate the protocol stack delay in the client and inform the client about it.
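To make the arithmetic concrete, here is a minimal Python sketch of the client-side (slave-side) offset calculation under the usual symmetric-path assumption. The names are illustrative; a real IEEE 1588 stack adds message formats, state machines, and best-master-clock logic on top of this:

```python
def ptp_offset(t1, t2, t3, t4):
    """Offset of the slave clock relative to the master, from one
    Sync/Follow_Up and one Delay_Req/Delay_Resp exchange.

    t1: master time the Sync message actually left the master
        (carried in the Follow_Up message)
    t2: slave time the Sync message arrived
    t3: slave time the Delay_Req message was sent
    t4: master time the Delay_Req message arrived
    """
    # Mean one-way path delay, assuming a symmetric path.
    mean_path_delay = ((t2 - t1) + (t4 - t3)) / 2
    # What remains of the Sync transit time is the clock offset.
    return (t2 - t1) - mean_path_delay

# Slave clock 0.003 s ahead of the master, 0.001 s path delay each way.
print(ptp_offset(t1=10.000, t2=10.004, t3=10.010, t4=10.008))
# -> approximately 0.003 (the slave is 3 ms ahead and corrects downward)
```

Note that t1 here is the accurate transmission timestamp carried in the follow-up message; without follow-up messages, the slave has to make do with the less accurate timestamp in the Sync message itself.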
Some Properties of IEEE 1588
- The “time server” and “time client” roles in an IEEE 1588 network are not fixed but are a result of a continuous evaluation of the time messages.
- An IEEE 1588 device can be physically connected to more than one network. Such a device is designated a “boundary clock”.
- A boundary clock may have different roles on the different network interfaces. A common combination is to be a slave on one network and a master on the other networks.
- Since most routers do not forward multicasts, a boundary clock is the standard way for IEEE 1588 time information to pass subnet borders.
Comparing NTP and IEEE 1588
- Both NTP and IEEE 1588 attain maximum accuracy on hub-based networks. Both standards have problems with the jitter inherent in switch-based networks. Both standards must implement corrections for this since 100Mbit/s Ethernet is the last version of Ethernet that will be able to use a hub.
- In a switch-based network, NTP measures the network delay in both directions on every exchange, while IEEE 1588 does so only when the time server deems it necessary. Thus, NTP has more measurements on which to base corrections.
- The clock synchronization accuracy will be appreciably better for an IEEE 1588 network where “follow-up” messages have been implemented. Without “follow-up” messages, the accuracy will be about the same for both protocols.
- IEEE 1588 mentions NTP as one of the possible standards for external clock synchronization.
- IEEE 1588 and NTP represent time in different ways and use different time origins.
- NTP contains chapters on local clock implementation and local clock adjustment. IEEE 1588 ignores these topics.
Ethernet infrastructure implementation issues
In a local measurement or automation network, preferably only one switch should be allowed between a time client and a time server. Having multiple switch levels will impose increased jitter[1] through the infrastructure, which again might call for more complex filtering on the time client side. The Ethernet switch must also have good switch latency characteristics.
What kind of accuracy do we need?
As usual, the answer is: it depends. What do you need the measurements for, and how fast does the measured signal change? If you want to measure temperature and flow rate in a chemical factory, a precision of 10 ms will usually be more than adequate. If you want to measure phase information on two 800 kV high-voltage transmission lines, your measurement stations will usually be far apart (several hundred meters at least) and the measured voltage will change very rapidly (at 60 Hz, an 800 kV line changes at 426 V/μs as it passes through 0 V).
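To see where the 426 V/μs figure comes from (assuming 800 kV is the RMS value): for V(t) = √2 · V_rms · sin(2πft), the slope at the zero crossing is dV/dt = 2πf · √2 · V_rms = 2π · 60 Hz · √2 · 800 kV ≈ 4.26 × 10⁸ V/s ≈ 426 V/μs. A 1 μs time stamping error thus translates directly into a 426 V measurement error near the zero crossing.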
As our ambitions grow, our measurements need to be more precise. Precision is not only a question of the value of a measurement but also of precisely when the measurement is taken. This is especially important when we want to measure different physical entities and correlate the measurements afterward.
[1] Jitter: variations in the delay
Read my next article https://www.physicsforums.com/insights/administering-tcpip-automation-measurement-networks/
Master’s in Mathematics, Norway. Interested in Network-based time synchronisation.
Precise synchronization in a local area was an issue before networked digital computers. Take a typical RADAR installation circa the 1970s. The transmitter and receiver share waveguide, antenna, and feed-horn. For detection purposes, the transmitter — a magnetron or klystron — drifts in frequency within operating limits and is kept synchronized using a stable local oscillator (STALO). Add a tuned maser and/or parametric amplifier to the receiver to improve the signal-to-noise ratio (SNR).
What if the RADAR transmission was also used as an RF beacon expected at a precise frequency? We were able to compensate for the frequency drift of the transmitter by correcting the STALO referencing the parametric amplifier. I remember being able to improve lobe energy and reduce frequency jitter in the transmitter by tweaking the cavity of the parametric amp in the receiver. Crude methods for modern times but effective. (see "RADAR mile" in Wiki).
Later, as a computer scientist helping with data collection at NASA Ames wind tunnels, I remember using "official" time signals from the Naval Observatory compared to internal clocks on a VAX computer to improve the average time-stamps associated with data collected on PDP front ends. As previously mentioned, the time-stamps themselves are collected along with the actual data of interest and the NTP output. Still crude compared to particle physics, but amenable to analysis.
When I wrote a 'general procedure' at work that swept up our various timing methods' calibration, precision, accuracy and gotchas, I included a caution about the unknown network latency of PCs. Though this was not noticed by my reviewer, it caused consternation at the next audit.
Given we were only timing stuff to plus or minus a second in half an hour, considering such latency was total overkill.
But, better to have it out in the open…
;-)
The idea of "what time is it" can get murky. In the NTP documentation is is made clear "No clock is right" but some are less wrong. Much of the NTP protocol uses stats and long term checks to gets progressively better agreement over what time it is.
Worldwide time is, in theory, based on some standard clock. (I think the Washington Naval Observatory is the usual one accepted, with Fort Collins, Colorado as secondary.) Fort Collins has most of the transmitters for broadcast time.
Even though the earth is a non-inertial frame, the distance between two points isn't changing. (You, the geologist in the back corner: stop muttering about continental drift…) So allowing for this, it's theoretically possible to synchronize clocks to some arbitrary tolerance.
The GPS system has an atomic clock in each satellite. Getting a fix requires 4 satellites: 3 for space and one for time, in what I'm sure is a rather messy set of simultaneous equations in spherical geometry. Anyway, any GPS receiver should be able to determine time to a microsecond.
Your cell phone tower has to do a bunch of time share multiplexing. I recall they use timing off of the GPS system to keep synchronized.
One thing to add when you get to thinking about how far light can travel in a short period of time is that it travels about the length of a 12" plastic ruler in 1 ns. The speed in cables and fiber optic cables is lower than this of course.
Since light takes about 130 ms to travel around the Earth, and time signals in fibre optic cables rather longer, you need to think about where the correct time is actually defined. The online information does not seem to be clear on this – is the correct time defined as a shell around the surface of the Earth where the time changes at the same time, even though they cannot communicate this due to the speed of light? If this is true, it would be possible to send a message through the Earth by a hypothetical (neutrino) communication system and have it arrive before it was sent, at least according to the time stamps.
Astronomers have a problem with timing the arrival of signals at Earth – it takes 16 minutes for light to cross Earth's orbit, so signals would appear 16 minutes earlier in summer than in winter! They solve this by referring the timing to a point near the Sun called the Solar System Barycentre, which results in a timescale called TCB. Further confusion exists as clocks on the Earth run slow due to the gravitational field.
Interesting Insight!
233 picoseconds is not sufficient for the fastest detectors any more.
[url=http://arxiv.org/abs/0901.2530]Design of a 10 picosecond Time of Flight Detector using Avalanche Photodiodes[/url] (2009)
[url=http://psec.uchicago.edu/]Large-Area Picosecond Photo-Detectors Project[/url] ([url=https://www.sciencedaily.com/releases/2013/08/130806132621.htm]report[/url]), aiming for 1 ps to 30 ps, depending on the application (30 ps PET gives 1cm spatial resolution)
Oh, and 5 GHz CPUs have a cycle time of 200 picoseconds.
On the other hand, there is no reason why the year of arrival for those detectors should be relevant and stored in the same way as the precision timing information. Global networks like neutrino detectors would profit from that, but those don’t reach such a timing resolution (yet?).
[quote]You are correct for 32 bit UNIX OS about the end of UNIX time.[/quote]I remember doing a calculation: 32 bit + 32 bit – why divide it exactly in the middle? If you divide it 24 bit (fraction of a second) + 40 bit (seconds) you get a span of 34842 years with a resolution of 59.6 ns. The span is certainly long enough, but the resolution (which seemed OK around 1999) is possibly too coarse. Splitting the difference: 28 bit + 36 bit gives a span of 2177 years with a resolution of 3.7 ns…
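A quick Python sketch to check these numbers (using the Gregorian mean year, so the exact spans depend slightly on convention):

[CODE]# Span (in years) and resolution of a 64-bit timestamp split into
# s_bits whole-second bits and f_bits fractional-second bits.
SECONDS_PER_YEAR = 365.2425 * 24 * 3600  # Gregorian mean year

for s_bits, f_bits in [(32, 32), (40, 24), (36, 28)]:
    span_years = 2**s_bits / SECONDS_PER_YEAR
    resolution_ns = 1e9 / 2**f_bits
    print(f"{s_bits}+{f_bits} bits: span {span_years:,.1f} years, "
          f"resolution {resolution_ns:.1f} ns")

# 32+32 bits: span 136.1 years, resolution 0.2 ns
# 40+24 bits: span 34,841.7 years, resolution 59.6 ns
# 36+28 bits: span 2,177.6 years, resolution 3.7 ns[/CODE]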
[quote]approximately 292 billion years from now, at 15:30:08 on Sunday, 4 December 292,277,026,596[/quote]I wonder how they made a prediction for the future introduction of leap seconds for 292 billion years into the future.
[USER=455902]@anorlunda[/USER] – part of locale settings, available for almost any extant computer, is the timezone setting. This handles UTC -> (local time & date) and also the other way: (local time & date) -> UTC. There is also the timezone database (formerly Olson, now housed at IANA). It handles all of the politics involved in daylight time transitions and when and if the time changes. Example – the Knesset (Israeli parliament) determines each year when daylight time starts. The IANA database keeps track of this kind of thing on an ongoing basis.
[USER=538805]@Svein[/USER] You are correct for 32 bit UNIX OS about the end of UNIX time. 64 bit does not have that problem (from Wikipedia):
[quote]Most operating systems designed to run on 64-bit hardware already use signed 64-bit time_t integers. Using a signed 64-bit value introduces a new wraparound date that is over twenty times greater than the estimated age of the universe: approximately 292 billion years from now, at 15:30:08 on Sunday, 4 December 292,277,026,596[/quote]
FWIW – if you remember Y2K there was concern about 16 bit computing and the century rollover.
Save it in UTC and (if necessary) in local time?
I think most databases do that. Certainly nearly all web pages.
You’ll get weird patterns for two nights (no entries in 1 hour, twice the normal entries in the same hour on a different day), but apart from that everything works nicely.
“That is why you should time stamp data using UTC.”
True, but it is not always so easy. Many activities depend on human behavior, not physics. Things like rush hour traffic patterns, or peaks and valleys of electric power usage. Imagine the chaos if your doctor's office made future appointments in UTC.
In my business, we sold things and kept records by the hour and day. Because of daylight saving time, we had to make the software work with 23-, 24-, and 25-hour days. If your business spans time-zone and daylight-saving borders, it gets even more challenging. Many times I wished that time keeping was as simple as UTC.
[quote]Because of a November "daylight saving time" change, time went back one hour. This complicates data analysis, since the same clock time can now have two different temperatures associated with it.[/quote]That is why you should time stamp data using UTC.
Some anecdotes regarding time (these are copied from [URL]https://en.wikipedia.org/wiki/Year_2038_problem[/URL], go there for other examples):
[LIST]
[*]The latest time that can be represented in Unix’s [URL=’https://en.wikipedia.org/wiki/Integer_%28computer_science%29′]signed 32-bit integer[/URL] time format is 03:14:07 [URL=’https://en.wikipedia.org/wiki/Coordinated_Universal_Time’]UTC[/URL] on Tuesday, 19 January 2038 (2,147,483,647 seconds after 1 January 1970).[URL=’https://en.wikipedia.org/wiki/Year_2038_problem#cite_note-spinellis-2′][2][/URL] Times beyond that will “wrap around” and be stored internally as a negative number, which these systems will interpret as having occurred on 13 December 1901 rather than 19 January 2038.
[*]The [URL=’https://en.wikipedia.org/wiki/Network_Time_Protocol’]Network Time Protocol[/URL] has a related overflow issue, which manifests itself in 2036, rather than 2038. The 64-bit timestamps used by NTP consist of a 32-bit part for seconds and a 32-bit part for fractional second, giving NTP a time scale that [URL=’https://en.wikipedia.org/wiki/Integer_overflow’]rolls over[/URL] every 2[SUP]32[/SUP] seconds (136 years) and a theoretical resolution of 2[SUP]−32[/SUP] seconds (233 picoseconds). NTP uses an epoch of 1 January 1900. The first rollover occurs in 2036, prior to the UNIX year 2038 problem.
[/LIST]
And, of course, the infamous Y2K bug, stemming from the habit of specifying the year using only two digits in early FORTRAN and COBOL programs.
Reminds me of a “Temperature vs Time” graph from a weather monitoring station.
I was puzzled in the morning to see the previous night graph, which somewhat looked like the image attached.
[IMG]http://i65.tinypic.com/11inz45.png[/IMG]
Because of a November "daylight saving time" change, time went back one hour.
This complicates data analysis, since the same clock time can now have two different temperatures associated with it.
Good article. It made fun reading. We didn’t have to worry about these issues in the days before we started networking computers.
You mention AC phase-angle measurement. Such phase sensors are now installed continent-wide. It is my understanding that they use GPS to generate time stamps at each sensor.
Your article makes me wonder about a number of similar things outside the scope of your article. For example, the time synchronization requirements and methods of things like very-long-baseline interferometry, or the various devices NASA has deployed all over the solar system.
Aren’t there any USB GPS receivers that you can use to put accurate time tags on widely separated computers?
Nice post.
Millisecond synchronization? Oh, if we had milliseconds in particle physics… OPERA was off by just 60 nanoseconds, and measurements are often done with a precision of a nanosecond.
For nanosecond synchronization, cable lengths are critical, for high resolution PET scanners even an additional centimeter of cable length will reduce the measurement quality if not accounted for properly.