DV Info Net

DV Info Net (https://www.dvinfo.net/forum/)
-   General HD (720 / 1080) Acquisition (https://www.dvinfo.net/forum/general-hd-720-1080-acquisition/)
-   -   How is SMPTE timecode recorded in digital video? (https://www.dvinfo.net/forum/general-hd-720-1080-acquisition/55965-how-smpte-timecode-recorded-digital-video.html)

Roy Bemelmans December 13th, 2005 05:08 AM

How is SMPTE timecode recorded in digital video?
 
How do current video cameras (HDCAM, HDV, DVCPRO-HD) record TC on tape? In a digital format along with the data stream or as VITC in the video?

Do NLEs read this and sync multiple video streams automatically (when all cameras used corresponding TC on the shoot)?

How about 24p recording using pulldown? Will only actual frames get a timecode address using 23.976 fps TC?

David Kennett December 13th, 2005 10:57 AM

Don't know all the specifics, Roy, but it's all just a data stream. There isn't even a vertical interval as such to put VITC in. As far as timing goes, with NTSC, if you use non-drop-frame code, you are assuming 30 frames per second, so over an hour your calculation will be a few seconds off. But everything works OK for internal calculations. The tough part is mixing TC - just hope your NLE maker does his math right.
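The "few seconds off" David describes can be checked with quick arithmetic; here is a minimal Python sketch, assuming only the exact NTSC rate of 30000/1001 fps:

```python
# Non-drop-frame timecode counts frames as if the rate were exactly 30 fps,
# but NTSC video actually runs at 30000/1001 (~29.97) fps.
NOMINAL_FPS = 30
ACTUAL_FPS = 30000 / 1001

seconds = 3600  # one hour of real time
frames_recorded = ACTUAL_FPS * seconds       # frames that actually elapsed
ndf_seconds = frames_recorded / NOMINAL_FPS  # what the NDF counter calls "seconds"
drift = seconds - ndf_seconds                # real time minus timecode time

print(f"NDF timecode lags real time by {drift:.2f} s per hour")  # ~3.60 s
```

The exact figure is 3600/1001 seconds, about 3.6 seconds per hour, which is what drop-frame timecode exists to correct.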

As far as multiple cameras go, it works pretty much like the old 16mm crystal-controlled cameras and the Nagra audio recorder. The error between the individual crystal references in each camera is pretty small, but you could build up errors over a long period of time. When you bring each source into your NLE, they will be synchronized exactly, even though they were not recorded at EXACTLY the same speed.

I'm having some discussions with Ulead now. I don't think they are doing all their math right - causing audio sync errors.

David Newman December 14th, 2005 02:47 PM

In most HDV cameras timecode is stored within the MPEG-2 bitstream. The AV/C controls can also extract timecode, but they are not as accurate as the embedded data.

Roy Bemelmans December 15th, 2005 06:16 AM

Quote:

Originally Posted by David Newman
In most HDV cameras timecode is stored within the MPEG-2 bitstream. The AV/C controls can also extract timecode, but they are not as accurate as the embedded data.

So do NLEs read this TC and is it possible to automatically line up multiple video tracks in your NLE (when all cameras on a multi-camera shoot were synced to the same master, having corresponding location TC)?

How about 24p recording using pulldown? Will only actual frames get a timecode address using 23.976 fps TC?

David Newman December 15th, 2005 10:30 AM

NLEs do read this TC (through the appropriate tools - like Aspect HD.) Yes, it can help to sync multiple sources, although your cameras would need jam-synced timecode (which I think only the Canon currently has.) I'm not very knowledgeable on the editing workflow for jam-synced cameras.

TC is in 23.976.

Roy Bemelmans December 15th, 2005 01:12 PM

Interesting. So I assume NLEs create a timeline from the TC on the video... What if you recorded time-of-day TC to multiple cameras on the set? TOD TC is discontinuous and would be all over the place. So do you line up dailies on a take-by-take basis and then copy to a new project, or can you have a 'fractured' timeline in NLEs? (not an editor myself...)

David Newman December 15th, 2005 02:41 PM

Outside of my knowledge.

A. J. deLange December 17th, 2005 01:02 PM

The NLEs produce a continuous time code usually starting at 1:00:00;00 (unless you want it to start somewhere else). They do not use the clips' time codes to produce the sequence time codes but do display each clip's time code so that you can align things using time code. In a multicamera setup with separate sound a master clock (blackburst generator) is used to synchronize the cameras (XL-H1 is the only prosumer camera that accepts external sync but any camera with a composite out can be a synch master) and a master time code generator is used to supply the same time code to each (XLH1 is the only one that produces/accepts external time code though the XL2 does produce it) camera and the audio equipment. In this way video from different cameras and audio clips from different recorders can all be aligned to frame resolution and, if the master clock equipment produces audio sample clock as well, sound is locked to video to sample resolution.

Roy Bemelmans December 18th, 2005 06:30 AM

Quote:

Originally Posted by A. J. deLange
The NLEs produce a continuous time code usually starting at 1:00:00;00 (unless you want it to start somewhere else). They do not use the clips' time codes to produce the sequence time codes but do display each clip's time code so that you can align things using time code. In a multicamera setup with separate sound a master clock (blackburst generator) is used to synchronize the cameras (XL-H1 is the only prosumer camera that accepts external sync but any camera with a composite out can be a synch master) and a master time code generator is used to supply the same time code to each (XLH1 is the only one that produces/accepts external time code though the XL2 does produce it) camera and the audio equipment. In this way video from different cameras and audio clips from different recorders can all be aligned to frame resolution and, if the master clock equipment produces audio sample clock as well, sound is locked to video to sample resolution.

Thanks for clearing that up, A. J. deLange! I have some more questions I'd love to have an answer to..

- so the master generator generates LTC + genlock for the cameras and LTC + wordclock for the audio recorder, right?

- when syncing cameras to external TC, they always need genlock as well to keep the TC running correctly, don't they?

- what's the difference between genlock and tri-level sync?

- do cameras ever jamsync themselves or do you always need a device like an Ambient Lockit box when jamsyncing?

Quote:

Originally Posted by A. J. deLange
They do not use the clips' time codes to produce the sequence time codes but do display each clip's time code so that you can align things using time code.

- Are NLEs able to do this automatically or do you always have to do it manually?

- Do NLEs support multiple timelines? E.g. location audio is often recorded at 29.97 fps while cameras run at 23.976 fps (24p shoot). When using a hard disk audio recorder recording to timestamped BWF files, these files then correspond to a 29.97 timeline while the video will be 23.976. Can you drop these files into a 29.97 timeline, while also having a corresponding 23.976 timeline for the video? (of course 29.97 and 23.976 will be the same to the second)

- With 24p/23.976p shoots/editing and audio recorders being able to handle 23.976 TC, will 23.976 audio TC become more prevalent (instead of 29.97)?

Thanks

A. J. deLange December 18th, 2005 08:26 AM

- so the master generator generates LTC + genlock for the cameras and LTC + wordclock for the audio recorder, right?
Yes though some audio recording equipment will resolve to just the LTC.

- when syncing cameras to external TC, they always need genlock as well to keep the TC running correctly, don't they?
If by "correctly" you mean to greatest accuracy, yes but some devices will simply jam the internal time code register to whatever is coming in over the LTC connector regardless of whether video sync is available.

- what's the difference between genlock and tri-level sync?
The second is a flavor of the first. In SD the horizontal sync pulse is unipolar, i.e. it drops to -300 mV, stays there for 4.7 usec (NTSC) and then returns to 0 V. Tri-level sync is used in HD and, as the name suggests, involves 3 voltages. For 1080i/30, sync starts 44 samples (at a 74.25 MHz sample clock rate) before 0H, when the signal drops from 0 V to -300 mV. At 0H it rises to +300 mV, and 44 samples after 0H it returns to 0 V. IOW the genlock signal in an SD system has negative-going sync pulses only, while the genlock signal in an HD system has tri-level sync with negative- and positive-going pulses.
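The timing in that description can be sketched in a few lines. This toy Python snippet (assuming only the 74.25 MHz clock and the 44-sample half-pulse quoted above; `trilevel_mv` is a made-up helper name) maps sample offsets around 0H to the three sync levels:

```python
# Tri-level sync levels for 1080i at a 74.25 MHz sample clock, per the
# description above: -300 mV for the 44 samples before 0H, +300 mV for
# the 44 samples after 0H, and 0 V elsewhere.
SAMPLE_CLOCK_HZ = 74.25e6
HALF_PULSE_SAMPLES = 44

def trilevel_mv(sample_offset_from_0h: int) -> int:
    """Return the sync voltage in millivolts at a sample offset from 0H."""
    if -HALF_PULSE_SAMPLES <= sample_offset_from_0h < 0:
        return -300
    if 0 <= sample_offset_from_0h < HALF_PULSE_SAMPLES:
        return +300
    return 0

half_pulse_ns = HALF_PULSE_SAMPLES / SAMPLE_CLOCK_HZ * 1e9
print(f"each half of the pulse lasts {half_pulse_ns:.1f} ns")  # ~592.6 ns
```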

- do cameras ever jamsync themselves or do you always need a device like an Ambient Lockit box when jamsyncing?
In the sense that you can set a timecode into a camera manually I suppose you could say they can be jamsynced without an external signal but in general an external timecode from another source (Lockit, another camera, master timecode generator...) is required.


- Are NLEs able to do this [align different sources] automatically or do you always have to do it manually?
I haven't stumbled across a way to do it automatically as yet but the programs are so powerful (so many features) that I wouldn't be too surprised to find out that it was possible.

- Do NLE's support multiple timelines? E.g. location audio is often recorded at 29.97 fps while cameras run at 23.976 fps (24p shoot).
I don't think so, though I have no experience with this. I believe it is necessary to tell the NLE that the material being captured is 24 fps so that it can make adjustments to (resample) the audio, making what you want to do possible.

- With 24p/23.976p shoots/editing and audio recorders being able to handle 23.976 TC, will 23.976 audio TC become more prevalent (instead of 29.97)?
Really don't know. I'd guess that support of both is more likely to happen than discarding 29.97.

Douglas Spotted Eagle December 18th, 2005 08:56 AM

Quote:

Originally Posted by A. J. deLange
-- Do NLE's support multiple timelines? E.g. location audio is often recorded at 29.97 fps while cameras run at 23.976 fps (24p shoot).
I don't think so, though I have no experience with this. I believe it is necessary to tell the NLE that the material being captured is 24 fps so that it can make adjustments to (resample) the audio, making what you want to do possible.

This is totally NLE dependent, obviously.
Sony Vegas manages this just fine. I know FCP and Avid don't, but I haven't tested/worked with this anywhere else. On multiple occasions we've recorded multitrack audio at 29.97 with 23.976 or 25.00 fps on the cameras, and they all work correctly.

Steve House December 18th, 2005 10:55 AM

Quote:

Originally Posted by Roy Bemelmans
...
Do NLEs support multiple timelines? E.g. location audio is often recorded at 29.97 fps while cameras run at 23.976 fps (24p shoot). When using a hard disk audio recorder recording to timestamped BWF files, these files then correspond to a 29.97 timeline while the video will be 23.976. Can you drop these files into a 29.97 timeline, while also having a corresponding 23.976 timeline for the video? (of course 29.97 and 23.976 will be the same to the second)
...

No offense, but I think you're reading far more into the technology of timecode than is really there, giving it far more importance than it merits and thinking it does far more than it really does. Unless you have to actually synchronize and blend data streams from several devices at once, such as where you cut from one camera to another in a live shoot, or when you mix several digital audio streams (S/PDIF, ADAT, AES, etc.) from different sources together during a mastering session, all timecode does is give you a reference marker for time.

Digital audio and digital video both use samples as their basic unit of data, and none of the timecode choices have anything to do with how many of those samples are created per second, nor are the samples of audio and those of video coordinated with each other. With audio, the various "frame rates" are really only indications of how finely you want to break up a second of time and what units you prefer to use. AFAIK they have nothing to do with the way the audio is recorded.

It's like having a wristwatch with one-minute gradations and no second hand, a wall clock with 1-second markings, a stopwatch with 1/10-second markings, and a chronometer with 1/100-second markings: they'll all keep track of a minute of time at exactly the same rate, but you can measure more precisely with the chronometer than with the wristwatch. One minute of material is one minute of material regardless of which watch you use. One minute of material will read out as 1800 frames if you're using 30 fps, 1798 frames at 29.97, 1440 at 24 fps, or 1438 at 23.976 fps - but it is 1 minute of material in every case. Now imagine measuring with a chronometer graduated in 1/2-second markings and another graduated in 1/3-second markings - that's what the various timecodes are like, but one minute is still one minute.
Or viewed another way, your choice of timecode is like the choice between English and metric units of weight - a chicken weighs exactly the same regardless of whether your scale is calibrated in pounds or in kilograms, the only difference is the numbers you write down.
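Steve's per-minute frame counts can be reproduced directly; a quick Python check using the exact NTSC-family rates:

```python
# One minute of material expressed as a frame count at each rate Steve lists.
# The 29.97 and 23.976 rates are exactly 30000/1001 and 24000/1001.
rates = {
    "30":     30.0,
    "29.97":  30000 / 1001,
    "24":     24.0,
    "23.976": 24000 / 1001,
}
for name, fps in rates.items():
    print(f"{name:>7} fps -> {int(fps * 60)} frames per minute")
```

The printed counts (1800, 1798, 1440, 1438) match the figures in the post: different gradations on the same minute of material.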

The NLE's timeline represents *program* time, not *source* time, and it would be a rare situation indeed where the two matched up. A clip originally recorded with timecode of 01:45:30;15 at its first frame may end up being used at exactly 5 minutes, 30 seconds into the edited program, so it would be placed starting at 00:05:30;00 in the NLE's timeline.

Think about it this way. You have 5 hours of material to edit down to a 1-hour program. Odds are the *source* timecode for the last few shots will be somewhere up around 05:00:00;00, and yet they're going into the NLE timeline at perhaps somewhere around 00:59:00;00. Or another case: you may have a 2-shot recorded in the morning with timecode running from 01:23:45;00 to 01:45:00;30, and you're going to intercut it with a pair of closeups shot in the afternoon with timecodes from 03:22:33;00 to 03:40:44;12 and 04:33:44;00 to 04:45:45;12 respectively. If you saw the timecodes that were recorded with the source materials as you viewed the final cut frame by frame, they would be jumping all over the place, breaking at every edit point. But the *program* timecode from the NLE timeline will be a nice and tidy linear progression through the entire sequence from start to end.

So why record timecode on double-system video/audio at all? Convenience! It's just a counter to aid in keeping track that *this* bit of audio was recorded as *that* frame of video was recorded, helping align them in post. While multiple cameras being intercut live need to be synced through genlock, a video camera and an audio recorder don't have to be locked to the same speed as long as you can track how the two recordings should be matched up later. It's simpler if they're at the same rate, but remember: timecode choice doesn't affect sample rate. A frame in video makes up a single still image, but there's no equivalent in digital audio AFAIK, and the reference to "frames" in the 29.97 fps audio timecode you mention above is just a convenience so you can use the same numbers to measure the passage of time in the audio as you use in the video. In the audio it means you can measure a sound's position in time to 1/29.97 second of accuracy. (If you wanna really have your head spinning, the timecode standard often used for mastering audio CDs references a 75 fps frame rate!) The camera and recorder numbers need to match up for convenience, but that's all. If the camera is at 23.976 fps and you can set the audio recorder to the same, do so, so that the timecode numbers on both match at the same point in the shot. If you can't set the two to the same, figure out a conversion when you're editing, such as: "At exactly 2 minutes, 30 and one half seconds into the shot, the camera's timecode reads 00:02:30;12 while the timecode in the audio reads 00:02:30;15."
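Steve's closing conversion can be sketched as a tiny helper; this assumes (as non-drop counters do) that the frames field simply labels frames within each nominal second, and `frames_field` is a made-up name for illustration:

```python
# The same instant - half a second past 00:02:30 - expressed in the
# frames field of two timecode counters with different nominal rates.
def frames_field(fraction_of_second: float, nominal_fps: int) -> int:
    """Frames-field value a timecode counter shows within the current second."""
    return int(fraction_of_second * nominal_fps)

print(frames_field(0.5, 24))  # camera TC at 23.976 counts 24 labels/second -> 12
print(frames_field(0.5, 30))  # audio TC at 29.97 counts 30 labels/second -> 15
```

Those are exactly the ;12 and ;15 readouts in Steve's example: the same half-second, measured with two different rulers.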

BTW, AFAIK the BWF file only records the first frame's timecode in the header. When you import it into an NLE, it drops it in at the point where the BWF header timecode matches the NLE timeline timecode. The count of the code in the audio after that first frame doesn't matter - you could be counting in fortnights instead of frames and it wouldn't change anything.
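For reference, the header field Steve describes is `TimeReference` in the BWF `bext` chunk (per EBU Tech 3285): a sample count since midnight. A hedged sketch of turning it into a start timecode - `bwf_start_timecode` is a made-up helper name, and real files would need the chunk parsed first:

```python
# Convert a BWF bext TimeReference (samples since midnight) into a
# human-readable non-drop start timecode at a given frame rate.
def bwf_start_timecode(time_reference: int, sample_rate: int, fps: float) -> str:
    total_seconds, rem_samples = divmod(time_reference, sample_rate)
    hh, rem = divmod(total_seconds, 3600)
    mm, ss = divmod(rem, 60)
    ff = int(rem_samples / sample_rate * fps)  # leftover samples -> frames field
    return f"{hh:02d}:{mm:02d}:{ss:02d}:{ff:02d}"

# e.g. a 48 kHz file stamped exactly one hour after midnight:
print(bwf_start_timecode(48000 * 3600, 48000, 30))  # -> 01:00:00:00
```

Everything after that single stamp is just the sample count of the audio itself, which is Steve's point: only the start position carries timecode meaning.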

Roy Bemelmans December 18th, 2005 01:00 PM

Thanks for the elaborate reply, Steve. Much appreciated. You see, I've been studying TC on the set as a means of syncing picture and audio in post, figuring out how it's all done. Of course everything comes together during editing, so my last questions are about the final stages of all this; I want to get a clear picture of it, which is why I'm somewhat going on about it :-), sorry...

To summarize:

I understand your point about the difference between program TC and source TC. Of course your timeline will show the continuous program TC, while the source TCs of different video tracks will be visible in other windows. You use the source TC in these windows to line up the video tracks. Then you import your BWF files. You check their source TC start position and also manually line them up with the source TC in the video. And you're done. I guess that's as close to a generic overview as I can get. The specifics will change from NLE to NLE. Maybe some have options to automatically move a BWF file to its TC start position using the location TC of a video track...

Quote:

Originally Posted by A. J. deLange
- what's the difference between genlock and tri-level sync?
The second is a flavor of the first. In SD the horizontal sync pulse is unipolar, i.e. it drops to -300 mV, stays there for 4.7 usec (NTSC) and then returns to 0 V. Tri-level sync is used in HD and, as the name suggests, involves 3 voltages. For 1080i/30, sync starts 44 samples (at a 74.25 MHz sample clock rate) before 0H, when the signal drops from 0 V to -300 mV. At 0H it rises to +300 mV, and 44 samples after 0H it returns to 0 V. IOW the genlock signal in an SD system has negative-going sync pulses only, while the genlock signal in an HD system has tri-level sync with negative- and positive-going pulses.

Wow! :-) Let me know if I correctly put this in English: both are video sync signals. Generally HD cameras want to see tri-level sync, SD cameras genlock (but not necessarily -> e.g. the Canon XL H1 uses normal genlock for sync if I'm not mistaken).


Quote:

Originally Posted by Douglas Spotted Eagle
This is totally NLE dependent, obviously.
Sony Vegas manages this just fine. I know FCP and Avid don't, but haven't tested/worked with this anywhere else. On multiple occasions, we've recorded multitrack audio at 29.97 with 23.978 or 25.00 fps on the cameras, and they all work correctly.

Did I correctly summarize the workflow for this above, Douglas?

David Kennett December 18th, 2005 04:21 PM

I agree with Steve. Don't worry about the technology. The speed of today's computers, camcorders, and digital audio recorders is controlled by a crystal. It's pretty common for crystal oscillators to maintain an accuracy of about 5 parts per million (PPM) or better. So even if you don't lock anything together, things should stay within a frame of each other over an hour's time. If you start all devices close to the same time, TC will be close. Use a clapboard, or just clap your hands once in view of all cameras. Once you calculate the offset, you're set (at least for a while).
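David's back-of-the-envelope figure is easy to sanity-check; a minimal Python sketch, assuming the two free-running crystals differ by the full 5 ppm:

```python
# How far apart do two free-running devices drift in an hour if their
# crystals differ by 5 ppm, compared to one NTSC frame duration?
relative_error = 5e-6          # 5 parts per million between the two clocks
frame_duration = 1001 / 30000  # ~33.4 ms per 29.97 fps frame

drift = 3600 * relative_error  # seconds of drift after one hour
print(f"drift after one hour: {drift * 1000:.0f} ms")  # 18 ms
print(f"within one frame? {drift < frame_duration}")   # True
```

At 18 ms per hour the two recordings stay comfortably inside a single frame, which is why the clap-and-offset method works for runs of this length.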

I recorded a piano concert a few years ago, and in an emergency used my son's MiniDisc recorder. I had to do an analog capture to computer, but used THE SAME MD recorder to play back (same crystal oscillator), and had undetectable error over an hour and a half. Two camcorders were also run independently.

Locked devices will sync perfectly, but today's stuff will still stay very close to speed.

I saw somebody using the term LTC earlier. In the old days, that meant the time code that was recorded onto a LINEAR audio track. VITC was put into the VERTICAL INTERVAL of analog video. As far as I know, neither of these terms is really meaningful in today's digital world. It's just "time code".

A. J. deLange December 18th, 2005 04:26 PM

Steve,

I'm with you on everything you say except the claim that there is no relationship between samples and time code, or between audio and video samples. One of the things that got talked about a lot in the past was that most prosumer cameras did not tie audio and video sampling together, which potentially led to problems. In the XL H1 that has been fixed: audio sampling is locked to video (in SD mode you have a choice of locked or unlocked, though I have no idea what the implications of unlocked might be - it shouldn't be worse than with the XL2, and I never really found it a problem there).

In HD 1080/60i the sample clock is 74.25 MHz, which is 1,237,500 times the field rate. The audio sampling rate, 48 kHz, is 800 times the field rate. Each line has a specified number of samples. Similar numbers are assigned to the other standards, with those derived from NTSC dividing all rates by 1.001 (e.g. 30/1.001 = 29.97). Thus one could compute the number of the audio sample that corresponds to a particular video sample, and conversely.
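A.J.'s ratios make that correspondence exact; a short sketch using exact fractions (`audio_sample_for` is a made-up helper name):

```python
from fractions import Fraction

# For 1080/60i, both the video sample clock and the audio sample rate are
# integer multiples of the field rate, so a video sample index maps exactly
# onto an audio sample index.  (59.94 Hz systems divide the rates by 1.001.)
FIELD_RATE = Fraction(60)
VIDEO_CLOCK = FIELD_RATE * 1237500  # 74.25 MHz
AUDIO_RATE = FIELD_RATE * 800       # 48 kHz

def audio_sample_for(video_sample: int) -> Fraction:
    """Audio sample index coinciding with a given video sample index."""
    return video_sample * AUDIO_RATE / VIDEO_CLOCK

print(VIDEO_CLOCK)                # 74250000
print(audio_sample_for(1237500))  # one field's worth of video -> audio sample 800
```

Because the ratios are exact rational numbers, the mapping never accumulates rounding error, which is the point of locking audio sampling to video.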

Roy,

A genlock signal is a video signal with no picture information (it is sometimes referred to as a "black burst" signal because the picture is all black, or blacker than black). A video signal contains synchronizing pulses, and they are of two types: unipolar (bi-level, i.e. two levels: -300 mV and 0 mV) and bipolar (tri-level, i.e. -300, 0 and +300 mV). Bi-level sync is found in SD video and SD genlock signals. Tri-level sync is found in HD video and HD genlock signals. Word clock is not contained in a genlock signal but is rather generated from the genlock signal in cases where sample-accurate audio is desired.



DV Info Net -- Real Names, Real People, Real Info!
1998-2024 The Digital Video Information Network