At work we have a web 2.0 intranet site called C3 that allows for collaboration. It allows posting of video, and the execs have started to use it for corporate communications, which is great. Except…
The lip sync is out.
Because I am out here in Australia, everyone's first reaction to feedback about the problem is that it must be a network problem. They ask for trace routes, and check to see if there are CDN servers out here in the boonies of the internet.
It’s so frustrating when supposedly technically oriented people blame the network without understanding it. I finally got some traction (though the issue is not solved yet) when I posted a blog about the issue and included a mini tutorial on digital video, explaining why the lip sync shouldn’t be affected by the network or the end-user environment.
A brief Digital Video tutorial
To understand how to solve this, I’ll take a short detour and explain the fundamentals of digital video, not in technical depth, but enough to understand the issues.
Video file formats are container formats. They hold a number of media “tracks”, which broadly can be video (moving pictures), audio, or text (subtitles, Karaoke anyone?). Other, more obscure media types, like geo-spatial position information or Non-Player Character presence, and practically anything else you can imagine, can be added through private streams in formats like MPEG-4.
Each of these streams exists in the file independently, though often interleaved. The raw media itself is chopped up into packets that contain a fragment of the real time track - a video frame (single picture) or a few milliseconds of audio. Within each packet is a time stamp indicating when that little piece of media needs to be played out by the player.
That should be pretty easy right? Get each stream, start playing it, and wait for the right time to play out each packet. In theory yes, but as you would expect, things get a little more complicated than that.
To synchronise the media streams, you need a common time reference system with a master “clock” at a resolution fine enough to handle the many different encoding rates for both audio and video. The MPEG standards use a 90kHz clock, so all time stamps should be accurate to 1/90000th of a second.

Then we have to take into account that there are many, many different ways that video and even audio can be represented digitally. The constant goal of video and audio CODEC designers is to get as much quality into as small a size as possible, and to do that they come up with some pretty fancy techniques. One is to build in a mechanism to indicate when two video frames are identical, rather than wastefully storing two frames with the same image. Taking this a step further, storing only the differences between frames means a lot less information per frame needs to be transmitted.
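To make the 90kHz clock concrete, here is a quick sketch (with made-up tick values, not from any real stream) of converting a time stamp in clock ticks to seconds:

```shell
# Convert MPEG 90 kHz clock ticks to seconds.
# A frame stamped at 450000 ticks should play out 5 seconds in;
# at 25 frames per second, successive frames are 90000/25 ticks apart.
awk 'BEGIN {
  ticks = 450000
  printf "%.3f seconds\n", ticks / 90000
  printf "%d ticks per frame at 25 fps\n", 90000 / 25
}'
```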
MPEG uses three different types of video frame: I frames (intra-coded, a complete picture), P frames (predicted from an earlier frame), and B frames (bi-directionally predicted from both earlier and later frames).
This means that as your video arrives over the network and the player is working out when to play each frame, it needs to be able to decode the frame and present (or display) it. To properly decode a frame, the player may need information from a frame that comes after it in the video playback sequence. If it is time to present that frame but it can’t be properly decoded because the information from a future frame hasn’t arrived, there is a problem. To solve this, MPEG streams have two separate time stamps: a decoding time stamp (DTS) and a presentation time stamp (PTS). The DTS tells us when we need to decode something, and the PTS tells us when we need to display something.
Armed with a PTS and DTS, we can now re-order the frames we have in our file and send them over the network out of order, to minimise the buffer we need and to start playing video sooner for the end user. The player at the user end needs to look at each stream, decode it in the right order, and present it in the right order and at the right time. Sometimes it is impossible, through lack of resources (network, CPU, memory), to play out both streams accurately and play everything that is stored within the container. When you have two different media streams that are stored independently but need to be played out in lock step, it may mean dropping or padding small fractions in one stream to keep the playback of the two streams together. Anyone who has worked with audio visual material for any length of time will tell you that “audio is king”: users are far more tolerant of visual errors or noise than of poor audio.
Streaming and Delivery Over the Network
If all we needed to do was play a local file, then keeping sync would be pretty straightforward. Things get a little more complex when we want to play the video over the network, but only a little. In the early days of Internet (web) delivered video, “play a local file” was in fact the approach. If the video wasn’t linked to as a separate download, it was embedded in the page with a player, but you couldn’t start playing it until the entire file was available locally. (I think everyone can agree that since the user is effectively downloading a file, the network isn’t going to cause the audio sync to be out.)
Of course waiting for a large file to download is not really the experience you’re looking for, and if we wanted to broadcast live video, it wouldn’t work at all. So we would go to the other extreme: sending little bits of video and audio over the network and playing them out as they arrive at the other end. Since the Internet is a best effort store and forward network, and the usual Internet transport protocol (TCP) is not suited to sending things in real time, this is very difficult. To make it a little better, a lossy transport protocol (RTP over UDP) is used to fire packets off from the sender without caring whether they get there, because there is no point re-sending a packet after it should have been played. This makes the syncing of live streaming even harder, but it still can be done - there is absolutely no technical reason to lose sync even in real time streamed video.
Most modern web video doesn’t actually use streaming as such, but something called pseudo-streaming, where the file is sent as a file (lossless, using TCP rather than lossy RTP over UDP) but the container and stream formats are designed to allow the player at the other end to begin playing the video before the whole file is downloaded. The player will buffer a (hopefully small) amount before it begins the presentation (play out) of the video and audio.
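How much the player needs to buffer comes down to back-of-the-envelope arithmetic on the stream bitrate versus the download rate. The numbers below are purely illustrative, not measurements of C3 or any real link:

```shell
# If a 2 Mbit/s video arrives over a 1.5 Mbit/s link, the player must
# pre-buffer enough that the download finishes no later than playback
# does. For a 60 second clip:
awk 'BEGIN {
  duration = 60        # seconds of video
  bitrate  = 2.0       # Mbit/s encoded rate
  linkrate = 1.5       # Mbit/s download rate
  # seconds of pre-buffering so the tail of the file arrives in time
  prebuffer = duration * (bitrate - linkrate) / linkrate
  printf "pre-buffer %.0f seconds before starting playback\n", prebuffer
}'
```

If the link is faster than the bitrate, the formula goes to zero or negative: no pre-buffering is needed beyond a safety margin. Either way, a slow link changes only the waiting, never the sync.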
C3 seems to use some kind of simulated-streaming that is a bastardisation of RTP streaming and pseudo-streaming, generally used to stop people saving a copy of the video. C3 videos appear to be delivered via HTTP over TCP (i.e. lossless file transfer), but as very small individual fragments/files, so there is no single file. I am guessing it’s RTMP - a proprietary Flash protocol. Unfortunately the Flash player obfuscates the detail, and I can’t actually see what is going on.
Why the Network and Quality of Service shouldn’t affect lip-sync
For some reason, even technically oriented people seem to jump to the conclusion that if anything goes wrong when there is network delivery involved, it must be the network (this phenomenon is not isolated to C3 video!). I will try to explain why the network and Quality of Service (QoS) have nothing to do with losing sync in video.
There is absolutely no technical reason for the network (even a slow, high latency, lossy network) to cause video to lose sync with audio. It may cause many other issues (jerky motion, long delays and interrupted playback), but if the transcoding at the server end and the web player at the user end are “done right”, then audio should never be out of sync.
Regardless of the actual delivery of the video (file, streaming, pseudo-streaming or simulated-streaming), the basic process of delivering video is essentially the same: receive enough* information, decode each packet in DTS order, and present each frame and audio fragment at its PTS against a common clock.

* enough information is the only thing that varies as you move along the streaming / download continuum.
For real time streaming using RTP/UDP you simply drop audio or video packets whose time to play has passed. How choppy the sound and vision are depends on how much latency you can tolerate between the live action at the sending end, and the play out at the receiving end. This is one of the reasons that digital TV broadcasts are slightly delayed compared to analogue broadcasts - buffering a few seconds allows the player a much better chance of getting the packets in time to play them out.
For reliable protocols like TCP (download, pseudo-streaming, simulated-streaming including C3), it is really easy for the player to get packets and assemble them in the correct order and align the timing of the two streams, even without QoS.
All that is required for this apparent miracle of not having the network affect the audio sync is a correctly time-stamped file from the encoder, and a player that decodes and presents according to those time stamps, buffering (or dropping) as needed.
Please - stop blaming the network.
#!/bin/bash
# Toggle the rotation of the secondary display between 0 and 90 degrees
# using fb-rotate.
FBR='/Users/james/bin/fb-rotate'
# Display ID of the non-main display
SID=`$FBR -i | grep 0x | grep -v main | sed -E 's/^[0-9]* *([^ ]+).*/\1/'`
# Its current rotation in degrees
ROT=`$FBR -i | grep 0x | grep -v main | sed -E 's/.* ([0-9]+) */\1/'`
if [ "$ROT" == "90" ]; then
    ROT=0
else
    ROT=90
fi
$FBR -d $SID -r $ROT
It seems I can never remember this command
sudo fs_usage -f filesys
stat -f '%u %Su' /dev/console
This won’t let you know who else is using the GUI via screen sharing though. Still looking for an answer to that one.
If you have more than one Mac running OS X Lion and you’re signed in to the same iCloud account on all of them, you can SSH between them via iCloud’s IPv6 network.
First, find your Back To My Mac account number by running

echo show Setup:/Network/BackToMyMac | scutil
Then SSH to another machine like so
ssh -2 -6 username@computer-name.[account number].members.btmm.icloud.com
That’s hard to remember and a hassle to type, so you might want to add something like the following to your ~/.ssh/config:
Host mac-remote
    User username
    HostName computername.123456789.members.btmm.icloud.com
    AddressFamily inet6
    Protocol 2
Which means you can just type

ssh mac-remote
to log in to your other Mac when you’re out and about.
Reader TJ wrote in to say:
If you don’t want to hard code your Back To My Mac address into your .ssh/config you can get it dynamically using this line:
echo show Setup:/Network/BackToMyMac | scutil | sed -n 's/.* : *\(.*\).$/\1/p'
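To see what that sed expression extracts, here it is run against a sample line of the kind scutil prints. The domain below is a made-up example; real scutil output only exists on a Mac with Back To My Mac enabled:

```shell
# scutil prints the BTMM domain with a trailing dot, e.g.
#   1 : 123456789.members.btmm.icloud.com.
# The sed strips everything up to " : " and drops the trailing dot.
echo '  1 : 123456789.members.btmm.icloud.com.' |
sed -n 's/.* : *\(.*\).$/\1/p'
```

You could then use it inline, e.g. ssh -2 -6 username@computername.$(echo show Setup:/Network/BackToMyMac | scutil | sed -n 's/.* : *\(.*\).$/\1/p') to avoid hard coding the domain anywhere.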