CSIS Seminar Presentation outline

Audio Networking: Technical challenges and possibilities for distributed digital sound production at Otago
Chris Edwards
Department of Information Science, University of Otago
CSIS Seminar series, July 2010

Motivation and Background
- Music Department's new $1M SSL mixing console
- New Zealand Music Industry Centre (NZMiC)
- KAREN high-capacity network connectivity
- Interesting technical and creative possibilities:
  - Remote (live) mixing
  - Remote recording (live or layered multi-track)
  - Distributed real-time performance (live and recorded)
  - Internet broadcast/multicast/streaming
  - Asynchronous production tasks, e.g. (re)mixing, mastering, film score composition, with very short turnaround

The SSL Console
- Solid State Logic model C200 HD
- Digital control surface with an array of common per-channel controls
- Signal level metering
- Transport control, timecode display
- Full automation (programmable fader (etc.) motion)
- DAW control (mouse, keyboard, display)
- 64 dedicated control strips, pageable for even more
- (and it can play Pong!)

Not just a pretty face: Behind the console
- Dual C-SB Stageboxes
  - 48 high-quality mic inputs each
  - Gain and pre-amp behaviour remotely controllable
  - ~2 km reach over single-mode optical fibre
  - Portable; plans for Marama Hall, Stadium, Town Hall
- Centuri core
  - Signal routing and control surface I/O
  - Storage
  - DSP modules
- Outboard hardware effects processors
- DAW (a Power Macintosh with MADI card)
- KAREN connectivity

Albany Street Installation
[Figure: block diagram showing the SSL console, two C-SB stage units (48 mic input channels per stage unit; optical fibre, 2 km reach), SSL Centuri core, SSL DSP units, O/B effects units, MADI (Multi-Channel Audio Digital Interface), DAW (Mac) and network]

KAREN
- Kiwi Advanced Research and Education Network
- Operated by REANNZ (Research and Education Advanced Network New Zealand) Ltd, a Crown-owned company
- 10 Gb/s generally available between participating institutions
- 16 national points-of-presence (PoPs)
- International links to Australia and North America and, via these, to Asia and Europe

KAREN NZ Network Map
[Figure: KAREN network map. Source: http://www.karen.net.nz/topology/]
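For a sense of scale, raw audio data rates can be checked against KAREN's 10 Gb/s with a little arithmetic. A minimal sketch: it counts PCM sample bits only (no framing, packet headers or protocol overhead), and the channel counts and session length are illustrative figures rather than measurements.

```python
def pcm_bit_rate(channels: int, bits_per_sample: int, sample_rate_hz: float) -> float:
    """Raw PCM bit rate in bits/s: sample bits only, no framing or headers."""
    return channels * bits_per_sample * sample_rate_hz


def transfer_time_s(size_bytes: float, link_bps: float) -> float:
    """Idealised transfer time, ignoring protocol overhead, congestion and disk I/O."""
    return size_bytes * 8 / link_bps


KAREN_BPS = 10e9  # 10 Gb/s

# Live streaming: 96 channels of 32-bit audio at 192 kHz
live_bps = pcm_bit_rate(96, 32, 192_000)
print(f"live: {live_bps / 1e6:.1f} Mb/s, {100 * live_bps / KAREN_BPS:.1f}% of 10 Gb/s")

# Whole session: 4 minutes x 24 tracks of 24-bit audio at 96 kHz
session_bytes = pcm_bit_rate(24, 24, 96_000) * 4 * 60 / 8
print(f"session: {session_bytes / 1e9:.2f} GB, "
      f"{transfer_time_s(session_bytes, KAREN_BPS):.2f} s at 10 Gb/s")
```

At the full nominal rate the network transfer takes only a second or two, so endpoint disk I/O, not the link, is the likelier bottleneck.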

Digital Audio Basics
- Discretised, quantised representation of a continuous analog signal
- Signal is represented as a stream of numbers
- Driven by a hardware clock (typically a crystal oscillator)
  - One sample recorded/played every clock cycle
  - Fs is the sampling frequency
- Typically transmitted digitally using PCM (pulse-code modulation)
[Figure: sampled waveform, amplitude vs. time]
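Quantisation can be illustrated by rounding a sine wave to signed integer codes and measuring the error. A minimal sketch, assuming plain rounding without dither; for a full-scale sine the measured result should land close to the familiar rule of thumb SQNR ≈ 6.02·N + 1.76 dB for N bits. The 997 Hz test tone at 48 kHz is an arbitrary choice.

```python
import math


def quantise(x: float, bits: int) -> int:
    """Round x (nominally in -1.0..+1.0) to a signed integer PCM code, clipping at full scale."""
    full_scale = 2 ** (bits - 1) - 1  # e.g. 32767 for 16-bit, 8388607 for 24-bit
    return max(-full_scale - 1, min(full_scale, round(x * full_scale)))


def measured_sqnr_db(bits: int, f_hz: float = 997.0, fs_hz: float = 48_000.0,
                     n: int = 48_000) -> float:
    """Signal-to-quantisation-noise ratio of a full-scale sine, measured directly."""
    full_scale = 2 ** (bits - 1) - 1
    signal = noise = 0.0
    for i in range(n):
        x = math.sin(2 * math.pi * f_hz * i / fs_hz)
        error = quantise(x, bits) / full_scale - x  # reconstructed minus ideal
        signal += x * x
        noise += error * error
    return 10 * math.log10(signal / noise)


for bits in (16, 24):
    print(f"{bits}-bit: measured {measured_sqnr_db(bits):.1f} dB, "
          f"rule of thumb {6.02 * bits + 1.76:.1f} dB")
```

For 24 bits this comes out at roughly 146 dB, well beyond what analog circuitry can deliver, which is why the practical dynamic range is set by the analog noise floor.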

Digital Audio Basics (2)
- Typical audio sampling frequencies are tens to hundreds of kHz
  - Human hearing tops out at around 16 to 20 kHz
  - The Nyquist limit (highest reproducible frequency) is Fs/2
  - Fs ∈ {44.1 kHz (CD-DA), 48 kHz, 96 kHz, 192 kHz}
- Higher Fs means more resampling options and better resampling quality
- Higher Fs also makes life easier for the anti-aliasing and reconstruction low-pass filters
  - Oversampling is also commonly used
- 24-bit signed integer precision is common in studio work
  - ~144 dB theoretical dynamic range (about 6 dB per bit); limited in practice by the analog noise floor, typically to roughly 120 dB

Different ways of being wrong:
Quantisation error, jitter et al.
- Quantisation means approximation, and hence error, in:
  - Amplitude (bits per sample)
  - Time (sampling frequency)
- More bits and/or a faster clock give a better approximation
- Often a worthwhile trade-off (no generation loss; enables DSP and digital transmission)
- Hardware clocks are not perfectly stable; non-uniformity is classified as follows:
  - Jitter (short-term, cycle-to-cycle variations)
  - Wander (medium-term)
  - Drift (long-term)
- Measurement: eye diagrams, Allan variance
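Allan deviation is the usual way to separate short-term noise from longer-term wander. A minimal sketch of the non-overlapping estimator, assuming the clock has already been measured as a series of fractional-frequency deviations y[i] (that is, (f − f_nominal)/f_nominal per interval τ0); the two synthetic clocks below are made up for illustration.

```python
import math
import random


def allan_deviation(y: list[float], m: int) -> float:
    """Non-overlapping Allan deviation of fractional-frequency samples y
    (one per interval tau0), at averaging time tau = m * tau0."""
    # Average consecutive, non-overlapping blocks of m samples
    blocks = [sum(y[i:i + m]) / m for i in range(0, len(y) - m + 1, m)]
    if len(blocks) < 2:
        raise ValueError("need at least two averaging blocks")
    # Allan variance: half the mean squared difference of adjacent block averages
    diffs = [(b - a) ** 2 for a, b in zip(blocks, blocks[1:])]
    return math.sqrt(sum(diffs) / (2 * len(diffs)))


rng = random.Random(1)
noisy = [rng.gauss(0.0, 1e-9) for _ in range(10_000)]  # white frequency noise, no offset
offset = [50e-6] * 10_000                              # steady 50 ppm error, no noise

for m in (1, 10, 100):
    print(f"tau = {m:>3} x tau0: noisy {allan_deviation(noisy, m):.1e}, "
          f"offset {allan_deviation(offset, m):.1e}")
```

The noisy clock's deviation falls roughly as 1/√m, while the constant-offset clock scores exactly zero: perfectly stable, yet 50 ppm wrong.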

A stable clock is not necessarily an accurate clock!

Transmitting Digital Audio (within the studio)
- Example protocols:
  - AES3 (AES/EBU)
  - MADI
  - ADAT Optical Interface
  - S/PDIF
- Local digital audio transmission is generally synchronous
  - Clock drift must be avoided, to avoid buffer over-/under-runs

A word about word clock
- Synchronous operation means having a common reference clock
- Word clock is a dedicated digital signal operating at Fs
  - Word clock ≠ timecode: it's a frequency reference
  - Master clock signal fanned out to slave devices via dedicated co-axial cable
- Clock can also be sent as part of the audio data
  - In-band bit clock (self-clocking signal)
  - Used in AES3, ADAT, S/PDIF
  - Typically uses bi-phase mark coding or similar
- Both approaches assume complete control of the physical medium
- Digital audio is surprisingly demanding on clock quality

Transmitting Digital Audio (beyond the studio)
- Pro networked-audio protocols generally operate at OSI Layer 2 (Data Link layer), i.e. not routable: local area only
- Ethernet is popular, unsurprisingly
- Examples:
  - AVB (IEEE 1722)
  - EtherSound
  - CobraNet
  - AES51 (AES3 over ATM over Ethernet)
  - AES47 (IEC 62365, AES3 over ATM)
- Some OSI Layer 3/4 protocols do exist:
  - NetJACK (Open Source)
  - Livewire
[Figure: OSI layers 1 to 4 with example technologies: 4. Transport: TCP, UDP; 3. Network: IP; 2. Data Link: Ethernet; 1. Physical: UTP]

(Some) challenges for distributed digital audio production
1. Audio hardware clock synchronisation
2. Audio data delivery (network service quality)
   - Network capacity (bandwidth)
   - Latency (packet delivery time, i.e. delay)
   - Trade-offs between these
   - Quality of Service (QoS) assurance (per-packet priority)
   - Network out[r]ages
3. Timecode and transport control
4. Interoperability in general
   - Protocols
   - Framing
   - Data representation and encoding

Challenge 1: Hardware Clock Drift
- Unsynchronised audio hardware clocks will drift
- Drifting too far will lead to buffer over-/under-runs
  - Unacceptable audio glitches (drop-outs, pops/clicks)
- Word clock operates at the physical level
  - Running co-ax to Auckland, London or Seattle is not feasible!

Hardware clock synchronisation: some possible solutions
- Discipline the audio clock using a common external source:
  - Internet Network Time Protocol (NTP) (Mills, 1980s)
  - Timecode embedded in application-level packets
  - GPS PPS timekeeping signal
- Dynamically resample audio at each node
- Solution should be low-jitter:
  - More than a few hundred picoseconds may be unacceptable
  - Jitter may manifest as white noise or more complex distortions
  - At best, jitter undermines SNR

Reasons for leaning towards GPS
- Resampling degrades quality; avoid it if possible
- Pro audio hardware generally has a word-clock input anyway
- A hardware solution would be convenient
- NTP doesn't claim especially high accuracy
  - Approx. 10 ms for general use on the Internet
  - Personal computer hardware clocks are not especially accurate or stable
  - NTP is primarily concerned with absolute timekeeping; we care more about consistent frequency
  - NTP assumes symmetric network paths (not a problem for a frequency reference only?)
  - Might NTP's clock slewing behaviour be disruptive if applied to audio AD/DA converters?
  - NTP experts recommend using GPS anyway! (Shalunov, 2005)
- GPS is globally available and uses dedicated radio signalling
- The GPS satellite network is kept closely in sync; ideal for a single-master-clock approach
- Pros and cons to be investigated further!

GPS: The Global Positioning System
"Wherever you go, there you are." (anon.)

The Global Positioning System (GPS)
- Basically a distributed high-precision time-keeping and message broadcasting system
- 24 satellites (plus spares!) in medium Earth orbit (~20,000 km altitude)
  - 6 orbital planes with 4 satellites each
  - At least 4 must be visible to the receiver to get a precise position
- True position of each satellite is known/predictable (the ephemeris)
- Satellites broadcast time-stamped messages
  - Delay in receiving a time-stamped message determines distance from the satellite
  - Intersection of distances pinpoints location in space
- GPS is also used to help other satellites know where (and when) they are

How GPS location works
- GPS uses distance (from time) rather than direction:
  - The receiver uses the delay in receiving each message to calculate its distance from the satellite that sent it
- Requires very precise timekeeping, as messages travel at (near) light speed
  - Relativistic effects must be accounted for!
- 1D position (i.e. on a line) requires two distance measurements
- 2D (on a plane) requires three distance measurements (circles)
- 3D (in space) requires four distance measurements (spheres)
  - The Earth's sphere could provide the fourth distance (provided you are on the surface)
  - Four readings are still required for altitude (essential if flying or in space)
  - Using four measurements improves accuracy as well

GPS in one dimension
[Figure: "You Are Here" on a line between Satellite 1 and Satellite 2, at distances r1 and r2]
- Satellite positions are known
- Messages are time-stamped, so the time of sending is known
- Delay in receiving the message can be measured
- Distance is proportional to delay
- Intersection of distances determines actual position

GPS for time-keeping
- A PPS (pulse per second, i.e. 1 Hz) signal is available externally on many GPS receivers
- Can be used for precise timekeeping, even in remote areas
- Once the location is determined and locked in, even higher timing accuracy is possible
- Higher frequencies (for word clock) can be derived using frequency synthesis

Proposed Scheme
- Use the globally available GPS PPS signal to discipline local audio hardware clocks
- Uniform frequency (not absolute time) is the critical thing
  - Avoid clock drift across sites, to avoid buffering errors
- Already been done! Shera (1998):
  - Ham radio application, originally
  - Voltage-controlled crystal oscillator (VCXO)
  - PLL-based regulation (phase-locked control feedback loop; de Bellescize, 1932)
  - Temperature-sensitive (even with a thermostatic oven)
- A 27 MHz master clock is common in multimedia systems
  - Because of NTSC television timings, AFAICT
  - Video sync input required for SSL Centuri (implications?)

Shera (1998): block diagram
[Figure: block diagram of Shera's GPS-controlled frequency standard]

u-blox LEA-6T
- GPS receiver module for precision timing applications
- Position-lock for greater timekeeping accuracy
- Programmable output clock pulse, 1/60 Hz to 10 MHz
- High sensitivity; usable indoors
- 15 ns accuracy achievable
- Ideally, we would simply connect the LEA-6T clock output to the audio word clock input

Innovative Integration PCIeTiming card
- PCIe expansion card
- GPS receiver for clock discipline
- Multiple programmable digital clocks
  - 1560 kHz .. 1 GHz output
  - 0.2 ps jitter specification
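To connect jitter figures such as "a few hundred picoseconds" or "0.2 ps" to audio quality: a common textbook estimate of the best SNR achievable when a full-scale sine of frequency f is sampled by a clock with random RMS jitter σ is SNR = −20·log10(2π·f·σ). The sketch below evaluates it at 20 kHz, the top of the audible band, and cross-checks it with a small simulation. It assumes white Gaussian jitter; real clock jitter is often correlated and then shows up as sidebands or other distortion rather than broadband noise.

```python
import math
import random


def jitter_snr_db(f_signal_hz: float, rms_jitter_s: float) -> float:
    """Textbook jitter-limited SNR for a full-scale sine: -20*log10(2*pi*f*sigma)."""
    return -20 * math.log10(2 * math.pi * f_signal_hz * rms_jitter_s)


def simulated_jitter_snr_db(f_signal_hz: float, rms_jitter_s: float,
                            fs_hz: float = 96_000.0, n: int = 20_000,
                            seed: int = 0) -> float:
    """Monte Carlo check: sample the sine at randomly jittered instants."""
    rng = random.Random(seed)
    signal = error = 0.0
    for i in range(n):
        t = i / fs_hz
        ideal = math.sin(2 * math.pi * f_signal_hz * t)
        actual = math.sin(2 * math.pi * f_signal_hz * (t + rng.gauss(0.0, rms_jitter_s)))
        signal += ideal * ideal
        error += (actual - ideal) ** 2
    return 10 * math.log10(signal / error)


for jitter_s in (0.2e-12, 300e-12, 1e-9):
    print(f"{jitter_s * 1e12:7.1f} ps RMS at 20 kHz: "
          f"formula {jitter_snr_db(20_000, jitter_s):.1f} dB, "
          f"simulated {simulated_jitter_snr_db(20_000, jitter_s):.1f} dB")
```

Under these assumptions, 300 ps already limits a 20 kHz tone to below 90 dB SNR, whereas 0.2 ps keeps jitter noise below the 24-bit quantisation floor.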

How a PLL works (analogy: two cars on a race track)
- 1 lap = 1 clock cycle
- Master reference car and following slave car
- Lead or lag is the phase difference
  - Measure once per lap, or continuously
- Constant phase difference means the same frequency
  - If gaining, slow down slightly
  - If lagging, speed up slightly
- Frequency is the derivative of phase!

PLL Demo in Pure Data (if time)
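The race-track analogy maps onto a simple digital PLL: measure the phase difference once per lap and nudge the slave's speed. A hypothetical sketch (not Shera's actual loop design), assuming a 1 Hz reference such as a GPS PPS pulse and a slave oscillator already divided down to the same nominal rate; the 20 ppm starting error and the gains kp and ki are illustrative.

```python
def simulate_pll(ref_hz: float, vco_free_hz: float, kp: float, ki: float,
                 laps: int) -> list[float]:
    """Toy digital PLL (proportional + integral), updated once per reference lap.

    Phases are counted in cycles. kp and ki are dimensionless gains applied to
    the phase error (in cycles) and scaled by ref_hz to give a frequency
    correction. Returns the slave frequency after each update."""
    dt = 1.0 / ref_hz                  # time for the reference car to complete one lap
    ref_phase = slave_phase = 0.0
    integral = 0.0
    slave_hz = vco_free_hz
    history = []
    for _ in range(laps):
        ref_phase += ref_hz * dt
        slave_phase += slave_hz * dt
        error = ref_phase - slave_phase   # > 0: slave lagging; < 0: slave gaining
        integral += ki * error
        # Lagging -> speed up; gaining -> slow down
        slave_hz = vco_free_hz + ref_hz * (kp * error + integral)
        history.append(slave_hz)
    return history


# Reference: 1 Hz PPS. Slave: nominally 1 Hz, but free-running 20 ppm fast.
freqs = simulate_pll(ref_hz=1.0, vco_free_hz=1.0 + 20e-6, kp=0.5, ki=0.1, laps=60)
for lap in (1, 5, 10, 20, 60):
    print(f"after lap {lap:>2}: slave error {(freqs[lap - 1] - 1.0) * 1e6:+.6f} ppm")
```

With ki = 0 the loop still locks the frequency but settles with a constant phase offset (the "constant phase difference means same frequency" case); the integral term is what removes that residual offset.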

The Software Side (in case you don't know JACK)
- JACK = JACK Audio Connection Kit (Paul Davis, ~2000?)
- Audio server program providing low latency and sample-accurate sync
- Like an Open Source combination of ReWire (inter-process audio), ASIO (low-latency audio I/O) and VST (software plug-ins)
- Provides audio routing among software clients and hardware
  - Clients may be ordinary processes or in-process plug-ins
- Originated on Linux; now also runs on Mac OS X, Windows and the BSDs
- Can also provide network transport over IP (NetJACK)!
- Probably an ideal platform for research software development

JACK details (1)
- Runs at real-time priority where possible
- No additional latency due to JACK itself
  - mmap()s to system audio buffers
- Provides a high-level audio API
  - Client software requires no audio hardware access code
  - Various audio back-ends: ALSA, FFADO, Core Audio, PortAudio, etc.
  - Enables rapid development and portability of audio apps
- Client connects to the server and registers audio input/output port(s)
- Registered clients have a process() callback invoked on demand by the JACK server
- Synchronous execution of all clients
- Supports MIDI data streams too; may support video etc. in future

JACK details (2)
- All audio data represented uniformly as 32-bit IEEE floating-point, normalised to -1.0..+1.0
- Provides global transport control and timecode
- No multiplexing/interleaving (e.g. stereo, 5.1, etc.) at the JACK level
  - One port: one channel
  - Use whatever channel configuration you need
- Buffer over-/under-runs (xruns) are detected and reported by the JACK server
  - Server can disconnect misbehaving clients

JACK details (3)
- Audio processing is driven by the audio hardware
- Hardware buffer typically divided in two (double-buffering):
  - Software reads from one buffer, writes to the other
  - One interrupt period to receive input
  - Two interrupt periods to process and deliver (input and output)
- Example timing: 256 frames/period, 2 periods/buffer @ 96 kHz
  (1 frame is all samples across channels taken at one sampling interval)
  - 375 Hz interrupt rate
  - ~5 ms through latency
  - Comparable to the sound delay from monitor speakers

JACK buffer management
[Figure: software buffer of (frames/period × nperiods) frames, shown as Period 1 (512 frames) and Period 2 (512 frames), exchanged with the audio hardware]
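The example timing is plain arithmetic. A minimal sketch using a hypothetical helper (not part of JACK's API) that treats latency simply as nperiods × frames/period ÷ Fs and ignores converter and driver delays; 330 m/s is the round figure for the speed of sound used elsewhere in these slides.

```python
def period_timing(frames_per_period: int, nperiods: int, sample_rate_hz: float) -> dict:
    """Back-of-envelope period/buffer arithmetic (hypothetical helper).
    Latency here is just nperiods * frames_per_period / Fs."""
    period_s = frames_per_period / sample_rate_hz
    return {
        "period_ms": period_s * 1e3,
        "interrupt_hz": sample_rate_hz / frames_per_period,
        "buffer_latency_ms": nperiods * period_s * 1e3,
    }


SPEED_OF_SOUND_M_S = 330.0

local = period_timing(256, 2, 96_000)
print(local)
print(f"equivalent to ~{SPEED_OF_SOUND_M_S * local['buffer_latency_ms'] / 1e3:.1f} m of air")

# A much larger period, e.g. 4096 frames at 96 kHz for a long network path
print(period_timing(4096, 2, 96_000))
```

Counted this way, 256 × 2 frames at 96 kHz gives about 5.3 ms (roughly 1.8 m of air), while a single 4096-frame period at 96 kHz is already about 43 ms.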

JACK: Implementation Challenges
- (Hard) real-time processing requirements
  - Also, want non-root users to be able to run JACK and clients
- May have only hundreds of microseconds to run all client process() callbacks
  - Overhead of context switches (e.g. CPU cache invalidation) is significant!
- Linux signals proved too slow to be used for JACK IPC; the current design uses FIFOs
- Client callbacks must of course be RT-safe
  - Yet recording/streaming software must do I/O!
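A standard way to reconcile RT-safe callbacks with I/O is a single-producer/single-consumer ring buffer: the process() callback only copies samples in, and an ordinary thread drains them to disk or the network. JACK's C API provides one (the jack_ringbuffer_* functions); the Python class below is only an illustration of the idea, with hypothetical names, and Python itself offers no real-time guarantees.

```python
class SpscRingBuffer:
    """Illustrative single-producer/single-consumer ring buffer.

    Only the producer moves _write and only the consumer moves _read, so a C
    version needs no lock (given suitable memory ordering). One slot is left
    empty so that 'full' (write just behind read) differs from 'empty' (equal)."""

    def __init__(self, capacity: int):
        self._buf = [0.0] * (capacity + 1)
        self._read = 0
        self._write = 0

    def write(self, samples) -> int:
        """Producer (e.g. the audio callback): never blocks; returns how many
        samples were stored. Samples that do not fit are dropped."""
        size = len(self._buf)
        count = 0
        for s in samples:
            nxt = (self._write + 1) % size
            if nxt == self._read:        # full
                break
            self._buf[self._write] = s   # store the data first...
            self._write = nxt            # ...then publish it by moving the index
            count += 1
        return count

    def read(self, max_count: int) -> list:
        """Consumer (e.g. a disk-writer thread): returns up to max_count samples."""
        size = len(self._buf)
        out = []
        while len(out) < max_count and self._read != self._write:
            out.append(self._buf[self._read])
            self._read = (self._read + 1) % size
        return out


rb = SpscRingBuffer(capacity=8)
stored = rb.write(range(10))   # only 8 fit; an RT callback would report the rest as an overrun
print(stored, rb.read(3))
```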

NetJACK
- Networking extension to JACK
  - Technically just another audio back-end
- Allows multiple JACK instances to communicate via UDP/IP
- Remote (slave) JACK instances run inside the master JACK loop
- BUT: slave instances are generally deaf and mute
  - No audio clock available; driven by reception of network packets instead
  - Processing only; no audio I/O (DSP farming)
- However:
  - Sample-rate conversion exists in the code base for local audio I/O
  - The CELT lossy codec, with packet-loss concealment, is also available
- Might be suitable for use/adaptation for distributed studio work
  - Large period sizes to handle latency (4096 frames at 96 kHz within NZ?)

NetJACK: Possible modifications
- Allow normal audio I/O on NetJACK slave instances
  - No resampling, so no loss of quality
  - Could be feasible if the hardware clock sync scheme works
  - Would it require some extra buffering (a jitter buffer)?
  - I/O still triggered by audio hardware
- Facility to measure and record network latencies
  - (Local) JACK already accounts for latency throughout the call graph
  - JACK transport pre-roll can compensate for playback latency
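Clock synchronisation matters here because a jitter buffer can absorb packet-timing variation but not a steady rate mismatch: if the two ends' sample clocks differ, the buffer fills or drains at a constant rate however large it is. A back-of-envelope sketch; the ppm figures are assumptions (tens of ppm is a plausible tolerance for free-running crystal oscillators), and the buffer is assumed to start half full.

```python
def seconds_until_xrun(buffer_frames: int, fill_frames: int,
                       sample_rate_hz: float, offset_ppm: float) -> float:
    """Time before a fixed-size receive buffer over- or under-runs when the
    sender's sample clock runs offset_ppm fast (positive: buffer fills) or
    slow (negative: buffer drains) relative to the receiver's."""
    drift_frames_per_s = sample_rate_hz * offset_ppm * 1e-6
    if drift_frames_per_s == 0:
        return float("inf")            # perfectly matched clocks never drift apart
    headroom = buffer_frames - fill_frames if drift_frames_per_s > 0 else fill_frames
    return headroom / abs(drift_frames_per_s)


# A 4096-frame buffer, half full, at 96 kHz
for ppm in (50.0, 1.0, 0.01):
    t = seconds_until_xrun(4096, 2048, 96_000, ppm)
    print(f"{ppm:>5} ppm mismatch: xrun after {t:,.0f} s ({t / 3600:.1f} h)")
```

Even a 1 ppm mismatch exhausts this buffer within hours, so running slaves with their own audio I/O depends on the hardware clocks being disciplined to a common reference.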

Challenge 2: Data Delivery Quality
- Long-range Internet transport is highly variable:
  - Non-uniform delivery time of packets
  - Variable bandwidth available
  - Congestion, traffic-shaping, etc.
- Live audio data must be delivered as fast as possible
- Buffering generally increases throughput, robustness and jitter immunity, at the expense of latency

Network performance on KAREN
- KAREN should provide a good starting point for feasibility studies
- Bandwidth aplenty:
  - Up to 10 Gb/s generally available
  - ~10,000 × typical home DSL
  - Typically under 5% utilisation
  - Audio: ~600 Mb/s raw for 96 channels of 32-bit audio at 192 kHz
  - Whole-session transfer in < 10 s (in theory)
    - 4 minutes × 24 tracks of 24-bit audio @ 96 kHz ≈ 1.7 GB
  - Endpoint disk I/O is probably the bottleneck in practice
- Interestingly, no QoS facilities
- Latency is the big problem
  - Audio signals must be kept within ~15 ms to seem musically simultaneous
  - Acoustic and electromagnetic signal propagation is not instantaneous
    - ~3 ms/m for sound waves in air (~330 m/s)
    - Light (fibre-optic) and electrical signal propagation is typically around 0.7c: ~5 ms/1000 km
  - 20 to 30 ms RTT (round-trip time) observed between Otago and Auckland via KAREN (so 10 to 15 ms each way)
  - Worst-case latency is really the important case

The Latency Problem
[Figure: Guitar @ Auckland and Drums @ Dunedin, with a 15 ms delay in each direction]
1. Drum part provides reference timing
2. Guitar plays in sync with the drum sound it hears
3. Guitar part sounds late by 30 ms

I canna change the Laws of Physics

Observed latencies to international locations via KAREN
(source: https://kmeasure.karen.ac.nz/cgi-bin/smokeping.cgi?target=INTERNATIONAL_LOCATIONS)
- Sydney: ~40 ms RTT
- Perth: ~80 ms RTT
- Seattle: ~160 ms RTT
- North America generally: 200..300 ms RTT
- Asia: 300..500 ms RTT
- Europe: 300..400 ms RTT
- Note: these are averages (show me the histograms!)

Equivalent Approx. Distances (cf. propagation of sound waves in air, d = v × t)
- Dunedin to Auckland: 5 m
- Dunedin to Sydney: 10 m
- Dunedin to Seattle: 30 m
- Dunedin to Europe: 60 m

International latencies, in musical terms
- At a tempo of 120 BPM: 2 beats/s, i.e. 500 ms per beat
- Asia round-trip: roughly 0.6 to 1 beat
- North America round-trip: roughly 0.4 to 0.6 of a beat
- Australia round-trip: roughly 0.08 to 0.16 of a beat
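The distance and tempo conversions are simple arithmetic. A minimal sketch, assuming symmetric paths (one-way delay = RTT/2) and the slides' round figure of 330 m/s for sound in air; the representative RTTs are picks from the ranges quoted (30 ms for Auckland, the top of the observed range; 160 ms for Seattle; 350 ms for Europe, mid-range).

```python
SPEED_OF_SOUND_M_S = 330.0  # round figure used in these slides


def one_way_ms(rtt_ms: float) -> float:
    """Assumes a symmetric path: one-way delay is half the round-trip time."""
    return rtt_ms / 2


def equivalent_distance_m(delay_ms: float) -> float:
    """How far sound travels in air in the same time: d = v * t."""
    return SPEED_OF_SOUND_M_S * delay_ms / 1000


def beats(delay_ms: float, bpm: float = 120.0) -> float:
    """Delay expressed in beats at the given tempo (500 ms per beat at 120 BPM)."""
    return delay_ms / (60_000 / bpm)


for place, rtt in (("Auckland", 30), ("Seattle", 160), ("Europe", 350)):
    d = one_way_ms(rtt)
    print(f"{place:>8}: {d:5.1f} ms one way ~ {equivalent_distance_m(d):4.1f} m of air; "
          f"round trip = {beats(rtt):.2f} beats at 120 BPM")
```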

Network latency will be a problem for certain applications.

Acoustic demonstrations of delay
- Phasing (comb filtering): 0.02..15 ms
- Stereophonic (Haas effect) shifts: ~10-50 ms
- Distinct echoes: ~50+ ms
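Comb filtering follows from mixing a signal with a delayed copy of itself: y(t) = x(t) + x(t − τ) has gain |1 + e^(−j2πfτ)| = 2|cos(πfτ)|, with notches at f = (2k + 1)/(2τ). A minimal sketch, assuming two equal-level copies and an ideal delay; it shows why short delays give a few widely spaced notches while longer ones crowd many notches into the audible band.

```python
import cmath
import math


def comb_notches_hz(delay_s: float, max_hz: float = 20_000.0) -> list[float]:
    """Frequencies cancelled when x(t) is mixed with x(t - delay_s): (2k + 1) / (2 * delay_s)."""
    notches = []
    k = 0
    while True:
        f = (2 * k + 1) / (2 * delay_s)
        if f > max_hz:
            return notches
        notches.append(f)
        k += 1


def comb_gain_db(f_hz: float, delay_s: float) -> float:
    """Gain of y(t) = x(t) + x(t - delay_s) at frequency f_hz, in dB (floored at -240 dB)."""
    h = 1 + cmath.exp(-2j * math.pi * f_hz * delay_s)
    return 20 * math.log10(max(abs(h), 1e-12))


for delay_ms in (1.0, 15.0):
    notches = comb_notches_hz(delay_ms / 1000)
    print(f"{delay_ms:4.1f} ms delay: {len(notches)} notches below 20 kHz, "
          f"first at {notches[0]:.1f} Hz, then every {notches[1] - notches[0]:.1f} Hz")
```

At the 0.02 ms end of the range the first notch would sit at 25 kHz, above the audible band.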

Synchronisation vs. delay
- Synchronisation and delay are two different problems
- For some applications, delay is largely irrelevant
  - e.g. mixing a band from 20 m away can still be done
- Synchronisation, however, is generally critical
  - esp. if the same audio is split across multiple paths and recombined
  - Otherwise: comb filtering, or changes in comb filtering

What Might Be Feasible?
- Mixing can be considered part of a live performance, but the latency requirement is less stringent
- Remote recording is one-directional; high latency is quite acceptable
  - Internet streaming: ditto
- Pre-scored performance is easier than fully live
  - e.g. Sibelius score, sequenced backing, metronome/click-track
  - Pre-roll to compensate for latency
- Layered multi-track recording generally doable
- Latency requirements can be relaxed considerably under certain conditions:
  - In particular, if nodes don't need to hear all other nodes
  - Acyclic audio processing graph
  - Sync is more important than absolute delay in many situations
  - Better read up on some graph theory...!

For further investigation
- Determine required audio hardware clock quality (jitter, drift, etc.)
- Trial the GPS hardware clock sync idea
  - Is variable satellite visibility a problem?
- Test feasibility of NTP for hardware clock sync
- Determine latency requirements for potential applications
- Develop/co-opt a network analysis framework for the distributed studio
- Delve into the JACK code (ZOMG! Real-time C code!)
- Investigate network tuning parameters
- Investigate use of the Internet for longer-haul transport

Moving to the Internet
- KAREN provides many benefits over a normal consumer Internet connection
- Long-haul Internet would mean significantly lower connection quality (bandwidth, latency, packet jitter, reliability)
- Potential hassles:
  - QoS and traffic-shaping
  - Firewalls and NAT
- CELT for lower data rate and concealment of packet loss?
  - Only if necessary (it is lossy)

References and further reading
- Stereophile magazine article on digital audio clock jitter: http://www.stereophile.com/reference/193jitter/
- Sound on Sound article on digital studio clocks: http://www.soundonsound.com/sos/jun10/articles/masterclocks.htm
- Brooks Shera's GPS-Controlled Frequency Standard: http://www.rt66.com/~shera/index_fs.htm
- Phase-Locked Loop (PLL) overview: http://en.wikipedia.org/wiki/Phase-locked_loop
- NTP overview: http://www.eecis.udel.edu/~mills/exec.html
- Shalunov, 2005: NTP Cookbook: http://www.internet2.edu/workshops/npw/binder-docs/ntp-cookbook.pdf
- NTP RFC document: http://www.eecis.udel.edu/~mills/database/rfc/rfc1059.txt
- NetJACK2 architectural overview: http://trac.jackaudio.org/wiki/WalkThrough/User/NetJack2
- KAREN timing statistics: https://kmeasure.karen.ac.nz/cgi-bin/smokeping.cgi?target=INTERNATIONAL_LOCATIONS
- Allan clock variance measurement: http://en.wikipedia.org/wiki/Allan_variance

To find this document, go to: http://eprints.otago.ac.nz/

Questions?

Suggestions:
- What about Skype? How does it manage?
- What about MIDI?
- How do musicians deal with latency normally?
