
Java SE (Java Platform, Standard Edition)


How can I sync with audio framerate?

845492 · Apr 28 2012, edited May 30 2012
I hope there is an entirely different way of doing what I'm trying to do, because I've run up against what seems like an insurmountable obstacle. I have been working on a project for many months, and the last piece to figure out is synchronizing application actions with the audio frame rate. I have seen other apps doing this successfully (though probably written in C++, not Java), so it can be done on standard Wintel boxes; I'm now concerned that it cannot be done using Java Sound.

My project attempts to animate sound with visual images (not video or pictures, but more abstract visualizations based on the audio itself). It works fine when simply "listening" and responding to the audio. The final step is to be able to sync specific abstract visualizations with precise points in the audio (frames, or position in microseconds). I'm now wondering if what I need is handled by the sound card (hardware) and not available through Java. I presumed that I would be able to rely on the sound card to handle the real-time audio stream (using its clock) and that I would simply wait for increments of a certain frame count corresponding to the refresh/trigger times for my visualizations.

As it turns out, after a lot of experimenting with my configuration, I can't find a way to trigger a method in my application synchronized with a certain position in the audio being played. I am playing audio by writing to a SourceDataLine, and listening (using loopback, I believe) on a TargetDataLine. To give an idea of what I'm looking for: if a LineListener supported notification based on reaching a certain frame position in the audio stream, that would be great. That doesn't exist (as far as I could find, there are only START/STOP/OPEN/CLOSE events). I've been trying to set the buffer size on the TargetDataLine, and the number of bytes (frames) that I read from that TDL, to achieve synchronization with the actual rate the audio is playing through the sound card. What I've seen is that the number of frames made available through the TDL is not consistent. The read() method blocks for varying times and returns differing incremental frame positions as the TDL buffer is replenished in real time.

My goal is to be able to detect when a number of frames corresponding to 20 milliseconds at a 48 kHz or 44.1 kHz sample rate has been played through the sound card. Is this possible? I've not posted any code because the code I have isn't broken; it's just not capable of performing the task I'm trying to achieve. Suggestions as to how I can refine my existing approach, or what other direction/implementation I can pursue, will be greatly appreciated.

Edit: typo.

Comments

Phil Freihofner
When you do a read() or write() on the SourceDataLine or TargetDataLine, you can increment a frame counter and use that to trigger an event. The problem, though, is that because of JVM thread switching, the point in time at which a particular frame gets processed isn't the same as the real-time occurrence of the playback.
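That counter idea might be sketched as below. This is a hypothetical helper, not code from the thread: it assumes you pass in the byte count returned by each write() and that you know the bytes-per-frame of your format (e.g. 4 for 16-bit stereo PCM).

```java
// Sketch: tally frames as each SourceDataLine.write() returns, and report
// when a cue frame is crossed. Hypothetical helper class; assumes the caller
// passes the byte count that write() returned.
class FrameCounter {
    private final int bytesPerFrame;
    private long frames = 0;

    FrameCounter(int bytesPerFrame) { this.bytesPerFrame = bytesPerFrame; }

    /** Call after each write(); returns true if this write crossed targetFrame. */
    boolean onBytesWritten(int bytesWritten, long targetFrame) {
        long before = frames;
        frames += bytesWritten / bytesPerFrame;
        return before < targetFrame && frames >= targetFrame;
    }

    long frames() { return frames; }
}
```

Usage in the playback loop would be something like `if (counter.onBytesWritten(sdl.write(buf, 0, n), cueFrame)) fireCue();`. As noted above, this marks when the frame is buffered, not when it is actually heard.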

If you have an "anchor" point (the nanoTime at which the first frame gets processed), you can calculate, with excellent precision, the nanoTime of the target frame relative to that anchor via the sample rate, and use that to schedule an event via a Timer. But this is limited by the accuracy of Timers, which tend to rely on the OS. On Windows the timer granularity is something like 15 msec, though there is a hack that tricks it into roughly 1 msec accuracy.

I had luck going in the "other" direction: taking EDT events and using them to affect TDL playback. I did this by enqueuing and time-stamping the events and calculating the corresponding frame. For example, if two mouse-motion events are N msec apart, I make sure the TDL processes them the corresponding M frames apart. A frame counter in the TDL I wrote for the synthesizer determines when to act. This is done in a Java Theremin program that is online: http://www.hexara.com/VSL/JTheremin.htm

For sure, there must be a way to do the synchronization you describe, though, because I have seen it done with Processing (aka Proce55ing), an environment for artists that is built on Java. Example: works by "flight404", with amazing visuals cued off of the music (http://vimeo.com/2120027).

Maybe his visuals are streamed via some sort of "frame rate" and the point is to get a formula for translating from one stream's frame rate to the other, as opposed to using "real time" triggers.
845492
I think this breaks down into two related but separate issues:

First, the latency between writing audio (to an SDL) and when that audio is actually heard. If I could find a way to measure that, and it was consistent, I could adjust my processing and delivery of audio data to that SDL. I have a hack that sets up a TDL and polls using read(); when the first sample above some threshold is encountered, I measure system time and infer a delay from the SDL.write(). The problem with this is that I have to start and stop the time measurement after the SDL.write() and TDL.read(), respectively. By testing various scenarios I've found that the delay is not constant, and I'm not able to determine an optimal use of read() or write() (buffer size, for example) that minimizes the delay or makes it consistent. This also doubles the number of paths introducing error, and it can be disturbed by other sounds being "heard" on the TDL and misinterpreted as the realization of the audio I've written to the SDL when it is in fact unrelated. I have found that most platforms do not support the ability to "tee" audio being sent into an SDL of a Port Mixer out through both the Port (to render physical audio) and a TDL (to recapture the audio stream, and thus potentially determine the actual frame rate and latency). I'm mimicking this using a separate Port Mixer that is listening to a selected recording port/line.
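The threshold scan in that hack might look like the sketch below. The helper is hypothetical (names are mine, not from the thread) and assumes 16-bit little-endian mono PCM coming off the TDL; the surrounding capture loop would call TargetDataLine.read() and note System.nanoTime() when an onset is found.

```java
// Sketch of the loopback-threshold measurement: after each
// TargetDataLine.read(), scan the captured block for the first sample
// above a threshold. Hypothetical helper; assumes 16-bit LE mono PCM.
class OnsetDetector {
    /** Frame index of the first sample with |value| > threshold, or -1. */
    static int firstLoudFrame(byte[] buf, int len, int threshold) {
        for (int i = 0; i + 1 < len; i += 2) {
            // low byte unsigned, high byte sign-extended
            int sample = (buf[i] & 0xFF) | (buf[i + 1] << 8);
            if (Math.abs(sample) > threshold) return i / 2;
        }
        return -1;
    }
}
```

Usage sketch: `int n = tdl.read(buf, 0, buf.length); int f = OnsetDetector.firstLoudFrame(buf, n, 512); if (f >= 0) onsetNanos = System.nanoTime();`. As described above, the timestamp still includes the TDL's own capture latency, which is one of the error paths.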

[Some more specific details: I've measured that there is a variable delay before a call to SDL.write() returns, even if the SDL buffer is tested beforehand to ensure it has room available to hold the audio being written, or is always starved (empty), or is as full as I can get it. I've also seen (based on how much the frame position reported by SDL.getLongFramePosition() is incremented as a result of the SDL.write() call) that different amounts of data (numbers of bytes/frames) are read from the SDL at a time. I was hoping that if I wrote one buffer of audio data at a time, equal to say 20 milliseconds of audio, then each time SDL.write() returned I could increment my frame-position counter by an equal amount. That didn't work. I would like to find source code to validate my assumptions based on observation. I +think+ there are two buffers in play in the Java Sound implementation: the first is the buffer created when the line is opened, and there is another that is part of the SDL/TDL class implementation, hidden from the caller, which is populated from the line's buffer. This hidden buffer is then sent to the audio hardware device, and the position of the audio that is next to be sent from the hidden buffer to the audio device is what getLongFramePosition() reports.]

The other problem is that even if I were able to get an accurate measurement of the delay (latency) in the audio lines, if I then set system timers to cause actions in sync with the audio frame rate (based on the audio sample rate; frame rate equals sample rate for uncompressed PCM audio), there is still an error path. Again from observation, not direct knowledge, it seems there is no direct link between the sound card clock (which controls the real-time frame rate and should really be the master of all actions) and the system clock (which I would use as a surrogate for the sound card clock to synchronize my actions with the audio being heard). Even if I were to establish perfect synchronization at the start of a song, by the time 5 or 10 minutes of audio has been processed, I suspect an accumulation of error between the sound card clock and the system clock would become noticeable.

BTW, I'm not trying to get exact, professional audio synchronization here (although I don't know what that is quantitatively). My experimentation has shown that a difference of somewhere between 20 and 50 milliseconds between perceived audio and video is disturbing. I can see that there are applications in existence that achieve this level of sync, but I'm not sure how; I may just be pushing on the wrong architecture, or those applications may be built on native audio APIs.

I'm open to ideas, criticism, or examples of how I might achieve my goal. While it might not be practical, or the only way, the best I can describe what I'm looking for is an audio-event-driven system where I can determine a frame position corresponding to the audio being rendered in real time and trigger events to keep other actions synchronized with that audio. If a constant delay (latency) were maintained and measurable, that would be OK.

This is a difficult problem.

Edited: added more detail
Phil Freihofner
ags wrote:
I think this breaks down into two related but separate issues:

First, the latency between writing audio (to a SDL) and when that audio is actually heard. If I could find a way to measure that, and it was consistent, I could adjust my processing and delivery of audio data to that SDL.
As I tried to explain in my previous post, there will ALWAYS be variability between the time when a given frame is processed in an SDL and when it sounds. The biggest variability is between the time required for the onset of the first sound (played via bytecode interpretation) and the onsets of subsequent sounds (played via code already loaded into memory). But JVM thread switching also accounts for variability.

Have you seen this article?
http://quod.lib.umich.edu/cgi/p/pod/dod-idx?c=icmc;idno=bbp2372.2007.131
"Real Time Low Latency Audio Processing in Java"

It does a good job of listing various factors that affect real-time audio latencies.


The other problem is that even if I were able to get an accurate measurement of the delay (latency) in the audio lines, if I then set system timers to cause action in sync with the audio frame rate based on audio sample rate (frame rate for uncompressed PCM audio) there is still an error path. Again from observation not direct knowledge, it seems that there is no direct link between the sound card clock (controlling real time frame rate and what should really be the master of all actions) and the system clock (which I would use as a surrogate for the actual sound card clock to synchronize my actions with the audio being heard). Even if I were to establish perfect synchronization at the start of a song, by the time 5 or 10 minutes of audio has been processed, I suspect there will be an accumulation of errors between the sound card clock and the system clock that becomes noticeable.
The sound card clock and the system clock are perfectly in sync. The problem is the granularity of the system clock. On Windows, the clock info is only updated once every 15 msec or so. As a consequence, System.currentTimeMillis() will only report the time as of the last update; Thread.sleep(millis) and Timer scheduling are subject to the same constraint. The error doesn't accumulate, though. It is just an issue of reduced "granularity" that creates a predictable, bounded error. (System.nanoTime() will always be more accurate; it is not subject to the OS-determined update frequency.)

As I wrote, there is a hack that increases the granularity of the OS clock interrupt. It involves setting an independent background thread to sleep for the maximum amount of time, e.g. Thread.sleep(Long.MAX_VALUE). This somehow forces the OS to use a finer granularity, and on Windows the accuracy becomes about 1 msec (comparable to Linux and Mac systems). I learned this at the forum Java-Gaming.org.
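The hack itself is tiny. A sketch of the usual form, with the caveat that it relies on undocumented Windows scheduler behavior:

```java
// The sleeping-thread hack: a daemon thread parked in Thread.sleep() with a
// huge timeout nudges Windows into ~1 ms timer resolution for the whole JVM.
// Harmless (if useless) on Linux/Mac, which already have ~1 ms resolution.
Thread timerHack = new Thread(() -> {
    try {
        Thread.sleep(Long.MAX_VALUE);  // effectively forever
    } catch (InterruptedException ignored) {
        // exit quietly if ever interrupted
    }
}, "timer-granularity-hack");
timerHack.setDaemon(true);             // must not keep the JVM alive
timerHack.start();
```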


I'm open to ideas, criticism or examples of how I might acheive my goal. While it might not be practical or the only way, the best I can describe what I'm looking for is to have an audio-event driven system where I can determine a frame position corresponding to the audio being rendered in real time and trigger events to keep other actions synchronized with that audio. If a constant delay (latency) was maintained and measureable, that would be OK.

This is a difficult problem.
Agreed about the difficulty!

Given all this, I think the questions are:

1) how much in advance do you know about given frames that you wish to use as cues?
2) how well can you guesstimate an anchor point? (and is there a way to improve upon the guess with feedback?)
3) how accurately can you trigger the visual cue?

Regardless of when Java decides to process a frame, it DOES manage to keep the playback steady. Thus, if you focus your efforts on finding an ANCHOR (for example, the nanoTime, or a "close enough" approximation, of when the first frame sounds) that you can use as a reference point for the duration of the sound, then you can use that point to convert between the frame count of the sound and the time it can be expected to play.

For example, if the cue frame is 44100 and you estimate that frame 0 was at time 1336426670003 (System.currentTimeMillis()), then set a Timer, in advance, to trigger the visual cue at 1336426671003 (adding 1000 msec, since 44100 frames at 44.1 kHz is exactly one second). The level of accuracy will remain the same regardless of the frame number and duration involved, as long as you use the same anchor (and the sound playback is not interrupted).
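That arithmetic and scheduling might be sketched as follows; class and method names are illustrative, and only java.util.Timer is assumed beyond them:

```java
// Sketch of the anchor approach: convert a cue frame to a wall-clock delay
// relative to the anchor (the estimated time frame 0 was heard) and schedule
// it on a java.util.Timer. Names are illustrative.
class CueScheduler {
    /** Milliseconds after the anchor at which cueFrame should sound. */
    static long frameToMillis(long cueFrame, double sampleRate) {
        return Math.round(cueFrame * 1000.0 / sampleRate);
    }

    /** Schedule action relative to anchorMillis, the System.currentTimeMillis()
     *  estimate for frame 0, using the stream's sample rate. */
    static void scheduleCue(java.util.Timer timer, long anchorMillis,
                            long cueFrame, double sampleRate, Runnable action) {
        long due = anchorMillis + frameToMillis(cueFrame, sampleRate);
        long delay = Math.max(0, due - System.currentTimeMillis());
        timer.schedule(new java.util.TimerTask() {
            @Override public void run() { action.run(); }
        }, delay);
    }
}
```

So frame 44100 at 44.1 kHz lands 1000 ms after the anchor, and frame 960 at 48 kHz lands at 20 ms, the increment the original poster asked about.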
845492
I'm not familiar with Microsoft's native sound APIs, so I can't compare. I wonder if this would be easier to accomplish using a native API and a JNI wrapper? I haven't used JNI either, but documentation is available and I could figure it out. Of course, that's the end of "write once, run anywhere" if I take that route (actually not a big issue for this project).

I did some searching but didn't find anything that seemed related to the context of your comment that "JVM switching also accounts for variability". Is there another term that is used to describe what you are referring to?

Finally, I'm still working on the implementation, and I have little choice but to continue until I find some workable design (unless I want to abandon a year's effort so far). I was under the impression (as I stated previously) that the sound card clock and the system clock would have some drift: not just the granularity issue discussed above, but an actual drift, since they are not rooted in the same fundamental heartbeat. If that is incorrect, then your idea of the anchor point is the way to go. I'll try some long-running tests to see whether I can validate that there is no drift after 10 minutes of playback. That would be helpful, to say the least (but I'll still be surprised to learn that the system clock contributes to the sound card's heartbeat/sample-rate clock; I would have expected it to have its own clock on board).
Phil Freihofner wrote:

Given all this, I think the questions are:

1) how much in advance do you know about given frames that you wish to use as cues?
2) how well can you guesstimate an anchor point? (and is there a way to improve upon the guess with feedback?)
3) how accurately can you trigger the visual cue?
1) I know to the exact frame where my cues are. The problem is that a cue needs to be synced with the frame when it is rendered (heard), not when it is written to the line.
2) I have experimented with listening on a separate TDL for the rendered sound as feedback to determine the delay. That also adds delay between the time I hear the frame and when I can "see" it as a non-zero byte stream on the TDL. I was wondering if there is some way to make the delay from SDL.write() to hearing the audio more repeatable, whatever the delay is. I've tried changing the size of the buffer on the SDL, and the size of the "chunks" of audio that I write to the SDL incrementally with SDL.write(). I have documented some of the results in the thread regarding [Undocumented behavior of SourceDataLine|https://forums.oracle.com/forums/thread.jspa?threadID=2382581&tstart=0]
3) Measurements show that I can reliably trigger the visual event with 2 millisecond accuracy generally, and with 10 millisecond accuracy almost always. The occasional miscreant is acceptable.
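For reference, the buffer-size experiments in (2) turn on the two-argument open(). A rough sketch (the helper names and the 50 ms figure are mine, not from the thread):

```java
// Sketch of opening a SourceDataLine with an explicit buffer size, the knob
// being varied in the experiments above. Helper names and the duration used
// are illustrative.
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.SourceDataLine;

class LineOpener {
    /** Buffer size in bytes holding `seconds` worth of whole frames. */
    static int bufferBytesFor(AudioFormat fmt, double seconds) {
        int frames = (int) (fmt.getFrameRate() * seconds);
        return frames * fmt.getFrameSize();
    }

    static SourceDataLine open(double bufferSeconds) throws LineUnavailableException {
        // 44.1 kHz, 16-bit, stereo, signed, little-endian
        AudioFormat fmt = new AudioFormat(44100f, 16, 2, true, false);
        SourceDataLine sdl = (SourceDataLine) AudioSystem.getLine(
                new DataLine.Info(SourceDataLine.class, fmt));
        sdl.open(fmt, bufferBytesFor(fmt, bufferSeconds));  // request a specific size
        sdl.start();
        return sdl;
    }
}
```

Note the mixer may not honor the requested size exactly; checking sdl.getBufferSize() after open() shows what was actually granted.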

Edit: added link to other thread
Phil Freihofner
I don't know much about Microsoft's native sound APIs either. The timing issue I was talking about refers to Java's sleep and time-getting methods on Windows. Davison's "Killer Game Programming in Java" explains it better than I can.

Maybe I am garbling my terminology with the phrase "JVM switching". I meant to refer to the fact that one has no control over which thread the JVM decides to run at any given time, so the threads for real-time actions and the threads for getting data ready to stream run somewhat independently, getting ahead of and behind each other. And attempts to control the JVM in this sort of thing are usually indicative of a wrong-headed approach (at least for beginner/intermediate programmers like myself).

I think clocks are pretty darn accurate these days. It's not like the 1970s or '80s, using tape decks and servo motors, anymore. I'd be really surprised if two clocks measurably "drifted" by even a handful of nanos in a mere 10 minutes. Most likely the drift, if it exists, is caused by something else, don't you think?

Sorry I don't have anything more helpful. Did you check out the article I cited on "Real Time Audio Processing in Java"?

Maybe posting on StackOverflow with a Bounty will get you more responses. If you post elsewhere, I hope you come back and put a link here or an answer here. I am curious how you finally solve this.

Or try JavaRanch. I've had excellent luck there.

Post Details

Locked on Jun 27 2012
Added on Apr 28 2012
5 comments
4,717 views