Automatic Time Stamp

Beat Detection Based on Max Peak Values for Rhythm Game

Muhammad. Iman
https://web.facebook.com/MumblingManGames
ptrusted@gmail.com
Surabaya, Indonesia

Abstract: Finding the time data of a specific sound in a music file can be hard and time consuming, yet sound and timing are the crucial parts of rhythm games. This paper presents an idea for automatically finding the time stamps of a simple beat consisting of kick, snare, and hi-hat sounds. The purpose of this automatic time stamp is to extract the needed parts of the music, namely which sound it is and when it is played in the timeline, and to store them somewhere they can be used later. There are three main processes involved: attack detection, time stamping, and saving the result. The time stamps for each sound (kick, snare, and hi-hat) are represented as a list of floats and saved as an XML file in Unity Game Engine, so that they can be used later in rhythm game projects. In the end, the resulting values are found to have 0.015 to 0.043 seconds of latency relative to the actual time stamp values, so the algorithm may need a little tweaking in future research.

Index Terms: Rhythm, game, beat, extracting, unity.

I. INTRODUCTION
A rhythm game is a type of game that challenges the player's sense of rhythm [1]. The rhythm can be expressed as a sequence of objects, object movements, or any other visual expression that is matched well with the music being played. Some of the most popular rhythm games are Guitar Hero, Rock Band, PaRappa the Rapper, Just Dance, and Symphonic Rain [2].
In order to match the game visuals with the music rhythm, we need time data corresponding to each specific sound played. Doing this manually from the music file is time consuming and inefficient. Picture 1 below shows one of many ways to find the time data (circle number 2) of a specific sound (circle number 1) manually in Audacity.
The time data and the specified sound are then entered manually into the game. This may be easy for a short music clip, but it becomes much harder for a longer piece of music that may consist of hundreds of sounds to be processed. The purpose of this paper is therefore to offer a better solution, Automatic Time Stamp, which processes the music and stores the data in a form that can be used to make rhythm game development faster and easier. There are three main processes involved in Automatic Time Stamp, as shown in Picture 2 below: attack detection, time stamping, and saving the result. All of these processes are implemented in Unity Game Engine, which is efficient when the game itself is developed in the same engine.

Picture 2 Automatic time stamping process

II. UNITY GAME ENGINE


A big reason why we use Unity is that Unity Technologies offers a platform for creating beautiful and engaging 2D, 3D, VR, and AR games and apps. Its powerful graphics engine and full-featured editor enable us to realize our creative vision fast and to deliver our content to virtually any media or device. Unity Technologies serves millions of registered developers, including large publishers, indie studios, students, and hobbyists around the globe [3].

Picture 1 Manual time stamping

III. ATTACK DETECTION
The concept of attack detection is to find the maximum spectrum data value at two peaks of a specific sound in the waveform, which leads us to the moment the sound starts playing. Picture 3 below shows a waveform in Audacity.

Picture 3 Delta value of two different peaks

The compared peaks shown in Picture 3 differ greatly in volume. We can see clearly that the 2nd peak is greater than the 1st one, which is taken to mean that the sound starts playing at the 2nd peak. To do this in Unity Game Engine, we first get the spectrum data of the sound, then extract the spectrum data to find the max peak and consider whether an attack is detected. The method is shown in Table 1 below.

Table 1 Attack detection method
    /// <summary>
    /// Attack detection process.
    /// Starts by getting the spectrum data, then extracts it.
    /// </summary>
    private void AttackDetection () {
        GetSoundSpectrum();
        ExtractSoundSpectrum();
    }

    private void GetSoundSpectrum () {
        // Get spectrum data of channel 0.
        TheSound.GetSpectrumData(
            SpectrumData1, 0, FFTWindow.BlackmanHarris);
        // Get spectrum data of channel 1.
        if (TotalChannel > 1)
            TheSound.GetSpectrumData(
                SpectrumData2, 1, FFTWindow.BlackmanHarris);
    }

    private void ExtractSoundSpectrum () {
        // Find max peak for each sound.
        FindMaxPeak();
        // Classify the sound as kick, snare, or hi-hat.
        Considering();
    }

A. Find Max Peak
To find the max peak of each sound, we first need to determine its array group. Picture 4 below shows an in-app screenshot of the spectrum visualization. The red line shows the spectrum data of channel 0 and the yellow line shows the spectrum data of channel 1. Since the minimum number of audio samples we can get from Unity's AudioSource.GetSpectrumData() function is 64 [4], we should group the data into eight categories.

Picture 4 Spectrum data visualization

To find which category the kick, snare, and hi-hat sounds fall into, we need to play the music first and then observe the visualization. Picture 4 shows the snare sound, which is located between 2/64 and 6/64 of 1024 audio samples. Table 2 shows the code that finds the array groups for different audio sample sizes. The ArrayGroupCoefs and ArraySize variables are user-supplied values. If you observe that the snare sound is located between 2/64 and 6/64 of 1024 audio samples, then you should set ArrayGroupCoefs[0] to 2, ArrayGroupCoefs[1] to 6, and ArraySize to 1024.

Table 2 Array group calculation
    // Array groups that contain the frequency ranges considered
    // for kick, snare, and hi-hat.
    ArrayGroup = new Vector2[3];
    ArrayGroup[0] = new Vector2(
        0f, (ArrayGroupCoefs[0] / 64f) * ArraySize);
    ArrayGroup[1] = new Vector2(
        ArrayGroup[0].y, (ArrayGroupCoefs[1] / 64f) * ArraySize);
    ArrayGroup[2] = new Vector2(
        ArrayGroup[1].y, (ArrayGroupCoefs[2] / 64f) * ArraySize);

Once we have the array group for each sound, we can find the max peak by comparing the current maximum spectrum value with the one saved so far. First we take the larger of the channel 0 and channel 1 spectrum values, then compare it with the previously saved max peak value. If the current maxValue is greater, it is saved in MaxPeak. All of this is done inside a for statement that loops once per array group. The algorithm is shown in Table 3.

Table 3 FindMaxPeak function
    /// <summary>
    /// Finds the max peak of each sound in spectrum data.
    /// </summary>
    private void FindMaxPeak () {
        for (int a = 0; a < MaxPeak.Length; a++) {
            // Save previous max peaks.
            PrevMaxPeak[a] = MaxPeak[a];
            // Reset max peak.
            MaxPeak[a] = 0f;
        }
        float[] maxValue = new float[3];
        // Each array group holds the spectrum indices of one sound
        // (kick, snare, or hi-hat), so we scan each group for its max peak.
        for (int a = 0; a < 3; a++) {
            for (int b = (int)ArrayGroup[a].x; b < (int)ArrayGroup[a].y; b++) {
                maxValue[a] = Mathf.Max(
                    SpectrumData1[b], SpectrumData2[b]);
                if (maxValue[a] > MaxPeak[a])
                    MaxPeak[a] = maxValue[a];
            }
        }
    }

B. Considering
Once we have the max peak, we can consider whether the max peak for a specific sound meets the threshold. As Table 4 below shows, the max peaks for the 3 sounds are checked inside a for statement. Once a peak is greater than its Threshold value, it is compared with the previous max peak (PrevMaxPeak) to check whether the difference between the two peaks is greater than MinDistances. When all of the conditions are met, we consider this to be the sound we are looking for. Next we increase the counter of that specific sound and time stamp the value.
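To make the array grouping, max peak search, and threshold check easier to try outside the Unity editor, here is a minimal sketch in plain C#. It is not the author's script: the class and method names (AttackSketch, GroupBounds, MaxPeaks, IsAttack) are hypothetical, plain arrays stand in for Vector2 and for the data returned by GetSpectrumData, and the hi-hat coefficient of 32 used in the usage note is an assumed value, since the text only gives 2 and 6.

```csharp
using System;

// Hypothetical helper class, written for illustration only.
public static class AttackSketch
{
    // Returns the four index boundaries of the three consecutive groups
    // (kick, snare, hi-hat): group a spans bounds[a] .. bounds[a + 1] - 1.
    // coefs are given in units of 1/64 of the array, as in Table 2.
    public static int[] GroupBounds(int[] coefs, int arraySize)
    {
        var bounds = new int[4];
        bounds[0] = 0;
        for (int a = 0; a < 3; a++)
            bounds[a + 1] = (int)((coefs[a] / 64f) * arraySize);
        return bounds;
    }

    // Returns the largest spectrum value inside each group.
    public static float[] MaxPeaks(float[] spectrum, int[] bounds)
    {
        var peaks = new float[3];
        for (int a = 0; a < 3; a++)
            for (int b = bounds[a]; b < bounds[a + 1]; b++)
                peaks[a] = Math.Max(peaks[a], spectrum[b]);
        return peaks;
    }

    // An attack is reported when the peak exceeds the threshold and rises
    // above the previous frame's peak by more than minDistance.
    public static bool IsAttack(float peak, float prevPeak,
                                float threshold, float minDistance)
    {
        return peak > threshold
            && peak > prevPeak
            && (peak - prevPeak) > minDistance;
    }
}
```

With coefficients {2, 6, 32} and an array size of 1024, GroupBounds gives the boundaries 0, 32, 96, and 512, so the snare group covers indices 32 to 95, which matches the 2/64 to 6/64 range described for 1024 samples. Calling MaxPeaks once per frame and passing the current and previous peak of each group to IsAttack reproduces the decision made for each sound.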
At the end of the Considering function we check whether the music is still playing, because the attack detection process only works while the music is playing. When the music is no longer playing, we set AttackDetectionState to false and SavingResultState to true, which leads to the next process.

Table 4 Considering function
    /// <summary>
    /// Consider whether the max peak meets the threshold.
    /// When it does, further check whether it is greater than the
    /// previous max peak.
    /// </summary>
    private void Considering () {
        for (int a = 0; a < MaxPeak.Length; a++) {
            if (MaxPeak[a] > Threshold[a]) {
                if ((MaxPeak[a] > PrevMaxPeak[a]) &&
                    ((MaxPeak[a] - PrevMaxPeak[a]) > MinDistances[a])) {
                    // Increase counter.
                    TotalAttackDetected[a]++;
                    // Save the time stamp.
                    TimeStamping(a, TheSound.time);
                }
            }
        }
        // Ends the attack detection process.
        if (!TheSound.isPlaying) {
            // Switch the states.
            AttackDetectionState = false;
            SavingResultState = true;
        }
    }

IV. TIME STAMPING
Once the sound is detected in the Considering function, the process continues to the TimeStamping function. Time stamping is basically the process of saving the time value of the currently played sound into a list of floats at the moment it meets all the given conditions. The C# List class is used because it provides methods to search, sort, and manipulate lists [5]. Table 5 below shows the TimeStamping function, which takes the index of the detected sound (0 for kick, 1 for snare, and 2 for hi-hat) and the time value in seconds.

Table 5 Time stamping function
    /// <summary>
    /// Time stamping process.
    /// Here we just save the time when the attack was detected into a list.
    /// </summary>
    private void TimeStamping (int index, float value) {
        TimeStampResult[index].Add(value);
    }

V. SAVING RESULT
When the music ends and the attack detection process has finished, the lists of float data are written into an XML file. This can be done using .NET XML and SOAP serialization. The XML format is chosen because it is an open standard and an XML stream can be processed by any application [6]. Table 6 below shows how to save the lists of floats into an XML file with a specific file name.

Table 6 Saving result function
    // Requires: using System.Collections.Generic; using System.IO;
    // using System.Xml.Serialization;

    /// <summary>
    /// Saving the result into an xml file.
    /// </summary>
    private void SavingResult (string fileName) {
        XmlSerializer serializer = new XmlSerializer(typeof(List<float>[]));
        TextWriter writeResult = new StreamWriter(fileName);

        serializer.Serialize(writeResult, TimeStampResult);
        writeResult.Close();

        // Change the process state.
        SavingResultState = false;
    }

VI. USER INTERFACE
When designing the user interface, you will probably want one that is simple and easy to understand. The interface of this project was made in the simplest way possible so that it takes no extra time to develop. Picture 5 below shows the Unity editor view of the automatic time stamp scene. The automatic time stamping script is attached to the Main Camera object. In the inspector tab on the right, there are some public variables that you can change as needed, such as the file name, array size, thresholds, array group coefficients, and minimum distances. The automatic time stamp interface is on the left side of the window, with a black background and white text. When the program is played, the info text is displayed and the sound plays along with the spectrum data, which is visualized in the center part of the window. The info text reports the detected sound counters, current window resolution, sound sample frequency, and more. When all processes are finished, a finish image is shown, meaning that the result data has been saved into an XML file with the name you specified. Picture 6 below shows the finished state.

Picture 5 Unity automatic time stamp scene

Picture 6 Processes finished

VII. CONCLUSION
To check how good the result of this automatic time stamp is, the manual and automatic calculations should be compared. The test used a simple music file consisting of 3 kick sounds, 2 snare sounds, and 5 hi-hat sounds. The latency is calculated as the difference between the manual and automatic values. As Picture 7 below shows, the automatic time stamp produced 12 detected sounds, which means that 2 sounds were time stamped twice. Picture 7 also shows that the latency ranged from 0.015 to 0.043 seconds. For practical use, this latency and the double detections may be forgivable since the values are quite small. Further research is still needed to find a better and more efficient method for time stamping a music file, and so make the development of rhythm games easier.

Picture 7 Comparison of manual and automatic time stamping

REFERENCES
[1] What is Rhythm Game, https://en.wikipedia.org/wiki/Rhythm_game, 2017.
[2] List of Rhythm Game, http://www.ranker.com/list/all-rhythm-game-games-list/reference?ref=dshare&source=tshare, 2017.
[3] Unity Company Facts, https://unity3d.com/public-relations, 2017.
[4] Unity Spectrum Data, https://docs.unity3d.com/530/Documentation/ScriptReference/AudioSource.GetSpectrumData.html, 2017.
[5] C# List Class, https://msdn.microsoft.com/en-us/library/6sh2ey19(v=vs.110).aspx, 2017.
[6] .NET XML and SOAP Serialization, https://docs.microsoft.com/en-us/dotnet/framework/serialization/xml-and-soap-serialization, 2017.
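Since the saved time stamps are meant to be used later in a rhythm game project, it may help to see how the XML could be read back. The following is a minimal sketch, assuming the file was written with the same XmlSerializer type as the saving result function (an array of List<float>, index 0 for kick, 1 for snare, 2 for hi-hat). The class name TimeStampXml and its Write and Read methods are hypothetical and not part of the published project.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Hypothetical helper class, written for illustration only.
public static class TimeStampXml
{
    // Writes the three time stamp lists (kick, snare, hi-hat) as XML.
    // Pass a StreamWriter to write a file, or a StringWriter for testing.
    public static void Write(TextWriter writer, List<float>[] timeStamps)
    {
        var serializer = new XmlSerializer(typeof(List<float>[]));
        serializer.Serialize(writer, timeStamps);
    }

    // Reads the lists back. Pass a StreamReader opened on the saved file,
    // or a StringReader for testing.
    public static List<float>[] Read(TextReader reader)
    {
        var serializer = new XmlSerializer(typeof(List<float>[]));
        return (List<float>[])serializer.Deserialize(reader);
    }
}
```

In a game script, one possible use is to open the saved file with a StreamReader, call Read once at start-up, and then compare the playback time of the audio source against the next unread entry of each list to decide when to show a note. That game-side logic is a suggestion and is not described in this paper.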

The project files are available at https://github.com/ptrusted/Automatic-Time-stamp
