
When you cannot sleep

It’s a shame I never wrote about the Redial class, since it really turned out to be my favorite this semester. Asterisk is so fun to hack around with, although so far we’ve just been doing really simple stuff.

For the midterm, Li and I teamed up to create an answering bot that you can trash-talk to when you really cannot sleep at night and don’t wanna bother your friends. The obvious challenges were real-time level detection, smart (or vague) response behavior (so that it doesn’t sound too robotic), and not knowing whether the Monitor application would lock the audio file while it is writing to it.

DIAL PLAN AND AGI

We started by putting everything in a single php file and driving it all through AGI, so our extension dial plan looked as simple as:

exten => s,1,Answer()
exten => s,n,SetGlobalVar(lx243_caller=${CALLERID(num)})
exten => s,n,SetGlobalVar(lx243_timestamp=${STRFTIME(${EPOCH},GMT-5,%C%y%m%d%H%M%S)})
exten => s,n,System(rm /home/lx243/tmp/${CALLERID(num)}/*.*)
exten => s,n,System(rmdir /home/lx243/tmp/${CALLERID(num)}/)
exten => s,n,System(mkdir /home/lx243/tmp/${CALLERID(num)})
exten => s,n,System(chmod 777 /home/lx243/tmp/${CALLERID(num)})
exten => s,n,Monitor(wav,/home/lx243/tmp/${CALLERID(num)}/monitor)
exten => s,n,AGI(/home/lx243/asterisk_agi/li.php)

exten => h,1,System(mv /home/lx243/tmp/${CALLERID(num)}/monitor-in.wav /home/lx243/asterisk_sounds/moniter-in-${lx243_caller}-${lx243_timestamp}.wav)
exten => h,n,System(mv /home/lx243/tmp/${CALLERID(num)}/monitor-out.wav /home/lx243/asterisk_sounds/moniter-out-${lx243_caller}-${lx243_timestamp}.wav)

The dial plan itself doesn’t do much: it initializes some global variables, clears out the tmp folder, starts the Monitor recording, and eventually moves the recorded files to another place.

We decided to use Monitor instead of continuous Record because we realized we cannot record while an audio file is being played back. So far I haven’t found a good way to get around this synchronous behavior in AGI. (Probably need to involve more complicated shell scripts?)

Ideally I would use a timestamp as part of the folder name in combination with the caller id, but I ran into two issues here.

  • I first tried to start the monitor from within the php file using AGI, but with no luck. I did not spend too much time on that, but it still seems doable by including the Asterisk manager script and using its Monitor function. Will try next time.
  • After that failure, I thought I could start Monitor in the dial plan, outside the php script, and pass the path of the monitor recording to the php script for further processing. However, this was not that straightforward. I could not get the AGI() application in the dial plan to pass anything beyond the php script file name (AGI() is documented to accept extra arguments after the script path, so this was probably a syntax problem on my side). Another possible workaround would be passing the param in a querystring and reading it with $_GET in php. I cannot remember whether that worked for me or not, but anyway I didn’t go in that direction.

I didn’t want to implement another protocol between the dial plan and php either, so I just defined an arbitrary naming convention: the monitor recording is stored under ~/tmp/[callerid].

CUT THE FILE

The second step was to create a loop in the php script that checks and analyses the recorded audio as user input every ten seconds, so the bot can give reasonable feedback. But we were not that far yet. First we did some research on ways to analyse only the short clip of the audio file that we’re interested in (the last 10 seconds). Several feasible solutions:

  • shntool, pretty easy to use and to embed in the php code, which is what we used for this project. We called shntool to cut the file on demand in each iteration with the following command:

    shntool split -O always -f [cue file name] -d [output folder] [input file name]

    The cue file was generated automatically from the total elapsed time and the most recent time period of interest, in the following format:

    INDEX 01 00:10
    INDEX 02 00:20

    Detailed documentation of shntool.

  • sox, which we realized only after we had finished almost everything can do this even more simply, using the trim effect as follows:

    sox [source file name] [target file name] trim [SECOND TO START] [SECONDS DURATION]

    We haven’t tried this yet, just putting it here for documentation.
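
The "last 10 seconds" arithmetic behind both options can be sketched in a few lines. This is only a Python illustration (our real code was php); the helper names and the 10-second default are made up for the example. It only builds the cue lines and the sox argument list, it does not run either tool.

```python
def last_window(elapsed_s, window_s=10):
    """Return (start, duration) in whole seconds for the most recent window."""
    start = max(0, elapsed_s - window_s)  # clamp while the call is still short
    return start, elapsed_s - start


def mmss(seconds):
    """Format whole seconds as mm:ss, the form used in the cue lines above."""
    return f"{seconds // 60:02d}:{seconds % 60:02d}"


def cue_lines(elapsed_s, window_s=10):
    """Two INDEX lines marking the start and end of the window (hypothetical helper)."""
    start, duration = last_window(elapsed_s, window_s)
    return [f"INDEX 01 {mmss(start)}", f"INDEX 02 {mmss(start + duration)}"]


def sox_trim_args(src, dst, elapsed_s, window_s=10):
    """Argument list for: sox src dst trim START DURATION (hypothetical helper)."""
    start, duration = last_window(elapsed_s, window_s)
    return ["sox", src, dst, "trim", str(start), str(duration)]


if __name__ == "__main__":
    print("\n".join(cue_lines(20)))
    print(" ".join(sox_trim_args("monitor-in.wav", "chunk.wav", 75)))
```

The argument list could be handed to something like subprocess.run, but since we never tried the sox route this part is untested against a live call.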

GET THE STAT

Another exciting feature we found in sox is that it can run a DFT on the sound data. The command is also simple:

sox [sound file name] -e stat -freq

Using the stat effect without the -freq option gives brief info which is actually good enough for basic level detection. Adding -freq prints out the DFT data. One more thing: it prints to stderr by default, so if you want it directed to a log file you need to call:

sox [sound file name] -e stat -freq -v 2> [log file name]
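
Once the stat output is in a log file, it has to be turned into numbers before any threshold can be applied. Here is a rough Python sketch of that parsing step (again, our real code was php). The function name is made up, the sample text uses invented values, and its "label: value" layout is how I remember the stat output looking, so treat the field labels as an assumption. As far as I know, newer sox releases spell the "no output file" option -n rather than -e.

```python
# Invented sample in the "label: value" layout assumed for sox stat output.
SAMPLE = """\
Samples read:             80000
Length (seconds):     10.000000
Maximum amplitude:     0.412000
Minimum amplitude:    -0.398000
Mean    norm:          0.051000
RMS     amplitude:     0.083000
"""


def parse_sox_stat(text):
    """Parse 'label: value' lines into a dict keyed by lower-cased label."""
    stats = {}
    for line in text.splitlines():
        label, sep, value = line.rpartition(":")
        if not sep:
            continue  # e.g. the frequency/power pairs printed by -freq
        try:
            # Collapse runs of spaces so "Mean    norm" becomes "mean norm".
            stats[" ".join(label.split()).lower()] = float(value)
        except ValueError:
            continue  # skip lines whose value is not numeric
    return stats


if __name__ == "__main__":
    stats = parse_sox_stat(SAMPLE)
    print(stats["maximum amplitude"], stats["rms amplitude"])
```

Lines without a colon are skipped, so the same function can be pointed at a log written with or without -freq.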

We were not sure about the DFT data we got, and looking at the numbers alone was not enough for us to determine the volume thresholds. We wanted to know the typical output for murmuring, normal talking and probably yelling, so we built a simple Processing sketch to verify these outputs.

[Screenshot: Processing sketch visualizing the sox stat output]

As shown in the screenshot, the Processing sketch polls the stat file created for each short chunk of the audio file. We failed to get a perfectly real-time reading in Processing, since each file needs a little time to be trimmed and analysed. We also hit a weird problem, which I cannot describe well in English right now, where the AGI script would halt if it tried to access a just-generated file too hastily. Anyway, the shortest latency we got was a visualization running 5 seconds behind the telephone conversation.
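
One guess (and it is only a guess, we never found the cause) is that the script sometimes read a chunk before it was completely written. A possible guard is to wait until the file size stops changing before touching it. The sketch below is Python rather than our php, and the function name, poll interval and timeout are arbitrary values chosen for illustration.

```python
import os
import time


def wait_until_stable(path, checks=2, interval_s=0.2, timeout_s=5.0):
    """Return True once the file is non-empty and its size is unchanged
    for `checks` consecutive polls; return False if the timeout passes."""
    deadline = time.monotonic() + timeout_s
    last_size, stable = -1, 0
    while time.monotonic() < deadline:
        try:
            size = os.path.getsize(path)
        except OSError:
            size = -1  # file does not exist yet
        if size > 0 and size == last_size:
            stable += 1
            if stable >= checks:
                return True
        else:
            stable = 0  # size changed (or file missing), start counting again
        last_size = size
        time.sleep(interval_s)
    return False
```

This adds a little more delay on top of the trimming time, so it trades latency for not reading half-written files.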

MOODS

The Processing sketch proved that the stat info from sox was valid and arrived within an acceptable delay, so we moved on to define different “mood” states for the answering bot. We wrote down the sound effects that we wanted to create and broke them into over ten categories, including 3 different greeting modes, 3 levels of “interested” mode, 3 levels of “idle” mode and a frustrated mode which hangs up the phone. We called that last one a BOT HANGUP.

The actual recording of these sound effects didn’t take very long and was pretty fun to do. We ended up with 58 different responses for different contexts. Li recorded almost all the sounds; hopefully he enjoys talking to himself!

STATE MACHINE

The transitions between moods were implemented as a simple state machine that looks at the current mood state and the current context (plus some of its history), which contains the duration of the previously recorded audio piece and its average, minimum and maximum amplitude. There’s a whole bunch more information available through the DFT, but we didn’t know enough about sound to push it further.

The basic idea is that the harder the caller talks, the more interested the bot is supposed to be. If the caller stops talking, the bot falls into idle mode and eventually goes to frustrated mode and hangs up. I guess we did pretty well on some of the transitions, but it did happen that a call might get stuck in one of these states. Sometimes the split audio file just would not update, and neither would the stat file. We haven’t figured out why.
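
To make the idea concrete, here is a stripped-down Python sketch of this kind of state machine. It is not our php code: it only looks at the maximum amplitude of the latest chunk (the real one also used duration, average and minimum, plus some history), and the thresholds and names are hypothetical values picked for the example.

```python
from dataclasses import dataclass

# Hypothetical thresholds on a 0..1 amplitude scale; the numbers we actually
# tuned by hand are not reproduced here.
QUIET_MAX = 0.05
LOUD_MIN = 0.40
MAX_LEVEL = 3  # three levels each of "interested" and "idle"


@dataclass
class Mood:
    kind: str = "greeting"  # greeting, interested, idle, or frustrated
    level: int = 1


def next_mood(mood, max_amplitude):
    """Pick the next mood from the current mood and the last chunk's peak level."""
    if mood.kind == "frustrated":
        return mood  # terminal state: the bot hangs up
    if max_amplitude < QUIET_MAX:  # caller has gone silent
        if mood.kind != "idle":
            return Mood("idle", 1)
        if mood.level < MAX_LEVEL:
            return Mood("idle", mood.level + 1)
        return Mood("frustrated", 1)
    if max_amplitude >= LOUD_MIN:  # caller is talking hard
        if mood.kind != "interested":
            return Mood("interested", 1)
        return Mood("interested", min(MAX_LEVEL, mood.level + 1))
    return Mood("interested", 1)  # ordinary talking: mildly interested


if __name__ == "__main__":
    mood = Mood()
    for peak in (0.5, 0.6, 0.0, 0.0, 0.0, 0.0):
        mood = next_mood(mood, peak)
        print(mood)
```

With these rules, four silent chunks in a row take the bot from greeting through the three idle levels to the hangup.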

TESTS

We tried to trash-talk to the bot, and read poems and random news to “him”. He got mad pretty easily, or sometimes stopped reacting sensibly at all. We tweaked the numbers in the state machine, and we read more news to him. We stopped once he was no longer getting too excited about poems. You know this kind of stuff can take forever.

We asked some friends to call. Our friends on the floor were pretty cool and tolerated its stupidity. Some of the others didn’t feel comfortable with it and did not want to talk. One of my friends hated it and told me it’s the dumbest thing ever, and I guess this was my fault.

CONCLUSION

I’m happy with what we did this week and excited that we made this framework work. The bot sounds pathetic sometimes, but I probably do want to call him when I cannot sleep. Making him smarter seems to be a pretty clear direction to go. We also got some really interesting ideas from Jorge: we might introduce some vague quotes that fit any conversation, or let him bring up sports or other suggestive topics that might help the conversation along. We would also like to build a web interface for people to retrieve their recordings the following day. The possible applications are endless, but I will definitely apply this to one of my previous projects, a dream sharing community website, to make submission via phone possible. I really love this class so far. Asterisk rocks!

If you would like to talk to the bot, or if you are bored enough to be reading my blog at midnight, call:

1.212.796.0961 x 193
