About

detect_speech implements speech recognition on a channel.

Usage

    detect_speech <mod_name> <gram_name> <gram_path> [<addr>]
    detect_speech grammar <gram_name> [<path>]
    detect_speech grammaron <gram_name>
    detect_speech grammaroff <gram_name>
    detect_speech grammarsalloff
    detect_speech nogrammar <gram_name>
    detect_speech param <name> <value>
    detect_speech pause
    detect_speech resume
    detect_speech start_input_timers
    detect_speech stop

Examples

Start the recognizer and select a grammar in one shot:

    SendMsg e2d1c628-f32c-4497-b813-7474ce406317
    call-command: execute
    execute-app-name: detect_speech
    execute-app-arg: pocketsphinx yesno yesno
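
The message above can also be assembled programmatically. The following is a minimal Python sketch, not an official client: the helper name `build_sendmsg` is hypothetical, and the lower-case `sendmsg` spelling and the blank line that ends the frame are assumptions about the usual event socket framing.

```python
def build_sendmsg(uuid: str, app: str, arg: str = "") -> str:
    """Build an event socket frame that executes a dialplan app on a channel.

    Hypothetical helper for illustration; it only formats text and does not
    open a socket.
    """
    lines = [
        f"sendmsg {uuid}",
        "call-command: execute",
        f"execute-app-name: {app}",
    ]
    if arg:
        lines.append(f"execute-app-arg: {arg}")
    # Assumption: an empty line terminates the frame.
    return "\n".join(lines) + "\n\n"


frame = build_sendmsg(
    "e2d1c628-f32c-4497-b813-7474ce406317",
    "detect_speech",
    "pocketsphinx yesno yesno",
)
print(frame, end="")
```

The same helper covers the later examples on this page by passing `resume` or `stop` as the argument.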

You should see DETECTED_SPEECH events with "Speech-Type: begin-speaking" when the recognizer notices the start of speech. For example, using "plain" events:

    Content-Length: 1605
    Content-Type: text/event-plain

    Event-Name: DETECTED_SPEECH
    Core-UUID: 6213bbdd-5801-4aeb-b1db-b94a47b0188d
    FreeSWITCH-Hostname: vm1
    FreeSWITCH-IPv4: 192.168.1.241
    FreeSWITCH-IPv6: %3A%3A1
    Event-Date-Local: 2010-03-09%2010%3A39%3A48
    Event-Date-GMT: Tue,%2009%20Mar%202010%2015%3A39%3A48%20GMT
    Event-Date-Timestamp: 1268149188380725
    Event-Calling-File: switch_ivr_async.c
    Event-Calling-Function: speech_thread
    Event-Calling-Line-Number: 2430
    Speech-Type: begin-speaking
    Channel-State: CS_EXECUTE
    Channel-State-Number: 4
    Channel-Name: sofia/internal/sip%3A1000%40192.168.1.104
    Unique-ID: e2d1c628-f32c-4497-b813-7474ce406317
    Call-Direction: outbound
    Presence-Call-Direction: outbound
    Channel-Presence-ID: 1000%40192.168.1.241
    Answer-State: answered
    Channel-Read-Codec-Name: PCMU
    Channel-Read-Codec-Rate: 8000
    Channel-Write-Codec-Name: PCMU
    Channel-Write-Codec-Rate: 8000
    Caller-Username: 1001
    Caller-Dialplan: inline
    Caller-Caller-ID-Name: Extension%201001
    Caller-Caller-ID-Number: 1001
    Caller-Network-Addr: 192.168.1.104
    Caller-ANI: 1001
    Caller-Destination-Number: 1000
    Caller-Unique-ID: e2d1c628-f32c-4497-b813-7474ce406317
    Caller-Source: mod_sofia
    Caller-Context: default
    Caller-Channel-Name: sofia/internal/sip%3A1000%40192.168.1.104
    Caller-Profile-Index: 2
    Caller-Profile-Created-Time: 1268149185069331
    Caller-Channel-Created-Time: 1268149168974894
    Caller-Channel-Answered-Time: 1268149169744923
    Caller-Channel-Progress-Time: 1268149169164940
    Caller-Channel-Progress-Media-Time: 0
    Caller-Channel-Hangup-Time: 0
    Caller-Channel-Transfer-Time: 0
    Caller-Screen-Bit: true
    Caller-Privacy-Hide-Name: false
    Caller-Privacy-Hide-Number: false

If recognition is successful, you should also see a DETECTED_SPEECH event with "Speech-Type: detected-speech" and some XML describing what was detected. For example:

    Content-Length: 1791
    Content-Type: text/event-plain

    Event-Name: DETECTED_SPEECH
    Core-UUID: 6213bbdd-5801-4aeb-b1db-b94a47b0188d
    FreeSWITCH-Hostname: vm1
    FreeSWITCH-IPv4: 192.168.1.241
    FreeSWITCH-IPv6: %3A%3A1
    Event-Date-Local: 2010-03-09%2010%3A39%3A49
    Event-Date-GMT: Tue,%2009%20Mar%202010%2015%3A39%3A49%20GMT
    Event-Date-Timestamp: 1268149189731224
    Event-Calling-File: switch_ivr_async.c
    Event-Calling-Function: speech_thread
    Event-Calling-Line-Number: 2430
    Speech-Type: detected-speech
    Channel-State: CS_EXECUTE
    Channel-State-Number: 4
    Channel-Name: sofia/internal/sip%3A1000%40192.168.1.104
    Unique-ID: e2d1c628-f32c-4497-b813-7474ce406317
    Call-Direction: outbound
    Presence-Call-Direction: outbound
    Channel-Presence-ID: 1000%40192.168.1.241
    Answer-State: answered
    Channel-Read-Codec-Name: PCMU
    Channel-Read-Codec-Rate: 8000
    Channel-Write-Codec-Name: PCMU
    Channel-Write-Codec-Rate: 8000
    Caller-Username: 1001
    Caller-Dialplan: inline
    Caller-Caller-ID-Name: Extension%201001
    Caller-Caller-ID-Number: 1001
    Caller-Network-Addr: 192.168.1.104
    Caller-ANI: 1001
    Caller-Destination-Number: 1000
    Caller-Unique-ID: e2d1c628-f32c-4497-b813-7474ce406317
    Caller-Source: mod_sofia
    Caller-Context: default
    Caller-Channel-Name: sofia/internal/sip%3A1000%40192.168.1.104
    Caller-Profile-Index: 2
    Caller-Profile-Created-Time: 1268149185069331
    Caller-Channel-Created-Time: 1268149168974894
    Caller-Channel-Answered-Time: 1268149169744923
    Caller-Channel-Progress-Time: 1268149169164940
    Caller-Channel-Progress-Media-Time: 0
    Caller-Channel-Hangup-Time: 0
    Caller-Channel-Transfer-Time: 0
    Caller-Screen-Bit: true
    Caller-Privacy-Hide-Name: false
    Caller-Privacy-Hide-Number: false
    Content-Length: 165

    <?xml version="1.0"?>
    <result grammar="holdr">
    <interpretation grammar="yesno" confidence="98">
    <input mode="speech">YES</input>
    </interpretation>
    </result>

Note: The XML body carrying the recognition result has its own Content-Length of 165. That count is included in the overall Content-Length of 1791 at the start of the event.
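
To illustrate how the nested Content-Length and the XML result fit together, here is a minimal Python sketch that parses the event payload (the part after the outer Content-Length/Content-Type frame). The function names `parse_plain_event` and `recognized_input` and the shortened sample event are hypothetical; the sketch assumes header values are URL-encoded, as in the dumps above, and that the inner Content-Length counts bytes.

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote

# Shortened, illustrative copy of the XML result shown above.
SAMPLE_BODY = (
    b'<?xml version="1.0"?>\n'
    b'<result grammar="holdr">\n'
    b'<interpretation grammar="yesno" confidence="98">\n'
    b'<input mode="speech">YES</input>\n'
    b'</interpretation>\n'
    b'</result>\n'
)

# Shortened event payload: a few headers, the inner Content-Length, the body.
sample = (
    "Event-Name: DETECTED_SPEECH\n"
    "Speech-Type: detected-speech\n"
    "Channel-Name: sofia/internal/sip%3A1000%40192.168.1.104\n"
    f"Content-Length: {len(SAMPLE_BODY)}\n"
    "\n"
).encode("utf-8") + SAMPLE_BODY


def parse_plain_event(payload: bytes):
    """Split a plain event payload into URL-decoded headers and a raw body."""
    head, _, rest = payload.partition(b"\n\n")
    headers = {}
    for line in head.decode("utf-8").splitlines():
        name, _, value = line.partition(": ")
        headers[name] = unquote(value)
    # The inner Content-Length says how many bytes of body follow the headers.
    length = int(headers.get("Content-Length", 0))
    return headers, rest[:length]


def recognized_input(body: bytes):
    """Return (recognized text, confidence) from the XML result body."""
    interpretation = ET.fromstring(body).find("interpretation")
    return interpretation.find("input").text, int(interpretation.get("confidence"))


headers, body = parse_plain_event(sample)
if headers.get("Speech-Type") == "detected-speech":
    print(recognized_input(body))
```

Working on bytes rather than text keeps the slice aligned with the byte count even if a result contains non-ASCII characters.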

It is common to play prompts while detecting speech. Changing the media in this way pauses the recognizer. For example, if you start to play a file:

    SendMsg ad375c14-ba41-46c8-b800-4aa2ef295bba
    call-command: execute
    execute-app-name: playback
    execute-app-arg: say-yes-or-no.wav

you should immediately resume the recognizer:

    SendMsg e2d1c628-f32c-4497-b813-7474ce406317
    call-command: execute
    execute-app-name: detect_speech
    execute-app-arg: resume

Recognition then happens while the file is playing. You need divert_events turned on to receive the ASR events during playback.

Each start of the recognizer detects only one phrase, so if you want somewhat continuous recognition, you also need to resume the recognizer after each successful recognition.
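
The resume-after-each-result rule can be expressed as a small decision function. This is only a sketch: the name `follow_up` is hypothetical, and it assumes the event headers have already been parsed into a dictionary.

```python
from typing import Mapping, Optional


def follow_up(headers: Mapping[str, str]) -> Optional[str]:
    """Return the detect_speech argument to send next, or None to do nothing."""
    if headers.get("Event-Name") != "DETECTED_SPEECH":
        return None
    # A finished recognition stops the recognizer, so ask it to listen again.
    if headers.get("Speech-Type") == "detected-speech":
        return "resume"
    # begin-speaking only signals the start of speech; nothing to send yet.
    return None


done = {"Event-Name": "DETECTED_SPEECH", "Speech-Type": "detected-speech"}
started = {"Event-Name": "DETECTED_SPEECH", "Speech-Type": "begin-speaking"}
print(follow_up(done))     # resume
print(follow_up(started))  # None
```

The returned value would be sent as the execute-app-arg of a detect_speech message.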

When you are done, you'll want to stop the recognizer to save precious CPU cycles:

    SendMsg e2d1c628-f32c-4497-b813-7474ce406317
    call-command: execute
    execute-app-name: detect_speech
    execute-app-arg: stop

See Also

 

  • freeswitch/mod/mod_dptools/detect_speech.txt
  • Last modified: 2020/04/12