FreeSWITCH: mod_dptools: detect_speech

Implements speech recognition.

Usage

 detect_speech <mod_name> <gram_name> <gram_path> [<addr>]
 detect_speech grammar <gram_name> [<path>]
 detect_speech grammaron <gram_name>
 detect_speech grammaroff <gram_name>
 detect_speech grammarsalloff
 detect_speech nogrammar <gram_name>
 detect_speech param <name> <value>
 detect_speech pause
 detect_speech resume
 detect_speech start_input_timers
 detect_speech stop
 

Examples

-- Not recognized 1
local sounds_dir = "/usr/local/freeswitch/sounds/ivr3/";
local ivr_dir = "/usr/local/freeswitch/scripts/ivr3/";
 
dofile("/usr/local/freeswitch/scripts/detect.lua");
 
trans = {
   ["NOT_RECOG_1"] = ivr_dir.."divr_notrecog_1.lua",
   ["NOT_RECOG_2"] = ivr_dir.."divr_notrecog_2.lua",
   --no
   ["нет"] = ivr_dir .. "divr_no_definitely_1.lua",
   --not recog
   ["нетда"] = ivr_dir.."divr_notrecog_1.lua",
   --yes
   ["да"] = ivr_dir.."divr_yes_info.lua",
}
 
local message1 = sounds_dir.."notrecog_1.wav";
-- Create an empty table to store the results.
results = {};
 
session:setInputCallback("onInput");
session:sleep(200);
-- Play the greeting message.
session:streamFile(message1);
-- Enable recognition and specify the grammar.
session:execute("detect_speech", "pocketsphinx yesno yesno");
 
while (session:ready() == true) do 
   session:sleep(3000);
   session:sleep(3000);
   if ( results.text ~= nil ) then
      session:execute("detect_speech", "stop");
      freeswitch.consoleLog("info",dump(results));
      --results.text, results.score
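      -- Treat a missing or non-numeric score as 0 so the comparison below cannot fail on nil.
      local score = tonumber(results.score) or 0;
      local ftext = results.text:gsub("%s+", "");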
      if (score > 0) then
         results = {};
         if (tableHasKey(trans,ftext) ~= false) then
            session:execute("lua", trans[ftext]);
         else
            session:execute("lua", trans["NOT_RECOG_1"]);
         end
      else
         results = {};
         session:execute("lua", trans["NOT_RECOG_1"]);
      end
   else
      results = {};
      session:execute("detect_speech", "resume");
      session:execute("lua", trans["NOT_RECOG_1"]);
   end
end
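
The script above calls onInput, dump and tableHasKey, which it loads from detect.lua; that file is not shown on this page. The following is a minimal sketch of what such helpers might look like, not the actual contents of detect.lua. The function getResults is a hypothetical name, the pattern-based XML extraction assumes the flat result layout shown in the DETECTED_SPEECH example further down, and reading either a confidence or a score attribute is an assumption about what the recognizer emits.

```lua
-- Hypothetical stand-ins for the helpers the script expects from detect.lua.

-- Return true if the table has an entry under the given key.
function tableHasKey(tbl, key)
   return tbl[key] ~= nil
end

-- Render a flat table as sorted "key=value" pairs for logging.
function dump(tbl)
   local parts = {}
   for k, v in pairs(tbl) do
      parts[#parts + 1] = tostring(k) .. "=" .. tostring(v)
   end
   table.sort(parts)
   return "{" .. table.concat(parts, ", ") .. "}\n"
end

-- Pull the recognized text and its score out of the result XML.
-- Assumption: the score is in a "confidence" or "score" attribute of
-- <interpretation>. A real XML parser would be more robust.
function getResults(body)
   local res = {}
   res.score = body:match('<interpretation[^>]-confidence="([^"]*)"')
            or body:match('<interpretation[^>]-score="([^"]*)"')
   res.text = body:match("<input[^>]*>(.-)</input>")
   return res
end

-- Input callback: on a detected-speech event, store the parsed result in
-- the global `results` table and interrupt the current sleep or playback.
function onInput(s, kind, obj)
   if kind == "event" and obj:getHeader("Speech-Type") == "detected-speech" then
      local body = obj:getBody()
      if body then
         results = getResults(body)
         return "break"
      end
   end
   return ""
end
```

With helpers along these lines, the main loop sees results.text become non-nil once the recognizer reports a phrase, and the lookup in trans decides which script runs next.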

Start recognition and assign the grammar in a single message:

 SendMsg e2d1c628-f32c-4497-b813-7474ce406317
 call-command: execute
 execute-app-name: detect_speech
 execute-app-arg: pocketsphinx yesno yesno
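
A client that talks to the event socket has to assemble this message as text. The sketch below builds it from its parts; build_execute is a hypothetical helper name, and opening the socket and writing the string to it are left out.

```lua
-- Build an event socket "sendmsg" request that executes a dialplan
-- application on the channel identified by uuid.
function build_execute(uuid, app, arg)
   local lines = {
      "sendmsg " .. uuid,
      "call-command: execute",
      "execute-app-name: " .. app,
   }
   if arg and arg ~= "" then
      lines[#lines + 1] = "execute-app-arg: " .. arg
   end
   -- A blank line terminates the message.
   return table.concat(lines, "\n") .. "\n\n"
end

local msg = build_execute("e2d1c628-f32c-4497-b813-7474ce406317",
                          "detect_speech", "pocketsphinx yesno yesno")
io.write(msg)
```

The same helper covers the later resume and stop messages by passing "resume" or "stop" as the argument.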
 

You should see DETECTED_SPEECH events with "Speech-Type: begin-speaking" when the recognizer notices the start of speech. For example (using "plain" events):

 Content-Length: 1605
 Content-Type: text/event-plain
  
 Event-Name: DETECTED_SPEECH
 Core-UUID: 6213bbdd-5801-4aeb-b1db-b94a47b0188d
 FreeSWITCH-Hostname: vm1
 FreeSWITCH-IPv4: 192.168.1.241
 FreeSWITCH-IPv6: %3A%3A1
 Event-Date-Local: 2010-03-09%2010%3A39%3A48
 Event-Date-GMT: Tue,%2009%20Mar%202010%2015%3A39%3A48%20GMT
 Event-Date-Timestamp: 1268149188380725
 Event-Calling-File: switch_ivr_async.c
 Event-Calling-Function: speech_thread
 Event-Calling-Line-Number: 2430
 Speech-Type: begin-speaking
 Channel-State: CS_EXECUTE
 Channel-State-Number: 4
 Channel-Name: sofia/internal/sip%3A1000%40192.168.1.104
 Unique-ID: e2d1c628-f32c-4497-b813-7474ce406317
 Call-Direction: outbound
 Presence-Call-Direction: outbound
 Channel-Presence-ID: 1000%40192.168.1.241
 Answer-State: answered
 Channel-Read-Codec-Name: PCMU
 Channel-Read-Codec-Rate: 8000
 Channel-Write-Codec-Name: PCMU
 Channel-Write-Codec-Rate: 8000
 Caller-Username: 1001
 Caller-Dialplan: inline
 Caller-Caller-ID-Name: Extension%201001
 Caller-Caller-ID-Number: 1001
 Caller-Network-Addr: 192.168.1.104
 Caller-ANI: 1001
 Caller-Destination-Number: 1000
 Caller-Unique-ID: e2d1c628-f32c-4497-b813-7474ce406317
 Caller-Source: mod_sofia
 Caller-Context: default
 Caller-Channel-Name: sofia/internal/sip%3A1000%40192.168.1.104
 Caller-Profile-Index: 2
 Caller-Profile-Created-Time: 1268149185069331
 Caller-Channel-Created-Time: 1268149168974894
 Caller-Channel-Answered-Time: 1268149169744923
 Caller-Channel-Progress-Time: 1268149169164940
 Caller-Channel-Progress-Media-Time: 0
 Caller-Channel-Hangup-Time: 0
 Caller-Channel-Transfer-Time: 0
 Caller-Screen-Bit: true
 Caller-Privacy-Hide-Name: false
 Caller-Privacy-Hide-Number: false
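
Header values in plain events are percent-encoded, as the Channel-Name and date fields above show. A minimal sketch of reading such an event into a Lua table follows; url_decode and parse_headers are hypothetical helper names, and the sample is a shortened copy of the event above.

```lua
-- Decode %XX escapes used in text/event-plain header values.
function url_decode(s)
   return (s:gsub("%%(%x%x)", function(hex)
      return string.char(tonumber(hex, 16))
   end))
end

-- Parse "Name: value" lines into a table, stopping at the first blank line.
function parse_headers(text)
   local headers = {}
   for line in (text .. "\n"):gmatch("(.-)\r?\n") do
      if line == "" then break end
      local name, value = line:match("^%s*([^:]+):%s*(.*)$")
      if name then
         headers[name] = url_decode(value)
      end
   end
   return headers
end

local sample = table.concat({
   "Event-Name: DETECTED_SPEECH",
   "Speech-Type: begin-speaking",
   "Channel-Name: sofia/internal/sip%3A1000%40192.168.1.104",
   "Event-Date-Local: 2010-03-09%2010%3A39%3A48",
}, "\n")

local ev = parse_headers(sample)
print(ev["Speech-Type"])    --> begin-speaking
print(ev["Channel-Name"])   --> sofia/internal/sip:1000@192.168.1.104
```

A client would typically branch on the decoded Speech-Type value to tell begin-speaking from detected-speech.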

If recognition is successful, you should also see a DETECTED_SPEECH event with "Speech-Type: detected-speech" and some XML describing what was detected. For example:

 Content-Length: 1791
 Content-Type: text/event-plain
  
 Event-Name: DETECTED_SPEECH
 Core-UUID: 6213bbdd-5801-4aeb-b1db-b94a47b0188d
 FreeSWITCH-Hostname: vm1
 FreeSWITCH-IPv4: 192.168.1.241
 FreeSWITCH-IPv6: %3A%3A1
 Event-Date-Local: 2010-03-09%2010%3A39%3A49
 Event-Date-GMT: Tue,%2009%20Mar%202010%2015%3A39%3A49%20GMT
 Event-Date-Timestamp: 1268149189731224
 Event-Calling-File: switch_ivr_async.c
 Event-Calling-Function: speech_thread
 Event-Calling-Line-Number: 2430
 Speech-Type: detected-speech
 Channel-State: CS_EXECUTE
 Channel-State-Number: 4
 Channel-Name: sofia/internal/sip%3A1000%40192.168.1.104
 Unique-ID: e2d1c628-f32c-4497-b813-7474ce406317
 Call-Direction: outbound
 Presence-Call-Direction: outbound
 Channel-Presence-ID: 1000%40192.168.1.241
 Answer-State: answered
 Channel-Read-Codec-Name: PCMU
 Channel-Read-Codec-Rate: 8000
 Channel-Write-Codec-Name: PCMU
 Channel-Write-Codec-Rate: 8000
 Caller-Username: 1001
 Caller-Dialplan: inline
 Caller-Caller-ID-Name: Extension%201001
 Caller-Caller-ID-Number: 1001
 Caller-Network-Addr: 192.168.1.104
 Caller-ANI: 1001
 Caller-Destination-Number: 1000
 Caller-Unique-ID: e2d1c628-f32c-4497-b813-7474ce406317
 Caller-Source: mod_sofia
 Caller-Context: default
 Caller-Channel-Name: sofia/internal/sip%3A1000%40192.168.1.104
 Caller-Profile-Index: 2
 Caller-Profile-Created-Time: 1268149185069331
 Caller-Channel-Created-Time: 1268149168974894
 Caller-Channel-Answered-Time: 1268149169744923
 Caller-Channel-Progress-Time: 1268149169164940
 Caller-Channel-Progress-Media-Time: 0
 Caller-Channel-Hangup-Time: 0
 Caller-Channel-Transfer-Time: 0
 Caller-Screen-Bit: true
 Caller-Privacy-Hide-Name: false
 Caller-Privacy-Hide-Number: false
 Content-Length: 165
  
 <?xml version="1.0"?>
 <result grammar="holdr">
 <interpretation grammar="yesno" confidence="98">
 <input mode="speech">YES</input>
 </interpretation>
 </result>

Note: the XML body at the end, which carries the result, has a Content-Length of 165. Those bytes are included in the overall Content-Length of 1791 at the top of the event.
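
The nesting of the two Content-Length values can be sketched as follows. The outer length frames the whole event payload; the inner one, found among the event headers, gives the size of the body that follows the blank line. split_event is a hypothetical helper name, and the payload here is a shortened synthetic example rather than the full event above.

```lua
-- Split an event payload into its header block and its body.
-- The body length comes from the Content-Length header inside the payload.
function split_event(payload)
   local sep_start, sep_end = payload:find("\n\n", 1, true)
   if not sep_start then
      return payload, nil          -- no blank line, so no body
   end
   local head = payload:sub(1, sep_start - 1)
   local length = tonumber(head:match("Content%-Length:%s*(%d+)"))
   if not length then
      return head, nil             -- headers only
   end
   local body = payload:sub(sep_end + 1, sep_end + length)
   return head, body
end

local xml = '<result><input mode="speech">YES</input></result>'
local payload = "Event-Name: DETECTED_SPEECH\n"
             .. "Speech-Type: detected-speech\n"
             .. "Content-Length: " .. #xml .. "\n\n"
             .. xml

local head, body = split_event(payload)
print(body == xml)   --> true
```

Here #payload plays the role of the outer Content-Length and #xml that of the inner one, so the outer value is always the larger of the two.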

It is common to play prompts while detecting speech. Making a change like this to the media will pause the recognizer. For example, if you start to play a file:

 SendMsg ad375c14-ba41-46c8-b800-4aa2ef295bba
 call-command: execute
 execute-app-name: playback
 execute-app-arg: say-yes-or-no.wav

you should immediately resume the recognizer:

 SendMsg e2d1c628-f32c-4497-b813-7474ce406317
 call-command: execute
 execute-app-name: detect_speech
 execute-app-arg: resume

Recognition will happen while the file is playing. You will need to have divert_events on to receive the ASR events while the file is being played.

Each start of the recognizer detects only one phrase, so if you want somewhat continuous recognition, you will need to resume the recognizer after each successful recognition as well.
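
The resume-after-each-phrase rule can be expressed as a small decision function. This is only a sketch: next_detect_speech_arg and the keep_listening flag are hypothetical names for logic the application would own.

```lua
-- Decide which detect_speech argument to send after a DETECTED_SPEECH event.
-- Returns nil when nothing needs to be sent.
function next_detect_speech_arg(speech_type, keep_listening)
   if speech_type ~= "detected-speech" then
      return nil                   -- e.g. begin-speaking: recognizer still active
   end
   if keep_listening then
      return "resume"              -- re-arm the recognizer for the next phrase
   end
   return "stop"                   -- done: release the recognizer
end

print(next_detect_speech_arg("detected-speech", true))   --> resume
print(next_detect_speech_arg("detected-speech", false))  --> stop
```

The returned string is what would go into execute-app-arg of the next detect_speech message.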

When you are done, you'll want to stop the recognizer to save precious CPU cycles:

 SendMsg e2d1c628-f32c-4497-b813-7474ce406317
 call-command: execute
 execute-app-name: detect_speech
 execute-app-arg: stop

See also

 

  • freeswitch/mod/mod_dptools/detect_speech.txt
  • Last modified: 2022/01/11