程式師世界 >> 編程語言 >> .NET網頁編程 >> C# >> C#基礎知識 >> 語音合成與識別技術在C#裡面地運用

語音合成與識別技術在C#裡面地運用

編輯：C#基礎知識

　　在.net中,對英文語音有較好的支持，但是對中文語音的支持還沒有加入進來，我們要想實現中文發音或中文語音識別，必需先安裝微軟的Speech Application SDK（SASDK），它的最新版本是 SAPI 5.1 他能夠識別中、日、英三種語言，你可以在這裡下載：http://www.microsoft.com/speech/download/sdk51/,需要安裝這兩個文件Speech SDK 5.1和5.1 Language Pack，其中5.1 Language Pack可以選擇安裝支持的語言。

　　安裝好以後，我們就可以開始進行語音程序的開發了，當然，在這之前我們需要把SAPI.dll通過如下圖所示添加到引用中

　　下面我們設計一個能夠朗讀中英文混合語言的類：

　　我們將用單例模式實現該類，類的代碼如下，我們將詳細解釋：

public class Speach
{
　private static Speach _Instance = null ;
　private SpeechLib.SpVoiceClass voice =null;
　private Speach()
　{
　　BuildSpeach() ;
　}

public static Speach instance()
{
　if (_Instance == null)
　　_Instance = new Speach() ;
　　return _Instance ;
}

private void SetChinaVoice()
{
　voice.Voice = voice.GetVoices(string.Empty,string.Empty).Item(0) ;
}

private void SetEnglishVoice()
{
　voice.Voice = voice.GetVoices(string.Empty,string.Empty).Item(1) ;
}

private void SpeakChina(string strSpeak)
{
　SetChinaVoice() ;
　Speak(strSpeak) ;
}

private void SpeakEnglishi(string strSpeak)
{
　SetEnglishVoice() ;
　Speak(strSpeak) ;
}

public void AnalyseSpeak(string strSpeak)
{
　int iCbeg = 0 ;
　int iEbeg = 0 ;
　bool IsChina = true ;
　for(int i=0;i<strSpeak.Length;i++)
　{
　　char chr = strSpeak[i] ;
　　if (IsChina)
　　{
　　　if (chr<=122&&chr>=65)
　　　{
　　　　int iLen = i - iCbeg ;
　　　　string strValue = strSpeak.Substring(iCbeg,iLen) ;
　　　　SpeakChina(strValue) ;
　　　　iEbeg = i ;
　　　　IsChina = false ;
　　　}
　　}
　　else
　　{
　　　if (chr>122||chr<65)
　　　{
　　　　int iLen = i - iEbeg ;
　　　　string strValue = strSpeak.Substring(iEbeg,iLen) ;
　　　　this.SpeakEnglishi(strValue) ;
　　　　iCbeg = i ;
　　　　IsChina = true ;
　　　}
　　}
　}//end for
　if (IsChina)
　{
　　int iLen = strSpeak.Length - iCbeg ;
　　string strValue = strSpeak.Substring(iCbeg,iLen) ;
　　SpeakChina(strValue) ;
　}
　else
　{
　　int iLen = strSpeak.Length - iEbeg ;
　　string strValue = strSpeak.Substring(iEbeg,iLen) ;
　　SpeakEnglishi(strValue) ;
　}
}

private void BuildSpeach()
{
　if (voice == null)
　　voice = new SpVoiceClass() ;
}

public int Volume
{
　get
　{
　　return voice.Volume ;
　}
　set
　{
　　voice.SetVolume((ushort)(value)) ;
　}
}

public int Rate
{
　get
　{
　　return voice.Rate ;
　}
　set
　{
　　voice.SetRate(value) ;
　}
}

private void Speak(string strSpeack)
{
　try
　{
　　voice.Speak(strSpeack,SpeechVoiceSpeakFlags.SVSFlagsAsync) ;
　}
　catch(Exception err)
　{
　　throw(new Exception("發生一個錯誤："+err.Message)) ;
　}
}

public void Stop()
{
　voice.Speak(string.Empty,SpeechLib.SpeechVoiceSpeakFlags.SVSFPurgeBeforeSpeak) ;
}

public void Pause()
{
　voice.Pause() ;
}

public void Continue()
{
　voice.Resume() ;
}

}//end class

　　在 private SpeechLib.SpVoiceClass voice =null;這裡，我們定義個一個用來發音的類，並且在第一次調用該類時，對它用BuildSpeach方法進行了初始化。

　　我們還定義了兩個屬性Volume和Rate，能夠設置音量和語速。

　　我們知道，SpVoiceClass 有一個Speak方法，我們發音主要就是給他傳遞一個字符串，它負責讀出該字符串，如下所示。

private void Speak(string strSpeack)
{
　try
　{
　　voice.Speak(strSpeack,SpeechVoiceSpeakFlags.SVSFlagsAsync) ;
　}
　catch(Exception err)
　{
　　throw(new Exception("發生一個錯誤："+err.Message)) ;
　}
}

　　其中SpeechVoiceSpeakFlags.SVSFlagsAsync表示異步發音。

　　但是，這個方法本身並不知道你給的字符串是什麼語言，所以需要我們它這個字符串用什麼語言讀出。SpVoiceClass 類的Voice 屬性就是用來設置語種的，我們可以通過SpVoiceClass 的GetVoices方法得到所有的語種列表，然後在根據參數選擇相應的語種，比如設置語種為漢語如下所示：

private void SetChinaVoice()
{
　voice.Voice = voice.GetVoices(string.Empty,string.Empty).Item(0) ;
}

　　0表示是漢用，1234都表示英語，就是口音不同。

　　這樣，我們就設置了語種，如果結合發音方法，我們就可以設計出一個只發漢語語音的方法

private void SpeakChina(string strSpeak)
{
　SetChinaVoice() ;
　Speak(strSpeak) ;
}
　　只發英語語音的方法也是類似的，上面程序裡有。

　　對於一段中英文混合的語言，我們讓程序讀出混合語音的方法就是：編程把這段語言的中英文分開，對於中文調用SpeakChina方法，英文調用SpeakEnglishi方法；至於怎樣判斷一個字符是英文還是中文，我采用的是判斷asc碼的方法，具體的類方法是通過AnalyseSpeak實現的。

　　這樣，對於一段中英文混合文字，我們只需把它作為參數傳遞給AnalyseSpeak就可以了，他能夠完成中英文的混合發音。

　　當然，對於發音的暫定、繼續、停止等操作，上面也給出了簡單的方法調用，很容易明白。

　　下面簡單介紹一下中文語音識別的方法：

　　先把該語音識別的類源代碼貼在下面，然後再做說明：

public class SpRecognition
{
　private static SpRecognition _Instance = null ;
　private SpeechLib.ISpeechRecoGrammar isrg ;
　private SpeechLib.SpSharedRecoContextClass ssrContex =null;
　private System.Windows.Forms.Control cDisplay ;
　private SpRecognition()
　{
　　ssrContex = new SpSharedRecoContextClass() ;
　　isrg = ssrContex.CreateGrammar(1) ;
　　SpeechLib._ISpeechRecoContextEvents_RecognitionEventHandler recHandle = new _ISpeechRecoContextEvents_RecognitionEventHandler(ContexRecognition) ;
　　ssrContex.Recognition += recHandle ;
　}

　public void BeginRec(Control tbResult)
　{
　　isrg.DictationSetState(SpeechRuleState.SGDSActive) ;
　　cDisplay = tbResult ;
　}
　public static SpRecognition instance()
　{
　　if (_Instance == null)
　　　_Instance = new SpRecognition() ;
　　　return _Instance ;
　}

　public void CloseRec()
　{
　　isrg.DictationSetState(SpeechRuleState.SGDSInactive) ;
　}
　private void ContexRecognition(int iIndex,object obj,SpeechLib.SpeechRecognitionType type,SpeechLib.ISpeechRecoResult result)
　{
　　cDisplay.Text += result.PhraseInfo.GetText(0,-1,true) ;
　}
}

　　我們定義了ssrContex 和isrg為語音識別的上下文和語法，通過設置isrg的DictationSetState方法，我們可以開始或結束識別，在上面的程序中是BeginRec和CloseRec方法。cDisplay 是我們用來輸出識別結果的地方，為了能夠在大部分控件上都可以顯示結果，我用了一個Control 類來定義它。當然，每次語音識別後都會觸發ISpeechRecoContextEvents_RecognitionEventHandler 事件，我們定義了一個這樣的方法ContexRecognition來響應事件，並且在這個方法裡輸出識別結果。

　　這樣，中文語音處理的一些最基本的問題就有了一個簡單的解決方法，當然，這種方法還有很多不完善的地方，希望大家多提出批評意見，共同提高。