26.8 Voice Recognition System


26.8.1 Voice Recognition System Definition

The Voice Recognition System is a peripheral device for the N64 which makes it possible for words spoken by the user to be recognized during an N64 game. To use the system, insert the plug on the Voice Recognition Unit into the N64 controller port, thereby connecting the special microphone on the Voice Recognition Unit. This makes it possible to make the characters in the game move and respond by voice in addition to the conventional controller-only interface. This allows the game to proceed with a sense of "realism", for instance, allowing the player to give verbal commands which put secondary characters into action, while moving the main character with the controller.


26.8.2 Features of the Voice Recognition System

The main features of the Voice Recognition System are shown below.

Item                            Function
Voice recognition format        Semisyllabic voice recognition system *1
Language recognized             Japanese words (delineates one word by a 0.4 second silence after pronunciation is completed)
Speakers recognized             Any speaker (speaker needs no specific prior training)
Maximum registered words        Maximum 255 words (about 80 words at 5 syllables per word) *2
Characters per word             Maximum 17 characters
Text registration method        Enter using shift-JIS code
Maximum pronunciation length    About 10 seconds (any sound in excess of the maximum pronunciation length is processed as noise)
Recognition result output       Words closest to voice input are output ranked 1st ~ 5th

*1 Words are composed of syllables. Dividing a syllable into two parts at the center of its vowel yields two "semisyllables".
*2 Use the osVoiceCountSyllables() function, described later, to obtain the number of semisyllables in a word.


26.8.3 Voice Recognition System Configuration

The Voice Recognition System configuration is shown below. The system is used by inserting the plug on the Voice Recognition Unit into the N64 controller port, thereby connecting the special microphone on the Voice Recognition Unit. Since power is supplied by the N64, batteries are not needed.


26.8.4 Voice Recognition Structure


26.8.4.1 Voice Recognition Structure

Words to be recognized by the Voice Recognition System are placed in a registered word dictionary. The program can freely register words in this dictionary. From among the registered words, those determined to match the voice input most closely are output. The Voice Recognition structure is shown in the figure below. A description of each step follows.

(1) Registration of word data to dictionary

Input the words that are to be recognized using the SJIS code. The words which have been input are converted to the format necessary for voice recognition processing and registered in the dictionary.

(2) Voice input

The user's voice is input via the special microphone connected to the Voice Recognition System. The input voice is then converted into the format necessary for voice recognition processing.

(3) Comparison between input voice pattern and registered words

The voice input is compared with the patterns of the words registered in the dictionary and a distance value (a numeric value expressing how different the voice input is from the word to which it is being compared) is computed.

(4) Output of similar word ranking

The words from among those words registered in the dictionary with the smallest distance values are output ranked in order from 1st ~ 5th place.


26.8.4.2 Status When Voice Recognition is Running

Changes in status while voice recognition is running are explained below. There are 5 command statuses during voice recognition execution.

The processing flow is shown in the figure below. A description of each step follows.

The Get Recognition Results command can be executed when the status is VOICE_STATUS_END, or when it has moved from VOICE_STATUS_END to VOICE_STATUS_READY. If the Get Recognition Results command is executed while the status is VOICE_STATUS_END, the status switches to VOICE_STATUS_READY after the command completes. Once the status has switched to VOICE_STATUS_READY, the next Start Recognition command can be executed.

The variable which indicates the current status is stored in the voice recognition system control structure. Please see Section 26.8.6.1 "Initialize Voice Recognition System" for details.


26.8.5 Assembling the Voice Recognition System Program

Following is a simple example of the flow of a program for performing voice recognition.

First, initialize the Voice Recognition System. Next, initialize the registered word dictionary and register the words to be recognized. Once word registration is completed, the program moves to voice recognition processing. Once voice recognition has been started, voice input from the microphone can be acquired as words. Execute the Get Voice Recognition Results function to acquire a word.

The library functions for the Voice Recognition System which perform processing at each step of the flow are explained in Section 26.8.6. Detailed programming procedures, including error branching, are explained in Section 26.8.7.


26.8.6 Voice Recognition System Function Specifications

The library functions used when the Voice Recognition System is handled by an N64 program are explained below. There are a total of 10 Voice Recognition System-related functions.


26.8.6.1 Initialize Voice Recognition System

Function

osVoiceInit

Initialize Voice Recognition System control structure and hardware

Syntax

#include <ultra64.h>
s32 osVoiceInit(OSMesgQueue *siMessageQ, OSVoiceHandle *hd, int channel);

Arguments

siMessageQ
Initialized message queue associated with OS_EVENT_SI
hd
Voice Recognition System control structure
channel
Controller channel number

Description

The osVoiceInit() function initializes the Voice Recognition System. It initializes both the hardware and the Voice Recognition System control structure. Consequently, there is no need to initialize the hd structure on the application side. Call this function first when using the Voice Recognition System.

It is recommended that you check to see which device is connected to a particular port prior to initialization. Standard controllers and peripheral devices other than the Voice Recognition System may be inserted into the controller ports as well. This check can be accomplished with the osContStartQuery() function and the osContGetQuery() function. The Voice Recognition System is connected if the value of the member variable "errno" of the OSContStatus structure is 0 (zero), and if the AND (logical product) of the value for type and CONT_TYPE_MASK is CONT_TYPE_VOICE.
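The connection check described above can be sketched in plain C. This is a minimal illustration, not SDK code: the `Sketch...` struct and the two `SKETCH_` constants are stand-ins introduced here so the example compiles without `<ultra64.h>`, and their numeric values are assumptions. In a real program the data would come from osContGetQuery() and the constants from the SDK header.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in values so the sketch compiles without <ultra64.h>.
 * They are assumptions; real code should use the SDK definitions. */
#define SKETCH_CONT_TYPE_MASK  0x1f07
#define SKETCH_CONT_TYPE_VOICE 0x0100

/* Hypothetical mirror of the OSContStatus members named in the text. */
typedef struct {
    unsigned short type;    /* device type bits */
    unsigned char  status;  /* not used by this check */
    unsigned char  err_no;  /* stands in for the "errno" member */
} SketchContStatus;

/* Returns the first channel (0..nchannels-1) whose query result
 * identifies a Voice Recognition Unit, or -1 if none does. */
int sketch_find_voice_channel(const SketchContStatus *st, int nchannels)
{
    int ch;

    assert(st != NULL);
    for (ch = 0; ch < nchannels; ch++) {
        if (st[ch].err_no == 0 &&
            (st[ch].type & SKETCH_CONT_TYPE_MASK) == SKETCH_CONT_TYPE_VOICE) {
            return ch;
        }
    }
    return -1;
}
```

The returned channel number is the kind of value that would then be passed as the channel argument of osVoiceInit().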

siMessageQ is the message queue initialized in connection with OS_EVENT_SI. Please refer to the osSetEventMesg() function regarding how to establish this connection. channel is the channel number of the controller port to which the Voice Recognition Unit is connected. It is a value 0~3.

The Voice Recognition System control structure OSVoiceHandle is configured as follows:

typedef struct {
        OSMesgQueue *__mq;      /* SI message queue */
        int  __channel;     /* Controller port No. */
        s32 __mode;             /* Used within the OS */
        u8 cmd_status;          /* Command status */
} OSVoiceHandle;

Do not change the values of these various members in the application. In addition, the only member variable which is referred to and which has any meaning is cmd_status. The member variables other than cmd_status are used by the system and therefore do not need to be referred to by the application.

The member variable cmd_status indicates the voice recognition command status. Whenever the voice recognition library checks the command status, it stores the value in cmd_status. Specifically, the following function calls update the value.

osVoiceInit()
osVoiceClearDictionary()
osVoiceSetWord()
osVoiceMaskDictionary()
osVoiceStartReadData()
osVoiceStopReadData()
osVoiceGetReadData()

The following values can be handled by cmd_status. Please see Section 26.8.4.2 "Status When Voice Recognition is Running" for details on each status.

Definition Name        Value   Description
VOICE_STATUS_READY     0       Stop/End
VOICE_STATUS_START     1       Voice Undetected (no voice input)
VOICE_STATUS_CANCEL    3       Cancel (cancel extraneous noise)
VOICE_STATUS_BUSY      5       Detected/Detecting (voice being input, recognition processing under way)
VOICE_STATUS_END       7       End recognition processing (enable execution of Get Recognition Results command)
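As an illustration of how an application might react to cmd_status, the sketch below maps each status value to a next step. It is a simplification under stated assumptions: the `Sketch...` names are hypothetical, the handle mirror holds only the documented cmd_status member, and the status values are copied from the table above so the example compiles without `<ultra64.h>`.

```c
#include <assert.h>
#include <stddef.h>

/* Values copied from the status table; redefined so the sketch stands alone. */
enum {
    SKETCH_VOICE_STATUS_READY  = 0,
    SKETCH_VOICE_STATUS_START  = 1,
    SKETCH_VOICE_STATUS_CANCEL = 3,
    SKETCH_VOICE_STATUS_BUSY   = 5,
    SKETCH_VOICE_STATUS_END    = 7
};

/* Hypothetical mirror of the one member an application may read. */
typedef struct {
    unsigned char cmd_status;
} SketchVoiceHandle;

typedef enum {
    SKETCH_ACT_START,       /* Start Recognition can be issued */
    SKETCH_ACT_WAIT,        /* recognition in progress; poll again later */
    SKETCH_ACT_GET_RESULT,  /* Get Recognition Results can be issued */
    SKETCH_ACT_UNKNOWN      /* value not listed in the table */
} SketchAction;

/* Chooses the next step from the last status stored in the handle. */
SketchAction sketch_next_action(const SketchVoiceHandle *hd)
{
    assert(hd != NULL);
    switch (hd->cmd_status) {
    case SKETCH_VOICE_STATUS_READY:
        return SKETCH_ACT_START;
    case SKETCH_VOICE_STATUS_START:
    case SKETCH_VOICE_STATUS_CANCEL:
    case SKETCH_VOICE_STATUS_BUSY:
        return SKETCH_ACT_WAIT;
    case SKETCH_VOICE_STATUS_END:
        return SKETCH_ACT_GET_RESULT;
    default:
        return SKETCH_ACT_UNKNOWN;
    }
}
```

Treating VOICE_STATUS_READY only as "may start" is a design choice of this sketch; the three waiting statuses are the ones for which osVoiceGetReadData() is documented to return CONT_ERR_NOT_READY.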

The return value is an error code. 0 (zero) is returned when processing ends normally. If an error occurs, one of the following error codes is returned.

CONT_ERR_NO_CONTROLLER

Nothing is connected to the controller port.

CONT_ERR_DEVICE

Something other than the Voice Recognition System is connected to the controller port.

CONT_ERR_VOICE_NO_RESPONSE

There was no response from the Voice Recognition System. There may be a problem with the hardware.

CONT_ERR_CONTRFAIL

There was a data transmission failure. There is a problem in the Voice Recognition System connection.

CONT_ERR_INVALID

There is an error in the function call method or in the argument. This error will not occur if the function is being used correctly. Write your program so that this error does not occur when development is completed.


26.8.6.2 Initialize Registered Word Dictionary

Function

osVoiceClearDictionary

Initialize Registered Word Dictionary

Syntax

#include <ultra64.h>
s32 osVoiceClearDictionary(OSVoiceHandle *hd, u8 words);

Arguments

hd
Voice Recognition System control structure
words
Number of words registered

Description

The osVoiceClearDictionary() function initializes the registered word dictionary for the Voice Recognition System. The dictionary is initialized so that the specified number of words can be registered in it. Words cannot be registered with the osVoiceSetWord() function until the dictionary has been initialized with the osVoiceClearDictionary() function.

hd is the Voice Recognition System control structure. The Voice Recognition System must be initialized with the osVoiceInit() function before the osVoiceClearDictionary() function is called. The number of words to be registered is specified in words. 1~255 words can be registered in the dictionary.

The return value is an error code. 0 (zero) is returned when processing ends normally. If an error occurs, one of the following error codes is returned.

CONT_ERR_NO_CONTROLLER

Nothing is connected to the controller port.

CONT_ERR_DEVICE

Something other than the Voice Recognition System is connected to the controller port.

CONT_ERR_VOICE_NO_RESPONSE

There was no response from the Voice Recognition System. There may be a problem with the hardware.

CONT_ERR_CONTRFAIL

There was a data transmission failure. There is a problem in the Voice Recognition System connection.

CONT_ERR_INVALID

There is an error in the function call method or in the argument. This error will not occur if the function is being used correctly. Write your program so that this error does not occur when development is completed.


26.8.6.3 Register Words into Dictionary

Function

osVoiceSetWord

Register Words into Voice Recognition System Dictionary

Syntax

#include <ultra64.h>
s32 osVoiceSetWord(OSVoiceHandle *hd, u8 *word);

Arguments

hd
Voice Recognition System control structure
word
Word to be registered

Description

The osVoiceSetWord() function is for registering words in the Voice Recognition System dictionary. hd is the Voice Recognition System control structure. The Voice Recognition System must be initialized with the osVoiceInit() function before the osVoiceSetWord() function is called.

The word (SJIS) to be registered is specified in word. The word can be up to 17 characters long. Since calling the osVoiceSetWord() function once registers one word, execute osVoiceSetWord() multiple times to register multiple words. The number of words registered must match the number set by the osVoiceClearDictionary() function. Please note that an error will be generated when the osVoiceStartReadData() function is executed, if the number of words registered is greater than or less than the specified number of words.
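The 17-character limit can be checked before registration with a small helper. The sketch below is not part of the library: the `sketch_` names are hypothetical, the word is assumed to be a NUL-terminated Shift-JIS string, and the lead-byte ranges used (0x81 to 0x9F and 0xE0 to 0xFC) are the usual Shift-JIS ranges. It checks length only; character combinations are still the job of osVoiceCheckWord().

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_WORD_CHARS 17  /* limit stated in the text */

/* A Shift-JIS lead byte introduces a two-byte character. */
static int sketch_is_sjis_lead(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9f) || (c >= 0xe0 && c <= 0xfc);
}

/* Counts the characters in a NUL-terminated Shift-JIS string.
 * Returns -1 if a lead byte is not followed by a second byte. */
int sketch_sjis_char_count(const unsigned char *word)
{
    int count = 0;

    assert(word != NULL);
    while (*word != 0) {
        if (sketch_is_sjis_lead(*word)) {
            if (word[1] == 0) {
                return -1;  /* truncated two-byte character */
            }
            word += 2;
        } else {
            word += 1;
        }
        count++;
    }
    return count;
}

/* Nonzero if the word has between 1 and 17 characters. */
int sketch_word_length_ok(const unsigned char *word)
{
    int n = sketch_sjis_char_count(word);

    return n >= 1 && n <= SKETCH_MAX_WORD_CHARS;
}
```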

The maximum number of words which can be registered in the dictionary is about 80 words, assuming 5 syllables per word. Therefore, while the maximum number of words which can be registered is set at 255, if there are several syllables per word, the dictionary may subsequently overflow the memory. In this case, voice recognition can be executed without an error being caused by the osVoiceStartReadData() function even if the number of registered words is less than the number set by the osVoiceClearDictionary() function.

The characters which can be registered and their codes are shown in the table below.


In addition, the following restrictions apply to character combinations when registering words. Use the osVoiceCheckWord() function to check whether or not the word that you are trying to register can be registered in the Voice Recognition System. Use this in the case of game applications in which registered words will be input during debugging or by the game player.


Usage categories (character column of the table not reproduced):
  • No limitation on use
  • Can be used only after specified characters (combinable characters)
  • Cannot be used at the beginning of a word
  • Cannot be used at the end of a word
  • Cannot be used in front of "-"
  • Cannot be used after small "tsu"
  • Combinations which cannot be used

The return value is an error code. 0 (zero) is returned when processing ends normally. If an error occurs, one of the following error codes is returned.

CONT_ERR_NO_CONTROLLER

Nothing is connected to the controller port.

CONT_ERR_DEVICE

Something other than the Voice Recognition System is connected to the controller port.

CONT_ERR_VOICE_NO_RESPONSE

There was no response from the Voice Recognition System. There may be a problem with the hardware.

CONT_ERR_CONTRFAIL

There was a data transmission failure. There is a problem in the Voice Recognition System connection.

CONT_ERR_INVALID

There is an error in the function call method or in the argument. This error will not occur if the function is being used correctly. Write your program so that this error does not occur when development is completed.

This error will also occur when you attempt to register more words than specified with the osVoiceClearDictionary() function.

CONT_ERR_VOICE_WORD

A word containing improper characters has been registered. The set word is invalidated and the word number is not incremented. Execute the osVoiceSetWord() function to register a proper word.

CONT_ERR_VOICE_MEMORY

This indicates a dictionary memory overflow. However, if the recognition command is executed in this condition, normal recognition processing can be performed even if the number of words which have been set is less than the number of words set by the osVoiceClearDictionary() function. When this error is generated, control the number of words actually set on the application side.


26.8.6.4 Check Registerable Words

Function

osVoiceCheckWord

Check to see if the target word can be registered in the dictionary

Syntax

#include <ultra64.h>
s32 osVoiceCheckWord(u8 *word);

Arguments

word
Word to be registered

Description

The osVoiceCheckWord() function is for checking whether or not a specified word can be registered in the Voice Recognition System. Use this when the words to be registered will be input during debugging or by the game player.

word specifies the word (SJIS) to be registered. An error will be returned if a word is specified which contains a character combination which does not satisfy the conditions listed in the table below.


Usage categories (character column of the table not reproduced):
  • No limitation on use
  • Can be used only after specified characters (combinable characters)
  • Cannot be used at the beginning of a word
  • Cannot be used at the end of a word
  • Cannot be used in front of "-"
  • Cannot be used after small "tsu"
  • Combinations which cannot be used

The return value is an error code. 0 (zero) is returned when processing ends normally. If an error occurs, the following error code is returned.

CONT_ERR_VOICE_WORD

The word contains a character or character combination that cannot be registered in the voice recognition dictionary.


26.8.6.5 Count Semisyllables in Word

Function

osVoiceCountSyllables

Count the number of semisyllables in word

Syntax

#include <ultra64.h>
void osVoiceCountSyllables(u8 *word, u32 *syllable);

Arguments

word
Word to be registered
syllable
Number of semisyllables in word (Number of syllables times two)

Description

The osVoiceCountSyllables() function calculates how many semisyllables a specific word occupies when it is registered in the Voice Recognition System. By using this function, you can determine in advance how many words can be registered in the dictionary. It is convenient to use the function during debugging or when asking the game player to input registered words.

word specifies the word (SJIS) to be registered. The number of semisyllables resulting from the calculation is stored in *syllable.

The total number of semisyllables which can be registered in the Voice Recognition System dictionary is 880 (440 syllables). If more than this are registered with the osVoiceSetWord() function, a CONT_ERR_VOICE_MEMORY error will occur.

The number of semisyllables is calculated as follows. One semisyllable per word must be added as an offset value.

Type of Syllable           Number of Semisyllables   Conditions
Vowel only                 2                         Start of word
Vowel only                 1                         Anywhere but start of word
Consonant + vowel          2                         Start of word, or anywhere but when start of word is Romanized by k, t, c, or p
Consonant + vowel          3                         Anywhere but start of word, anywhere except when preceding character is a small "tsu", or when start of word is Romanized by k, t, c, or p
Consonant + diphthong      2                         Small "ya" and the like. Start of word or when start of word is Romanized by k, t, c, or p
Consonant + diphthong      3                         Small "ya" and the like. Anywhere but start of word, anywhere except when preceding character is a small "tsu", or when start of word is Romanized by k, t, c, or p
"n" sound                  1                         none
Long "-" sound             1                         none
Assimilated "tsu" sound    1                         none
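The dictionary budget can be estimated before registration. The sketch below is a rough illustration, not library code: the `sketch_` names are hypothetical, and it assumes that the count returned by osVoiceCountSyllables() does not already include the one-semisyllable offset, so each word costs its semisyllable count plus 1 against the 880-semisyllable total.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_DICT_SEMISYLLABLES 880u  /* total budget stated in the text */
#define SKETCH_DICT_MAX_WORDS     255   /* upper limit on registered words */

/* Cost of one word: its semisyllables plus the per-word offset (assumed). */
unsigned sketch_word_cost(unsigned semisyllables)
{
    return semisyllables + 1u;
}

/* How many words of identical size fit in the dictionary. */
int sketch_max_uniform_words(unsigned semisyllables_per_word)
{
    unsigned n;

    if (semisyllables_per_word >= SKETCH_DICT_SEMISYLLABLES) {
        return 0;  /* a single word would already exceed the budget */
    }
    n = SKETCH_DICT_SEMISYLLABLES / sketch_word_cost(semisyllables_per_word);
    return n > SKETCH_DICT_MAX_WORDS ? SKETCH_DICT_MAX_WORDS : (int)n;
}

/* How many leading entries of semi[] fit before the budget runs out. */
int sketch_words_that_fit(const unsigned *semi, int nwords)
{
    unsigned used = 0;
    int i;

    assert(semi != NULL && nwords >= 0);
    for (i = 0; i < nwords && i < SKETCH_DICT_MAX_WORDS; i++) {
        if (semi[i] >= SKETCH_DICT_SEMISYLLABLES) {
            break;  /* cannot fit; also keeps the sum from overflowing */
        }
        if (used + sketch_word_cost(semi[i]) > SKETCH_DICT_SEMISYLLABLES) {
            break;
        }
        used += sketch_word_cost(semi[i]);
    }
    return i;
}
```

With 5-syllable words counted as 10 semisyllables each, the cost is 11 per word and 880 / 11 = 80, which agrees with the "about 80 words" figure given for the dictionary.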


26.8.6.6 Mask Registered Words

Function

osVoiceMaskDictionary

Switch between recognizing words registered in the dictionary and eliminating words from recognition

Syntax

#include <ultra64.h>
s32 osVoiceMaskDictionary(OSVoiceHandle *hd, u8 *maskpattern, int size);

Arguments

hd
Voice Recognition System control structure
maskpattern
All words mask pattern
size
Number of bytes in maskpattern

Description

The osVoiceMaskDictionary() function is for masking words registered in the Voice Recognition System. hd is the Voice Recognition System control structure. The Voice Recognition System must be initialized with the osVoiceInit() function before the osVoiceMaskDictionary() function is called.

Specify the word mask pattern in maskpattern, which holds the mask data for all words, and specify the number of bytes in maskpattern in size. In the mask data, one bit corresponds to one word: a zero (0) masks the word (it is not recognized) and a one (1) leaves it unmasked (it is recognized).

Word numbers (the numbers assigned to the registered words in the order in which they were registered) map to the bits of each byte in LSB-to-MSB order. In other words, bit 0 of the first byte corresponds to word No. 0, while bit 7 corresponds to word No. 7. If there are more than 8 words, prepare an array with the necessary number of bytes to hold the mask data. If the number of words is not a multiple of 8, put zeros (0) in the remaining most significant bits of the last byte of the mask data. If the osVoiceMaskDictionary() function has not been called, all of the words are unmasked.
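The bit layout described above (word No. 0 in bit 0 of the first byte, word No. 7 in bit 7, unused high bits of the last byte left at zero) can be built with a helper like the following. This is a minimal sketch with hypothetical `sketch_` names; the resulting buffer and byte count are what would be passed as maskpattern and size.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Number of mask bytes needed for nwords words (one bit per word). */
int sketch_mask_size(int nwords)
{
    return (nwords + 7) / 8;
}

/* Builds the mask from per-word flags: enabled[i] != 0 means
 * "recognize word i". The caller provides sketch_mask_size(nwords)
 * bytes in mask[]. Unused high bits of the last byte stay zero. */
void sketch_build_mask(unsigned char *mask, const unsigned char *enabled,
                       int nwords)
{
    int i;

    assert(mask != NULL && enabled != NULL && nwords >= 0);
    memset(mask, 0, (size_t)sketch_mask_size(nwords));
    for (i = 0; i < nwords; i++) {
        if (enabled[i]) {
            mask[i / 8] |= (unsigned char)(1u << (i % 8));
        }
    }
}
```

For example, enabling words 0, 7, and 9 out of 10 yields the two bytes 0x81 and 0x02.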

The return value is an error code. 0 (zero) is returned when processing ends normally. If an error occurs, one of the following error codes is returned.

CONT_ERR_NO_CONTROLLER

Nothing is connected to the controller port.

CONT_ERR_DEVICE

Something other than the Voice Recognition System is connected to the controller port.

CONT_ERR_VOICE_NO_RESPONSE

There was no response from the Voice Recognition System. There may be a problem with the hardware.

CONT_ERR_CONTRFAIL

There was a data transmission failure. There is a problem in the Voice Recognition System connection.

CONT_ERR_INVALID

There is an error in the function call method or in the argument. This error will not occur if the function is being used correctly. Write your program so that this error does not occur when development is completed.


26.8.6.7 Start Voice Recognition

Function

osVoiceStartReadData

Start voice recognition by the Voice Recognition System

Syntax

#include <ultra64.h>
s32 osVoiceStartReadData(OSVoiceHandle *hd);

Arguments

hd
Voice Recognition System control structure

Description

The osVoiceStartReadData() function is for starting recognition processing by the Voice Recognition System. Before starting voice recognition processing with the osVoiceStartReadData() function, the Voice Recognition System must be initialized with the osVoiceInit() function, the dictionary must be initialized with the osVoiceClearDictionary() function, and word registration must be performed with the osVoiceSetWord() function. Be absolutely sure to call the osVoiceStartReadData() function after calling these functions.

After calling the osVoiceStartReadData() function, recognition results can be obtained by calling the osVoiceGetReadData() function. In addition, once voice recognition has been started, call the osVoiceStopReadData() function to forcibly stop recognition.

The return value is an error code. 0 (zero) is returned when processing ends normally. If an error occurs, one of the following error codes is returned.

CONT_ERR_NO_CONTROLLER

Nothing is connected to the controller port.

CONT_ERR_DEVICE

Something other than the Voice Recognition System is connected to the controller port.

CONT_ERR_VOICE_NO_RESPONSE

There was no response from the Voice Recognition System. There may be a problem with the hardware.

CONT_ERR_CONTRFAIL

There was a data transmission failure. There is a problem in the Voice Recognition System connection.

CONT_ERR_INVALID

Voice recognition was started, but words had not been registered properly in the dictionary with osVoiceSetWord(). Alternatively, there is an error in the function call method or in the argument. This error will not occur if the function is being used correctly. Write your program so that this error does not occur when development is completed.


26.8.6.8 Get Recognition Result

Function

osVoiceGetReadData

Get voice recognition result from the Voice Recognition System

Syntax

#include <ultra64.h>
s32 osVoiceGetReadData(OSVoiceHandle *hd, OSVoiceData *result);

Arguments

hd
Voice Recognition System control structure
result
Recognition result

Description

The osVoiceGetReadData() function is for getting the recognition result from the Voice Recognition System. hd is the Voice Recognition System control structure. The Voice Recognition System must be initialized with the osVoiceInit() function before the osVoiceGetReadData() function is called.

The recognition result is stored in result, which is an OSVoiceData structure. The contents of the OSVoiceData structure are as follows:

        typedef struct {
                u16 warning; /* Warning */
                u16 answer_num; /* Candidate number (0~5) */
                u16 voice_level; /* Voice input level */
                u16 voice_sn; /* Relative voice level */
                u16 voice_time; /* Voice input time */
                u16 answer[5]; /* Candidate word number */
                u16 distance[5]; /* Distance value */
        } OSVoiceData;

The warning member variable of the OSVoiceData structure is the warning which pertains to the recognition result. The following bits are flagged when there is any problem with the recognition result.

Warning Name            Value    Description                       Conditions
VOICE_WARN_TOO_SMALL    0x0400   Voice level is too low            100 < Voice Level < 150
VOICE_WARN_TOO_LARGE    0x0800   Voice level is too high           Voice Level > 3500
VOICE_WARN_NOT_FIT      0x4000   No words match recognition word   No. 1 Candidate Distance Value > 1600
VOICE_WARN_TOO_NOISY    0x8000   Too much ambient noise            Relative Voice Level <= 400
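Because several warning bits can be set at once, an application will typically test each bit separately. The sketch below turns the warning member into a list of player-facing hints. The flag values are copied from the table above so the example compiles without `<ultra64.h>`; the `Sketch...` names, the advice categories, and the order in which they are reported are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Flag values copied from the warning table. */
#define SKETCH_VOICE_WARN_TOO_SMALL 0x0400
#define SKETCH_VOICE_WARN_TOO_LARGE 0x0800
#define SKETCH_VOICE_WARN_NOT_FIT   0x4000
#define SKETCH_VOICE_WARN_TOO_NOISY 0x8000

/* Hypothetical hints an application might show to the player. */
typedef enum {
    SKETCH_ADVICE_REDUCE_NOISE,
    SKETCH_ADVICE_SPEAK_SOFTER,
    SKETCH_ADVICE_SPEAK_LOUDER,
    SKETCH_ADVICE_RETRY_WORD
} SketchAdvice;

/* Writes one hint per flagged warning into out[] (at most max_out
 * entries) and returns the number of hints written. */
int sketch_collect_advice(unsigned short warning, SketchAdvice *out,
                          int max_out)
{
    int n = 0;

    assert(out != NULL && max_out >= 0);
    if ((warning & SKETCH_VOICE_WARN_TOO_NOISY) && n < max_out) {
        out[n++] = SKETCH_ADVICE_REDUCE_NOISE;
    }
    if ((warning & SKETCH_VOICE_WARN_TOO_LARGE) && n < max_out) {
        out[n++] = SKETCH_ADVICE_SPEAK_SOFTER;
    }
    if ((warning & SKETCH_VOICE_WARN_TOO_SMALL) && n < max_out) {
        out[n++] = SKETCH_ADVICE_SPEAK_LOUDER;
    }
    if ((warning & SKETCH_VOICE_WARN_NOT_FIT) && n < max_out) {
        out[n++] = SKETCH_ADVICE_RETRY_WORD;
    }
    return n;
}
```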

The answer_num member variable is the number of valid candidates, that is, the number of words judged by the Voice Recognition System to be valid as candidates. It is a value from 0 to 5. If this is 0, there are no valid candidates.

The voice_level member variable is the level of the input voice. The greater the voice input, the larger this value is.

The voice_sn member variable is the relative level of the voice input to the noise input.

The voice_time member variable is the voice input time in ms units.

The answer[] member variable holds the word numbers of the 1st through 5th candidates. Five entries are always output, but only the first answer_num entries are judged valid by the Voice Recognition System. Normally each entry is a value 0 ~ 0x00ff, but if there is no suitable word, its value is 0x7fff.

The distance[] member variable is the distance value of the word from the 1st candidate to the 5th candidate. The more similar the word, the smaller this value is.
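A typical use of these members is to accept the 1st candidate only when it is valid and close enough. The sketch below shows one way to do that. The struct is a stand-in that mirrors the OSVoiceData layout shown earlier so the example compiles without `<ultra64.h>`; the `sketch_` names and the application-chosen max_distance threshold are assumptions, not part of the library.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NO_WORD            0x7fff  /* "no suitable word" marker */
#define SKETCH_VOICE_WARN_NOT_FIT 0x4000  /* from the warning table */

/* Stand-in mirroring the documented OSVoiceData members. */
typedef struct {
    unsigned short warning;
    unsigned short answer_num;
    unsigned short voice_level;
    unsigned short voice_sn;
    unsigned short voice_time;
    unsigned short answer[5];
    unsigned short distance[5];
} SketchVoiceData;

/* Returns the word number of the 1st candidate, or -1 if the result
 * should be rejected. nwords is the number of registered words and
 * max_distance is a threshold chosen by the application. */
int sketch_best_word(const SketchVoiceData *r, int nwords,
                     unsigned max_distance)
{
    assert(r != NULL && nwords > 0);
    if (r->answer_num == 0) {
        return -1;  /* no valid candidates */
    }
    if (r->warning & SKETCH_VOICE_WARN_NOT_FIT) {
        return -1;  /* recognizer reports no matching word */
    }
    if (r->answer[0] == SKETCH_NO_WORD || r->answer[0] >= nwords) {
        return -1;  /* marker value or out-of-range word number */
    }
    if (r->distance[0] > max_distance) {
        return -1;  /* too dissimilar for this application */
    }
    return (int)r->answer[0];
}
```

The returned word number can then index the application's own table of registered words, since word numbers follow registration order.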

Before calling the osVoiceGetReadData() function, voice recognition processing must be started with the osVoiceStartReadData() function.

The return value is an error code. 0 (zero) is returned when processing ends normally. If an error occurs, one of the following error codes is returned.

CONT_ERR_NO_CONTROLLER

Nothing is connected to the controller port.

CONT_ERR_DEVICE

Something other than the Voice Recognition System is connected to the controller port.

CONT_ERR_NOT_READY

Either no voice has been input, or results cannot yet be acquired, for example because processing is still under way. Wait a moment, then call this function again. This error occurs if the status following execution of the osVoiceStartReadData() function is VOICE_STATUS_START, VOICE_STATUS_CANCEL, or VOICE_STATUS_BUSY.

CONT_ERR_VOICE_NO_RESPONSE

There was no response from the Voice Recognition System. There may be a problem with the hardware.

CONT_ERR_CONTRFAIL

There was a data transmission failure. There is a problem with the Voice Recognition System connection.

CONT_ERR_INVALID

There is an error in the function call method or in the argument. This error will not occur if the function is being used correctly. Write your program so that this error does not occur when development is completed.


26.8.6.9 Forcibly Stop Recognition Processing

Function

osVoiceStopReadData

Forcibly stop voice recognition processing by the Voice Recognition System

Syntax

#include <ultra64.h>
s32 osVoiceStopReadData(OSVoiceHandle *hd);

Arguments

hd
Voice Recognition System control structure

Description

The osVoiceStopReadData() function is for forcibly stopping recognition processing once recognition by the Voice Recognition System has been started. hd is the Voice Recognition System control structure. The Voice Recognition System must be initialized with the osVoiceInit() function before the osVoiceStopReadData() function is called.

The return value is an error code. 0 (zero) is returned when processing ends normally. If an error occurs, one of the following error codes is returned.

CONT_ERR_NO_CONTROLLER

Nothing is connected to the controller port.

CONT_ERR_DEVICE

Something other than the Voice Recognition System is connected to the controller port.

CONT_ERR_VOICE_NO_RESPONSE

There was no response from the Voice Recognition System. There may be a problem with the hardware.

CONT_ERR_CONTRFAIL

There was a data transmission failure. There is a problem in the Voice Recognition System connection.

CONT_ERR_INVALID

There is an error in the function call method or in the argument. This error will not occur if the function is being used correctly. Write your program so that this error does not occur when development is completed.


26.8.6.10 Adjust Input Gain

Function

osVoiceControlGain

Adjust the input gain of the Voice Recognition System

Syntax

#include <ultra64.h>
s32 osVoiceControlGain(OSVoiceHandle *hd, s32 analog, s32 digital);

Arguments

hd
Voice Recognition System control structure
analog
Transmission system analog gain
digital
Transmission system digital gain

Description

The osVoiceControlGain() function is for adjusting the gain of the input voice in the Voice Recognition System. The strength of the input voice signal can be changed by adjusting the gain. If the input voice is too strong, try decreasing the gain to decrease the voice level (normally, there is no particular need to change the gain).

hd is the Voice Recognition System control structure. The Voice Recognition System must be initialized with the osVoiceInit() function before the osVoiceControlGain() function is called.

analog is the analog gain of the transmission system. The analog gain is for adjusting the strength of the voice signal which is input from the microphone. The following values are available.

analog    Transmission system analog gain
0         0 dB (default)
1         -3 dB

digital is the digital gain of the transmission system. The digital gain is for adjusting the strength of the digital signal converted from the analog voice signal. The following values are available.

digital   Transmission system digital gain
0         0 dB (default)
1         -0.4 dB
2         -0.8 dB
3         -1.2 dB
4         -1.6 dB
5         -2.0 dB
6         -2.4 dB
7         -2.8 dB
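The two tables follow a simple pattern: the analog setting is either 0 dB or -3 dB, and each digital step lowers the level by a further 0.4 dB. The sketch below validates a pair of settings and reports the combined change in tenths of a dB. The `sketch_` name is hypothetical, and adding the two values together assumes the analog and digital stages act in series, which the text implies but does not state outright.

```c
#include <assert.h>
#include <stddef.h>

/* Checks that analog is 0..1 and digital is 0..7, the ranges listed
 * in the tables. On success stores the combined gain in tenths of a
 * dB (for example -58 means -5.8 dB) in *total and returns 1.
 * Returns 0 and leaves *total untouched if either value is out of range. */
int sketch_gain_tenths_db(int analog, int digital, int *total)
{
    assert(total != NULL);
    if (analog < 0 || analog > 1 || digital < 0 || digital > 7) {
        return 0;
    }
    *total = analog * -30 + digital * -4;
    return 1;
}
```

Rejecting out-of-range values before calling osVoiceControlGain() is one way to keep CONT_ERR_INVALID from occurring in a finished program.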

The return value is an error code. 0 (zero) is returned when processing ends normally. If an error occurs, the following error code is returned.

CONT_ERR_INVALID

There is an error in the function call method or in the argument. This error will not occur if the function is being used correctly. Write your program so that this error does not occur when development is completed.


26.8.7 Examples Using Voice Recognition System Functions

Typical methods of using the various Voice Recognition System functions are described below. Please see Figure 27-8-1 for illustrated procedure and flow chart.

Typically, there are 5 types of processing which are performed:

(1) Initialize the Voice Recognition System using the osVoiceInit() function
(2) Initialize the registered word dictionary using the osVoiceClearDictionary() function
(3) Register words to the registered word dictionary using the osVoiceSetWord() function
(4) Start voice recognition using the osVoiceStartReadData() function
(5) Acquire voice recognition results using the osVoiceGetReadData() function

Detailed descriptions of these 5 processes are given in Section 26.8.7.1 "Flow of Voice Recognition Processing". Processing in which errors are returned as the return values for the various functions is explained in Section 26.8.7.2 "Error Processing".


26.8.7.1 Flow of Voice Recognition Processing

(1) Voice Recognition System Initialization Processing

The osVoiceInit() function is called in the processing here. The osVoiceInit() function initializes the Voice Recognition System.

It is recommended that you check which device is connected to each port prior to initialization, because standard controllers and other devices besides the Voice Recognition System may be inserted into the controller ports as well. This check can be accomplished with the osContStartQuery() function and the osContGetQuery() function. The Voice Recognition System is connected if the value of the member variable errno of the OSContStatus structure is 0 (zero), and if the AND (logical product) of the value for type and CONT_TYPE_MASK is CONT_TYPE_VOICE.

(2) Initialize Registered Word Dictionary

This step calls the osVoiceClearDictionary() function, which initializes the registered word dictionary. Always initialize the dictionary before registering words with the osVoiceSetWord() function.

(3) Register Words to Registered Word Dictionary

This step calls the osVoiceSetWord() function, which registers a word in the registered word dictionary. Each call registers one word, so call osVoiceSetWord() once for every word to be registered. The number of words registered must match the number specified in the call to osVoiceClearDictionary(). Note that if more or fewer words are registered than the specified number, the osVoiceStartReadData() function returns the CONT_ERR_INVALID error.
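
Steps (2) and (3) can be sketched as a single routine that announces the word count and then registers exactly that many words. To keep the sketch runnable without the SDK, the two SDK calls are passed in as callbacks. ExClearFn, ExSetWordFn, ex_register_words(), and the stub functions are hypothetical names; on target the callbacks would wrap osVoiceClearDictionary() and osVoiceSetWord() with the voice handle already bound.

```c
#include <assert.h>

/* Hypothetical callbacks standing in for osVoiceClearDictionary() and
   osVoiceSetWord(). Both return 0 on success, as the SDK calls do. */
typedef int (*ExClearFn)(unsigned char word_count);
typedef int (*ExSetWordFn)(const unsigned char *word);

/* Declares the dictionary size, then registers exactly that many words.
   Returns 0 on success or the first nonzero error code. *registered
   receives the number of words accepted before any error. */
static int ex_register_words(const unsigned char *const *words, int count,
                             ExClearFn clear_fn, ExSetWordFn set_fn,
                             int *registered)
{
    int i, err;

    assert(count >= 1 && count <= 255);   /* dictionary limit is 255 words */
    *registered = 0;

    err = clear_fn((unsigned char)count); /* step (2): announce the count */
    if (err != 0) {
        return err;
    }
    for (i = 0; i < count; i++) {         /* step (3): one call per word */
        err = set_fn(words[i]);
        if (err != 0) {
            return err;
        }
        (*registered)++;
    }
    return 0;
}

/* Off-target stubs used only to exercise the loop. */
static int ex_declared = 0;
static int ex_set_calls = 0;

static int ex_stub_clear(unsigned char n)
{
    ex_declared = n;
    ex_set_calls = 0;
    return 0;
}

static int ex_stub_set(const unsigned char *word)
{
    (void)word;
    ex_set_calls++;
    return 0;
}
```

Deriving the count passed to the clear step from the same array that drives the loop is one way to keep the two numbers from drifting apart.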

(4) Start Voice Recognition

This step calls the osVoiceStartReadData() function, which starts voice recognition processing. Once osVoiceStartReadData() has been called, the recognition results can be acquired by calling osVoiceGetReadData(). To forcibly stop recognition after it has been started, call osVoiceStopReadData().

(5) Acquire Recognition Results

This step calls the osVoiceGetReadData() function, which acquires the recognition results. The results are stored in an OSVoiceData structure. The following example shows how to obtain the recognized word from the data in the OSVoiceData structure. See Section 26.8.6.8 "Get Recognition Result" for details on the OSVoiceData structure.

  • Registered words are stored in an array of character strings. The word numbers of the registered words are 0, 1, 2, ... from the top of the array.

u8 *registration_word[] = {
  "Yakiniku",
  "Mario",
      .
      .
      .
  "Pikachu"
};
  • Define OSVoiceData structure variable as follows.
OSVoiceData result;
  • The words prepared in the first step are registered in the dictionary in order from the top of the array, and voice recognition is started. Acquiring the recognition results stores them in result.
  • The word numbers of the registered words closest to the spoken word are stored in the member variables answer[0] to answer[4] of result, in order of increasing distance value (that is, most similar first). The word most similar to the spoken word is therefore registration_word[result.answer[0]]. However, if result.answer_num is 0, there were no valid candidates. In this case, repeat processing from acquisition of the recognition results.
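
The lookup described in the list above can be sketched as a small helper. ExVoiceResult is a hypothetical stand-in that mirrors only the OSVoiceData members named in this section, and ex_best_word() is an illustrative name; on target the SDK structure would be used directly. The range check on the word number is a defensive addition, not something the SDK is stated to require.

```c
#include <assert.h>

/* Hypothetical mirror of the OSVoiceData members used in this section. */
typedef struct {
    unsigned short answer_num;   /* number of valid candidates */
    unsigned short answer[5];    /* word numbers, most similar first */
    unsigned short distance[5];  /* distance values, smallest first */
} ExVoiceResult;

/* Returns the best-matching registered word, or a null pointer when there
   is no valid candidate or the word number is out of range. */
static const char *ex_best_word(const ExVoiceResult *r,
                                const char *const *words, int word_count)
{
    assert(r != 0 && words != 0);
    if (r->answer_num == 0) {
        return 0;                 /* no valid candidates */
    }
    if (r->answer[0] >= word_count) {
        return 0;                 /* defensive: unexpected word number */
    }
    return words[r->answer[0]];
}
```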

To recognize further voice input after a result has been acquired, repeat processing from (4).

To respond quickly to voice input, you may call the osVoiceGetReadData() function every frame to check whether the user has spoken.
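
A per-frame polling loop of this kind might be organized as a two-state machine, sketched below under stated assumptions. ExState, ExEvent, ex_voice_tick(), and the stubs are hypothetical names, the callbacks stand in for osVoiceStartReadData() and osVoiceGetReadData(), and EX_ERR_NOT_READY is a placeholder for CONT_ERR_NOT_READY from the SDK headers.

```c
#include <assert.h>

#define EX_ERR_NOT_READY 12   /* placeholder for CONT_ERR_NOT_READY */

typedef enum { EX_IDLE, EX_LISTENING } ExState;
typedef enum { EX_NONE, EX_RESULT, EX_FAILED } ExEvent;

typedef int (*ExStartFn)(void);            /* wraps osVoiceStartReadData() */
typedef int (*ExGetFn)(void *result_out);  /* wraps osVoiceGetReadData()  */

/* Call once per frame. Starts recognition when idle, otherwise polls for
   a result. Returns EX_RESULT when result_out has been filled in. */
static ExEvent ex_voice_tick(ExState *state, ExStartFn start_fn,
                             ExGetFn get_fn, void *result_out)
{
    int err;

    assert(state != 0);
    if (*state == EX_IDLE) {
        err = start_fn();                  /* step (4) */
        if (err != 0) {
            return EX_FAILED;
        }
        *state = EX_LISTENING;
        return EX_NONE;
    }

    err = get_fn(result_out);              /* step (5) */
    if (err == EX_ERR_NOT_READY) {
        return EX_NONE;                    /* nothing yet, poll next frame */
    }
    *state = EX_IDLE;                      /* next tick restarts from (4) */
    return (err == 0) ? EX_RESULT : EX_FAILED;
}

/* Off-target stubs: the third poll after a start reports a result. */
static int ex_polls = 0;

static int ex_stub_start(void)
{
    ex_polls = 0;
    return 0;
}

static int ex_stub_get(void *out)
{
    (void)out;
    return (++ex_polls < 3) ? EX_ERR_NOT_READY : 0;
}
```

On EX_FAILED the caller is expected to apply the error handling appropriate to the returned code; the sketch only reports that something went wrong.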



26.8.7.2 Error Processing

When one of the functions returns an error, handle it as described below.

If any function returns one of the five errors CONT_ERR_NO_CONTROLLER, CONT_ERR_DEVICE, CONT_ERR_CONTRFAIL, CONT_ERR_VOICE_NO_RESPONSE, or CONT_ERR_INVALID, display a message and repeat processing starting from (1). The two errors CONT_ERR_VOICE_NO_RESPONSE and CONT_ERR_INVALID are caused by software or hardware failures or by bugs, so they do not normally occur.

If the osVoiceSetWord() function returns the CONT_ERR_VOICE_WORD error, the word being registered contains improper characters. Register a corrected word instead.

If the osVoiceSetWord() function returns the CONT_ERR_VOICE_MEMORY error, the dictionary memory has overflowed and no more words can be registered. In this case, however, recognition processing can still be performed from (4) on, even though fewer words have been registered than the number specified to the osVoiceClearDictionary() function. Consequently, when this error occurs, treat the number of words registered up to that point as the number of registered words and proceed to (4). To redo registration, repeat processing from (2).

If the osVoiceGetReadData() function returns the CONT_ERR_NOT_READY error, either no voice has been input or recognition processing is still underway. Wait a moment and call the osVoiceGetReadData() function again.
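
The error handling rules above can be collected into one dispatch function, sketched below. ExAction and ex_action_for() are hypothetical names, and the EX_ERR_ values are placeholders for the CONT_ERR_ constants in the SDK headers. Treating an unrecognized code as a reason to restart from (1) is an assumption of this sketch, and the sketch does not check which function returned the code.

```c
#include <assert.h>

/* Placeholder values standing in for the SDK's CONT_ERR_ constants. */
#define EX_ERR_NO_CONTROLLER      1
#define EX_ERR_CONTRFAIL          4
#define EX_ERR_INVALID            5
#define EX_ERR_DEVICE            11
#define EX_ERR_NOT_READY         12
#define EX_ERR_VOICE_MEMORY      13
#define EX_ERR_VOICE_WORD        14
#define EX_ERR_VOICE_NO_RESPONSE 15

typedef enum {
    EX_ACT_CONTINUE,       /* no error */
    EX_ACT_REINIT,         /* display a message and restart from (1) */
    EX_ACT_FIX_WORD,       /* register a corrected word */
    EX_ACT_START_PARTIAL,  /* keep the words so far and go to (4) */
    EX_ACT_RETRY_LATER     /* wait and call the get function again */
} ExAction;

/* Maps an error code to the recovery action described in the text. */
static ExAction ex_action_for(int err)
{
    assert(err >= 0);
    switch (err) {
    case 0:
        return EX_ACT_CONTINUE;
    case EX_ERR_VOICE_WORD:
        return EX_ACT_FIX_WORD;
    case EX_ERR_VOICE_MEMORY:
        return EX_ACT_START_PARTIAL;
    case EX_ERR_NOT_READY:
        return EX_ACT_RETRY_LATER;
    case EX_ERR_NO_CONTROLLER:
    case EX_ERR_DEVICE:
    case EX_ERR_CONTRFAIL:
    case EX_ERR_VOICE_NO_RESPONSE:
    case EX_ERR_INVALID:
    default:                 /* assumption: unknown codes also restart */
        return EX_ACT_REINIT;
    }
}
```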

You may also refer to the sample program "voice" which uses Voice Recognition System functions. It is stored under the /usr/src/PR/demos/ directory.



26.8.8 Precautions

(1) Recognition Accuracy

The Voice Recognition System recognizes a word by extracting pattern characteristics in syllable units. Because of this, recognition accuracy may be slightly lower than with characteristic extraction in word units. Since the input voice may sometimes not be recognized, forcing the user to repeat the input, be particularly careful when real-time responses are required, as in an action game. In these cases, take measures to avoid mis-recognition, such as keeping the number of registered words low or registering only words whose pronunciations are completely different.

For example, keep the number of words registered in the dictionary at any one time low, or mask the registered words that are not currently needed, so that the user chooses from a restricted vocabulary. The recognition success rate then becomes very high, because recognition is performed against only a small number of words.

(2) To Change to a New Recognized Word During Recognition Processing

To register new words after the osVoiceStartReadData() function has been called and while recognition processing is being executed, be sure to first interrupt recognition processing with the osVoiceStopReadData() function. Then repeat processing from the osVoiceClearDictionary() function.
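
The required ordering (stop, clear, register, restart) can be sketched as one routine. ExVoiceOps and ex_swap_dictionary() are hypothetical names; the function pointers stand in for osVoiceStopReadData(), osVoiceClearDictionary(), osVoiceSetWord(), and osVoiceStartReadData(), and the stubs only record the call order so the sketch can run without the SDK.

```c
#include <assert.h>

/* Hypothetical table of callbacks wrapping the four SDK calls. */
typedef struct {
    int (*stop)(void);
    int (*clear)(unsigned char word_count);
    int (*set_word)(const unsigned char *word);
    int (*start)(void);
} ExVoiceOps;

/* Replaces the dictionary while recognition is running.
   Returns 0 on success or the first nonzero error code. */
static int ex_swap_dictionary(const ExVoiceOps *ops,
                              const unsigned char *const *words, int count)
{
    int i, err;

    assert(ops != 0 && count >= 1 && count <= 255);
    if ((err = ops->stop()) != 0) {                 /* interrupt first */
        return err;
    }
    if ((err = ops->clear((unsigned char)count)) != 0) {
        return err;
    }
    for (i = 0; i < count; i++) {
        if ((err = ops->set_word(words[i])) != 0) {
            return err;
        }
    }
    return ops->start();                            /* resume recognition */
}

/* Off-target stubs that log one letter per call. */
static char ex_log[16];
static int  ex_log_len = 0;

static int ex_note(char c)
{
    if (ex_log_len < (int)sizeof(ex_log) - 1) {
        ex_log[ex_log_len++] = c;
        ex_log[ex_log_len] = '\0';
    }
    return 0;
}

static int ex_stub_stop(void)                    { return ex_note('S'); }
static int ex_stub_clear(unsigned char n)        { (void)n; return ex_note('C'); }
static int ex_stub_set(const unsigned char *w)   { (void)w; return ex_note('W'); }
static int ex_stub_start(void)                   { return ex_note('R'); }
```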

(3) Registration to Recognized Word Dictionary

Do not register words in the dictionary that contain invalid character combinations, that is, words for which the osVoiceCheckWord() function returns an error. There are instances in which an error will not be returned and operation of the software will become unstable if the specific character combinations shown below are entered in the dictionary.

(4) Precautions During Voice Input

Depending on the words registered in the dictionary, valid word candidates may be output when the user simply coughs or breathes into the microphone. To avoid such erroneous recognition, limit the acceptance of voice input, for example to the time during which a controller button is held down. There may also be cases in which the Voice Recognition System has not finished preparing to accept voice input when the user speaks at the same moment the button is pressed. In this case, you may perform the following procedure.

(5) Voice Input Gain Adjustment

The voice detection threshold is determined by the strength of the input signal at the moment recognition processing is started. If the voice level is high at that moment, the threshold becomes high, which can make voice input difficult to detect. If this occurs, try decreasing the gain to lower the voice level. Do not change the gain during the game unless unexpectedly high voice input levels can be anticipated at the start of recognition.

(6) Precautions Regarding Warnings

The warnings returned in the warning member variable of the OSVoiceData structure indicate the reliability of the recognition results; unlike errors, they do not indicate a serious failure. For instance, VOICE_WARN_NOT_FIT (word is not among the recognized words) may be returned as a warning even though valid candidates are returned in the answer[] member variable. This occurs when the distance[0] member variable, which holds the distance value of the No. 1 candidate word, is 1600 or greater, even though the answer_num member variable, which holds the number of valid candidates, is 1 or more. Which of the two member variables takes priority depends on the application, but the warning can essentially be ignored. How the warnings are used is left to the discretion of the application developer.
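
One possible application policy for this case is sketched below. ExScoredResult, ExVerdict, and ex_judge() are hypothetical names, the structure mirrors only the OSVoiceData members discussed here, and the strict flag is an illustrative knob rather than anything the SDK provides. The sketch tests distance[0] directly against the 1600 threshold stated above instead of reading the warning member.

```c
#include <assert.h>

/* Distance at or above which VOICE_WARN_NOT_FIT is reported (see text). */
#define EX_NOT_FIT_DISTANCE 1600

/* Hypothetical mirror of the OSVoiceData members discussed here. */
typedef struct {
    unsigned short answer_num;   /* number of valid candidates */
    unsigned short answer[5];    /* word numbers, most similar first */
    unsigned short distance[5];  /* distance values, smallest first */
} ExScoredResult;

typedef enum {
    EX_REJECT,                 /* ignore the input */
    EX_ACCEPT,                 /* use answer[0] */
    EX_ACCEPT_LOW_CONFIDENCE   /* use answer[0], but it fits poorly */
} ExVerdict;

/* strict != 0: treat a poor fit as a rejection.
   strict == 0: accept a poor fit but flag it for the caller. */
static ExVerdict ex_judge(const ExScoredResult *r, int strict)
{
    assert(r != 0);
    if (r->answer_num == 0) {
        return EX_REJECT;                      /* no valid candidates */
    }
    if (r->distance[0] >= EX_NOT_FIT_DISTANCE) {
        return strict ? EX_REJECT : EX_ACCEPT_LOW_CONFIDENCE;
    }
    return EX_ACCEPT;
}
```

A game that triggers irreversible actions by voice might prefer the strict policy, while a menu that the player can easily back out of might accept low-confidence matches.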
