FAQ

What is the difference between the online system and the offline system in terms of accuracy, how results are delivered, and speed?

In terms of accuracy, the offline system usually gives better results. The online system uses fewer audio processing steps so that it can return results to the user in real time. The offline system runs more processing steps to make the result more accurate, so the user has to wait for a period of time while the audio is processed.

What audio file types are supported?

Currently we support the following audio file types:

How large a file can I upload at a time, and how many files can I send?

For the Online System, you can stream from a file of any size, because the audio data is sent in chunks.
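As a rough illustration of chunked reading, the sketch below splits a binary stream into fixed-size pieces. The chunk size and the helper name `iter_chunks` are assumptions made for this example, not values required by the Online System, and the step that actually sends each chunk is left out because it depends on the client you use.

```python
import io
from typing import BinaryIO, Iterator

# Assumed chunk size: 8000 bytes is about 0.25 s of 16 kHz, 16-bit mono audio.
CHUNK_BYTES = 8000


def iter_chunks(stream: BinaryIO, chunk_bytes: int = CHUNK_BYTES) -> Iterator[bytes]:
    """Yield successive chunks of at most chunk_bytes until the stream ends."""
    while True:
        chunk = stream.read(chunk_bytes)
        if not chunk:
            break
        yield chunk


# Stand-in for an audio file: 20000 bytes of silence held in memory.
fake_audio = io.BytesIO(b"\x00" * 20000)
sizes = [len(chunk) for chunk in iter_chunks(fake_audio)]
print(sizes)  # [8000, 8000, 4000]
```

Because only one chunk is held in memory at a time, the total file size does not matter to the reader.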

For the Offline System, each uploaded file has to be smaller than 20MB, and you can submit no more than 10 files a day.
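A client could check these limits before uploading. The following is a minimal sketch, assuming that "20MB" means 20 × 1024 × 1024 bytes and that the caller keeps its own count of files sent today; the function name `can_submit` is hypothetical and is not part of the service.

```python
# Assumption: "20MB" is interpreted as 20 MiB.
MAX_FILE_BYTES = 20 * 1024 * 1024
MAX_FILES_PER_DAY = 10


def can_submit(file_size_bytes: int, files_sent_today: int) -> bool:
    """Return True if one more file of this size fits within both limits."""
    small_enough = file_size_bytes < MAX_FILE_BYTES      # must be smaller than 20MB
    quota_left = files_sent_today < MAX_FILES_PER_DAY    # at most 10 files a day
    return small_enough and quota_left


print(can_submit(5 * 1024 * 1024, 3))   # True
print(can_submit(25 * 1024 * 1024, 3))  # False: file too large
print(can_submit(5 * 1024 * 1024, 10))  # False: daily quota used up
```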

When using the Offline system in the WebUI, I see stages displayed (diarization, success, ...). What do they mean?

The details of those stages are described in this section.

In the result returned by the Offline decoder, what is the 'score' field in the XML file?

The score field in the XML is the confidence score. The minimum is 0 and the maximum is 1; the higher the score, the more reliable the result.

In the enterprise edition, more fields are attached to the XML file to describe the returned transcriptions in more detail.

Explain the three models that are currently hosted. What other models are available in the enterprise version?

Besides choosing the mode of the system, users can try out 3 models, one for each of 3 languages:

  • Singapore English: Can transcribe audio in which the user speaks Singapore English or English.

  • Mandarin: Can transcribe audio in which the user speaks Mandarin.

  • Mandarin Code-switch: Can transcribe audio in which the user speaks English and Mandarin interchangeably.

What conditions might cause poor results?

Many factors can cause poor results in speech recognition:

  • The audio data has too much noise

  • The volume is not stable across the audio

  • The audio is not mono (the audio file should be single-channel)

  • The sampling rate is too high or too low. We recommend using 16kHz audio.

  • The speakers in the audio overlap too much

  • The speaker uses a language that our system doesn't support
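The channel count and sampling rate can be checked before submitting a file. The sketch below uses Python's standard `wave` module and assumes the input is a PCM WAV file; other formats would need a different reader. The helper names `check_wav` and `make_wav` are made up for this example.

```python
import io
import wave

RECOMMENDED_RATE_HZ = 16000  # the 16kHz recommendation above


def check_wav(source) -> list[str]:
    """Return warnings for a WAV file given as a path or file-like object."""
    problems = []
    with wave.open(source, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
    if channels != 1:
        problems.append(f"expected mono audio, found {channels} channels")
    if rate != RECOMMENDED_RATE_HZ:
        problems.append(f"expected {RECOMMENDED_RATE_HZ} Hz, found {rate} Hz")
    return problems


def make_wav(channels: int, rate: int) -> io.BytesIO:
    """Build a short silent 16-bit WAV in memory, for demonstration only."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * (channels * (rate // 10)))  # 0.1 s of silence
    buf.seek(0)
    return buf


print(check_wav(make_wav(1, 16000)))  # []
print(check_wav(make_wav(2, 44100)))
```

An empty list means the file is mono at 16kHz. Noise, unstable volume, and overlapping speakers cannot be detected from the header and are not covered by this check.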

Tips to get good results in the streaming Recording mode

When you use the streaming recording mode to transcribe live audio with the Online system, audio is captured from your microphone input. Here are some tips for getting good results:

  • Check your audio input (microphone, laptop speaker) and make sure it works correctly

  • Speak with enough volume while transcribing

  • Make sure you are in a quiet environment with little background noise

  • Use a sample rate of 16kHz
