Interrupting Sphinx-4 speech recognition in continuous recognition mode

by timvasil 5/14/2011 8:20:00 PM

Sphinx-4 is a speech recognizer developed at Carnegie Mellon University.  Out of the box, it offers two modes of operation: batch ("frontend") and continuous ("epFrontEnd").  In continuous mode, it decodes live audio as it arrives, for example from a microphone.

Unfortunately for me, epFrontEnd turns the Recognizer.recognize() method into a blocking call, and Sphinx-4's API provides no way of interrupting this method. I find this problematic in various scenarios, such as automated tests.  In such a test, I want to determine whether the recognizer recognizes the command correctly, incorrectly, or misses it entirely.  The "miss" case is the tricky one, as in this case the recognize() method just hangs indefinitely, waiting for more audio input.  
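To make the three test outcomes concrete, here is a minimal sketch of how a test might classify them. The names Outcome and OutcomeClassifier are hypothetical, and the sketch assumes a miss is reported as a null result, which is only possible once recognize() can be made to return.

```java
// Hypothetical helper types for illustration; not part of Sphinx-4.
enum Outcome { CORRECT, INCORRECT, MISS }

class OutcomeClassifier {
    // recognized is assumed to be null when the recognizer produced no result.
    static Outcome classify(String expected, String recognized) {
        if (recognized == null) {
            return Outcome.MISS;
        }
        // Compare case-insensitively, ignoring surrounding whitespace.
        return recognized.trim().equalsIgnoreCase(expected.trim())
                ? Outcome.CORRECT
                : Outcome.INCORRECT;
    }
}

class OutcomeDemo {
    public static void main(String[] args) {
        System.out.println(OutcomeClassifier.classify("open door", "Open Door "));
        System.out.println(OutcomeClassifier.classify("open door", "close door"));
        System.out.println(OutcomeClassifier.classify("open door", null));
    }
}
```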

I found a way to work around this problem.  It involves inserting a custom data processor into Sphinx-4's data processing stack.

Here's how to do it in three steps:

Step 1:  Implement a custom data processor 

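import java.util.LinkedList;
import java.util.List;
import edu.cmu.sphinx.frontend.*;

// InterruptException is neither a JDK nor a Sphinx-4 class; it is assumed to be
// a custom unchecked exception defined elsewhere.
public class InsertableDataBlocker extends BaseDataProcessor {
    List<Data> insertionDatas = new LinkedList<Data>();

    public Data getData() throws DataProcessingException {
        // Once an interrupt has been injected, abort instead of reading more audio.
        if (!insertionDatas.isEmpty())
            throw new InterruptException();
        return getPredecessor().getData();
    }

    public void injectInterrupt() {
        insertionDatas.add(new DataEndSignal(0));
    }
}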
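The same idea can be tried out without Sphinx-4 on the classpath. The sketch below is a variation, not the processor above: Stage, InterruptibleStage and InterruptedPipelineException are hypothetical stand-ins, and it swaps the list for an AtomicBoolean so the flag can safely be set from another thread and is cleared once the interrupt has been delivered.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a pipeline stage; not the Sphinx-4 API.
interface Stage {
    String getData();
}

// Hypothetical unchecked exception used to unwind the blocked consumer.
class InterruptedPipelineException extends RuntimeException {
    InterruptedPipelineException() { super("pipeline interrupted"); }
}

// Passes data through from its predecessor until an interrupt is requested.
class InterruptibleStage implements Stage {
    private final Stage predecessor;
    private final AtomicBoolean interruptRequested = new AtomicBoolean(false);

    InterruptibleStage(Stage predecessor) {
        this.predecessor = predecessor;
    }

    @Override
    public String getData() {
        // getAndSet clears the flag, so the stage is usable again afterwards.
        if (interruptRequested.getAndSet(false)) {
            throw new InterruptedPipelineException();
        }
        return predecessor.getData();
    }

    // Safe to call from any thread.
    void injectInterrupt() {
        interruptRequested.set(true);
    }
}

class InterruptibleStageDemo {
    public static void main(String[] args) {
        Queue<String> frames = new ArrayDeque<>();
        frames.add("frame-1");
        frames.add("frame-2");
        InterruptibleStage stage = new InterruptibleStage(frames::poll);

        System.out.println(stage.getData());      // data passes through
        stage.injectInterrupt();
        try {
            stage.getData();
        } catch (InterruptedPipelineException e) {
            System.out.println("interrupted");    // consumer is unwound
        }
        System.out.println(stage.getData());      // stage resumes
    }
}
```

Note that the flag is only checked when getData() is called, so this relies on the upstream stage returning regularly, as a microphone source delivering a steady stream of audio would.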

Step 2:  Add this data processor to the processing stack

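In the Sphinx-4 XML configuration file, place the processor right after the microphone processor in the pipeline.  The class must also be declared as a component named insertableDataBlocker, since each pipeline item refers to a component by name.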

    <component name="epFrontEnd" type="edu.cmu.sphinx.frontend.FrontEnd">
        <propertylist name="pipeline">
            <item>microphone</item>
            <item>insertableDataBlocker</item>
            <item>dataBlocker</item>
            <item>speechClassifier</item>
            <item>speechMarker</item>
            <item>nonSpeechDataFilter</item>
            <item>preemphasizer</item>
            <item>windower</item>
            <item>fft</item>
            <item>melFilterBank</item>
            <item>dct</item>
            <item>liveCMN</item>
            <item>featureExtraction</item>
        </propertylist>
    </component>

Step 3:  Interrupt the recognize() method when desired

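ConfigurationManager cm = new ConfigurationManager(getClass().getResource("config.xml"));
InsertableDataBlocker inserter = (InsertableDataBlocker)cm.lookup("insertableDataBlocker");
// From another thread (e.g. a test timeout), unblock the pending recognize() call:
inserter.injectInterrupt();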
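One way to decide when to trigger the interrupt in a test is a watchdog timer. The following is a sketch under stated assumptions rather than Sphinx-4 code: RecognitionWatchdog, RecognitionInterruptedException and recognizeNothing are hypothetical names, and the blocking recognizer is simulated by a loop that polls a flag.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

// Hypothetical unchecked exception standing in for whatever the data processor throws.
class RecognitionInterruptedException extends RuntimeException {
    RecognitionInterruptedException() { super("recognition interrupted"); }
}

class RecognitionWatchdog {
    // Runs blockingCall on the calling thread. If it has not returned after
    // timeoutMillis, interruptHook is run on a timer thread. Returns the call's
    // result, or null if the call was unwound by RecognitionInterruptedException.
    static <T> T callWithTimeout(Supplier<T> blockingCall, Runnable interruptHook,
                                 long timeoutMillis) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.schedule(interruptHook, timeoutMillis, TimeUnit.MILLISECONDS);
        try {
            return blockingCall.get();
        } catch (RecognitionInterruptedException e) {
            return null; // treated as a "miss"
        } finally {
            timer.shutdownNow(); // cancels the hook if the call finished in time
        }
    }
}

class WatchdogDemo {
    // Stand-in for recognize() when no speech arrives: polls until the flag is set.
    static String recognizeNothing(AtomicBoolean interruptFlag) {
        while (!interruptFlag.get()) {
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        throw new RecognitionInterruptedException();
    }

    public static void main(String[] args) {
        AtomicBoolean flag = new AtomicBoolean(false);
        String result = RecognitionWatchdog.callWithTimeout(
                () -> recognizeNothing(flag), () -> flag.set(true), 200);
        System.out.println(result == null ? "miss" : "heard: " + result);
    }
}
```

With the real recognizer, the blocking call would presumably wrap recognizer.recognize() and the hook would be inserter::injectInterrupt, with the catch clause adjusted to the exception that getData() actually throws.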


Java | Speech
