Conversational Intelligence

Analyze conversations in your company to sell more, understand users, and improve UX.

Cognitive Automation

Lower customer care costs by automating repetitive processes.

Other products: Wordlify, Subtitles, Dictate, Media Monitoring.

TRURL is here! 🎉

VoiceLab.AI, a leader in conversational AI, now brings TRURL, an instruction-following large language model (LLM) fine-tuned for a number of business domains, such as e-commerce and customer support.

TRURL brings additional support for specialized analytical tasks:

  • Dialog structure aggregation
  • Customer support quality control
  • Sales intelligence and assistance

TRURL can also be deployed on-premise:

  • We will build a GPT model for you
  • Trained securely on your infrastructure
  • Trained on your dataset

Discover the TRURL Alpha version!

Vencode uses TRURL to build a company chat system that integrates information from supplied documents and the website to improve communication within the organization.

Discover the beta version of the solution!
Available ASR APIs: WebSocket, gRPC, and HTTP.

Communication with the ASR server using WebSocket

See our GitLab repository for a working example of how to use the API. A full description follows.

1. The client connects to the ASR endpoint /classify/asr (wss://demo.voicelab.ai/classify/asr), which listens for connections.

2. When establishing the connection, set and send the following information in the HTTP headers:

  • The format of the audio that will be streamed to the server, set in the Content-Type header.
    • Supported values:
      audio/l16;rate=8000 – 16-bit linear PCM samples at 8 kHz;
      audio/l16;rate=16000 – 16-bit linear PCM samples at 16 kHz;
      audio/x-alaw-basic – A-law codec;
      audio/basic – μ-law codec;
      audio/flac – FLAC codec.
  • The project identifier:
    • X-Voicelab-Pid: PID, where PID is the project number (in this example, PID = 109).
  • The project password:
    • X-Voicelab-Password: PASS, where PASS is the project password (in this example, PASS = fbcd6fbb37a10a6d44467918a67d6c54).
  • The configuration name:
    • X-Voicelab-Conf-Name: CONF-NAME, where CONF-NAME is one of:
      8000_pl_PL or 16000_pl_PL for Polish (8 kHz or 16 kHz sample rate);
      8000_en_US or 16000_en_US for English (8 kHz or 16 kHz sample rate);
      8000_ru_RU or 16000_ru_RU for Russian (8 kHz or 16 kHz sample rate);
      16000_de_DE for German (16 kHz sample rate);
      8000_it_IT for Italian (8 kHz sample rate).

3. The client sends the audio as a sequence of WebSocket binary messages. When the audio stream is complete, it sends a four-byte binary message in which all four bytes are zero; this is the end-of-transmission mark. If the last audio data message would itself consist of exactly four zero bytes, split it into two messages (for example, two bytes each) before sending the four-zero-byte termination message, so that the data is not mistaken for the mark.

4. The ASR server returns the recognition as JSON documents (WebSocket text messages) of the form:


        {
            "status": "string: OK or ERROR",
            "shift": "string: change in the number of words in the recognition",
            "words": "array: list of recognized words",
            "start": "(**) array: list with words' start times",
            "end": "(**) array: list with words' end times",
            "error": "(*) string: type of error",
            "description": "(*) string: short description of error"
        }
                

Fields marked with (*) are optional and appear only if the status is not "OK". If the status is not "OK", the connection is terminated, and the error type is either "BadRequest" or "Forbidden". Fields marked with (**) are optional and appear depending on the server configuration.

  • An example of how the recognition is built up from the partial results returned by the ASR server during dictation:
    
            {"status": "OK","shift": "1", "words": ["a"]} 
            {"status": "OK","shift": "0", "words": ["ala"]}
            {"status": "OK","shift": "1", "words": ["pięknie"]}
            {"status": "OK","shift": "2", "words": ["śpi", "je"]}
            {"status": "OK","shift": "-1", "words": ["śpiewa"]}
                    

Final recognition: „ala pięknie śpiewa” (Polish for "Ala sings beautifully").
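The connection headers from step 2 and the message framing from step 3 can be sketched in Python as below. This is a minimal sketch under stated assumptions, not VoiceLab's client: the names `build_headers`, `frame_audio` and `END_MARK`, and the default chunk size of 3200 bytes, are hypothetical choices. Only the header names, the content-type values and the four-zero-byte termination mark come from the description above. Opening the WebSocket itself (which needs a third-party client library) is left out so that the sketch stays self-contained.

```python
from typing import Dict, List

# Termination mark from step 3: one binary message made of four zero bytes.
END_MARK = b"\x00\x00\x00\x00"


def build_headers(pid: str, password: str, conf_name: str,
                  content_type: str = "audio/l16;rate=16000") -> Dict[str, str]:
    """Return the HTTP headers sent when opening the WebSocket (step 2)."""
    return {
        "Content-Type": content_type,
        "X-Voicelab-Pid": pid,
        "X-Voicelab-Password": password,
        "X-Voicelab-Conf-Name": conf_name,
    }


def frame_audio(audio: bytes, chunk_size: int = 3200) -> List[bytes]:
    """Split audio into binary messages and append the end mark (step 3).

    chunk_size is a hypothetical default (3200 bytes = 100 ms of 16-bit PCM
    at 16 kHz); the API description does not prescribe a chunk size.
    """
    messages: List[bytes] = []
    for i in range(0, len(audio), chunk_size):
        chunk = audio[i:i + chunk_size]
        if chunk == END_MARK:
            # A data message identical to the end mark would stop the stream
            # early, so send it as two 2-byte messages instead.
            messages.extend([chunk[:2], chunk[2:]])
        else:
            messages.append(chunk)
    messages.append(END_MARK)  # end-of-transmission mark
    return messages
```

With the example values from step 2, `build_headers("109", "fbcd6fbb37a10a6d44467918a67d6c54", "16000_pl_PL")` gives the headers for a 16 kHz Polish session, and each element returned by `frame_audio` would then be sent, in order, as one binary message. The sketch splits every data message that equals the mark, not only the last one. With a chunk size above four bytes, only the final chunk can be that short, so in practice this matches the rule in step 3.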
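Step 4 does not spell out how "shift" is applied to earlier results. One reading that reproduces the dictation example message by message is that "shift" is the net change in the number of recognized words, and "words" replaces the tail of the current hypothesis. For example, the last message ("shift": "-1", "words": ["śpiewa"]) shrinks four words to three, replacing "śpi je" with "śpiewa". The Python sketch below implements that reading. It is inferred from the example, not taken from a VoiceLab specification, and `apply_result` is a hypothetical helper name.

```python
import json
from typing import List


def apply_result(hypothesis: List[str], message: str) -> List[str]:
    """Update the running hypothesis with one ASR text message.

    Assumption (inferred from the dictation example, not from a spec):
    "shift" is the net change in word count, and "words" holds the last
    len(words) words of the updated hypothesis.
    """
    result = json.loads(message)
    if result["status"] != "OK":
        # On error the server terminates the connection; surface its details.
        raise RuntimeError(f'{result.get("error")}: {result.get("description")}')
    words = result["words"]
    new_len = len(hypothesis) + int(result["shift"])  # "shift" arrives as a string
    keep = new_len - len(words)  # leading words that stay unchanged
    if keep < 0 or keep > len(hypothesis):
        raise ValueError("inconsistent shift/words in ASR message")
    return hypothesis[:keep] + words


# Replaying the dictation example from step 4.
messages = [
    '{"status": "OK", "shift": "1", "words": ["a"]}',
    '{"status": "OK", "shift": "0", "words": ["ala"]}',
    '{"status": "OK", "shift": "1", "words": ["pięknie"]}',
    '{"status": "OK", "shift": "2", "words": ["śpi", "je"]}',
    '{"status": "OK", "shift": "-1", "words": ["śpiewa"]}',
]
hypothesis: List[str] = []
for m in messages:
    hypothesis = apply_result(hypothesis, m)
print(" ".join(hypothesis))  # ala pięknie śpiewa
```

Replaying the five example messages yields the final recognition given above. If the server's actual semantics for "shift" differ, only the two lines computing `new_len` and `keep` would need to change.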