# Google Cloud Speech-to-Text Transform

Description
-----------
This plugin converts audio files to text using Google Cloud Speech-to-Text.

Google Cloud Speech-to-Text enables developers to convert audio to text by applying powerful neural network models.

Credentials
-----------
If the plugin is run on a Google Cloud Dataproc cluster, the service account key does not need to be
provided and can be set to 'auto-detect'.
Credentials will be automatically read from the cluster environment.

If the plugin is not run on a Dataproc cluster, the path to a service account key must be provided.
The service account key can be found on the Dashboard in the Cloud Platform Console.
Make sure the account key has permission to access the Cloud Speech-to-Text API.
The service account key file needs to be available on every node in your cluster and
must be readable by all users running the job.

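As a rough illustration of how such a key file is typically consumed, here is a minimal sketch using the Cloud
Speech-to-Text Java client library; the key path and the `SpeechClientFactory` class are illustrative, not part of
this plugin:

```java
import com.google.api.gax.core.FixedCredentialsProvider;
import com.google.auth.oauth2.ServiceAccountCredentials;
import com.google.cloud.speech.v1.SpeechClient;
import com.google.cloud.speech.v1.SpeechSettings;

import java.io.FileInputStream;

public class SpeechClientFactory {

  // Builds a client from an explicit service account key file.
  // The key path is a placeholder; on Dataproc with 'auto-detect',
  // SpeechClient.create() picks up the cluster's credentials instead.
  public static SpeechClient fromKeyFile(String keyPath) throws Exception {
    ServiceAccountCredentials credentials;
    try (FileInputStream keyStream = new FileInputStream(keyPath)) {
      credentials = ServiceAccountCredentials.fromStream(keyStream);
    }
    SpeechSettings settings = SpeechSettings.newBuilder()
        .setCredentialsProvider(FixedCredentialsProvider.create(credentials))
        .build();
    return SpeechClient.create(settings);
  }
}
```
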
Properties
----------
**Audio Field**: Name of the input field which contains the raw audio data in bytes.

**Audio Encoding**: Audio encoding of the data sent in the audio message. All encodings support only 1 channel (mono)
audio. Only 'FLAC' and 'WAV' include a header that describes the bytes of audio that follow the header.
The other encodings are raw audio bytes with no header.

**Sampling Rate**: Sample rate in Hertz of the audio data sent in all 'RecognitionAudio' messages.
Valid values are 8000-48000. 16000 is optimal. For best results, set the sampling rate of the audio source to
16000 Hz. If that's not possible, use the native sample rate of the audio source (instead of re-sampling).

**Profanity**: Whether to attempt filtering profanities, replacing all but the initial character in each filtered
word with asterisks, e.g. "f***". If set to `false`, profanities won't be filtered out.

**Language**: The language of the supplied audio as a [BCP-47](https://www.rfc-editor.org/rfc/bcp/bcp47.txt)
language tag. Example: "en-US". See [Language Support](https://cloud.google.com/speech/docs/languages) for a list of
the currently supported language codes.

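These properties correspond to fields of the recognition config in the Cloud Speech-to-Text API. A minimal sketch of
how they might map onto the Java client library follows; the class name and the encoding, sample rate, and language
shown are example values, not plugin defaults:

```java
import com.google.cloud.speech.v1.RecognitionAudio;
import com.google.cloud.speech.v1.RecognitionConfig;
import com.google.protobuf.ByteString;

public class RecognitionRequestSketch {

  // Assembles the recognition config from plugin-style properties.
  // LINEAR16 (raw PCM), 16000 Hz, and "en-US" are example values.
  public static RecognitionConfig buildConfig() {
    return RecognitionConfig.newBuilder()
        .setEncoding(RecognitionConfig.AudioEncoding.LINEAR16) // Audio Encoding
        .setSampleRateHertz(16000)                             // Sampling Rate
        .setProfanityFilter(true)                              // Profanity
        .setLanguageCode("en-US")                              // Language
        .build();
  }

  // The raw bytes would come from the record's Audio Field.
  public static RecognitionAudio buildAudio(byte[] audioBytes) {
    return RecognitionAudio.newBuilder()
        .setContent(ByteString.copyFrom(audioBytes))
        .build();
  }
}
```
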
**Transcription Parts Field**: The field to store the transcription parts. It will be an array of records. Each record
in the array represents one part of the full audio data and will contain the transcription and confidence for that part.

**Transcription Text Field**: The field to store the transcription of the full audio data. It is generated by combining
the highest-confidence transcription of each part.

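To make these two output fields concrete, here is a rough sketch of how the parts and the full text could be derived
from a recognition response with the Java client library; the `TranscriptionPart` holder and the assembly logic are
illustrative, not the plugin's actual implementation:

```java
import com.google.cloud.speech.v1.RecognizeResponse;
import com.google.cloud.speech.v1.SpeechRecognitionAlternative;
import com.google.cloud.speech.v1.SpeechRecognitionResult;

import java.util.ArrayList;
import java.util.List;

public class TranscriptionAssembler {

  // Illustrative holder for one transcription part (one result in the response).
  public static class TranscriptionPart {
    public final String transcript;
    public final float confidence;

    public TranscriptionPart(String transcript, float confidence) {
      this.transcript = transcript;
      this.confidence = confidence;
    }
  }

  // One part per recognition result. The service orders alternatives from most
  // to least likely, so the first alternative is the highest-confidence one.
  public static List<TranscriptionPart> toParts(RecognizeResponse response) {
    List<TranscriptionPart> parts = new ArrayList<>();
    for (SpeechRecognitionResult result : response.getResultsList()) {
      if (result.getAlternativesCount() > 0) {
        SpeechRecognitionAlternative best = result.getAlternatives(0);
        parts.add(new TranscriptionPart(best.getTranscript(), best.getConfidence()));
      }
    }
    return parts;
  }

  // The full text is the concatenation of the parts' transcriptions, in order.
  public static String toFullText(List<TranscriptionPart> parts) {
    StringBuilder text = new StringBuilder();
    for (TranscriptionPart part : parts) {
      text.append(part.transcript);
    }
    return text.toString();
  }
}
```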