G10L  SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING

Notes:
1. This subclass does not cover:
   - devices for the storage of speech or audio signals, which are covered by subclasses G11B and G11C;
   - encoding of compressed speech signals for transmission or storage, which is covered by group H03M7/30.
2. In this subclass, non-limiting references (in the sense of paragraph 39 of the Guide to the IPC) may still be displayed in the scheme.

G10L13/00  Speech synthesis; Text to speech systems
G10L13/02  Methods for producing synthetic speech; Speech synthesisers
G10L2013/021  Overlap-add techniques
G10L13/027  Concept to speech synthesisers; Generation of natural phrases from machine-based concepts (generation of parameters for speech synthesis out of text G10L13/08)
G10L13/033  Voice editing, e.g. manipulating the voice of the synthesiser
G10L13/0335  Pitch control
G10L13/04  Details of speech synthesis systems, e.g. synthesiser structure or memory management
G10L13/047  Architecture of speech synthesisers
G10L13/06  Elementary speech units used in speech synthesisers; Concatenation rules
G10L13/07  Concatenation rules
G10L13/08  Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
G10L2013/083  Special characters, e.g. punctuation marks
G10L13/086  Detection of language
G10L13/10  Prosody rules derived from text; Stress or intonation
G10L2013/105  Duration
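For orientation (illustrative only, not part of the scheme): G10L2013/021 covers overlap-add techniques, in which a concatenative synthesiser joins windowed speech frames by summing their overlapping regions. A minimal Python sketch; the frame length, hop size and function name are assumptions for illustration:

```python
import numpy as np

def overlap_add(frames, hop):
    """Recombine windowed frames into one signal by overlap-add.

    frames: 2-D array, one windowed frame per row.
    hop: hop size in samples between successive frames.
    """
    n_frames, frame_len = frames.shape
    out = np.zeros((n_frames - 1) * hop + frame_len)
    for i, frame in enumerate(frames):
        out[i * hop : i * hop + frame_len] += frame
    return out

# Toy usage: split a tone into 50%-overlapping Hann-windowed frames
# and reconstruct it by overlap-add.
sr = 16000
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 220 * t)
frame_len, hop = 512, 256
window = np.hanning(frame_len)
frames = np.stack([signal[i : i + frame_len] * window
                   for i in range(0, len(signal) - frame_len, hop)])
rebuilt = overlap_add(frames, hop)
```

Hann windows at 50% overlap sum to an approximately constant envelope, which is why this toy reconstruction needs no further gain correction.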
G10L15/00  Speech recognition (G10L17/00 takes precedence)
G10L15/005  Language recognition
G10L15/01  Assessment or evaluation of speech recognition systems
G10L15/02  Feature extraction for speech recognition; Selection of recognition unit
G10L2015/022  Demisyllables, biphones or triphones being the recognition units
G10L2015/025  Phonemes, fenemes or fenones being the recognition units
G10L2015/027  Syllables being the recognition units
G10L15/04  Segmentation; Word boundary detection
G10L15/05  Word boundary detection
G10L15/06  Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice (G10L15/14 takes precedence)
G10L15/063  Training
G10L2015/0631  Creating reference templates; Clustering
G10L2015/0633  using lexical or orthographic knowledge sources
G10L2015/0635  updating or merging of old and new templates; Mean values; Weighting
G10L2015/0636  Threshold criteria for the updating
G10L2015/0638  Interactive procedures
G10L15/065  Adaptation
G10L15/07  to the speaker
G10L15/075  supervised, i.e. under machine guidance
G10L15/08  Speech classification or search
G10L2015/081  Search algorithms, e.g. Baum-Welch or Viterbi
G10L15/083  Recognition networks (G10L15/142, G10L15/16 take precedence)
G10L2015/085  Methods for reducing search complexity, pruning
G10L2015/086  Recognition of spelled words
G10L2015/088  Word spotting
G10L15/10  using distance or distortion measures between unknown speech and reference templates
G10L15/12  using dynamic programming techniques, e.g. dynamic time warping [DTW]
G10L15/14  using statistical models, e.g. Hidden Markov Models [HMMs] (G10L15/18 takes precedence)
G10L15/142  Hidden Markov Models [HMMs]
G10L15/144  Training of HMMs
G10L15/146  with insufficient amount of training data, e.g. state sharing, tying, deleted interpolation
G10L15/148  Duration modelling in HMMs, e.g. semi HMM, segmental models or transition probabilities
G10L15/16  using artificial neural networks
G10L15/18  using natural language modelling
G10L15/1807  using prosody or stress
G10L15/1815  Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
G10L15/1822  Parsing for meaning understanding
G10L15/183  using context dependencies, e.g. language models
G10L15/187  Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams
G10L15/19  Grammatical context, e.g. disambiguation of the recognition hypotheses based on word sequence rules
G10L15/193  Formal grammars, e.g. finite state automata, context free grammars or word networks
G10L15/197  Probabilistic grammars, e.g. word n-grams
G10L15/20  Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech (G10L21/02 takes precedence)
G10L15/22  Procedures used during a speech recognition process, e.g. man-machine dialogue
G10L2015/221  Announcement of recognition results
G10L15/222  Barge in, i.e. overridable guidance for interrupting prompts
G10L2015/223  Execution procedure of a spoken command
G10L2015/225  Feedback of the input speech
G10L2015/226  using non-speech characteristics
G10L2015/227  of the speaker; Human-factor methodology
G10L2015/228  of application context
G10L15/24  Speech recognition using non-acoustical features
G10L15/25  using position of the lips, movement of the lips or face analysis
G10L15/26  Speech to text systems (G10L15/08 takes precedence)
G10L15/28  Constructional details of speech recognition systems
G10L15/285  Memory allocation or algorithm optimisation to reduce hardware requirements
G10L15/30  Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
G10L15/32  Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems
G10L15/34  Adaptation of a single recogniser for parallel processing, e.g. by use of multiple processors or cloud computing
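For orientation: G10L15/12 names dynamic time warping [DTW], the classic dynamic-programming alignment of an unknown utterance against a reference template, with a local cost of the kind covered by G10L15/10. A minimal sketch; the Euclidean local cost, step pattern and names are illustrative assumptions:

```python
import numpy as np

def dtw_distance(query, template):
    """Dynamic time warping distance between two feature sequences.

    query, template: 2-D arrays of shape (frames, features).
    Returns the accumulated cost of the best monotonic alignment.
    """
    n, m = len(query), len(template)
    # Local cost: Euclidean distance between every pair of frames.
    cost = np.linalg.norm(query[:, None, :] - template[None, :, :], axis=2)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Classic step pattern: match, insertion, deletion.
            acc[i, j] = cost[i - 1, j - 1] + min(acc[i - 1, j - 1],
                                                 acc[i - 1, j],
                                                 acc[i, j - 1])
    return acc[n, m]
```

A template-matching recogniser of this kind scores the unknown utterance against each reference template and picks the lowest accumulated cost.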
G10L17/00  Speaker identification or verification
G10L17/02  Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
G10L17/04  Training, enrolment or model building
G10L17/06  Decision making techniques; Pattern matching strategies
G10L17/08  Use of distortion metrics or a particular distance between probe pattern and reference templates
G10L17/10  Multimodal systems, i.e. based on the integration of multiple recognition engines or fusion of expert systems
G10L17/12  Score normalisation
G10L17/14  Use of phonemic categorisation or speech recognition prior to speaker recognition or verification
G10L17/16  Hidden Markov models [HMM]
G10L17/18  Artificial neural networks; Connectionist approaches
G10L17/20  Pattern transformations or operations aimed at increasing system robustness, e.g. against channel noise or different working conditions
G10L17/22  Interactive procedures; Man-machine interfaces
G10L17/24  the user being prompted to utter a password or a predefined phrase
G10L17/26  Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
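For orientation: G10L17/06 (decision making) and G10L17/12 (score normalisation) can be illustrated together by a toy verification trial, scoring fixed-length speaker embeddings by cosine similarity and z-normalising against an impostor cohort. The embeddings, threshold and names are illustrative assumptions, not anything the scheme prescribes:

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(test_emb, enrol_emb, cohort_embs, threshold=2.0):
    """Accept/reject a speaker claim by z-normalised cosine scoring.

    test_emb, enrol_emb: fixed-length speaker embeddings (1-D arrays)
    produced by some front end (assumed, not shown here).
    cohort_embs: embeddings of impostor speakers used to normalise
    the raw score, a simple form of score normalisation.
    """
    raw = cosine(test_emb, enrol_emb)
    cohort = np.array([cosine(test_emb, c) for c in cohort_embs])
    z = (raw - cohort.mean()) / (cohort.std() + 1e-9)
    return z >= threshold
```

Normalising against a cohort makes one global threshold usable across speakers whose raw score distributions differ.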
G10L19/00  Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis (in musical instruments G10H)
G10L2019/0001  Codebooks
G10L2019/0002  Codebook adaptations
G10L2019/0003  Backward prediction of gain
G10L2019/0004  Design or structure of the codebook
G10L2019/0005  Multi-stage vector quantisation
G10L2019/0006  Tree or trellis structures; Delayed decisions
G10L2019/0007  Codebook element generation
G10L2019/0008  Algebraic codebooks
G10L2019/0009  Orthogonal codebooks
G10L2019/001  Interpolation of codebook vectors
G10L2019/0011  Long term prediction filters, i.e. pitch estimation
G10L2019/0012  Smoothing of parameters of the decoder interpolation
G10L2019/0013  Codebook search algorithms
G10L2019/0014  Selection criteria for distances
G10L2019/0015  Viterbi algorithms
G10L2019/0016  Codebook for LPC parameters
G10L19/0017  Lossless audio signal coding; Perfect reconstruction of coded audio signal by transmission of coding error (G10L19/24 takes precedence)
G10L19/0018  Speech coding using phonetic or linguistical decoding of the source; Reconstruction using text-to-speech synthesis
G10L19/002  Dynamic bit allocation (for perceptual audio coders G10L19/032)
G10L19/005  Correction of errors induced by the transmission channel, if related to the coding algorithm
G10L19/008  Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
G10L19/012  Comfort noise or silence coding
G10L19/018  Audio watermarking, i.e. embedding inaudible data in the audio signal
G10L19/02  using spectral analysis, e.g. transform vocoders or subband vocoders
G10L19/0204  using subband decomposition
G10L19/0208  Subband vocoders
G10L19/0212  using orthogonal transformation
G10L19/0216  using wavelet decomposition
G10L19/022  Blocking, i.e. grouping of samples in time; Choice of analysis windows; Overlap factoring
G10L19/025  Detection of transients or attacks for time/frequency resolution switching
G10L19/028  Noise substitution, i.e. substituting non-tonal spectral components by noisy source (comfort noise for discontinuous speech transmission G10L19/012)
G10L19/03  Spectral prediction for preventing pre-echo; Temporary noise shaping [TNS], e.g. in MPEG2 or MPEG4
G10L19/032  Quantisation or dequantisation of spectral components
G10L19/035  Scalar quantisation
G10L19/038  Vector quantisation, e.g. TwinVQ audio
G10L19/04  using predictive techniques
G10L19/06  Determination or coding of the spectral characteristics, e.g. of the short-term prediction coefficients
G10L19/07  Line spectrum pair [LSP] vocoders
G10L19/08  Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
G10L19/083  the excitation function being an excitation gain (G10L25/90 takes precedence)
G10L19/087  using mixed excitation models, e.g. MELP, MBE, split band LPC or HVXC
G10L19/09  Long term prediction, i.e. removing periodical redundancies, e.g. by using adaptive codebook or pitch predictor
G10L19/093  using sinusoidal excitation models
G10L19/097  using prototype waveform decomposition or prototype waveform interpolative [PWI] coders
G10L19/10  the excitation function being a multipulse excitation
G10L19/107  Sparse pulse excitation, e.g. by using algebraic codebook
G10L19/113  Regular pulse excitation
G10L19/12  the excitation function being a code excitation, e.g. in code excited linear prediction [CELP] vocoders
G10L19/125  Pitch excitation, e.g. pitch synchronous innovation CELP [PSI-CELP]
G10L19/13  Residual excited linear prediction [RELP]
G10L19/135  Vector sum excited linear prediction [VSELP]
G10L19/16  Vocoder architecture
G10L19/167  Audio streaming, i.e. formatting and decoding of an encoded audio signal representation into a data stream for transmission or storage purposes
G10L19/173  Transcoding, i.e. converting between two coded representations avoiding cascaded coding-decoding
G10L19/18  Vocoders using multiple modes
G10L19/20  using sound class specific coding, hybrid encoders or object based coding
G10L19/22  Mode decision, i.e. based on audio signal content versus external parameters
G10L19/24  Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding
G10L19/26  Pre-filtering or post-filtering
G10L19/265  Pre-filtering, e.g. high frequency emphasis prior to encoding
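For orientation: G10L19/06 covers determination or coding of short-term prediction coefficients, the filter half of the source-filter model in G10L19/00. A minimal sketch of the autocorrelation method solved by Levinson-Durbin recursion; the (already windowed) analysis frame and the prediction order are assumed inputs:

```python
import numpy as np

def lpc(frame, order):
    """Short-term prediction coefficients by the autocorrelation method.

    Solves the normal equations with the Levinson-Durbin recursion,
    the classic way a source-filter coder derives its filter.
    """
    # Autocorrelation of the windowed analysis frame, lags 0..order.
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:][:order + 1]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for this order.
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= (1.0 - k * k)
    return a, err  # prediction-error filter A(z) and residual energy
```

The residual energy returned here is what the excitation coding of G10L19/08 and its subgroups (multipulse, CELP, etc.) then spends its bits on.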
G10L21/00  Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility (G10L19/00 takes precedence)
G10L21/003  Changing voice quality, e.g. pitch or formants
G10L21/007  characterised by the process used
G10L21/01  Correction of time axis
G10L21/013  Adapting to target pitch
G10L2021/0135  Voice conversion or morphing
G10L21/02  Speech enhancement, e.g. noise reduction or echo cancellation (reducing echo effects in line transmission systems H04B3/20; echo suppression in hands-free telephones H04M9/08)
G10L21/0208  Noise filtering
G10L2021/02082  the noise being echo, reverberation of the speech
G10L2021/02085  Periodic noise
G10L2021/02087  the noise being separate speech, e.g. cocktail party
G10L21/0216  characterised by the method used for estimating noise
G10L2021/02161  Number of inputs available containing the signal or the noise to be suppressed
G10L2021/02163  Only one microphone
G10L2021/02165  Two microphones, one receiving mainly the noise signal and the other one mainly the speech signal
G10L2021/02166  Microphone arrays; Beamforming
G10L2021/02168  the estimation exclusively taking place during speech pauses
G10L21/0224  Processing in the time domain
G10L21/0232  Processing in the frequency domain
G10L21/0264  characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
G10L21/0272  Voice signal separating
G10L21/028  using properties of sound source
G10L21/0308  characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
G10L21/0316  by changing the amplitude
G10L21/0324  Details of processing therefor
G10L21/0332  involving modification of waveforms
G10L21/034  Automatic adjustment
G10L21/0356  for synchronising with other signals, e.g. video signals
G10L21/0364  for improving intelligibility
G10L2021/03643  Diver speech
G10L2021/03646  Stress or Lombard effect
G10L21/038  using band spreading techniques
G10L21/0388  Details of processing therefor
G10L21/04  Time compression or expansion
G10L21/043  by changing speed
G10L21/045  using thinning out or insertion of a waveform
G10L21/047  characterised by the type of waveform to be thinned out or inserted
G10L21/049  characterised by the interconnection of waveforms
G10L21/055  for synchronising with other signals, e.g. video signals
G10L21/057  for improving intelligibility
G10L2021/0575  Aids for the handicapped in speaking
G10L21/06  Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids (G10L15/26 takes precedence)
G10L2021/065  Aids for the handicapped in understanding
G10L21/10  Transforming into visible information
G10L2021/105  Synthesis of the lips movements from speech, e.g. for talking heads
G10L21/12  by displaying time domain information
G10L21/14  by displaying frequency domain information
G10L21/16  Transforming into a non-visible representation (devices or methods enabling ear patients to replace direct auditory perception by another kind of perception A61F11/04)
G10L21/18  Details of the transformation process

G10L25/00  Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 (muting semiconductor-based amplifiers when some special characteristics of a signal are sensed by a speech detector, e.g. sensing when no signal is present, H03G3/34)
G10L25/03  characterised by the type of extracted parameters
G10L25/06  the extracted parameters being correlation coefficients
G10L25/09  the extracted parameters being zero crossing rates
G10L25/12  the extracted parameters being prediction coefficients
G10L25/15  the extracted parameters being formant information
G10L25/18  the extracted parameters being spectral information of each sub-band
G10L25/21  the extracted parameters being power information
G10L25/24  the extracted parameters being the cepstrum
G10L25/27  characterised by the analysis technique
G10L25/30  using neural networks
G10L25/33  using fuzzy logic
G10L25/36  using chaos theory
G10L25/39  using genetic algorithms
G10L25/45  characterised by the type of analysis window
G10L25/48  specially adapted for particular use
G10L25/51  for comparison or discrimination
G10L25/54  for retrieval
G10L25/57  for processing of video signals
G10L25/60  for measuring the quality of voice signals
G10L25/63  for estimating an emotional state
G10L25/66  for extracting parameters related to health condition (detecting or measuring for diagnostic purposes A61B5/00)
G10L25/69  for evaluating synthetic or decoded voice signals
G10L25/72  for transmitting results of analysis
G10L25/75  for modelling vocal tract parameters
G10L25/78  Detection of presence or absence of voice signals (switching of direction of transmission by voice frequency in two-way loud-speaking telephone systems H04M9/10)
G10L2025/783  based on threshold decision
G10L2025/786  Adaptive threshold
G10L25/81  for discriminating voice from music
G10L25/84  for discriminating voice from noise
G10L25/87  Detection of discrete points within a voice signal
G10L25/90  Pitch determination of speech signals
G10L2025/903  using a laryngograph
G10L2025/906  Pitch tracking
G10L25/93  Discriminating between voiced and unvoiced parts of speech signals (G10L25/90 takes precedence)
G10L2025/932  Decision in previous or following frames
G10L2025/935  Mixed voiced class; Transitions
G10L2025/937  Signal energy in various frequency bands

G10L99/00  Subject matter not provided for in other groups of this subclass
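For orientation: G10L25/78 and its subgroups G10L2025/783 and G10L2025/786 cover threshold-based detection of the presence or absence of voice, including adaptive thresholds. A minimal sketch of frame-energy detection against an adaptive noise floor; the frame length, margin and the running-minimum floor estimate are illustrative assumptions:

```python
import numpy as np

def detect_voice(signal, frame_len=256, margin_db=6.0):
    """Frame-level voice activity detection by adaptive energy threshold.

    Flags a frame as speech when its energy exceeds a noise-floor
    estimate (tracked here as the running minimum of frame energies)
    by a fixed margin, one simple instance of threshold-based detection.
    """
    n = len(signal) // frame_len
    frames = signal[:n * frame_len].reshape(n, frame_len)
    energy_db = 10 * np.log10(np.sum(frames ** 2, axis=1) + 1e-12)
    noise_floor = np.minimum.accumulate(energy_db)  # adaptive floor
    return energy_db > noise_floor + margin_db
```

Real detectors typically add hangover smoothing across frames (compare G10L2025/932, decision in previous or following frames); this toy version decides each frame independently.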