Third Quarterly Progress Report

February 1, 1996 through April 30, 1996

NIH contract N01-DC-5-2103

Speech Processors for Auditory Prostheses

Prepared by

Dewey T. Lawson, Blake S. Wilson, Mariangeli Zerbi, and Charles C. Finley

Center for Auditory Prosthesis Research

Research Triangle Institute

Research Triangle Park, NC 27709


Contents

I. Introduction

II. 22 Electrode Percutaneous Study: results for the first five subjects

III. Plans for the Next Quarter

IV. Acknowledgments

Appendix A: Summary of Reporting Activity for this Quarter


I. Introduction

One of the principal objectives of this project is to design, develop, and evaluate speech processors for implantable auditory prostheses. Ideally, the processors will represent the information content of speech in a way that can be perceived by implant patients. Another principal objective is to develop new test materials for the evaluation of speech processors, given the growing number of cochlear implant subjects enjoying levels of performance too high to be sensitively measured by existing tests.

Work in the present quarter included:

  1. Speech reception and evoked potential studies with Nucleus percutaneous subject NP4 (February 7-11, February 14-16, and March 25-27). These sessions constituted the second of three two-week visits by this subject.
  2. Speech reception and evoked potential studies with Nucleus percutaneous subject NP3 (March 4-8 and 11-15). This was the final two-week visit scheduled for this subject.
  3. Speech reception and evoked potential studies with Ineraid subject SR15 (March 18-22). This was the first visit to our laboratory by this subject, who was selected for her quite low levels of speech reception with her clinical processor.
  4. Speech reception and evoked potential studies with Ineraid subject SR9 (April 15-19 and 22-26). This subject was selected for further studies on the basis of quite low levels of speech reception with her clinical processor.
  5. Presentations of project results in an invited lecture at the Sixth Symposium on Cochlear Implants in Children, Miami Beach, FL, February 2-3 (Wilson); in a poster at the Nineteenth Midwinter Research Meeting of the Association for Research in Otolaryngology, in St. Petersburg Beach, FL, February 4-8 (Finley); and in an invited lecture at the annual meeting of the American Association for Audiology, in Salt Lake City, April 20-21 (Lawson).
  6. Evaluation of plans for the recording, validation, and calibration of new speech test materials with project consultant William Rabinowitz, at RTI, February 15-16.
  7. Design, procurement, and installation of new audio mixing equipment for the speech processors laboratory at RTI, in support both of speech reception studies and the recording of new speech test materials.
  8. Bench evaluation at RTI of the portable research processors developed in collaboration with investigators in Geneva and Boston.
  9. Continued development of the evoked potential laboratory at RTI.
  10. Continued work on the recording of new speech materials for testing speech processor performance.
  11. Continued analysis of speech reception and evoked potential data from prior studies.
  12. Continued preparation of manuscripts for publication.
  13. Additional visits by the following colleagues:

In this report we present results from the first five subjects in our study of seven patients implanted with a percutaneous version of the Nucleus 22 electrode array. Results from additional studies with those subjects and the studies with Ineraid subjects SR15 and SR9 outlined above will be included in subsequent reports.

II. 22 Electrode Percutaneous Study: results for the first five subjects

With NIH support, our group is participating with Duke University Medical Center and Cochlear Corporation in a study of patients implanted with a percutaneous research version of the standard Nucleus 22 electrode array. Five patients (NP1 - NP5) have been selected thus far and implanted by DUMC surgeon Debara Tucci. It has been agreed that a total of seven patients will be included in the study. Each subject agrees to participate in three two-week research visits to our laboratory.

During the percutaneous phase of the studies each patient's everyday processor is a monopolar variation of the standard clinical SPEAK strategy. That processor is fitted and maintained by DUMC audiologist Patricia Roush and evaluated by our group along with various CIS and other research designs. The subjects also participate in our intracochlear evoked potential studies.

A core protocol of processor comparisons is carried out with each of these subjects. As outlined in QPR1 for this project, the contents of the protocol evolved significantly in the course of the early patient visits, as initial results were obtained. In this report we will review the final design of the protocol and present its results for each of the first five subjects.

The sixteen core protocol processors to be compared are outlined in Table I. All of the processors use 33 µs/phase pulses, full wave rectification, twelfth order bandpass filters, fourth order smoothing filters, and our normal preemphasis filter and logarithmic mapping functions. All stimulate chosen intracochlear electrodes with respect to a reference electrode in m. temporalis. In addition, there are reference settings that are departed from only in processors designed to assess the variation of a single parameter. These reference settings include: positive phase leading in balanced biphasic pulses, a pulse rate on each channel of 833 pps, a staggered order of stimulation within each cycle (e.g. 6,3,5,2,4,1 for a six channel processor), a 200 Hz cutoff frequency for smoothing filters, and a 350 Hz to 5500 Hz overall bandpass range allocated to the channels in contiguous bands of logarithmically equal widths.
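As an illustration of the band allocation just described, the following minimal Python sketch divides an overall analysis range into contiguous bands of logarithmically equal width. The function name and the assumption that the band edges are spaced geometrically between the two range limits are made here for illustration only; the actual filter design used in the processors is not specified in this report.

```python
def log_band_edges(n_channels, low_hz=350.0, high_hz=5500.0):
    """Return n_channels + 1 band edges spaced geometrically from low_hz to high_hz.

    Assumption: "logarithmically equal widths" means every band spans the
    same frequency ratio (upper edge / lower edge).
    """
    if n_channels < 1:
        raise ValueError("n_channels must be at least 1")
    ratio = (high_hz / low_hz) ** (1.0 / n_channels)
    return [low_hz * ratio ** k for k in range(n_channels + 1)]


if __name__ == "__main__":
    # Reference range (350 to 5500 Hz) for a six channel processor.
    edges = log_band_edges(6)
    for channel, (lo, hi) in enumerate(zip(edges, edges[1:]), start=1):
        print(f"channel {channel}: {lo:7.1f} Hz to {hi:7.1f} Hz")
```

Passing high_hz=9500.0 gives the corresponding edges for the extended range used by the "rng" processors.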

Five of the sixteen protocol processors use the reference settings to implement different numbers of CIS channels. These are labeled in Table I, according to the number of channels, as 21, 11, 8, 6ref, and 4. [Note that the reference pulse rate for the 21 channel CIS processors is necessarily 721 pps rather than 833, given 33 µs/phase pulse widths.] Seven processors represent single parameter variations with respect to 6ref. These include 6els (differing only in that it utilizes a different set of six electrodes), 6ord (using an apex-to-base rather than staggered order of stimulation), 6sth (using a 400 Hz rather than 200 Hz smoothing filter cutoff frequency), 6pol (using balanced biphasic pulses that begin with the negative rather than the positive phase to the intracochlear electrode), 6rng (with frequency bands covering the range of 350 to 9500 Hz rather than 350 to 5500 Hz), 6slo (with a pulse rate on each channel of 250 pps rather than 833), and 6fst (with a pulse rate of 2525 pps rather than 833). The remaining four processors are what we call "n-of-m" designs, in which a total of m frequency bands are analyzed and only the n electrodes corresponding to the n highest energy bands are stimulated on a given processing cycle. In this protocol, n will be 6 in every case, while m may vary somewhat depending on each subject's number of available electrodes. For the first five subjects, m has been held constant at 18. For each subject the four processors will be identical except that nmfst and nmrngfst will have a pulse rate of 833 pps on each stimulated electrode while nmslo and nmrngslo will have a rate of only 250 pps. As with the 6 channel CIS processors, presence of "rng" in a label indicates the use of the 350 to 9500 Hz extended frequency range.
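The 721 pps and 2525 pps figures quoted above are consistent with a simple timing budget. The sketch below assumes (this is an illustrative assumption, not a statement of the actual hardware timing) that balanced biphasic pulses are delivered to the channels one after another, with no gap between the two phases or between successive pulses, so that one stimulation cycle lasts n_channels x 2 x 33 µs. The function name is hypothetical.

```python
def max_rate_pps(n_channels, phase_us=33.0):
    """Upper bound on the per-channel pulse rate for non-overlapping biphasic pulses.

    Assumption: each pulse occupies two phases of phase_us microseconds and
    pulses on different channels follow each other without any gap.
    """
    cycle_s = n_channels * 2 * phase_us * 1e-6
    return 1.0 / cycle_s


if __name__ == "__main__":
    for n in (6, 21):
        print(f"{n:2d} channels: at most {max_rate_pps(n):.1f} pps per channel")
```

Under these assumptions the bound for 21 channels falls below the 833 pps reference rate, which matches the note that the 21 channel processor necessarily runs at a lower rate.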

This set of protocol processors has been chosen to support a wide array of comparisons of interest. The effects of varying the number of CIS channels are explored through comparisons among 21, 11, 8, 6ref, and 4. The sensitivity of performance to choices among available electrodes may be probed by comparing 6ref and 6els. The effects of various single parameter variations are studied in comparisons of performance between 6ref and, in turn, 6ord, 6sth, 6pol, 6rng, 6fst, and 6slo. The nmrngslo processor is designed to be equivalent in some respects to the clinical SPEAK processor, which also analyzes an extended frequency range, selects a subset of the analyzed bands for stimulation on each processing cycle, and stimulates the corresponding electrodes at a variable rate that averages approximately 250 pps. Comparisons are available with a basic six channel CIS processor at the same rate (6slo), an n-of-m processor at the same rate but without an extended frequency range (nmslo), and n-of-m processors running at a rate substantially higher than possible for the present SPEAK strategy (nmfst, nmrngfst). The latter processors also may be compared with a CIS processor running at the same rate (6ref). Depending on performance test results with individual subjects, various features of the protocol designs can be combined in additional processors for evaluation. The performance of the monopolar clinical SPEAK processor is tested during each of the three visits to our laboratory, to provide data on learning effects. A bipolar SPEAK processor is fitted during the last of the three research visits and its performance tested after extended use outside the laboratory.

After completion of the three two-week research visits to our laboratory, it is anticipated that each subject will undergo a second surgery to receive a standard clinical transcutaneous device. Only subject NP1 has undergone that second surgery to date; she is doing well with her clinical bipolar SPEAK processor.

Table I.

Key to Processor Labels:

21, 11, 8, 4, 6ref: 21, 11, 8, 4, and 6 channel, reference parameters

6els: 6 channel, alternate electrode choice

6ord: 6 channel, apex-to-base stimulation order

6sth: 6 channel, 400 Hz smoothing cutoff

6pol: 6 channel, reversed polarity

6rng: 6 channel, extended freq range

6fst: 6 channel, fast rate (2525 pps)

6slo: 6 channel, slow rate (250 pps)

nmfst: n-of-m (6-of-18) channel, fast rate (833 pps)

nmslo: n-of-m (6-of-18) channel, slow rate (250 pps)

nmrngfst: n-of-m (6-of-18) channel, extended freq range, fast rate

nmrngslo: n-of-m (6-of-18) channel, extended freq range, slow rate

[When 6 channel and n-of-m channel processors are grouped separately and no confusion will result, the "6" and "nm" label prefixes may be omitted.]
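The band selection step of an n-of-m processor, as described in the text preceding Table I, can be sketched in Python as follows. This is a minimal illustration only: the function name, the representation of a processing cycle as a plain list of per-band energies, and the tie-breaking rule (the lower band index wins) are assumptions, not details of the processors actually tested.

```python
def select_n_of_m(band_energies, n=6):
    """Return the indices of the n highest-energy bands, in ascending index order.

    band_energies holds one energy estimate per analyzed band (m values) for a
    single processing cycle. Only the returned bands would be stimulated.
    """
    m = len(band_energies)
    if n > m:
        raise ValueError("n cannot exceed the number of analyzed bands")
    # Rank by descending energy; ties are broken in favor of the lower index.
    ranked = sorted(range(m), key=lambda i: (-band_energies[i], i))
    return sorted(ranked[:n])


if __name__ == "__main__":
    # Hypothetical energies for one cycle of a 6-of-18 processor.
    energies = [0.1, 0.9, 0.3, 0.8, 0.2, 0.7, 0.05, 0.6, 0.4,
                0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    print(select_n_of_m(energies, n=6))
```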

Tables II through VI summarize contemporaneous 24 or 16 consonant identification data from subjects NP1 through NP5 for the sixteen protocol processors and the monopolar version of the clinical SPEAK processor each had used daily for approximately one and a half years. [Not all of the subjects have high enough levels of performance to justify use of the 24 consonant tests. 16 consonant tests have been used with three of the first five subjects – NP3, NP4 and NP5.] Results for tests using male and female talkers are listed separately, in descending order of overall information transmission in each case. For subject NP1 these results (in Table II) are the same as in Table IV of QPR1, except that results for corrected versions of two of the processors, nmfst and nmslo, were obtained after submission of QPR1. To facilitate comparisons, Tables II through VI have separate columns for (1) single parameter variations among 6 channel CIS processors, (2) speed and frequency range variations among n-of-m processors, (3) comparisons among otherwise similar processors differing in the number of CIS channels, and (4) the monopolar clinical processor in everyday use by each subject. Experience with all tested processors except the clinical SPEAK processor was limited to a brief period of informal conversation and loudness adjustment prior to testing.

Appendix 1 to this report includes both percent overall information transmission and percent correct scores with standard deviations of the mean for each subject and each protocol processor. Each percent correct or IT value is based on presentation of a minimum of 10 randomized blocks of the 16 or 24 consonant tokens, sound alone, from video disc recordings. Multiple exemplars of each token were used, and there was no feedback as to correct or incorrect responses. Overall information transmission is generally a more meaningful indicator of processor performance than percent correct score, though the two measures are highly correlated [see QPR 4 for Project N01-DC-9-2401, 1990 and the scatter plots included as Appendix 2 to this report]. While overall IT is not a linear function of percent correct, and certainly not every step in this ranking represents a significant difference, one to two percent differences in IT often do correspond to significant differences in percent correct scores.
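To make the distinction between the two measures concrete, the sketch below computes percent correct and a percent overall information transmission score from a consonant confusion matrix. It assumes the definition commonly used for confusion matrices, namely the mutual information between stimulus and response divided by the stimulus entropy; this report does not restate the formula, so the definition and the function names should be read as assumptions.

```python
import math


def percent_correct(confusions):
    """Percent of presentations on the diagonal (rows = stimuli, columns = responses)."""
    total = sum(sum(row) for row in confusions)
    hits = sum(confusions[i][i] for i in range(len(confusions)))
    return 100.0 * hits / total


def overall_information_transmission(confusions):
    """Percent relative information transmitted: 100 * I(stimulus; response) / H(stimulus)."""
    total = sum(sum(row) for row in confusions)
    row_p = [sum(row) / total for row in confusions]
    col_p = [sum(col) / total for col in zip(*confusions)]
    mutual = 0.0
    for i, row in enumerate(confusions):
        for j, count in enumerate(row):
            if count > 0:
                p = count / total
                mutual += p * math.log2(p / (row_p[i] * col_p[j]))
    stimulus_entropy = -sum(p * math.log2(p) for p in row_p if p > 0)
    return 100.0 * mutual / stimulus_entropy


if __name__ == "__main__":
    # Hypothetical 4-token matrix in which every token is consistently
    # mistaken for its neighbor: no response is correct, yet the responses
    # still identify the stimulus uniquely.
    shifted = [[0, 10, 0, 0], [0, 0, 10, 0], [0, 0, 0, 10], [10, 0, 0, 0]]
    print(percent_correct(shifted), overall_information_transmission(shifted))
```

The deliberately extreme example shows why the two scores need not track each other linearly: consistent confusions lower percent correct without lowering the information transmitted.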

Specific evidence of test - retest reliability for these overall IT scores is included for subject NP4 in Table V. For that subject, apparent non-simultaneous channel interactions forced us to reduce the minimum stimulation levels in order to fit 11 and 21 channel CIS processors without the perception of background noise. These arbitrary minimum levels were set 3 dB below the single channel thresholds normally used, and processors with these altered levels are labeled with asterisks in Table V and Appendix 1. Otherwise identical 4, 6, and 8 channel processors with both sets of minimum stimulation levels were compared to assess the extent of impact of the interaction effects. While a significant difference was noted between the two 8 channel processors for the male voice, evaluations of the two versions of the 4 and 6 channel processors – on different days – yielded overall information transmission scores that were identical for the male voice consonants and differed by only one percent for the female voice.

Table II.

Table III.

Table IV.

Table V.

Table VI.

While the protocol study is not yet complete, and important comparisons lack statistical significance in the absence of data from the final two subjects, some strong patterns have emerged already. Such findings need not await the additional year and a half required to complete the study.

All five subjects thus far enjoy good to excellent performance in terms of the cochlear implant population as a whole. [This is indicated by identification scores for CUNY sentences in quiet and at +10 dB with respect to multitalker speech babble and for CNC and NU6 monosyllabic words. The results of these open set tests for a subset of the studied processors will be discussed in a subsequent QPR.] Nevertheless, Tables II through VI reflect substantial variation among these five subjects: in median level of performance with the protocol processors [from 77% overall IT for 16 consonants by subject NP3 to 86% overall IT for 24 consonants by subject NP2], in range of performance variation across the protocol processors [from a range of 13% in overall IT for subject NP1 to a range of 33% for subject NP5], in performance differences between male and female voices [e.g., little difference for subject NP1 to relatively large differences for subject NP5], and in ranking of relative performance among the protocol processors. In the remainder of this report we shall discuss some patterns emerging from these various data, generally following the order of the four comparison groups represented by the columns in Tables II through VI.

1. Single Parameter Variations with respect to a Reference 6-channel CIS Processor

In Table VII we have collected the changes in overall information transmission associated with changes in each of six CIS processor parameters. The reference processor for each subject is the 6ref processor. Data for male and female voice medial consonant tokens are presented separately for each of the five subjects studied thus far. A positive change indicates that the parametric variation produced an improvement in performance with respect to the reference processor.

Table VII. Changes in Overall IT with Changes in Single CIS Parameters

         NP1   NP1   NP2   NP2   NP3   NP3   NP4   NP4   NP5   NP5
           m     f     m     f     m     f     m     f     m     f
rng       +7    +9    +1    +1    +3    +3    +2    +9    +3    +9
sth       -3    -2    +2    -1    +2    +2    +1    -2    +5    +2
ord       +2    +3    -2    -3    +2     0     0     0    +7     0
fst       -1    -1    -3    -2    +2    +1    +2    +1    -2    -1
pol       +1    +2    -2    -6    -2    -2    +3    -2   +11    +6
slo       -6    -2    -6    -3    +1    -2    -3    -5    +2    +4

In terms of average change in overall information transmission across these ten conditions, the six parametric changes rank as follows: rng (+4.7), ord and pol (+0.9), sth (+0.6), fst (-0.4), and slo (-2.1), suggesting that the rng change is likely to produce a significant improvement in processor performance, while the slo option is likely to reduce performance.

If we assume that two percent approximates the minimum significant difference in overall information transmission for these comparisons, we obtain the results shown in Table VIII for the proportion of cases in which each parametric manipulation produced a significant change:

Table VIII. Prevalence of Changes in Performance due to Parametric Manipulations

Manipulation   Improvement   Decrement
rng                80%            0
sth                50%          30%
ord                40%          20%
fst                20%          30%
pol                40%          50%
slo                20%          70%

Thus, extending the overall frequency range analyzed by a six channel CIS processor (from 350 - 5500 Hz to 350 - 9500 Hz) produced an improvement in performance in 8 of the 10 cases. On the other hand, reducing the pulse rate (from 833 pps to 250 pps on each channel) had a 70% likelihood of reducing processor performance. [We note that even if the criterion for a significant difference were increased to 3% in overall IT, significant improvements would be found for rng in 70% of our cases and significant decreases in performance for slo in 60% of the cases.]

In terms of seeking an optimal fitting under the time constraints of a clinical setting, the extended overall frequency range clearly would be one parametric setting to try. After that, these results might suggest increasing the upper frequency cutoff point for the envelope smoothing filter from 200 to 400 Hz (sth), which produced an overall average change in IT of +0.6% and an average change of +2.6% in the 50% of cases for which a significant improvement was obtained. Both changing from staggered to apex-to-base order of stimulation (ord) and reversing the polarity of the biphasic pulses (pol) produced +0.9% changes in IT overall and a 2% or greater improvement in 40% of the cases. For ord the average improvement when a significant improvement was obtained was 3.5%, while for pol the corresponding value [strongly influenced by the results for subject NP5] was 5.5%.
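The summary statistics discussed in this section can be recomputed from the entries of Table VII with a short Python sketch such as the one below. The data are transcribed from Table VII; the function name, the dictionary layout, and the treatment of a change of exactly two percent as meeting the criterion are assumptions made for illustration. Because the table entries are rounded to whole percentages, recomputed values may differ slightly from figures in the text that were presumably derived from unrounded scores.

```python
# Changes in overall IT (percent) relative to 6ref, in the order
# NP1m, NP1f, NP2m, NP2f, NP3m, NP3f, NP4m, NP4f, NP5m, NP5f.
TABLE_VII = {
    "rng": [7, 9, 1, 1, 3, 3, 2, 9, 3, 9],
    "sth": [-3, -2, 2, -1, 2, 2, 1, -2, 5, 2],
    "ord": [2, 3, -2, -3, 2, 0, 0, 0, 7, 0],
    "fst": [-1, -1, -3, -2, 2, 1, 2, 1, -2, -1],
    "pol": [1, 2, -2, -6, -2, -2, 3, -2, 11, 6],
    "slo": [-6, -2, -6, -3, 1, -2, -3, -5, 2, 4],
}


def summarize(changes, criterion=2):
    """Mean change, prevalence of changes meeting the criterion, and mean gain."""
    gains = [c for c in changes if c >= criterion]
    losses = [c for c in changes if c <= -criterion]
    return {
        "mean": sum(changes) / len(changes),
        "improved_pct": 100 * len(gains) / len(changes),
        "worsened_pct": 100 * len(losses) / len(changes),
        # Average improvement over only those cases that met the criterion.
        "mean_gain": sum(gains) / len(gains) if gains else None,
    }


if __name__ == "__main__":
    for label, changes in TABLE_VII.items():
        s = summarize(changes)
        print(label, s)
```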

2. Variations among n-of-m Processors

Certain processors from this group have not yet been evaluated with subjects NP2 and NP3. One comparison for which the full 10 conditions are in hand is the effect of extended frequency range for the 833 pps n-of-m processors (rngfst vs. fst). For three of the subjects (NP2, NP4, and NP5) the extended frequency range version performed as well as or significantly better than the normal range version for both male and female voices; the average improvement in those cases was 3.7%. For subject NP1 the extended range resulted in poorer performance for both male and female voices, with an average decrement of 2.5%. For the remaining subject, NP3, extending the overall frequency range produced a marked (6%) improvement for the female voice, but an even larger (8%) decrement for the male voice.

Nine of the intended ten comparisons are available at present between the 833 pps normal frequency range n-of-m processor (fst) and the corresponding 250 pps version (slo). Use of the slower pulse rate produced a significant decrease in performance in six of the nine conditions tested, with an average decrement of 5.3% for those six. The slower rate produced a significant improvement in performance (4%) in one case of the nine.

3. Processor Performance vs. the Number of CIS Channels and Electrode Selection

The set of protocol processors designed to be as similar as possible except for number of CIS channels includes 4, 6, 8, 11, and 21 channels. That set of data is complete for the first five subjects, except for a 21 channel processor for subject NP2. As the number of channels decreases, the number of options for assignment of channels to electrodes increases. For 4 and 6 channel processors there are many potential choices.

In selecting electrodes for 4, 6, and 8 channel processors, we were guided by data from formal electrode discrimination tests in which each subject was asked to rank sequential stimuli from various pairs of electrodes in terms of perceived pitch. We also consulted dynamic range data for each electrode at the appropriate pulse rate(s) and pulse duration. In order to obtain some indication of the sensitivity of processor performance to the exact choice of electrodes, we tested at least two different sets of 6 electrodes for each subject. (These are identified as 6 and 6' in the "n-chan" columns of Tables II through VI. They are the same processors identified as ref and els in the "6-chan" columns. A third case was tested for subject NP5, labeled 6" and els', respectively.) In selecting sets of 11 electrodes, the choice between all even and all odd-numbered electrodes was made on the basis of which involved fewer limitations in terms of available dynamic range and channel discrimination data. Selection of the single electrode to be omitted from the 21 channel CIS processors was based on similar criteria. With the exception of one alternative set of 6 electrodes for subject NP3, every processor's channels spanned at least 15 electrodes, with most spanning 19 or more. The choices we have made for each of the first five subjects in this study are tabulated in Appendix 3 to this report. The electrode numbering system used in this study is apex-to-base, with electrode number one assigned to the apicalmost electrode in the array.

Only for subject NP1 did we observe any performance advantage in using more than eight CIS channels.

Figures 1 through 5 plot overall percent information transmission scores for male and female voice medial consonant data as a function of number of CIS channels. There is a separate plot for each of the five subjects. Notice that the differences in overall IT scores associated with different choices of 6-electrode sets are typically comparable to the differences associated with varying the number of CIS channels between 4 and 21. Those processor pairs corresponding to statistically significant differences in performance are identified in Appendix 4 of this report (based on ANOVA analyses of the block percent correct scores and post hoc comparisons among the means, as indicated by a significant ANOVA result for each of the five subjects).

Based on the results of these comparisons, we evaluated some additional processors with subjects NP4 and NP5: otherwise similar CIS processors with 1, 2, and 3 channels. In each case the selected electrodes were subsets of those used in the subject's 4-channel processor. More than one of those electrodes was evaluated in single channel processors for both subjects, to gauge performance sensitivity to electrode choice vis-à-vis number of channels.

Taken together, these results (for monopolar stimulation via the Nucleus 22 electrode array) indicate that (1) additional CIS channels become much less likely to produce significant improvements in processor performance once the number of channels exceeds four, (2) different choices of electrodes can produce significant differences in performance for CIS processors with as many as 6 channels, and (3) the principal potential benefit of additional implanted electrodes (beyond 4 to 6) may be the availability of alternative sites of stimulation rather than the availability of additional channels.

Figure 1.

Figure 2.

Figure 3.

Figure 4.

Figure 5.

4. Relative Performance of CIS, n-of-m, and monopolar SPEAK Processors

For two of these five subjects a 6 channel CIS processor performs significantly better than the best n-of-m processor tested (the 6rng was better for the female voice for subject NP1 and better for the male voice for NP2). For subject NP3, n-of-m processors were at least as good as any of the protocol CIS processors. For each of the remaining two subjects, CIS processors tended to be better for the male voice while an n-of-m processor was better for the female voice. Among current high speed pulsatile processors, then, both CIS and n-of-m offer significant benefits to some patients.

A monopolar version of the clinical SPEAK processor had been used on an everyday basis for an extended period by each of these five subjects. Its performance was compared with the protocol CIS and n-of-m processors with which the subjects had had only limited experience in the laboratory. It is hoped that wearable hardware can be supplied to one or more of the subjects of this study to allow extended daily use of a CIS and/or n-of-m processor.

Thus far in our study, one or more of the protocol CIS or n-of-m processors has performed significantly better than the chronic use monopolar SPEAK processor in nine of ten conditions [NP2's SPEAK processor supported her best performance with female voice]. A comparison of 6rng and nmrngfst with SPEAK, for instance, yields the differences in percent overall information transmission shown in Table IX [a positive difference corresponds to a higher score for 6rng or nmrngfst]. The average difference in overall IT is +6.4% for 6rng and +5.7% for nmrngfst.

Table IX. Differences in Overall IT vs. SPEAK

            NP1m  NP1f  NP2m  NP2f  NP3m  NP3f  NP4m  NP4f  NP5m  NP5f
6rng          +9   +11    +4    -2    +3    +8    +4    +4   +12   +11
nmrngfst      +5    +7    +1    -1    -2   +14     0   +10    +9   +14

We have conducted a one way control ANOVA analysis comparing the relative overall IT performance of four protocol processors: 6ref, 6rng, nmrngfst, and SPEAK. The analysis is based on differences in percent overall IT among the four processors for each of the five subjects. The ANOVA indicates significant differences among the processors (p < 0.01). Post hoc comparison of the means indicates that performance is significantly better with 6rng than with SPEAK, significantly better with nmrngfst than with SPEAK, and significantly better with 6rng than with 6ref. These results are consistent with there being benefits both to the faster 833 pps rate of 6ref, 6rng, and nmrngfst, and to the extended overall frequency ranges of 6rng, nmrngfst, and SPEAK.

Appendix 1.

Summaries of Medial Consonant Identification Results for Protocol Processors











Appendix 2.

Relationship between Percent Correct and Overall Information Transmission Scores

24 Consonant Data: NP1-NP2

16 Consonant Data: NP3-NP5

Appendix 3.

Selected Sets of Electrodes for CIS Processors with Various Numbers of Channels

Appendix 4.

Statistically Significant Differences in CIS Performance vs. Number of Channels

III. Plans for the Next Quarter

Our plans for the next quarter include the following:

1. A site visit for the project by Drs. Terry Hambrecht and William Heetderks (July 23).

2. Presentation of project results in two invited lectures at the Third European Symposium on Paediatric Cochlear Implantation, in Hannover, Germany (June 6-8).

3. Speech reception and evoked potential studies with Nucleus percutaneous subjects NP5 (weeks beginning on May 13 and May 20), NP4 (weeks beginning on June 3 and June 10), and NP2 (July 8-10).

4. Recording of tokens for new speech tests.

5. Completion of new current sources, for use in studies to evaluate very high rates of stimulation (e.g., 10000 pulses/s on each channel) in multichannel CIS processors.

6. Continued development of the Evoked Potentials Laboratory, including incorporation of a 22-bit A/D converter (in part to allow recording of both stimulus pulse artifact and evoked potentials in the linear range of the recording system).

7. Continued development of a new type of compression function for use in CIS processors, designed to mimic principal features of the noninstantaneous compression found in normal hearing at the interface between sensory hair cells and adjacent neurons.

8. Speech reception and evoked potential studies with Ineraid subject SR2 (July 22-26).

9. Possible continued studies with our local patient having standard Nucleus implants on both sides.

10. Possible application of one or more of the Geneva/MEEI/RTI portable processors in continuing studies to evaluate possible learning effects with extended use of CIS processors.

11. Continued analysis of speech reception and evoked potential data from prior studies.

12. Continued preparation of manuscripts for publication.

IV. Acknowledgments

We gratefully acknowledge the support by Cochlear Corporation of the device, surgical, and audiological costs of the 22 electrode percutaneous study described in this report, and of the travel and subsistence expenses of the subjects while participating in our studies.

We thank subjects NP1, NP2, NP3, NP4, and NP5 for their participation in those studies, and subjects SR15 and SR9 for their participation in other studies conducted this quarter.

Appendix A. Summary of Reporting Activity for this Quarter

Reporting activity for the last quarter, covering the period from February 1 to April 30, 1996, included the following presentations:

Wilson BS: Strategies for representing speech information with cochlear implants. Invited lecture, Sixth Symposium on Cochlear Implants in Children, Miami Beach, FL, February 2-3, 1996.

Finley CC, Wilson BS: Spatial distribution of stimulus field and intracochlear evoked potentials as recorded from unstimulated electrodes of implanted cochlear prostheses. Nineteenth Midwinter Research Meeting, Association for Research in Otolaryngology, St. Petersburg Beach, FL, February 4-8, 1996.

Lawson DT: Cochlear implant research at Research Triangle Institute and Duke University Medical Center. Invited lecture, Annual Meeting of the American Association for Audiology, Salt Lake City, UT, April 20-21, 1996.