Second Quarterly Progress Report
 
 

January 1 through March 31, 1999

NIH Project N01-DC-8-2105
 
 

Speech Processors for Auditory Prostheses
 
 
 
 

Prepared by

Dewey Lawson, Blake Wilson and Mariangeli Zerbi
 
 

Center for Auditory Prosthesis Research

Research Triangle Institute

Research Triangle Park, NC 27709


CONTENTS


I. Introduction

II. Measures of performance over time following substitution of CIS for CA speech processors

III. Plans for the next quarter

IV. Announcements

V. Acknowledgments

Appendix 1: Summary of reporting activity for this quarter


 I. Introduction

The main objective of this project is to design, develop, and evaluate speech processors for implantable auditory prostheses. Ideally, such processors will represent the information content of speech in a way that can be perceived and utilized by implant patients. An additional objective is to record responses of the auditory nerve to a variety of electrical stimuli in studies with patients. Results from such recordings can provide important information on the physiological function of the nerve, on an electrode-by-electrode basis, and also can be used to evaluate the ability of speech processing strategies to produce desired spatial and/or temporal patterns of neural activity.

Work in this second quarter included:

In this report we present an update of results from ongoing studies to measure performance over time following substitution of a CIS for a CA speech processor. Initial results from these studies have been presented in QPR 1 of our prior project (NIH N01-DC-5-2103). The initial results included speech reception measures for Ineraid subjects SR3 and SR10. The present report includes subsequent measures for those subjects and measures to date for Ineraid subjects SR9, SR15 and SR16. Results from the various studies conducted in the present quarter, outlined above, will be presented in future reports.

 

II. Measures of performance over time following substitution of CIS for CA speech processors

In addition to our many studies comparing various processing strategies acutely in the laboratory, we have conducted some chronic studies of possible learning effects with long-term use of wearable processors. In this report we update results from such a study using the Med-El CIS-LINK hardware platform. In collaboration with Stefan Brill and other colleagues at the University of Innsbruck we have been able to employ a variety of different envelope smoothing filters in such processors, as well as a variety of different mapping law functions. We also, of course, have had access to all the parametric adjustments of the standard clinical fitting system.

The four patients currently participating in chronic use studies with wearable processors from our laboratory are SR3, fitted in April 1995; SR15 and SR16, fitted in June 1997; and SR9, fitted in August 1997. This group was selected to represent a wide range of initial performance with wearable CIS processors, from 20% correct (sound alone) on a 16-consonant test to 75% on a 24-consonant test. All of these subjects had used their Ineraid compressed analog (CA) processors for years prior to our fitting them with continuous interleaved sampling (CIS) processors running on CIS-LINK devices. Each had been exposed briefly to a variety of CIS processors during one or more previous visits to our laboratory, and each has continued to participate in other acute studies during brief visits to our laboratory and, in some cases, to other laboratories.

Each subject was tested in our laboratory at the time of first fitting with a wearable CIS processor (and during subsequent visits), using consonant identification tests and a variety of open set tests of appropriate difficulty. At least ten presentations of each consonant token were included for each condition evaluated, and there was no feedback as to correct or incorrect responses. The laboratory consonant tests were identical to those we have employed for many years, using the Iowa videodisc recordings. Two subjects whose initial performance was sufficiently good were given a set of tape recorded 24 consonant identification tests to be self-administered at two-week intervals for 16 weeks. Each such prerecorded test was preceded by a recorded segment that guided the subject to an appropriate setting of the tape player's output level (the subject was instructed to use the same speech processor settings as for a conversation in a quiet room). The battery-powered tape player had no tone control, and its output was directly connected to the portable processor's auxiliary input using an impedance matching cable assembly. The speech processor's microphone was disabled during the consonant tests. Answer sheets were mailed to our laboratory for analysis. These take-home tests, developed only to provide some guidance as to the timing of return visits, have agreed quite well with laboratory measurements and provide a finer grained assessment of early learning after changing from a CA to a CIS processor. The two subjects selected to participate in the take-home tests had professional backgrounds (registered nurse, Ph.D. mathematician) that made them excellent candidates for such self-administered testing.

Our approach has been to try to provide each subject with the highest level of chronic performance possible at each point in the study. Accordingly, when acute comparisons in our laboratory have indicated that some alternative processor design might benefit a subject and such an alternative design could be realized on the CIS-LINK hardware, we have not hesitated to change the processor in chronic use. On such occasions, comparison testing with both processors was repeated upon the subject's next visit to our laboratories. Table I summarizes parametric information about all the processor designs involved in these chronic use studies.

Table I. Parameters for Chronic Processors.

Subject  Processor  Day Use  Nature of Change                 Number of  Pulse Rate  Pulse Duration  Stimul.  LPF cutoff   Channel Dyn. Range
Code     ID         Began                                     Channels   (p/s)       (µs/phase)      Order    freq. (Hz)   [min-max] (dB)

SR3      7          0                                         6          1026        80              stag.    400          12-20
         8          200      lower thresholds, channels 1-4   "          "           "               "        "            15-24
         8a         823      lower LPF cutoff freq.           "          "           "               "        200          "
         8          1135     raise LPF cutoff                 "          "           "               "        400          "

SR9      7b         0                                         5          833         40              stag.    400          11-20
         9b1        218      add channel                      6          "           "               "        "            10-19
         7b         372      remove channel                   5          "           "               "        "            11-20

SR10     1          0                                         6          1170        70              a-b      400          10-12
         98a        1512     rate, duration, stim order, LPF  "          1626        40              stag.    200          7-12

SR15     124        0                                         3          523         40              1,4,2    200          7-9
         1xBP       252      remove channel                   2*         558         "               alt.     "            "
         124c       531      lower thresholds and MCLs        3          523         "               1,4,2    "            7-10

SR16     B          0                                         5          500         40              stag.    200          9-11
         E4         264      raise rate, LPF cutoff freq.     "          2424        "               "        400          11-15
         H4         589      raise most MCLs                  "          "           "               "        "            11-16

* a two-channel "Breeuwer / Plomp" design; see text.

Included in Table I are parameters for an additional subject, SR10. That subject had shown a succession of dramatic improvements in performance in a series of acute studies with CIS processors in our laboratories before being fitted with a CIS-LINK device in August 1994 by Michael Dorman at the University of Utah. During SR10’s subsequent visits to our laboratories we have, from time to time, measured his performance with that device, using the same tests and methods as with the chronic use subjects who received their wearable devices from us.
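The pulse rates and pulse durations in Table I can be checked against the basic CIS constraint that pulses on different channels do not overlap in time. The following is a minimal sketch of that arithmetic, not a description of the CIS-LINK firmware. It assumes that the tabulated rate is the per-channel rate, that each pulse is biphasic with no interphase gap, and that every channel is stimulated once per cycle; the function name and the selection of rows are illustrative only.

```python
def interleave_budget(channels, rate_pps, phase_us):
    """Return (cycle period, time occupied by pulses), both in microseconds.

    Assumes one biphasic pulse per channel per cycle and no gaps between
    phases or between channels (assumptions, not stated in the report).
    """
    period_us = 1e6 / rate_pps            # duration of one stimulation cycle
    busy_us = channels * 2 * phase_us     # two phases per pulse, one pulse per channel
    return period_us, busy_us


# (subject, processor, channels, pulses/s, microseconds/phase), copied from Table I
TABLE_I_ROWS = [
    ("SR3", "7", 6, 1026, 80),
    ("SR9", "7b", 5, 833, 40),
    ("SR10", "1", 6, 1170, 70),
    ("SR15", "124", 3, 523, 40),
    ("SR16", "E4", 5, 2424, 40),
]

for subject, proc, n, rate, phase in TABLE_I_ROWS:
    period, busy = interleave_budget(n, rate, phase)
    print(f"{subject} {proc}: {busy:.0f} of {period:.0f} us used ({busy / period:.0%})")
```

Under these assumptions every listed design fits within its cycle. SR3 processor 7, SR10 processor 1 and SR16 processor E4 use nearly the whole cycle, while the SR15 design leaves most of each cycle unstimulated.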

It will be convenient in this report to present our results in two parts. We first will discuss the data for those subjects whose level of performance allowed comparisons based on identification of 24 medial consonants. Then we will turn to subjects who were evaluated with similar tests involving only 16 consonants.

Subjects with relatively high levels of performance.

Consonant identification test data for subject SR3 are summarized in Figures 1 and 2, which show percent correct identification and percent overall information transmission, respectively, for consonant tokens uttered by male and female voices. Among the features of these data are (1) rapid and relatively smooth improvement in performance over the first few months of experience with the new processing strategy, as indicated by the take-home test results and confirmed by the laboratory tests at the beginning and end of that period; (2) evidence of continued improvement beyond the first year of experience; and (3) lack of any indication of further improvement in the third year.

Figure 1. Consonant identification scores as a function of duration of experience with a chronic CIS processor. Subject SR3. Each percent correct score represents at least 10 presentations of each of 24 medial consonant tokens with a standard deviation of the mean of ±2%. The symbols distinguish data for male and female talkers.
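The quoted uncertainty is of the size expected if each presentation is treated as an independent correct/incorrect trial. The sketch below is an assumption about where such a figure could come from, not a statement of how the values in this report were computed; the function name is hypothetical.

```python
import math


def binomial_se_percent(percent_correct, tokens, presentations=10):
    """Standard error, in percentage points, of a percent-correct score.

    Treats every presentation as an independent Bernoulli trial
    (a simplifying assumption).
    """
    p = percent_correct / 100.0
    n = tokens * presentations
    return 100.0 * math.sqrt(p * (1.0 - p) / n)


print(round(binomial_se_percent(90, tokens=24), 1))  # 1.9
print(round(binomial_se_percent(20, tokens=16), 1))  # 3.2
print(round(binomial_se_percent(50, tokens=16), 1))  # 4.0
```

With 10 presentations of each of 24 tokens, a score near 90% gives about ±2%. The same model applied to a 16-token test with scores between 20% and 50% gives roughly ±3% to ±4%.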

Figure 2. Percent overall information transmission scores for the consonant identification data of Figure 1.
 
 

Figure 3 shows monosyllabic word identification results over the same period of experience and exhibits essentially the same features, while more strongly suggesting a decrease in performance during the third year. Both whole word and individual phoneme scores are included. The data at about 825 and about 1135 days of experience include comparisons of two processor variations (8 and 8a, with different cutoff frequencies for the low-pass smoothing filters; see Table I).

The earlier change for this subject (from processor 7 to processor 8, at day 200) amounted only to the use of revised pulse amplitude values for threshold and most comfortable levels of stimulation in each channel, increasing the minimum channel dynamic range from 12 to 15 dB and the maximum channel dynamic range from 20 to 24 dB.

Figure 3. Monosyllabic word identification as a function of duration of experience with a chronic CIS processor. Subject SR3, male talkers. Symbols distinguish among words correct and phonemes correct scores, and whether each 50 word list used is from NU6 or CNC recordings.
 
 
Corresponding data for subject SR16, covering a period of only about 20 months, are shown in Figures 4 through 6. Again, the take-home tests indicate rapid progress over the first three to four months, especially for the female voice. For this subject, however, there is no evidence for further improvements in performance after the first year of experience. Processors B and E4 are both included in the consonant data at about 265 days, and processors E4 and H4 were compared during the most recent visit at about 590 days, with the latter design of each pair performing slightly better. SR16 subsequently has requested a return to processor E4 for chronic use.

Figure 4. Consonant identification scores as a function of duration of experience with a chronic CIS processor, ±2%. Subject SR16.

Figure 5. Percent overall information transmission scores for the consonant identification data of Figure 4.
 
 

Figure 6. Monosyllabic word identification as a function of duration of experience with a chronic CIS processor. Subject SR16, male talkers. Symbols distinguish among words correct and phonemes correct scores, and NU6 and CNC word lists.
 
Figures 7 through 9 include similar data for subject SR10. As noted above, this subject’s initial fitting with a CIS-LINK chronic device was done and documented elsewhere. Our data indicate continued improvements in performance over his second, third, and perhaps even fourth years of experience. Processor 1a received the higher consonant scores during the most recent visit, but the monosyllabic word results shown for that visit were obtained with processor 1.

Figure 7. Consonant identification scores as a function of duration of experience with a chronic CIS processor, ±2%. Subject SR10.

Figure 8. Percent overall information transmission scores for the consonant identification data of Figure 7.

Figure 9. Monosyllabic word identification as a function of duration of experience with a chronic CIS processor. Subject SR10, male talkers. Symbols distinguish among words correct and phonemes correct scores, and NU6 and CNC word lists.

All three subjects discussed thus far – SR3, SR16, and SR10 – clearly enjoy excellent overall performance. Each of them showed significant improvements with experience using his or her chronic device. Improvement was quite rapid over the first three to four months in the two cases in which finer grained data are available from take-home consonant identification tests. In one case improvement with experience seems to have been completed in the first year, while in the other two cases it extended into the second and third years, respectively.

Subjects with relatively low levels of performance.

We turn now to the two subjects whose processors have supported lower overall levels of performance, making it appropriate to evaluate progress with tests using only 16 medial consonants. Here the picture is quite different.

Consonant identification data for subject SR9 are shown in Figures 10 and 11. Over one year of chronic experience with a CIS processor there has been no indication of improved performance.


   

Figure 10. Consonant identification scores as a function of duration of experience with a chronic CIS processor. Subject SR9. Each percent correct score represents at least 10 presentations of each of 16 medial consonant tokens with a standard deviation of the mean of ±3-4%. The symbols distinguish data for male and female talkers.

 
 

Figure 11. Percent overall information transmission scores for the consonant identification data of Figure 10.
 
While SR9’s level of performance was sufficient to allow use of monosyllabic word identification tests, there was no clear evidence of improved performance with chronic experience in those data either, as shown in Figure 12.

Figure 12. Monosyllabic word identification data as a function of duration of experience with a chronic CIS processor. Subject SR9, male talkers. Symbols distinguish among word and phoneme correct scores for NU6 word lists.

Addition of a 6th channel (substitution of processor 9b1 for processor 7b, compared in consonant tests at about 210 and 370 days) provided no long term advantage. Recent acute studies with this subject, however, to be reported in a later QPR, have indicated some potential for improvement with alternative processor designs outside the capabilities of the wearable device used in the present chronic study.

As shown in Figures 13 and 14, consonant recognition scores for subject SR15 also have failed to demonstrate clear improvement in performance with chronic use experience. With identification scores for 16 medial consonant tokens near 20%, monosyllabic word identification tests remain inappropriate for this subject.

Figure 13. Consonant identification scores as a function of duration of experience with a chronic CIS processor. Subject SR15. Each percent correct score represents at least 10 presentations of each of 16 medial consonant tokens with a standard deviation of the mean of ±3-4%. The symbols distinguish data for male and female talkers.

Figure 14. Percent overall information transmission scores for the consonant identification data of Figure 13.
 
Extensive laboratory comparisons of a wide range of different processing strategies with this subject indicated that attempts to provide more information, whether temporally or spatially, often resulted in reduced rather than improved performance. At the same time, analysis of her consonant confusions suggested that her performance would benefit enormously from such basic information as a reliable voiced/unvoiced indication. Accordingly, we decided to give this subject long-term experience with a processor designed to convey only a very limited amount of information, but information chosen in terms of maximum potential benefit. Initially, this approach resulted in slow (523 p/s), sparse (40 µs/phase pulses) stimulation by a three-channel CIS processor using electrodes 1, 2, and 4.

After about 250 days we substituted a two-channel design based on the work of Breeuwer and Plomp ["Speechreading supplemented with frequency-selective sound-pressure information," J. Acoust. Soc. Am. 76, 686-691 (1984)]. We were able to obtain a reasonable approximation to the desired two non-contiguous frequency bands (364 – 707 Hz and 2235 – 4470 Hz) by programming the CIS-LINK device as if for a three-channel processor, and specifying very small amplitudes for pulses to the (unused) middle channel associated with the (ignored) intervening band. The bands analyzed by this design (processor 1xBP) were 350 – 877 Hz and 2196 – 5500 Hz. After about 280 days of experience with processor 1xBP, we compared it with processor 124c in consonant tests at about day 530 of the study. Transmission of voicing information was indeed better with the two-channel processor, but still quite modest (18% and 19% for male and female talkers, respectively, vs. 7% and 9% with the three-channel design). While there was no change in the 40% overall information transmission for the male talker, there was some improvement for the female talker both in overall information transmission (40% vs. 34%) and percent correct identification (25% vs. 15% ±3%) using processor 1xBP.
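How closely the realized analysis bands approximate the desired ones can be expressed as bandwidth in octaves and geometric center frequency. The following sketch uses only the band edges quoted above; the function name and dictionary labels are illustrative.

```python
import math


def band_summary(lo_hz, hi_hz):
    """Return (width in octaves, geometric center frequency in Hz) of a band."""
    return math.log2(hi_hz / lo_hz), math.sqrt(lo_hz * hi_hz)


# Band edges in Hz as quoted in the text
BANDS = {
    "desired low": (364, 707),
    "desired high": (2235, 4470),
    "1xBP low": (350, 877),
    "1xBP high": (2196, 5500),
}

for name, (lo, hi) in BANDS.items():
    width, center = band_summary(lo, hi)
    print(f"{name:>12}: {width:.2f} octaves wide, centered near {center:.0f} Hz")
```

By this measure the desired bands are each about one octave wide, while the bands realized on the CIS-LINK device are each about 1.3 octaves wide and extend mainly toward higher frequencies.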

Table II contains all percent correct and information transmission scores for medial consonant identification tests with all processors and all subjects discussed in this report, along with all monosyllabic word scores. A fuller context for voiced/unvoiced attribute performance of SR15’s processors may be found there. As another example of additional insights available from detailed information transmission analysis, note the dramatic difference in duration attribute scores between male and female talkers for subject SR16.
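For readers unfamiliar with information transmission scores, the sketch below shows one common way such percentages are derived from a confusion matrix, in the style of the feature analysis of Miller and Nicely. It is offered as an illustration under that assumption; this report does not specify the exact procedure used, and the function names and the four-consonant toy matrix are hypothetical.

```python
import math


def percent_information_transmitted(confusions):
    """confusions[i][j] = number of times stimulus i drew response j.

    Returns 100 * T(x;y) / H(x), the transmitted information relative
    to the stimulus entropy.
    """
    total = sum(sum(row) for row in confusions)
    row_sums = [sum(row) for row in confusions]
    col_sums = [sum(col) for col in zip(*confusions)]
    # Stimulus entropy H(x), in bits
    h_x = -sum((r / total) * math.log2(r / total) for r in row_sums if r)
    # Transmitted (mutual) information T(x;y), in bits
    t_xy = 0.0
    for i, row in enumerate(confusions):
        for j, n_ij in enumerate(row):
            if n_ij:
                t_xy += (n_ij / total) * math.log2(
                    n_ij * total / (row_sums[i] * col_sums[j])
                )
    return 100.0 * t_xy / h_x


def collapse_by_feature(confusions, feature_of):
    """Pool a consonant confusion matrix into feature categories.

    feature_of[i] is the category index (0, 1, ...) of consonant i.
    """
    k = max(feature_of) + 1
    pooled = [[0] * k for _ in range(k)]
    for i, row in enumerate(confusions):
        for j, n_ij in enumerate(row):
            pooled[feature_of[i]][feature_of[j]] += n_ij
    return pooled


# Toy data: /p b t d/, voicing always heard correctly, place heard at chance
toy = [
    [5, 0, 5, 0],
    [0, 5, 0, 5],
    [5, 0, 5, 0],
    [0, 5, 0, 5],
]
voicing = [0, 1, 0, 1]  # 0 = unvoiced, 1 = voiced

print(percent_information_transmitted(toy))                                # 50.0
print(percent_information_transmitted(collapse_by_feature(toy, voicing)))  # 100.0
```

In the toy example overall transmission is 50% even though voicing is transmitted perfectly, which illustrates how attribute scores can reveal structure that the overall score conceals.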

Summary

The two subjects with the highest levels of performance in common (SR3 and SR16, with NU6 word scores in excess of 50%) also shared substantial improvements in performance with chronic use experience, including particularly rapid improvement over their first few months with the new processing strategy.

Performance by a third subject (SR10) came into the same range after extended experience. All three of these subjects showed substantial improvements over the first year, with two of them continuing to show significant improvements through the second year. Performance improvements for the third subject continued at least through the third year of experience.

Neither of the two subjects with relatively poor levels of performance (SR9 with NU6 word scores less than 10% and SR15 with still poorer performance) showed significant sustained improvement with chronic use of a wearable processor, though each showed substantial performance differences in laboratory acute studies with various processor designs.

Table II. Detailed Results of Consonant and Word Identification Tests

Data for all subjects and all processors presented in the Figures and/or discussed in the text. From left to right, the columns contain: (1) identification code for the research subject, (2) days of chronic experience with CIS processor at time of test, (3) processor identification code, (4) number of different consonants included in medial consonant identification tests (16 for some subjects, 24 for others, chosen on the basis of overall level of performance), (5 – 12) medial consonant identification data for a male talker using University of Iowa videodisc recordings, (5) percent correct consonant identification, (6) percent overall information transmission, (7) percent voicing information transmission, (8) percent envelope information transmission, (9) percent frication information transmission, (10) percent place of articulation information transmission, (11) percent duration information transmission, (12) percent nasality information transmission, (13 – 20) medial consonant identification data for a female talker using University of Iowa videodisc recordings, (13) percent correct consonant identification, (14) percent overall information transmission, (15) percent voicing information transmission, (16) percent envelope information transmission, (17) percent frication information transmission, (18) percent place of articulation information transmission, (19) percent duration information transmission, (20) percent nasality information transmission, (21 – 22) monosyllabic word identification data for a male talker using Cochlear Corporation audio tape recordings of NU #6 word lists, (21) percent word identification, (22) percent phoneme identification, (23 – 24) monosyllabic word identification data for a male talker using Cochlear Corporation audio tape recordings of CNC word lists, (23) percent word identification, (24) percent phoneme identification.

Subject

Day

Proc.

Cons.

M %c

M Ovl

M Voi

M Env

M Fri

M Pla

M Dur

M Nas

F %c

F Ovl

F Voi

F Env

F Fri

F Pla

F Dur

F Nas

NU6 w

NU6 p

CNC w

CNC p

SR3

0

5

24

79

85

63

78

86

77

90

79

57

71

70

63

57

40

26

70

56

78

50

77

0

5

24

81

86

76

81

94

76

90

54

66

78

84

79

66

54

30

75

14

7

24

80

86

78

85

94

73

90

65

65

81

88

79

57

57

29

75

26

7

24

79

86

87

86

94

76

82

59

61

79

74

76

55

53

32

77

41

7

24

80

87

73

84

92

78

94

78

63

79

90

84

49

48

36

80

55

77

24

84

91

80

87

100

88

100

68

67

79

78

81

56

55

27

77

71

7

24

82

89

85

87

100

79

86

68

65

81

86

84

58

57

20

73

83

7

24

82

89

79

86

100

81

94

70

67

83

87

84

61

59

36

77

101

7

24

84

89

86

89

100

85

90

78

70

83

90

87

62

64

38

80

109

7

24

85

91

79

87

100

89

100

62

71

83

85

86

59

66

36

62

66

80

64

84

201

8

24

90

94

100

93

100

88

100

88

70

81

86

87

56

68

34

63

70

85

66

82

263

8

24

87

91

87

89

100

87

94

82

71

82

80

82

54

69

45

59

79

89

70

86

823

8

24

89

92

97

86

86

85

85

70

78

86

83

84

63

73

43

72

80

89

74

90

826

8a

24

93

95

97

97

96

94

100

100

71

85

76

86

70

63

30

78

80

90

901

8a

24

92

95

97

95

100

92

100

87

78

86

85

87

67

69

41

88

1124

8a

24

89

92

85

90

94

90

100

83

72

84

90

90

57

61

39

72

64

83

1135

8a

24

88

92

90

98

96

84

94

85

80

89

93

91

70

67

43

100

66

84

1135

8

24

74

87

1136

8

24

70

84

SR9

0

7b

16

71

75

91

91

53

50

49

85

1

7b

16

39

56

31

39

7

27

13

63

6

26

8

81

208

7b

16

62

68

54

68

37

44

62

90

39

55

24

45

24

26

15

62

8

35

3

78

218

9b1

16

59

70

84

91

45

39

23

100

42

54

57

57

20

16

16

54

6

80

362

9b1

16

60

68

65

77

60

46

30

58

35

56

47

64

15

21

14

36

6

28

3

82

372

7b

16

62

70

95

98

45

41

39

64

53

66

65

76

39

38

34

69

8

26

SR10

466

1

24

70

81

68

76

88

48

55

100

65

76

56

73

64

35

23

93

42

68

40

70

985

1

24

80

84

100

88

78

65

78

87

71

80

90

92

78

54

28

100

1512

1

24

80

85

96

92

74

63

87

100

71

83

100

93

79

54

31

100

56

77

1512

98a

24

86

89

100

94

79

70

94

93

71

81

97

88

88

51

38

100

SR15

0

124

16

20

38

5

16

15

11

44

15

21

31

7

12

8

6

7

13

248

124

16

20

48

5

20

7

6

49

9

16

33

2

15

2

5

5

10

530

1xBP

16

23

39

18

24

12

9

57

17

25

40

19

25

9

9

9

16

531

124c

16

21

40

7

26

18

11

49

13

15

34

9

11

5

7

10

9

SR16

0

B

24

74

85

80

89

96

55

94

83

65

79

70

73

61

49

37

60

1

B

24

58

78

12

B

24

71

85

74

86

92

57

86

85

67

78

70

70

69

50

37

77

26

B

24

84

90

78

91

100

80

94

93

66

79

57

73

60

54

36

68

40

B

24

80

90

77

90

94

77

94

100

71

82

62

71

55

47

26

95

54

B

24

83

89

81

88

94

71

100

79

70

81

75

80

68

51

32

88

69

B

24

85

90

83

93

85

77

100

100

74

82

62

77

71

53

35

100

82

B

24

78

89

86

89

90

74

100

88

73

82

55