Right, which is why I'm wondering if the odd effects are a result of that interpolation. A recording with a sufficiently high sample rate would be less likely to show odd interpolation effects. The higher the sample rate, the more you could "stretch out" the sound without any noticeable problems.
A low sample rate and stretching out the sound basically results in the audio equivalent of the pixelation seen in overly-enlarged images.
That's an apt analogy; I even thought about using blown-up half-tone newsprint as an analogy before I settled on a picket fence.
The higher the sample rate, the more closely you approach the analog waveform, with more and smaller discrete digital steps. Depending on the interpolation used, it could be that your algorithm is creating digital, analog, or pseudo-analog artifacts. A lot, maybe even most, of these algorithms use short-time Fourier transforms for this sort of thing, but there's still a ton of variation in what they do with it.
http://en.wikipedia.org/wiki/Short-time_Fourier_transform

A lot of it can have to do with the "window" of time the short-time Fourier transform is using. They could be trying to isolate several frequency values and then play them out longer, or taking an average frequency value for that timeslice and repeating it, or maybe even just taking x milliseconds and repeating it seven times before moving on to the next slice and calling that 1/7th playback speed.
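To make that last scheme concrete, here's a minimal sketch of the "repeat each slice" approach in Python with numpy. The function name and slice size are my own choices for illustration; this isn't what any particular stretcher actually does, just the crudest version of the idea:

```python
import numpy as np

def naive_stretch(signal, factor, slice_len=1024):
    """Crude time-stretch: take each fixed-size slice and repeat it
    `factor` times before moving on to the next one. Pitch is preserved
    inside each slice, but the seams between repeats are exactly where
    the audible artifacts live."""
    out = []
    for start in range(0, len(signal), slice_len):
        chunk = signal[start:start + slice_len]
        out.extend([chunk] * factor)
    return np.concatenate(out)

# One second of a 440 Hz tone at 44.1 kHz, "slowed" to 1/7th speed:
sr = 44100
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
stretched = naive_stretch(tone, 7)
print(len(stretched) / len(tone))  # 7.0
```

The output is exactly seven times longer, but every 1024-sample boundary is a hard discontinuity, which is why this approach sounds stuttery rather than genuinely slowed down.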
If you read a book and every sentence was just repeated seven times before moving onto the next one, I'm not sure anyone would think the book was actually 7x longer.
The Fourier transform is an incredibly useful tool for analyzing the spectrum of anything with a waveform; a lot of what SETI@home does is fast Fourier transforms, quickly sifting through the raw radio data to look for the spikes of carrier signals and so on. The problem is that no matter how well you measure the waveform, once you try to "stretch the data out," you're making up what isn't there.
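As an illustration of why an FFT is so good at this kind of sifting (this is just a sketch, not SETI@home's actual pipeline, and the numbers are made up): a weak narrowband carrier that's invisible in the time domain pops right out as a spike in the frequency domain.

```python
import numpy as np

sr = 8192                        # one second of samples, so each FFT bin is 1 Hz wide
t = np.arange(sr) / sr
rng = np.random.default_rng(0)
carrier = 0.2 * np.sin(2 * np.pi * 1000 * t)   # weak 1 kHz "carrier"
noise = rng.normal(0.0, 1.0, sr)               # noise five times stronger

spectrum = np.abs(np.fft.rfft(carrier + noise))
peak_bin = int(np.argmax(spectrum[1:])) + 1    # skip the DC bin
print(peak_bin)  # 1000, i.e. the carrier at 1000 Hz
```

The FFT concentrates all of the carrier's energy into one bin while spreading the noise across thousands of bins, which is what makes the spike detectable even at a poor signal-to-noise ratio.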
Equally "made up," but a more accurate way to both time-shift and frequency-shift an audio track, might be to discretely pick out every frequency you can, produce each one for a longer period of time, and mash it all back together. I've no idea if that would actually sound any better, though, and I'll guess it's possibly prohibitive in terms of computing power or time. (Considering what's done with image and video information these days, even in realtime, while apples to oranges, it might just be prohibitive in terms of the code.) OTOH, it might still be trivially easy for CPUs and DACs because of all the increases in power, unless the sample rate is unreasonably high, like something several times higher than CD-quality audio.
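A toy version of that "pick out the frequencies and play them longer" idea is easy to sketch, at least for a steady tone. Everything here is my own naming and simplification: real analysis/resynthesis stretchers track partials frame by frame with an STFT, while this single-FFT version only works on signals whose frequency content doesn't change.

```python
import numpy as np

def spectral_stretch(signal, sr, factor, n_partials=8):
    """Toy analysis/resynthesis stretch: find the strongest frequency
    components in one big FFT, then regenerate each as a sine lasting
    `factor` times longer, preserving its amplitude and phase."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), 1 / sr)
    # indices of the strongest bins, skipping DC
    top = np.argsort(np.abs(spectrum[1:]))[-n_partials:] + 1
    out_len = int(len(signal) * factor)
    t = np.arange(out_len) / sr
    out = np.zeros(out_len)
    for b in top:
        amp = 2 * np.abs(spectrum[b]) / len(signal)
        phase = np.angle(spectrum[b])
        out += amp * np.cos(2 * np.pi * freqs[b] * t + phase)
    return out

# Stretch 0.1 s of a pure 440 Hz tone to twice its length:
sr = 44100
t = np.arange(4410) / sr
tone = np.sin(2 * np.pi * 440 * t)
stretched = spectral_stretch(tone, sr, 2)
```

For a pure tone this produces a seamless, twice-as-long 440 Hz sine with no slice boundaries at all; the catch is that anything non-stationary (attacks, vibrato, speech) gets smeared into a steady drone, which is one reason the naive version isn't what shipping stretchers use.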