Greg – ISOmusica

The finest AI tools for audio-to-MIDI conversion render performances of simpler musical pieces with satisfying accuracy. However, as the complexity of the music and the nuance of the artist’s performance increase, the limitations of current AI technology in producing MIDI files that do justice to the original performances become glaringly apparent. To discuss this in more detail, I’ll focus on a specific example that many piano enthusiasts may be familiar with.

As a sign of his appreciation for having become an American citizen in 1945, Vladimir Horowitz worked up a piano arrangement of John Philip Sousa’s march “The Stars and Stripes Forever!” with which he mesmerized audiences for about six seasons. The audiences can be heard cheering when he begins playing it, and even gasping in the most brilliant passages, where Horowitz creates a three-handed effect that spans nearly the entire keyboard, accompanying himself while presenting Sousa’s main trio melody three times in grander and grander fashion. Recordings of a handful of live performances from 1945 to 1951 survive, all monaural and mostly poorly recorded by today’s standards. Even with those limited recordings, using AI transcription, some editing, and a Disklavier (or similar) piano, we can play back something very much like what those audience members heard in Carnegie Hall 75 years ago.

The role of the editor is paramount in this process, much as it was a century ago with reproducing piano rolls. The editor’s decisions shape the final product. I processed six of Horowitz’s performances using my preferred AI application. The output files ranged from remarkably accurate to an unrecognizable mess, and even the best one was clearly far from perfect. The direction from there depends on four main areas the editor needs to consider.

Firstly, and entirely apart from the AI matter, the editor needs to decide what the goal is. One extreme is to recreate a specific performance as faithfully and precisely as possible, including any false notes, memory lapses, and errors in the original, as Zenph did in their remarkable Art Tatum and Oscar Peterson issues, for example. Another extreme might be something like the piano rolls that were heavily quantized from live performances, where the end goal was a constant tempo suitable for dancing. My preferred approach is to create something new and enjoyable, using all available resources, including my own creativity, to achieve something very much in the style of an original performance. In my work on “The Stars and Stripes Forever!” I chose to correct false notes because I expect most performing artists would prefer not to have played them in the first place.

Secondly, delving deeper into AI-related editing matters, the editor needs to keep the original audio recording(s) available for frequent comparison and watch for two kinds of errors: notes the AI missed, and notes the AI inserted because it misinterpreted the harmonic overtones of other notes as played notes. An understanding of the harmonic overtone series comes in handy when editing a MIDI file without a reference score. Having a score as a guide is a great help. In this case there are several ‘by ear’ transcriptions that fans have made over the decades (all of which differ from each other) and an informal manuscript of segments that Horowitz himself penned. Horowitz’s sketch includes some textural variants that he never included in any of his recorded performances. I chose to include these unfamiliar curiosities in my edited file as a novelty, and to share a perspective on Horowitz’s thoughts about his arrangement that even many fans might not be aware of.
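As a rough illustration of how knowledge of the overtone series can guide that search, here is a minimal Python sketch that flags notes worth checking by ear. It is not the method used for this project or by any particular AI tool. The function name, the note format, and the two thresholds are hypothetical choices made for the example, and the result is only a list of candidates to compare against the recording, not a list of notes to delete.

```python
# Semitone offsets of harmonics 2 through 6 above a fundamental
# (octave, octave + fifth, two octaves, two octaves + major third,
# two octaves + fifth), rounded to equal temperament.
HARMONIC_OFFSETS = {12, 19, 24, 28, 31}


def flag_possible_overtones(notes, onset_window=0.03, velocity_ratio=0.6):
    """Return indices of notes that might be overtones of another note.

    `notes` is a list of (pitch, onset_seconds, velocity) tuples
    (a hypothetical format chosen for this sketch).
    A note is flagged when some other note:
      - lies a harmonic interval below it,
      - starts within `onset_window` seconds of it, and
      - is clearly louder (the candidate's velocity is below
        `velocity_ratio` times the lower note's velocity).
    Both thresholds are illustrative assumptions, not tuned values.
    """
    flagged = []
    for i, (pitch, onset, velocity) in enumerate(notes):
        for j, (low_pitch, low_onset, low_velocity) in enumerate(notes):
            if i == j:
                continue
            is_harmonic = (pitch - low_pitch) in HARMONIC_OFFSETS
            is_simultaneous = abs(onset - low_onset) <= onset_window
            is_much_softer = velocity < velocity_ratio * low_velocity
            if is_harmonic and is_simultaneous and is_much_softer:
                flagged.append(i)
                break
    return flagged


# A loud low C with a faint C an octave above it at almost the same
# instant; the faint note is the only one flagged for review.
example = [(36, 0.0, 100), (48, 0.01, 40), (55, 0.0, 90), (60, 1.0, 80)]
suspects = flag_possible_overtones(example)
```

A heuristic like this will also flag genuinely played soft octaves, which are common in piano writing, so every candidate still has to be judged against the recording or a score.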

The third area to consider is dynamic contrast. With exceptions, I would describe the raw output files I have seen as generally flat, lacking lifelike dynamic contrasts. In exceptional cases I have been impressed by how accurately chords are voiced, which suggests that the future could be very exciting as the AI continues to improve and accepts a wider range of input files of varying quality. One of Horowitz’s outstanding characteristics was his dynamic shading, especially in the pianissimo range, which is exceptionally difficult to emulate as a human sitting at the keys, as an AI, or as a MIDI editor! Getting very comfortable with all of the MIDI note velocity editing tools in your digital audio workstation of choice is as important as the creativity needed to decide how to apply them.
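One common velocity edit is to widen a flat dynamic range by scaling velocities away from a middle value. The Python sketch below shows the arithmetic only. The function name, the pivot of 64, and the factor of 1.5 are assumptions made for illustration, and a real edit would be shaped phrase by phrase rather than applied uniformly to a whole file.

```python
def expand_velocities(velocities, pivot=64, factor=1.5, floor=1, ceiling=127):
    """Stretch MIDI note velocities away from `pivot` by `factor`.

    Values above the pivot get louder and values below it get softer.
    Results are clamped to 1..127, because a note-on with velocity 0
    is treated as a note-off in MIDI.
    The pivot and factor defaults are illustrative, not recommendations.
    """
    expanded = []
    for v in velocities:
        scaled = pivot + (v - pivot) * factor
        expanded.append(max(floor, min(ceiling, round(scaled))))
    return expanded


# A narrow range of 60..100 is spread out to 58..118.
wider = expand_velocities([60, 64, 70, 100])
```

Most digital audio workstations offer an equivalent scale or compress/expand command, so in practice the sketch mainly shows what such a tool is doing to the numbers.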

Finally, use of the damper (sustain) pedal is necessary in most classical piano performances from the Classical era onward, as well as in many other styles of playing. Getting AI to interpret nuanced damper pedaling, and getting a reproducing piano of any kind to reproduce that pedaling faithfully, is exceptionally difficult and deserves an entire write-up of its own. I’ll just say the technology is not there yet. The best to expect from the raw output is on/off approximations. Fully appreciating nuanced damper pedaling requires a fine degree of regulation in the final reproducing instrument, and there is little standardization across brands or even among pianos of the same brand in varying states of regulation. Given that, it is not apparent to me, as an editor, why extensive time should be devoted to this aspect. Accepting a less perfect end result in exchange for a predictable playback experience across devices is a compromise I am willing to make. For this project, I included some una corda pedal in passages where it sounded like Horowitz might have used it, and the vast majority of the damper pedaling is as interpreted by the AI. I edited it in places where phrases were clearly blurring or where notes that can clearly be heard in the original performance were not being held. When using AI that cannot interpret damper pedaling, or when it fails to do so accurately, I have connected a MIDI damper pedal to my computer and played back the MIDI file in record mode while pedaling along with the music. This is an alternative to editing the pedaling manually with the mouse and keyboard, which is tedious and yields unnatural-sounding results.
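To make the on/off compromise concrete, here is a small Python sketch that reduces a stream of damper pedal values (MIDI controller 64) to plain on/off events. It is an illustration rather than a step from this project. The function name and the event format are hypothetical, and the threshold of 64 follows the usual MIDI convention that values of 64 and above mean the pedal is down.

```python
def pedal_to_on_off(events, threshold=64):
    """Reduce continuous damper pedal data to on/off state changes.

    `events` is a time-ordered list of (time_seconds, cc64_value)
    tuples (a hypothetical format chosen for this sketch).
    Values at or above `threshold` count as pedal down (127) and
    values below it as pedal up (0). Only changes of state are kept,
    so runs of intermediate values collapse into single events.
    """
    result = []
    state = None  # unknown until the first event is seen
    for time, value in events:
        new_state = 127 if value >= threshold else 0
        if new_state != state:
            result.append((time, new_state))
            state = new_state
    return result


# A gradual press and release becomes one "down" and one "up" event.
recorded = [(0.0, 0), (0.5, 30), (0.6, 90), (0.7, 127), (1.2, 50), (1.3, 0)]
simplified = pedal_to_on_off(recorded)
```

The trade-off is the one described above: half-pedaling detail is thrown away, but the result should behave the same way on any instrument that responds to the standard sustain controller.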

We are not at the point where AI, alone or even with quite a lot of help, can perfectly duplicate an artist’s live performance, and we certainly cannot capture the electricity or intimacy that a performing artist can generate from the stage. However, by joining AI capabilities with human creativity and sensitivity, we can continue to explore and enjoy music in new, exciting ways.

Disklavier-encoded file for playback on a software-controlled piano
John Philip Sousa: “The Stars and Stripes Forever!” in the style of Vladimir Horowitz, including some rare variants found in unreleased recordings and an unpublished manuscript.

https://www.youtube.com/watch?v=LhuRZ8c7gYI

One fan’s notated version of Horowitz’s performance. https://www.fishlet.com/wp-content/uploads/2018/10/SS_Forever.pdf