Leveraging Electric Guitar Tones and Effects to Improve Robustness in Guitar Tablature Transcription Modeling

Hegel Pedroza¹,

Wallace Abreu²,

Ryan M. Corey³,

Iran R. Roman⁴

1. National Autonomous University of Mexico
2. Federal University of Rio de Janeiro
3. Discovery Partners Institute & University of Illinois Chicago
4. New York University

Paper Zenodo Code

Abstract

Guitar tablature transcription (GTT) aims at automatically generating symbolic representations from real solo guitar performances. Due to its applications in education and musicology, GTT has gained traction in recent years. However, GTT robustness has been limited due to the small size of available datasets. Researchers have recently used synthetic data that simulates guitar performances using pre-recorded or computer-generated tones and can be automatically generated at large scales. The present study complements these efforts by demonstrating that GTT robustness can be improved by including synthetic training data created using recordings of real guitar tones played with different audio effects. We evaluate our approach on a new evaluation dataset with professional solo guitar performances that we composed and collected, featuring a wide array of tones, chords, and scales.

Training Data

Our synthetic solo guitar performances, melodies, and chords consist of notes played with varied guitar tones and audio effects to maximize tone diversity. We achieve this by randomly selecting tones from a large bank of recorded examples when generating each audio track. This follows our hypothesis that such diversity enhances the model’s robustness: the model can concentrate on pitch content and string-and-fret inference while disregarding specific timbral qualities. Hear a comparison of GuitarSet, GuitarSetFX, and GuitarProFX below, and see the CQT spectrogram of the first few seconds of each track!

GuitarSet Track

GuitarSetFX Track

GuitarProFX Track
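The random tone selection described in this section can be sketched roughly as follows. This is a minimal illustration, not the pipeline used to build the datasets: it assumes a tone is drawn independently for every note, and the names `Note`, `RenderedNote`, `assign_tones`, and `TONE_BANK` (including its tone labels) are hypothetical placeholders. The only fixed fact it relies on is standard guitar tuning. The key point is that the string and fret labels stay untouched while the timbre varies.

```python
import random
from dataclasses import dataclass

# MIDI pitches of the open strings in standard tuning,
# from string 6 (low E2) to string 1 (high E4).
OPEN_STRING_MIDI = {6: 40, 5: 45, 4: 50, 3: 55, 2: 59, 1: 64}

# Hypothetical tone bank: labels are placeholders, not the real bank contents.
TONE_BANK = ["clean", "overdrive", "distortion", "chorus", "delay", "reverb"]


@dataclass(frozen=True)
class Note:
    onset: float      # seconds
    duration: float   # seconds
    string: int       # 1 (high E) to 6 (low E)
    fret: int         # 0 means open string


@dataclass(frozen=True)
class RenderedNote:
    note: Note        # unchanged tablature label
    midi_pitch: int   # pitch implied by string and fret
    tone: str         # tone drawn from the bank for this note


def assign_tones(notes, tone_bank, seed=None):
    """Pair every note with a randomly chosen tone (assumed per-note sampling)."""
    rng = random.Random(seed)
    rendered = []
    for note in notes:
        pitch = OPEN_STRING_MIDI[note.string] + note.fret
        rendered.append(RenderedNote(note, pitch, rng.choice(tone_bank)))
    return rendered


# Toy symbolic track: E2, B2, E3 played on strings 6, 5, and 4.
notes = [Note(0.0, 0.5, 6, 0), Note(0.5, 0.5, 5, 2), Note(1.0, 0.5, 4, 2)]
plan = assign_tones(notes, TONE_BANK, seed=0)
for item in plan:
    print(item.note.string, item.note.fret, item.midi_pitch, item.tone)
```

Passing a fixed `seed` makes a generated track reproducible, while varying it yields different timbres for the same tablature.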

EGSet12

EGSet12 is a new evaluation set with twelve original solo electric guitar performances (31.65 seconds avg. duration, totaling 379.8 seconds). These pieces were composed by a professional musician and guitar player for this project, showcasing the full tonal range of the electric guitar across diverse melodies and chord complexities. EGSet12 encompasses a broad spectrum of styles, including pop, funk, jazz and twelve-tone, reflecting varied tonalities, keys, rhythms, and modes. Hear some samples below!

Track 02

Track 06

Track 09

Results

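The results below show that our proposed method improves model robustness on both multi-pitch and tablature estimation metrics. Adding GuitarSetFX to the training data raises multi-pitch F₁ from 0.638 to 0.740, and adding GuitarProFX raises tablature F₁ from 0.447 to 0.585 and TDR from 0.695 to 0.819.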

	Multi-pitch estimation			Tablature estimation
	F₁	P	R	F₁	P	R	TDR
TabCNN^[1]	0.638±0.060	0.819±0.080	0.530±0.067	0.447±0.071	0.565±0.089	0.375±0.067	0.695±0.075
+ GuitarSetFX	0.740±0.055	0.835±0.085	0.679±0.052	0.557±0.088	0.619±0.100	0.518±0.084	0.755±0.106
+ GuitarProFX	0.719±0.061	0.839±0.082	0.647±0.068	0.585±0.084	0.658±0.073	0.541±0.087	0.819±0.075

Table 1: TabCNN performance on EGSet12. Each cell is the metric averaged across the twelve tracks (± denotes standard deviation). F₁, P, and R denote F₁ score, precision, and recall; TDR is the tablature disambiguation rate. Top row: performance as trained by Wiggins & Kim. Bottom rows: performance when the training data includes simulated tracks.
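For readers unfamiliar with the metrics in Table 1, the sketch below shows one way they could be computed per track and then summarized across tracks. It is an illustration under stated assumptions, not the evaluation code behind the table: each frame is represented as a set of (string, fret) pairs, TDR is taken as the share of correctly detected pitches that also land on the correct string and fret, and the spread uses the population standard deviation. The function names and the toy frames are hypothetical.

```python
from statistics import mean, pstdev

# MIDI pitches of the open strings in standard tuning (string 6 = low E2).
OPEN_STRING_MIDI = {6: 40, 5: 45, 4: 50, 3: 55, 2: 59, 1: 64}


def to_pitches(frame):
    """Map a set of (string, fret) pairs to the set of MIDI pitches it sounds."""
    return {OPEN_STRING_MIDI[s] + f for s, f in frame}


def prf(tp, n_pred, n_ref):
    """Precision, recall, and F1 from counts, returning 0.0 on empty denominators."""
    p = tp / n_pred if n_pred else 0.0
    r = tp / n_ref if n_ref else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


def track_metrics(pred_frames, ref_frames):
    """Frame-level multi-pitch and tablature metrics for a single track."""
    tab_tp = tab_pred = tab_ref = 0
    pit_tp = pit_pred = pit_ref = 0
    tdr_hits = 0
    for pred, ref in zip(pred_frames, ref_frames):
        tab_tp += len(pred & ref)
        tab_pred += len(pred)
        tab_ref += len(ref)
        pred_p, ref_p = to_pitches(pred), to_pitches(ref)
        pit_tp += len(pred_p & ref_p)
        pit_pred += len(pred_p)
        pit_ref += len(ref_p)
        # Correct pitches that were also placed on the correct string and fret.
        tdr_hits += len(to_pitches(pred & ref))
    return {
        "pitch": prf(pit_tp, pit_pred, pit_ref),
        "tab": prf(tab_tp, tab_pred, tab_ref),
        "tdr": tdr_hits / pit_tp if pit_tp else 0.0,
    }


def summarize(per_track_values):
    """Mean and population standard deviation across tracks (assumed convention)."""
    return mean(per_track_values), pstdev(per_track_values)


# Toy example: the prediction plays E3 on string 5, fret 7 instead of string 4, fret 2.
ref_frames = [{(6, 0), (4, 2)}, {(4, 2)}]
pred_frames = [{(6, 0), (5, 7)}, {(4, 2)}]
result = track_metrics(pred_frames, ref_frames)
print(result)
```

In the toy example every pitch is detected, so the multi-pitch scores are perfect, yet one note sits on the wrong string, which lowers the tablature scores and the TDR. This is the distinction the two halves of Table 1 capture.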

Figure 1: Each column is a two-second EGSet12 excerpt, comparing predictions made by models trained using GuitarSet, with and without GuitarProFX, against ground truth. Each circle is a tracked note on the guitar fretboard over time and vertical lines indicate musical beats.

In the video, we compare TabCNN trained with GuitarSet alone against our method, which trains on GuitarSet + GuitarProFX, with both models evaluated on EGSet12.