Musical Plagiarism Detection: An Approach with Artificial Intelligence (AI) and Machine Learning (ML) (Detecção de plágio musical: uma abordagem com inteligência artificial (IA) e aprendizado de máquina (AM))

Authors

  • João V. N. Ribeiro Faculdade São Paulo Tech School - SPTech Author
  • Juan G. Silva Faculdade São Paulo Tech School - SPTech Author
  • Mateus F. Cunha Faculdade São Paulo Tech School - SPTech Author
  • Yan C. Santos Faculdade São Paulo Tech School - SPTech Author
  • Alexander Barreira Faculdade São Paulo Tech School - SPTech Author
  • Marise Miranda Faculdade São Paulo Tech School - SPTech Author https://orcid.org/0000-0002-1775-4541

Keywords:

Musical plagiarism, Machine Learning, TF-IDF, FFT, STFT, MFCC, PCP, Copyright, AWS, Modular Architecture

Abstract

https://doi.org/10.5281/zenodo.15678650

This study explores the development of an automated system for detecting musical plagiarism that combines advanced signal processing techniques, machine learning (ML), and textual analysis. Each step, each note, reflects the complexity and sensitivity of the topic. Data collection, a cornerstone of this process, requires a robust, diverse, and dynamic database that spans different musical genres, historical periods, and harmonic and melodic variations. As highlighted by Smith (1997), diversity in datasets is crucial to avoid bias in the analysis and to improve the models' capacity to generalize.

The system's architecture, designed to be modular and scalable, employs AWS services with the precision of a conductor: EC2 hosts the models, S3 stores data like valuable scores, and Lambda executes asynchronous processing. This structure not only supports massive data volumes but also enables real-time analysis and the seamless addition of new features. Every line of code respects the guidelines of the Brazilian Copyright Law (Law No. 9,610/98), which allows the use of protected works for academic and research purposes.
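As a minimal sketch of how the asynchronous step might be wired, the handler below assumes that uploads to S3 trigger a Lambda function through a standard S3 event notification. The study does not describe this wiring, so the trigger, the function names `extract_uploads` and `handler`, and the `extract_features` task label are all hypothetical.

```python
from urllib.parse import unquote_plus


def extract_uploads(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    uploads = []
    for record in event.get("Records", []):
        s3_info = record.get("s3", {})
        bucket = s3_info.get("bucket", {}).get("name")
        key = s3_info.get("object", {}).get("key")
        if bucket and key:
            # S3 URL-encodes object keys in event notifications.
            uploads.append((bucket, unquote_plus(key)))
    return uploads


def handler(event, context):
    """Hypothetical Lambda entry point: describe one analysis job per upload."""
    jobs = [
        {"bucket": bucket, "key": key, "task": "extract_features"}
        for bucket, key in extract_uploads(event)
    ]
    # A real deployment would pass these jobs to the feature-extraction step;
    # here they are only returned so the sketch stays self-contained.
    return {"queued": len(jobs), "jobs": jobs}
```

Keeping the event parsing in a separate function lets it be exercised locally with a plain dictionary, without any AWS credentials.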

The initial results offer promising chords. The combination of Mel-frequency cepstral coefficients (MFCC) and pitch class profiles (PCP), key tools in tonal analysis, proved powerful in identifying harmonic similarities with high accuracy. However, the system also hits a few dissonant notes: applying techniques like the short-time Fourier transform (STFT) and k-nearest neighbors (KNN) to long-duration music reveals computational limitations, emphasizing the need for a more robust infrastructure and fine-tuned parameterization. Logan (2000) had already pointed out the value of MFCC in adapting audio analysis to human perception, and here, its impact resonates.
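To illustrate the idea behind a pitch class profile, the sketch below folds spectral peaks into 12 pitch classes and compares two profiles. It assumes the peaks have already been taken from an FFT or STFT frame as (frequency, magnitude) pairs. The A4 = 440 Hz tuning, the cosine similarity measure, and the search over transpositions are illustrative assumptions, not the study's reported method.

```python
import math

A4_HZ = 440.0  # reference tuning; an assumption, not taken from the study


def pitch_class(freq_hz):
    """Map a frequency to a pitch class 0..11 (0 = C) in equal temperament."""
    semitones_from_a4 = round(12 * math.log2(freq_hz / A4_HZ))
    return (semitones_from_a4 + 9) % 12  # A lies 9 semitones above C


def pcp(peaks):
    """Build a normalized 12-bin profile from (freq_hz, magnitude) peaks."""
    bins = [0.0] * 12
    for freq_hz, magnitude in peaks:
        if freq_hz > 0:
            bins[pitch_class(freq_hz)] += magnitude
    total = sum(bins)
    return [b / total for b in bins] if total else bins


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_transposed_similarity(p, q):
    """Return (similarity, shift) for the best of the 12 circular shifts of q."""
    scores = [(cosine(p, q[s:] + q[:s]), s) for s in range(12)]
    return max(scores)


# Usage: a C major triad and a D major triad share no pitch classes as written,
# but match once the second profile is shifted down by two semitones.
c_major = pcp([(261.63, 1.0), (329.63, 1.0), (392.00, 1.0)])
d_major = pcp([(293.66, 1.0), (369.99, 1.0), (440.00, 1.0)])
similarity, shift = best_transposed_similarity(c_major, d_major)
```

Searching over circular shifts makes the comparison insensitive to key, which matters when a borrowed passage has simply been transposed.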

Additionally, the system boldly expands its repertoire with added features. One of these is the classification of musical genres based on song lyrics, a task led by a Multinomial Naive Bayes classifier with TF-IDF vectorization. Early performances show impressive results in categorizing genres such as Country, Rock, and Rap, proving that statistical models can still be skilled soloists in the textual domain.
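The lyric-based classification can be sketched in plain Python as follows. This is a toy illustration under stated assumptions: the whitespace tokenizer, the smoothed IDF formula, the smoothing constant `alpha`, the class name `TfidfNaiveBayes`, and the invented training lyrics are all hypothetical, since the study's dataset and settings are not given here.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


class TfidfNaiveBayes:
    """Multinomial Naive Bayes over TF-IDF weights with Laplace smoothing."""

    def fit(self, lyrics, genres, alpha=1.0):
        docs = [tokenize(text) for text in lyrics]
        n_docs = len(docs)
        doc_freq = Counter(term for doc in docs for term in set(doc))
        # Smoothed IDF: ln((1 + N) / (1 + df)) + 1
        self.idf = {
            term: math.log((1 + n_docs) / (1 + df)) + 1
            for term, df in doc_freq.items()
        }
        # Accumulate TF-IDF mass per genre and term.
        weight_sums = defaultdict(lambda: defaultdict(float))
        for doc, genre in zip(docs, genres):
            for term, weight in self._tfidf(doc).items():
                weight_sums[genre][term] += weight
        class_counts = Counter(genres)
        self.log_prior = {g: math.log(c / n_docs) for g, c in class_counts.items()}
        self.log_likelihood = {}
        for genre in class_counts:
            total = sum(weight_sums[genre].values()) + alpha * len(self.idf)
            self.log_likelihood[genre] = {
                term: math.log((weight_sums[genre].get(term, 0.0) + alpha) / total)
                for term in self.idf
            }
        return self

    def _tfidf(self, tokens):
        """TF-IDF weights for known terms; unseen words are ignored."""
        counts = Counter(t for t in tokens if t in self.idf)
        return {t: c * self.idf[t] for t, c in counts.items()}

    def predict(self, text):
        weights = self._tfidf(tokenize(text))
        scores = {
            genre: self.log_prior[genre]
            + sum(w * self.log_likelihood[genre][t] for t, w in weights.items())
            for genre in self.log_prior
        }
        return max(scores, key=scores.get)


# Usage with invented toy lyrics (not data from the study).
model = TfidfNaiveBayes().fit(
    [
        "truck road whiskey boots", "dusty road truck home",
        "guitar loud riff stage", "loud amp guitar crowd",
        "rhyme flow beat mic", "mic beat street rhyme",
    ],
    ["Country", "Country", "Rock", "Rock", "Rap", "Rap"],
)
predicted = model.predict("loud guitar tonight")
```

With real data, a library implementation would normally replace this hand-rolled version; the sketch only shows how TF-IDF weights feed the per-genre likelihoods.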

But not everything is in harmony. The high computational cost of models based on Support Vector Machines (SVM) and the challenges of parameterizing tools like PCP for larger datasets are dissonances yet to be resolved. Even so, the progress made suggests that integrating hybrid models and deep neural networks can overcome these limitations. Studies such as those by Logan (2000) and Serra et al. (2008) already point in this direction, and the future seems promising.

In addition to the technical challenges, it is necessary to consider the legal and ethical impacts of the system in the musical landscape. A notable case is the lawsuit between Robin Thicke and the estate of Marvin Gaye over the song "Blurred Lines," which highlighted how harmonic similarity can become a central point in plagiarism disputes. In 2015, a jury determined that the song infringed the copyright of "Got to Give It Up," initially resulting in a damages award of $7.4 million, later reduced to $5.3 million, in addition to 50% of future royalties being allocated to Gaye's family (THE GUARDIAN, 2018). By providing a detailed and impartial technical analysis, the solution proposed in this article could significantly contribute to clarifying the nature of similarities between songs, promoting greater fairness in judicial decisions. When applied responsibly and without harming the rights of artists, the system has the potential to protect both original creations and the integrity of the creative process, encouraging artistic development and innovation in the musical field.

Published

2025-06-16