Considering a Pure GPU Model for an Audio Fingerprinting System
Abstract
The demand for protecting, managing and indexing digital audio is growing quickly. Fingerprinting is receiving increased attention as a viable solution to this problem. An audio fingerprinting system extracts feature vectors (called a fingerprint) from a query audio signal, finds matches in a database (DB), and retrieves the audio signals associated with the matching fingerprints in the DB.
An audio fingerprint is a compact, low-level, content-based digest of an audio signal. It provides the ability to identify short, unlabeled audio signals in a fast and reliable way. A successful audio fingerprinting system should satisfy several practical requirements. First, it should be able to identify corrupted audio signals in spite of degradations. Second, it should be able to identify signals only a few seconds long. Finally, it should be computationally efficient, both in calculating the fingerprints and in searching for the best match in the DB. In addition, an audio fingerprinting system should be scalable, i.e., it has to operate well with very large DBs. Applying high-performance computing techniques is a good option for meeting these requirements.
The Graphics Processing Unit (GPU) provides high-performance computing through a massively threaded execution model. Its main characteristics are high computational power, constant development, and low cost. NVIDIA GPUs are programmed through the CUDA toolkit, which provides a GPU-CPU interface, thread synchronization, data types, and other facilities.
CUDA supports several types of memory that can be used to achieve high execution speeds in applications. Global memory is large but slow, with long access latencies and finite bandwidth, whereas shared memory is small, fast, on-chip memory. Variables that reside in shared memory can be accessed at very high speed in a highly parallel manner. Other memory types are constant and texture memory, both of which are read-only from device code.
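The memory spaces described above map directly onto CUDA source-level qualifiers. The following sketch (all sizes and identifier names are our own, chosen for illustration) shows where each kind of variable lives:

```cuda
#include <cstdio>

#define N        1024   // illustrative database size (sub-fingerprints)
#define THREADS   256

// Constant memory: read-only from kernels, cached, visible to all threads.
__constant__ unsigned int queryFp;

__global__ void memorySpacesDemo(const unsigned int *dbFp,  // global memory
                                 int *matches)              // global memory
{
    // Shared memory: on-chip, fast, private to each thread block.
    __shared__ unsigned int tile[THREADS];

    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < N) {
        // One coalesced read brings data from slow global memory
        // into fast shared memory.
        tile[threadIdx.x] = dbFp[tid];
    }
    __syncthreads();

    // Later accesses hit shared and constant memory, not global memory.
    if (tid < N && tile[threadIdx.x] == queryFp)
        atomicAdd(matches, 1);  // atomic function on a global counter
}
```

Registers hold each thread's scalar locals; the qualifiers above only control where explicitly declared arrays and globals are placed.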
In this work, we propose implementing the whole audio fingerprinting system in a pure GPU model, using the features offered by the GPU: shared memory, constant memory, atomic functions, coalesced access, among others. We show different optimizations based on the CUDA memory hierarchy. By using shared memory we reduce the total number of accesses to global memory and considerably improve performance. Finally, the experimental results are presented.
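The shared-memory optimization mentioned above can be sketched as follows for a Hamming-distance fingerprint search: the query sub-fingerprints are staged once in shared memory, so each thread's repeated comparisons against the database no longer touch global memory. All sizes and names here are assumptions for the sketch, not the paper's actual data layout:

```cuda
#define QUERY_LEN 256   // 32-bit sub-fingerprints per query (assumed)
#define THREADS   256

__global__ void hammingSearch(const unsigned int *db, int dbLen,
                              const unsigned int *query,
                              int *bestErrors)
{
    // Stage the query once in fast on-chip shared memory.
    __shared__ unsigned int q[QUERY_LEN];
    for (int i = threadIdx.x; i < QUERY_LEN; i += blockDim.x)
        q[i] = query[i];        // coalesced, one-time global read
    __syncthreads();

    // Each thread scores one candidate alignment of the query in the DB.
    int pos = blockIdx.x * blockDim.x + threadIdx.x;
    if (pos + QUERY_LEN <= dbLen) {
        int errors = 0;
        for (int i = 0; i < QUERY_LEN; ++i)
            errors += __popc(db[pos + i] ^ q[i]);  // bit-error count

        // Atomic function keeps the global best score consistent
        // across all blocks.
        atomicMin(bestErrors, errors);
    }
}
```

Without the shared-memory tile, every thread would re-read all QUERY_LEN query words from global memory; with it, the query is read from global memory only once per block.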
Full Text:
PDF

Asociación Argentina de Mecánica Computacional
Güemes 3450
S3000GLN Santa Fe, Argentina
Phone: 54-342-4511594 / 4511595 Int. 1006
Fax: 54-342-4511169
E-mail: amca(at)santafe-conicet.gov.ar
ISSN 2591-3522