Department News

Development of Parametric Filter Banks for Sound Feature Extraction
Posted: 2024-07-12 | Author: 채상우
Attachment: Development of Parametric Filter Banks for Sound Feature Extraction.pdf
1. Title: Development of Parametric Filter Banks for Sound Feature Extraction
2. DOI: 10.1109/ACCESS.2023.3321798
3. Journal: IEEE Access
4. Summary: This paper proposes learnable parametric filter banks. A parametric filter bank is built by selecting learnable parameters from an original, hand-designed filter bank and then using the learning ability of a neural network to fit those parameters, so that the filter bank adapts to the dataset at hand. We use three types of filter banks: the popular Mel filter bank, the Gammatone filter bank, which mimics the response of human auditory filters in the cochlea, and our own Gaussian filter bank. The parametric filter banks are evaluated on two datasets: Audio-MNIST, a speech recognition dataset of spoken digits, and "Ten Languages", a self-created news speech dataset covering ten countries, each with a different language. Comparative experiments use both a Convolutional Neural Network (CNN) and a Fully Connected Neural Network (FCNN) as classifiers. The results show that the parametric filter banks outperform the original filter banks, and the parametric Gammatone filter bank achieves the highest accuracy: 98.77% on Audio-MNIST and 92.14% on the Ten Languages test data. Because the number of samples per class differs across classes, we also report the weighted average F1-score, which reaches maxima of 0.99 and 0.92.
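For reference, the textbook definitions behind the three filter families are sketched below. The Mel-scale mapping and the Gammatone impulse response with Glasberg and Moore's equivalent rectangular bandwidth (ERB) are the conventional forms. The Gaussian filter shape, and the idea that centre frequencies and bandwidths are the parameters a network would learn, are assumptions for illustration: the post does not state the paper's exact parameterization.

```latex
% Mel scale (HTK convention): frequency f in Hz -> mel
m(f) = 2595 \log_{10}\!\left(1 + \frac{f}{700}\right)

% Gammatone impulse response of order n (commonly n = 4), centre frequency f_c
g(t) = a\, t^{\,n-1} e^{-2\pi b t} \cos(2\pi f_c t + \phi), \qquad t \ge 0

% Bandwidth from the ERB (Glasberg & Moore), with f in Hz
\mathrm{ERB}(f) = 24.7\left(\frac{4.37 f}{1000} + 1\right), \qquad b = 1.019\,\mathrm{ERB}(f_c)

% Assumed Gaussian filter i with centre \mu_i and width \sigma_i (both in Hz)
w_i(f) = \exp\!\left(-\frac{(f - \mu_i)^2}{2\sigma_i^2}\right)
```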
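The core idea, a filter bank whose shape is controlled by a small set of parameters, can be illustrated with a minimal pure-Python sketch of a Gaussian filter bank. It assumes each filter is a Gaussian over FFT-bin frequencies with centre `centers[i]` and standard deviation `widths[i]` (in Hz), initialised evenly on the mel scale. All function names, the initialisation rule, and the 16 kHz / 512-point settings are hypothetical and not taken from the paper. Here the parameters are plain lists so the forward computation is easy to follow.

```python
import math

def hz_to_mel(f_hz):
    # HTK-style mel formula: m = 2595 * log10(1 + f / 700)
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def init_gaussian_params(n_filters, sample_rate, f_min=0.0):
    # Hypothetical initialisation: n_filters + 2 points evenly spaced on the
    # mel scale between f_min and Nyquist; the inner points are the centres.
    m_lo, m_hi = hz_to_mel(f_min), hz_to_mel(sample_rate / 2.0)
    points_hz = [mel_to_hz(m_lo + (m_hi - m_lo) * i / (n_filters + 1))
                 for i in range(n_filters + 2)]
    centers = points_hz[1:-1]
    # Width (std. dev. in Hz) = half the distance between the two neighbours
    widths = [(points_hz[i + 2] - points_hz[i]) / 2.0 for i in range(n_filters)]
    return centers, widths

def gaussian_filter_bank(centers, widths, n_fft, sample_rate):
    # Weight matrix of shape (n_filters, n_fft // 2 + 1); row i holds
    # exp(-0.5 * ((f - centers[i]) / widths[i]) ** 2) at each FFT bin frequency f.
    bin_freqs = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    return [[math.exp(-0.5 * ((f - c) / w) ** 2) for f in bin_freqs]
            for c, w in zip(centers, widths)]

def log_filter_bank_energies(power_spectrum, weights, eps=1e-10):
    # Feature i = log(sum_k W[i][k] * P[k]); eps guards against log(0)
    return [math.log(sum(w * p for w, p in zip(row, power_spectrum)) + eps)
            for row in weights]

# Example: 40 filters for 16 kHz audio and a 512-point FFT
sr, n_fft = 16000, 512
centers, widths = init_gaussian_params(n_filters=40, sample_rate=sr)
weights = gaussian_filter_bank(centers, widths, n_fft, sr)
flat_spectrum = [1.0] * (n_fft // 2 + 1)  # placeholder power spectrum
features = log_filter_bank_energies(flat_spectrum, weights)
print(len(weights), len(weights[0]), len(features))  # 40 257 40
```

In a deep learning framework such as PyTorch, the two parameter lists would typically become `torch.nn.Parameter` tensors and the weight matrix would be rebuilt from them on every forward pass. Gradients from the CNN or FCNN loss would then flow into the filter centres and widths, which is what lets the filter bank adapt to the data.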
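Because the class sizes are unequal, the summary also reports a weighted average F1-score. A minimal pure-Python sketch of that metric follows: per-class F1 is computed one-vs-rest and averaged with weights equal to each class's share of the true labels. This is intended to match the common "weighted" averaging (for example scikit-learn's `average='weighted'`); the function name and toy labels are illustrative only.

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    # Support = number of true samples per class; each class is weighted by support / total
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, n_cls in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / n_cls  # n_cls = tp + fn
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += (n_cls / total) * f1
    return score

# Imbalanced toy example: class 0 has 4 samples, class 1 has 2.
# Class 0: F1 = 0.75, class 1: F1 = 0.5, so weighted F1 = (4/6)(0.75) + (2/6)(0.5)
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
print(round(weighted_f1(y_true, y_pred), 3))  # 0.667
```

An unweighted (macro) mean of the same per-class scores would be 0.625; weighting by support keeps the larger classes from being under-represented in the summary figure.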
5. Expected impact: This paper proposes learnable parametric filter banks. The experiments are conducted on two datasets of approximately 9.5 and 8 hours, containing spoken-digit recordings and news speech from ten countries. The accuracy and weighted average F1-score results show that parametric filter banks, constructed through the learning ability of neural networks, adapt better to different types of data than the manually designed original filter banks. They also improve adaptability and generalization, which helps reduce the risk of overfitting.
6. Future plans: Because ECG characteristics vary between individuals, doctors may misdiagnose patients during hospital examinations, which can be very dangerous. In machine learning, ECG signals have been studied extensively, covering 1. ECG databases, 2. preprocessing, 3. deep learning methodology, 4. evaluation paradigms, and 5. performance metrics. My future work will therefore focus mainly on the classification and recognition of ECG signals. If you are interested, take a look at the survey paper!
(Survey: Deep Learning-Based ECG Arrhythmia Classification: A Systematic Review, 2023)