[B! audio] yu4u's bookmarks

GitHub - asteroid-team/torch-audiomentations: Fast audio data augmentation in PyTorch. Inspired by audiomentations. Useful for deep learning.

yu4u 2022/05/10

audio

GitHub - fkubota/spectrogram-tree: A handy tool for quickly checking the spectrograms of the audio files in a directory

yu4u 2021/04/02

audio

Is there a difference in execution speed between torchaudio and torchlibrosa? - 備忘録 (Memo)

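Introduction: PyTorch has a library called torchaudio that is convenient for processing audio data (pytorch.org). There is also a package called librosa that is convenient for processing audio data (librosa.org). On top of that, there is torchlibrosa, a package that replaces the matrix computations inside librosa with PyTorch (github.com). This raises a question: "So which one should I actually use? (Which one is fastest?)" Curious about this, I wrote a simple script to compare execution times when running on the CPU and on the GPU. Environment: OS, hardware, driver: Ubuntu 18.04, GeForce RTX 2070, CUDA 11.2, CUDA Driver 460.32.03. Software: Python 3.6.9, libros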
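The excerpt does not include the author's benchmarking script. As a minimal sketch of a fair timing harness (not the author's code), the following assumes two interchangeable implementations of the same computation; the pure-Python DFT functions are hypothetical stand-ins for, say, a torchaudio and a torchlibrosa spectrogram call. Warm-up calls are discarded and the median is reported. For GPU timing, the timed function would also have to call `torch.cuda.synchronize()` before returning, because CUDA kernels are launched asynchronously; the harness itself does not do this.

```python
import cmath
import math
import statistics
import time
from typing import Any, Callable, List, Sequence


def benchmark(fn: Callable[..., Any], *args: Any, warmup: int = 3, repeat: int = 10) -> float:
    """Median wall-clock seconds of fn(*args) over `repeat` timed calls.

    The first `warmup` calls are discarded so one-off costs (caching,
    lazy initialisation, CUDA context creation) do not skew the result.
    This harness does NOT synchronise a GPU: when timing CUDA code, fn itself
    must call torch.cuda.synchronize() before returning.
    """
    for _ in range(warmup):
        fn(*args)
    times: List[float] = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn(*args)
        times.append(time.perf_counter() - start)
    return statistics.median(times)


def dft_power_naive(frame: Sequence[float]) -> List[float]:
    """Hypothetical stand-in #1: power spectrum, recomputing every twiddle factor."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))) ** 2
        for k in range(n // 2 + 1)
    ]


def dft_power_table(frame: Sequence[float]) -> List[float]:
    """Hypothetical stand-in #2: same result, twiddle factors precomputed once."""
    n = len(frame)
    twiddle = [cmath.exp(-2j * math.pi * m / n) for m in range(n)]
    return [
        abs(sum(x * twiddle[(k * t) % n] for t, x in enumerate(frame))) ** 2
        for k in range(n // 2 + 1)
    ]


if __name__ == "__main__":
    fs, f0 = 4000, 1000.0
    frame = [math.sin(2 * math.pi * f0 * t / fs) for t in range(256)]
    for name, fn in (("naive", dft_power_naive), ("table", dft_power_table)):
        print(f"{name}: {benchmark(fn, frame, repeat=5) * 1e3:.2f} ms")
```

Checking that both implementations return the same values before timing them guards against comparing two functions that do different work.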

yu4u 2021/03/26

audio

Bird competition post-mortem slides

This is the 6th-place solution.

yu4u 2021/03/18

audio
kaggle

GitHub - koukyo1994/kaggle-rfcx: A repository for Rainforest Connection Species Audio Detection Challenge

yu4u 2021/03/04

audio

GitHub - zcaceres/spec_augment: 🔦 A Pytorch implementation of GoogleBrain's SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
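SpecAugment augments the spectrogram directly rather than the waveform. A minimal sketch of its frequency-masking and time-masking steps, assuming a spectrogram stored as a list of frequency rows and zero as the fill value (appropriate for a mean-normalized log-mel input). Time warping and the paper's cap on time-mask width relative to utterance length are omitted, the default widths are illustrative only, and this is not the linked repository's code.

```python
import random
from typing import List, Optional

# Hypothetical layout: spec[f][t] = log-mel energy of channel f at frame t.
Spectrogram = List[List[float]]


def freq_mask(spec: Spectrogram, max_width: int, rng: random.Random) -> Spectrogram:
    """Zero f consecutive frequency channels, f drawn uniformly from [0, max_width]."""
    n_freq = len(spec)
    f = rng.randint(0, min(max_width, n_freq))
    f0 = rng.randint(0, n_freq - f)  # start chosen so the band fits
    out = [row[:] for row in spec]  # copy: leave the input untouched
    for ch in range(f0, f0 + f):
        out[ch] = [0.0] * len(out[ch])
    return out


def time_mask(spec: Spectrogram, max_width: int, rng: random.Random) -> Spectrogram:
    """Zero t consecutive frames, t drawn uniformly from [0, max_width]."""
    n_time = len(spec[0])
    t = rng.randint(0, min(max_width, n_time))
    t0 = rng.randint(0, n_time - t)
    out = [row[:] for row in spec]
    for row in out:
        for i in range(t0, t0 + t):
            row[i] = 0.0
    return out


def spec_augment(spec: Spectrogram, max_f: int = 27, max_t: int = 100,
                 n_freq_masks: int = 1, n_time_masks: int = 1,
                 seed: Optional[int] = None) -> Spectrogram:
    """Apply several frequency masks, then several time masks (no time warping)."""
    rng = random.Random(seed)
    for _ in range(n_freq_masks):
        spec = freq_mask(spec, max_f, rng)
    for _ in range(n_time_masks):
        spec = time_mask(spec, max_t, rng)
    return spec
```

Because masking is applied on the fly to each training example, every epoch sees differently masked copies of the same utterance.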

yu4u 2020/07/28

audio

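Media generation with deep generative models

2. Self-introduction: Hirokazu Kameoka (亀岡弘和). Career: 2007 Completed the doctoral program, Department of Information Physics and Computing, Graduate School of Information Science and Technology, The University of Tokyo; 2007 Joined Nippon Telegraph and Telephone Corporation (NTT), assigned to NTT Communication Science Laboratories; 2011 Visiting Associate Professor, Department of Information Physics and Computing, Graduate School of Information Science and Technology, The University of Tokyo; 2015 Distinguished Researcher, NTT Communication Science Laboratories; 2016 Visiting Associate Professor, National Institute of Informatics; 2019 Visiting Associate Professor, Graduate School of Systems and Information Engineering, University of Tsukuba. Specialties: signal processing and machine learning for acoustic signals such as speech and music; computational auditory scene analysis, sound source separation, speech synthesis and conversion, etc. 3. Purpose and goals of this lecture: Get in touch with deep learning (AI) research; experience first-hand how interesting and impressive deep learning (AI) research is; in particular, cover deep generative models, a field that has developed remarkably in recent years. 温故知新 (learning the new by revisiting the old): Deep generative mod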

yu4u 2020/02/10

gan
audio

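Anomalous sound detection for acoustic signals: techniques and applications

Slides from Tutorial AT-2, "Theory and Applications of Anomaly Detection and Unsupervised Learning," at the 2019 IEICE Society Conference.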

yu4u 2019/09/20

https://www.robots.ox.ac.uk/~vgg/data/voxceleb/competition.html

yu4u 2019/05/22

VoxCeleb

VoxCeleb is an audio-visual dataset consisting of short clips of human speech, extracted from interview videos uploaded to YouTube. 7,000+ speakers: VoxCeleb contains speech from speakers spanning a wide range of different ethnicities, accents, professions and ages. [Figure: Utterance Lengths] 1 million+ utterances: All speaking face-tracks are captured "in the wild", with background chatter, laughter, overl

yu4u 2019/05/22

audio
speech

[Python] Constant-Q transform (log-frequency spectrogram) - 音楽プログラミングの超入門（仮） (Super Introduction to Music Programming, tentative)

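Related article: Fast Constant-Q transform. [Python] Fast Constant-Q transform (with FFT) - Super Introduction to Music Programming (tentative). Introduction: log-frequency spectrograms. In "Short-time Fourier transform (STFT) and its inverse in Python - Super Introduction to Music Programming (tentative)", we covered the short-time Fourier transform, which converts an acoustic signal into a spectrogram showing how its frequency spectrum changes over time. To briefly review the algorithm: the spectrogram was obtained by cutting out a fixed-width segment of the acoustic signal and Fourier-transforming it, repeating this while shifting the segment a little at a time. Let us think a little more about what it means to cut out a fixed-width segment of the signal and Fourier-transform it. Consider a cut-out signal like the following. The window width is 256 samples (fs = 4 [kHz]), and here the signal is 1000 [Hz], 400 [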
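The excerpt motivates the constant-Q transform by the fixed window length of the STFT. A minimal sketch of a naive, direct constant-Q transform of a single frame, not the article's implementation: the Hamming window, the function names, and the parameter names are assumptions. Bin k is centered on f_k = f_min · 2^(k/b), every bin shares the quality factor Q = 1/(2^(1/b) − 1), and so the window length N_k = Q · fs / f_k grows as the frequency falls.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def cqt_kernel_lengths(fs: float, f_min: float, n_bins: int,
                       bins_per_octave: int = 12) -> Tuple[float, List[float], List[int]]:
    """Return (Q, centre frequencies f_k, window lengths N_k) for each bin."""
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    freqs = [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]
    lengths = [int(round(q * fs / f)) for f in freqs]
    return q, freqs, lengths


def cqt_frame(x: Sequence[float], fs: float, f_min: float, n_bins: int,
              bins_per_octave: int = 12) -> List[float]:
    """Magnitudes of a naive constant-Q transform of the frame starting at x[0].

    Each bin correlates its own N_k-sample Hamming-windowed span with a complex
    exponential completing Q cycles, then normalises by N_k. Long windows at
    low f_k give fine frequency resolution; short ones at high f_k give fine
    time resolution.
    """
    q, _, lengths = cqt_kernel_lengths(fs, f_min, n_bins, bins_per_octave)
    mags: List[float] = []
    for k, n_k in enumerate(lengths):
        if n_k > len(x):
            raise ValueError(f"bin {k} needs {n_k} samples, frame has {len(x)}")
        acc = 0j
        for n in range(n_k):
            w = 0.54 - 0.46 * math.cos(2 * math.pi * n / (n_k - 1))  # Hamming window
            acc += w * x[n] * cmath.exp(-2j * math.pi * q * n / n_k)
        mags.append(abs(acc) / n_k)
    return mags


if __name__ == "__main__":
    fs = 4000  # sampling rate used in the excerpt
    x = [math.sin(2 * math.pi * 220.0 * n / fs) for n in range(1000)]
    for k, m in enumerate(cqt_frame(x, fs, f_min=110.0, n_bins=24)):
        print(k, round(m, 3))
```

With fs = 4 kHz and f_min = 110 Hz, the lowest bin already needs about 612 samples, well beyond the excerpt's 256-sample STFT window; that longer window is what buys the finer low-frequency resolution. The direct loop costs O(Σ N_k) per frame, which is why FFT-based variants such as the one in the related article exist.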

yu4u 2018/12/26

I just happened to find a wonderful article. I see: this fixes the way low frequencies get smeared when using a fixed-length mel or log-scale FFT.
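The comment's point can be checked with a little arithmetic. A minimal sketch, assuming equal-tempered semitones and the excerpt's STFT setup (a 256-sample window at fs = 4 kHz): an STFT bin is always fs / N wide, while the gap between adjacent semitones is proportional to frequency, so at low pitches several semitones fall into one bin. In this simple picture, a mel or log-frequency mapping computed from that same FFT cannot separate frequencies the FFT has already merged into one bin. The function names are hypothetical.

```python
def stft_bin_width(fs: float, n_fft: int) -> float:
    """Spacing between STFT bins in Hz; the same at every frequency."""
    return fs / n_fft


def semitone_gap(f: float) -> float:
    """Hz between pitch f and the equal-tempered semitone just above it."""
    return f * (2.0 ** (1.0 / 12.0) - 1.0)


def semitones_per_bin(f: float, fs: float, n_fft: int) -> float:
    """How many semitones one STFT bin spans in the neighbourhood of f."""
    return stft_bin_width(fs, n_fft) / semitone_gap(f)


if __name__ == "__main__":
    for pitch in (55.0, 220.0, 1760.0):
        ratio = semitones_per_bin(pitch, 4000, 256)
        print(f"{pitch:7.1f} Hz: {ratio:.2f} semitones per bin")
```

With these numbers a bin is 15.625 Hz wide: about 4.8 semitones around 55 Hz, but only about 0.15 of a semitone around 1760 Hz. Geometrically spaced constant-Q bins keep this ratio the same at every pitch.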

Robust Audio Adversarial Example for a Physical Attack

This document summarizes and cites research on adversarial examples against speech recognition systems. It discusses papers that generated audio adversarial examples for targeted attacks on speech-to-text models, characterized temporal dependencies in audio adversarial examples, and developed approaches for creating targeted audio adversarial examples against black-box speech recognition systems.

yu4u 2018/11/29

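Statistical speech synthesis based on minimizing distances between distributions or moments

1. 06/12/2018 ©Shinnosuke Takamichi, The University of Tokyo. Statistical speech synthesis based on minimizing distances between distributions or moments. Shinnosuke Takamichi (@forthshinji), Assistant Professor, The University of Tokyo. Invited talk at the STAIR Lab AI Seminar (2018/10/12). 2. Self-introduction. Career: completed the doctoral program at NAIST (2016); Assistant Professor, The University of Tokyo (concurrently Project Assistant Professor in the DMM Lab collaborative course at the same university). Research topics: speech synthesis, voice conversion; speech signal processing; anti-spoofing; deep learning; augmented spee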

yu4u 2018/10/17

voice
audio

Mthesis_takamichi

yu4u 2017/10/25

audio
slide

GitHub - tyiannak/pyAudioAnalysis: Python Audio Analysis Library: Feature Extraction, Classification, Segmentation and Applications

yu4u 2017/05/19

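Notes, before I forget, on End-to-End papers and slides in speech/music generation and acoustic processing - 備忘録 (Memo)

End-to-End papers in this field seem to have suddenly increased lately, so I am noting them for myself before I forget, including papers that look interesting. Honestly, I have had more than enough, but there is no going against the tide. Most of them are on arXiv, so their reliability is not guaranteed; treat them as reference only. I will add a one-line comment when I feel like it. * I deliberately left out speech recognition. Paper: Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders URL https://arxiv.org/abs/1704.01279 Blog & Demo: NSynth: Neural Audio Synthesis, Google Brain and DeepMind's work. Tacotron: A Fully End-to-End Text-To-Spe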

yu4u 2017/04/07

AudioSet

A sound vocabulary and dataset. AudioSet consists of an expanding ontology of 632 audio event classes and a collection of 2,084,320 human-labeled 10-second sound clips drawn from YouTube videos. The ontology is specified as a hierarchical graph of event categories, covering a wide range of human and animal sounds, musical instruments and genres, and common everyday environmental sounds. By releasing
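Because the ontology is a hierarchical graph, a common task is collecting every class below a given category. A minimal sketch, assuming the ontology is distributed as a JSON list of entries carrying `id`, `name` and `child_ids` fields (an assumption to check against the actual release). Since the source calls it a graph rather than a tree, the traversal tracks visited ids so a class reachable through two parents is counted once. The toy ids and names are hypothetical.

```python
import json
from typing import Dict, List, Set


def load_ontology(json_text: str) -> Dict[str, dict]:
    """Index entries by id. Assumes a JSON list of objects with at least
    'id', 'name' and 'child_ids' keys (check against the real file)."""
    return {entry["id"]: entry for entry in json.loads(json_text)}


def descendants(ontology: Dict[str, dict], root_id: str) -> Set[str]:
    """Ids of every class reachable below root_id (iterative depth-first search).

    A class reachable through more than one parent is visited only once.
    """
    seen: Set[str] = set()
    stack: List[str] = list(ontology[root_id].get("child_ids", []))
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(ontology[node].get("child_ids", []))
    return seen


# Hypothetical toy ontology: "leaf" sits under two parents, "x" and "y".
TOY = json.dumps([
    {"id": "root", "name": "Root", "child_ids": ["x", "y"]},
    {"id": "x", "name": "X", "child_ids": ["leaf"]},
    {"id": "y", "name": "Y", "child_ids": ["leaf"]},
    {"id": "leaf", "name": "Leaf", "child_ids": []},
])

if __name__ == "__main__":
    onto = load_ontology(TOY)
    print(sorted(onto[i]["name"] for i in descendants(onto, "root")))
```

The same set can then be used to map fine-grained clip labels onto a coarser parent category.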

yu4u 2017/03/09

GitHub - librosa/librosa: Python library for audio and music analysis

yu4u 2017/02/13

iPhone App Development and Studying Abroad : [iPhone development][iPad development] Websites and materials that may be useful when using Core Audio

December 20, 2010 15:20, Category: iPhone development, iPad development. [iPhone development][iPad development] Websites and materials that may be useful when using Core Audio. This is just a list of links, given below. - My Codex Leicester http://nagano.monalisa-au.org/?page_id=808 - Hands-on! iPhone App Development: How to Make a Musical-Instrument App (1) - The iPhone Audio Framework http://journal.mycom.co.jp/column/iphone/010/index.html - Objective-Audio http://objective-audio.jp/cat54/core-audio-iphone/ - A tasty pixel http://atastypixel.com/blo