[B! audio] [Page 2] imyutaro's bookmarks

GitHub - archinetai/audio-ai-timeline: A timeline of the latest AI models for audio generation, starting in 2023!

imyutaro 2023/02/08

audio
music

Link

GitHub - bmcfee/pyrubberband: python wrapper for rubberband

imyutaro 2023/02/07

audio

Link

GitHub - facebookresearch/demucs: Code for the paper Hybrid Spectrogram and Waveform Source Separation

Important: As I am no longer working at Meta, this repository is not maintained anymore. I've created a fork at github.com/adefossez/demucs. Note that this project is not actively maintained anymore and only important bug fixes will be processed on the new repo. Please do not open issues for feature requests or if Demucs doesn't work perfectly for your use case :) This is the 4th release of Demucs

imyutaro 2023/02/07

Link

GitHub - facebookresearch/encodec: State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.

imyutaro 2023/01/28

Link

GitHub - acids-ircam/RAVE: Official implementation of the RAVE model: a Realtime Audio Variational autoEncoder

imyutaro 2023/01/28

Link

Home

imyutaro 2023/01/18

audio

Link

GitHub - nttcslab/eval-audio-repr: EVAR ~ Evaluation package for Audio Representations

imyutaro 2023/01/13

audio

Link

https://twitter.com/gyakuse/status/1611364936193830917?s=12&t=E_qldvNkhHqHOa3Fz3_1iA

imyutaro 2023/01/07

speech
audio

Link

GitHub - microsoft/unilm: Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

imyutaro 2023/01/06

audio
speech

Link

GitHub - b04901014/FG-transformer-TTS: Official implementation for the paper Fine-grained style control in transformer-based text-to-speech synthesis.

imyutaro 2023/01/06

audio
speech

Link

Introducing Whisper

Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web. We show that the use of such a large and diverse dataset leads to improved robustness to accents, background noise and technical language. Moreover, it enables transcription in multiple languages, as well as translation from those languages into Eng

imyutaro 2022/12/31

audio
dl

Link

GitHub - nttcslab/msm-mae: Masked Spectrogram Modeling using Masked Autoencoders for Learning General-purpose Audio Representations

imyutaro 2022/12/31

audio

Link

msm-mae/misc/Note_viz_and_play_reconstruction.ipynb at main · nttcslab/msm-mae

imyutaro 2022/12/31

audio

Link

Summary of Available Audio Models | Beluga

I had occasion to use audio models, so I compiled a list of the Audio Models available in PyTorch and Hugging Face, along with the papers that serve as their references. Hugging Face is a library for using Transformer-based machine learning models, provided by the US company Hugging Face, Inc. As of December 5, 2022, the following five Audio Models were available in PyTorch. ConvTasNet: Conv-TasNet: Surpassing Ideal Time–Frequency Magnitude Masking for Speech Separation DeepSpeech: Deep Speech: Scaling up end-to-end speech recognition

imyutaro 2022/12/26

Link

A Complete Guide to Audio Datasets

imyutaro 2022/12/16

Link

GitHub - fkubota/spectrogram-tree: A handy tool for quickly checking the spectrograms of audio files in a directory

imyutaro 2021/07/17

audio
tool

Link

We Won First Place Worldwide in DCASE2020, an Environmental Sound Recognition Competition

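LINE Corporation became LY Corporation (LINEヤフー株式会社) on October 1, 2023. The company's new blog is here: LINEヤフー Tech Blog. I'm Komatsu from the Speech team at DataLabs, where I conduct basic research on environmental sound recognition. Environmental sound recognition is a technology that lets machines automatically detect and recognize the wide variety of sounds that occur around us, such as coughs, voices, and other noises. It is one of the hottest and fastest-growing topics in the audio field, and DCASE, an international competition and workshop dedicated to environmental sound, is held every year. LINE took part in Task 4 of its competition track, the DCASE2020 Challenge, as a joint team with Nagoya University and Johns Hopkins University, building mainly on the results of last year's internship [1], and won first place worldwide. This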