[Page 3] Popular articles for "dataset" (139 items) - Hatena Bookmark

Results 81 - 120 of 139

LAION
- 7 users
- laion.ai
- Technology
- 2021/09/13
Large-scale Artificial Intelligence Open Network TRULY OPEN AI. 100% NON-PROFIT. 100% FREE. LAION, as a non-profit organization, provides datasets, tools and models to liberate machine learning research. By doing so, we encourage open public education and a more environment-friendly use of resources by reusing existing datasets and models. IMPORTANT NOTICE Current LAION-5B Safety Review
- dataset
GitHub - activeloopai/deeplake: Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai
- 7 users
- github.com/activeloopai
- Technology
- 2021/02/23
Deep Lake is a Database for AI powered by a storage format optimized for deep-learning applications. Deep Lake can be used for: Storing data and vectors while building LLM applications Managing datasets while training deep learning models Deep Lake simplifies the deployment of enterprise-grade LLM-based products by offering storage for all data types (embeddings, audio, text, videos, images, pdfs,
- dataset
- AI
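List of datasets for machine learning (free resources)
- 6 users
- phy-lum.com
- Technology
- 2020/02/06
Attribution: the copyright holder must be credited. Commercial use (non-commercial): use is limited to non-commercial purposes. Modification (no derivatives): no editing of any kind is permitted. Share-alike: when redistributing, the original license must be carried over. Human action dataset, Google DeepMind: a dataset of human actions collected from YouTube. https://deepmind.com Terms of use: attribution. Human action dataset, University of Central Florida: a dataset for recognizing actions such as surfing, applying makeup, and shaving. http://crcv.ucf.edu/ Terms of use: none specified; contact details are given in the middle of the page. Motion dataset, MIT-IBM Watson AI Lab: a dataset of motions; besides humans, it also includes dogs, pandas, flowing water, and animation. http://moments.csail.mit.ed
- machine learning
- data
- learning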
- dataset
- python
GitHub - tsuruoka-lab/BSD: The Business Scene Dialogue corpus
- 6 users
- github.com/tsuruoka-lab
- Technology
- 2020/08/05
- dataset
Releasing PAWS and PAWS-X: Two New Datasets to Improve Natural Language Understa
- 6 users
- ai.googleblog.com
- Technology
- 2019/10/03
- deep learning
- nlp
- dataset
- google
- language
- ai
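List of top researchers – Shimodaira Lab
- 6 users
- stat.sys.i.kyoto-u.ac.jp
- Learning
- 2022/05/14
Data on researchers worldwide, based on citation counts and similar metrics, has been published. I used that data to compile lists of top researchers. For each research field, there is a list (PDF file) of the top 20 worldwide and the top 20 in Japan. Here is what it looks like. Each researcher's field is registered as both a main field (field 1) and a sub field (field 2), so the same person can appear in the lists twice. In the researcher data, a component value indicating the fields in which each researcher publishes is estimated automatically, and the top two fields are used. Even a researcher with a low component value in a field may rank highly if they are heavily cited in other fields. I therefore also made lists restricted to cases where the component value is at least 15%. These are lists (Excel files) of the top 300 worldwide and in Japan. PDF file download (top 20, component value not considered)
- paper
- world
- ranking
- dataset
- research
- Japan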
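Can you tell a CNN's accuracy without training? Introducing CSG, a new metric for measuring dataset complexity!
- 6 users
- ai-scholar.tech
- Technology
- 2019/10/30
Three key points ✔️ Proposes Cumulative Spectral Gradient (CSG), a measure of dataset complexity that correlates with CNN accuracy ✔️ Using CSG makes it possible to greatly reduce the training data ✔️ Compared with prior methods, CSG correlates strongly with CNN accuracy. Introduction: In recent years, a variety of large-scale datasets have been created to evaluate machine learning methods. One of the best known is probably ImageNet. This dataset contains 1,000 classes and 10 million images. It was created to evaluate methods for image classification. Many other diverse datasets have also been created and used for evaluation. As noted above, a variety of datasets have been created, but by what criteria are they built? One such criterion is "complexity". In other words, current machine learning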
Kickstarting AI for Code: Introducing IBM’s Project CodeNet | IBM Research Blog
- 6 users
- research.ibm.com
- Technology
- 2021/05/12
Project CodeNet is a large dataset aimed at teaching AI to code that consists of some 14M code samples, about 500M lines of code, in 55+ different programming languages. "Software is eating the world," US entrepreneur Marc Andreessen famously wrote in 2011. Fast-forward to today – software is in financial services and healthcare, sma
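What you can do for free and for a fee with the "Registration Information Service" (登記情報提供サービス) and the "Registry Library" (登記簿図書館) - 刑裁サイ太のゴ３ネタブログ
- 6 users
- keisaisaita.hatenablog.jp
- Learning
- 2021/08/06
Introduction: A lot has happened, but I decided to go back to basics and explain the work of a lawyer, so I picked this topic. I will talk about the "Registration Information Service" and the "Registry Library" named in the title. I have been evangelizing them all over the place, but they do not seem to be catching on. So I took up my pen to introduce them to you. Note that I do not earn a single yen from writing this article, so rest assured. A-and it's not like I'm crying, so rest assured. Overview of the Registration Information Service: The body designated (as the designated corporation) to carry out the business under Article 4, Paragraph 1 of the Act on Provision of Registration Information via Telecommunication Lines (Act No. 226 of 1999) is the 一般財団法人民事法務協会, and the service that this body operates is the "Registration Information Service". www1.touki.or.jp Roughly speaking, real estate registrations, commercial registrations, and movables and claims assignment registrations, over the net
- dataset
- data
- service
- read later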
Software/Data - Yahoo! JAPAN Research and Development - Yahoo Japan Corporation
- 6 users
- randd.yahoo.co.jp
- Technology
- 2022/06/02
yskip: Incremental Skip-gram Model with Negative Sampling. Overview: a C++ implementation of an incremental learning algorithm for the skip-gram model with negative sampling. Technical explanation (Yahoo! JAPAN Tech Blog): https://techblog.yahoo.co.jp/oss/yskip/ Paper: Incremental Skip-gram Model with Negative Sampling (external site). Availability. LSTM-VAE for text modeling: for details, see "Better Exploiting Latent Variables in Text Modeling". Data: VFD Dataset (Japanese). Overview: at EMNLP, a top conference in language processing,
- yahoo
- dataset
- japan
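Exploring the AWS COVID-19 public data lake | Amazon Web Services
- 6 users
- aws.amazon.com
- Technology
- 2020/05/07
Amazon Web Services Blog. The AWS COVID-19 data lake, a centralized repository of up-to-date datasets on or related to the spread and characteristics of the novel coronavirus (SARS-CoV-2) and its associated illness, COVID-19, is now available. For details, see "A public data lake for analysis of COVID-19 data". Globally, several efforts are under way to collect this data, and AWS is working with partners to make this important data freely available and keep it up to date. The data is readily available for querying, for mixing with your own datasets, and for bringing new insights into your own data lake. AWS, pandemic monitoring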
- covid19
- aws
- dataset
Cost-efficient and scalable ML-experiments in AWS with spot-instances, Kubernetes and Horovod
- 5 users
- blog.rosebud.ai
- Technology
- 2020/03/25
UPDATE (February 27, 2020): I thank everyone for the interest, questions and suggestions during ScaledML 2020 poster session. The poster PDF is available for download here. In the coming days I will be updating this blog post with the most recent version of the k8s manifests we use for training. At Rosebud AI we invent new tools for authoring and editing visual content. We combine established comp
- MLOps
- dataset
- machinelearning
- storage
- cloud
- aws
- read later
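Weed Density and Distribution Estimation for Precision Agriculture using Semi-Supervised Learning
- 5 users
- arxiv-check-250201.firebaseapp.com
- Technology
- 2020/11/05
Uncontrolled weed growth can seriously affect crop yield and quality. Unrestricted use of herbicides alters biodiversity and causes environmental pollution. Instead, identifying weed-infested regions can support selective chemical treatment of those regions. Advances in the analysis of farm images have produced solutions for identifying weeds. However, most of these approaches are based on supervised learning methods, which require large numbers of manually annotated images. As a result, these supervised approaches are economically infeasible for the individual farmer because of the wide variety of plant species being cultivated. In this paper, ... acquired from an autonomous robot
- weeds
- statistics
- AI
- development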
- news
level-5.global
- 5 users
- level-5.global
- Technology
- 2019/07/24
- dataset
- autonomous driving
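Twitter Japanese Reputation Analysis Dataset
- 5 users
- www.db.info.gifu-u.ac.jp
- Technology
- 2019/11/27
We analyzed reputation information in tweets through crowdsourcing and are publishing the results. Data download: the data is available here. The data is compressed with bz2. The text of the tweets is not included. The tweets date from around 2015 to 2016. Data contents: 534,962 tweets, mainly about mobile phones and similar products, have been analyzed. This is a large number of tweets compared with other datasets. To the author's knowledge, it is the largest dataset and the one with the greatest variety. Each result is a majority vote over ratings by at least 4 workers. In most cases, 5 or more workers gave ratings. Data structure: the data is a CSV file. The column numbers correspond to the following. The tweet ID, a number starting from 10000. The genre ID. The genres are as follows. 10000: Xperia, Xperi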
- NLP
- dataset
- Twitter
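The majority-vote step mentioned in the entry above can be illustrated with a minimal sketch. The label strings and the tie rule (returning `None`) are assumptions made for illustration; the dataset page does not say how ties are handled, and this is not the authors' code.

```python
from collections import Counter


def majority_label(votes):
    """Return the label chosen by most workers, or None on a tie or empty input.

    `votes` is a list of labels, one per crowd worker (hypothetical format).
    """
    if not votes:
        return None
    ranked = Counter(votes).most_common()
    top_label, top_count = ranked[0]
    # Assumption: a tie for first place yields no decision.
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None
    return top_label


if __name__ == "__main__":
    # Four workers rate one tweet (label names are placeholders).
    print(majority_label(["positive", "positive", "negative", "neutral"]))
```

Using an odd number of workers, as the page says happens in most cases, makes two-way ties less likely.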
unarXive 2022: All arXiv Publications Pre-Processed for NLP, Including Structured Full-Text and Citation Network
- 5 users
- arxiv.org
- Technology
- 2023/03/29
Large-scale data sets on scholarly publications are the basis for a variety of bibliometric analyses and natural language processing (NLP) applications. Especially data sets derived from publication's full-text have recently gained attention. While several such data sets already exist, we see key shortcomings in terms of their domain and time coverage, citation network completeness, and representa
- dataset
- read later
Kaggle dataset roundup - Qiita
- 5 users
- qiita.com/hiro6000
- Technology
- 2019/10/19
Fintech Santander Customer Transaction Prediction https://www.kaggle.com/c/santander-customer-transaction-prediction/data Kaggle datasets in finance category (list of finance-related Kaggle data) https://www.kaggle.com/tags/finance Bitcoin Price Prediction (LightWeight CSV) https://www.kaggle.com/team-ai/bitcoin-price-prediction Uniqlo (FastRetailing) Stock Price Prediction https://www.kaggle.com/daiearth22/uniqlo
- kaggle
- qiita
- dataset
- Python
- programming
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
- 5 users
- arxiv.org
- Technology
- 2021/01/14
Recent work has demonstrated that increased training dataset diversity improves general cross-domain knowledge and downstream generalization capability for large-scale language models. With this in mind, we present the Pile: an 825 GiB English text corpus targeted at training large-scale language models. The Pile is constructed from 22 diverse high-quality subsets -- both existing and new
- dataset
palmerpenguins R data package
- 5 users
- allisonhorst.github.io
- Society
- 2021/01/27
The goal of palmerpenguins is to provide a great dataset for data exploration & visualization, as an alternative to iris.
- dataset
Acquiring useful data from the web from an active learning perspective
- 5 users
- speakerdeck.com/joisino
- Technology
- 2023/06/09
Introductory slides for Active Learning from the Web (WWW 2023): https://arxiv.org/abs/2210.08205 GitHub: https://github.com/joisino/seafaring These slides were used for a presentation at the Annual Conference of the Japanese Society for Artificial Intelligence (JSAI 2023): https://confit.atlas.jp/guide/event/jsai2023/subject/4L3-GS-4-01/tables
- dataset
- read later
MUSDB18 | SigSep
- 5 users
- sigsep.github.io
- Technology
- 2019/12/02
musdb18 is a dataset of 150 full-length music tracks (~10h duration) of different genres along with their isolated drums, bass, vocals and others stems. musdb18 contains two folders, a folder with a training set: "train", composed of 100 songs, and a folder with a test set: "test", composed of 50 songs. Supervised approaches should be trained on the training set and tested on both s
- dataset
What is human-in-the-loop machine learning?
- 5 users
- www.telusinternational.com
- Technology
- 2019/07/11
- dataset
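Large-scale surface defect dataset arrives, promising for automated machine inspection [AI × Manufacturing] (paper) | AIDB
- 5 users
- ai-data-base.com
- Technology
- 2021/04/13
Inefficient defect detection for machine tools: Object classification problems in industry have attracted attention since the advent of deep learning. In many fields, however, datasets for applying deep learning are still lacking. One classification task in industry is surface inspection of machine tool components. Because it can prevent unexpected machine failures, there is great interest in carrying out this inspection with high accuracy and at low cost. What research is being done on the challenges of detecting defects in machine tool components? We introduce work by Tobias Schlagenhauf and colleagues at the Karlsruhe Institute of Technology in Germany. The researchers created a useful dataset for developing systems that detect surface anomalies. ▼Paper information Title: Industrial Machine Tool Component Surface Defect Dataset Authors: Tobias Sch
- dataset
- read later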
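ZOZO Research releases a large-scale dataset and implementation platform for examining changes in fashion trends as open source: aiming to promote distribution shift research using real data - News - ZOZO Technologies, Inc.
- 4 users
- press-tech.zozo.com
- Lifestyle
- 2021/09/02
ZOZO Research, the research and development organization of ZOZO Technologies, Inc. (headquarters: Chiba City, Chiba Prefecture; Representative Director and President: 久保田竜弥; Representative Director and CINO: 金山裕樹), announces that it has released as open source the large-scale dataset "Shift15M", which its researchers use in their work, together with an implementation platform. "Shift15M" is a large-scale dataset built from outfits (*2) posted to the fashion app "IQON" (*1). It contains about 2.55 million outfits posted between 2010 and 2020, the period in which the IQON service operated, along with features for the roughly 15 million (*3) items that make up those outfits, data on item categories, and related data such as the number of "likes" on outfit posts. With the implementation platform released alongside it, the trends in the outfit data that differ from year to year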
- dataset
- fashion
OSCAR
- 4 users
- oscar-project.org
- Lifestyle
- 2020/09/27
Open Source Project on Multilingual Resources for Machine Learning The OSCAR project (Open Super-large Crawled Aggregated coRpus) is an Open Source project aiming to provide web-based multilingual resources and datasets for Machine Learning (ML) and Artificial Intelligence (AI) applications. The project focuses specifically on providing large quantities of unannotated raw data that is commonly use
- dataset
GitHub - robvanvolt/DALLE-datasets: This is a summary of easily available datasets for generalized DALLE-pytorch training.
- 4 users
- github.com/robvanvolt
- Technology
- 2021/05/24
- dataset
- *read later
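COCO dataset: a large-scale image dataset of color photos for segmentation and more
- 4 users
- atmarkit.itmedia.co.jp
- Technology
- 2021/09/08
From the series "AI and Machine Learning Dataset Dictionary". An introduction to the "COCO" dataset. About 330,000 color photo images (more than 200,000 of them labeled) and their annotations (i.e., ground-truth labels) can be downloaded for free and used for object detection/segmentation, keypoint detection/pose estimation, caption generation, and more.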
- dataset
- photo
OpenAssistant/oasst1 · Datasets at Hugging Face
- 4 users
- huggingface.co
- Technology
- 2023/04/16
- dataset
MVTec Anomaly Detection Dataset: MVTec Software
- 4 users
- www.mvtec.com
- Society
- 2019/08/14
MVTec AD is a dataset for benchmarking anomaly detection methods with a focus on industrial inspection. It contains over 5000 high-resolution images divided into fifteen different object and texture categories. Each category comprises a set of defect-free training images and a test set of images with various kinds of defects as well as images without defects. Pixel-precise annotations of all anoma
- dataset
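青木俊介 on Twitter: "Analysis of human relationships using smartphones has been studied quite a lot for about 10 years, but super high-quality open data has finally arrived from Denmark! Covering 700 students: Bluetooth proximity data, call records, FB friendships, text history,… https://t.co/03w9zVXxZV"
- 4 users
- twitter.com/aoshun7
- Learning
- 2019/12/15
Analysis of human relationships using smartphones has been studied quite a lot for about 10 years, but super high-quality open data has finally arrived from Denmark! Covering 700 students: Bluetooth proximity data, call records, FB friendships, text history,… https://t.co/03w9zVXxZV
- dataset
- data
- reference material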
TOPproject
- 4 users
- tsukazaki-ai.github.io
- Lifestyle
- 2019/07/26
Hiroki Masumoto, Chief Artificial Intelligence Engineer and ophthalmologist of Tsukazaki Hospital h.masumoto@tsukazaki-eye.net Last updated: August 26, 2019 The dataset is available from the following URL: To download the dataset, you need to enter your name, email address and affiliation and check the Purpose checkbox on the following page. If you successfully submit your information, the e
- data
- medical
- healthcare
Know Your Data
- 4 users
- knowyourdata.withgoogle.com
- Technology
- 2021/05/20
Know Your Data helps researchers, engineers, product teams, and decision makers understand datasets with the goal of improving data quality, and helping mitigate fairness and bias issues.
- Google
- dataset
- tool
The C4 Multilingual Dataset · allenai/allennlp · Discussion #5265
- 3 users
- github.com/allenai
- Technology
- 2021/08/01
- dataset
bigcode/the-stack · Datasets at Hugging Face
- 3 users
- huggingface.co
- Technology
- 2022/10/29
Terms of Use for The Stack: The Stack dataset is a collection of source code in over 300 programming languages. We ask that you read and acknowledge the following points before using the dataset: The Stack is a collection of source code from repositories with various licenses. Any use of all or part of the code gathered in The Stack must abide by the terms of the original licenses, including a
- dataset
GitHub - RUCAIBox/RecSysDatasets: This is a repository of public data sources for Recommender Systems (RS).
- 3 users
- github.com/RUCAIBox
- Technology
- 2021/06/18
Amazon: Amazon Review Data includes reviews (ratings, text, helpfulness votes) and product metadata (descriptions, category information, price, brand, and image features), which includes a previous version in 2014 and an updated version in 2018. Our processed datasets are detailed here. Amazon 2014: This dataset contains product reviews and metadata from Amazon, including 24 categories and 142.8 m
- dataset
Cherry blossom phenology and temperature reconstructions at Kyoto | Ecological Meteorology Research Group (生態気象学研究グループ)
- 3 users
- atmenv.envi.osakafu-u.ac.jp
- Society
- 2021/03/31
Historical Series of Phenological data for Cherry Tree Flowering at Kyoto City (and March Mean Temperature Reconstructions) I have searched and collected the phenological data for full flowering date of cherry tree (Prunus jamasakura) from many diaries and chronicles written by Emperors, aristocrats, governors and monks
- dataset
- Japan
Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks
- 3 users
- arxiv.org
- Technology
- 2021/10/13
We identify label errors in the test sets of 10 of the most commonly-used computer vision, natural language, and audio datasets, and subsequently study the potential for these label errors to affect benchmark results. Errors in test sets are numerous and widespread: we estimate an average of at least 3.3% errors across the 10 datasets, where for example label errors comprise at least 6% of the Ima
- dataset
PandaSet Open Datasets - Scale
- 3 users
- scale.com
- Lifestyle
- 2020/05/28
- dataset
GitHub - webdataset/webdataset: A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.
- 3 users
- github.com/webdataset
- Technology
- 2022/03/19
WebDataset format files are tar files, with two conventions: (1) within each tar file, files that belong together and make up a training sample share the same basename when stripped of all filename extensions; (2) the shards of a tar file are numbered like something-000000.tar to something-012345.tar, usually specified using brace notation something-{000000..012345}.tar. WebDataset can read files from local
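The two naming conventions described above can be sketched with the Python standard library alone. This is an illustrative sketch, not the WebDataset library's own code: the helper names `expand_shards` and `group_samples` are hypothetical, and the expansion handles only a single zero-padded numeric `{start..end}` range.

```python
import re
from collections import defaultdict


def expand_shards(pattern):
    """Expand one numeric {start..end} brace range, preserving zero padding."""
    match = re.search(r"\{(\d+)\.\.(\d+)\}", pattern)
    if match is None:
        return [pattern]
    start, end = match.group(1), match.group(2)
    width = len(start)
    return [
        pattern[: match.start()] + str(i).zfill(width) + pattern[match.end():]
        for i in range(int(start), int(end) + 1)
    ]


def group_samples(member_names):
    """Group tar member names by basename with all extensions stripped."""
    samples = defaultdict(list)
    for name in member_names:
        directory, _, filename = name.rpartition("/")
        # Everything before the first dot of the file name is the sample key.
        key = filename.split(".", 1)[0]
        if directory:
            key = directory + "/" + key
        samples[key].append(name)
    return dict(samples)


if __name__ == "__main__":
    print(expand_shards("something-{000000..000002}.tar"))
    print(group_samples(["a/0001.jpg", "a/0001.cls", "a/0002.seg.png"]))
```

Grouping on the text before the first dot is what lets a sample carry several files with different extensions, such as an image and its label.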
https://twitter.com/ogawa_yutaro_22/status/1421961964025049088
- 3 users
- twitter.com/ogawa_yutaro_22
- Technology
- 2021/08/03