Title search for "*dataset" - Hatena Bookmark

Results 81 - 120 of 237


GitHub - visual-layer/fastdup: fastdup is a powerful free tool designed to rapidly extract valuable insights from your image & video datasets. Assisting you to increase your dataset images & labels quality and reduce your data operations costs at an unpar
- 2 users
- github.com/visual-layer
- Technology
- 2023/03/08
An unsupervised and free tool for image and video dataset analysis. fastdup was founded by the authors of XGBoost, Apache TVM & Turi Create - Danny Bickson, Carlos Guestrin and Amir Alush. 🚀 Introducing VL Profiler! 🚀 We're excited to announce our new cloud product, VL Profiler. It's designed to help you
CCMatrix: A billion-scale bitext dataset for training translation models
- 2 users
- ai.facebook.com
- Technology
- 2020/02/08
What it is: CCMatrix is the largest dataset of high-quality, web-based bitexts for training translation models. With more than 4.5 billion parallel sentences in 576 language pairs pulled from snapshots of the CommonCrawl public dataset, CCMatrix is more than 50 times larger than the WikiMatrix corpus that we shared last year.
- NLP
- dataset
How to use the CrUX BigQuery dataset | Chrome UX Report | Chrome for Developers
- 2 users
- developer.chrome.com
- Technology
- 2020/01/15
The raw data of the Chrome UX Report (CrUX) is available on BigQuery, a database on Google Cloud. Using BigQuery requires a GCP project and basic knowledge of SQL. In this guide, learn how to use BigQuery to write queries against the CrUX dataset to extract insightful results about the state of user experiences on the web: understand how the data is organized; write a basic query to evaluate an ori
- techfeed
- Read later
GitHub - mdeff/fma: FMA: A Dataset For Music Analysis
- 2 users
- github.com/mdeff
- Technology
- 2020/06/08
Michaël Defferrard, Kirell Benzi, Pierre Vandergheynst, Xavier Bresson. International Society for Music Information Retrieval Conference (ISMIR), 2017. We introduce the Free Music Archive (FMA), an open and easily accessible dataset suitable for evaluating several tasks in MIR, a field concerned with browsing, searching, and organizing large music collections. The community's growing interest in f
GitHub - yahoojapan/VFD-Dataset
- 2 users
- github.com/yahoojapan
- Technology
- 2020/11/17
We propose a visually-grounded first-person dialogue (VFD) dataset with verbal and non-verbal responses. The VFD dataset provides manually annotated (1) first-person images of agents, (2) utterances of human speakers, (3) eye-gaze locations of the speakers, and (4) the agents' verbal and non-verbal responses. All utterances and responses are represented in Japanese. The images with eye-gaze locati
Driving Dataset | a2d2.audi
- 2 users
- www.a2d2.audi
- Society
- 2020/04/21
We have published the Audi Autonomous Driving Dataset (A2D2) to support startups and academic researchers working on autonomous driving. Equipping a vehicle with a multimodal sensor suite, recording a large dataset, and labelling it, is time and labour intensive. Our dataset removes this high entry barrier and frees researchers and developers to focus on developing new technologies instead. The da
- dataset
ArtEmis Dataset V2.0
- 2 users
- www.artemisdataset-v2.org
- Technology
- 2022/09/09
Datasets that capture the connection between vision, language, and affection are limited, causing a lack of understanding of the emotional aspect of human intelligence. As a step in this direction, the ArtEmis dataset was recently introduced as a large-scale dataset of emotional reactions to images along with language explanations of these chosen emotions. We observed a significant emotional bias
- dataset
Split a dataset created by Tensorflow dataset API in to Train and Test?
- 2 users
- stackoverflow.com
- Technology
- 2019/10/28
Announcing the Objectron Dataset
- 2 users
- ai.googleblog.com
- Technology
- 2020/11/10
- machine learning
- Read later
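Fundamentals of Off-Policy Evaluation and an Introduction to Open Bandit Dataset & Pipeline
- 2 users
- speakerdeck.com/usaito
- Technology
- 2021/03/09
Talk summary: When tech companies apply machine learning, they often do not use the predictions as-is but treat them as information for making decisions such as "Which fashion item should be recommended to each user?" In such cases, if prediction accuracy is used as the offline evaluation metric, the final mod…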
GitHub - JSCIG/dataset: TC39 Proposal Dataset
- 2 users
- github.com/JSCIG
- Technology
- 2020/11/17
- ECMAScript
- data
- api
Dataset Condensation with Gradient Matching
- 2 users
- openreview.net
- Learning
- 2021/01/18
Abstract: As the state-of-the-art machine learning methods in many fields rely on larger datasets, storing datasets and training models on them become significantly more expensive. This paper proposes a training set synthesis technique for data-efficient learning, called Dataset Condensation, that learns to condense large dataset into a small set of informative synthetic samples for training deep
Dataset - Schema.org Type
- 2 users
- schema.org
- Lifestyle
- 2019/12/06
A downloadable form of this dataset, at a specific location, in a specific format. This property can be repeated if different variations are available. There is no expectation that different downloadable distributions must contain exactly equivalent information (see also DCAT on this point). Different distributions might include or exclude different subsets of the entire dataset, for example.
GitHub - f/honst: Fixes your dataset according to your rules.
- 2 users
- github.com/f
- Technology
- 2020/12/22
- library
- JavaScript
GitHub - AaronWard/covidify: Covidify - corona virus report and dataset generator for python 📈 [no longer being updated]
- 2 users
- github.com/AaronWard
- Technology
- 2020/03/05
- COVID-19
"Excavating AI" Re-excavated: Debunking a Fallacious Account of the JAFFE Dataset
- 2 users
- arxiv.org
- Technology
- 2022/06/06
Twenty-five years ago, my colleagues Miyuki Kamachi and Jiro Gyoba and I designed and photographed JAFFE, a set of facial expression images intended for use in a study of face perception. In 2019, without seeking permission or informing us, Kate Crawford and Trevor Paglen exhibited JAFFE in two widely publicized art shows. In addition, they published a nonfactual account of the images in the essay
- humor
Distilling BERT Using an Unlabeled Question-Answering Dataset
- 2 users
- towardsdatascience.com
- Lifestyle
- 2020/11/25
The data labeling process is quite complicated, especially for tasks such as Machine Reading Comprehension (Question Answering). In this post, I want to describe one of the techniques we used to adapt the question-answering model to a specific domain using a limited amount of labeled data: Knowledge Distillation. It turned out that we can use it not only to “com
GitHub - zhenglinpan/SakugaDataset: Official Repository for Sakuga-42M Dataset
- 2 users
- github.com/zhenglinpan
- Technology
- 2024/05/15
This is the official GitHub repository of Sakuga-42M Dataset. Sakuga-42M Dataset is the first large-scale cartoon animation dataset, comprising 42 million keyframes. We hope that our efforts in providing this fundamental dataset could somehow alleviate the data scarcity that has haunted this research domain for years and make it possible to introduce large-scale models and approaches that lead to
GitHub - motokimura/PyTorch_Gaussian_YOLOv3: PyTorch implementation of Gaussian YOLOv3 (including training code for COCO dataset)
- 2 users
- github.com/motokimura
- Technology
- 2019/12/01
The benchmark results below have been obtained by training models for 500k iterations on the COCO 2017 train dataset using darknet repo and our repo. Gaussian YOLOv3 implemented in our repo achieved 30.4% in COCO AP[IoU=0.50:0.95], which is 2.6 ~ 2.7 point higher than the score of YOLOv3 implemented in darknet and our repo. This gain is smaller than 3.1, the one reported in the Gaussian YOLOv3 pap
- machine learning
GitHub - google-research-datasets/Objectron: Objectron is a dataset of short, object-centric video clips. In addition, the videos also contain AR session metadata including camera poses, sparse point-clouds and planes. In each video, the camera moves arou
- 2 users
- github.com/google-research-datasets
- Technology
- 2020/11/10
Objectron is a dataset of short, object-centric video clips. In addition, the videos also contain AR session metadata including camera poses, sparse point-clouds and planes. In each video, the camera moves around and above the object and captures it from different views. Each object is annotated with a 3D bounding box. The 3D bounding box descri…
- Google
Tracking excess mortality across countries during the COVID-19 pandemic with the World Mortality Dataset
- 2 users
- elifesciences.org
- Society
- 2021/08/23
Comparing the impact of the COVID-19 pandemic between countries or across time is difficult because the reported numbers of cases and deaths can be strongly affected by testing capacity and reporting policy. Excess mortality, defined as the increase in all-cause mortality relative to the expected mortality, is widely considered as a more objective indicator of the COVID-19 death toll. However, the
- Japan
Satellite Imagery Dataset Machine Learning AI | Drone Images
- 2 users
- www.anolytics.ai
- Technology
- 2020/09/18
Satellite Imagery Dataset for Machine Learning and AI: AI-based models developed through machine learning for aerial views need a satellite imagery dataset to train the model for accurate detection. Anolytics provides satellite imagery data sets with annotated images to make the varied object
- space
- dataset
gpt-3/dataset_statistics/languages_by_word_count.csv at master · openai/gpt-3
- 2 users
- github.com/openai
- Technology
- 2020/10/15
5 Different Ways To Reduce Noise In An Imbalanced Dataset
- 2 users
- analyticsindiamag.com
- Technology
- 2021/11/15
Experienced data science and machine learning experts know that imbalanced class distribution is one of the most frequently encountered problems in data science. It occurs when the number of observations belonging to one class is significantly lower than those belonging to the other classes. What Is Data Imbalance? An imbalanced dataset occurs when one set of classes ar
Welcome! | Million Song Dataset
- 2 users
- millionsongdataset.com
- Learning
- 2022/02/10
The Million Song Dataset is a freely-available collection of audio features and metadata for a million contemporary popular music tracks. Its purposes are: to encourage research on algorithms that scale to commercial sizes; to provide a reference dataset for evaluating research; as a shortcut alternative to creating a large dataset with APIs (e.g. The Echo Nest's); to help new researchers get started
- dataset
GitHub - unicamp-dl/mMARCO: A multilingual version of MS MARCO passage ranking dataset
- 2 users
- github.com/unicamp-dl
- Technology
- 2023/02/21
Waymo Open Dataset: Sharing our self-driving data for research
- 2 users
- blog.waymo.com
- Technology
- 2019/08/22
Data is a critical ingredient for machine learning. Our vehicles have collected over 10 million autonomous miles in 25 cities; this rich and diverse set of real world experiences has helped our engineers and researchers develop Waymo’s self-driving technology and innovative models and algorithms. Today, we are inviting the research community to join us with the release of the Way
- dataset
- Google
Spectral Metric for Dataset Complexity Assessment
- 2 users
- arxiv.org
- Learning
- 2019/10/02
In this paper, we propose a new measure to gauge the complexity of image classification problems. Given an annotated image dataset, our method computes a complexity measure called the cumulative spectral gradient (CSG) which strongly correlates with the test accuracy of convolutional neural networks (CNN). The CSG measure is derived from the probabilistic divergence between classes in a spectral c
1 dataset. 100 visualizations.
- 2 users
- 100.datavizproject.com
- Learning
- 2023/02/28
Scandinavia as a whole gained new sites. Sweden stayed the country with the most sites. Sweden used to have over half of all sites and now has under half. Denmark gained the most new sites and Sweden the fewest. Denmark got most of its sites after 2004 while Sweden and Norway did before. Denmark surpassed Norway in number of sites.
- Read later
Open for Research: COVID-19 Literature Dataset
- 2 users
- www.linkedin.com
- Society
- 2020/03/18
It’s more important than ever to come together, as companies, non-profits, governments, scientists, and clinicians, to bring our best information and technologies to bear on challenges with COVID-19. Today, we announced a collaboration with colleagues to create the COVID-19 Open Research Dataset (CORD-19) from a coalescence of scientific articles about the coronavirus group of viruses for use by t
LIBRE-dataset
- 2 users
- sites.google.com
- Lifestyle
- 2020/02/23
We proudly present the Nagoya University and TierIV multiple 3D LiDARs dataset "LIBRE": LiDAR Benchmarking and Reference, a first-of-its-kind dataset featuring several different 3D LiDAR sensors, covering a range of manufacturers, models, and laser configurations. Our dataset includes LiDAR data from different environments and configurations. Static targets, where objects were placed at known dista
- dataset
GitHub - google-research-datasets/clang8: cLang-8 is a dataset for grammatical error correction.
- 2 users
- github.com/google-research-datasets
- Technology
- 2021/06/10
- dataset
- google
Overture Maps Foundation Releases Its First World-Wide Open Map Dataset – Overture Maps Foundation
- 2 users
- overturemaps.org
- Technology
- 2023/07/27
Initial dataset establishes a baseline for four important layers of open map data, including newly released open data for nearly 60 million places worldwide. SAN FRANCISCO, July 26, 2023 - The Overture Maps Foundation (OMF), a collaborative effort to enable current and next-generation interoperable open map products, toda
GitHub - hikariming/virus-mask-dataset: Dataset for detecting people wearing masks
- 2 users
- github.com/hikariming
- Technology
- 2020/03/03
MIT DriveSeg Dataset for Dynamic Driving Scene Segmentation | MIT AgeLab
- 2 users
- agelab.mit.edu
- Society
- 2020/06/24
- dataset
CoVoST V2: Expanding the largest, most diverse multilingual speech-to-text translation dataset
- 2 users
- ai.meta.com
- Technology
- 2020/07/24
What the research is: CoVoST V2 expands on our CoVoST dataset, a speech-to-text translation (ST) corpus targeted at multilingual translation. This new release makes available the largest multilingual ST dataset to date. CoVoST V2 will facilitate translating 21 languages into English, as well as English i
- machine learning
- dataset
Habitat-Matterport 3D Semantics Dataset
- 2 users
- aihabitat.org
- Technology
- 2022/10/14
The Habitat-Matterport 3D Semantics Dataset (HM3DSem) is the largest-ever dataset of 3D real-world and indoor spaces with densely annotated semantics that is available to the academic community. HM3DSem v0.2 consists of 142,646 object instance annotations across 216 3D-spaces from HM3D and 3,100 rooms within those spaces. The HM3D scenes are annotated with the 142,646 raw object names, which are m
- AI
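An explanation of pyTorch's transforms, Datasets, and Dataloader, and how to create and use a custom Dataset - Qiita
- 2 users
- qiita.com/mathlive
- Technology
- 2020/07/30
Tags: Python, machine learning, DeepLearning, Dataset, PyTorch. Posted 2019/9/29; edited 2019/11/8 to be somewhat easier to read (subjectively). 0. Who this article is for: people who have used Python and have a working environment set up; people who have used pyTorch to some extent; people who want a deep understanding of the transforms, Datasets, and dataloader in pyTorch and torchvision; people who want to create a custom Dataset from an existing Dataset. 1. Introduction: These days, machine learning research is done mainly in Python, because Python has many libraries (called modules) for fast data analysis and computation. Among them, this time the one called pyTorch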
GitHub - amazon-science/esci-data: Shopping Queries Dataset: A Large-Scale ESCI Benchmark for Improving Product Search
- 2 users
- github.com/amazon-science
- Technology
- 2023/02/28
We introduce the “Shopping Queries Data Set”, a large dataset of difficult search queries, released with the aim of fostering research in the area of semantic matching of queries and products. For each query, the dataset provides a list of up to 40 potentially relevant results, together with ESCI relevance judgements (Exact, Substitute, Complement, Irrelevant) indicating the relevance of the produ
- search
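Google officially launches “Dataset Search”
- 2 users
- current.ndl.go.jp
- Learning
- 2020/01/31
On January 23, 2020, Google announced the official launch of “Dataset Search”. With the official launch, new features have been added, such as filtering by dataset type (tables, images, text) or by whether a dataset is available for free, and displaying geographic datasets on a map. The quality of dataset descriptions has also been improved, and the service is now usable on mobile devices. The most common fields among the included datasets are earth sciences, biology, and agriculture; among available government open data, the United States has the most, with over 2 million datasets; and by data format, tables are the most common, with over 6 million. Discovering millions of datasets on the web (Google, 2020/1/23) https://www.blog.google/products/sear
- electric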
- science