Search results for "dataset": 81 - 120 of 237

  • GitHub - visual-layer/fastdup: fastdup is a powerful free tool designed to rapidly extract valuable insights from your image & video datasets. Assisting you to increase your dataset images & labels quality and reduce your data operations costs at an unpar

    An unsupervised and free tool for image and video dataset analysis. fastdup is founded by the authors of XGBoost, Apache TVM & Turi Create - Danny Bickson, Carlos Guestrin and Amir Alush. Introducing VL Profiler! We're excited to announce our new cloud product, VL Profiler. It's designed to help you

    • CCMatrix: A billion-scale bitext dataset for training translation models

      What it is: CCMatrix is the largest dataset of high-quality, web-based bitexts for training translation models. With more than 4.5 billion parallel sentences in 576 language pairs pulled from snapshots of the CommonCrawl public dataset, CCMatrix is more than 50 times larger than the WikiMatrix corpus that we shared last year.

      • How to use the CrUX BigQuery dataset  |  Chrome UX Report  |  Chrome for Developers

        The raw data of the Chrome UX Report (CrUX) is available on BigQuery, a database on Google Cloud. Using BigQuery requires a GCP project and basic knowledge of SQL. In this guide, learn how to use BigQuery to write queries against the CrUX dataset to extract insightful results about the state of user experiences on the web: understand how the data is organized; write a basic query to evaluate an ori

        • GitHub - mdeff/fma: FMA: A Dataset For Music Analysis

          Michaël Defferrard, Kirell Benzi, Pierre Vandergheynst, Xavier Bresson. International Society for Music Information Retrieval Conference (ISMIR), 2017. We introduce the Free Music Archive (FMA), an open and easily accessible dataset suitable for evaluating several tasks in MIR, a field concerned with browsing, searching, and organizing large music collections. The community's growing interest in f

          • GitHub - yahoojapan/VFD-Dataset

            We propose a visually-grounded first-person dialogue (VFD) dataset with verbal and non-verbal responses. The VFD dataset provides manually annotated (1) first-person images of agents, (2) utterances of human speakers, (3) eye-gaze locations of the speakers, and (4) the agents' verbal and non-verbal responses. All utterances and responses are represented in Japanese. The images with eye-gaze locati

            • Driving Dataset | a2d2.audi

              We have published the Audi Autonomous Driving Dataset (A2D2) to support startups and academic researchers working on autonomous driving. Equipping a vehicle with a multimodal sensor suite, recording a large dataset, and labelling it, is time and labour intensive. Our dataset removes this high entry barrier and frees researchers and developers to focus on developing new technologies instead. The da

              • ArtEmis Dataset V2.0

                Datasets that capture the connection between vision, language, and affection are limited, causing a lack of understanding of the emotional aspect of human intelligence. As a step in this direction, the ArtEmis dataset was recently introduced as a large-scale dataset of emotional reactions to images along with language explanations of these chosen emotions. We observed a significant emotional bias

                • Split a dataset created by Tensorflow dataset API in to Train and Test?

                  • Announcing the Objectron Dataset

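                    • Fundamentals of Off-Policy Evaluation and an introduction to the Open Bandit Dataset & Pipeline

                      Talk summary: When tech companies apply machine learning, the predictions are often not used as-is but as information for making decisions such as "which fashion item should be recommended to each user?" In such cases, if prediction accuracy is used as the offline evaluation metric, the final mod…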

                      • GitHub - JSCIG/dataset: TC39 Proposal Dataset

                        • Dataset Condensation with Gradient Matching

                          Abstract: As the state-of-the-art machine learning methods in many fields rely on larger datasets, storing datasets and training models on them become significantly more expensive. This paper proposes a training set synthesis technique for data-efficient learning, called Dataset Condensation, that learns to condense large dataset into a small set of informative synthetic samples for training deep

                          • Dataset - Schema.org Type

                            A downloadable form of this dataset, at a specific location, in a specific format. This property can be repeated if different variations are available. There is no expectation that different downloadable distributions must contain exactly equivalent information (see also DCAT on this point). Different distributions might include or exclude different subsets of the entire dataset, for example.

                            • GitHub - f/honst: Fixes your dataset according to your rules.

                              • GitHub - AaronWard/covidify: Covidify - corona virus report and dataset generator for python 📈 [no longer being updated]

                                • "Excavating AI" Re-excavated: Debunking a Fallacious Account of the JAFFE Dataset

                                  Twenty-five years ago, my colleagues Miyuki Kamachi and Jiro Gyoba and I designed and photographed JAFFE, a set of facial expression images intended for use in a study of face perception. In 2019, without seeking permission or informing us, Kate Crawford and Trevor Paglen exhibited JAFFE in two widely publicized art shows. In addition, they published a nonfactual account of the images in the essay

                                  • Distilling BERT Using an Unlabeled Question-Answering Dataset

                                    The data labeling process is quite complicated, especially for tasks such as Machine Reading Comprehension (Question Answering). In this post, I want to describe one of the techniques we used to adapt the question-answering model to a specific domain using a limited amount of labeled data: Knowledge Distillation. It turned out that we can use it not only to “com

                                    • GitHub - zhenglinpan/SakugaDataset: Official Repository for Sakuga-42M Dataset

                                      This is the official GitHub repository of Sakuga-42M Dataset. Sakuga-42M Dataset is the first large-scale cartoon animation dataset, comprising 42 million keyframes. We hope that our efforts in providing this fundamental dataset could somehow alleviate the data scarcity that has haunted this research domain for years and make it possible to introduce large-scale models and approaches that lead to

                                      • GitHub - motokimura/PyTorch_Gaussian_YOLOv3: PyTorch implementation of Gaussian YOLOv3 (including training code for COCO dataset)

                                        The benchmark results below have been obtained by training models for 500k iterations on the COCO 2017 train dataset using darknet repo and our repo. Gaussian YOLOv3 implemented in our repo achieved 30.4% in COCO AP[IoU=0.50:0.95], which is 2.6 ~ 2.7 point higher than the score of YOLOv3 implemented in darknet and our repo. This gain is smaller than 3.1, the one reported in the Gaussian YOLOv3 pap

                                        • GitHub - google-research-datasets/Objectron: Objectron is a dataset of short, object-centric video clips. In addition, the videos also contain AR session metadata including camera poses, sparse point-clouds and planes. In each video, the camera moves arou

                                          Objectron is a dataset of short, object-centric video clips. In addition, the videos also contain AR session metadata including camera poses, sparse point-clouds and planes. In each video, the camera moves around and above the object and captures it from different views. Each object is annotated with a 3D bounding box. The 3D bounding box descri…

                                          • Tracking excess mortality across countries during the COVID-19 pandemic with the World Mortality Dataset

                                            Comparing the impact of the COVID-19 pandemic between countries or across time is difficult because the reported numbers of cases and deaths can be strongly affected by testing capacity and reporting policy. Excess mortality, defined as the increase in all-cause mortality relative to the expected mortality, is widely considered as a more objective indicator of the COVID-19 death toll. However, the

                                            • Satellite Imagery Dataset Machine Learning AI | Drone Images

                                              Training Data for Satellite Imagery: Image Annotation to Create Satellite Image Dataset. Satellite Imagery Dataset for Machine Learning and AI. AI based models developed through machine learning for Aerial view need satellite imagery dataset to train the model for right detection. Anolytics provides satellite imagery data sets with annotated images to make the varied object

                                              • gpt-3/dataset_statistics/languages_by_word_count.csv at master · openai/gpt-3

                                                • 5 Different Ways To Reduce Noise In An Imbalanced Dataset

                                                  Experienced data science and machine learning experts know that imbalanced class distribution is one of the most frequently encountered problems in data science. It occurs when the number of observations belonging to one class is significantly lower than those belonging to the other classes. What Is Data Imbalance? Imbalanced dataset occurs when one set of classes ar

                                                  • Welcome! | Million Song Dataset

                                                    The Million Song Dataset is a freely-available collection of audio features and metadata for a million contemporary popular music tracks. Its purposes are: to encourage research on algorithms that scale to commercial sizes; to provide a reference dataset for evaluating research; as a shortcut alternative to creating a large dataset with APIs (e.g. The Echo Nest's); to help new researchers get started

                                                    • GitHub - unicamp-dl/mMARCO: A multilingual version of MS MARCO passage ranking dataset

                                                      • Waymo Open Dataset: Sharing our self-driving data for research

                                                        Data is a critical ingredient for machine learning. Our vehicles have collected over 10 million autonomous miles in 25 cities; this rich and diverse set of real world experiences has helped our engineers and researchers develop Waymo’s self-driving technology and innovative models and algorithms. Today, we are inviting the research community to join us with the release of the Way

                                                        • Spectral Metric for Dataset Complexity Assessment

                                                          In this paper, we propose a new measure to gauge the complexity of image classification problems. Given an annotated image dataset, our method computes a complexity measure called the cumulative spectral gradient (CSG) which strongly correlates with the test accuracy of convolutional neural networks (CNN). The CSG measure is derived from the probabilistic divergence between classes in a spectral c

                                                          • 1 dataset. 100 visualizations.

                                                            Scandinavia as a whole gained new sites. Sweden stayed the country with the most sites. Sweden used to have over half of all sites and now has under half. Denmark gained the most new sites and Sweden the fewest. Denmark got most of its sites after 2004 while Sweden and Norway did before. Denmark surpassed Norway in number of sites.

                                                            • Open for Research: COVID-19 Literature Dataset

                                                              It’s more important than ever to come together, as companies, non-profits, governments, scientists, and clinicians, to bring our best information and technologies to bear on challenges with COVID-19. Today, we announced a collaboration with colleagues to create the COVID-19 Open Research Dataset (CORD-19) from a coalescence of scientific articles about the coronavirus group of viruses for use by t

                                                              • LIBRE-dataset

                                                                We proudly present the Nagoya University and TierIV multiple 3D LiDARs dataset "LIBRE": LiDAR Benchmarking and Reference, a first-of-its-kind dataset featuring several different 3D LiDAR sensors, covering a range of manufacturers, models, and laser configurations. Our dataset includes LiDAR data from different environments and configurations. Static targets, where objects were placed at known dista

                                                                • GitHub - google-research-datasets/clang8: cLang-8 is a dataset for grammatical error correction.

                                                                  • Overture Maps Foundation Releases Its First World-Wide Open Map Dataset – Overture Maps Foundation

                                                                    Initial dataset establishes a baseline for four important layers of open map data, including newly released open data for nearly 60 million places worldwide. SAN FRANCISCO, July 26, 2023 - The Overture Maps Foundation (OMF), a collaborative effort to enable current and next-generation interoperable open map products, toda

                                                                    • GitHub - hikariming/virus-mask-dataset: A dataset for detecting people wearing face masks

                                                                      • MIT DriveSeg Dataset for Dynamic Driving Scene Segmentation | MIT AgeLab

                                                                        • CoVoST V2: Expanding the largest, most diverse multilingual speech-to-text translation dataset

                                                                          What the research is: CoVoST V2 expands on our CoVoST dataset, a speech-to-text translation (ST) corpus targeted at multilingual translation. This new release makes available the largest multilingual ST dataset to date. CoVoST V2 will facilitate translating 21 languages into English, as well as English i

                                                                          • Habitat-Matterport 3D Semantics Dataset

                                                                            The Habitat-Matterport 3D Semantics Dataset (HM3DSem) is the largest-ever dataset of 3D real-world and indoor spaces with densely annotated semantics that is available to the academic community. HM3DSem v0.2 consists of 142,646 object instance annotations across 216 3D-spaces from HM3D and 3,100 rooms within those spaces. The HM3D scenes are annotated with the 142,646 raw object names, which are m

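                                                                            • Explanation of pyTorch transforms, Datasets, and Dataloader, and creating and using a custom Dataset - Qiita

                                                                              Tags: Python, machine learning, DeepLearning, Dataset, PyTorch. Posted 2019/9/29; edited 2019/11/8 to be somewhat easier to read (subjectively). 0. Who this article is for: people who have used python and have a working environment; people who have some experience with pyTorch; people who want a deep understanding of the transforms, Datasets, and dataloader of pyTorch and torchvision; people who want to create their own Dataset from an existing Dataset. 1. Introduction: These days, machine learning research is done mainly in python, because python has many libraries (called modules) for fast data analysis and computation. Among them, this time the one called pyTorch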

                                                                              • GitHub - amazon-science/esci-data: Shopping Queries Dataset: A Large-Scale ESCI Benchmark for Improving Product Search

                                                                                We introduce the “Shopping Queries Data Set”, a large dataset of difficult search queries, released with the aim of fostering research in the area of semantic matching of queries and products. For each query, the dataset provides a list of up to 40 potentially relevant results, together with ESCI relevance judgements (Exact, Substitute, Complement, Irrelevant) indicating the relevance of the produ

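                                                                                • Google officially releases “Dataset Search”

                                                                                  On January 23, 2020, Google announced the official release of “Dataset Search”. With the official release, new features have been added, including filtering by dataset type (tables, images, text) and by whether a dataset is available free of charge, and map display for geographic datasets. The quality of dataset descriptions has also been improved, and the service is now available on mobile devices. The most common fields among the included datasets are geosciences, biology, and agriculture. Among available government open data, the United States has the most, with more than 2 million datasets, and by data format, tables are the most common, with more than 6 million. Discovering millions of datasets on the web (Google, 2020/1/23) https://www.blog.google/products/sear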
