Search results for titles matching "*dataset" - Hatena Bookmark

1 - 6 of 6 results

Largest Dataset Powering AI Images Removed After Discovery of Child Sexual Abuse Material
- 9 users
- www.404media.co
- Technology
- 2023/12/20
The model is a massive part of the AI ecosystem, used by Stable Diffusion and other major generative AI products. The removal follows discoveries made by Stanford researchers, who found thousands of instances of suspected child sexual abuse material in the dataset. This piece is published with support from Th…
- Read later
Sakuga-42M Dataset: Scaling Up Cartoon Research
- 4 users
- arxiv.org
- Learning
- 2024/05/17
Hand-drawn cartoon animation employs sketches and flat-color segments to create the illusion of motion. While recent advancements like CLIP, SVD, and Sora show impressive results in understanding and generating natural video by scaling large models with extensive datasets, they are not as effective for cartoons. Through our empirical experiments, we argue that this ineffectiveness stems from a not…
AnswerCarefully Dataset – RIKEN-AIP, LIAT
- 4 users
- liat-aip.sakura.ne.jp
- Technology
- 2024/05/22
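News: AnswerCarefully Dataset version 1.0 released (2024/4/30). Overview: We are releasing Version 1 of the AnswerCarefully (AC) dataset, instruction data specialized for the safety and appropriateness of Japanese LLM output. It is an original dataset in which both the questions and the answers were collected by hand as Japanese samples, based on the comprehensive category taxonomy of the Do-Not-Answer dataset, an English collection of cases that call for careful answers. Dataset features: it covers 5 risk types (top level), 12 harm categories (middle level), and 61 subcategories (bottom level). Version 1 contains 945 samples in total, with 10 to 20 samples per subcategory. Of these, 3 samples from each subcategory, 183 in total, are designated as test data and the remaining 762 as development data, split into two files…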
RedPajama-Data-v2: an Open Dataset with 30 Trillion Tokens for Training Large Language Models — Together AI
- 3 users
- together.ai
- Technology
- 2023/10/31
Today, we're releasing a new version of the RedPajama dataset, with 30 trillion filtered and deduplicated tokens (100+ trillion raw) from 84 CommonCrawl dumps covering 5 languages, along with 40+ pre-computed data quality annotations that can be used for further filtering and weighting. Over the last hal…
GitHub - facebookresearch/audio2photoreal: Code and dataset for photorealistic Codec Avatars driven from audio
- 3 users
- github.com/facebookresearch
- Technology
- 2024/01/05
GitHub - scrapinghub/article-extraction-benchmark: Article extraction benchmark: dataset and evaluation scripts
- 3 users
- github.com/scrapinghub
- Technology
- 2023/08/28
- html
- python
- tool