"},"dump":{"kind":"string","value":"CC-MAIN-2013-20"},"url":{"kind":"string","value":"http://%20jwashington@ap.org/Content/Press-Release/2012/How-AP-reported-in-all-formats-from-tornado-stricken-regions"},"date":{"kind":"string","value":"2013-05-18T05:48:54Z"},"file_path":{"kind":"string","value":"s3://commoncrawl/crawl-data/CC-MAIN-2013-20/segments/1368696381249/warc/CC-MAIN-20130516092621-00000-
Discover amazing ML apps made by the community
roberta-long-japanese (jumanpp + sentencepiece, mC4 Japanese) This is a longer-input version of the RoBERTa Japanese model, pretrained on approximately 200M Japanese sentences. max_position_embeddings has been increased to 1282, allowing it to handle much longer inputs.
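As a minimal sketch of what the larger position-embedding table looks like (the checkpoint name, vocabulary size, and training recipe below are assumptions, not the model card's actual values; note that RoBERTa reserves two position slots for the padding offset, so 1282 corresponds to a usable length of 1280):

```python
from transformers import RobertaConfig, RobertaModel

# Hypothetical, randomly initialized model for illustration only -- the real
# checkpoint would be loaded with from_pretrained(...).
config = RobertaConfig(
    vocab_size=32000,               # assumed sentencepiece vocab size
    max_position_embeddings=1282,   # 1280 usable positions + 2 reserved
)
model = RobertaModel(config)

# The position-embedding table now has 1282 rows.
print(model.embeddings.position_embeddings.weight.shape)
```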
Additionally, you can use SequentialEvaluator to combine multiple evaluators into one, which can then be passed to the SentenceTransformerTrainer. If you don't have the necessary evaluation data but still want to track the model's performance on common benchmarks, you can use these evaluators with data from Hugging Face: EmbeddingSimilarityEvaluator with STSb. The STS Benchmark (a.k.a. STSb) is a commonly used benchmark for evaluating semantic textual similarity.
Finding the right Vision Language Model There are many ways to select the most appropriate model for your use case. Vision Arena is a leaderboard based solely on anonymous voting over model outputs, and it is updated continuously. In this arena, users enter an image and a prompt, outputs from two different models are sampled anonymously, and the user picks their preferred output. This way, the ranking reflects direct human preference rather than automated metrics.
","cls_token":"","eos_token":"","mask_token":"","pad_token":"","sep_token":"","unk_token":""}},"createdAt":"2024-03-15T13:32:18.000Z","discussionsDisabled":false,"downloads":473434,"downloadsAllTime":3298501,"id":"BAAI/bge-reranker-v2-m3","isLikedByUser":false,"isWatchedByUser":false,"inference":"pipeline-library-pair-not-supported","lastModified":"2024-06-24T14:08:45.000Z","likes":247,"pipeline_t
Improving scalability There are several ways to approach the challenges of scaling embeddings. The most common approach is dimensionality reduction, such as PCA. However, classic dimensionality reduction methods like PCA tend to perform poorly when applied to embeddings. More recently, Matryoshka Representation Learning (MRL), as used by OpenAI, also allows for cheaper embeddings: the model is trained so that a prefix of the embedding vector is itself a useful lower-dimensional representation.
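A minimal sketch of how MRL-style embeddings are shrunk at inference time (assuming the model was trained with the Matryoshka objective, so the leading dimensions carry most of the signal): keep a prefix of each vector and re-normalize.

```python
import numpy as np

def truncate_embeddings(emb: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` dimensions and re-normalize to unit length."""
    truncated = emb[..., :dim]
    norms = np.linalg.norm(truncated, axis=-1, keepdims=True)
    return truncated / norms

# Toy stand-in for model-produced, unit-normalized 1024-d embeddings.
rng = np.random.default_rng(0)
full = rng.normal(size=(4, 1024))
full /= np.linalg.norm(full, axis=-1, keepdims=True)

# 4x cheaper to store and compare, at a modest quality cost for MRL models.
small = truncate_embeddings(full, 256)
print(small.shape)  # (4, 256)
```

Note that this only works well when the embedding model was trained for it; truncating an ordinary embedding this way degrades quality much more sharply.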