A JavaScript library for handling dictzip-compressed files efficiently: rather than decompressing the whole data blob into memory, it provides an interface for (asynchronous or synchronous) random access to the compressed data. It can therefore handle very large amounts of data, such as local files accessed through the W3C File API. This implementation…
DICTZIP(1)

NAME
       dictzip, dictunzip - compress (or expand) files, allowing random access

SYNOPSIS
       dictzip [options] name
       dictunzip [options] name

DESCRIPTION
       dictzip compresses files using the gzip(1) algorithm (LZ77) in a manner
       which is completely compatible with the gzip file format. An extension
       to the gzip file format (Extra Field, described in section 2.3.1.1 of
       RFC 1952) allows extra data…
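The random-access trick dictzip relies on is easy to illustrate: compress the data in fixed-size chunks that each decompress independently, and keep an index of compressed chunk offsets (dictzip keeps such an index in the gzip Extra Field). Below is a minimal Python sketch of the idea using independent raw deflate streams; the chunk size and helper names are illustrative, and this is not the actual dictzip container format:

    import zlib

    CHUNK = 64 * 1024  # chunk size chosen for this sketch

    def compress_chunked(data):
        """Compress data as independently decompressible chunks and
        return (blob, offsets) so each chunk can be located later."""
        blob, offsets = bytearray(), [0]
        for i in range(0, len(data), CHUNK):
            # A fresh compressor per chunk keeps chunks independent,
            # mirroring dictzip's flushes at chunk boundaries.
            c = zlib.compressobj(wbits=-15)  # raw deflate stream
            blob += c.compress(data[i:i + CHUNK]) + c.flush()
            offsets.append(len(blob))
        return bytes(blob), offsets

    def read_range(blob, offsets, start, length):
        """Decompress only the chunks covering [start, start + length)."""
        first, last = start // CHUNK, (start + length - 1) // CHUNK
        out = bytearray()
        for n in range(first, last + 1):
            d = zlib.decompressobj(wbits=-15)
            out += d.decompress(blob[offsets[n]:offsets[n + 1]])
        skip = start - first * CHUNK
        return bytes(out[skip:skip + length])

    data = bytes(range(256)) * 1000
    blob, offsets = compress_chunked(data)
    assert read_range(blob, offsets, 130000, 50) == data[130000:130050]

Reading an arbitrary byte range then costs only the decompression of the chunks that cover it, which is what makes random access into very large compressed files practical.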
Optimize the encoding and transfer size of text-based assets. Next to eliminating unnecessary resource downloads, the best thing you can do to improve page load speed is to minimize the overall download size by optimizing and compressing the remaining resources. Data compression 101: Once you've set up your website…
id: 495 | owner: msakamoto-sf | created: 2009-11-22 17:11:47 | category: Linux UNIX Windows. Work got me curious about the history of the ZIP file format, so I did some digging. I had long vaguely wondered how gzip, zlib, and zip actually differ, but on Windows you can create zips with archivers like Lhaca or Lhaplus (or, since XP, with the OS itself), and on Linux/UNIX a couple of rounds of trial and error with command-line options plus the man pages is enough to build a tar.gz or unpack a zip made on Windows, so I had always settled for "oh well, good enough." Still, without diving into the technical details, I traced the rough history and lineage, mainly through Wikipedia…
At compression levels 2 and 3, xz achieves a higher compression ratio than bzip2 in much less time. Level 4 is interesting: the runtime increases sharply, yet the compression ratio drops. Starting at level 4, xz switches the algorithm it uses to find LZ match strings, and this seems to backfire. At level 7 and above, where the compression ratio beats bzip2 by more than 20%, the runtime is over five times longer. Mixing compression formats for log files is a hassle, and a fivefold runtime for this small a difference in ratio does not make me want to switch to xz. The compression ratio aside, xz's decompression speed is very attractive: decompressing a file compressed at the default level takes bzip2 1 minute 22 seconds, but xz only 25 seconds. Nearly three times faster decompression is a big advantage when aggregating logs. If we could choose the compression method over again, we might pick xz, adjusting the level as appropriate to bzi…
After a very fast evaluation, LZ4 has recently been integrated into the Apache project Hadoop - MapReduce. This is important news, since, in my humble opinion, Hadoop is among the most advanced and ambitious projects to date (an opinion which is shared by some). It also serves as an excellent illustration of LZ4 usage, as an in-memory compression algorithm for big server applications. But firs…
Ville Tuulos, Principal Engineer @ AdRoll (ville.tuulos@adroll.com). We faced the key technical challenge of modern Business Intelligence: how do you query tens of billions of events interactively? Our solution, DeliRoll, is implemented in Python. Everyone knows that Python is SLOW. You can't handle big data with low latency in Python! Small benchmark data: 1.5 billion rows, 400 columns, 660 GB. Smaller e…
For many companies, understanding what is going on in the business involves lots of data. But how do you query tens of billions of data points? How can a company begin to make sense of so much information? Ville Tuulos, Principal Engineer at AdRoll, a company producing tons of big data, demonstrates how AdRoll uses Python to squeeze every bit of performance out of a single high-end server. They m…
ORCFile in HDP 2: Better Compression, Better Performance. The upcoming Hive 0.12 is set to bring some great new advancements in the storage layer in the form of higher compression and better query performance. Higher compression: ORCFile was introduced in Hive 0.11 and offered excellent compression, delivered through a number of techniques including run-length encoding and dictionary encoding for string columns…
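Run-length encoding, one of the ORCFile techniques named above, is simple to sketch: collapse each run of repeated values into a (value, count) pair. A hypothetical Python illustration (the function is mine, not Hive's API):

    def rle_encode(values):
        """Collapse runs of equal values into [value, count] pairs."""
        runs = []
        for v in values:
            if runs and runs[-1][0] == v:
                runs[-1][1] += 1  # extend the current run
            else:
                runs.append([v, 1])  # start a new run
        return runs

    print(rle_encode([7, 7, 7, 3, 3, 9]))  # [[7, 3], [3, 2], [9, 1]]

Columns with long runs of repeated values collapse to very few pairs, which is where run-length encoding pays off.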
In byte dictionary encoding, a separate dictionary of unique values is created for each block of column values on disk. (An Amazon Redshift disk block occupies 1 MB.) The dictionary holds up to 256 unique values; the column values are then stored on disk as one-byte indexes into that dictionary. If a block contains more than 256 unique values, the extra values are written into the block in raw, uncompressed form. Th…
Text255 and text32k encodings are useful for compressing VARCHAR columns in which the same words recur often. A separate dictionary of unique words is created for each block of column values on disk. (An Amazon Redshift disk block occupies 1 MB.) The dictionary contains the first 245 unique words in the column; those words are replaced on disk by a one-byte index value representing one of the 245…
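Both dictionary encodings above follow the same per-block pattern: build a small dictionary from the first distinct values seen (up to 256 values for bytedict, 245 words for text255), replace occurrences with one-byte indexes, and store whatever does not fit in raw form. A hedged Python sketch of that pattern; the names and the (tag, value) layout are illustrative, not Redshift's on-disk format:

    def dict_encode_block(values, max_entries=256):
        """Encode one block: the first `max_entries` distinct values get
        one-byte dictionary indexes, later values fall back to raw."""
        dictionary, index, encoded = [], {}, []
        for v in values:
            if v in index:
                encoded.append(("idx", index[v]))
            elif len(dictionary) < max_entries:
                index[v] = len(dictionary)
                dictionary.append(v)
                encoded.append(("idx", index[v]))
            else:
                encoded.append(("raw", v))  # dictionary for this block is full
        return dictionary, encoded

For text255, the same loop would run over the words of each VARCHAR value rather than over whole column values.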
Mostly encodings are useful when the data type for a column is larger than most of the stored values require. By specifying a mostly encoding for this type of column, you can compress the majority of the values in the column to a smaller standard storage size. The remaining values that cannot be compressed are stored in their raw form. For example, you can compress a 16-bit column, such as an INT2…
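The mostly encodings can be sketched the same way: a value that fits the smaller width is stored compactly, and anything out of range falls back to raw storage. An illustrative Python version of the mostly16 idea (again, not Redshift's actual layout):

    def mostly16_encode(values):
        """Store values that fit a signed 16-bit range compactly;
        out-of-range values are kept raw."""
        lo, hi = -(1 << 15), (1 << 15) - 1
        return [("int2", v) if lo <= v <= hi else ("raw", v) for v in values]

    print(mostly16_encode([100, 32768, -5]))
    # [('int2', 100), ('raw', 32768), ('int2', -5)]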
Introduction: Apache HBase is the Hadoop open-source, distributed, versioned storage manager, well suited for random, realtime read/write access. Wait, what? Random, realtime read/write access? How is that possible? Isn't Hadoop just a sequential read/write, batch-processing system? Yes, we're talking about the same thing, and in the next few paragraphs I'm going to explain to you how HBase achieves it…
Delta encodings are very useful for date and time columns. Delta encoding compresses data by recording the difference between values that follow each other in the column. This difference is recorded in a separate dictionary for each block of column values on disk. (An Amazon Redshift disk block occupies 1 MB.) For example, suppose that the column contains 10 integers in sequence from 1 to 10. The first…
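Delta encoding itself is a two-line transform: keep the first value and store only the differences that follow. For the 1-to-10 example above, the column collapses to a run of ones, which then compresses extremely well. A minimal Python sketch:

    def delta_encode(values):
        """Store the first value, then successive differences."""
        return [values[0]] + [b - a for a, b in zip(values, values[1:])]

    def delta_decode(deltas):
        """Rebuild the original values with a running sum."""
        out = [deltas[0]]
        for d in deltas[1:]:
            out.append(out[-1] + d)
        return out

    print(delta_encode(list(range(1, 11))))  # [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]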
Identifying hidden causes of database latency and techniques for improving response @ dbtech showcase Tokyo 2019