issues.apache.org
Currently, Lookup plugins [1] don't support JNDI resources. It would be really convenient to support JNDI resource lookup in the configuration. One use case for a JNDI lookup plugin is as follows: I'd like to use RoutingAppender [2] to put all the logs from the same web application context in a log file (a log file per web application context), and I want to use JNDI resource lookup to determine…
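As a rough illustration of what such a plugin would do, the sketch below performs a plain JNDI resource lookup; the resource name java:comp/env/logging/context-name is a hypothetical example, not part of the proposal.

    import javax.naming.Context;
    import javax.naming.InitialContext;
    import javax.naming.NamingException;

    // Minimal sketch of the lookup such a plugin would perform. Each web
    // application would bind its own value under java:comp/env, giving a
    // per-context log file name.
    public final class JndiLookupSketch {
        public static String lookup(String jndiName) {
            try {
                Context ctx = new InitialContext();
                Object value = ctx.lookup(jndiName);
                return value == null ? null : value.toString();
            } catch (NamingException e) {
                return null; // resource not bound; caller falls back to a default
            }
        }

        public static void main(String[] args) {
            System.out.println(lookup("java:comp/env/logging/context-name"));
        }
    }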
Hi Legal, There's a hypothetical question on the Apache Cassandra mailing list about potentially expanding Cassandra's storage to be pluggable, specifically using RocksDB. RocksDB has a 3-clause BSD license (https://github.com/facebook/rocksdb/blob/master/LICENSE) and a patent grant (https://github.com/facebook/rocksdb/blob/master/PATENTS). I know the 3-clause BSD license is fine, but is the w…
API Evolution in Spark 2.0
This document describes the high-level API changes in Spark 2.0, as well as the relationships between the important top-level classes. This document does not discuss APIs in MLlib.
Author: Reynold Xin <rxin@databricks.com>, Matei Zaharia <matei@databricks.com>
History:
20160229: first draft
20160302: update data source compatibility section about using class loade…
With InvokerTransformer, serializable collections can be built that execute arbitrary Java code. sun.reflect.annotation.AnnotationInvocationHandler#readObject invokes #entrySet and #get on a deserialized collection. If you have an endpoint that accepts serialized Java objects (JMX, RMI, remote EJB, ...), you can combine the two to create an arbitrary remote code execution vulnerability. I don't know of…
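A minimal sketch of the transformer-chain primitive the report describes, assuming commons-collections 3.x on the classpath; a benign System.getProperty call stands in for the arbitrary method invocation an attacker would control.

    import org.apache.commons.collections.Transformer;
    import org.apache.commons.collections.functors.ChainedTransformer;
    import org.apache.commons.collections.functors.ConstantTransformer;
    import org.apache.commons.collections.functors.InvokerTransformer;

    // Each InvokerTransformer reflectively invokes a method on its input, so
    // chaining them turns a single transform() call into attacker-chosen
    // reflection. Here the chain merely reads a system property.
    public final class TransformerChainSketch {
        public static void main(String[] args) {
            Transformer chain = new ChainedTransformer(new Transformer[] {
                new ConstantTransformer(System.class),
                new InvokerTransformer("getMethod",
                    new Class[] { String.class, Class[].class },
                    new Object[] { "getProperty", new Class[] { String.class } }),
                new InvokerTransformer("invoke",
                    new Class[] { Object.class, Object[].class },
                    new Object[] { null, new Object[] { "java.version" } }),
            });
            // When a deserialized collection calls transform() internally
            // (e.g. via AnnotationInvocationHandler#readObject), this runs
            // without the receiving code ever naming the method.
            System.out.println(chain.transform(null));
        }
    }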
After tackling the general k-Nearest Neighbor model as per https://issues.apache.org/jira/browse/SPARK-2335, there's an opportunity to also offer approximate k-Nearest Neighbor. A promising approach would involve building a kd-tree variant within each partition, a la http://www.autonlab.org/autonweb/14714.html?branch=1&language=2 This could offer a simple non-linear ML model that can label n…
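A minimal sketch of the partition-local pattern this suggests, written against the Spark 2.x Java API: each partition returns its k best candidates (a kd-tree would replace the brute-force sort here), and the driver merges them globally. Names and the distance function are illustrative only.

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;
    import org.apache.spark.api.java.JavaRDD;

    public final class ApproxKnnSketch {
        // Squared Euclidean distance; monotonic, so fine for ranking.
        static double dist(double[] a, double[] b) {
            double s = 0;
            for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
            return s;
        }

        public static List<double[]> knn(JavaRDD<double[]> points, double[] q, int k) {
            // Per-partition top-k: a local index (kd-tree in the proposal)
            // would avoid materializing and sorting the whole partition.
            List<double[]> candidates = new ArrayList<>(points.mapPartitions(it -> {
                List<double[]> local = new ArrayList<>();
                it.forEachRemaining(local::add);
                local.sort(Comparator.comparingDouble(p -> dist(p, q)));
                return local.subList(0, Math.min(k, local.size())).iterator();
            }).collect());
            // Global merge of at most k * numPartitions candidates.
            candidates.sort(Comparator.comparingDouble(p -> dist(p, q)));
            return candidates.subList(0, Math.min(k, candidates.size()));
        }
    }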
Based on our observations, the majority of Spark workloads are not bottlenecked by I/O or network, but rather by CPU and memory. This project focuses on 3 areas to improve the efficiency of memory and CPU for Spark applications, to push performance closer to the limits of the underlying hardware. Memory Management and Binary Processing: avoiding non-transient Java objects (store them in binary format), whi…
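As a toy illustration of the binary-format idea (not Spark's actual layout), the sketch below packs fixed-width records into one off-heap buffer instead of allocating a Java object per record, eliminating per-object headers and GC pressure.

    import java.nio.ByteBuffer;

    // Toy binary row store: each record is a fixed-width (int key, long value)
    // pair packed at a computed offset, so N records cost one allocation.
    public final class BinaryRecordStore {
        private static final int RECORD_SIZE = 4 + 8;
        private final ByteBuffer buf;

        public BinaryRecordStore(int capacity) {
            buf = ByteBuffer.allocateDirect(capacity * RECORD_SIZE); // off-heap
        }

        public void put(int index, int key, long value) {
            int base = index * RECORD_SIZE;
            buf.putInt(base, key);
            buf.putLong(base + 4, value);
        }

        public long valueAt(int index) {
            return buf.getLong(index * RECORD_SIZE + 4);
        }
    }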
The underlying abstraction for blocks in Spark is a ByteBuffer, which limits the size of a block to 2GB. This has implications not just for managed blocks in use, but also for shuffle blocks (memory-mapped blocks are limited to 2GB even though the API allows for long), ser/deser via byte-array-backed output streams (SPARK-1391), etc. This is a severe limitation for use of Spark on non-tr…
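A minimal sketch of the standard workaround: present one logical block as a sequence of ByteBuffers so its total size can exceed 2GB. The class name is illustrative, not Spark's.

    import java.nio.ByteBuffer;

    // Logical block backed by multiple chunks; size is a long even though
    // each underlying ByteBuffer is limited to Integer.MAX_VALUE bytes.
    public final class ChunkedBlock {
        private final ByteBuffer[] chunks;
        private final long size;

        public ChunkedBlock(ByteBuffer[] chunks) {
            this.chunks = chunks;
            long total = 0;
            for (ByteBuffer b : chunks) total += b.remaining();
            this.size = total;
        }

        public long size() { return size; }

        public byte get(long pos) {
            // Walk the chunks to find the one containing the absolute position.
            long offset = pos;
            for (ByteBuffer b : chunks) {
                if (offset < b.remaining()) return b.get(b.position() + (int) offset);
                offset -= b.remaining();
            }
            throw new IndexOutOfBoundsException(Long.toString(pos));
        }
    }

Memory-mapping a large file fits the same shape: map each region into its own buffer and let the wrapper do the arithmetic.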
We normally configure spark.port.maxRetries in a properties file or SparkConf, but Utils.scala reads it from SparkEnv's conf. Since SparkEnv is an object whose env needs to be set after the JVM is launched, and Utils.scala is also an object, in most cases portMaxRetries will get the default value of 16.
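A sketch of the fix direction: read the value from an explicitly passed SparkConf rather than the SparkEnv singleton, with 16 mirroring the default mentioned above.

    import org.apache.spark.SparkConf;

    // Reading from the caller's SparkConf honors values set in a properties
    // file or programmatically, instead of silently falling back to 16.
    public final class PortRetries {
        public static int portMaxRetries(SparkConf conf) {
            return conf.getInt("spark.port.maxRetries", 16);
        }
    }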
HDFS currently has no support for managing or exposing in-memory caches at datanodes. This makes it harder for higher-level application frameworks like Hive, Pig, and Impala to use cluster memory effectively, because they cannot explicitly cache important datasets or place their tasks for memory locality.
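For reference, the centralized cache management API that eventually came out of this work lets clients pin paths explicitly. A rough sketch against the Hadoop 2.3+ client API; the pool name and path are illustrative.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.DistributedFileSystem;
    import org.apache.hadoop.hdfs.protocol.CacheDirectiveInfo;

    // Pin a hot dataset into datanode memory so frameworks like Hive or
    // Impala can rely on memory locality for it.
    public final class CachePinSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            DistributedFileSystem dfs =
                (DistributedFileSystem) FileSystem.get(conf);
            long id = dfs.addCacheDirective(new CacheDirectiveInfo.Builder()
                .setPath(new Path("/warehouse/hot_table")) // illustrative path
                .setPool("analytics")                      // illustrative pool
                .build());
            System.out.println("cache directive id: " + id);
        }
    }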
HBase Tier-Based Compaction, by Akashnil Dutta
1. Overview
The goal of the compaction selection algorithm is to schedule compactions efficiently. The current algorithm takes a set of candidate files as input and produces a subset as output. If there are no eligible compactions, the output set can be empty. The candidate set is made up of all the files in one region which are not already scheduled for…
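To make that input/output contract concrete, here is an illustrative ratio-based selection in the spirit described, not HBase's actual code: given candidate file sizes, return a possibly empty subset to compact.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    // Greedy ratio-based selection: include a file while it is not much
    // larger than the files already selected; require a minimum count so
    // tiny selections (which waste I/O) come back empty.
    public final class CompactionSelectionSketch {
        public static List<Long> select(List<Long> candidateSizes,
                                        double ratio, int minFiles) {
            List<Long> sorted = new ArrayList<>(candidateSizes);
            Collections.sort(sorted);
            List<Long> selected = new ArrayList<>();
            long sum = 0;
            for (long size : sorted) {
                if (selected.isEmpty() || size <= ratio * sum) {
                    selected.add(size);
                    sum += size;
                } else {
                    break;
                }
            }
            return selected.size() >= minFiles
                ? selected : Collections.emptyList();
        }
    }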
There are several limitations of the current RC File format that I'd like to address by creating a new format. Each column value is stored as a binary blob, which means:
- the entire column value must be read, decompressed, and deserialized
- the file format can't use smarter type-specific compression
- push-down filters can't be evaluated
- the start of each row group needs to be found by scanning user m…
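One of these gaps, illustrated: if a format keeps per-row-group statistics, a push-down filter can skip whole groups without decompressing them. A toy sketch, not the actual format:

    // With min/max kept per row group, a predicate like "x > 100" only
    // reads groups whose max exceeds 100; an RC-style opaque blob forces a
    // full read instead.
    public final class RowGroupStats {
        final long min;
        final long max;

        RowGroupStats(long min, long max) {
            this.min = min;
            this.max = max;
        }

        boolean mayMatchGreaterThan(long threshold) {
            return max > threshold;
        }
    }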
Computing aggregates over a cube of several dimensions is a common operation in data warehousing. The standard SQL syntax is "GROUP relation BY dim1, dim2, dim3 WITH CUBE" – which, in addition to the full dim1-dim2-dim3 grouping, produces aggregations for every subset of the dimensions: just dim1, just dim1 and dim2, etc. NULL is generally used to represent "all". A presentation by Arnab Nandi describes how one might implement efficient cubing in Ma…
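Concretely, the cube is one aggregation per subset of the grouping dimensions, 2^n groupings for n dimensions. A toy enumeration:

    // Enumerate the 2^n groupings of a CUBE over n dimensions; bit i of
    // the mask says whether dims[i] participates in that grouping.
    public final class CubeGroupings {
        public static void main(String[] args) {
            String[] dims = { "dim1", "dim2", "dim3" };
            for (int mask = 0; mask < (1 << dims.length); mask++) {
                StringBuilder g = new StringBuilder("(");
                for (int i = 0; i < dims.length; i++) {
                    if ((mask & (1 << i)) != 0) {
                        if (g.length() > 1) g.append(", ");
                        g.append(dims[i]);
                    }
                }
                System.out.println(g.append(")")); // "()" is the grand total
            }
        }
    }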
According to several benchmark sites, LZ4 seems to overtake other fast compression algorithms, especially in decompression speed. The interface is also trivial to integrate (http://code.google.com/p/lz4/source/browse/trunk/lz4.h), and there is no license issue.
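For a sense of the integration surface on the JVM, a round trip through the lz4-java binding (one common wrapper; the net.jpountz.lz4 API is assumed here):

    import java.nio.charset.StandardCharsets;
    import net.jpountz.lz4.LZ4Compressor;
    import net.jpountz.lz4.LZ4Factory;
    import net.jpountz.lz4.LZ4FastDecompressor;

    // Compress and restore a buffer; the decompressor needs the original
    // length, which a container format would record alongside the data.
    public final class Lz4RoundTrip {
        public static void main(String[] args) {
            byte[] data = "hello hello hello hello".getBytes(StandardCharsets.UTF_8);
            LZ4Factory factory = LZ4Factory.fastestInstance();
            LZ4Compressor compressor = factory.fastCompressor();
            byte[] compressed = compressor.compress(data);
            LZ4FastDecompressor decompressor = factory.fastDecompressor();
            byte[] restored = decompressor.decompress(compressed, data.length);
            System.out.println(new String(restored, StandardCharsets.UTF_8));
        }
    }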
Now I want to add complete cost-based optimization to Hive, but when I began the work I found it very difficult to do with the current Hive optimization framework. In the current Hive code, optimizations are all done after generating the DAG of operators. It is an awful design and makes me mad. For example, the map-side optimization scans the whole operator DAG and tries to find the operators that…
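A toy contrast with that rule-scanning approach: a cost-based optimizer estimates the cost of each candidate plan and picks the cheapest, rather than pattern-matching the operator DAG. Names and the cost model here are invented for illustration.

    import java.util.Comparator;
    import java.util.List;

    // Pick the cheapest plan by an estimated cost, instead of rewriting a
    // fixed pattern found by scanning the operator DAG.
    public final class CostBasedChoiceSketch {
        interface Plan {
            double estimatedCost(); // e.g. rows scanned + shuffle bytes
        }

        static Plan choose(List<Plan> candidates) {
            return candidates.stream()
                .min(Comparator.comparingDouble(Plan::estimatedCost))
                .orElseThrow(IllegalStateException::new);
        }
    }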
We need a totem for our t-shirt that is yet to be printed. O'Reilly owns the Clydesdale. We need something else. We could have a fluffy little duck that quacks 'hbase!' when you squeeze it, and we could order boxes of them from some off-shore sweatshop that subcontracts to a contractor who employs child labor only..... Or we could have an Orca (Big! Fast! Killer!, and in a poem that Marcy from Sal…
Atilika Inc. (アティリカ株式会社) would like to donate the Kuromoji Japanese morphological analyzer to the Apache Software Foundation in the hope that it will be useful to Lucene and Solr users in Japan and elsewhere. The project was started in 2010 since we couldn't find any high-quality, actively maintained and easy-to-use Java-based Japanese morphological analyzers, and these became many of our design g…
Using the Apache Lucene library we can add freetext search to HBase. The advantages of this are:
- HBase is highly scalable and distributed
- HBase is realtime
- Lucene is a fast inverted index and will soon be realtime (see LUCENE-2312)
- Lucene offers many types of queries not currently available in HBase (e.g., AND, OR, NOT, phrase, etc.)
- it's easier to build scalable realtime systems on top of already ar…
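A minimal sketch of the Lucene side, indexing one HBase cell value under its row key (Lucene 8+ API; the field names and row key are made up):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.StringField;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.ByteBuffersDirectory;
    import org.apache.lucene.store.Directory;

    // Index a cell's text under its row key, so a freetext query can map
    // back to the HBase row holding the original data.
    public final class HBaseLuceneSketch {
        public static void main(String[] args) throws Exception {
            Directory dir = new ByteBuffersDirectory();
            try (IndexWriter writer =
                     new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
                Document doc = new Document();
                doc.add(new StringField("row", "user123", Field.Store.YES));
                doc.add(new TextField("content", "free text from an HBase cell",
                                      Field.Store.NO));
                writer.addDocument(doc);
            }
        }
    }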
Limited join functionality for Solr, mapping one set of IDs matching a query to another set of IDs, based on the indexed tokens of the fields. Example: fq={!join from=parent_ptr to=parent_id}child_doc:query
Append/Hflush/Read Design
Hairong Kuang, Konstantin Shvachko, Nicholas Sze, Sanjay Radia, Robert Chansler (Yahoo! HDFS team), 08/06/2009
1. Design challenges
With hflush, HDFS needs to make the last block of an unclosed file visible to readers. This presents two challenges:
1. Read consistency. At a given time, different replicas of the last block may have different numbers of bytes. What read consis…
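An illustrative simplification of that read-consistency constraint: a reader should only trust bytes that every live replica of the last block already has.

    // The safely readable length of an in-progress last block is bounded by
    // the minimum length across its replicas (a simplification of the
    // design's visible-length negotiation).
    public final class VisibleLength {
        public static long readableLength(long[] replicaLengths) {
            long min = Long.MAX_VALUE;
            for (long len : replicaLengths) {
                min = Math.min(min, len);
            }
            return replicaLengths.length == 0 ? 0 : min;
        }
    }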
Apache currently hosts two different issue tracking systems, Bugzilla and Jira. Projects may use GitHub for issue tracking instead. To find out how to report an issue for a particular project, please visit the project resource listing or check the relevant project website.
- Bugzilla
- Bugzilla (SpamAssassin)
- Bugzilla (OpenOffice)
- Jira
The goal is to run all TPC-H (http://www.tpc.org/tpch/) benchmark queries on Hive, for two reasons. First, through those queries, we would like to find the new features that we need to put into Hive so that Hive supports common SQL queries. Second, we would like to measure the performance of Hive to find out what Hive is not good at. We can then improve Hive based on that information. For queries…
This is a proposal for a system specialized in running Hadoop/Pig jobs in a control dependency DAG (Directed Acyclic Graph), a Hadoop workflow application. Attached are a complete specification and a high-level overview presentation. Highlights: a workflow application is a DAG that coordinates the following types of actions: Hadoop, Pig, Ssh, Http, Email and sub-workflows. Flow control operations…
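To make "control dependency DAG" concrete, a toy topological execution of actions, where an action runs only after everything it depends on has finished (Kahn's algorithm; action names and structure invented for illustration):

    import java.util.ArrayDeque;
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.Deque;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Order actions so each runs only after its dependencies: "deps" maps
    // an action name to the actions it must wait for.
    public final class DagOrderSketch {
        public static List<String> order(Map<String, List<String>> deps) {
            Map<String, Integer> pendingCount = new HashMap<>();
            Map<String, List<String>> dependents = new HashMap<>();
            for (Map.Entry<String, List<String>> e : deps.entrySet()) {
                pendingCount.putIfAbsent(e.getKey(), 0);
                for (String dep : e.getValue()) {
                    pendingCount.merge(e.getKey(), 1, Integer::sum);
                    pendingCount.putIfAbsent(dep, 0);
                    dependents.computeIfAbsent(dep, k -> new ArrayList<>())
                              .add(e.getKey());
                }
            }
            Deque<String> ready = new ArrayDeque<>();
            for (Map.Entry<String, Integer> e : pendingCount.entrySet()) {
                if (e.getValue() == 0) ready.add(e.getKey());
            }
            List<String> runOrder = new ArrayList<>();
            while (!ready.isEmpty()) {
                String action = ready.remove();
                runOrder.add(action); // the engine would execute it here
                for (String next : dependents.getOrDefault(action,
                                                           Collections.emptyList())) {
                    if (pendingCount.merge(next, -1, Integer::sum) == 0) {
                        ready.add(next);
                    }
                }
            }
            return runOrder; // fewer entries than pendingCount means a cycle
        }
    }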