In computer science, fault injection is a testing technique for understanding how computing systems behave when stressed in unusual ways. This can be achieved using physical- or software-based means, or using a hybrid approach.[1] Widely studied physical fault injections include the application of high voltages, extreme temperatures and electromagnetic pulses on electronic components, such as comp
Previously in Jepsen, we discussed MongoDB. Today, we’ll see how last-write-wins in Riak can lead to unbounded data loss. So far we’ve examined systems which aimed for the CP side of the CAP theorem, both with and without failover. We learned that primary-secondary failover is difficult to implement safely (though it can be done; see, for example, ZAB or Raft). Now I’d like to talk about a very di
Hazelcast is a distributed in-memory data grid, providing shared data structures for distributed systems. We show that many of Hazelcast’s distributed data structures are unsafe in the presence of network partitions: updates to maps can be lost, unique IDs may not be unique, atomic objects are not atomic, locks aren’t exclusive, and queues can forget about enqueued elements. Stale and dirty reads
Previously, on Jepsen, we reviewed Elasticsearch’s progress in addressing data-loss bugs during network partitions. Today, we’ll see Aerospike 3.5.4, an “ACID database”, react violently to a basic partition. [Update, 2018-03-07] See the followup analysis of 3.99.0.3 Aerospike is a high-performance, distributed, schema-less, KV store, often deployed in caching, analytics, or ad tech environments. I
People keep asking why Jepsen is written in Clojure, so I figure it’s worth having a referencable answer. I’ve programmed in something like twenty languages. Why choose a Weird Lisp? Jepsen is built for testing concurrent systems–mostly databases. Because it tests concurrent systems, the language itself needs good support for concurrency. Clojure’s immutable, persistent data structures make it eas
People keep asking why Jepsen is written in Clojure, so I figure it’s worth having a referencable answer. I’ve programmed in something like twenty languages. Why choose a Weird Lisp? Jepsen is built for testing concurrent systems–mostly databases. Because it tests concurrent systems, the language itself needs good support for concurrency. Clojure’s immutable, persistent data structures make it eas
In response to You Do It Too: Forfeiting Partition Tolerance in Distributed Systems, I’d like to remind folks of a few things around CAP. Partition intolerance does not mean that partitions cannot happen, it means partitions are not supported. Specifically, partition-intolerant systems must sacrifice invariants when partitions occur. Which invariants? By Gilbert & Lynch, either the system allows n
Chronos is a distributed task scheduler (cf. cron) for the Mesos cluster management system. In this edition of Jepsen, we’ll see how simple network interruptions can permanently disrupt a Chronos+Mesos cluster Chronos relies on Mesos, which has two flavors of node: master nodes, and slave nodes. Ordinarily in Jepsen we’d refer to these as “primary” and “secondary” or “leader” and “follower” to avo
There’s a neat kind of symmetry here: P1 and P2 are duals of each other, preventing a read from seeing an uncommitted write, and preventing a write from clobbering an uncommitted read, respectively. P0 prevents two writes from stepping on each other, and we could imagine its dual r1(x) … r2(x)–but since reads don’t change the value of x they commute, and we don’t need to prevent them from interlea
Update, 2018-08-24: For a more complete, formal discussion of consistency models, see jepsen.io. Network partitions are going to happen. Switches, NICs, host hardware, operating systems, disks, virtualization layers, and language runtimes, not to mention program semantics themselves, all conspire to delay, drop, duplicate, or reorder our messages. In an uncertain world, we want our software to mai
In this Jepsen report, we’ll verify RethinkDB’s support for linearizable operations using majority reads and writes, and explore assorted read and write anomalies when consistency levels are relaxed. This work was funded by RethinkDB, and conducted in accordance with the Jepsen ethics policy. RethinkDB is an open-source, horizontally scalable document store. Similar to MongoDB, documents are hiera
Percona’s CTO Vadim Tkachenko wrote a response to my Galera Snapshot Isolation post last week. I think Tkachenko may have misunderstood some of my results, and I’d like to clear those up now. I’ve ported the MariaDB tests to Percona XtraDB Cluster, and would like to confirm that using exclusive write locks on all reads, as Tkachenko recommends, can recover serializable histories. Finally, we’ll ad
People keep asking why Jepsen is written in Clojure, so I figure it’s worth having a referencable answer. I’ve programmed in something like twenty languages. Why choose a Weird Lisp? Jepsen is built for testing concurrent systems–mostly databases. Because it tests concurrent systems, the language itself needs good support for concurrency. Clojure’s immutable, persistent data structures make it eas
In the previous Jepsen analysis of RethinkDB, we tested single-document reads, writes, and conditional writes, under network partitions and process pauses. RethinkDB did not exhibit any nonlinearizable histories in those tests. However, testing with more aggressive failure modes, on both 2.1.5 and 2.2.3, has uncovered a subtle error in Rethink’s cluster membership system. This error can lead to st
Some folks have asked whether Cassandra or Riak in last-write-wins mode are monotonically consistent, or whether they can guarantee read-your-writes, and so on. This is a fascinating question, and leads to all sorts of interesting properties about clocks and causality. There are two families of clocks in distributed systems. The first are often termed wall clocks, which correspond roughly to the t
RabbitMQ is a distributed message queue, and is probably the most popular open-source implementation of the AMQP messaging protocol. It supports a wealth of durability, routing, and fanout strategies, and combines excellent documentation with well-designed protocol extensions. I’d like to set all these wonderful properties aside for a few minutes, however, to talk about using your queue as a lock
リリース、障害情報などのサービスのお知らせ
最新の人気エントリーの配信
処理を実行中です
j次のブックマーク
k前のブックマーク
lあとで読む
eコメント一覧を開く
oページを開く