A talk on fault injection in distributed systems. Slides used for a presentation at NTT DATA Technology Conference 2017. https://oss.nttdata.com/hadoop/event/201710/index.html
People keep asking why Jepsen is written in Clojure, so I figure it’s worth having a referencable answer. I’ve programmed in something like twenty languages. Why choose a Weird Lisp? Jepsen is built for testing concurrent systems–mostly databases. Because it tests concurrent systems, the language itself needs good support for concurrency. Clojure’s immutable, persistent data structures make it eas
About Jepsen Jepsen is an effort to improve the safety of distributed databases, queues, consensus systems, etc. We maintain an open source software library for systems testing, as well as blog posts and conference talks exploring particular systems’ failure modes. In each analysis we explore whether the system lives up to its documentation’s claims, file new bugs, and suggest recommendations for
This clickable map (adapted from Bailis, Davidson, Fekete et al and Viotti & Vukolic) shows the relationships between common consistency models for concurrent systems. Arrows show the relationship between consistency models. For instance, strict serializable implies both serializability and linearizability, linearizability implies sequential consistency, and so on. Colors show how available each m
Update, 2018-08-24: For a more complete, formal discussion of consistency models, see jepsen.io. Network partitions are going to happen. Switches, NICs, host hardware, operating systems, disks, virtualization layers, and language runtimes, not to mention program semantics themselves, all conspire to delay, drop, duplicate, or reorder our messages. In an uncertain world, we want our software to mai
Previously in Jepsen, we discussed Redis. In this post, we’ll see MongoDB drop a phenomenal amount of data. See also: followup analyses of 2.6.7 and 3.4.0-rc3. MongoDB is a document-oriented database with a similar distribution design to Redis. In a replica set, there exists a single writable primary node which accepts writes, and asynchronously replicates those writes as an oplog to N secondaries
This post covers Elasticsearch 1.1.0. In the months since its publication, Elasticsearch has added a comprehensive overview of correctness issues and their progress towards fixing some of these bugs. Previously, on Jepsen, we saw RabbitMQ throw away a staggering volume of data. In this post, we’ll explore Elasticsearch’s behavior under various types of network failure. Elasticsearch is a distribut
Previously, on Jepsen, we demonstrated stale and dirty reads in MongoDB. In this post, we return to Elasticsearch, which loses data when the network fails, nodes pause, or processes crash. Nine months ago, in June 2014, we saw Elasticsearch lose both updates and inserted documents during transitive, nontransitive, and even single-node network partitions. Since then, folks continue to refer to the
Previously: Hexing the technical interview. In the formless days, long before the rise of the Church, all spells were woven of pure causality, all actions were permitted, and death was common. Many witches were disfigured by their magicks, found crumpled at the center of a circle of twisted, glass-eaten trees, and stones which burned unceasing in the pooling water; some disappeared entirely, or wa
[As of February 23, 2017, CockroachDB Beta Passed Jepsen Testing] We at Cockroach Labs absolutely love Aphyr’s work. We are avid readers of the Jepsen series – which some know as a high quality review of the correctness and consistency claims of modern database systems, but which we really know as “Aphyr’s hunting tales about the highest profile bugs in our industry.” Most of us read each new blog
Some folks have asked whether Cassandra or Riak in last-write-wins mode are monotonically consistent, or whether they can guarantee read-your-writes, and so on. This is a fascinating question, and leads to all sorts of interesting properties about clocks and causality. There are two families of clocks in distributed systems. The first are often termed wall clocks, which correspond roughly to the t
Chronos is a distributed task scheduler (cf. cron) for the Mesos cluster management system. In this edition of Jepsen, we’ll see how simple network interruptions can permanently disrupt a Chronos+Mesos cluster. Chronos relies on Mesos, which has two flavors of node: master nodes, and slave nodes. Ordinarily in Jepsen we’d refer to these as “primary” and “secondary” or “leader” and “follower” to avo
This article is part of Jepsen, a series on network partitions. We’re going to learn about distributed consensus, discuss the CAP theorem’s implications, and demonstrate how different databases behave under partition. Modern software systems are composed of dozens of components which communicate over an asynchronous, unreliable network. Understanding the reliability of a distributed system’s dynam
Previously: Reversing the technical interview. Long ago, on Svalbard, when you were a young witch of forty-three, your mother took your unscarred wrists in her hands, and spoke: Vidrun, born of the sea-wind through the spruce Vidrun, green-tinged offshoot of my bough, joy and burden of my life Vidrun, fierce and clever, may our clan’s wisdom be yours: Never read Hacker News But Hacker News has rea
Hazelcast is a distributed in-memory data grid, providing shared data structures for distributed systems. We show that many of Hazelcast’s distributed data structures are unsafe in the presence of network partitions: updates to maps can be lost, unique IDs may not be unique, atomic objects are not atomic, locks aren’t exclusive, and queues can forget about enqueued elements. Stale and dirty reads
Previously on Jepsen, we explored two-phase commit in Postgres. In this post, we demonstrate Redis losing 56% of writes during a partition. Redis is a fantastic data structure server, typically deployed as a shared heap. It provides fast access to strings, lists, sets, maps, and other structures with a simple text protocol. Since it runs on a single server, and that server is single-threaded, it o
Aerospike is a high-performance distributed document store. Following up on our 2015 analysis, we explored Aerospike’s new strong-consistency mode, which offers linearizable operations on single records. We confirmed two documented flaws in Aerospike’s homegrown replication algorithm. First, it can lose updates when more than k nodes crash (either concurrently or in sequence). Second, when either
In this Jepsen report, we’ll verify RethinkDB’s support for linearizable operations using majority reads and writes, and explore assorted read and write anomalies when consistency levels are relaxed. This work was funded by RethinkDB, and conducted in accordance with the Jepsen ethics policy. RethinkDB is an open-source, horizontally scalable document store. Similar to MongoDB, documents are hiera
A few weeks ago I criticized a proposal by Antirez for a hypothetical linearizable system built on top of Redis WAIT and a strong coordinator. I showed that the coordinator he suggested was physically impossible to build, and that anybody who tried to actually implement that design would run into serious problems. I demonstrated those problems (and additional implementation-specific issues) in an
Dgraph is a distributed graph database which uses Raft for per-shard replication and a custom transactional protocol, based on Omid, Reloaded, for snapshot-isolated cross-shard transactions. Dgraph claimed to offer snapshot isolation, per-client monotonicity, and linearizability. However, in Dgraph 1.0.2 through 1.0.6, we found multiple deadlocks & crashes in the cluster join and node recovery pro
Previously in Jepsen, we discussed MongoDB. Today, we’ll see how last-write-wins in Riak can lead to unbounded data loss. So far we’ve examined systems which aimed for the CP side of the CAP theorem, both with and without failover. We learned that primary-secondary failover is difficult to implement safely (though it can be done; see, for example, ZAB or Raft). Now I’d like to talk about a very di
In April 2015, we discussed stale and dirty reads in MongoDB 2.6.7. However, writes appeared to be safe; update-only workloads with majority write concern were linearizable. This conclusion was not entirely correct. In this Jepsen analysis, we develop new tests which show the MongoDB v0 replication protocol is intrinsically unsafe, allowing the loss of majority-committed documents. In addition, we
Previously: Debugging. In this chapter, we’ll discuss some of Clojure’s mechanisms for polymorphism: writing programs that do different things depending on what kind of inputs they receive. We’ll show ways to write open functions, which can be extended to new conditions later on, without changing their original definitions. Along the way, we’ll investigate Clojure’s type system in more detail–disc
In the last Jepsen post, we found that RethinkDB could lose data when a network partition occurred during cluster reconfiguration. In this analysis, we’ll show that although VoltDB 6.3 claims strict serializability, internal optimizations and bugs lead to stale reads, dirty reads, and even lost updates; fixes are now available in version 6.4. This work was funded by VoltDB, and conducted in accord
In the previous Jepsen analysis of RethinkDB, we tested single-document reads, writes, and conditional writes, under network partitions and process pauses. RethinkDB did not exhibit any nonlinearizable histories in those tests. However, testing with more aggressive failure modes, on both 2.1.5 and 2.2.3, has uncovered a subtle error in Rethink’s cluster membership system. This error can lead to st
In the last Jepsen analysis, we saw that RethinkDB 2.2.3 could encounter spectacular failure modes due to cluster reconfiguration during a partition. In this analysis, we’ll talk about Crate, and find out just how many versions a row’s version identifies. Crate is a shared-nothing, “infinitely scalable”, eventually-consistent SQL database built on Elasticsearch. Because Elasticsearch has and conti
There’s a neat kind of symmetry here: P1 and P2 are duals of each other, preventing a read from seeing an uncommitted write, and preventing a write from clobbering an uncommitted read, respectively. P0 prevents two writes from stepping on each other, and we could imagine its dual r1(x) … r2(x)–but since reads don’t change the value of x they commute, and we don’t need to prevent them from interlea
Get Tested You can hire Jepsen to analyze a database, queue, or other kind of system. Jepsen also offers training and consulting to help you build and extend your own tests. Techniques Jepsen occupies a particular niche of the correctness testing landscape. We emphasize: Opaque-box systems testing: we evaluate real binaries running on real clusters. This allows us to test systems without access to
Earlier versions of Jepsen found glaring inconsistencies, but missed subtle ones. In particular, Jepsen was not well equipped to distinguish linearizable systems from sequentially or causally consistent ones. When people asked me to analyze systems which claimed to be linearizable, Jepsen could rule out obvious classes of behavior, like dropping writes, but couldn’t tell us much more than that. Si
In response to You Do It Too: Forfeiting Partition Tolerance in Distributed Systems, I’d like to remind folks of a few things around CAP. Partition intolerance does not mean that partitions cannot happen, it means partitions are not supported. Specifically, partition-intolerant systems must sacrifice invariants when partitions occur. Which invariants? By Gilbert & Lynch, either the system allows n
Previously, on Jepsen, we reviewed Elasticsearch’s progress in addressing data-loss bugs during network partitions. Today, we’ll see Aerospike 3.5.4, an “ACID database”, react violently to a basic partition. [Update, 2018-03-07] See the followup analysis of 3.99.0.3 Aerospike is a high-performance, distributed, schema-less, KV store, often deployed in caching, analytics, or ad tech environments. I