Thursday, March 21, 2024

Grokking The Advanced System Design Interview


Data Transfer Happens Directly Between The Client And DataNodes


All blocks of a file are of the same size except the last one. HDFS uses large block sizes because it is designed to store extremely large files and to enable MapReduce jobs to process them efficiently. Each block is identified by a unique 64-bit ID called a BlockID. All read/write operations in HDFS operate at the block level. DataNodes store each block in a separate file on the local file system and provide read/write access. When a DataNode starts up, it scans through its local file system and sends the list of hosted data blocks to the NameNode.

The NameNode maintains two on-disk data structures to store the file system's state: an FsImage file and an EditLog. FsImage is a checkpoint of the file system metadata at some point in time, while the EditLog is a log of all file system metadata transactions since the image file was last created. These two files help the NameNode recover from failure. User applications interact with HDFS through its client. The HDFS client interacts with the NameNode for metadata, but all data transfers happen directly between the client and the DataNodes. To achieve high availability, HDFS creates multiple copies of the data and distributes them on nodes throughout the cluster.
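To make the split between metadata traffic and data traffic concrete, here is a minimal, self-contained sketch of the read path. The NameNode and DataNode classes below are in-memory stand-ins for illustration, not the real HDFS client API:

```python
# Sketch of the HDFS read path: the client asks the NameNode for block
# locations (metadata only), then fetches each block directly from a
# DataNode. All classes are illustrative in-memory fakes.

class DataNode:
    def __init__(self):
        self.blocks = {}          # block_id -> bytes

    def read_block(self, block_id):
        return self.blocks[block_id]

class NameNode:
    def __init__(self):
        self.files = {}           # path -> [(block_id, [DataNode, ...])]

    def get_block_locations(self, path):
        return self.files[path]   # metadata only; never the file data

def read_file(namenode, path):
    data = bytearray()
    for block_id, replicas in namenode.get_block_locations(path):
        for dn in replicas:       # try replicas until one works
            try:
                data += dn.read_block(block_id)
                break
            except KeyError:
                continue          # replica lost the block; try the next
        else:
            raise IOError(f"no live replica for block {block_id}")
    return bytes(data)

# Usage: one file, two blocks, replication factor 2.
dn1, dn2 = DataNode(), DataNode()
dn1.blocks[1] = b"hello "; dn2.blocks[1] = b"hello "
dn1.blocks[2] = b"world";  dn2.blocks[2] = b"world"
nn = NameNode()
nn.files["/demo.txt"] = [(1, [dn1, dn2]), (2, [dn2, dn1])]
print(read_file(nn, "/demo.txt"))  # b'hello world'
```

Note that the NameNode never sits on the data path; it only answers the "which blocks, on which nodes" question.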

[Figure: HDFS block replication with replication factor = 3. Each block, identified by its BlockID, is stored as a file on the local Linux file system of several DataNodes.]

Comparison between GFS and HDFS

What Is Gossip Protocol?

In a Dynamo cluster, since we do not have any central node that keeps track of all nodes to know if a node is down or not, how does a node know every other node's current state? The simplest way to do this is to have every node maintain heartbeats with every other node. When a node goes down, it will stop sending out heartbeats, and everyone else will find out immediately. But then O(n²) messages get sent every tick (with 1,000 nodes, that is on the order of a million messages per tick), which is a ridiculously high amount and not feasible in any sizable cluster.

Dynamo uses a gossip protocol that enables each node to keep track of state information about the other nodes in the cluster, like which nodes are reachable, what key ranges they are responsible for, and so on. Nodes share state information with each other to stay in sync. Gossip protocol is a peer-to-peer communication mechanism in which nodes periodically exchange state information about themselves and other nodes they know about. Each node initiates a gossip round every second to exchange state information about itself and other nodes with one other random node. This means that any new event will eventually propagate through the system, and all nodes quickly learn about all other nodes in a cluster.

[Figure: Gossip protocol. Every second, each server exchanges information about all the servers it knows about with one randomly selected server.]
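To make the mechanics concrete, here is a toy gossip round in Python; the versioned node states and the full-view merge rule are illustrative assumptions rather than Dynamo's actual message format:

```python
import random

# Toy gossip round. Each node keeps a map of node_id -> (version, state);
# once per "second", every node merges views with one random peer.

class Node:
    def __init__(self, node_id):
        self.id = node_id
        self.view = {node_id: (0, "alive")}    # node_id -> (version, state)

    def gossip_with(self, peer):
        # Exchange views; for each node, keep the higher-version entry.
        merged = {}
        for node_id in set(self.view) | set(peer.view):
            a = self.view.get(node_id, (-1, None))
            b = peer.view.get(node_id, (-1, None))
            merged[node_id] = a if a[0] >= b[0] else b
        self.view = dict(merged)
        peer.view = dict(merged)

nodes = [Node(i) for i in range(5)]
for tick in range(5):                          # a few one-second rounds
    for node in nodes:
        peer = random.choice([n for n in nodes if n is not node])
        node.gossip_with(peer)

# Almost surely True after a few rounds: everyone knows everyone.
print(all(len(n.view) == len(nodes) for n in nodes))
```

After a few rounds, every node's view almost surely contains every other node, at a cost of O(n) exchanges per tick rather than O(n²) heartbeats.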


Cache: GFS Clients Cache Metadata, While HDFS Uses A Distributed Cache

Replication Strategy
• GFS: Chunk replicas are spread across the racks, and the Master automatically replicates the chunks. By default, three copies of each chunk are stored, and the user can specify a different replication factor. The Master re-replicates a chunk replica as soon as the number of available replicas falls below a user-specified goal.
• HDFS: HDFS has an automatic rack-aware replication system. By default, two copies of each block are stored at two different DataNodes in the same rack, and a third copy is stored on a DataNode in a different rack. The user can specify a different replication factor.

File System Namespace
• GFS: Files are organized hierarchically in directories and identified by pathnames.
• HDFS: HDFS supports a traditional hierarchical file organization. Users or applications can create directories and store files inside them.

Database
• GFS: Bigtable uses GFS as its storage engine.
• HDFS: HBase uses HDFS as its storage engine.


System Design Interviews Aren't Easy

Now more than ever, you're faced with the seemingly impossible:

Interviewers want you to solve challenging problems in record time and without error.

It seems like a daunting task, but the truth is:

It's not impossible.

With the help of a course and plenty of practice, you can do it.

This post contains affiliate links. We may receive compensation if you buy something. Read our disclosure for more details.

Grokking the Advanced System Design Interview (Educative.io)

If you're properly prepared for these FAANG interviews, you can show off your design skills and demonstrate that you can work with complex systems.

Comparing Consistent Hashing Ring With And Without Vnodes


Practically, Vnodes are randomly distributed across the cluster and are generally non-contiguous, so that no two neighboring Vnodes are assigned to the same physical node. Furthermore, nodes do carry replicas of other nodes for fault tolerance. Also, since there can be heterogeneous machines in the clusters, some servers might hold more Vnodes than others. The figure below shows how physical nodes A, B, C, D, and E use Vnodes of the consistent hash ring. Each physical node is assigned a set of Vnodes, and each Vnode is replicated once.

[Figure: consistent hash ring with Vnodes. Servers A, B, C, D, and E each own a set of non-contiguous Vnodes on the ring.]
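Here is a minimal sketch of such a ring, assuming MD5 as the hash function and eight Vnodes per server; both are arbitrary choices for the example, not taken from any particular system:

```python
import bisect
import hashlib

# Minimal consistent-hash ring with virtual nodes (Vnodes).

def ring_hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, servers, vnodes_per_server=8):
        # Each server contributes several points ("Vnodes") on the ring.
        self.ring = sorted(
            (ring_hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes_per_server)
        )
        self.points = [point for point, _ in self.ring]

    def owner(self, key: str) -> str:
        # The first Vnode clockwise from the key's hash owns the key.
        idx = bisect.bisect(self.points, ring_hash(key)) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["A", "B", "C", "D", "E"])
print(ring.owner("user:42"))   # one of the servers, e.g. 'C'
```

Because each server contributes many scattered Vnodes, adding or removing a server only remaps the keys of that server's own Vnodes instead of shifting every node's range.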


Coordinator Stores A Hint For Server 4


Now when the node which was down comes online again, how should we write data to it? Cassandra accomplishes this through hinted handoff.

When a node is down or does not respond to a write request, the coordinator node writes a hint in a text file on the local disk. This hint contains the data itself along with information about which node the data belongs to. When the coordinator node discovers from the Gossiper that a node for which it holds hints has recovered, it forwards the write requests for each hint to the target. Furthermore, every ten minutes, each node checks to see if the failing node, for which it is holding any hints, has recovered.

With consistency level Any, if all the replica nodes are down, the coordinator node will write the hints for all the nodes and report success to the client. However, this data will not reappear in any subsequent reads until one of the replica nodes comes back online and the coordinator node successfully forwards the write requests to it. This is assuming that the coordinator node is up when the replica node comes back. This also means that we can lose our data if the coordinator node dies and never comes back. For this reason, we should avoid using the Any consistency level.

One thing to remember: when the cluster cannot meet the consistency level specified by the client, Cassandra fails the write request and does not store a hint.
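Putting the last few paragraphs together, here is a toy coordinator that stores hints for unreachable replicas and replays them on recovery. The classes and the in-memory hint list are illustrative; Cassandra writes hints to the coordinator's local disk and replays them when gossip reports the target as up:

```python
# Toy coordinator-side hinted handoff.

class Replica:
    def __init__(self, name):
        self.name, self.up, self.data = name, True, {}

    def write(self, key, value):
        if not self.up:
            raise ConnectionError(self.name)
        self.data[key] = value

class Coordinator:
    def __init__(self, replicas):
        self.replicas = replicas
        self.hints = []                       # [(target, key, value)]

    def write(self, key, value, consistency=2):
        acks, new_hints = 0, []
        for replica in self.replicas:
            try:
                replica.write(key, value)
                acks += 1
            except ConnectionError:
                new_hints.append((replica, key, value))
        if acks < consistency:
            # Cannot meet the consistency level: fail, store no hint.
            raise RuntimeError("consistency level not met")
        self.hints.extend(new_hints)          # one hint per down replica
        return acks

    def replay_hints(self):
        # Called when gossip reports a hinted node as recovered.
        still_down = []
        for target, key, value in self.hints:
            try:
                target.write(key, value)
            except ConnectionError:
                still_down.append((target, key, value))
        self.hints = still_down

r1, r2, r3 = Replica("n1"), Replica("n2"), Replica("n3")
r3.up = False
coord = Coordinator([r1, r2, r3])
coord.write("k", "v")       # 2 acks meet Quorum; a hint is stored for n3
r3.up = True
coord.replay_hints()        # n3 catches up from the hint
print(r3.data)              # {'k': 'v'}
```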

It Reduces The Memory Usage Of The Tablet Server As It Flushes The MemTable To Disk

2. Merging Compaction: Minor compaction keeps increasing the count of SSTables. This means that read operations might need to merge updates from an arbitrary number of SSTables. To reduce the number of SSTables, a merging compaction is performed, which reads the contents of a few SSTables and the MemTable and writes out a new SSTable. The input SSTables and MemTable can be discarded as soon as the compaction has finished.
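A boiled-down sketch of that merge, modeling tables as dicts of key to (timestamp, value) pairs; real SSTables are sorted immutable files, so this shows only the merge logic:

```python
# Merging compaction: fold the MemTable and a few SSTables into one new
# SSTable, keeping only the newest value per key.

def merging_compaction(memtable, sstables):
    merged = {}
    for table in [*sstables, memtable]:
        for key, (ts, value) in table.items():
            if key not in merged or ts > merged[key][0]:
                merged[key] = (ts, value)    # newest write wins
    return merged                            # the new, single SSTable

sstable1 = {"a": (1, "old"), "b": (2, "x")}
sstable2 = {"a": (3, "new")}
memtable = {"c": (4, "y")}
print(merging_compaction(memtable, [sstable1, sstable2]))
# {'a': (3, 'new'), 'b': (2, 'x'), 'c': (4, 'y')}
```

Once the merged output is written, the inputs can be dropped, which is why the SSTable count stops growing.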


Here Are The Best Resources For System Design Interviews, Including Interview Questions, Courses, And Cheat Sheets

Hello devs, if you are preparing for a system design interview and looking for the best resources to master software design and system design, then you have come to the right place.

Earlier, I shared the best system design courses, books, and interview questions, and in this article, I will share the best places to learn system design.

But before we get to the best websites that will teach you everything you need to know about system design, let me tell you a little bit about what it is. Systems design is the process of defining the elements of a system, including modules, architecture, components, interfaces, and data, based on a specific set of requirements.

It can also refer to the process of defining, developing, and designing systems. These designs have to satisfy the specific needs of a company or an organization.

ByteByteGo By Alex Xu


This is one of the most amazing platforms to learn about system design concepts and prepare for system design interviews. This website was created by Alex Xu, author of the popular book System Design Interview: An Insider's Guide, one of the most recommended books for system design interviews. This website also serves as the digital version of his book, but it offers much more.

Alex has shared a detailed, step-by-step framework for solving system design interview questions, like how to design YouTube or a chat system. He also regularly shares interesting content on system design, which is helpful for learning about essential concepts like scaling, caching, and distributed messaging.

If you are preparing for a system design interview, I highly recommend checking out this website and joining his course. You can also use code JALJAD to get a 10% discount.


Grokking The Advanced System Design Interview Vs SystemsExpert

Today we're looking at the course Grokking the Advanced System Design Interview, a FAANG interview prep course on Educative.io.

We're also comparing this course to SystemsExpert, another popular prep tool for system design interviews.

Did you know? According to Techopedia, system design came about before World War II because engineers were attempting to solve complex problems involving communications.

What Happens When The NameNode Fails?

Metadata backup
On a NameNode failure, the metadata would be unavailable, and a disk failure on the NameNode would be catastrophic because the file metadata would be lost: there would be no way of knowing how to reconstruct the files from the blocks on the DataNodes. For this reason, it is crucial to make the NameNode resilient to failure, and HDFS provides two mechanisms for this:

Don’t Miss: Software Testing Technical Interview Questions

Node Fencing: Blocking The Previously Active NameNode

Node fencing: Under this scheme, the previously active NameNode is blocked from accessing all resources. A common way of doing this is to power off or reset the node. This is an effective method of keeping it from accessing anything at all. This technique is also called STONITH, or "Shoot The Other Node In The Head."

HDFS High Availability


To solve this problem, Hadoop, in its 2.0 release, added support for HDFS High Availability (HA). In this implementation, there are two NameNodes in an active-standby configuration. At any point in time, exactly one of the NameNodes is in an active state, and the other is in a standby state. The active NameNode is responsible for all client operations in the cluster, while the standby is simply acting as a follower of the active, maintaining enough state to provide a fast failover when required.

For the standby nodes to keep their state synchronized with the active node, HDFS made a few architectural changes:

• The NameNodes must use highly available shared storage to share the EditLog (e.g., a mount from a Network Attached Storage device). When a standby NameNode starts, it reads up to the end of the shared EditLog to synchronize its state with the active NameNode, and then continues to read new entries as the active NameNode writes them.
• DataNodes must send block reports to all the NameNodes, because the block mappings are stored in a NameNode's memory and not on disk.
• Clients must be configured to handle NameNode failover, using a mechanism that is transparent to users. Client failover is handled transparently by the client library. The simplest implementation uses client-side configuration to control failover: the HDFS URI uses a logical hostname which is mapped to multiple NameNode addresses, and the client library tries each NameNode address until the operation succeeds (sketched below).
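Here is a rough illustration of that last point. The logical-name map and the RPC stub are made up for the example; they are not real HDFS configuration keys or client APIs:

```python
# Toy client-side failover: a logical name maps to several NameNode
# addresses, and the client tries each until one succeeds.

LOGICAL_HOSTS = {
    "mycluster": ["nn1.example.com:8020", "nn2.example.com:8020"],
}

def with_failover(logical_name, operation):
    last_error = None
    for addr in LOGICAL_HOSTS[logical_name]:
        try:
            return operation(addr)    # only the active NameNode answers
        except ConnectionError as e:
            last_error = e            # standby or dead: try the next one
    raise last_error

# Usage with a fake RPC that only "nn2" accepts:
def fake_rpc(addr):
    if not addr.startswith("nn2"):
        raise ConnectionError(addr)
    return f"ok from {addr}"

print(with_failover("mycluster", fake_rpc))  # ok from nn2.example.com:8020
```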

Zookeeper Ensemble


A Unique Sequence Id Called An Offset Gets Assigned To Every Message

Kafka follows the principle of a dumb broker and smart consumer. This means that Kafka does not keep track of what records are read by the consumer. Instead, consumers themselves poll Kafka for new messages and say what records they want to read. This allows them to increment/decrement the offset they are at as they wish, thus being able to replay and reprocess messages. Consumers can read messages starting from a specific offset and are allowed to read from any offset they choose. This also enables consumers to join the cluster at any point in time.
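For example, with the third-party kafka-python client (pip install kafka-python), a consumer can pin itself to a partition and rewind to any offset it likes. The broker address, topic name, and starting offset below are assumptions for the demo:

```python
# Replaying messages from a chosen offset with kafka-python.
from kafka import KafkaConsumer, TopicPartition

consumer = KafkaConsumer(bootstrap_servers="localhost:9092",
                         enable_auto_commit=False)
partition = TopicPartition("events", 0)
consumer.assign([partition])   # manual assignment, no consumer-group magic
consumer.seek(partition, 42)   # jump to offset 42 and re-read from there

for record in consumer:        # blocks waiting for records; Ctrl+C to stop
    print(record.offset, record.value)
```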

Every topic can be replicated to multiple Kafka brokers to make the data fault-tolerant and highly available. Each topic partition has one leader broker and multiple replica brokers.

Leader
A leader is the node responsible for all reads and writes for the given partition. Every partition has one Kafka broker acting as a leader.

Follower
To handle a single point of failure, Kafka can replicate partitions and distribute them across multiple broker servers called followers. Each follower's responsibility is to replicate the leader's data to serve as a backup partition. This also means that any follower can take over the leadership if the leader goes down.

In the following diagram, we have two partitions and four brokers. Broker 1 is the leader of Partition 1 and a follower of Partition 2. Consumers work together in groups to process messages efficiently. More details on consumer groups later.

[Figure: two partitions replicated across four brokers; each partition is an ordered log of messages at offsets 0 through 4.]

A Session Exists For Some Interval Of Time And Is Maintained By Periodic KeepAlives

A client requests a new session on first contacting the master of a Chubby cell.

A session ends if the client ends it explicitly or if it has been idle; a session is considered idle if there are no open handles and no calls for a minute. Each session has an associated lease, which is a time interval during which the master guarantees not to terminate the session unilaterally. The end of this interval is called the session lease timeout. The master advances the session lease timeout in the following three circumstances:
• When the session is created
• When a master failover occurs
• When the master responds to a KeepAlive RPC from the client
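A toy version of the master-side bookkeeping follows. The 12-second lease length is an arbitrary assumption for the example, and real Chubby delays its KeepAlive replies rather than answering immediately:

```python
import time

# Toy session-lease bookkeeping on the master side. The lease timeout
# only ever advances at creation, on master failover, and when the
# master answers a KeepAlive.

LEASE_SECONDS = 12

class Session:
    def __init__(self):
        self.lease_timeout = time.time() + LEASE_SECONDS  # on creation

    def on_keepalive(self):
        self.lease_timeout = time.time() + LEASE_SECONDS

    def on_master_failover(self):
        self.lease_timeout = time.time() + LEASE_SECONDS

    def expired(self):
        return time.time() > self.lease_timeout

session = Session()
session.on_keepalive()
print(session.expired())   # False while KeepAlives keep arriving
```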

What is KeepAlive?


Reasons To Use A SQL Database

Here are a few reasons to choose a SQL database:

• We need to ensure ACID compliance. ACID compliance reduces anomalies and protects the integrity of your database by prescribing exactly how transactions interact with the database. Generally, NoSQL databases sacrifice ACID compliance for scalability and processing speed, but for many e-commerce and financial applications, an ACID-compliant database remains the preferred option (see the transaction sketch after this list).
• Your data is structured and unchanging. If your business is not experiencing massive growth that would require more servers, and if you're only working with data that's consistent, then there may be no reason to use a system designed to support a variety of data types and high traffic volume.
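To see what ACID compliance buys in practice, here is a minimal sketch using Python's built-in sqlite3 module: both legs of a transfer commit together or not at all. The table and amounts are made up for the example.

```python
import sqlite3

# Two updates inside one transaction: both commit or both roll back.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 0)])

try:
    with conn:  # opens a transaction; commits on success, rolls back on error
        conn.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 1")
        conn.execute("UPDATE accounts SET balance = balance + 30 WHERE id = 2")
except sqlite3.Error:
    pass  # on failure the database is unchanged: that is atomicity

print(conn.execute("SELECT * FROM accounts").fetchall())  # [(1, 70), (2, 30)]
```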
Chubby Is A Highly Available And Persistent Distributed Locking Service


Chubby usually runs with five active replicas, one of which is elected as the master to serve requests. To remain alive, a majority of Chubby replicas must be running. BigTable depends on Chubby so much that if Chubby is unavailable for an extended period of time, BigTable will also become unavailable. Chubby uses the Paxos algorithm to keep its replicas consistent in the face of failure.

Chubby provides a namespace consisting of files and directories. Each file or directory can be used as a lock. Read and write access to a Chubby file is atomic. Each Chubby client maintains a session with a Chubby service. A client's session expires if it is unable to renew its session lease within the lease expiration time. When a client's session expires, it loses any locks and open handles. Chubby clients can also register callbacks on Chubby files and directories for notification of changes or session expiration. In BigTable, Chubby is used to:
• Ensure there is only one active master. The master maintains a session lease with Chubby and periodically renews it to retain the status of the master.
• Store the bootstrap location of BigTable data
• Discover new Tablet servers as well as the failure of existing ones
• Store BigTable schema information
• Store Access Control Lists
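In that spirit, here is a toy master election built on an advisory lock. ChubbyCell is an in-memory stand-in for illustration, not the real Chubby client library:

```python
# Toy master election with an advisory lock file, in the spirit of how
# BigTable uses Chubby to ensure there is only one active master.

class ChubbyCell:
    def __init__(self):
        self.locks = {}                  # path -> holder

    def try_acquire(self, path, holder):
        if path not in self.locks:       # atomic in real Chubby
            self.locks[path] = holder
            return True
        return False

    def release(self, path, holder):
        if self.locks.get(path) == holder:
            del self.locks[path]

chubby = ChubbyCell()
candidates = ["server-1", "server-2", "server-3"]
masters = [c for c in candidates if chubby.try_acquire("/bigtable/master", c)]
print(masters)   # ['server-1']: exactly one active master
```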

    Let’s explore the components that constitute BigTable.


Grouping Together Columns To Form Locality Groups

Compression

Clients can choose to compress the SSTable for a locality group to save space. BigTable allows its clients to choose compression techniques based on their application requirements. The compression ratio gets even better when multiple versions of the same data are stored. Compression is applied to each SSTable block separately.
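A small sketch of per-block compression, using zlib and a 64 KB block size as arbitrary stand-ins for whatever scheme a client picks:

```python
import zlib

# Compress an SSTable block by block rather than as one stream, so a
# reader can decompress only the block it needs.

BLOCK_SIZE = 64 * 1024

def compress_sstable(data: bytes):
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    return [zlib.compress(block) for block in blocks]   # one unit per block

def read_block(compressed_blocks, index):
    return zlib.decompress(compressed_blocks[index])

table = b"row1:value " * 50_000
blocks = compress_sstable(table)
assert read_block(blocks, 0).startswith(b"row1:value")
```

Compressing each block separately gives up a little compression ratio but avoids decompressing the whole table to serve one read.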

Coordinator Node Forwards The Read Request To The Fastest Server

How does Cassandra perform a read operation? The coordinator always sends the read request to the fastest node. For example, for Quorum=2, the coordinator sends the request for the actual data to the fastest node and a request for a digest of the data to the second-fastest node. The digest is a checksum of the data and is used to save network bandwidth.

If the digest does not match, it means some replicas do not have the latest version of the data. In this case, the coordinator reads the data from all the replicas to determine the latest data. The coordinator then returns the latest data to the client and initiates a read repair request. The read repair operation pushes the newer version of the data to nodes with the older version.

[Figure: with Quorum=2, the coordinator sends a read request to the fastest replica (chosen via the snitch) for the actual data and a digest request to one of the other replicas.]

Read Repair

While discussing Cassandra's write path, we saw that the nodes could become out of sync due to network issues, node failures, corrupted disks, etc. The read repair operation helps nodes resync with the latest data. A read operation is used as an opportunity to repair inconsistent data across replicas. The latest write-timestamp is used as a marker for the correct version of data. The read repair operation is performed on only a portion of the total reads to avoid performance degradation. Read repairs are opportunistic operations and not a primary operation for anti-entropy.
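Tying the read path together, here is a toy digest read with read repair. Replicas are modeled as dicts of key to (timestamp, value), and the "fastest" ordering is hardcoded where Cassandra's snitch would decide:

```python
import hashlib

# Toy digest read and read repair for Quorum=2.

def digest(entry):
    return hashlib.md5(repr(entry).encode()).hexdigest()

def coordinator_read(key, replicas):
    fastest, second = replicas[0], replicas[1]   # snitch's job in Cassandra
    data = fastest.get(key)
    if digest(data) == digest(second.get(key)):
        return data[1]                           # digests agree: done
    # Digest mismatch: read everywhere, pick the newest, push it back.
    newest = max((r.get(key) for r in replicas), key=lambda e: e[0])
    for r in replicas:                           # read repair
        r[key] = newest
    return newest[1]

r1 = {"k": (2, "new")}
r2 = {"k": (1, "old")}        # stale replica
r3 = {"k": (2, "new")}
print(coordinator_read("k", [r1, r2, r3]))   # 'new'; r2 gets repaired
print(r2)                                    # {'k': (2, 'new')}
```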

Snitch


