Monday, January 30, 2023

System Design Interview Url Shortener


How Many Short Urls Can We Store


With the free tier's 1 GB stored-data limit in mind, let's estimate how many URLs we can possibly store. Here is my estimate of the byte sizes involved:

  • 1 character is 1 byte
  • Since our UUID is at most 8 characters long, the key size limit is not a concern.
  • For the value size limit, on the other hand, I am making a calculated guess that URLs average around 200 characters. I therefore believe it is safe to assume that each stored object averages 400 bytes, which is far below the 25 MiB limit.
  • Finally, with 1 GB to work with, our URL shortener can store up to 2,500,000 short URLs (1 GB / 400 bytes).
  • I know, I know. 2.5 million URLs is not a lot.

Looking back, we could have made our UUID 4 characters long instead of 8, as 62^4 (about 14.8 million) possibilities are well more than 2.5 million. That said, let's stick with a UUID of length 8.
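The arithmetic behind this estimate can be checked with a few lines of Python. This is only a sketch of the numbers quoted above; the constant names are made up for illustration, and the 62-character alphabet (a-z, A-Z, 0-9) is an assumption about how the UUID is built.

```python
# Back-of-the-envelope capacity check for the 1 GB free tier.
STORAGE_LIMIT_BYTES = 1_000_000_000  # 1 GB (decimal), as used in the estimate
AVG_OBJECT_BYTES = 400               # assumed average size of one stored entry
ALPHABET_SIZE = 62                   # assumption: a-z, A-Z, 0-9

# How many average-sized entries fit in the storage limit.
max_urls = STORAGE_LIMIT_BYTES // AVG_OBJECT_BYTES


def keyspace(length: int) -> int:
    """Number of distinct keys of the given length over the alphabet."""
    return ALPHABET_SIZE ** length


print(f"URLs that fit in 1 GB: {max_urls:,}")
for length in (4, 8):
    print(f"Keys of length {length}: {keyspace(length):,}")
```

Even a 4-character key space is several times larger than the number of URLs that fit in storage, so storage, not key length, is the limiting factor here.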

Overall, I would say that the free tier for Cloudflare Worker and KV is pretty generous and decent enough for our POC. Do note that the limits are applied on a per-account basis.

Expected Features In The System

  • The system should be able to generate a short link that is easy to copy.
  • That short link should be able to redirect the page of the original link.
  • The service should be available throughout the day.
  • There should be an option for the user to be able to pick a custom name.
  • Short links should not be guessable, and redirects should happen with minimum latency.
  • The service should maintain the analytics.

System Design Interview Guide For Software Engineers

The objective of system design interviews is to evaluate a candidate’s skill at designing real-world software systems involving multiple components. System design questions are typically given to more senior candidates. Interns aren’t typically given system design questions, as it is hard to expect interns to have sufficient and relevant industry experience to answer this type of question well.

Some common questions include:

  • Design a URL shortener
  • Design a social media website
  • Design a video watching website
  • Design a chatting service
  • Design a file sharing service
  • Design a ride sharing service
  • Design a photo sharing service
  • Design an e-commerce website
  • Design a jobs portal
  • Design a web crawler

System design content is still work-in-progress, but the following are some resources to help you in the meanwhile.


Why Do We Need Url Shortening

URL shortening is used to create shorter aliases for long URLs. Let's call these shortened aliases short links. Users are redirected to the original URL when they hit these short links. Short links save a lot of space when displayed, printed, messaged, or tweeted. Additionally, users are less likely to mistype shorter URLs. For example, when we shortened this page through TinyURL, the shortened URL was nearly one-third the size of the actual URL.

URL shortening is used to optimize links across devices, track individual links to analyze audiences, measure ad campaign performance, or hide affiliated original URLs. If you haven't used tinyurl.com before, please try creating a new shortened URL and spend some time going through the various options their service offers. This will help you a lot in understanding this chapter.

How To Shard The Data


Notice that the data can be queried using a single value: the link id. Because of that, key-value storage is a good choice here, with link_id as the key. Surprisingly, many problems require only key-value storage.

Data can be easily sharded by key using a range policy or a hashing algorithm, e.g.

  • keys with prefix a go to node 1, keys with prefix b go to node 2, etc.
  • or hash % number_of_nodes == node_id
  • You just need to store these rules somewhere, which should be easy as they are quite small. In other systems, you might need to design a more sophisticated sharding algorithm.

    [Figure: simple sharding based on a modulo algorithm]

    Note that each node stores independent data in the above example, and nodes do not need to communicate with each other to read or store data, all thanks to the routing algorithm. At this point, you can imagine storing the data in any SQL DB or in a document DB like MongoDB.
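To make the modulo routing concrete, here is a minimal Python sketch. The function names, the use of MD5 as the routing hash, and the in-memory dicts standing in for storage nodes are all assumptions for illustration, not part of any particular product.

```python
import hashlib
from typing import Optional


def stable_hash(link_id: str) -> int:
    """Deterministic hash of a link id.

    Python's built-in hash() is salted per process for strings,
    so a digest is used to get the same value on every server.
    """
    digest = hashlib.md5(link_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def node_for(link_id: str, number_of_nodes: int) -> int:
    """Route a key with the rule: hash % number_of_nodes == node_id."""
    return stable_hash(link_id) % number_of_nodes


# Each node holds an independent key-value map; dicts stand in for real storage.
NUMBER_OF_NODES = 4
nodes = [dict() for _ in range(NUMBER_OF_NODES)]


def put(link_id: str, long_url: str) -> None:
    nodes[node_for(link_id, NUMBER_OF_NODES)][link_id] = long_url


def get(link_id: str) -> Optional[str]:
    return nodes[node_for(link_id, NUMBER_OF_NODES)].get(link_id)


put("abc123", "https://example.com/some/long/path")
print(get("abc123"))
```

Reads and writes for a given link id always land on the same node, so no cross-node communication is needed.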

    Important questions to ask:

    What will happen after a year when you need to double capacity? What will happen in yet another year?

    The routing algorithm might stop working or become inefficient when we add a new node, e.g. because hash % old_number_of_nodes != hash % new_number_of_nodes. How would you mitigate that? You can e.g.:

  • Create a special node-adding procedure that handles the data migration
  • Use a routing algorithm that supports this case, e.g. take a look at the Chord DHT algorithm.
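To see why adding a node is a problem for plain modulo routing, the following Python sketch counts how many keys would change node when a cluster grows. The key names, the node counts, and the MD5-based hash are assumptions chosen for illustration.

```python
import hashlib
from typing import Sequence


def stable_hash(key: str) -> int:
    """Deterministic 64-bit hash derived from an MD5 digest."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def moved_fraction(keys: Sequence[str], old_nodes: int, new_nodes: int) -> float:
    """Fraction of keys whose node changes under hash % number_of_nodes."""
    moved = sum(
        1
        for key in keys
        if stable_hash(key) % old_nodes != stable_hash(key) % new_nodes
    )
    return moved / len(keys)


keys = [f"link-{i}" for i in range(10_000)]
fraction = moved_fraction(keys, old_nodes=4, new_nodes=5)
print(f"{fraction:.0%} of keys change node when growing from 4 to 5 nodes")
```

With a uniform hash, most keys end up on a different node after the change, which is why a migration procedure or a scheme such as Chord-style consistent hashing is worth discussing.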


    Using Base Conversion To Generate Slugs

    We usually use base-10 numbers, which allow 10 possible numerals: 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9.

    Binary is base-2 and has 2 possible numerals: 0 and 1.

    Our random slug alphabet has 62 possible numerals. So we can think of each of our possible “random” slugs as a unique number, expressed in base-62.

    So let’s keep track of a global currentRandomSlugId. When a request for a new random slug comes in, we simply convert that number to base-62 and return it. Oh, and we increment the currentRandomSlugId, in preparation for the next request for a random slug.
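As a rough sketch of this idea in Python: the counter lives in a small generator object, and each request converts the current value to base-62. The alphabet order (digits, then lowercase, then uppercase) and the class and function names are assumptions; persistence and multi-server coordination are deliberately left out.

```python
import itertools
import string

# Assumed 62-character alphabet: 0-9, a-z, A-Z.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)  # 62


def to_base62(number: int) -> str:
    """Express a non-negative integer in base-62 using ALPHABET."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number > 0:
        number, remainder = divmod(number, BASE)
        digits.append(ALPHABET[remainder])
    # Remainders come out least-significant first, so reverse them.
    return "".join(reversed(digits))


class SlugGenerator:
    """In-memory stand-in for the global currentRandomSlugId counter."""

    def __init__(self, start: int = 0) -> None:
        self._counter = itertools.count(start)

    def next_slug(self) -> str:
        # next() returns the current id and advances the counter.
        return to_base62(next(self._counter))


generator = SlugGenerator(start=125)
print(generator.next_slug())
print(generator.next_slug())
```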

    Where should we store our currentRandomSlugId? We can keep it in memory on our web server, perhaps with a regular writethrough to the database, to make it persistent even if the web server crashes. But what if we have multiple front-end web servers?

    How do we do the base conversion? This is easiest to show by example.

    Take the number 125 in base 10.

    It has a 1 in the 100s place, a 2 in the 10s place, and a 5 in the 1s place. In general, the places in a base-10 number are powers of 10: the 1s (10^0), the 10s (10^1), the 100s (10^2), and so on.


    Let's Understand The Problem

    Design a URL Shortening service like Tiny-URL service. The goal is to design a highly scalable service that could allow users to create shorter URLs, given long URLs, and have read and write functionality.

    What is Tiny-URL?

    Tiny-URL is a URL-shortening web service that creates shorter aliases for long URLs. Whenever the user visits the short URL, he/she will be redirected to the original URL.

    Before jumping to the solution, as an interviewee, we should always discuss with the interviewer what he/she wants. This brings us to a discussion around the requirements and features of the system.

    How Many Requests Does The System Need To Handle


    Let's assume a user requests one new URL and then uses it 100 times for redirection. The write-to-read ratio is then 1:100, so the system is read-heavy.

    How many URL requests do we need to handle in the service? Let's say we get 200 new URL requests per second. Over a month, that is 30 days * 24 hours * 3600 seconds * 200 =~ 500M requests.

    So, there can be almost 500M new URL shortening requests per month. The redirection requests would then be 500M * 100 = 50 billion per month.

    For the yearly count, multiply these numbers by 12.
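The same estimate written out in Python, mainly to show the unrounded figures. The constant names are made up for illustration; the inputs are the assumptions stated above (200 new URLs per second and a 1:100 write-to-read ratio).

```python
# Traffic estimate from the stated assumptions.
NEW_URLS_PER_SECOND = 200
READS_PER_WRITE = 100
SECONDS_PER_MONTH = 30 * 24 * 3600  # 30-day month

new_urls_per_month = NEW_URLS_PER_SECOND * SECONDS_PER_MONTH
redirects_per_month = new_urls_per_month * READS_PER_WRITE
redirects_per_second = NEW_URLS_PER_SECOND * READS_PER_WRITE

new_urls_per_year = new_urls_per_month * 12
redirects_per_year = redirects_per_month * 12

print(f"New URLs per month:   {new_urls_per_month:,}")
print(f"Redirects per month:  {redirects_per_month:,}")
print(f"Redirects per second: {redirects_per_second:,}")
print(f"New URLs per year:    {new_urls_per_year:,}")
print(f"Redirects per year:   {redirects_per_year:,}")
```

The exact monthly figure is a little above 500M; rounding down keeps the later numbers easy to work with in an interview.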


    Bring It All Together

    Refine the architecture

    Candidate: Okay, so here is the revised architecture with the distributed ID generation part:

    Interviewer: Seems to be coming together.

    Candidate: Yes. Though one thing we haven’t settled on yet is the database technology. I’m thinking that since the schema is quite simple, and the link records are independent, we are not constrained to traditional relational databases. Various NoSQL solutions have good support for sharding and horizontal scaling. I’m wondering whether a key-value store or a document-oriented database would be the better option.

    Interviewer: Sounds logical. What would drive the consideration between the two technologies?

    Candidate: I think it depends on how much information we are storing for each link, and how that information is used. In a key-value store, it is not usually straightforward to query on anything other than the key. We have a few other fields that might be useful to query on, especially the userId and expiry dates.

    Interviewer: I think it’s reasonable to assume that there would be queries at least occasionally beyond just the key.

    Candidate: Okay, in that case, I think we should go with a document store, as it will provide that flexibility.

    Interviewer: Makes sense.

    Requirements To Design A Url Shortening Service

    To begin with, we should know what our requirements are while making a URL shortening service.

  • Our first and foremost requirement is to generate a shorter, unique version of the long URL given to us. The new URL must be short, since that's the main purpose of our system, and it must be unique so that it leads us to our desired web page and not some other one.
  • An example of this would be shortening this link:

    www.codingninjas.com/blog/2021/07/23/7-common-system-design-interview-questions/

    to this one:

    bit.ly/3AXF9Dz

  • To understand our next requirement, try clicking on both the links given above.
  • Don't they lead you to the same webpage?

    That's the next requirement of our URL shortening service, i.e., the shortened link must redirect us to the original link.

  • Now, when you clicked both the links, it should have taken about the same amount of time to open the original link and the shortened link. This explains our third requirement: redirection in real time.
  • For our next requirement, let us look at the original and shortened links again.
  • Original link:

    www.codingninjas.com/blog/2021/07/23/7-common-system-design-interview-questions/

    Shortened version:

    bit.ly/3AXF9Dz

    It is evident from the original link what the webpage is about, but is it so for the shortened link too?

    With this realisation, we come to our fourth requirement: shortened links should not be predictable.

  • Another critical requirement for a URL shortening service is that the system should be available almost all the time. If it isn't, then all the URL redirections will fail.
  • Some other requirements are:

    How To Scale Reads

    • Caching: By introducing a cache, we can scale the read queries. If multiple queries are received for the same short URL, the service can return the long URL from the cache. We can use LRU as the cache eviction policy.


    • Data replication: We can segregate the read queries from the write queries. Writes are performed on the master, and the data is replicated to slaves. The slaves can then serve the read queries.
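Here is a minimal Python sketch of the caching idea with LRU eviction. The LRUCache class, the resolve helper, and the load_from_db callback are hypothetical names used only for illustration; a production system would more likely use a dedicated cache service.

```python
from collections import OrderedDict
from typing import Callable, Optional


class LRUCache:
    """Small least-recently-used cache mapping short keys to long URLs."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._items: "OrderedDict[str, str]" = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        if key not in self._items:
            return None
        # A hit makes the entry the most recently used one.
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key: str, value: str) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            # Evict the least recently used entry (front of the ordering).
            self._items.popitem(last=False)


def resolve(
    short_key: str,
    cache: LRUCache,
    load_from_db: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Return the long URL, reading from the database only on a cache miss."""
    long_url = cache.get(short_key)
    if long_url is None:
        long_url = load_from_db(short_key)
        if long_url is not None:
            cache.put(short_key, long_url)
    return long_url
```

In a replicated setup, load_from_db would typically point at a read replica, so both techniques can be combined.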


    I Built My Own TinyURL: Here's How I Did It

    Url Shortener System Design Interview

    Designing a URL shortener such as TinyURL and Bitly is one of the most common system design interview questions in software engineering.

    While tinkering with Cloudflare Worker to sync the Daily LeetCode Challenge to my Todoist, I got the idea to build an actual URL shortener that can be used by anyone.

    What follows is my thought process, with code examples, on how we can create a URL shortener using Cloudflare Worker. If you would like to follow along, you will need a Cloudflare account and the Wrangler CLI.


    Constraints And Use Cases

    Just like algorithm design questions, system design questions will most likely be weakly defined. Consider the question about the URL-shortening service. There are so many things that are unclear about it! Without knowing more, it will be impossible to design an appropriate solution. Yet many candidates forget about this and start designing a solution immediately.

    Don't make this mistake!

    The very first thing you should do with any system design question is to clarify the system’s constraints and to identify what use cases the system needs to satisfy. Spend a few minutes questioning your interviewer and agreeing on the scope of the system. Many of the same rules we discussed while talking about algorithm design apply here as well.

    Usually, part of what the interviewer wants to see is if you can gather the requirements about the problem at hand, and design a solution that covers them well. Never assume things that were not explicitly stated.

    For example, the URL-shortening service could be meant to serve just a few thousand users, but each could be sharing millions of URLs. It could be meant to handle millions of clicks on the shortened URLs, or dozens. The service may have to provide extensive statistics about each shortened URL, or statistics may not be a requirement at all.

    B. Generating Keys Offline

    We can have a standalone Key Generation Service (KGS) that generates random six-letter strings beforehand and stores them in a database (let's call it key-DB). Whenever we want to shorten a URL, we will take one of the already-generated keys and use it. This approach will make things quite simple and fast. Not only are we not encoding the URL, but we won't have to worry about duplications or collisions. KGS will make sure all the keys inserted into key-DB are unique.

    Can concurrency cause problems? As soon as a key is used, it should be marked in the database to ensure that it is not used again. If there are multiple servers reading keys concurrently, we might get a scenario where two or more servers try to read the same key from the database. How can we solve this concurrency problem?

    Servers can use KGS to read/mark keys in the database. KGS can use two tables to store keys: one for keys that are not used yet, and one for all the used keys. As soon as KGS gives keys to one of the servers, it can move them to the used keys table. KGS can always keep some keys in memory to quickly provide them whenever a server needs them.

    For simplicity, as soon as KGS loads some keys in memory, it can move them to the used keys table. This ensures each server gets unique keys. If KGS dies before assigning all the loaded keys to some server, we will be wasting those keys, which could be acceptable given the huge number of keys we have.
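A minimal in-memory Python sketch of this two-table idea follows. The KeyGenerationService class, its method names, and the use of Python sets in place of database tables are assumptions for illustration; the lock stands in for whatever synchronization the real KGS would use.

```python
import secrets
import string
import threading
from typing import List

# Assumed key alphabet: letters and digits (62 characters).
ALPHABET = string.ascii_letters + string.digits


class KeyGenerationService:
    """Hands out pre-generated keys; sets stand in for the two key tables."""

    def __init__(self, key_length: int = 6) -> None:
        self._key_length = key_length
        self._unused = set()  # keys generated but not yet handed out
        self._used = set()    # keys already handed to some server
        self._lock = threading.Lock()

    def generate(self, count: int) -> None:
        """Pre-generate `count` additional unique keys."""
        with self._lock:
            target = len(self._unused) + count
            while len(self._unused) < target:
                key = "".join(
                    secrets.choice(ALPHABET) for _ in range(self._key_length)
                )
                if key not in self._used:
                    self._unused.add(key)  # set membership removes duplicates

    def take(self, count: int) -> List[str]:
        """Give a batch of keys to a server and mark them used immediately."""
        with self._lock:
            if count > len(self._unused):
                raise RuntimeError("not enough pre-generated keys")
            batch = [self._unused.pop() for _ in range(count)]
            self._used.update(batch)
            return batch


kgs = KeyGenerationService()
kgs.generate(1000)
server_a_keys = kgs.take(100)
server_b_keys = kgs.take(100)
print(server_a_keys[0], server_b_keys[0])
```

Because keys move to the used set under the lock before they are returned, two servers calling take concurrently cannot receive the same key.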

    Key-DB size: at one byte per character, storing all 68.7B six-character keys takes 6 * 68.7B = 412 GB.


    A. Encoding Actual Url

    We can compute a unique hash of the given URL. The hash can then be encoded for display. This encoding could be base36 or base62 and if we add + and / we can use Base64 encoding. A reasonable question would be, what should be the length of the short key? 6, 8, or 10 characters?

    Using base64 encoding, a 6-letter key would result in 64^6 = ~68.7 billion possible strings. Using base64 encoding, an 8-letter key would result in 64^8 = ~281 trillion possible strings.

    With 68.7B unique strings, let's assume six-letter keys would suffice for our system.
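These key-space figures, and the 412 GB key storage estimate, are easy to verify in Python. The helper name key_space is made up for illustration, and one byte per character is the assumption behind the storage number.

```python
def key_space(alphabet_size: int, length: int) -> int:
    """Number of distinct strings of `length` characters over the alphabet."""
    return alphabet_size ** length


six_letter_keys = key_space(64, 6)
eight_letter_keys = key_space(64, 8)

# Assuming one byte per character, storing every six-letter key takes:
storage_bytes = 6 * six_letter_keys

print(f"6-letter base64 keys: {six_letter_keys:,}")
print(f"8-letter base64 keys: {eight_letter_keys:,}")
print(f"Storage for all 6-letter keys: {storage_bytes / 1e9:.0f} GB")
```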

    If we use the MD5 algorithm as our hash function, it will produce a 128-bit hash value. After base64 encoding, we'll get a string of more than 21 characters, since each base64 character encodes 6 bits of the hash. But we only have space for 6 characters per short key, so how do we choose our key? We can take the first 6 letters. This could result in key duplication; to resolve that, we can choose some other characters out of the encoded string or swap some characters.
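A rough Python sketch of this scheme is shown below. The function names and the dict standing in for the database are hypothetical, and sliding the 6-character window along the encoded string is just one possible reading of "choose some other characters" on a collision.

```python
import base64
import hashlib
from typing import Dict

KEY_LENGTH = 6


def encoded_hash(url: str) -> str:
    """Base64-encode the 128-bit MD5 digest of the URL, without '=' padding."""
    digest = hashlib.md5(url.encode("utf-8")).digest()  # 16 bytes
    # Standard base64 uses '+' and '/', which would need escaping in a URL;
    # base64.urlsafe_b64encode is an alternative that uses '-' and '_'.
    return base64.b64encode(digest).decode("ascii").rstrip("=")


def shorten(url: str, store: Dict[str, str]) -> str:
    """Pick the first free 6-character window of the encoded hash.

    `store` maps short key -> long URL and stands in for the database.
    """
    encoded = encoded_hash(url)
    for offset in range(len(encoded) - KEY_LENGTH + 1):
        key = encoded[offset:offset + KEY_LENGTH]
        existing = store.get(key)
        if existing is None or existing == url:
            store[key] = url
            return key
    raise RuntimeError("no free key found for this URL")


store: Dict[str, str] = {}
url = "https://www.example.com/a/very/long/path?with=query"
print(len(encoded_hash(url)), shorten(url, store))
```

Note that the same URL always maps to the same key here, which is exactly the first issue discussed next.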

    What are the different issues with our solution? We have the following couple of problems with our encoding scheme:

    If multiple users enter the same URL, they get the same shortened URL, which is not acceptable.

    What if parts of the URL are URL-encoded? Two URLs can then be identical except for the URL encoding.

    Definition Of The System:

    A Toolkit for Tackling the URL Shortening (TinyURL) System Design Interview

    We need to clarify the goal of the system. System design is such a vast topic that if we don't narrow it down to a specific purpose, designing the system becomes complicated, especially for newbies. A URL shortening service provides shorter aliases for long URLs. When users hit the shortened links, they are redirected to the original URL.


    System Design Of An Url Shortener

    You might have seen URL shortener services such as TinyURL or Bit.ly, which let you use a short alias instead of a long URL. Ever wondered why and how these kinds of applications work? In this article, we’ll discuss the system design of a URL shortener and compare it with some applications currently in use.

    Before reading this article, we suggest trying out tinyurl.com so that you can understand the article better.

    How To Design A Tiny Url Or Url Shortener

    How would you design a system that takes big URLs like https://www.geeksforgeeks.org/count-sum-of-digits-in-numbers-from-1-to-n/ and converts them into a short 6-character URL? It is given that URLs are stored in the database and every URL has an associated integer id.

    One important thing to note is that the long URL should also be uniquely identifiable from the short URL. So we need a bijective function.
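One common way to get such a bijection is to treat the integer id as a base-62 number and map it back on lookup. The Python sketch below assumes an alphabet ordered a-z, A-Z, 0-9; the function names are illustrative only.

```python
import string

# Assumed alphabet order: a-z, A-Z, 0-9 (62 characters).
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits
BASE = len(ALPHABET)
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def id_to_short(url_id: int) -> str:
    """Map a non-negative database id to its base-62 short string."""
    if url_id < 0:
        raise ValueError("url_id must be non-negative")
    if url_id == 0:
        return ALPHABET[0]
    chars = []
    while url_id > 0:
        url_id, remainder = divmod(url_id, BASE)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


def short_to_id(short: str) -> int:
    """Inverse mapping: recover the database id from the short string."""
    url_id = 0
    for ch in short:
        url_id = url_id * BASE + INDEX[ch]
    return url_id


short = id_to_short(12345)
print(short, short_to_id(short))
```

Every id below 62^6 fits in at most 6 characters, and decoding the short string returns the original id, which is then used to look up the long URL in the database.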
