Tuesday, January 1, 2019

[System Design] User System

Scenario: Design a user system that supports login, registration, profile queries, profile edits, and friendship management.

Assume we need to support 100M users.
Login, registration, and edits are infrequent requests. If each user does 0.1 of them per day, QPS = 0.1 * 100M / 86400 ≈ 115 requests per second, and peak QPS ≈ 345 (taking peak as roughly 3x the average).

If we support sessions, the load is much higher: every request a user sends carries a cookie containing the session key, and we need to check that the key is still valid. Assuming users send 100 requests per day on average, QPS = 100 * 100M / 86400 ≈ 115K, and peak QPS ≈ 345K.

Queries (for a user's own profile and for friends) are also more frequent. Assuming each user does 10 queries per day, QPS = 10 * 100M / 86400 ≈ 12K, and peak QPS ≈ 36K.
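The estimates above can be reproduced with a few lines of arithmetic. This is only a sketch: the peak factor of 3 is inferred from the figures in this post, and the function and constant names are made up for illustration.

```python
SECONDS_PER_DAY = 86_400
USERS = 100_000_000
PEAK_FACTOR = 3  # assumption: peak traffic is about 3x the daily average


def estimate_qps(requests_per_user_per_day):
    """Return (average QPS, peak QPS) for a given per-user daily request rate."""
    average = requests_per_user_per_day * USERS / SECONDS_PER_DAY
    return average, average * PEAK_FACTOR


if __name__ == "__main__":
    workloads = [
        ("login/register/edit", 0.1),
        ("session validation", 100),
        ("profile/friend query", 10),
    ]
    for name, rate in workloads:
        average, peak = estimate_qps(rate)
        print(f"{name}: average ~{average:,.0f} QPS, peak ~{peak:,.0f} QPS")
```

The exact averages are slightly above the rounded numbers used in the text (for example about 115.7 rather than 115), which does not change the sizing conclusions.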

Services:

Session Service: manages login sessions
User Service: Store user profiles, handle register, queries, etc
Friend Service: Manage Friendship

Storage:

Session Table:


Field           | Type      | Detail
session_key     |           | primary key
owner_id        | int64     |
device          | string    |
expiration_time | timestamp |

There is one policy choice here. If we want to support login from multiple devices, we create a new entry for each device when the user logs in. Otherwise, we delete the user's existing entry and insert a new one.
Either SQL or NoSQL works. Since the access pattern is essentially key-value, we can use NoSQL here. Because session_key is looked up on every request, we also need to cache it.
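A minimal sketch of the session flow described above. Two plain dictionaries stand in for the NoSQL table and the cache, and the function names, the one-week TTL, and the `allow_multiple_devices` flag are assumptions made for illustration, not part of the original design.

```python
import secrets
import time

SESSION_TTL_SECONDS = 7 * 24 * 3600  # assumption: sessions last one week

session_store = {}  # stands in for the session table: session_key -> row
session_cache = {}  # stands in for the cache layer in front of it


def logout(session_key):
    """Remove a session from both the store and the cache."""
    session_store.pop(session_key, None)
    session_cache.pop(session_key, None)


def login(owner_id, device, now=None, allow_multiple_devices=True):
    """Create a session row and return its key."""
    now = time.time() if now is None else now
    if not allow_multiple_devices:
        # Single-device policy: drop the user's existing sessions first.
        # (A full scan is fine for a sketch; a real store would query by owner_id.)
        stale = [k for k, row in session_store.items() if row["owner_id"] == owner_id]
        for key in stale:
            logout(key)
    session_key = secrets.token_urlsafe(32)
    session_store[session_key] = {
        "owner_id": owner_id,
        "device": device,
        "expiration_time": now + SESSION_TTL_SECONDS,
    }
    return session_key


def validate(session_key, now=None):
    """Return the owner_id for a valid session, or None if missing or expired."""
    now = time.time() if now is None else now
    row = session_cache.get(session_key)
    if row is None:
        row = session_store.get(session_key)  # cache miss: fall back to the table
        if row is None:
            return None
        session_cache[session_key] = row
    if row["expiration_time"] <= now:
        logout(session_key)
        return None
    return row["owner_id"]
```

Every request pays for one `validate` call, which is why this path dominates the QPS estimate and why it is served from the cache first.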

User Table:

Field             | Type      | Detail
user_id           |           | primary key
username          | string    |
pswd_hash         | string    |
other_information |           | other information, we won't list all of them here
create_at         | timestamp |

We use a SQL database here, and we may need indexes on user_id, username, and create_at. To defend against rainbow-table attacks, we add a per-user salt when hashing passwords.
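As a sketch of what salted hashing could look like, the snippet below uses PBKDF2 from Python's standard library. The choice of PBKDF2, the iteration count, and the `salt$digest` storage format for pswd_hash are assumptions made for this example; the post itself only says to add a salt.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumption: tune this to your latency budget


def hash_password(password):
    """Return 'salt_hex$digest_hex', suitable for a pswd_hash string column."""
    salt = os.urandom(16)  # fresh random salt per user defeats precomputed tables
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return f"{salt.hex()}${digest.hex()}"


def verify_password(password, stored):
    """Recompute the digest with the stored salt and compare in constant time."""
    salt_hex, digest_hex = stored.split("$")
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), ITERATIONS
    )
    return hmac.compare_digest(candidate, bytes.fromhex(digest_hex))
```

Because the salt is random, hashing the same password twice gives two different stored values, so one precomputed table cannot be reused across users.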

The Friend table depends on how friendship is modeled. If friendship is one-way (follow), a simple <from_user_id, to_user_id> pair is enough.
If it is two-way, there are different ways to store it. In a SQL table, we can store one copy of each relationship like this:

Field      | Type      | Detail
smaller_id | int64     |
larger_id  | int64     |
create_at  | timestamp |

We can then query a user's friends with SELECT * FROM friend_table WHERE smaller_id = id OR larger_id = id.
Alternatively, we can store two copies of each relationship (one row per direction) so the query is simpler: SELECT * FROM friend_table WHERE from_id = id.

Field     | Type      | Detail
from_id   | int64     |
to_id     | int64     |
create_at | timestamp |

If we choose a NoSQL database, we also need the second layout, since we can't index on multiple columns.
We choose the second layout because it is better for sharding: we can shard by from_id, so all queries for a single user go to a single node.
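A small sketch of the two-copy layout, using an in-memory SQLite database as a stand-in for the real store. The helper names are hypothetical, and writing both rows in one local transaction is a simplification: once the table is sharded by from_id, the two rows may live on different nodes.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE friend_table (
        from_id   INTEGER NOT NULL,
        to_id     INTEGER NOT NULL,
        create_at INTEGER NOT NULL,
        PRIMARY KEY (from_id, to_id)
    )
    """
)


def add_friendship(a, b):
    """Store one row per direction so each user's list is a single-key lookup."""
    now = int(time.time())
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO friend_table VALUES (?, ?, ?)",
            [(a, b, now), (b, a, now)],
        )


def remove_friendship(a, b):
    """Delete both directions of the relationship."""
    with conn:
        conn.execute(
            "DELETE FROM friend_table "
            "WHERE (from_id = ? AND to_id = ?) OR (from_id = ? AND to_id = ?)",
            (a, b, b, a),
        )


def friends_of(user_id):
    """Return the user's friend ids; only from_id is filtered, matching the shard key."""
    rows = conn.execute(
        "SELECT to_id FROM friend_table WHERE from_id = ? ORDER BY to_id",
        (user_id,),
    )
    return [row[0] for row in rows]
```

The cost of this layout is double the storage and two writes per change, in exchange for reads that never need an OR across two columns.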

Scale:

Session table shard by owner id
User table shard by user id
Friend table shard by from_id

Each of them also needs a cache layer to speed up read operations. The cache key space can be partitioned with consistent hashing.
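To illustrate how consistent hashing could partition the cache key space, here is a minimal ring with virtual nodes. The class name, the node names, the use of SHA-256, and the count of 100 virtual nodes are all assumptions for this sketch.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Maps keys to cache nodes; adding or removing a node only moves nearby keys."""

    def __init__(self, nodes, vnodes=100):
        self._vnodes = vnodes
        self._ring = []  # sorted list of (point, node) pairs
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key):
        # Use the first 8 bytes of SHA-256 as a 64-bit position on the ring.
        return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")

    def add_node(self, node):
        # Each node owns several points so load spreads more evenly.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._ring = [(point, n) for point, n in self._ring if n != node]

    def get_node(self, key):
        """Return the node owning the first ring point at or after the key's hash."""
        if not self._ring:
            raise LookupError("ring has no nodes")
        index = bisect.bisect_right(self._ring, (self._hash(key), ""))
        if index == len(self._ring):
            index = 0  # wrap around the ring
        return self._ring[index][1]
```

For example, `ConsistentHashRing(["cache-a", "cache-b", "cache-c"]).get_node("user:42")` picks one of the three nodes, and removing a node only remaps the keys that node used to own.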

To support friend-distance queries, like LinkedIn's 1st/2nd/3rd connections, we need another cache service. The cache layer stores every user's 2nd connections. They are computed on the fly and stored only in the cache; if we somehow lose them, we recompute them by sending requests to the Friend Service. For a user and a list of other users, we determine each degree as follows:

  • 1st: we just check the Friend Service.
  • 2nd: we check the 2nd connections stored in the cache; if they are missing, we recompute them.
  • 3rd: if user A and user B are neither 1st nor 2nd connections, we take A's 2nd connections and B's 1st connections and check whether the two sets have any user in common.
For more details, please check this article.
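The three checks above can be sketched as follows. Here `friends_of` is a hypothetical callable standing in for a Friend Service lookup that returns a set of user ids, and `second_cache` is a plain dictionary standing in for the cache service; both are assumptions made for this example.

```python
def second_connections(user, friends_of):
    """Friends of friends, excluding the user and the user's direct friends."""
    direct = friends_of(user)
    result = set()
    for friend in direct:
        result |= friends_of(friend)
    return result - direct - {user}


def connection_degree(a, b, friends_of, second_cache):
    """Return 1, 2, or 3 for the connection degree from a to b, else None."""
    if b in friends_of(a):
        return 1
    if a not in second_cache:
        # Cache miss: recompute from the Friend Service and store the result.
        second_cache[a] = second_connections(a, friends_of)
    if b in second_cache[a]:
        return 2
    # 3rd degree: some 2nd connection of a is a direct friend of b.
    if second_cache[a] & friends_of(b):
        return 3
    return None


if __name__ == "__main__":
    # A simple chain 1 - 2 - 3 - 4 - 5, used only as example data.
    graph = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
    cache = {}
    for other in (2, 3, 4, 5):
        print(other, connection_degree(1, other, lambda u: graph.get(u, set()), cache))
```

The 3rd-degree check only needs one set intersection because A's 2nd connections are already cached, so nothing beyond B's friend list has to be fetched.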
