今際の国の呵呵君: [System Design]User System

Scenario: Design a user system that supports login, registration, profile query, profile editing, and friendship management

Assume we need to support 100M users.
Login, registration, and editing are infrequent requests. If each user does one of them 0.1 times per day, QPS = 0.1 * 100M / 86400 ≈ 115, and peak QPS ≈ 345 (3x the average)

If we support sessions, the load is higher: every request the user sends carries a cookie containing the session key, and we need to make sure the key is still valid. Assuming users send 100 requests per day on average, QPS = 100 * 100M / 86400 ≈ 115K, peak QPS ≈ 345K

Queries are also more frequent (on the user's own profile and on friends). Assuming each user does 10 query operations per day, QPS = 10 * 100M / 86400 ≈ 12K, peak QPS ≈ 36K
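
The estimates above can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: the function name `qps` and the constant `PEAK_FACTOR` are made up for illustration, and the 3x peak multiplier is an assumption read off the numbers in the text (115 -> 345).

```python
SECONDS_PER_DAY = 86_400
USERS = 100_000_000
PEAK_FACTOR = 3  # assumption: peak traffic is 3x the daily average


def qps(users: int, requests_per_user_per_day: float) -> float:
    """Average queries per second for a given per-user daily request rate."""
    return users * requests_per_user_per_day / SECONDS_PER_DAY


estimates = {
    "login/register/edit": qps(USERS, 0.1),
    "session validation": qps(USERS, 100),
    "query": qps(USERS, 10),
}

for name, average in estimates.items():
    peak = average * PEAK_FACTOR
    print(f"{name}: average ~{average:,.0f} QPS, peak ~{peak:,.0f} QPS")
```

The exact values are slightly higher than the rounded figures in the text (for example 115.7 rather than 115), which does not matter at this level of precision.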

Services:

Session Service: manages login sessions
User Service: stores user profiles and handles registration, queries, etc.
Friend Service: manages friendships

Storage:

Session Table:

Field	Type	Detail
session_key		primary key
owner_id	int64
device	string
expiration_time	timestamp

There are some minor variations here. If we want to support login from multiple devices, we create a separate entry for each device every time the user logs in. Otherwise, we delete the existing entry and insert a new one.
Both SQL and NoSQL are fine. Since the access pattern is essentially key-value, we can use NoSQL here. Because session_key is queried frequently, we also need to cache it.
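
As a rough illustration of the session logic described above, here is a minimal in-memory sketch. The class `SessionStore`, its methods, and the `ttl_seconds` / `multi_device` parameters are hypothetical names, and the dictionary stands in for whatever key-value store and cache would actually be used.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Session:
    session_key: str
    owner_id: int
    device: str
    expiration_time: float  # Unix timestamp


class SessionStore:
    """In-memory stand-in for the Session table (hypothetical sketch)."""

    def __init__(self, ttl_seconds: float = 3600, multi_device: bool = True):
        self.ttl_seconds = ttl_seconds
        self.multi_device = multi_device
        self._by_key: Dict[str, Session] = {}

    def login(self, owner_id: int, device: str, now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        # Multi-device: replace only the session for this device.
        # Single-device: replace every existing session of this user.
        # A real store would index by owner_id instead of scanning.
        for key, session in list(self._by_key.items()):
            if session.owner_id == owner_id and (
                not self.multi_device or session.device == device
            ):
                del self._by_key[key]
        session_key = secrets.token_urlsafe(32)
        self._by_key[session_key] = Session(
            session_key, owner_id, device, now + self.ttl_seconds
        )
        return session_key

    def validate(self, session_key: str, now: Optional[float] = None) -> Optional[int]:
        """Return the owner_id if the key exists and has not expired."""
        now = time.time() if now is None else now
        session = self._by_key.get(session_key)
        if session is None:
            return None
        if session.expiration_time <= now:
            del self._by_key[session_key]  # lazily drop expired sessions
            return None
        return session.owner_id
```

`validate` is the call that would run on every request, which is why the text suggests putting a cache in front of it.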

User Table:

Field	Type	Detail
user_id	int64	primary key
username	string
pswd_hash	string
other_information		other information, we won't list all of them here
create_at	timestamp

We use a SQL database here, and we may need indexes on user_id, username, and create_at. To prevent rainbow table attacks, we add a per-user salt when hashing the password.
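
To make the salting idea concrete, here is a small sketch using only Python's standard library. The function names, the `salt$digest` storage format for pswd_hash, and the iteration count are assumptions chosen for illustration, not something the design above prescribes; a production system would more likely rely on a dedicated password-hashing library.

```python
import hashlib
import hmac
import os
from typing import Optional

ITERATIONS = 100_000  # illustrative value; tune for your hardware


def hash_password(password: str, salt: Optional[bytes] = None) -> str:
    """Return 'salt_hex$digest_hex' using PBKDF2-HMAC-SHA256."""
    salt = os.urandom(16) if salt is None else salt  # fresh random salt per user
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return f"{salt.hex()}${digest.hex()}"


def verify_password(password: str, pswd_hash: str) -> bool:
    """Re-hash with the stored salt and compare in constant time."""
    salt_hex, _ = pswd_hash.split("$")
    expected = hash_password(password, bytes.fromhex(salt_hex))
    return hmac.compare_digest(expected, pswd_hash)
```

Because each user gets a different random salt, two users with the same password end up with different pswd_hash values, so a precomputed table of hashes is of no use to an attacker.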

The Friend table depends on how we implement friendship. If it is one-way, it is a simple <from_user_id, to_user_id> pair.
If it is two-way, there are different ways to store it. In a SQL table, we can store one copy of each relationship like this:

Field	Type	Detail
smaller_id	int64
larger_id	int64
create_at	timestamp

And we can query it with SELECT * FROM friend_table WHERE smaller_id = id OR larger_id = id.
Or we can store two copies so that the query is simpler: SELECT * FROM friend_table WHERE from_id = id.

Field	Type	Detail
from_id	int64
to_id	int64
create_at	timestamp

If we choose a NoSQL database, we also need to store it the second way, since we generally can't index on multiple columns.
We choose the second way because it is better for sharding: we can shard by from_id, so all queries for a single user go to a single node.
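
The two-copy layout can be tried out with an in-memory SQLite database. This is a sketch only: the helper names `add_friendship` and `friends_of` are hypothetical, and SQLite is used simply because it ships with Python, not because the design calls for it.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE friend_table (
        from_id   INTEGER NOT NULL,
        to_id     INTEGER NOT NULL,
        create_at REAL    NOT NULL,
        PRIMARY KEY (from_id, to_id)
    )
    """
)


def add_friendship(conn: sqlite3.Connection, a: int, b: int) -> None:
    """Store both directions so each user's friends are found by from_id alone."""
    now = time.time()
    with conn:  # one transaction: both rows are written, or neither
        conn.executemany(
            "INSERT OR IGNORE INTO friend_table VALUES (?, ?, ?)",
            [(a, b, now), (b, a, now)],
        )


def friends_of(conn: sqlite3.Connection, user_id: int) -> list:
    rows = conn.execute(
        "SELECT to_id FROM friend_table WHERE from_id = ? ORDER BY to_id",
        (user_id,),
    )
    return [row[0] for row in rows]
```

Note that once the table is sharded by from_id, the two rows of one friendship usually live on different shards, so the single local transaction shown here no longer covers both writes and the application has to handle partial failure itself.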

Scale:

Session table shard by owner id
User table shard by user id
Friend table shard by from_id

They also need a cache layer to speed up read operations; the cache key space can be partitioned by consistent hashing.
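
For readers unfamiliar with consistent hashing, a minimal ring could look like the following. The class name, the node labels, the use of MD5 as the hash function, and the number of virtual nodes are all assumptions made for this sketch.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a point on the ring (first 8 bytes of its MD5 digest)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    def __init__(self, nodes, vnodes: int = 100):
        self.vnodes = vnodes  # virtual nodes per physical node, to even out load
        self._ring = []       # sorted list of (point, node)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [(point, n) for point, n in self._ring if n != node]

    def get_node(self, key: str) -> str:
        """Return the first node clockwise from the key's point on the ring."""
        if not self._ring:
            raise LookupError("ring is empty")
        index = bisect.bisect_right(self._ring, (_hash(key), ""))
        return self._ring[index % len(self._ring)][1]
```

The useful property is that removing a cache node only remaps the keys that lived on that node; keys on the other nodes stay where they were, so most of the cache remains warm.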

To support friend-distance queries, like LinkedIn's 1st/2nd/3rd connections, we need another cache service. The cache layer stores every user's 2nd connections. They are computed on the fly and stored only in the cache; if we somehow lose them, we just recompute them by sending requests to the Friend Service. For a user and a list of other users, each degree is checked as follows:

1st, we just check the Friend Service
2nd, we check the 2nd connections stored in the cache; if they are not there, we recompute them
3rd, for user A and user B, if they are neither 1st nor 2nd connections, we get all of A's 2nd connections and B's 1st connections and check whether the two sets have any user in common
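
The three checks above can be sketched as follows. `ConnectionService` and its method names are hypothetical, the `friends` dictionary stands in for calls to the Friend Service, and the plain dictionary `_second_cache` stands in for the cache layer. Cache invalidation when friendships change is left out.

```python
from typing import Dict, Optional, Set


class ConnectionService:
    def __init__(self, friends: Dict[int, Set[int]]):
        self._friends = friends                       # stand-in for Friend Service
        self._second_cache: Dict[int, Set[int]] = {}  # stand-in for the cache layer

    def first(self, user_id: int) -> Set[int]:
        return self._friends.get(user_id, set())

    def second(self, user_id: int) -> Set[int]:
        """Friends of friends, excluding the user and their direct friends."""
        cached = self._second_cache.get(user_id)
        if cached is None:  # cache miss: recompute from the Friend Service
            direct = self.first(user_id)
            cached = set()
            for friend in direct:
                cached |= self.first(friend)
            cached -= direct
            cached.discard(user_id)
            self._second_cache[user_id] = cached
        return cached

    def degree(self, a: int, b: int) -> Optional[int]:
        """Return 1, 2, or 3, or None if b is further than 3 hops from a."""
        if b in self.first(a):
            return 1
        if b in self.second(a):
            return 2
        # 3rd: some 2nd connection of A is a direct friend of B.
        if not self.second(a).isdisjoint(self.first(b)):
            return 3
        return None
```

For a chain of friendships 1-2-3-4-5, `degree(1, 4)` finds user 3 both in user 1's 2nd connections and in user 4's 1st connections, so it reports a 3rd connection.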

For more details, please check this article.

今際の国の呵呵君

Tuesday, January 1, 2019
