Assume we need to support 100M users.
Login, registration and profile edits are infrequent requests. If each user does 0.1 of them per day, QPS = 0.1 * 100M / 86400 = about 115 requests per second. Assuming peak traffic is about 3x the average, peak QPS = 345.
If we support sessions the load is much higher: every request a user sends carries a cookie containing the session key, and we need to check that the key is still valid. Assuming users send 100 requests per day on average, QPS = 100 * 100M / 86400 = 115K, peak QPS = 345K.
Queries (own profile, friends) are also more frequent. Assuming each user does 10 operations per day, QPS = 12K, peak QPS = 36K.
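The estimates above can be reproduced with a few lines of arithmetic. This is only a sketch: the 3x peak factor is an assumption inferred from the numbers above (115 to 345), and the names `average_qps` and `PEAK_FACTOR` are made up for illustration. The script prints unrounded values, so they differ slightly from the rounded figures in the text.

```python
SECONDS_PER_DAY = 86_400
USERS = 100_000_000
PEAK_FACTOR = 3  # assumption: peak traffic is about 3x the daily average


def average_qps(requests_per_user_per_day: float, users: int = USERS) -> float:
    """Average requests per second, given a per-user daily request rate."""
    return requests_per_user_per_day * users / SECONDS_PER_DAY


workloads = [
    ("login/register/edit", 0.1),
    ("session validation", 100),
    ("profile/friend queries", 10),
]

for name, per_user_per_day in workloads:
    avg = average_qps(per_user_per_day)
    print(f"{name}: average ~{avg:,.0f} QPS, peak ~{avg * PEAK_FACTOR:,.0f} QPS")
```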
Services:
Session Service: manages login sessions
User Service: stores user profiles and handles registration, queries, etc.
Friend Service: manages friendships
Storage:
Session Table:
Field | Type | Detail |
---|---|---|
session_key | primary key | |
owner_id | int64 | |
device | string | |
expiration_time | timestamp | |
There are some minor things we can adjust here. If we want to support login from multiple devices, we create one entry per device each time the user logs in. Otherwise, we delete the existing entry and insert a new one.
Both SQL and NoSQL work. Since the access pattern is essentially key-value, we can use NoSQL here. Because session_key is queried on every request, we also need to cache it.
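The session flow described above (create on login, optionally replace the previous session, validate on every request through a cache) could look roughly like this. It is a minimal in-memory sketch: the two dictionaries stand in for the NoSQL table and the cache, and the class and method names (`SessionService`, `login`, `validate`, `logout`) are hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Session:
    session_key: str
    owner_id: int
    device: str
    expiration_time: float  # Unix timestamp in seconds


class SessionService:
    def __init__(self) -> None:
        self.store: Dict[str, Session] = {}  # stand-in for the session table
        self.cache: Dict[str, Session] = {}  # stand-in for the cache layer

    def login(self, session_key: str, owner_id: int, device: str,
              ttl_seconds: float, allow_multi_device: bool = True,
              now: Optional[float] = None) -> Session:
        now = time.time() if now is None else now
        if not allow_multi_device:
            # Single-device mode: drop the user's existing sessions first.
            stale = [k for k, s in self.store.items() if s.owner_id == owner_id]
            for key in stale:
                self.logout(key)
        session = Session(session_key, owner_id, device, now + ttl_seconds)
        self.store[session_key] = session
        self.cache[session_key] = session
        return session

    def logout(self, session_key: str) -> None:
        self.store.pop(session_key, None)
        self.cache.pop(session_key, None)

    def validate(self, session_key: str, now: Optional[float] = None) -> Optional[int]:
        """Return the owner id if the session is valid, otherwise None."""
        now = time.time() if now is None else now
        session = self.cache.get(session_key)
        if session is None:
            session = self.store.get(session_key)  # cache miss: read the table
            if session is None:
                return None
            self.cache[session_key] = session
        if session.expiration_time <= now:
            self.logout(session_key)  # expired: remove from table and cache
            return None
        return session.owner_id
```

In a real deployment the single-device scan over all sessions would be replaced by a lookup on owner_id, which is one reason to keep that field in the table.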
User Table:
Field | Type | Detail |
---|---|---|
user_id | primary key | |
username | string | |
pswd_hash | string | |
other_information | | other information, we won't list all of it here |
create_at | timestamp | |
Use a SQL database. user_id is already indexed as the primary key; we may also need indexes on username and create_at. To defend against rainbow table attacks, we add a per-user salt when hashing passwords.
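As an illustration of salted hashing, here is a small sketch using PBKDF2 from Python's standard library. The choice of PBKDF2, the iteration count, and the `salt$hash` layout for the pswd_hash column are assumptions for the example, not something the design above prescribes.

```python
import hashlib
import hmac
import os
from typing import Optional

ITERATIONS = 100_000  # assumption: tune to your hardware and latency budget


def hash_password(password: str, salt: Optional[bytes] = None) -> str:
    """Return 'salt_hex$hash_hex', suitable for storing in pswd_hash."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return f"{salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    salt_hex, _, hash_hex = stored.partition("$")
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), ITERATIONS
    )
    return hmac.compare_digest(candidate.hex(), hash_hex)
```

Because each user gets a different salt, two users with the same password end up with different stored values, so a precomputed table of hashes is of no use to an attacker.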
The Friend table depends on how we actually implement friendship. If it is one-way, it is a simple <from_user_id, to_user_id> pair.
If it is two-way, there are different ways to store it. In a SQL table, we can store one copy of each relationship like this:
Field | Type | Detail |
---|---|---|
smaller_id | int64 | |
larger_id | int64 | |
create_at | timestamp | |
Then we can query with SELECT * FROM friend_table WHERE smaller_id = id OR larger_id = id.
Or we can store two copies of each relationship, one per direction, so the query is simpler: SELECT * FROM friend_table WHERE from_id = id.
Field | Type | Detail |
---|---|---|
from_id | int64 | |
to_id | int64 | |
create_at | timestamp | |
If we choose a NoSQL database, we also need to store the data the second way, since many key-value stores do not support secondary indexes.
We choose the second way because it is better for sharding: we can shard by from_id, so all queries for a single user go to a single node.
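A rough sketch of the two-copy layout with sharding by from_id follows. The in-memory shards, the modulo shard function, and the `FriendStore` name are all hypothetical; they only illustrate why a friend-list read touches a single shard.

```python
from typing import Dict, List, Set


class FriendStore:
    """Stores each two-way friendship as two directed rows, sharded by from_id."""

    def __init__(self, num_shards: int = 4) -> None:
        # Each shard maps from_id -> set of to_id (stand-in for a table partition).
        self.shards: List[Dict[int, Set[int]]] = [dict() for _ in range(num_shards)]

    def _shard_for(self, from_id: int) -> Dict[int, Set[int]]:
        return self.shards[from_id % len(self.shards)]

    def add_friendship(self, a: int, b: int) -> None:
        # Two writes, usually on different shards: (a -> b) and (b -> a).
        self._shard_for(a).setdefault(a, set()).add(b)
        self._shard_for(b).setdefault(b, set()).add(a)

    def remove_friendship(self, a: int, b: int) -> None:
        self._shard_for(a).get(a, set()).discard(b)
        self._shard_for(b).get(b, set()).discard(a)

    def friends_of(self, user_id: int) -> Set[int]:
        # Single-shard read: all rows with from_id == user_id live together.
        return set(self._shard_for(user_id).get(user_id, set()))
```

The trade-off is on the write path: the two rows generally live on different shards, so a real system has to handle the case where one write succeeds and the other fails.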
Scale:
Session table: shard by owner id
User table: shard by user id
Friend table: shard by from_id
Each of them also needs a cache layer to speed up reads; the cache key space can be partitioned with consistent hashing.
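For readers unfamiliar with consistent hashing, here is a minimal ring with virtual nodes. The node names, the number of virtual nodes, and the use of SHA-256 as the hash are illustrative assumptions.

```python
import bisect
import hashlib
from typing import Iterable, List, Tuple


def _hash(key: str) -> int:
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    def __init__(self, nodes: Iterable[str], vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self._ring: List[Tuple[int, str]] = []  # sorted (point, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each node owns several points on the ring to even out the load.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [(point, n) for point, n in self._ring if n != node]

    def get_node(self, key: str) -> str:
        if not self._ring:
            raise ValueError("ring is empty")
        # First point clockwise from the key's hash, wrapping around at the end.
        idx = bisect.bisect_left(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]
```

When a cache node is removed, only the keys that were mapped to it move to other nodes; keys on the remaining nodes stay where they were, which is the property that makes this scheme attractive for a cache layer.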
To support friend distance queries, like LinkedIn's 1st/2nd/3rd connections, we need another cache service. The cache layer stores each user's 2nd connections. They are computed on the fly and stored only in the cache; if we lose them, we recompute them by sending requests to the Friend Service. For a user and a list of other users, to check whether they are:
- 1st: we just check the Friend Service
- 2nd: we check the 2nd connections stored in the cache; if they are missing, we recalculate them
- 3rd: for user A and user B who are neither 1st nor 2nd connections, we get all of A's 2nd connections and B's 1st connections and check whether the two sets intersect
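The three checks above can be sketched as follows. The `get_friends` callable stands in for a call to the Friend Service and the dictionary stands in for the 2nd-connection cache; both, along with the `ConnectionService` name, are assumptions made for the example.

```python
from typing import Callable, Dict, Optional, Set


class ConnectionService:
    def __init__(self, get_friends: Callable[[int], Set[int]]) -> None:
        self.get_friends = get_friends  # stand-in for a Friend Service call
        self.second_cache: Dict[int, Set[int]] = {}  # user_id -> 2nd connections

    def second_connections(self, user_id: int) -> Set[int]:
        cached = self.second_cache.get(user_id)
        if cached is None:
            # Cache miss: recompute from the Friend Service.
            first = self.get_friends(user_id)
            cached = set()
            for friend in first:
                cached |= self.get_friends(friend)
            cached -= first          # 1st connections are not 2nd connections
            cached.discard(user_id)  # neither is the user themselves
            self.second_cache[user_id] = cached
        return cached

    def distance(self, a: int, b: int) -> Optional[int]:
        """Return 1, 2 or 3 for the connection degree, or None if further away."""
        if a == b:
            return 0
        if b in self.get_friends(a):
            return 1
        if b in self.second_connections(a):
            return 2
        # 3rd: some 2nd connection of A is a direct friend of B.
        if self.second_connections(a) & self.get_friends(b):
            return 3
        return None
```

For a chain of friendships 1-2-3-4-5, `distance(1, 4)` finds user 3 in both A's 2nd connections and B's friends. The sketch does not invalidate cached entries when friendships change; a real cache would need expiry or explicit invalidation.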
For more details, please check this article.