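Open Source Daily recommends one high-quality GitHub open source project and one selected English-language article on technology or programming every day. Keep reading Open Source Daily and build a good habit of learning something daily.
February 14, 2024, Open Source Daily issue 1105:
Today's recommended open source project: "RSSHub"
Today's recommended English article: "Why YouTube Never Runs Out of Storage? It's NOT just CLOUD!"
Open source project
Today's recommended open source project: "RSSHub". Portal: project link
Why we recommend it: RSSHub is an open source, easy-to-use, and extensible RSS feed generator. It can generate RSS feeds from almost any source, delivers millions of content items aggregated from all kinds of sources, and has a very active community.
Direct link 🔗: docs.rsshub.app
English article
Today's recommended English article: Why YouTube Never Runs Out of Storage? It's NOT just CLOUD!
Why we recommend it: To explain why YouTube seems to have unlimited storage capacity, the author points to several possible factors, such as data compression, tiered storage, content lifecycle management, and a global network with content replication. The article also mentions emerging technologies such as DNA storage, which may give YouTube denser storage options in the future.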
Why YouTube Never Runs Out of Storage? It's NOT just CLOUD!
Have you ever wondered why, after all these years and an absolutely insane amount of video data being generated, YouTube hasn't run out of space? Especially with hits like these:
This is insane, right? Imagine a platform bursting with millions of videos, yet never facing a space crunch. And even if you try to counter it with cloud computing, at the end of the day, it's just physical hardware or hard disks sitting somewhere in a data center in the name of the cloud 🙂
From Petabytes to Exabytes:
YouTube operates at an unprecedented scale, storing petabytes and exabytes of video content to cater to its vast user base. To put this into perspective, a single petabyte is equivalent to one million gigabytes, while an exabyte is one billion gigabytes. Managing such immense volumes of data is insane🤯.
So, the question arises:
- What's the limit? 🤔
- How do they never lose anything?
- How can any data be accessed instantly from anywhere in the world?
Let's delve into a deeper, more fascinating story behind YouTube's seemingly infinite storage capabilities.
And, don't worry, I'm not gonna fool you into cloud computing XD
Beyond the Cloud
Well, it made sense back when the maximum quality was 720p, but now many videos need to be stored in 4K. They must have developed some special compression algorithms or methods to minimize the size.
If they were to rely solely on cloud storage, it would require enormous space and be costly, regardless of the company's size, especially considering that anyone can upload vast amounts of data for free.
First take: Compression Magic
The first reasonable explanation is data compression. Videos are compressed before storage using modern codecs such as H.264, H.265 (HEVC), VP9 and AV1. This can reduce file size by up to 50%, significantly stretching storage capacity without a noticeable drop in quality.
Ideally, this would not compromise quality at all. In practice, these codecs are lossy: no matter how effective the compression is, a small amount of detail is still lost in exchange for performance and speed.
This does sound like Pied Piper's revolutionary compression algorithm from the series "Silicon Valley" XD
In addition, YouTube utilizes advanced transcoding and optimization techniques to encode uploaded videos into multiple formats and resolutions, catering to various devices and network conditions. Adaptive bitrate streaming further enhances the user experience by dynamically adjusting video quality based on available bandwidth and device capabilities.
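The adaptive bitrate idea can be sketched in a few lines of Python. The rendition ladder, its bitrates, and the `pick_rendition` helper are illustrative assumptions, not YouTube's actual values or player logic.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rendition:
    name: str
    height: int        # vertical resolution in pixels
    bitrate_kbps: int  # assumed average bitrate, for illustration only


# Hypothetical ladder, sorted from lowest to highest quality.
LADDER: List[Rendition] = [
    Rendition("360p", 360, 800),
    Rendition("720p", 720, 2500),
    Rendition("1080p", 1080, 5000),
    Rendition("2160p", 2160, 16000),
]


def pick_rendition(bandwidth_kbps: float, max_height: int, safety: float = 0.8) -> Rendition:
    """Return the best rendition that fits the measured bandwidth and the screen.

    `safety` leaves headroom so small bandwidth dips do not stall playback.
    Falls back to the lowest rendition when nothing fits.
    """
    budget = bandwidth_kbps * safety
    candidates = [
        r for r in LADDER
        if r.bitrate_kbps <= budget and r.height <= max_height
    ]
    return candidates[-1] if candidates else LADDER[0]
```

A player would call something like `pick_rendition(4000, 1080)` again every few seconds as its bandwidth estimate changes, which is why the same video has to be stored in several encodings.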
Second take: Storage Tiers
Tiered Storage is one of the main factors as videos aren't stored in a monolithic cloud. YouTube employs a tiered system, where frequently accessed content resides in high-performance, readily accessible storage (think lightning-fast SSDs), while less-viewed videos migrate to colder, more cost-effective tiers (like hard drives). This optimizes latency, performance and storage costs.
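A minimal sketch of such a tiering decision, assuming a single popularity signal (views in the last 30 days) and a made-up threshold. The names `Video`, `choose_tier`, and `plan_migrations` are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from typing import Dict, Iterable

# Hypothetical threshold: recent views needed to stay on fast storage.
HOT_MIN_VIEWS_30D = 10_000


@dataclass
class Video:
    video_id: str
    views_last_30d: int


def choose_tier(video: Video) -> str:
    """Map a video to 'hot' (SSD) or 'cold' (hard drive) storage."""
    return "hot" if video.views_last_30d >= HOT_MIN_VIEWS_30D else "cold"


def plan_migrations(videos: Iterable[Video], current_tier: Dict[str, str]) -> Dict[str, str]:
    """Return {video_id: target_tier} for videos whose tier should change."""
    moves = {}
    for video in videos:
        target = choose_tier(video)
        if current_tier.get(video.video_id) != target:
            moves[video.video_id] = target
    return moves
```

Running `plan_migrations` periodically would promote a suddenly viral old video to fast storage and demote content nobody watches anymore.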
Third take: Content Lifecycle Management
- Content Assessment: YouTube constantly analyzes videos to understand their popularity and engagement. Videos with low viewership or engagement are flagged for archival or removal, freeing up space for fresh content. (But still there are tons of inactive accounts with all their old videos)
- Partner Programs: YouTube offers monetization options for creators. Videos enrolled in such programs are typically retained longer due to their potential revenue generation.
Technology Advancements:
- Emerging Technologies: YouTube actively explores cutting-edge technologies like DNA storage, which offers far denser storage than traditional methods. While still in its early stages, it holds vast potential for the future.
- Moore's Law: Storage capacity consistently increases, driven by advancements in hardware technology (for disk density, this trend is sometimes called Kryder's Law). This allows YouTube to accommodate growing video libraries while maintaining cost-effectiveness.
What about availability?
Well, if we talk just about the availability of this huge amount of data, it comes down to:
- Global Network: YouTube's storage infrastructure isn't confined to a single location. It's distributed across data centers worldwide, ensuring redundancy and resilience. If one data center experiences an outage, others can seamlessly take over, preventing service interruptions.
- Content Replication: Popular content is replicated across different data centers. This ensures it's readily available to viewers near them, minimizing latency and buffering issues.
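The two points above can be combined into a small routing sketch: serve each viewer from the closest healthy data center that holds a replica, and fail over when one goes down. The region names, latencies, and the `pick_data_center` helper are hypothetical.

```python
from typing import Dict, Optional, Set

# Hypothetical measured latencies from one viewer to each region, in ms.
LATENCY_MS: Dict[str, float] = {"us-east": 20.0, "eu-west": 90.0, "asia-south": 180.0}

# Hypothetical set of regions holding a copy of the requested video.
REPLICAS: Set[str] = {"us-east", "eu-west"}


def pick_data_center(
    latency_ms: Dict[str, float],
    replicas: Set[str],
    healthy: Set[str],
) -> Optional[str]:
    """Return the lowest-latency healthy region that holds a replica, or None."""
    usable = [dc for dc in replicas if dc in healthy and dc in latency_ms]
    if not usable:
        return None
    # sorted() makes tie-breaking deterministic.
    return min(sorted(usable), key=lambda dc: latency_ms[dc])
```

With every region healthy the viewer is served from `us-east`. If `us-east` drops out of the `healthy` set, the same call returns `eu-west`, which is the failover behavior described above.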
What's the available information?
Google has used the Google File System (GFS, since succeeded by Colossus) and BigTable to manage this large amount of data. They have millions of disks in a RAID configuration across multiple data centers. I found an answer on Twitter from 'TechWelthEngine' that sounds plausible.
"At 4.3 petabytes a day, it takes just over 232 days to get to an exabyte. If we assume that they have 15 EB of storage, then that means it'll take them 9.5 years to fill it all at this pace."
But if this is true, do they have to build a new 15 EB facility every 9.5 years?
I am not really sure. Maybe they just dedupe any redundant data?
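A minimal sketch of what exact deduplication could look like, assuming a content-addressed store keyed by SHA-256. `DedupStore` is a toy class invented for illustration; nothing here is known to be how YouTube stores uploads.

```python
import hashlib
from typing import Dict


class DedupStore:
    """Toy content-addressed store: identical bytes are kept only once."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}  # digest -> content
        self._index: Dict[str, str] = {}    # upload name -> digest

    def put(self, name: str, data: bytes) -> bool:
        """Store `data` under `name`; return True if new bytes were written."""
        digest = hashlib.sha256(data).hexdigest()
        is_new = digest not in self._blobs
        if is_new:
            self._blobs[digest] = data
        self._index[name] = digest
        return is_new

    def get(self, name: str) -> bytes:
        """Return the content stored under `name`."""
        return self._blobs[self._index[name]]

    def stored_bytes(self) -> int:
        """Total bytes physically kept, after deduplication."""
        return sum(len(blob) for blob in self._blobs.values())
```

Note that this only catches byte-for-byte identical uploads. A re-encoded or trimmed copy of the same video hashes differently, so it would still be stored separately.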
And don't forget that the 4.3 petabytes a day will increase over the coming years, especially with a huge number of videos now being created and narrated by AI!
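The quoted estimate, and the effect of a growing upload rate, can be sanity-checked with a short calculation. The 4.3 PB/day and 15 EB figures come from the quote; the 20% yearly growth is purely an assumed number for illustration.

```python
PB_PER_EB = 1_000       # decimal units: 1 EB = 1,000 PB
DAYS_PER_YEAR = 365.25


def days_per_exabyte(pb_per_day: float) -> float:
    """Days needed to accumulate one exabyte at a constant ingest rate."""
    return PB_PER_EB / pb_per_day


def years_to_fill(capacity_eb: float, pb_per_day: float, yearly_growth: float = 0.0) -> float:
    """Years until `capacity_eb` is full, with the daily rate growing once a year."""
    if pb_per_day <= 0 or yearly_growth < 0:
        raise ValueError("ingest rate must be positive and growth non-negative")
    remaining_pb = capacity_eb * PB_PER_EB
    years = 0.0
    rate = pb_per_day
    while True:
        yearly_ingest = rate * DAYS_PER_YEAR
        if yearly_ingest >= remaining_pb:
            # Capacity runs out partway through this year.
            return years + remaining_pb / yearly_ingest
        remaining_pb -= yearly_ingest
        years += 1
        rate *= 1 + yearly_growth


print(f"{days_per_exabyte(4.3):.1f} days per exabyte")           # 232.6
print(f"{years_to_fill(15, 4.3):.2f} years at a constant rate")  # 9.55
print(f"{years_to_fill(15, 4.3, 0.20):.2f} years with assumed 20% yearly growth")
```

The constant-rate numbers match the quote. Any sustained growth in uploads shortens the fill time noticeably, which is the concern raised above.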
And if they really are just constantly upgrading their servers (which obviously they are not), then it explains why we have to watch 2 ads, then 1.5 minutes of the actual video, then 2 ads, then 3 minutes, and then the process repeats 🙂
So I believe there must be a way because they can't keep building server farms forever and ever....