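Open Source Daily recommends one quality open-source GitHub project and one selected English-language article on technology or programming every day. Keep reading Open Source Daily and build a good habit of learning every day.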

February 14, 2024, Open Source Daily Issue 1105:
Today's recommended open-source project: "RSSHub"
Today's recommended English article: "Why YouTube Never Runs Out of Storage? It's NOT just CLOUD!"


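Open-Source Project

Today's recommended open-source project: "RSSHub" (project link)

Why we recommend it: RSSHub is an open-source, easy-to-use, and extensible RSS feed generator. It can generate RSS feeds from almost any source, delivering millions of pieces of content aggregated from a wide range of sources, and its community is very active.

Direct link 🔗: docs.rsshub.app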


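English Article

Today's recommended English article: Why YouTube Never Runs Out of Storage? It's NOT just CLOUD!

Why we recommend it: The author looks at why YouTube seems to have unlimited storage capacity and points to several possible factors, such as data compression, tiered storage, content lifecycle management, and a global network with content replication. The author also mentions emerging technologies such as DNA storage, which may give YouTube denser storage options in the future.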


Why YouTube Never Runs Out of Storage? It's NOT just CLOUD!

Have you ever wondered why, after all these years and an absolutely insane amount of video data being generated, YouTube hasn't run out of space? Especially with its record-breaking hits?

This is insane, right? Imagine a platform bursting with millions of videos, yet never facing a space crunch. And even if you answer "cloud computing", at the end of the day the cloud is just physical hardware: hard disks sitting in a data center somewhere 🙂

From Petabytes to Exabytes:

YouTube operates at an unprecedented scale, storing petabytes and exabytes of video content to cater to its vast user base. To put this into perspective, a single petabyte is equivalent to one million gigabytes, while an exabyte is one billion gigabytes. Managing such immense volumes of data is insane🤯.

So, the question arises:

  1. What's the limit? 🤔
  2. How do they never lose anything?
  3. How can any data be accessed instantly from anywhere in the world?

Let's delve into a deeper, more fascinating story behind YouTube's seemingly infinite storage capabilities.

And don't worry, I'm not going to fob you off with "it's just the cloud" XD

Beyond the Cloud

Well, storing everything might have made sense back when the maximum quality was 720p, but now many videos are uploaded and stored in 4K. They must have developed special compression algorithms or methods to minimize file sizes.

If they were to rely solely on cloud storage, it would require enormous space and be costly, regardless of the company's size, especially considering that anyone can upload vast amounts of data for free.

First take: Compression Magic

The most reasonable explanation starts with data compression. Videos are compressed before storage using codecs such as H.264, VP9, H.265 (HEVC), and AV1. Newer codecs like VP9, HEVC, and AV1 can cut file size by up to roughly 50% compared with the older H.264 at similar visual quality, significantly stretching storage capacity.

However, this must be done without noticeably compromising quality. These codecs are lossy: no matter how effective the compression is, some information is still discarded in exchange for smaller files and faster encoding and playback.

This does sound like Pied Piper's revolutionary compression algorithm from the series "Silicon Valley" XD
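A quick back-of-the-envelope calculation shows why resolution and codec efficiency matter so much. The bitrates below (40 Mbps and 20 Mbps for one hour of 4K) are hypothetical round numbers chosen to illustrate the "up to 50%" saving, not YouTube's actual encoding settings; the 9x pixel ratio between 4K (3840×2160) and 720p (1280×720) is exact.

```python
def pixels(width: int, height: int) -> int:
    """Pixels in one frame at the given resolution."""
    return width * height

def storage_gb(bitrate_mbps: float, duration_s: float) -> float:
    """Size in decimal gigabytes of a stream at a constant average bitrate."""
    bits = bitrate_mbps * 1_000_000 * duration_s
    return bits / 8 / 1_000_000_000  # bits -> bytes -> GB

# Exact: a 4K UHD frame has 9x the pixels of a 720p frame.
ratio_4k_vs_720p = pixels(3840, 2160) / pixels(1280, 720)

# Hypothetical bitrates for one hour of 4K: an older codec at 40 Mbps and a
# newer codec reaching similar quality at half that (the "up to 50%" case).
ONE_HOUR_S = 3600
old_codec_gb = storage_gb(40, ONE_HOUR_S)
new_codec_gb = storage_gb(20, ONE_HOUR_S)

print(f"4K vs 720p pixels: {ratio_4k_vs_720p:.0f}x")
print(f"1 h of 4K, older codec: {old_codec_gb:.1f} GB")
print(f"1 h of 4K, newer codec: {new_codec_gb:.1f} GB")
```

With these assumed bitrates, one hour of 4K takes 18 GB with the older codec and 9 GB with the newer one, and that saving is multiplied across every upload. Pixel count does not translate one-to-one into file size, which is why the bitrate, not the resolution, drives the estimate.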

In addition, YouTube utilizes advanced transcoding and optimization techniques to encode uploaded videos into multiple formats and resolutions, catering to various devices and network conditions. Adaptive bitrate streaming further enhances the user experience by dynamically adjusting video quality based on available bandwidth and device capabilities.
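Adaptive bitrate streaming can be illustrated with a simple throughput-based rule: pick the best rendition that fits the measured bandwidth and the screen. This is a minimal sketch; the encoding ladder, its bitrates, the 0.8 safety margin, and the `pick_rendition` helper are hypothetical, and real players (YouTube's included) use more elaborate, buffer-aware logic.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rendition:
    label: str
    height: int        # vertical resolution in pixels
    bitrate_kbps: int  # average encoded bitrate

# Hypothetical encoding ladder; real ladders vary per video and codec.
LADDER = [
    Rendition("240p", 240, 300),
    Rendition("360p", 360, 700),
    Rendition("480p", 480, 1_200),
    Rendition("720p", 720, 2_500),
    Rendition("1080p", 1080, 5_000),
    Rendition("2160p", 2160, 16_000),
]

def pick_rendition(measured_kbps: float, max_height: int, safety: float = 0.8) -> Rendition:
    """Highest-bitrate rendition that fits within a safety margin of the
    measured bandwidth and does not exceed the device's screen height."""
    budget = measured_kbps * safety
    fits = [r for r in LADDER if r.bitrate_kbps <= budget and r.height <= max_height]
    # If even the lowest rung exceeds the budget, fall back to it anyway.
    return max(fits, key=lambda r: r.bitrate_kbps) if fits else LADDER[0]

print(pick_rendition(4_000, max_height=1080).label)
print(pick_rendition(30_000, max_height=1080).label)
print(pick_rendition(200, max_height=2160).label)
```

This picks 720p on a 4 Mbps connection, 1080p on a fast connection with a 1080p screen, and falls back to 240p when bandwidth is very low. Note the storage side effect: every rung of the ladder is a separate encoded file that has to be stored.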

Second take: Storage Tiers

Tiered storage is another major factor, since videos aren't stored in one monolithic cloud. YouTube employs a tiered system, where frequently accessed content resides in high-performance, readily accessible storage (think lightning-fast SSDs), while less-viewed videos migrate to colder, more cost-effective tiers (like hard drives). This optimizes latency, performance, and storage costs.
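A tiering policy of this kind might look roughly like the sketch below: a video's recent view rate decides whether it belongs on SSDs, hard drives, or an archive tier, and a rebalance pass lists the migrations to perform. The thresholds, tier names, and helpers (`choose_tier`, `rebalance`) are hypothetical illustrations, not YouTube's real policy.

```python
from enum import Enum

class Tier(Enum):
    HOT = "ssd"       # frequently watched: fast flash storage
    WARM = "hdd"      # occasionally watched: hard drives
    COLD = "archive"  # rarely watched: cheapest, slowest tier

# Hypothetical thresholds in average views per day.
HOT_MIN_DAILY_VIEWS = 10_000
WARM_MIN_DAILY_VIEWS = 10

def choose_tier(views_last_7_days: int) -> Tier:
    """Pick a tier from a video's average daily views over the last week."""
    daily = views_last_7_days / 7
    if daily >= HOT_MIN_DAILY_VIEWS:
        return Tier.HOT
    if daily >= WARM_MIN_DAILY_VIEWS:
        return Tier.WARM
    return Tier.COLD

def rebalance(placements: dict[str, Tier], recent_views: dict[str, int]) -> dict[str, Tier]:
    """Return {video_id: new_tier} for every video that should migrate."""
    moves = {}
    for video_id, current in placements.items():
        target = choose_tier(recent_views.get(video_id, 0))
        if target is not current:
            moves[video_id] = target
    return moves

placements = {"viral": Tier.WARM, "steady": Tier.WARM, "forgotten": Tier.HOT}
recent_views = {"viral": 1_000_000, "steady": 700, "forgotten": 3}
print(rebalance(placements, recent_views))
```

Here the viral video is promoted to SSDs, the forgotten one is demoted to the archive tier, and the steady one stays where it is.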

Third take: Content Lifecycle Management

  • Content Assessment: YouTube constantly analyzes videos to understand their popularity and engagement. Videos with low viewership or engagement are flagged for archival or removal, freeing up space for fresh content.
    (Yet there are still tons of inactive accounts with all their old videos online.)

  • Partner Programs: YouTube offers monetization options for creators. Videos enrolled in such programs are typically retained longer due to their potential revenue generation.
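The two lifecycle rules above can be combined into a toy policy: low-engagement videos are flagged for cold archival once past a grace period, and monetized videos get a longer grace period. Every threshold and name here (`DEFAULT_GRACE_DAYS`, `lifecycle_action`, and so on) is a hypothetical assumption, and the sketch only archives, never deletes.

```python
from dataclasses import dataclass

DEFAULT_GRACE_DAYS = 365        # hypothetical: newer videos are never flagged
MONETIZED_GRACE_DAYS = 3 * 365  # hypothetical: monetized videos are kept longer
LOW_ENGAGEMENT_VIEWS = 50       # hypothetical yearly view threshold

@dataclass(frozen=True)
class VideoStats:
    video_id: str
    age_days: int
    views_last_year: int
    monetized: bool  # enrolled in a partner/monetization program

def lifecycle_action(v: VideoStats) -> str:
    """Return 'keep' or 'archive' under a toy lifecycle policy."""
    grace = MONETIZED_GRACE_DAYS if v.monetized else DEFAULT_GRACE_DAYS
    if v.age_days < grace:
        return "keep"     # still within its grace period
    if v.views_last_year < LOW_ENGAGEMENT_VIEWS:
        return "archive"  # low engagement: move to cold archival storage
    return "keep"

videos = [
    VideoStats("old-unpopular", age_days=400, views_last_year=10, monetized=False),
    VideoStats("old-unpopular-monetized", age_days=400, views_last_year=10, monetized=True),
    VideoStats("old-popular", age_days=2000, views_last_year=5000, monetized=False),
]
for v in videos:
    print(v.video_id, lifecycle_action(v))
```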

Technology Advancements:

  • Emerging Technologies: Cutting-edge technologies like DNA storage, which is being actively researched, offer storage orders of magnitude denser than traditional media. While still in its early stages, DNA storage holds vast potential for platforms like YouTube in the future.

  • Moore's Law: Storage capacity keeps increasing thanks to advances in hardware (strictly speaking, the storage counterpart of Moore's Law is often called Kryder's Law). This allows YouTube to accommodate growing video libraries while remaining cost-effective.
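The effect of rising drive density is easy to quantify. The sketch below assumes, purely for illustration, 20 TB drives in 2024 and a capacity doubling period of four years (real density growth, sometimes called Kryder's Law, has been uneven), and counts how many drives a fixed one-exabyte library would need, ignoring redundancy.

```python
import math

def drive_capacity_tb(year: int, base_year: int = 2024, base_tb: float = 20.0,
                      doubling_years: float = 4.0) -> float:
    """Per-drive capacity assuming a steady doubling period (hypothetical)."""
    return base_tb * 2 ** ((year - base_year) / doubling_years)

def drives_needed(library_pb: float, drive_tb: float) -> int:
    """Drives needed to hold a library, in decimal units, ignoring redundancy."""
    return math.ceil(library_pb * 1000 / drive_tb)

LIBRARY_PB = 1000  # one exabyte, held fixed to isolate the density effect
for year in (2024, 2028, 2032):
    cap = drive_capacity_tb(year)
    print(year, f"{cap:.0f} TB/drive", drives_needed(LIBRARY_PB, cap), "drives")
```

Under these assumptions the drive count halves every four years (50,000, then 25,000, then 12,500), so the same floor space holds far more data over time.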

What about availability?

Well, if we're talking about just the availability of this huge amount of data, it comes down to:

  • Global Network: YouTube's storage infrastructure isn't confined to a single location. It's distributed across data centers worldwide, ensuring redundancy and resilience. If one data center experiences an outage, others can seamlessly take over, preventing service interruptions.

  • Content Replication: Popular content is replicated across different data centers, so it's readily available to viewers nearby, minimizing latency and buffering.
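Replication and failover can be sketched together: more popular videos get more copies, and each request is served from the lowest-latency healthy site that holds one, so an outage simply shifts traffic to the next-closest replica. The site names, latencies, thresholds, and helper functions are hypothetical.

```python
from typing import Optional

def replica_count(daily_views: int) -> int:
    """Toy replication policy: every video keeps a baseline of copies for
    redundancy; popular videos get extra copies near their viewers.
    Thresholds and counts are hypothetical."""
    if daily_views >= 1_000_000:
        return 6
    if daily_views >= 10_000:
        return 4
    return 2

def pick_replica(replica_sites: list[str], healthy: set[str],
                 latency_ms: dict[str, float]) -> Optional[str]:
    """Serve from the lowest-latency healthy site holding a replica.
    If that site is down, the next-closest healthy replica takes over."""
    candidates = [s for s in replica_sites if s in healthy]
    if not candidates:
        return None
    return min(candidates, key=lambda s: latency_ms.get(s, float("inf")))

# Hypothetical sites and latencies as seen from a viewer in Europe.
sites = ["us-east", "eu-west", "asia-south"]
latency = {"us-east": 90.0, "eu-west": 15.0, "asia-south": 140.0}

print(pick_replica(sites, healthy=set(sites), latency_ms=latency))
print(pick_replica(sites, healthy={"us-east", "asia-south"}, latency_ms=latency))
```

With every site up, the European viewer is served from `eu-west`; if `eu-west` goes down, the request fails over to `us-east` without any change on the viewer's side.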

What information is available?

Google has described using the Google File System (GFS, since succeeded by Colossus) and Bigtable to manage huge amounts of data, storing it redundantly across vast numbers of disks in multiple data centers. I found an answer on Twitter from 'TechWelthEngine' that sounds plausible:

"At 4.3 petabytes a day, it takes just over 232 days to get to an exabyte. If we assume that they have 15 EB of storage, then that means it'll take them 9.5 years to fill it all at this pace."
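The arithmetic in the quote is easy to verify, using decimal units (1 EB = 1,000 PB) and ignoring leap days:

```python
PB_PER_EB = 1000       # decimal units: 1 EB = 1,000 PB

daily_upload_pb = 4.3  # upload rate from the quoted tweet
capacity_eb = 15       # the tweet's assumed total capacity

days_per_eb = PB_PER_EB / daily_upload_pb
years_to_fill = capacity_eb * days_per_eb / 365  # ignores leap days

print(f"{days_per_eb:.1f} days per exabyte")
print(f"{years_to_fill:.2f} years to fill {capacity_eb} EB")
```

This prints 232.6 days per exabyte and 9.56 years, so the quoted "just over 232 days" and "9.5 years" hold up.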

But if this is true, do they have to build a new 15 EB facility every 9.5 years?
I am not really sure. Maybe they just dedupe any redundant data?
And don't forget that the 4.3 petabytes a day will increase over the coming years, especially with the huge number of videos now being created and narrated by AI!
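To see how much growth and deduplication could change the 9.5-year estimate, here is a small day-by-day simulation. The 20% annual growth in the upload rate and the 10% deduplication saving used below are hypothetical values picked for illustration, not figures from YouTube.

```python
def simulate_years_to_fill(capacity_eb: float, daily_pb: float,
                           annual_growth: float = 0.0,
                           dedupe_savings: float = 0.0) -> float:
    """Add one day of uploads at a time until capacity_eb is full.

    annual_growth:  yearly fractional increase in the upload rate (hypothetical)
    dedupe_savings: fraction of incoming bytes removed by deduplication (hypothetical)
    """
    if daily_pb <= 0 or annual_growth < 0 or not 0 <= dedupe_savings < 1:
        raise ValueError("need a positive, non-shrinking rate and savings in [0, 1)")
    capacity_pb = capacity_eb * 1000                 # decimal units: 1 EB = 1,000 PB
    daily_factor = (1 + annual_growth) ** (1 / 365)  # compound growth per day
    stored_pb, rate_pb, days = 0.0, daily_pb, 0
    while stored_pb < capacity_pb:
        stored_pb += rate_pb * (1 - dedupe_savings)  # only non-duplicate bytes land
        rate_pb *= daily_factor
        days += 1
    return days / 365

baseline = simulate_years_to_fill(15, 4.3)
with_growth = simulate_years_to_fill(15, 4.3, annual_growth=0.20)
with_dedupe = simulate_years_to_fill(15, 4.3, dedupe_savings=0.10)
print(f"baseline:   {baseline:.2f} years")
print(f"20% growth: {with_growth:.2f} years")
print(f"10% dedupe: {with_dedupe:.2f} years")
```

With no growth and no deduplication it reproduces the quote's roughly 9.56 years. With 20% yearly growth the same 15 EB fills in about 5.5 years, while a 10% deduplication saving alone stretches it to roughly 10.6 years, so growth in uploads can easily outpace what deduplication saves.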

And if they really were just constantly upgrading their servers (which, obviously, they are not), that would explain why we have to watch 2 ads, then 1.5 minutes of the actual video, then 2 ads, then 3 minutes, and then the process repeats 🙂

So I believe there must be another way, because they can't keep building server farms forever and ever...


Download the Open Source Daily app: https://openingsource.org/2579/
Join us: https://openingsource.org/about/join/
Follow us: https://openingsource.org/about/love/