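Open Source Daily recommends one high-quality open source project from GitHub and one selected English-language technology or programming article every day. Keep reading Open Source Daily to maintain a good habit of daily learning.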
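Open Source Daily, Issue 981: “A small plugin: notifications-preview-github”
Today’s recommended open source project: “A small plugin: notifications-preview-github”. Link: project page
Why we recommend it: a browser extension that lets you preview the notifications shown at the top right of GitHub pages without navigating away.
Today’s recommended English article: “The Future of Data Science, Data Engineering, and Tech” by SeattleDataGuy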

The Future of Data Science, Data Engineering, and Tech

6 experts’ views on tech in 2021

As 2020 comes to a close, we wanted to take a moment to reflect on all the changes in technology and to look at where things are going.

Whether you look at startups and their IPOs, improvements in technology, or the announcements at AWS re:Invent, this was a year filled with companies continuing to push boundaries.

A personal favorite announcement from 2020 was Amazon SageMaker Data Wrangler, which is designed to speed up data preparation for machine learning and AI applications. This seems like a great move towards more fluid machine learning pipelines, which will hopefully make machine learning more accessible to companies not focused on tech.

But 2020 is ending, so we asked people from various parts of the tech world to provide their insights into what they were looking forward to in 2021 — whether that be new startups, technologies, or best practices.

Let’s see what they had to say.

1. Sam Cannon, Facebook, Data Scientist

I feel like natural language processing (NLP) is currently moving at an unfathomable pace, which is simultaneously exciting and frustrating. Once I have established a decent pipeline for text classification or distributed word representation clustering, a new model comes out that outperforms what I was using yesterday.

That being said, I am super excited about the direction that NLP is taking — particularly with respect to open source solutions for complicated NLP tasks. One of my favorite companies in this space, and my personal barometer of open source, state-of-the-art NLP, is Hugging Face. Hugging Face is following a creed of “solv[ing] NLP” through democratizing complex NLP models and tasks that would normally be impossible for many individuals to utilize due to the lack of computational power or expertise.

They already offer simple sentiment analysis solutions that require minimal user input. Building on that, I think 2021 will usher in a wave of pre-packaged SOTA NLP models that can be used with one line of code. While it is impossible to forecast what will truly be accomplished in this space in 2021, I believe that, at the very least, out-of-the-box NLP models will allow more people to gain insights from their natural language data than ever before — and that is what I am most looking forward to in our field in 2021.

2. Catherine Tao, The Data Standard, Data Scientist

I am excited to see how cloud computing evolves in 2021. For now, the cloud is largely a place for a company’s data to be stored, and that has come with challenges such as scalability, efficiency, handling data streams, and more.

I want to see how cloud computing can be improved to address some of these major issues that tech businesses are facing. Many companies are struggling to bring AI into their businesses, and as a result some of them are falling behind in the tech industry. With better cloud computing, more companies should be able to adopt artificial intelligence and deploy projects and products more productively.

3. Riley Kinser, Terrain, Head of Product

Looking into 2021 (hopefully a much brighter year for commercial real estate!), my primary focus is to become an expert on both new and established tools for mapping geographic data. One of my primary roadmap objectives is to translate our insights at Terrain into maps that are easy to interpret for our end-users.

A lot of the examples in the industry today are built with ArcGIS, which is an older but well-established tool for mapping data. I believe better tools may be out there, which presents an opportunity to provide our clients with a new take on an old classic. Two of the tools I’m interested in exploring are open source projects developed by Uber: H3 and kepler.gl. One of the main advantages of H3 that I see is the ability to subdivide the world into hexagons of varying sizes depending on zoom level.

This solves one of the early problems we identified, which is that different users like to take different perspectives on the boundaries of neighborhoods, submarkets, or cities within a metro area. This also better enables us to develop maps internationally where data around boundaries can be harder to obtain.
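
To make the idea of hexagons at varying sizes concrete, here is a minimal sketch using only the Python standard library. It is not H3 and does not use H3's API: it assumes a flat plane with pointy-top hexagons, whereas H3 works on the globe with its own indexing scheme. The function names `point_to_hex` and `_hex_round` and the sample coordinates are made up for illustration.

```python
import math
from typing import Tuple


def _hex_round(q: float, r: float) -> Tuple[int, int]:
    """Round fractional axial coordinates to the nearest hexagon."""
    s = -q - r  # third cube coordinate; q + r + s is always 0
    rq, rr, rs = round(q), round(r), round(s)
    dq, dr, ds = abs(rq - q), abs(rr - r), abs(rs - s)
    # Recompute the coordinate with the largest rounding error
    # so that the cube constraint q + r + s == 0 still holds.
    if dq > dr and dq > ds:
        rq = -rr - rs
    elif dr > ds:
        rr = -rq - rs
    return int(rq), int(rr)


def point_to_hex(x: float, y: float, size: float) -> Tuple[int, int]:
    """Return the axial (q, r) cell containing the planar point (x, y).

    `size` is the hexagon's center-to-corner distance: a larger size
    gives coarser cells, a smaller size gives finer cells.
    """
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    return _hex_round(q, r)


if __name__ == "__main__":
    # Two hypothetical locations on a flat map (arbitrary units).
    a = (0.0, 0.0)
    b = (3.0, 1.0)
    for size in (10.0, 1.0):
        print(size, point_to_hex(*a, size), point_to_hex(*b, size))
```

With coarse cells (`size=10.0`) both points fall into the same hexagon, while with fine cells (`size=1.0`) they land in different ones. That is the kind of zoom-dependent grouping described above, independent of any neighborhood or city boundary data.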

kepler.gl, on the other hand, is interesting because it is relatively easy to develop with and to host online, whether for end-users or for an MVP. Uber developed kepler.gl so that internal users, technical and non-technical alike, could quickly develop maps that could be shared to visualize ideas from geospatial data. One of the other fun things that kepler.gl supports is the ability to easily visualize geographic data over a time series. I expect to begin with kepler.gl for our MVP and then explore H3 as we begin to collect user feedback.

4. Chris Zeoli, Base10 Partners, Principal

While there are a number of trends I am very excited about, eCommerce (particularly the rise of Shopify and its associated tools) and telemedicine top the list. I’ve written about the Shopify ecosystem, and the company continues to reach new heights, powering over $100B of GMV for over 2 million merchants.

I am particularly excited about its new partnerships with the likes of Facebook/Instagram, TikTok, Alipay, Affirm, and Pinterest, as Shopify becomes the underlying infrastructure for commerce across the major networks where consumers are. Both its software and its third-party ecosystem have been incredibly exciting to watch flourish. It’s been interesting to see traditional areas of eCommerce continue to grow (apparel and fashion, CPG products, health and wellness, etc.) while newer categories, from food and grocery to auto, come online through platforms like Shopify.

I’m also very excited by telehealth and new digital healthcare experiences. It’s clear with COVID that healthcare is front and center in terms of what is “essential” for our economy. At 20% of GDP (and growing), the category has had few breakout outcomes and no FAANG-scale companies yet. I would imagine that in five years there will be at least one major player (and I also expect to see Apple, Google, and Amazon continue to push into healthcare). 2020 was a big year for telehealth, with Teladoc acquiring Livongo and creating the most formidable brand in digital health yet at $30B+ combined enterprise value and over $1.5B of ARR growing >100%.

I’m excited to see a whole new wave of digital health experiences that address the most essential human need in taking care of ourselves.

5. Jun Kim, Facebook, Data Engineer

The upcoming technology that excites me the most in 2021 is the long-anticipated Apache Airflow 2.0 release. Ever since its initial release in 2015, Apache Airflow has been one of the most popular workflow management systems in data engineering, if not the most popular.

Its great success can be attributed to three things: it allows workflows to be written as code, it has a simple yet effective GUI, and it is flexible in how data pipelines are structured. With the new 2.0 release, everyone’s favorite workflow management system will get even better. Airflow 2.0 will have many impressive added features, including a fully supported and comprehensive REST API, the TaskFlow API, and Task Groups. It also offers many improvements, including a simplified Kubernetes Executor, a low-latency scheduler, and an even more intuitive GUI.
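
For readers who have not seen a workflow written as code, here is a minimal sketch of the idea using only the Python standard library. It is not Airflow and does not use Airflow's API. The task names, the toy data, and the `run_pipeline` helper are hypothetical, and the sketch assumes Python 3.9+ for `graphlib`. Tasks are plain functions, dependencies are declared in a dictionary, and a topological sort decides the execution order.

```python
from graphlib import TopologicalSorter


def extract():
    # Stand-in for reading rows from a source system.
    return [1, 2, 3]


def transform(rows):
    # Stand-in for a cleaning or enrichment step.
    return [r * 10 for r in rows]


def load(rows):
    # Stand-in for writing to a warehouse; returns a summary value.
    return sum(rows)


# Each task maps to the set of tasks that must finish before it.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}
tasks = {"extract": extract, "transform": transform, "load": load}


def run_pipeline():
    """Run every task after its upstream tasks, passing results along."""
    results = {}
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        upstream = [results[dep] for dep in sorted(dependencies[name])]
        results[name] = tasks[name](*upstream)
    return order, results


if __name__ == "__main__":
    order, results = run_pipeline()
    print(order)
    print(results["load"])
```

Because the pipeline is ordinary code, it can be versioned, reviewed, and tested like any other program. A real workflow manager adds scheduling, retries, and monitoring on top of this basic structure.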

I am excited to try out the new and improved Airflow.

6. Michael Mirandi, Saturn Cloud.io, Head of Strategy

There are several technology trends that I’m excited to watch in 2021, but none more than the growing popularity of GPU computing in data science and machine learning. The shift is driven by performance first, as well as by the ease of use made possible through the open source project RAPIDS. If you aren’t familiar with it, RAPIDS enables users to execute Python code on NVIDIA GPUs (disclaimer: NVIDIA sponsors the project).

The team released the results of the industry-standard Big Data Analytics Benchmarks earlier this year, reporting a speedup of nearly 20x! It’s also interesting that these benchmarks demonstrate not only the power of GPU computing for data science workloads but also its ability to accelerate traditional data engineering ETL jobs. Will this lead to even wider adoption of Python? I’d be willing to bet on it, especially as a new crop of data science startups has recently released distributed GPU computing platforms, that is, the ability to spin up a cluster of GPUs in the cloud for unprecedented speed.

Tech in 2021 and Beyond

There is a lot to look forward to in 2021, whether it be pre-packaged SOTA NLP models that can be used with one line of code, natural language queries, or improvements in frameworks like Airflow.

Technology companies small and large seem to have carried on, even with all the Zoom fatigue.

We hope that 2021 will not only lead to technology improvements but will also be a year where we progress in ways that lift everyone’s boats.

Thanks for reading and good luck in the new year!