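Open Source Daily recommends one high-quality open-source project on GitHub and one selected English-language technology or programming article every day. Keep reading Open Source Daily to maintain a good habit of daily learning.
Today’s recommended open-source project: “RichText”
Today’s recommended English article: “Unchecked AI Can Mirror Human Behavior”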

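Today’s recommended open-source project: “RichText”. Link: project link
Why we recommend it: Before Markdown editors became popular, rich-text editors dominated the entire internet. This project is a rich-text parser for the Android platform that supports both HTML and Markdown text.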

Today’s recommended English article: “Unchecked AI Can Mirror Human Behavior”, by The Unlikely Techie
Original article: https://medium.com/better-programming/unchecked-ai-can-mirror-human-behavior-2ce1ce76f914
Why we recommend it: The kind of users we are determines the kind of AI we train.

Unchecked AI Can Mirror Human Behavior

How can we try our best to mitigate bias?

There is a seemingly endless interest in artificial intelligence (AI). According to the AI Index 2019 Annual Report published by Stanford University, the volume of peer-reviewed AI papers grew by more than 300% between 1998 and 2018. Among the more than 3,600 global news articles on ethics and AI identified by the Human-Centered AI Institute at Stanford between mid-2018 and mid-2019, the dominant topics were possible frameworks and guidelines for the ethical use of AI, the use of face recognition applications, data privacy, the role of big tech, and algorithmic bias.

This highlights the importance of understanding how bias can slip into data sets and of raising awareness when working toward mitigating it. AI strikes humanity where it hurts most: It uncovers how preconceived notions affect the outcome of well-intentioned applications. While there has never been more data available to make qualified decisions, it is not guaranteed that these decisions can be put to use successfully. There have been numerous failures (this list is not exhaustive):

In 2015, Google was faced with another controversy when its photo service labeled pictures of Black people as gorillas. How did Google fix this problem? It didn’t really. Instead, it blocked all images tagged as “gorillas,” according to a Wired report in 2018.

In 2016, after not even 24 hours of operation, Microsoft’s AI chatbot was shut down because Twitter users had trained it to become an insulting Nazi-lover.

In 2019, researchers in Brazil discovered that “searching Google for pictures of ‘beautiful woman’ was far more likely to return images of white people than Black and Asian people, and searching for pictures of ‘ugly woman’ was more likely to return images of Black and Asian people than white people.”

Given these incidents, we have to ask ourselves how we can try our best to mitigate bias. We must be aware that there are several sources of bias concerning data sets, but how we handle this is a human decision. It can become dangerous very quickly if we train models based on human misjudgment.

The 5 Most Common Types of Bias

If we approach the topic from a statistical point of view, there are five ways in which bias can creep into the results.

Confirmation bias

Confirmation bias is the tendency to search for, interpret, favor, and recall information in a way that confirms or supports one’s prior beliefs or values. It is a powerful type of cognitive bias that can undermine how society functions by distorting evidence-based decision-making.

An example of this is when you remember information selectively or interpret the information given to you in a biased way. Studies have shown that people can even be manipulated into remembering childhood events that never happened. This indicates that people sometimes don’t even notice when they analyze data in a biased way (another psychological phenomenon that fits this category is wishful thinking).

Selection bias

Selection bias is the bias introduced when individuals, groups, or data are selected for analysis without proper randomization, so that the sample obtained is not representative of the population to be analyzed. The term “selection bias” usually refers to the distortion of a statistical analysis that results from the sampling method. If selection bias is not taken into account, some conclusions of the study may be wrong.
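
To make the effect of non-random selection concrete, here is a minimal sketch in Python. The population, the age range, and the selection rule (a survey that only reaches customers under 35) are all hypothetical assumptions made up for illustration; they do not come from the article.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 customers aged 18 to 90 (assumption).
population = [rng.randint(18, 90) for _ in range(10_000)]

# Proper randomization: every customer has the same chance of being picked.
random_sample = rng.sample(population, 500)

# Biased selection (assumption): the survey only reaches customers under 35.
biased_sample = [age for age in population if age < 35][:500]

population_mean = statistics.mean(population)
random_mean = statistics.mean(random_sample)
biased_mean = statistics.mean(biased_sample)

print(f"population mean age: {population_mean:.1f}")
print(f"random sample mean:  {random_mean:.1f}")
print(f"biased sample mean:  {biased_mean:.1f}")
```

The random sample should land close to the population mean, while the biased sample cannot exceed 35 no matter how many customers are surveyed. Collecting more data does not repair a flawed sampling method.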

Outliers

An outlier is an extreme data value, for example, a 110-year-old customer or a consumer with $10 million in their savings account. You can identify outliers by carefully inspecting the data, especially the distribution of the values. Since outliers are extreme data values, it can be dangerous to base decisions on the calculated “average” (the mean). In other words, extreme behavior can have a significant impact on what is considered average. When outliers are present, it is safer to base your conclusions on the median (the middle value of the sorted data), which is far less sensitive to extremes.
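
The following Python sketch shows how a single extreme value pulls the mean while the median, the middle value of the sorted data, barely moves. The balances are invented for illustration, and the 1.5 × IQR fence used to flag outliers is one common rule of thumb, not a method prescribed by the article.

```python
import statistics

# Hypothetical savings balances in dollars; the last one is the outlier.
balances = [800, 1200, 1900, 2500, 3100, 3600, 4000, 4700, 5200, 10_000_000]

mean_balance = statistics.mean(balances)
median_balance = statistics.median(balances)
print(f"mean:   {mean_balance:,.0f}")    # mean:   1,002,700
print(f"median: {median_balance:,.0f}")  # median: 3,350

# Flag outliers with the 1.5 * IQR rule of thumb (an assumption of this sketch).
q1, _, q3 = statistics.quantiles(balances, n=4)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [b for b in balances if b < lower_fence or b > upper_fence]
print(f"outliers: {outliers}")  # outliers: [10000000]
```

Nine of the ten customers hold less than $6,000, yet the mean suggests a typical balance of about one million dollars. The median describes the typical customer far better here.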

Overfitting and underfitting

Underfitting means that a model gives an oversimplified picture of reality. Overfitting is the inverse (i.e. an overcomplicated picture). Overfitting risks treating a particular assumption as the truth when it does not hold in practice.

How can this bias be counteracted? The most straightforward approach is to ask how the model was validated. If you receive a somewhat glazed expression as a reaction, there is a good chance that the analysis outcomes are so-called unvalidated outcomes and, therefore, might not apply to the whole database. Always ask the data analyst whether they have used separate training and test samples. If the answer is no, it is highly likely that the analysis outcomes will not be applicable to all customers.
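
The validation question above can be illustrated with a small holdout experiment in Python. This is only a sketch under stated assumptions: the data are synthetic (a hypothetical relationship y = 2x plus noise), and the two models, a least-squares line and a “memorizer” that simply returns the value of the nearest training point, were chosen for illustration and are not taken from the article.

```python
import random

def make_data(n, rng):
    """Synthetic data under an assumed ground truth: y = 2x + Gaussian noise."""
    return [(float(x), 2.0 * x + rng.gauss(0.0, 3.0)) for x in range(n)]

def fit_line(points):
    """Ordinary least squares fit of y = a*x + b."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return lambda x: a * x + b

def fit_memorizer(points):
    """Overfit model: predicts the y value of the nearest training x."""
    return lambda x: min(points, key=lambda p: abs(p[0] - x))[1]

def mse(model, points):
    """Mean squared error of a model on a list of (x, y) points."""
    return sum((model(x) - y) ** 2 for x, y in points) / len(points)

rng = random.Random(0)
data = make_data(40, rng)
rng.shuffle(data)
train, test = data[:30], data[30:]  # holdout split: 30 training, 10 test points

for name, fit in [("line", fit_line), ("memorizer", fit_memorizer)]:
    model = fit(train)
    print(f"{name}: train MSE = {mse(model, train):.2f}, "
          f"test MSE = {mse(model, test):.2f}")
```

The memorizer reproduces the training sample perfectly, so its training error is zero, but that says nothing about unseen data. Only the error on the held-out test sample indicates how well the outcome generalizes, which is why an analysis without a test sample should be treated with caution.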

Confounding variables

Confounding occurs when additional factors that you have not accounted for influence your variables. In an experiment, you usually study how the independent variable affects the dependent variable. For example, if you want to investigate whether exercise leads to weight loss, exercise is your independent variable and weight loss is your dependent variable.

Confounding variables are all other factors that also influence your dependent variable. They have a hidden influence on it and can cause two main problems: increased variance and the introduction of bias.
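
As a hedged illustration of the exercise example, the Python sketch below uses hand-constructed records in which weight loss depends only on dieting, a hypothetical confounder that the article does not name, while dieters are also more likely to exercise. All numbers are invented.

```python
from statistics import mean

# Hypothetical records: (exercises, diets, weight_loss_kg).
# Assumption for illustration: weight loss depends ONLY on dieting,
# but people who diet are also more likely to exercise.
records = (
    [(True, True, 5.0)] * 80      # diet and exercise
    + [(False, True, 5.0)] * 20   # diet, no exercise
    + [(True, False, 1.0)] * 20   # no diet, exercise
    + [(False, False, 1.0)] * 80  # no diet, no exercise
)

def exercise_effect(rows):
    """Mean weight loss of exercisers minus that of non-exercisers."""
    with_exercise = [loss for ex, _, loss in rows if ex]
    without_exercise = [loss for ex, _, loss in rows if not ex]
    return mean(with_exercise) - mean(without_exercise)

# Naive comparison that ignores the confounder.
naive = exercise_effect(records)

# Stratified comparison: look at dieters and non-dieters separately.
among_dieters = exercise_effect([r for r in records if r[1]])
among_non_dieters = exercise_effect([r for r in records if not r[1]])

print(f"naive effect of exercise:  {naive:.1f} kg")              # 2.4 kg
print(f"effect among dieters:      {among_dieters:.1f} kg")      # 0.0 kg
print(f"effect among non-dieters:  {among_non_dieters:.1f} kg")  # 0.0 kg
```

In this constructed data set, the naive comparison credits exercise with 2.4 kg of weight loss, although within each diet group exercise makes no difference at all. Comparing like with like, here by stratifying on the confounder, removes the spurious effect.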

It is essential to confirm that the conclusion drawn from research and analysis results is not affected by distortions. Uncovering biased results is not the sole responsibility of the analyst concerned. It is the joint responsibility of all those directly involved (including the market participant and the analyst) to reach a valid conclusion based on the correct data.

There Is No AI Without Humans

So when we deal with prejudices, wrong application possibilities, and erroneous results, we always have to ask ourselves how they were created. Social norms, fears, and social shifts existed far before any AI calculations and technological advances. The vast majority of AI applications were made without evil intentions. But it is also clear that many applications have effects that are harmful to society.

It’s precisely for this reason that it is absolutely essential to examine AI not only as a matter of programming language but as a concept with all its intricacies and its significant impact on society as a whole.

AI without humans is impossible. Therefore, other scientific disciplines such as psychology, history, philosophy, ethics, sociology, political science, health, and neuroscience are essential to mitigate bias. Mitigating bias is not only a data science imperative but also valid for other disciplines.

Many publications in this area make me hopeful that there is and will continue to be a broad discourse on this topic. It should not be about pointing the finger at people but about learning from mistakes made, eliminating possible sources of error, and then developing AI applications for the future together that are beneficial for all of us.

Download the Open Source Daily app: https://openingsource.org/2579/
Join us: https://openingsource.org/about/join/
Follow us: https://openingsource.org/about/love/

Open Source Daily, Issue 923: “RichText”