We live in an age in which big data has become an asset (sometimes referred to as an organization’s unrefined gold) to organizations and individuals. Data science has been a hot topic among organizations that aim to collect meaningful data to drive business growth. Until 2010, organizations focused on building infrastructure that could store, process, and access data, so that they could make sense of consumer data, analyze it, make decisions based on it, and gain business insight.

Because data can have a great impact on an organization, and given the rapid advancement of technology, companies that process consumer data have turned to tools such as the Hadoop framework, business intelligence (BI) software, artificial neural networks, and machine learning algorithms to process and understand that data. To use these tools effectively, an organization needs to employ a data scientist who has a solid understanding of how to analyze data with them and extract maximum insight in order to make algorithm-based decisions.

When analyzing this data, data scientists face ethical challenges such as data collection bias, algorithmic bias, the explainability of results, and privacy. Formulating and producing morally good solutions involves examining every step of working with data: generation, creation, processing, discrimination, and algorithms. To build such an ethical system, there are four guidelines to consider:

  1. Do good
  2. Minimize harm
  3. Be just and fair
  4. Respect privacy

** The terms “data scientist” and “data practitioner” are used interchangeably.


Who is a Data Scientist?

Before we look at who a data scientist is, let’s consider what data science is. Data science is a vast field that blends mathematics, statistics, programming, computer science, and more. It applies scientific methods, processes, and algorithms to extract insight from both structured and unstructured data. The term can be traced back to 1974, when Peter Naur proposed it as an alternative name for computer science. However, the professional term “data science” has been attributed to DJ Patil and Jeff Hammerbacher. To date, there is still no consensus among scientists on the definition of data science, and some still consider it a “buzzword”.

A data scientist, on the other hand, is someone who harnesses and processes huge volumes of data to generate and extract insight, interprets data effectively, and is capable of presenting results in non-technical terms. A data scientist is also someone who can collect a large amount of data (usually consumer data collected or stored by an organization) and gain meaningful insight from it by combining elements of mathematics, statistics, and computer science with analytical techniques such as machine learning and BI software.

You might be wondering why ethics and laws matter in relation to the advancement of AI. Laws cannot move faster than technology and innovation, so rather than waiting for the law to catch up, we work with the technological innovations and problems we are encountering right now in order to mitigate future impacts. Laws and ethics are not meant to constrain AI but to encourage more innovation and creativity, which helps us prepare for the unknown and do good.

How Far Are We from Building an Ethical Environment?

If you have ever googled what ethics means, you will have seen many definitions pop up, but they all boil down to ethics being concerned with human well-being: the well-being of others.


Data ethics, on the other hand, is a new branch of ethics that studies and evaluates moral problems related to data in order to formulate and support morally good solutions. When it comes to inquiring into, accessing, and understanding previously unknown human and consumer behavior, data plays an important role. Because of this value, data has brought competitive marketing strategies to the workplace.

With great power comes great responsibility


However, with these great opportunities come ethical and moral challenges that data practitioners face when dealing with consumers’ data. Data has brought competition to the market and has enhanced the development of intelligent products and services. At the same time, the use of AI for intelligent products and services raises ethical challenges that pose a threat to human privacy, and human privacy is very important.

Over the last few years we have seen various examples of data breaches and of consumer data being used without consent to develop advanced AI products. A popular and recent example is a tech company called Clearview AI. This company devised a groundbreaking facial recognition app: you take a picture of a person, upload it, and get to see public photos of that person, along with links to where those photos appeared. The backbone of the system is a database of more than three billion images that Clearview claims to have scraped from Facebook, YouTube, Venmo, and millions of other websites (New York Times, Jan. 18, 2020). This software is powerful; it could help solve cases of shoplifting, identity theft, murder, and child sexual exploitation, but all at the expense of eroding privacy.

Big tech companies such as Google refrained from such work in 2018, when the company put the kibosh on Project Maven (a contract awarded by the US Pentagon) once the contract expired. The company said the project was too unethical, and about 12 employees left Google because of it. The aim of the project was to support the advanced development of human-identifying drone technology by analyzing drone footage using AI trained on billions of data sets derived from the company’s other products. Not long afterward, a company named Palantir took over the project. Another recent example is Cambridge Analytica.

These examples show that the future of AI is bright, provided we take the ethical and moral side of these advancements into account.


Dr. Ewa Luger (Chancellor’s Fellow, Digital Arts and Humanities, University of Edinburgh) said the most recurrent ethical problems faced by data scientists are:

  1. Algorithms
  2. Prejudice/Bias
  3. Explainable AI (XAI)
  4. Privacy

What Should a Data Practitioner Look at to Be Inspired to Work Ethically?

Just as every revolution has its good and bad sides, the data revolution will inflict harm, whether intended or not, as the Clearview case shows. To avoid exacerbating the harm the data revolution will bring, it is important for data scientists to be ethical when handling consumer data.

How, then, can we make data scientists more ethical? What could inspire a data scientist to do ethical work, and what ethical or moral laws or rules have been laid down to foster an ethical environment?


To date, there is no law or rule that fosters an ethical environment for data scientists. However, to encourage such an environment, Ben Olsen, a Sr. Content Developer at Microsoft, drafted a data oath modeled on the Hippocratic oath. He proposed what a modern data oath might look like:

I, a Data Practitioner, will promote the well-being of others and myself while striving to do no harm with data through:


a. Professional application of analytical technique

b. Humility in analytical claims

c. Anticipation of legal and regulatory scenarios

d. Transparency in computation and documentation

e. Fidelity to this oath beyond the bottom line.

Another way a data practitioner can foster an ethical environment is by asking himself or herself critical questions when handling consumer data. These questions include, but are not limited to:

  1. Is the data biased in terms of gender, prejudice, etc.?
  2. How much relative importance should be given to the data?
  3. Can the process of getting the result be explained?
  4. Is the algorithm biased? What basis or intuition is the algorithm built on?
  5. What factors or features did my algorithm consider to reach this conclusion?
  6. Will my result inflict harm or do good, and how much weight should be given to each?
  7. What laws and regulations should be considered when handling this data?
  8. What consumer rights might I impinge on while handling the data?
  9. Does having been given consent mean I no longer need to respect privacy?
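
As a rough illustration of how the first question might be turned into a routine check, the sketch below computes, for each group in a data set, its share of the records and its rate of positive outcomes, then compares the lowest and highest rates. Everything in it is an assumption made for illustration: the function names `group_rates` and `rate_ratio`, the `gender` and `approved` fields, and the toy records are hypothetical, and a single ratio like this is only a starting point for discussion, not a complete fairness audit.

```python
from collections import defaultdict


def group_rates(records, group_key, outcome_key):
    """Return {group: (share_of_records, positive_outcome_rate)}."""
    counts = defaultdict(int)
    positives = defaultdict(int)
    for row in records:
        group = row[group_key]
        counts[group] += 1
        if row[outcome_key]:
            positives[group] += 1
    total = sum(counts.values())
    return {
        group: (counts[group] / total, positives[group] / counts[group])
        for group in counts
    }


def rate_ratio(rates):
    """Ratio of the lowest to the highest positive-outcome rate across groups."""
    outcome_rates = [rate for _, rate in rates.values()]
    if not outcome_rates or max(outcome_rates) == 0:
        return 1.0  # nothing to compare
    return min(outcome_rates) / max(outcome_rates)


# Hypothetical toy data: each record is one consumer decision.
records = [
    {"gender": "female", "approved": True},
    {"gender": "female", "approved": False},
    {"gender": "female", "approved": False},
    {"gender": "female", "approved": False},
    {"gender": "male", "approved": True},
    {"gender": "male", "approved": True},
    {"gender": "male", "approved": True},
    {"gender": "male", "approved": False},
]

rates = group_rates(records, "gender", "approved")
ratio = rate_ratio(rates)
for group, (share, rate) in rates.items():
    print(f"{group}: share of data = {share:.2f}, approval rate = {rate:.2f}")
print(f"lowest/highest approval rate ratio = {ratio:.2f}")
```

A ratio far below 1 does not prove the data or the algorithm is unfair, but it is a signal that the remaining questions in the list deserve a closer look before the result is used.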

These questions go on and on, and asking them helps to create an ethical environment. As the saying goes, with great power comes great responsibility, and being an ethical data practitioner will go a long way toward paving the way for the safe and responsible social impact and integration of AI.




