Big Thoughts for the Age of AI: Please Be Tolerant

My View

Tolerance has always been a cornerstone of progress in human civilization.

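I believe this matters even more in the age of AI. Even as of 2025, many general-purpose large models already perform far above the level of ordinary people in many domains. If we are intolerant and carry our prejudices into the training data, what kind of model will we end up with? One that believes most of the world is garbage or "leeks" (slang for easy marks to be harvested again and again), people whose existence hardly matters. When AGI eventually arrives, could human society then face annihilation?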

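I think the answer is inevitably yes. Actions are an outward extension of inner thoughts: if you keep thinking this way, one day you will surely act this way too.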

Anthropic's Core Views on AI Safety

Link: https://www.anthropic.com/news/core-views-on-ai-safety

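Anthropic believes AI progress may produce transformative AI systems within the next decade, but that we do not yet understand how to make such systems safe and aligned with human values.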

This view may sound implausible or grandiose, and there are good reasons to be skeptical of it. For one thing, almost everyone who has said “the thing we’re working on might be one of the biggest developments in history” has been wrong, often laughably so.

Key points from the article

1. AI will have a very large impact, possibly in the coming decade

Rapid and continuing AI progress is a predictable consequence of the exponential increase in computation used to train AI systems, because research on “scaling laws” demonstrates that more computation leads to general improvements in capabilities. Simple extrapolations suggest AI systems will become far more capable in the next decade, possibly equaling or exceeding human level performance at most intellectual tasks. AI progress might slow or halt, but the evidence suggests it will probably continue.

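The "scaling laws" argument in the passage above can be written as a short derivation. This is a sketch that assumes the pure power-law form often reported in scaling-law studies. Test loss L stands in for capability, and the symbols C_c, α, C_0 and g are illustrative constants, not figures from the Anthropic post.

```latex
\begin{aligned}
L(C) &= \left(\frac{C_c}{C}\right)^{\alpha}, \quad \alpha > 0 \\
\frac{L(2C)}{L(C)} &= 2^{-\alpha} \\
C(t) = C_0\, e^{g t} \;\Rightarrow\; L(t) &= \left(\frac{C_c}{C_0}\right)^{\alpha} e^{-\alpha g t} \\
\ln L(t) &= \alpha \ln\frac{C_c}{C_0} - \alpha g\, t
\end{aligned}
```

Under these assumptions, each doubling of compute cuts loss by the same factor 2^{-α}. If compute also grows exponentially over time, log-loss falls linearly in t, which is the sense in which progress is "predictable". Real curves may bend or plateau, as the passage itself concedes.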

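Building safe, reliable, and steerable systems may become tricky once those systems grow as intelligent as their designers, and as aware of their surroundings.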

First, it may be tricky to build safe, reliable, and steerable systems when those systems are starting to become as intelligent and as aware of their surroundings as their designers. To use an analogy, it is easy for a chess grandmaster to detect bad moves in a novice but very hard for a novice to detect bad moves in a grandmaster. If we build an AI system that’s significantly more competent than human experts but it pursues goals that conflict with our best interests, the consequences could be dire. This is the technical alignment problem.

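One especially important dimension of uncertainty is how difficult it will be to develop advanced AI systems that are broadly safe and pose very little risk to humans. That difficulty could lie anywhere between "very easy" and "impossible". Let us divide this range into three scenarios with very different implications: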

Optimistic scenarios: There is very little chance of catastrophic risk from advanced AI as a result of safety failures. Safety techniques that have already been developed, such as reinforcement learning from human feedback (RLHF) and Constitutional AI (CAI), are already largely sufficient for alignment. The main risks from AI are extrapolations of issues faced today, such as toxicity and intentional misuse, as well as potential harms resulting from things like widespread automation and shifts in international power dynamics - this will require AI labs and third parties such as academia and civil society institutions to conduct significant amounts of research to minimize harms.

Intermediate scenarios: Catastrophic risks are a possible or even plausible outcome of advanced AI development. Counteracting this requires a substantial scientific and engineering effort, but with enough focused work we can achieve it.

Pessimistic scenarios: AI safety is an essentially unsolvable problem – it’s simply an empirical fact that we cannot control or dictate values to a system that’s broadly more intellectually capable than ourselves – and so we must not develop or deploy very advanced AI systems. It's worth noting that the most pessimistic scenarios might look like optimistic scenarios up until very powerful AI systems are created. Taking pessimistic scenarios seriously requires humility and caution in evaluating evidence that systems are safe.

Our goal is essentially to develop:

better techniques for making AI systems safer,

better ways of identifying how safe or unsafe AI systems are.

In a sense one can view alignment capabilities vs alignment science as a “blue team” vs “red team” distinction, where alignment capabilities research attempts to develop new algorithms, while alignment science tries to understand and expose their limitations.

OpenAI's Approach to Alignment Research

Link: https://openai.com/index/our-approach-to-alignment-research/

The goal is to build a sufficiently aligned AI system that can help solve all other alignment problems.

Our Data Comes with Bias Built In

A Survey on Bias and Fairness in Machine Learning

A classic paper: A Survey on Bias and Fairness in Machine Learning

1908.09635v3.pdf

Summary

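Abstract
- The paper examines why fairness matters in AI systems and surveys the biases that arise across different applications.
- It summarizes the different sources of bias in data and in algorithms, and presents a taxonomy of fairness definitions.
- It reviews unfair outcomes that researchers have observed in different domains and the approaches they have taken to address them.
- Finally, it discusses possible future research directions for further reducing bias in AI systems.

Methods
- Reviews real-world cases from different domains that show how unfair machine learning algorithms lead to suboptimal and discriminatory outcomes.
- Analyzes the feedback loop between data, algorithms, and user interaction, showing how bias arises and is amplified at each stage.
- Collects multiple definitions of fairness and discusses how they relate to one another and where each applies.
- Reviews existing machine learning methods for mitigating bias, grouped into pre-processing, in-processing, and post-processing techniques.

Conclusions
- The paper stresses the importance of accounting for fairness constraints when designing and engineering sensitive tools. Its value lies in giving researchers a comprehensive view that helps them understand and address bias when designing and applying AI systems.
- Novelty: a new taxonomy of fairness definitions, together with a systematic review of existing methods that points the way for future research.
- Evidence: as a survey, it draws on real-world cases and results reported in the literature rather than running new experiments of its own.
- Scope: broad coverage spanning machine learning, deep learning, and natural language processing. The work is large and thorough.

Detailed description
- Identifying sources of bias: the paper first identifies two potential sources of unfairness, bias in the data and bias in the algorithm. It draws on existing research to show how each affects machine learning outcomes.
- A taxonomy of fairness definitions: it organizes fairness definitions by application setting and requirement, for example equal opportunity and group fairness.
- Review of existing methods: it compares pre-processing, in-processing, and post-processing techniques and assesses how well each removes bias.
- Future research directions: it suggests several, such as developing new fairness metrics and improving existing algorithms.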
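Two of the fairness definitions mentioned above can be made concrete in a few lines of code. This is a minimal sketch, not code from the survey. It assumes a binary classifier and a binary protected attribute encoded as 0/1. The helper names `demographic_parity_difference` and `equal_opportunity_difference` are hypothetical.

- **Demographic parity** (one common group-fairness criterion) compares positive-prediction rates between the two groups.
- **Equal opportunity** compares true-positive rates between the two groups.

```python
from typing import Sequence


def _positive_rate(preds: Sequence[int]) -> float:
    """Share of predictions equal to 1; returns 0.0 for an empty group (a simplifying assumption)."""
    return sum(preds) / len(preds) if preds else 0.0


def demographic_parity_difference(y_pred: Sequence[int], group: Sequence[int]) -> float:
    """|P(pred=1 | A=0) - P(pred=1 | A=1)|: gap in positive-prediction rates.

    Hypothetical helper; assumes 0/1 predictions and a 0/1 protected attribute A.
    """
    g0 = [p for p, a in zip(y_pred, group) if a == 0]
    g1 = [p for p, a in zip(y_pred, group) if a == 1]
    return abs(_positive_rate(g0) - _positive_rate(g1))


def equal_opportunity_difference(
    y_true: Sequence[int], y_pred: Sequence[int], group: Sequence[int]
) -> float:
    """|P(pred=1 | Y=1, A=0) - P(pred=1 | Y=1, A=1)|: gap in true-positive rates.

    Only truly positive examples (Y=1) are counted, so this asks whether
    qualified members of each group are recognized equally often.
    """
    g0 = [p for t, p, a in zip(y_true, y_pred, group) if t == 1 and a == 0]
    g1 = [p for t, p, a in zip(y_true, y_pred, group) if t == 1 and a == 1]
    return abs(_positive_rate(g0) - _positive_rate(g1))


if __name__ == "__main__":
    # Made-up toy data: the first four rows are group 0, the last four are group 1.
    y_true = [1, 1, 0, 0, 1, 1, 1, 0]
    y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
    group = [0, 0, 0, 0, 1, 1, 1, 1]
    print(demographic_parity_difference(y_pred, group))  # 0.5
    print(round(equal_opportunity_difference(y_true, y_pred, group), 3))  # 0.667
```

A value of 0 means the two groups are treated identically under that criterion. The two metrics measure different things and need not agree, which is one reason the relationships between fairness definitions deserve the attention the survey gives them.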