The Bayesian Conspiracy
213 - Are Transformer Models Aligned By Default?

Our species has begun to scrute the inscrutable shoggoth! With Matt Freeman :)

LINKS
Anthropic's latest AI safety research paper on interpretability
Anthropic is hiring
Episode 93 of The Mind Killer
Talkin' Fallout
VibeCamp

0:00:17 – A Layman's AI Refresher
0:21:06 – Aligned By Default
0:50:56 – Highlights from Anthropic's Latest Interpretability Paper
1:26:47 – Guild of the Rose Update
1:29:40 – Going to VibeCamp
1:37:05 – Feedback
1:43:58 – Less Wrong Posts
1:57:30 – Thank the Patron


Our Patreon, or if you prefer, our Substack

Hey look, we have a discord! What could possibly go wrong?
We now partner with The Guild of the Rose; check them out.


Rationality: From AI to Zombies, The Podcast

LessWrong Sequence Posts Discussed in this Episode:

If You Demand Magic, Magic Won’t Help

The Beauty of Settled Science

Next Sequence Posts:

Is Humanism A Religion-Substitute?

Scarcity
