Grantmaking
Funding groundbreaking research in AI safety

FAR.AI Grant Program
Grants:
Using weak-to-strong generalization to explain superhuman AI systems’ decisions, focusing on domains like chess/Go where superhuman AI already exists.
Building automated testing systems for LLM alignment against both training-phase threats and testing-phase threats, with a focus on developing agent-based systems that can generate adversarial prompts.
Developing methods to make alignment more secure against jailbreaks, prefilling attacks, and finetuning attacks, with approaches spanning the entire model lifecycle.
Broad project examining robustness across four vectors: data poisoning, consistency checks, model stealing, and prompt injection.