Schedule and Readings
The course schedule is subject to change. All deadlines are on Wednesdays at 9:30 AM unless stated otherwise.
The textbooks for this course are:
SLP Speech and Language Processing by Dan Jurafsky and James H. Martin
D2L Dive into Deep Learning by Aston Zhang, Zachary C. Lipton, Mu Li, and Alexander J. Smola
Ling1 Linguistic Fundamentals for Natural Language Processing by Emily M. Bender
Ling2 Linguistic Fundamentals for Natural Language Processing II by Emily M. Bender and Alex Lascarides
Notes Course notes by Sophie
These textbooks are primarily for reference; we will not be “following” them in any sense. Additional readings will include research papers and blog posts. All readings are available online free of charge, though some may require you to be on the campus wi-fi or VPN, or to be logged into your NYU Drive account.
You do not have to do all the readings, but we will talk about most of them during class.
Week 1, Jan. 24/26
What Is Meaning?
We will introduce the concept of meaning in natural language, taking inspiration from linguists, philosophers, and data scientists. We will learn about the word2vec model of semantics and examine in what sense and to what extent it models “meaning.”
- Topics
- Lexical semantics, word embeddings, the distributional hypothesis
- Lecture
- Slides, Zoom Recording
- Lab
- Colab Notebook, Zoom Recording
- Readings
- Ling2 #18–19, #21–24, on lexical semantics
- Notes on vector semantics (skip-gram with negative sampling)
- D2L Sections 15.1, 15.5–15.7, on various word embedding models
- Mikolov et al. (2013), the original word2vec paper
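As a taste of vector semantics, here is a minimal pure-Python sketch of cosine similarity, the standard measure of closeness between word embeddings. The three-dimensional vectors below are invented for illustration; real word2vec embeddings have hundreds of dimensions learned from co-occurrence data.

```python
import math

def cosine(u, v):
    # Cosine similarity: dot product divided by the product of the norms.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy 3-dimensional "embeddings" (made up for illustration).
vectors = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

print(cosine(vectors["cat"], vectors["dog"]))  # high: distributionally similar words
print(cosine(vectors["cat"], vectors["car"]))  # lower: unrelated words
```

Under the distributional hypothesis, words appearing in similar contexts end up with nearby vectors, so a high cosine is taken as evidence of related meaning.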
Week 2, Jan. 31/Feb. 2
Basic Techniques 1: Deep Learning
We will learn how to optimize an arbitrary machine learning objective using the stochastic gradient descent algorithm and its more popular variant, Adam. We will also learn how automatic differentiation is implemented in the PyTorch software library.
- Topics
- Stochastic gradient descent, Adam, automatic differentiation, PyTorch
- Lecture
- Slides, Zoom Recording
- Lab
- Colab Notebook, Zoom Recording
- Readings
- SLP Chapter 5, on logistic regression
- D2L Sections 15.3–15.4, on training word2vec
- Notes on stochastic gradient descent and Adam
- Notes on backpropagation in PyTorch
- Olah (2015a), Calculus on Computational Graphs: Backpropagation (blog post)
- Deadlines
- HW 0 Due
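To make the update rule concrete, here is a pure-Python sketch of gradient descent on a toy one-parameter objective; the objective and learning rate are invented for illustration. SGD differs only in that the gradient is estimated from a random mini-batch, and Adam additionally adapts a per-parameter step size.

```python
def grad(w):
    # Gradient of the toy objective f(w) = (w - 3)^2, which is minimized at w = 3.
    return 2 * (w - 3)

w = 0.0   # initial parameter value
lr = 0.1  # learning rate (step size)
for step in range(100):
    w -= lr * grad(w)  # gradient descent update: w <- w - lr * df/dw

print(round(w, 4))  # converges toward the minimizer w = 3
```

In PyTorch, `grad` would be computed for you by automatic differentiation rather than derived by hand.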
Week 3, Feb. 7/9
Basic Techniques 2: Neural Networks
We introduce sentiment analysis, a text classification task that requires models to “understand” language in some sense. Using sentiment analysis as a running example, we introduce the conceptual framework of neural networks. We also discuss the concept of a “word” in natural language, and how it is instantiated in NLP through tokenization.
- Topics
- Text classification, sentiment analysis, neural networks, multi-layer perceptrons, natural language morphology (morphemes and words), tokenization (traditional and byte-pair)
- Lecture
- Slides, Zoom Recording
- Lab
- Colab Notebook, Zoom Recording
- Readings
- SLP Chapter 25, on sentiment analysis
- Notes on neural networks
- Ling1 #7–16, on the concept of a “word”
- Chapter 5 of the 🤗 Course, on byte-pair tokenization
- Deadlines
- HW 1 Due, EC 1 Due
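To preview byte-pair tokenization, here is a minimal pure-Python sketch of the BPE merge loop. The toy corpus and word frequencies are invented, and the naive string replacement cuts corners that a real tokenizer implementation would not.

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs across a corpus of space-separated symbols."""
    pairs = Counter()
    for word, freq in words.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0]

def merge_pair(words, pair):
    """Apply one BPE merge: replace each occurrence of the pair with its concatenation.
    (Naive string replace; fine for this toy corpus.)"""
    a, b = pair
    return {word.replace(f"{a} {b}", a + b): freq for word, freq in words.items()}

# Toy corpus: words split into characters, with made-up frequencies.
corpus = {"l o w": 5, "l o w e r": 2, "n e w e s t": 6, "w i d e s t": 3}
for _ in range(3):
    pair = most_frequent_pair(corpus)
    corpus = merge_pair(corpus, pair)
    print(pair)  # the most frequent adjacent pair becomes a new vocabulary symbol
```

Each merge adds one symbol to the subword vocabulary; running many merges yields the mix of characters, subwords, and whole words used by models like BERT and GPT.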
Week 4, Feb. 14/16
Basic Techniques 3: Sequence Modeling
We introduce the simple recurrent network, the long short-term memory network, and the Transformer: neural network architectures designed to learn embeddings for sequences of word embeddings. We also introduce the task of natural language inference and learn about the concepts of entailment, presupposition, and implicature.
- Topics
- RNNs, LSTMs, Transformers, natural language inference, pragmatics (entailment, presupposition, implicature, Grice’s maxims)
- Lecture
- Slides, Zoom Recording
- Lab
- Colab Notebook (No Zoom recording due to technical issue)
- Readings
- Olah (2015b), Understanding LSTM Networks (blog post)
- Alammar (2018), The Illustrated Transformer (blog post)
- D2L Sections 16.1–16.2, on sentiment analysis using RNNs
- Ling2 #77–78, on entailment and presupposition
- Bowman et al. (2015), the Stanford Natural Language Inference corpus (website)
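To make the recurrence concrete, here is a one-dimensional pure-Python sketch of a simple (Elman) RNN step, with scalars standing in for the weight matrices and vectors of a real model; the weights and input sequence are invented for illustration.

```python
import math

def rnn_step(h, x, W_h, W_x, b):
    """One step of a simple RNN: h' = tanh(W_h * h + W_x * x + b).
    The same weights are reused at every position in the sequence."""
    return math.tanh(W_h * h + W_x * x + b)

# Run the toy 1-dimensional RNN over a short input sequence.
h = 0.0  # initial hidden state
for x in [1.0, -0.5, 0.25]:
    h = rnn_step(h, x, W_h=0.5, W_x=1.0, b=0.0)

print(h)  # the final hidden state summarizes the whole sequence
```

LSTMs add gates to this recurrence so gradients survive long sequences, and Transformers replace the recurrence entirely with attention.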
Week 5, Feb. 21/23
Basic Techniques 4: Transfer Learning
We introduce transfer learning, a technique in which large quantities of unlabeled data are leveraged by pre-training an encoder network on a language modeling objective. A guest lecturer, NYU graduate student Jason Phang, will tell us how modern engineering techniques allow us to do fine-tuning at scale.
- Topics
- The BERT model, pre-training, fine-tuning, parallel computation, GPUs
- Lecture
- Sophie’s Slides, Jason’s Slides, Zoom Recording
- Lab
- Slides, Zoom Recording
- Readings
- SLP Chapter 10, on Transformer language models
- SLP Chapter 11, on fine-tuning
- D2L Sections 15.8–15.10, on BERT
- D2L Sections 16.6–16.7, on fine-tuning BERT
- Devlin et al. (2019), the original BERT paper
- Ruder (2019), The State of Transfer Learning in NLP (blog post)
Week 6, Feb. 28/Mar. 2
Basic Techniques 5: In-Context Adaptation
We take the idea of transfer learning a step further with in-context adaptation, in which a large, general-purpose language model performs a task by auto-completing a prompt that describes the task and the input. There is no fine-tuning or other explicit training on the target task.
- Topics
- Language modeling, perplexity, the noisy channel model, in-context adaptation, reinforcement learning from human feedback, the GPT models (GPT-2, GPT-3, InstructGPT, ChatGPT)
- Lecture
- Slides, Zoom Recording
- Lab
- Slides, Zoom Recording
- Readings
- Radford et al. (2019), Better Language Models and Their Implications (blog post)
- Lowe and Leike (2022), Aligning Language Models to Follow Instructions (blog post)
- Brown et al. (2020), in-context adaptation with GPT-3 (Read sections 1, 2, and 4, and pick two subsections of section 3 to read)
- Wei et al. (2022), chain-of-thought prompting
- Press et al. (2022), self-ask prompting
- Ouyang et al. (2022), reinforcement learning from human feedback (RLHF), the fine-tuning technique for InstructGPT and ChatGPT
- Deadlines
- HW 2 Due 2/27, EC 2 Due 2/27
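To preview the perplexity metric, here is a minimal pure-Python sketch; the per-token probabilities below are hypothetical numbers, not the output of any real model.

```python
import math

def perplexity(token_probs):
    """Perplexity = exp of the average negative log-probability per token.
    Lower is better: a model that assigns every token probability 1 has perplexity 1."""
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)

# Hypothetical probabilities a language model assigned to each token of a sentence.
probs = [0.2, 0.5, 0.1, 0.4]
print(perplexity(probs))
```

A useful sanity check: a model that guesses uniformly over a vocabulary of size V has perplexity exactly V, so perplexity can be read as an effective branching factor.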
Week 7, Mar. 7/9
Research Skills 1: Empirical Methods in NLP
As you begin to flesh out your projects, we will learn the basic scientific methodology of NLP. We will learn about the peer review and publishing process in NLP as well as the elements that make up a typical NLP research project.
- Topics
- Types of academic research, the ACL, publishing and peer review, structure of a research project, experimental methodology, high-performance computing
- Lecture
- Slides, Zoom Recording
- Lab
- Slides, Zoom Recording
- Readings
- Zaken et al. (2022), The BitFit paper that we read together in class
- Extra slides that we didn’t get to in class
- Sam Bowman’s slides from last year’s version of this course
- The ACL 2023 website, with the Call for Papers and information about what an ACL paper looks like (the EMNLP website is not available yet)
- Resnik and Lin (2010), on evaluations in NLP
- Deadlines
- Project Mini-Proposal Due
Spring Recess, Mar. 13–19
No Class
Week 8, Mar. 21/23
The Building Blocks of Meaning
Meaning in natural language has an important property: it is compositional. The meaning of a complex expression is determined by the meanings of its parts and the way they are combined. In this lecture we will learn how individual words combine with one another to form complex expressions with complex meanings.
- Topics
- Compositional semantics, natural language syntax, lambda calculus, predicate logic, logical form
- Lecture
- Slides, Zoom Recording
- Lab
- Slides, Zoom Recording
- Readings
- Ling1 #44–50, on syntax
- Ling2 #47, on compositional semantics
- SLP Chapter 17, on constituency parsing
- SLP Chapter 19, on logical form
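To preview how the lambda calculus composes meanings, here is a pure-Python sketch in which word meanings are functions and determiners map predicates to truth values; the tiny universe of entities is invented for illustration.

```python
# A tiny model: a universe of entities, with predicates denoting sets of entities.
entities = {"fido", "rex", "felix"}
dog = lambda x: x in {"fido", "rex"}
barks = lambda x: x in {"fido", "rex"}

# Determiner meanings: functions from two predicates to a truth value,
# mirroring lambda-calculus terms like (every = \p. \q. forall x. p(x) -> q(x)).
every = lambda p: lambda q: all(q(x) for x in entities if p(x))
some = lambda p: lambda q: any(q(x) for x in entities if p(x))

# Function application composes the meanings word by word.
print(every(dog)(barks))                   # "every dog barks": True in this model
print(some(dog)(lambda x: x == "felix"))   # "some dog is felix": False in this model
```

The point is that the truth value of the sentence falls out of applying word meanings to each other, exactly as the syntax dictates.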
Week 9, Mar. 28/30
Meaning, Knowledge, and Ontology
When we talk about the “meaning” of a word or expression, we are assuming that the expression says something about objects in the real world. Using Amazon Alexa as an example, we learn how NLU systems parse natural language utterances into logical expressions, which are interpreted according to an ontology that models the universe of objects.
- Topics
- Parsing, model-theoretic semantics, ontologies
- Lecture
- Slides, No Zoom Recording (Sorry!)
- Lab
- Slides, Zoom Recording
- Readings
- SLP Chapter 18, on dependency parsing
- Kollar et al. (2018), the Alexa Meaning Representation Language
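In the spirit of intent-and-slot semantic parsing, here is a toy pure-Python sketch that maps an utterance onto a frame interpreted against a small ontology. The grammar, ontology, and type names are all invented for illustration; this is not the Alexa Meaning Representation Language.

```python
# A hypothetical ontology mapping surface words to typed objects.
ontology = {"light": "device.Light", "music": "content.Music"}

def parse(utterance):
    """Map an utterance onto a logical-form-like frame:
    an action applied to an object from the ontology."""
    words = utterance.lower().split()
    if "on" in words:
        action = "TurnOn"
    elif "off" in words:
        action = "TurnOff"
    else:
        action = None
    obj = next((ontology[w] for w in words if w in ontology), None)
    return {"action": action, "object": obj}

print(parse("turn on the light"))  # {'action': 'TurnOn', 'object': 'device.Light'}
```

Real systems replace the keyword matching with learned parsers, but the shape of the output is the same: a structured expression whose symbols are grounded in an ontology.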
Week 10, Apr. 4/6
Structure Without Supervision
The neural network architectures from the first part of the course do not incorporate any explicit notion of syntactic or semantic structure. But does that mean that these models do not use such structures in any way? This week we discover that many neural networks create structured representations of linguistic features during training without any explicit supervision. We will also learn various techniques that have been used to discover such features.
- Topics
- Bias, interpretability, debiasing word embeddings, diagnostic classifiers, BERTology
- Lecture
- Slides, Zoom Recording
- Lab
- Zoom Recording
- Readings
- Blodgett et al. (2020), overview of bias, including allocational and representational harms
- Bolukbasi et al. (2016), on bias in word embeddings (including hard debiasing)
- Ravfogel et al. (2020), more on bias in word embeddings
- Lin et al. (2019), an example of diagnostic classification
- Hewitt and Manning (2019), on constituency structure in BERT
- Rogers et al. (2020), more BERTology results
- Belinkov and Glass (2019), a more general overview of interpretability in NLP
- Extra slides we didn’t get to in class
- Bai et al. (2022), the Constitutional AI paper from lab
- Deadlines
- HW 3 Due (Extended), Full Project Proposal Due 4/7 (Extended)
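To make the debiasing idea concrete, here is a pure-Python sketch of the projection step at the heart of hard debiasing: removing a vector's component along a bias direction. The two-dimensional vectors and the "gender direction" are invented for illustration.

```python
def project_out(v, direction):
    """Remove the component of v along a direction:
    v' = v - ((v . d) / (d . d)) * d."""
    dot = sum(a * b for a, b in zip(v, direction))
    norm_sq = sum(b * b for b in direction)
    scale = dot / norm_sq
    return [a - scale * b for a, b in zip(v, direction)]

# Toy example: remove a hypothetical "gender direction" from a word vector.
gender_dir = [1.0, 0.0]
nurse = [0.6, 0.8]
debiased = project_out(nurse, gender_dir)
print(debiased)  # [0.0, 0.8]: no component left along the bias direction
```

After projection, the vector is orthogonal to the bias direction, which is exactly the property the diagnostic and debiasing papers this week test for (and, as Ravfogel et al. discuss, exactly the property that can fail to remove the bias in practice).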
Week 11, Apr. 11/13
Research Skills 2: Communication and Scientific Discourse
Communication is key if you want to participate in the scientific community of NLP. As you work on your final paper drafts, we will learn how to communicate your research findings effectively in writing and in your presentations.
- Topics
- Writing skills, final paper requirements
- Lecture
- Slides, Zoom Recording
- Lab
- Slides, Zoom Recording
- Readings
- EMNLP 2023 short paper requirements
- EMNLP 2023 paper templates
- Vaswani et al. (2017), Attention Is All You Need
- Yang et al. (2019), XLNet: Generalized Autoregressive Pretraining for Language Understanding
- Sam Bowman’s slides from last year’s version of this course
Week 12, Apr. 18/20
General Knowledge and General Intelligence
Large language models like GPT-3 are now used as databases of general knowledge and as general-purpose language assistants. This week we study the problem of alignment: how to control the behavior of a general-purpose language model so that it adheres to constraints not entailed by its pre-training objective.
- Topics
- Alignment, large language models, imitation learning, preference modeling, RLHF, red-teaming, debate, constitutional AI, longtermism
- Lecture
- Slides, Zoom Recording
- Lab
- No lab this week!
- Readings
- Lin et al. (2022), the TruthfulQA benchmark, which tests for imitative falsehoods
- Askell et al. (2021), on the HHH criterion
- Perez et al. (2022), on red-teaming
- Irving et al. (2018), on debate (blog post version)
- New Yorker article that narrates the origins of Effective Altruism
- Opinion pieces for and against longtermism
- Sam Bowman’s slides from last year’s version of this course
- Slides from Princeton’s course on large language models
- The NYU Alignment Research Group
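To preview preference modeling, here is a minimal pure-Python sketch of the Bradley-Terry model commonly used for reward modeling in RLHF; the reward values below are hypothetical numbers, not the output of any trained reward model.

```python
import math

def preference_prob(reward_a, reward_b):
    """Bradley-Terry model: the probability a human prefers response a over
    response b is the sigmoid of the difference in their reward scores."""
    return 1.0 / (1.0 + math.exp(-(reward_a - reward_b)))

print(preference_prob(2.0, 0.0))  # well above 0.5: a is clearly preferred
print(preference_prob(1.0, 1.0))  # exactly 0.5: no preference either way
```

Training the reward model means fitting the reward scores so that these probabilities match human pairwise comparisons; the policy is then fine-tuned with reinforcement learning to maximize the learned reward.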
Week 13, Apr. 25/27
Research Skills 3: NLP, Technology, and Society
As NLP takes on an increasingly prominent role in modern life, concerns regarding the social impact of NLP are more pertinent than ever. As you finish up your projects, we highlight the potential for natural language technology to facilitate illicit or malicious activities, reinforce prejudice and discrimination, and contribute to climate change. We learn strategies adopted by the scientific community for doing research responsibly.
- Topics
- Energy consumption, documentation debt, dual use, trust, fairness, right to an explanation, intellectual property, data statements, model cards
- Lecture
- Slides (including some we didn’t get to), Zoom Recording
- Lab
- Zoom Recording
- Readings
- Bommasani et al. (2022), Section 5, overview on ethical issues
- Bender et al. (2021), the Stochastic Parrots paper
- Strubell et al. (2019), on the environmental impact of large-scale computation
- Guide on how to write data statements
- Mitchell et al. (2019), on model cards
- Blueprint for an AI Bill of Rights, a white paper from the White House Office of Science and Technology Policy
- Pessach and Shmueli (2023), survey on algorithmic fairness
- EMNLP ethics policy and FAQ
- Stanford course on ethics in NLP
- Deadlines
- Final Paper Draft Due
Week 14, May 2/4
What Is Understanding?
We have now learned many ways to teach a model about “meaning.” But do models truly understand natural language? We conclude the course with a discussion of the notion of “understanding,” and highlight the limitations of current techniques.
- Topics
- Clever Hans models, adversarial test sets, dataset artifacts, the imitation game (Turing test), the Chinese room thought experiment, the octopus test
- Lecture
- Slides, Zoom Recording
- Readings
- Pfungst’s (1907/1911) book on his experiments with Clever Hans
- McCoy et al. (2019), on the HANS benchmark (3 heuristics for NLI)
- Gururangan et al. (2018), on dataset artifacts in SNLI and MNLI
- Turing (1950), on the imitation game
- Searle (1980), on the Chinese room
- Bender and Koller (2020), on the octopus test
- NYU Debate on grounding and understanding, featuring Yann LeCun, David Chalmers, Brenden Lake, Ellie Pavlick, Jacob Browning, and Gary Lupyan
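To make the Clever Hans worry concrete, here is a pure-Python sketch of the lexical-overlap heuristic studied in the HANS paper; the example sentences are invented.

```python
def lexical_overlap(premise, hypothesis):
    """The lexical-overlap heuristic: does every word of the hypothesis
    appear somewhere in the premise?  A "Clever Hans" NLI model may predict
    entailment whenever this holds, even when the inference is invalid."""
    return set(hypothesis.lower().split()) <= set(premise.lower().split())

# Overlap holds, and entailment actually holds:
print(lexical_overlap("the doctor paid the actor", "the doctor paid"))            # True
# Overlap also holds, but entailment does NOT (word order reverses the meaning):
print(lexical_overlap("the doctor paid the actor", "the actor paid the doctor"))  # True
```

HANS is built from pairs like the second one: the heuristic fires but the label is non-entailment, so a model that merely learned the shortcut fails, which is evidence that high benchmark accuracy need not reflect understanding.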