LLM CLASSIC PAPERS

Paper Roadmap

A roadmap of classic papers on large language models

RLHF: Human Preferences, Reward Learning, Reinforcement Learning
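
RLHF first trains a reward model on pairwise human comparisons, then optimizes the policy against it with RL. A minimal sketch of the standard Bradley-Terry pairwise loss, assuming scalar rewards per response (names here are illustrative):

    import numpy as np

    def reward_model_loss(r_chosen: np.ndarray, r_rejected: np.ndarray) -> float:
        """Pairwise preference loss: -log sigmoid(r_chosen - r_rejected),
        where r_chosen scores the human-preferred response of each pair."""
        margin = r_chosen - r_rejected
        # log(1 + exp(-margin)) == -log sigmoid(margin), computed stably
        return float(np.mean(np.logaddexp(0.0, -margin)))

    # The loss falls as the model scores preferred responses higher.
    print(reward_model_loss(np.array([2.0, 1.5]), np.array([0.5, 1.0])))  # ~0.34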

Transformer: Self-Attention, Encoder-Decoder, Parallelization
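
At the core is scaled dot-product self-attention, which is what lets the architecture parallelize over the whole sequence instead of recurring step by step. A single-head numpy sketch with illustrative shapes:

    import numpy as np

    def self_attention(x: np.ndarray, Wq, Wk, Wv) -> np.ndarray:
        """Single-head scaled dot-product self-attention (no batch dim).

        x: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_head).
        """
        q, k, v = x @ Wq, x @ Wk, x @ Wv
        scores = q @ k.T / np.sqrt(k.shape[-1])        # (seq_len, seq_len)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True) # row-wise softmax
        return weights @ v                             # (seq_len, d_head)

    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 8))
    W = [rng.normal(size=(8, 8)) for _ in range(3)]
    print(self_attention(x, *W).shape)  # (4, 8)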

GPT-1: Generative Pre-training, Decoder-only, Fine-tuning
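
Decoder-only generative pre-training is next-token prediction under a causal mask: each position may only attend to earlier positions, and its target is the token that follows. A toy sketch:

    import numpy as np

    # Decoder-only pre-training pairs each position with the *next* token
    # as its target, and a causal mask keeps attention from looking ahead.
    tokens = np.array([5, 17, 3, 42, 9])                # toy token ids
    inputs, targets = tokens[:-1], tokens[1:]           # shift by one
    causal_mask = np.tril(np.ones((4, 4), dtype=bool))  # i attends to <= i
    print(inputs, targets)
    print(causal_mask.astype(int))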

BERT: Masked LM, Bidirectional, Pre-training
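
The masked-LM objective corrupts the input and asks the bidirectional encoder to recover the originals. A sketch of BERT's corruption rule (15% of positions selected; of those, 80% masked, 10% randomized, 10% left intact); the tokens and vocabulary are toy data:

    import random

    def mask_tokens(tokens, vocab, mask_token="[MASK]", p=0.15, seed=0):
        """BERT-style masked-LM corruption: of the selected positions,
        80% become [MASK], 10% a random token, 10% stay unchanged."""
        rng = random.Random(seed)
        corrupted, labels = list(tokens), [None] * len(tokens)
        for i, tok in enumerate(tokens):
            if rng.random() >= p:
                continue
            labels[i] = tok                  # predict the original token here
            roll = rng.random()
            if roll < 0.8:
                corrupted[i] = mask_token
            elif roll < 0.9:
                corrupted[i] = rng.choice(vocab)
        return corrupted, labels

    print(mask_tokens("the cat sat on the mat".split(), vocab=["dog", "ran"]))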

GPT-2: Zero-shot, WebText, Scaling

T5: Text-to-Text, Transfer Learning, Unified Framework
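
The unifying idea is that every NLP task becomes text in, text out, with a short task prefix on the input. A few input/target pairs, adapted from the examples in the T5 paper:

    # Every task becomes "input text -> target text"; the prefix tells
    # the model which task to perform.
    examples = [
        ("translate English to German: That is good.", "Das ist gut."),
        ("cola sentence: The course is jumping well.", "not acceptable"),
        ("stsb sentence1: The rhino grazed on the grass. "
         "sentence2: A rhino is grazing in a field.", "3.8"),
    ]
    for source, target in examples:
        print(f"{source!r}  ->  {target!r}")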

GPT-3: 175B Parameters, Few-shot Learning, In-context Learning
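
In-context learning means the "training examples" live in the prompt and no weights are updated. A hypothetical few-shot prompt for a sentiment task (the format and labels are illustrative, not from the paper):

    # Few-shot prompting: demonstrations precede the query; the model
    # completes the pattern without any gradient updates.
    demos = [("I loved this movie!", "positive"),
             ("What a waste of two hours.", "negative")]
    query = "The plot was thin but the acting saved it."

    prompt = "\n".join(f"Review: {x}\nSentiment: {y}" for x, y in demos)
    prompt += f"\nReview: {query}\nSentiment:"
    print(prompt)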

ViT: Vision Transformer, Patch Embedding, Pure Transformer
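
Patch embedding turns an image into a token sequence by flattening non-overlapping patches, so a plain Transformer encoder can process it. A numpy sketch; the 224x224 input and 16x16 patches match the common ViT-Base setup:

    import numpy as np

    def patchify(image: np.ndarray, patch: int) -> np.ndarray:
        """Split an (H, W, C) image into flattened non-overlapping patches,
        giving the (num_patches, patch*patch*C) sequence a ViT embeds."""
        H, W, C = image.shape
        x = image.reshape(H // patch, patch, W // patch, patch, C)
        x = x.transpose(0, 2, 1, 3, 4)            # (h, w, patch, patch, C)
        return x.reshape(-1, patch * patch * C)

    img = np.zeros((224, 224, 3))
    print(patchify(img, 16).shape)  # (196, 768): 14x14 patches of 16x16x3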

ViLT: Vision-Language, Patch-only, Efficient Pre-training

CLIP: Contrastive Learning, Zero-shot Vision, Natural Language Supervision
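
CLIP's contrastive objective pulls matched image/text pairs together and pushes everything else apart, via a symmetric cross-entropy over the batch similarity matrix. A numpy sketch close in spirit to the pseudocode in the paper (the embeddings here are random placeholders):

    import numpy as np

    def log_softmax(z: np.ndarray) -> np.ndarray:
        z = z - z.max(axis=1, keepdims=True)
        return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

    def clip_loss(img_emb: np.ndarray, txt_emb: np.ndarray, t: float = 0.07) -> float:
        """Symmetric InfoNCE: row i of each matrix comes from the same pair,
        so the diagonal of the similarity matrix holds the positives."""
        img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
        txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
        logits = img @ txt.T / t              # (N, N) cosine / temperature
        diag = np.arange(len(logits))
        loss_i2t = -log_softmax(logits)[diag, diag].mean()
        loss_t2i = -log_softmax(logits.T)[diag, diag].mean()
        return float((loss_i2t + loss_t2i) / 2)

    rng = np.random.default_rng(0)
    print(clip_loss(rng.normal(size=(8, 32)), rng.normal(size=(8, 32))))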

DALL·E 1: Text-to-Image, Autoregressive, Zero-shot Generation

Codex: Code Generation, GitHub Copilot, HumanEval
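
HumanEval scores functional correctness with pass@k: the probability that at least one of k sampled programs passes the unit tests. The unbiased estimator from the Codex paper, given n samples per problem of which c pass:

    import numpy as np

    def pass_at_k(n: int, c: int, k: int) -> float:
        """Unbiased pass@k estimator from the Codex paper:
        1 - C(n - c, k) / C(n, k), computed as a stable product."""
        if n - c < k:
            return 1.0
        return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

    print(pass_at_k(n=200, c=10, k=1))   # 0.05: 10 of 200 samples passed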

AlphaCode: Competitive Programming, Code Reasoning, Codeforces

Whisper: Speech Recognition, Weak Supervision, Multilingual

LLaMA-1: Compute-Optimal Scaling, Data-Centric Training
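
For context on "compute-optimal": a dense Transformer's training cost is roughly C ≈ 6·N·D FLOPs for N parameters and D tokens (a standard approximation, not a figure from the paper), and LLaMA deliberately trained smaller models on more tokens than the Chinchilla optimum to make inference cheaper. A back-of-envelope check:

    # Rough training cost, C ~= 6 * N * D FLOPs (standard approximation).
    N = 7e9          # parameters (LLaMA-7B scale)
    D = 1.0e12       # training tokens (~1T, per the LLaMA paper)
    print(f"{6 * N * D:.1e} FLOPs")   # ~4.2e22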

LLaVA: Visual Instruction Tuning, GPT-4 Data, Multimodal Chat

LLaMA-2: RLHF, GQA, Long Context, Efficient Inference
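
Grouped-query attention (GQA) is the efficiency trick here: several query heads share one key/value head, shrinking the KV cache (and hence long-context inference cost) by the group factor. A numpy sketch with illustrative shapes:

    import numpy as np

    def grouped_query_attention(q, k, v, n_kv_heads: int):
        """GQA: n_q query heads share n_kv KV heads (n_q % n_kv == 0),
        cutting the KV cache by n_q / n_kv while keeping query capacity.

        q: (n_q_heads, seq, d); k, v: (n_kv_heads, seq, d).
        """
        group = q.shape[0] // n_kv_heads
        k = np.repeat(k, group, axis=0)            # broadcast KV to groups
        v = np.repeat(v, group, axis=0)
        scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)
        return w @ v                               # (n_q_heads, seq, d)

    rng = np.random.default_rng(0)
    q = rng.normal(size=(8, 4, 16))   # 8 query heads, seq 4, head dim 16
    k = rng.normal(size=(2, 4, 16))   # 2 KV heads, shared by groups of 4
    v = rng.normal(size=(2, 4, 16))
    print(grouped_query_attention(q, k, v, n_kv_heads=2).shape)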

Qwen-VL: Vision-Language, OCR, Localization

Qwen-1: GQA, NTK-aware, LogN-Scaling
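
NTK-aware scaling extends the context window by raising the RoPE base rather than compressing positions, so the high-frequency dimensions are barely perturbed. A sketch of one common community formulation, base' = base · scale^(dim/(dim-2)); the exact variant Qwen uses may differ:

    import numpy as np

    def ntk_rope_frequencies(dim: int, base: float = 10000.0, scale: float = 4.0):
        """NTK-aware context extension: raise the RoPE base so the lowest
        dimensions keep their frequencies while low-frequency (long-range)
        dimensions are stretched to cover the longer context."""
        new_base = base * scale ** (dim / (dim - 2))
        return 1.0 / new_base ** (np.arange(0, dim, 2) / dim)

    # The first frequency is exactly 1.0 regardless of the base; later
    # (lower-frequency) dimensions change the most.
    print(ntk_rope_frequencies(128, scale=4.0)[:3])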

LVM: Large Vision Model, Visual Sentence, Autoregressive Vision

Gemma 1: Open Model, Practical Size, Google

ChatGLM: Chinese LLM, Tool Use, Long Context

Llama 3: 405B Parameters, Multilingual, Open Weights

Gemma 2: Distillation, Local-Global Attention, Practical LLM
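
Distillation here means training the small model against a large teacher's full next-token distribution instead of one-hot labels. A minimal numpy sketch of the soft-target cross-entropy (equivalent to KL up to the teacher's entropy); the logits are random placeholders:

    import numpy as np

    def softmax(z: np.ndarray) -> np.ndarray:
        z = z - z.max(axis=-1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)

    def distill_loss(student_logits, teacher_logits, T: float = 1.0) -> float:
        """Cross-entropy of the student against the teacher's softened
        distribution: the student learns from every token's probability,
        not just the single observed token."""
        p = softmax(teacher_logits / T)              # teacher soft targets
        log_q = np.log(softmax(student_logits / T))  # student log-probs
        return float(-(p * log_q).sum(axis=-1).mean())

    rng = np.random.default_rng(0)
    print(distill_loss(rng.normal(size=(4, 100)), rng.normal(size=(4, 100))))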

Gemma 3: Multimodal, 128K Context, Edge Deployment