# Large Language Models

* [Understanding Large Language Models](https://magazine.sebastianraschka.com/p/understanding-large-language-models) -- A Transformative Reading List. Sebastian Raschka's whole site is well worth reading; start with this survey of LLM posts and literature.
* [A Primer on Neural Network Models for Natural Language Processing](https://arxiv.org/abs/1510.00726) - It's a good idea to read everything Yoav Goldberg has written, but this is a great start.
* Numbers Every LLM Developer Should Know <https://github.com/ray-project/llm-numbers>
* Transformers from Scratch - This is the one I come back to every time. <https://e2eml.school/transformers.html>
* Illustrated Word2Vec - Jay Alammar's site is extremely good; this post is a particularly clear walkthrough of Word2Vec. <https://jalammar.github.io/illustrated-word2vec/>
* Attention? Attention! - Deep dive into the attention mechanism. <https://lilianweng.github.io/posts/2018-06-24-attention/>
* A History of NLP - Great summary of the field over the last 20 or so years.
* Dive into Deep Learning Course <https://d2l.ai/index.html>
* <https://arstechnica.com/science/2023/07/a-jargon-free-explanation-of-how-ai-large-language-models-work/>
* Indic-gemma-7b-Navarasa [Blog](https://ravidesetty.medium.com/introducing-indic-gemma-7b-2b-instruction-tuned-model-on-9-indian-languages-navarasa-86bc81b4a282), [Code](https://github.com/TeluguLLMLabs/Indic-gemma-7b-Navarasa)
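
Several of the links above (notably the attention deep dive and Transformers from Scratch) center on the same core operation. A minimal NumPy sketch of scaled dot-product attention, with illustrative shapes and names:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V -- the core of the transformer."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    # numerically stable softmax over the key dimension
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # each output row is a weighted average of the values

# Toy example: 2 queries attending over 3 key/value pairs of dimension 4
rng = np.random.default_rng(0)
Q = rng.standard_normal((2, 4))
K = rng.standard_normal((3, 4))
V = rng.standard_normal((3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # one 4-dimensional output per query
```

This is single-head attention without masking; the linked posts build up the multi-head and causal variants from this same primitive.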

<https://ig.ft.com/generative-ai/>

## RLHF

<https://towardsdatascience.com/rlhf-reinforcement-learning-from-human-feedback-faa5ff4761d1>
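
RLHF first fits a reward model to human preference comparisons, then optimizes the policy against that reward. A minimal sketch of the pairwise (Bradley-Terry) reward-model loss in plain Python; the function name and values are illustrative, not taken from any particular library:

```python
import math

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise preference loss: -log sigmoid(r_chosen - r_rejected).
    The loss is small when the reward model scores the human-preferred
    response above the rejected one, and large when it gets them backwards."""
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

print(preference_loss(2.0, 0.0))  # small: model agrees with the human label
print(preference_loss(0.0, 2.0))  # large: model disagrees
```

The trained reward model then scores policy samples during the RL step (commonly PPO), which the linked article walks through.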

> If we aim to match the performance of ChatGPT through open source, I believe we need to start taking training data more seriously. A substantial part of ChatGPT’s effectiveness might not come from, say, specific ML architecture, fine-tuning techniques, or frameworks. But more likely, it’s from the breadth, scale and quality of the instruction data.

> To put it bluntly, fine-tuning large language models on mediocre instruction data is a waste of compute. Let’s take a look at what has changed in the training data and learning paradigm—how we are now formatting the training data differently and therefore learning differently than in past large-scale pre-training.
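
The quote's point about formatting training data differently can be made concrete. A minimal sketch of instruction-style formatting, assuming an Alpaca-style prompt template (the exact template text varies by project and is an assumption here):

```python
# One common instruction-tuning layout (Alpaca-style). Unlike plain
# pre-training, where raw text is consumed as-is, each example is wrapped
# in a fixed template marking the instruction and the expected response.
TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n{response}"
)

def format_example(instruction: str, response: str) -> str:
    """Turn a raw (instruction, response) pair into one training string."""
    return TEMPLATE.format(instruction=instruction, response=response)

example = format_example("Translate 'hello' to French.", "Bonjour")
print(example)
```

At inference time the same template is used with the response left empty, so the model completes it; the quality and breadth of these pairs is exactly what the quote argues matters most.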

## Local-language LLMs

* Kannada LLAMA <https://www.tensoic.com/blog/kannada-llama/>
* Malaysian Mistral <https://github.com/mesolitica/research-paper/blob/master/malaysian-mistral.pdf>
* MaLLaM Malaysia Large Language Model <https://github.com/mesolitica/research-paper/blob/master/mallam.pdf> <https://huggingface.co/mesolitica/mallam-1.1B-4096>
* Tamil LLAMA <https://arxiv.org/abs/2311.05845> and later <https://abhinand05.medium.com/breaking-language-barriers-introducing-tamil-llama-v0-2-and-its-expansion-to-telugu-and-malayalam-deb5d23e9264>
* Introducing Airavata: Hindi Instruction-tuned LLM <https://ai4bharat.github.io/airavata/>
* Malayalam LLM <https://github.com/VishnuPJ/MalayaLLM>
* AYA <https://huggingface.co/CohereForAI/aya-101>

## Copyright

* OpenAI says it’s “impossible” to create useful AI models without copyrighted material <https://arstechnica.com/information-technology/2024/01/openai-says-its-impossible-to-create-useful-ai-models-without-copyrighted-material/> - Further, OpenAI writes that limiting training data to public domain books and drawings "created more than a century ago" would not provide AI systems that "meet the needs of today's citizens."
* <https://www.aisnakeoil.com/p/generative-ais-end-run-around-copyright> We don’t think the injustice at the heart of generative AI will be redressed by the courts. Maybe changes to copyright law are necessary. Or maybe it will take other kinds of policy interventions that are outside the scope of copyright law. Either way, policymakers can’t take the easy way out.

## Courses

* <https://github.com/mlabonne/llm-course>
