Large Language Models

  • Understanding Large Language Models -- A Transformative Reading List - Sebastian Raschka's whole site is well worth reading; start with this survey of LLM posts and literature

  • A Primer on Neural Network Models for Natural Language Processing - It's a good idea to read everything Yoav Goldberg has written, but this is a great start

  • Numbers Every LLM Developer Should Know https://github.com/ray-project/llm-numbers

  • Transformers from Scratch - This is the one I come back to every time. https://e2eml.school/transformers.html

  • Illustrated Word2Vec - Jay Alammar's site is excellent; this post is a particularly clear walkthrough of Word2Vec https://jalammar.github.io/illustrated-word2vec/

  • Attention? Attention! - Deep dive into the attention mechanism (a short code sketch follows this list). https://lilianweng.github.io/posts/2018-06-24-attention/

  • A History of NLP - Great summary of the field over the last 20 or so years.

  • Dive into Deep Learning Course https://d2l.ai/index.html

  • A jargon-free explanation of how AI large language models work https://arstechnica.com/science/2023/07/a-jargon-free-explanation-of-how-ai-large-language-models-work/

  • Indic-gemma-7b-Navarasa Blog, Code

  • Generative AI exists because of the transformer https://ig.ft.com/generative-ai/
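Since several of the links above center on the attention mechanism, a few lines of code may help fix the idea. Below is a minimal NumPy sketch of scaled dot-product attention; the shapes and variable names are illustrative, not taken from any of the linked posts.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Q: (seq_q, d_k), K: (seq_k, d_k), V: (seq_k, d_v)."""
    d_k = Q.shape[-1]
    # Similarity of every query with every key, scaled by sqrt(d_k)
    # to keep the logits in a range where softmax has useful gradients.
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V                  # weighted average of the values

# Toy example: 3 query positions, 4 key/value positions, d_k = d_v = 8.
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 8)
```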

RLHF

  • RLHF: Reinforcement Learning from Human Feedback https://towardsdatascience.com/rlhf-reinforcement-learning-from-human-feedback-faa5ff4761d1
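The linked article walks through the RLHF pipeline; at its core sits a reward model trained on human preference comparisons with a pairwise loss, the approach popularized by InstructGPT. Here is a minimal PyTorch sketch of that loss, with toy scalar rewards standing in for the outputs of a real reward model.

```python
import torch
import torch.nn.functional as F

def reward_model_loss(r_chosen, r_rejected):
    """Pairwise preference loss: push the reward of the human-preferred
    response above that of the rejected one. Inputs are scalar rewards
    per comparison pair."""
    # -log(sigmoid(r_chosen - r_rejected)) == softplus(r_rejected - r_chosen)
    return F.softplus(r_rejected - r_chosen).mean()

# Toy rewards for a batch of 4 preference pairs (stand-ins for the
# scalar head of a reward model scoring chosen/rejected completions).
r_chosen = torch.tensor([1.2, 0.3, 2.0, -0.5])
r_rejected = torch.tensor([0.4, 0.5, 1.0, -1.5])
print(reward_model_loss(r_chosen, r_rejected))  # small when chosen > rejected
```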

"If we aim to match the performance of ChatGPT through open source, I believe we need to start taking training data more seriously. A substantial part of ChatGPT’s effectiveness might not come from, say, specific ML architecture, fine-tuning techniques, or frameworks. But more likely, it’s from the breadth, scale and quality of the instruction data.

To put it bluntly, fine-tuning large language models on mediocre instruction data is a waste of compute. Let’s take a look at what has changed in the training data and learning paradigm—how we are now formatting the training data differently and therefore learning differently than in past large-scale pre-training."
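To make the quote's point about "formatting the training data differently" concrete: instruction tuning renders each example through a prompt template into an explicit (instruction, response) pair, and the model is trained to produce the response. The sketch below uses an Alpaca-style template; the field names and template text are one common convention, not something prescribed by the quoted passage.

```python
# One common instruction-tuning record layout (Alpaca-style; the field
# names are a convention, not taken from the quote above).
example = {
    "instruction": "Summarize the following paragraph in one sentence.",
    "input": "Large language models are trained on web-scale text ...",
    "output": "LLMs learn language patterns from huge text corpora.",
}

PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)

def format_example(ex):
    # The model is trained to continue the prompt with ex["output"];
    # typically the loss is computed only on the response tokens.
    return PROMPT_TEMPLATE.format(**ex) + ex["output"]

print(format_example(example))
```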

Local language LLMs

  • Kannada LLaMA https://www.tensoic.com/blog/kannada-llama/

  • Malaysian Mistral https://github.com/mesolitica/research-paper/blob/master/malaysian-mistral.pdf

  • MaLLaM - Malaysia Large Language Model https://github.com/mesolitica/research-paper/blob/master/mallam.pdf https://huggingface.co/mesolitica/mallam-1.1B-4096

  • Tamil LLaMA https://arxiv.org/abs/2311.05845 and the later v0.2 expansion to Telugu and Malayalam https://abhinand05.medium.com/breaking-language-barriers-introducing-tamil-llama-v0-2-and-its-expansion-to-telugu-and-malayalam-deb5d23e9264

  • Introducing Airavata: Hindi Instruction-tuned LLM https://ai4bharat.github.io/airavata/

  • Malayalam LLM https://github.com/VishnuPJ/MalayaLLM

  • Aya 101 https://huggingface.co/CohereForAI/aya-101

  • OpenAI says it’s “impossible” to create useful AI models without copyrighted material https://arstechnica.com/information-technology/2024/01/openai-says-its-impossible-to-create-useful-ai-models-without-copyrighted-material/ - Further, OpenAI writes that limiting training data to public domain books and drawings "created more than a century ago" would not provide AI systems that "meet the needs of today's citizens."

  • Generative AI's end-run around copyright https://www.aisnakeoil.com/p/generative-ais-end-run-around-copyright - "We don’t think the injustice at the heart of generative AI will be redressed by the courts. Maybe changes to copyright law are necessary. Or maybe it will take other kinds of policy interventions that are outside the scope of copyright law. Either way, policymakers can’t take the easy way out."

Courses

  • LLM Course by Maxime Labonne https://github.com/mlabonne/llm-course
