The myth of open source artificial intelligence
Code LLMs for Software Developers: The Case of Meta, Copilot, AWS, AlphaCode, GitHub, and OpenAI
Code Llama can generate strings of code when pointed at an existing piece of code or a prompt. Meta also offers two specialized versions of the model: Code Llama - Instruct, which is tuned to understand natural-language instructions, and Code Llama - Python, which is tuned for Python. The company does not recommend using the base Code Llama or Code Llama - Python for natural-language instructions.
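As a rough illustration (not Meta’s official usage), here is a minimal sketch of pointing a Code Llama checkpoint at a string of code and asking it to continue, using the Hugging Face transformers library; the checkpoint name is an assumption, and downloading the weights requires accepting Meta’s license.

```python
# Minimal sketch: continue a piece of Python code with a Code Llama checkpoint.
# The model ID below is an assumed Hugging Face hub name; access to the weights
# requires accepting Meta's license.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "codellama/CodeLlama-7b-Python-hf"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Point the model at an existing string of code and let it continue it.
prompt = 'def fibonacci(n):\n    """Return the n-th Fibonacci number."""\n'
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```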
Meta said that programmers are using LLMs to assist in a variety of tasks. “The goal is to make developer workflows more efficient so they can focus on the most human-centric aspects of their jobs.”
Code generators have been helping developers for a while now. GitHub’s Copilot, powered by OpenAI’s GPT-4, can quickly write and check code, and it can also update existing code. Amazon’s AWS offers CodeWhisperer, which likewise writes, checks, and updates code. DeepMind’s AlphaCode, another code-writing tool, has not been publicly released yet.
GitHub, its parent company Microsoft, and OpenAI are being sued for allegedly violating copyright law with Copilot, because the tool can reproduce licensed code.
The release of Code Llama and its implications for AI coding: Copilot, Python, and open source
“It’s exciting that they’re releasing the weights to the community,” says Deepak Kumar, a postdoctoral researcher at Stanford who has studied AI coding, referring to the parameters of the neural network at the core of the model.
Kumar says the release of Meta’s general-purpose language model Llama 2 led to the formation of communities dedicated to discussing how it behaves and how it can be modified. “It gives us a little bit more flexibility and lets us explore what’s going on underneath the hood, compared to closed-source models,” he says.
Kumar says developers will use Code Llama to build new applications. For example, it could be possible to create a programming assistant that performs various additional safety checks before recommending a chunk of code, says Kumar, whose own research has explored how AI assistance can sometimes lead to less secure code. Kumar thinks that the release will inspire the creation of assistants for certain types of coding. “You can build all sorts of tooling on top of the model,” he says.
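As a rough sketch of the kind of tooling Kumar describes, the example below wraps a hypothetical code-generation call with a simple extra safety check that flags obviously risky calls before a suggestion is recommended; generate_code() and the blocklist are placeholders, not part of any real assistant.

```python
# Sketch of a programming assistant that runs extra safety checks before
# recommending a chunk of code. generate_code() stands in for a real model
# call (e.g., to Code Llama); the blocklist is illustrative, not exhaustive.
import ast

RISKY_CALLS = {"eval", "exec", "system", "popen"}

def generate_code(prompt: str) -> str:
    # Placeholder for a real model call.
    return "import os\nos.system('rm -rf /tmp/cache')\n"

def safety_check(snippet: str) -> list[str]:
    """Return a list of warnings found in the suggested code."""
    try:
        tree = ast.parse(snippet)
    except SyntaxError as err:
        return [f"suggestion does not parse: {err}"]
    warnings = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            name = getattr(node.func, "id", getattr(node.func, "attr", ""))
            if name in RISKY_CALLS:
                warnings.append(f"potentially unsafe call: {name}()")
    return warnings

def recommend(prompt: str) -> str:
    snippet = generate_code(prompt)
    problems = safety_check(snippet)
    if problems:
        return "Suggestion withheld:\n" + "\n".join(problems)
    return snippet

print(recommend("clear the temporary cache"))
```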
In June 2021, GitHub, a division of Microsoft, launched Copilot, a plug-in for coding programs that can complete sections of code based on the first lines or a comment written by the user. Copilot is based on a large language model from OpenAI similar to the one behind ChatGPT. That model is trained further on code that GitHub stores for developers, as well as, reportedly, with the help of contractors who are paid to annotate code.
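The interaction pattern is simple: the developer writes the first line or a comment, and the tool proposes the rest. The completion below is only an illustration of that pattern, not actual Copilot output.

```python
# What the developer types: a comment describing the desired function.
# Return the list of even numbers in `values`.

# What a Copilot-style tool might fill in (illustrative, not real output):
def even_numbers(values):
    return [v for v in values if v % 2 == 0]
```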
The use of open source code in training data has already prompted a lawsuit over Copilot, and Meta may limit Code Llama’s training data to avoid similar problems. Copilot costs $10 per month for individuals and $19 per month, per user, for businesses.
Accessibility and data requirements for AI chatbots: the licensing of Llama 2
A key piece of the world’s most acclaimed chatbot remains a closely guarded secret, even though anyone can use it to play with powerful artificial intelligence.
In recent months, efforts to make the technology more accessible have gained steam. A model from Meta leaked online in March, giving outsiders access to the underlying code and the weights that determine how it behaves. In July, Meta made an even more powerful model, called Llama 2, available for anyone to download, modify, and reuse. Meta’s models have since become hugely popular with companies, researchers, and people building tools and applications.
Llama 2 is free to download, modify, and deploy, but it is not covered by a conventional open source license. Meta’s license prohibits using Llama 2 to train other language models, and it requires a special license if a developer deploys the model in an app or service with more than 700 million monthly users.
This level of control means that Llama 2 may provide significant technical and strategic benefits to Meta—for example, by allowing the company to benefit from useful tweaks made by outside developers when it uses the model in its own apps.
Models released under conventional open source licenses, like GPT-Neo from the nonprofit EleutherAI, are more fully open, the researchers say. But it is difficult for such projects to get on an equal footing with those from big companies, for several reasons.
First, the data required to train advanced models is often kept secret. Second, the software frameworks required to build such models are controlled by large corporations; the two most popular, TensorFlow and PyTorch, are maintained by Google and Meta, respectively. Third, the computing power needed to train a large model is beyond the reach of most normal developers and companies, with a single training run costing tens of millions of dollars. And the human labor required to fine-tune and improve models is a resource largely available only to big companies with deep pockets.