BEST OF THE WEB

Open source models race to beat GPT-4 on coding tasks

Two open source models, WizardCoder 34B by WizardLM and CodeLlama-34B by Phind, have been released in the last few days. Both models are based on Code Llama, a large language model (LLM) developed by Meta.

WizardLM claims that WizardCoder 34B outperformed GPT-4, ChatGPT-3.5, and Claude-2 on HumanEval, a benchmark for evaluating the coding abilities of LLMs. However, it appears that WizardLM compared WizardCoder 34B’s score with GPT-4’s HumanEval score from March rather than its August score of 82 per cent.

Phind also claims that its fine-tuned versions, CodeLlama-34B and CodeLlama-34B-Python, achieved pass rates of 67.6 per cent and 69.5 per cent on HumanEval, respectively. These numbers are close to the 67 per cent that OpenAI reported for GPT-4 in March.
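
For readers unfamiliar with how a HumanEval pass rate is calculated, the following is a minimal sketch of the pass@k estimator described in the paper that introduced the benchmark. The article does not say which value of k the vendors used; headline figures like these are usually pass@1, but that is an assumption here. The function names `pass_at_k` and `benchmark_score` are illustrative only and are not taken from any vendor's evaluation code.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k samples passes.

    n: number of code samples generated for one problem
    c: number of those samples that passed the problem's unit tests
    k: number of samples the metric is allowed to draw
    """
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    # Probability that a random draw of k samples contains no passing one
    # is C(n - c, k) / C(n, k); math.comb returns 0 when k > n - c.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_score(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over all problems, given (n, c) pairs per problem."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Hypothetical data: four problems, one sample each, two of them solved.
    demo = [(1, 1), (1, 0), (1, 1), (1, 0)]
    print(f"pass@1 = {benchmark_score(demo, k=1):.1%}")
```

With one sample per problem, pass@1 reduces to the share of problems solved on the first attempt, which is how a figure such as 67.6 per cent is typically read.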

The open source community is said to be obsessed with beating GPT-4, which is considered to be the ultimate benchmark for LLMs. Meta, for its part, is building models aimed at specific tasks and trying to surpass GPT-4 on those tasks.

The HumanEval benchmark may not be a perfect measure of the coding abilities of LLMs. Skills such as code explanation, docstring generation, code infilling, answering Stack Overflow questions, and writing tests are not captured by HumanEval.

OpenAI, for its part, has not released any details about the training data or evaluation metrics used for GPT-4. This has led some to speculate that OpenAI is guarding its trade secrets in order to maintain its lead in the LLM market.

The sources for this piece include an article in AnalyticsIndiaMag.

IT World Canada Staff
