Thursday, July 3, 2025
Digital Currency Pulse

OpenAI GPT-4o ranked as best AI model for writing Solidity smart contract code by IQ

October 22, 2024
in Web3
Reading Time: 3 mins read


SolidityBench by IQ has launched as the first leaderboard to evaluate LLMs in Solidity code generation. Available on Hugging Face, it introduces two new benchmarks, NaïveJudge and HumanEval for Solidity, designed to assess and rank the proficiency of AI models in generating smart contract code.

Developed by IQ’s BrainDAO as part of its forthcoming IQ Code suite, SolidityBench serves to refine IQ’s own EVMind LLMs and compare them against generalist and community-created models. IQ Code aims to offer AI models tailored for generating and auditing smart contract code, addressing the growing need for secure and efficient blockchain applications.

As IQ told CryptoSlate, NaïveJudge takes a novel approach by tasking LLMs with implementing smart contracts based on detailed specifications derived from audited OpenZeppelin contracts. These contracts provide a gold standard for correctness and efficiency. The generated code is evaluated against a reference implementation using criteria such as functional completeness, adherence to Solidity best practices and security standards, and optimization efficiency.

The evaluation process uses advanced LLMs, including different versions of OpenAI’s GPT-4 and Claude 3.5 Sonnet, as impartial code reviewers. They assess the code against rigorous criteria, including implementation of all key functionality, handling of edge cases, error management, proper syntax usage, and overall code structure and maintainability.

Optimization considerations such as gas efficiency and storage management are also evaluated. Scores range from 0 to 100, providing a comprehensive assessment across functionality, security, and efficiency, mirroring the complexities of professional smart contract development.
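As a rough illustration of how several LLM reviewers' marks could be folded into a single 0 to 100 score, here is a minimal Python sketch. The article does not describe how NaïveJudge actually combines its criteria, so the `Review` type, the three criterion fields, the `WEIGHTS` values, and the simple average across judges are all hypothetical choices made for this example only.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Review:
    """One reviewer's marks for one generated contract (each on a 0-100 scale)."""
    judge: str
    functionality: float
    security: float
    efficiency: float


# Hypothetical weights, chosen for illustration; the real weighting is not published here.
WEIGHTS = {"functionality": 0.5, "security": 0.3, "efficiency": 0.2}


def review_score(review: Review) -> float:
    """Weighted sum of one reviewer's marks, still on a 0-100 scale."""
    total = 0.0
    for criterion, weight in WEIGHTS.items():
        mark = getattr(review, criterion)
        if not 0 <= mark <= 100:
            raise ValueError(f"{criterion} mark out of range: {mark}")
        total += weight * mark
    return total


def combined_score(reviews: list[Review]) -> float:
    """Average the per-reviewer scores (assumed aggregation, not IQ's method)."""
    if not reviews:
        raise ValueError("at least one review is required")
    return mean(review_score(r) for r in reviews)


if __name__ == "__main__":
    reviews = [
        Review("reviewer-a", functionality=80, security=70, efficiency=60),
        Review("reviewer-b", functionality=90, security=80, efficiency=70),
    ]
    print(f"combined score: {combined_score(reviews):.2f}")
```

Averaging across reviewers is only one option; a median would be less sensitive to a single outlying reviewer.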

Which AI models are best for Solidity smart contract development?

Benchmarking results showed that OpenAI’s GPT-4o model achieved the highest overall score of 80.05, with a NaïveJudge score of 72.18 and HumanEval for Solidity pass rates of 80% at pass@1 and 92% at pass@3.

Interestingly, newer reasoning models like OpenAI’s o1-preview and o1-mini were beaten to the top spot, scoring 77.61 and 75.08, respectively. Models from Anthropic and xAI, including Claude 3.5 Sonnet and grok-2, demonstrated competitive performance with overall scores hovering around 74. Nvidia’s Llama-3.1-Nemotron-70B scored lowest in the top 10 at 52.54.

SolidityBench scores for LLMs (Hugging Face)

Per IQ, HumanEval for Solidity adapts OpenAI’s original HumanEval benchmark from Python to Solidity, encompassing 25 tasks of varying difficulty. Each task includes corresponding tests compatible with Hardhat, a popular Ethereum development environment, facilitating accurate compilation and testing of generated code. The evaluation metrics, pass@1 and pass@3, measure the model’s success on the first attempt and across multiple attempts, offering insights into both precision and problem-solving capabilities.
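For readers unfamiliar with pass@k, the sketch below shows the unbiased estimator used by the original HumanEval benchmark: with n generated samples for a task, of which c pass the tests, pass@k = 1 - C(n-c, k) / C(n, k). The article does not say whether SolidityBench computes its figures this way, so treat this as background on the metric rather than a description of IQ's pipeline; the function names and the sample counts in the usage lines are illustrative assumptions.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k samples passes.

    n: samples generated for the task, c: samples that passed, k: attempt budget.
    Uses the HumanEval estimator 1 - C(n - c, k) / C(n, k).
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples exist, so any k samples include a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over tasks, given (n, c) pairs, one pair per task."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Illustrative counts only: three samples per task, four tasks.
    results = [(3, 3), (3, 1), (3, 0), (3, 2)]
    print(f"pass@1 = {benchmark_pass_at_k(results, 1):.2f}")
    print(f"pass@3 = {benchmark_pass_at_k(results, 3):.2f}")
```

When n equals k, the estimator reduces to a plain check of whether any sample passed, which is why pass@3 can never be lower than pass@1 on the same samples.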

Goals of using AI models in smart contract development

By introducing these benchmarks, SolidityBench seeks to advance AI-assisted smart contract development. It encourages the creation of more sophisticated and reliable AI models while giving developers and researchers valuable insights into AI’s current capabilities and limitations in Solidity development.

The benchmarking toolkit aims to advance IQ Code’s EVMind LLMs and also to set new standards for AI-assisted smart contract development across the blockchain ecosystem. The initiative hopes to address a critical need in the industry, where the demand for secure and efficient smart contracts continues to grow.

Developers, researchers, and AI enthusiasts are invited to explore and contribute to SolidityBench, which aims to drive the continuous refinement of AI models, promote best practices, and advance decentralized applications.

Visit the SolidityBench leaderboard on Hugging Face to learn more and begin benchmarking Solidity generation models.


Tags: Code, contract, GPT, Model, OpenAI, Ranked, smart, Solidity, Writing

Copyright © 2024 Digital Currency Pulse.
Digital Currency Pulse is not responsible for the content of external sites.
