The Need for Data for AI Generated Engineering
More than just geometry, we need design intent and validation.
There has been increasing interest and conjecture about AI in the 3D design space over the past year, on the back of the hype surrounding text-to-image AI with DALL-E, Stable Diffusion and Midjourney, along with breathless excitement about ChatGPT and what it might mean for engineering.
While I am excited to see the incredibly fast progress in the text and 2D space, it is worth taking a deep breath before applying the same assumptions to 3D geometry generation for engineering. There are a number of reasons for caution, but the main one is data.
I have been in conversations with many product managers at CAD software companies and startups in the engineering design space, and all are well aware that data input is a core problem to overcome in adopting AI in engineering. I plan to publish a series of interviews as the field evolves, and to host presentations and panel discussions on the topic at the CDFAM Computational Design (+DfAM) Symposium in NYC in June 2023. But first, let’s look at the state of AI generated 3D content as it stands at 12:00, January 10th, 2023 (things might change between now and when I publish).
The first example of 3D generative AI that I am aware of that is currently easily accessible to the public is the web-based Point-E demo via Hugging Face.
Point-E, by OpenAI, the same group that developed ChatGPT, is based on text2pointcloud.ipynb, a small text-to-3D experiment that produces 3D point clouds directly from text descriptions. The model’s capabilities are limited, but it does understand some simple categories and colors.
You can read the paper Point-E: A System for Generating 3D Point Clouds from Complex Prompts if you are interested in the process behind the results.
Our method first generates a single synthetic view using a text-to-image diffusion model, and then produces a 3D point cloud using a second diffusion model which conditions on the generated image. While our method still falls short of the state-of-the-art in terms of sample quality, it is one to two orders of magnitude faster to sample from, offering a practical trade-off for some use cases.
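The two-stage pipeline the paper describes (text to a single 2D view, then a point cloud conditioned on that view) can be sketched in outline as follows. To be clear, this is not the Point-E API; the function names, shapes and internals are illustrative placeholders standing in for the two diffusion models.

```python
import numpy as np

rng = np.random.default_rng(0)

def text_to_image(prompt):
    """Stage 1 (placeholder): a text-to-image diffusion model would
    generate a single synthetic view of the object from the prompt.
    Here we just return a dummy 64x64 RGB array."""
    seed = abs(hash(prompt)) % (2 ** 32)
    return np.random.default_rng(seed).random((64, 64, 3))

def image_to_point_cloud(image, n_points=1024):
    """Stage 2 (placeholder): a second diffusion model, conditioned on
    the generated image, would denoise random noise into an (N, 3)
    point cloud. Here we just sample points biased by the image mean."""
    bias = image.mean()
    return rng.normal(loc=bias, scale=0.5, size=(n_points, 3))

prompt = "a red gear"
view = text_to_image(prompt)        # one synthetic 2D view of the object
cloud = image_to_point_cloud(view)  # (1024, 3) array of xyz points
```

The structural point is that the 3D output is conditioned only on a single generated 2D view, which is part of why fidelity and dimensional accuracy are hard to guarantee.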
Interestingly, in the paper they call out a potential misuse of the model, “where it could be used to fabricate objects in the real world without external validation”, with the prompt “a 3D printable gear, a single gear 3 inches in diameter and half inch thick” as a very relevant example.
If I run the same prompt from the paper, I get a different result every time.
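That non-repeatability is inherent to how diffusion sampling works: each run starts from fresh random noise, so an unseeded sampler never produces the same output twice, while fixing the random seed makes a run bit-for-bit reproducible. A minimal illustration (a stand-in sampler, not Point-E's actual code):

```python
import numpy as np

def sample_point_cloud(n_points=512, seed=None):
    """Stand-in for a diffusion sampler: the output depends entirely
    on the initial random noise, so unseeded runs differ every time."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((n_points, 3))

a = sample_point_cloud()                # unseeded: fresh noise each call
b = sample_point_cloud()
seeded_a = sample_point_cloud(seed=42)  # seeded: identical every run
seeded_b = sample_point_cloud(seed=42)

print(np.allclose(a, b))                # False: two unseeded runs differ
print(np.allclose(seeded_a, seeded_b))  # True: same seed, same cloud
```

Seeding gives repeatability of a given sample, but it does nothing for accuracy: a reproducible point cloud is still not a gear with a 3-inch diameter.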
While I completely understand that this is a very early demo, it immediately raises two issues that we can expect to recur as the technology develops: accuracy and repeatability.
The massive language and image models behind ChatGPT, Stable Diffusion and their peers were trained on data scraped from millions of databases and websites; the same volume of open data does not exist for 3D objects.
There are impressive developments by researchers around the world, including NVIDIA generating full-color, textured geometry from images alone, and DreamFusion from the team at Google, which sidesteps the problem by bypassing the need for labeled 3D training data.
DreamFusion instead goes from text to 2D to 3D:
Recent breakthroughs in text-to-image synthesis have been driven by diffusion models trained on billions of image-text pairs. Adapting this approach to 3D synthesis would require large-scale datasets of labeled 3D assets and efficient architectures for denoising 3D data, neither of which currently exist. In this work, we circumvent these limitations by using a pretrained 2D text-to-image diffusion model to perform text-to-3D synthesis. We introduce a loss based on probability density distillation that enables the use of a 2D diffusion model as a prior for optimization of a parametric image generator. Using this loss in a DeepDream-like procedure, we optimize a randomly-initialized 3D model (a Neural Radiance Field, or NeRF) via gradient descent such that its 2D renderings from random angles achieve a low loss. The resulting 3D model of the given text can be viewed from any angle, relit by arbitrary illumination, or composited into any 3D environment. Our approach requires no 3D training data and no modifications to the image diffusion model, demonstrating the effectiveness of pretrained image diffusion models as priors.
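The DeepDream-like loop the abstract describes can be caricatured in a few lines: optimize the parameters of a 3D model by gradient descent so that its 2D renderings from random angles score well under a fixed 2D prior. Everything below is a drastic simplification for illustration only: three free parameters stand in for a NeRF, a quadratic loss stands in for the diffusion prior, and the gradient is estimated numerically rather than backpropagated through a differentiable renderer.

```python
import numpy as np

rng = np.random.default_rng(0)

# "3D model": three free parameters standing in for a NeRF's weights.
params = rng.standard_normal(3)

def render(p, angle):
    """Toy renderer: project the parameters onto a 2D view plane
    rotated by `angle` about the z axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([c * p[0] + s * p[1], p[2]])

def prior_loss(view):
    """Stand-in for the 2D prior: low loss when the rendering
    resembles a fixed target view."""
    target = np.array([1.0, 2.0])
    return float(np.sum((view - target) ** 2))

lr = 0.1
for _ in range(500):
    angle = rng.uniform(0.0, 2.0 * np.pi)   # random camera each step
    grad = np.zeros(3)
    for i in range(3):                      # numerical gradient estimate
        eps = np.zeros(3)
        eps[i] = 1e-5
        grad[i] = (prior_loss(render(params + eps, angle))
                   - prior_loss(render(params - eps, angle))) / 2e-5
    params -= lr * grad
```

Even in this toy, the prior cannot be satisfied from every angle at once, which hints at why view consistency is one of the hard parts of lifting 2D priors to 3D.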
While generation of low-fidelity models such as these is a step forward that could be applied to background content in games or animations (or the Metaverse, where a low-poly aesthetic is the standard), 3D geometry for engineering is a very different beast.
AI generation of functional objects that serve an engineering purpose, capture design intent and are manufacturable is going to take a lot more than geometric data or visual appearance.
That data might be specified by an engineer as loads, constraints and materials, gathered from empirical testing, captured by real-world sensors, or derived from simulation results; how we generate, label and communicate it, and how accurate it is, will be critical to the results an AI can generate. Once we can generate useful geometry, we will then need to address validation, accuracy and repeatability.
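One way to make "design intent" concrete is as structured, machine-readable data attached to each training example. The schema below is purely my own illustration, not an existing standard; every field name here is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Load:
    location: str            # e.g. a named face or datum on the part
    force_newtons: float
    direction: tuple         # unit vector (x, y, z)

@dataclass
class DesignIntent:
    description: str                   # the "prompt"
    material: str                      # e.g. an alloy designation
    loads: list = field(default_factory=list)
    constraints: list = field(default_factory=list)  # e.g. fixed faces
    max_mass_kg: Optional[float] = None
    manufacturing_process: str = "unspecified"       # e.g. "LPBF", "CNC"
    validated_by: str = "none"         # simulation, physical test, sensors

# Hypothetical example record for one part.
bracket = DesignIntent(
    description="lightweight mounting bracket",
    material="AlSi10Mg",
    loads=[Load("mounting_hole_1", 500.0, (0.0, 0.0, -1.0))],
    constraints=["base_face_fixed"],
    max_mass_kg=0.25,
    manufacturing_process="LPBF",
    validated_by="static FEA",
)
```

The hard part is not defining such a schema but populating it accurately at scale, which is exactly the data input problem the rest of this piece is about.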
By now we have all been wowed by the verbose, yet sometimes confidently inaccurate, musings of ChatGPT, and a trickle of ‘AI Art’ has become a flood of images, quickly exposing cultural bias along with concerns about authorship, ownership and appropriation that are already playing out in those realms. So let’s add these to the things we should get ahead of as we deal with the data input problem.
I have no doubt that generative AI will make its way into engineering workflows, and early iterations of AI-assisted design tools for optimizing specific applications and processes are already making their way into commercially available software. But it is important that we first consider how we are going to create, label and validate the data that will feed these emerging tools.