The Future of SEO: When Will Google Be Able to Detect AI Content? (ft. Hal)
What to expect from artificial intelligence in the future
For today’s post, I tagged in artificial intelligence expert Hal from Language Tech. Read on to learn more about his opinions on the future of AI writing tools (including how he thinks Google will one day be able to detect them).
Intro
Following up on Tetra’s recent post on the state of AI writing tools, we would like to speculate about and prepare for the future of SEO copywriting with these tools. While we’re still in the early days of the technology, it’s hard to imagine these tools disappearing or failing to improve over time.
It is safe to bet that these AI writing assistants will become a larger part of a copywriter’s toolbox. But apart from the standard bet on improving fluency, speed, and quality of generated content, what does the future hold for AI copywriting tools?
This blog post is solely the opinion of an AI expert who has spent more than 5 years in the field and has closely watched and participated in many of these advancements.
You can follow me on Twitter and subscribe to my Substack for more posts like this.
Recap
A quick recap of how assistive AI writing tools like Generative Pre-trained Transformer 3 (GPT-3) work:
Implement a deep neural network (think of it as a black box that learns from data) that takes a series of words in a sentence as the input and learns to predict the word that follows. For example, in the sentence “Where are we going” the neural network is trained to predict “going” given “Where are we” as the input.
Scrape 45TB of text data from the web to inject the entire knowledge of the world stored on the internet into the system.
Scale up the deep neural network across hundreds (if not thousands) of Graphics Processing Units (GPUs) or highly specialized machine learning hardware like Tensor Processing Units (TPUs) to train a 175-billion-parameter model.
After training, you get a very fluent text generation system. But it is not enough as it still has many limitations.
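The first step of the recap can be sketched in a few lines. This is a toy illustration only: a simple word-count table stands in for the deep neural network, and the three-sentence corpus and the function names are made up for the example.

```python
from collections import Counter, defaultdict

def next_word_pairs(sentence):
    """Turn a sentence into (context words, next word) training examples."""
    words = sentence.split()
    return [(tuple(words[:i]), words[i]) for i in range(1, len(words))]

def train_bigram(corpus):
    """Count which word follows each word (a stand-in for the neural network)."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for context, target in next_word_pairs(sentence):
            counts[context[-1]][target] += 1
    return counts

def predict_next(counts, context):
    """Return the most frequent follower of the last context word, or None."""
    followers = counts.get(context.split()[-1])
    return followers.most_common(1)[0][0] if followers else None

# Hypothetical miniature "web scrape"
corpus = ["Where are we going", "Where are we now", "Where are we going today"]
model = train_bigram(corpus)
pairs = next_word_pairs("Where are we going")
prediction = predict_next(model, "Where are we")
```

Here the last training pair asks the model to predict “going” from “Where are we”, exactly as in the example above. GPT-3 does the same kind of prediction, but it looks at the whole context rather than only the last word and learns from vastly more text.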
What’s next?
Quick adaptability to custom content
GPT-3 is not great with new or custom concepts that are not well represented on the web as text. If the topic you’re writing about is not part of the 45TB of data used to train GPT-3, the model is unlikely to generate anything factual about it and will start hallucinating inaccurate information instead.
It is unreasonable to assume that GPT-3 would be able to know everything in the world. There’s always confidential information and new facts are added to the world every day.
In the future, you’ll be able to write down a few sentences explaining a new concept to the AI, and after a few steps of comprehension, the AI will be able to write down a factual and consistent article about the topic.
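A rough approximation of this idea that you can try today is to put your explanation of the concept directly into the prompt. The sketch below only builds the prompt text; the product name, the facts, and the `build_prompt` helper are all hypothetical, and sending the prompt to a model is left out on purpose.

```python
def build_prompt(concept, facts, task):
    """Assemble a prompt that explains a custom concept before asking for text."""
    lines = [f"Background on {concept}:"]
    lines += [f"- {fact}" for fact in facts]
    lines += ["", f"Task: {task}", "Use only the background above."]
    return "\n".join(lines)

# Hypothetical concept the model could not have seen during training
facts = [
    "Acme Widget X launched this year.",
    "It is sold only through the Acme online store.",
]
prompt = build_prompt(
    "Acme Widget X",
    facts,
    "Write a short product overview for Acme Widget X.",
)
```

How well a model sticks to the supplied background depends on the model, so the output still needs fact checking.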
The largest AI writing tools will become inaccessible to the majority of writers
The largest language model at the moment, Google’s PaLM, has 540 billion parameters (about 3x the size of GPT-3, the largest publicly available model) and is producing incredible results, but it is unlikely to be released to the public any time soon.
For reference, the cost to train such a model is estimated to be around $9M to $17M, not counting the amount paid to researchers and engineers to train and test multiple iterations of this model. Expect such models to be inaccessible to the majority of writers. Google is one of the very few places in the world that control such powerful models.
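To put the parameter counts in perspective, here is a back-of-the-envelope calculation. The 2 bytes per parameter is an assumption (16-bit weights), not a published figure for either model, and it covers only storing the weights, not training them.

```python
GPT3_PARAMS = 175e9   # parameter count quoted above
PALM_PARAMS = 540e9   # parameter count quoted above
BYTES_PER_PARAM = 2   # assumption: weights stored as 16-bit numbers

size_ratio = PALM_PARAMS / GPT3_PARAMS                   # roughly 3.09
gpt3_weights_gb = GPT3_PARAMS * BYTES_PER_PARAM / 1e9    # 350 GB
palm_weights_gb = PALM_PARAMS * BYTES_PER_PARAM / 1e9    # 1080 GB
```

Under that assumption the weights alone would take hundreds of gigabytes, far more than a single consumer GPU can hold, which is one more reason these models stay in the hands of a few large companies.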
Google will get even better at detecting AI content at scale REALLY soon.
The biggest implication of Google controlling the largest AI language model (PaLM) is not the potential improvements in the written content but the ability to detect the AI-generated content produced by other models with greater accuracy.
Tetra highlighted in his blog post that the GLTR tool can already reasonably detect the content produced by GPT-3 even though it was originally built on GPT-2. Considering that GPT-3 and similar tools like AI21Labs and co:here are easily accessible through their APIs, I have no doubt that Google will easily sample tons of data from them and use it to train its own AI writing detector. The larger size of the internal PaLM model will help improve the detection of text generated by the relatively smaller AI writing tools.
The question remains: will some writers be able to game Google’s AI writing detector? I think so. It remains to be seen how best to game the AI detectors, so expect to learn more as AI writing tools become more widely used.
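The intuition behind rank-based detectors like GLTR, as I understand it, is that generated text tends to pick words a language model ranks near the top of its predictions, while human text contains more surprises. The sketch below is a toy stand-in for that idea: a word-count table plays the role of the language model, and the corpus and function names are made up. It is not how GLTR or any Google system is actually implemented.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count which word follows each word (a tiny stand-in language model)."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def token_ranks(counts, text):
    """Rank of each word among the model's guesses (1 = top guess, None = never seen)."""
    words = text.lower().split()
    ranks = []
    for prev, nxt in zip(words, words[1:]):
        ordered = [w for w, _ in counts.get(prev, Counter()).most_common()]
        ranks.append(ordered.index(nxt) + 1 if nxt in ordered else None)
    return ranks

def top_k_fraction(ranks, k=1):
    """Share of words that fall within the model's top-k guesses."""
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return hits / len(ranks) if ranks else 0.0

# Hypothetical training text for the stand-in model
corpus = [
    "the cat sat on the mat",
    "the cat sat on the sofa",
    "the cat ate the fish",
    "the dog sat on the mat",
]
model = train_bigram(corpus)
predictable = top_k_fraction(token_ranks(model, "the cat sat on the mat"))
surprising = top_k_fraction(token_ranks(model, "the dog ate the sofa"))
```

A text whose words are almost all top guesses (`predictable`) looks more machine-like than one full of unexpected choices (`surprising`). A larger model makes better guesses, which is why a bigger internal model should make a stronger detector.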
The situation reminds me of cat and mouse games with Captcha breakers in the late 2000s. Eventually, the Captchas were broken, so the industry moved on to other tools to protect the web content.
Don’t expect the tricks to last long.
Interactive writing by explaining what and where to correct.
To generate content with GPT-3, you simply write down the prompt and try pressing the Generate button several times. If you don’t like the generated text you come back again to tune the prompt. Hopefully, after multiple iterations of the process, you get the text that you are happy with.
Unfortunately, this process is quite repetitive and can quickly drive up your costs. Wouldn’t it be better if you could simply highlight the text that you don’t like, explain to the AI how you would change it, and refine the generated text? Something like negotiating with an AI writing tool to get to the text that you would like to see would be a gamechanger.
Luckily, this is something that OpenAI is already working on with the InstructGPT series. InstructGPT models are trained with humans in the loop and learn to follow the user’s intentions. They are trained on top of GPT-3 using reinforcement learning from human feedback, which feeds the generations rated higher by human labelers back into the GPT-3 model in order to improve it.
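To give a feel for how ratings can steer a model, here is a heavily simplified sketch. It treats the model as a choice among three fixed drafts and nudges it toward the drafts that hypothetical labelers rated higher. The real InstructGPT pipeline is far more involved (it trains a separate reward model and uses a more sophisticated optimizer), so read this as an illustration of the feedback loop, not the actual method.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def policy_gradient_step(logits, rewards, lr=0.5):
    """One gradient-ascent step on the expected rating of a softmax policy."""
    probs = softmax(logits)
    baseline = sum(p * r for p, r in zip(probs, rewards))  # current expected rating
    # Gradient of the expected rating w.r.t. each logit is p * (r - baseline).
    return [x + lr * p * (r - baseline) for x, p, r in zip(logits, probs, rewards)]

candidates = ["draft A", "draft B", "draft C"]
human_ratings = [1.0, 5.0, 2.0]   # hypothetical labeler scores
logits = [0.0, 0.0, 0.0]          # the model starts with no preference

before = softmax(logits)
for _ in range(20):
    logits = policy_gradient_step(logits, human_ratings)
after = softmax(logits)
```

After the updates, the model is much more likely to produce “draft B”, the draft the labelers liked most.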
More weird ways to play with the prompts
Prompt tuning is still considered a dark art, poorly understood even by expert AI researchers and the most active users of GPT-3.
Expect power users to discover powerful new ways to express their intent to the GPT-3 model by communicating with it in weird ways. I wouldn’t be surprised to see separate AI models that are trained to generate and improve the prompts in order to satisfy your goal.
This is something we are already seeing with another large AI model released by OpenAI, which generates an image from a text prompt.
Conclusion
The next few years in joint human and AI writing are going to be exciting. I suggest you stay closer to the source and play with core AI writing APIs like GPT-3, AI21Labs, and co:here in addition to marketing tools like Jasper that build on top of them.
And subscribe to my Substack where I plan to cover the best tricks of extracting the most value from natural language processing technology.