Quality is of the utmost importance in the survey world, and the primary goal of any survey platform is to reduce the incidence of poor-quality completions. However, the nature of survey responses is undergoing a seismic shift: the emergence of generative AI, and its unprecedented accessibility, is challenging every attempt to distinguish genuine responses from fraudulent ones.
Pursuing authentic data has become a high-stakes game of cat and mouse, with fraud techniques and detection methods each learning from the other.
Survey fraud detection, especially in a digital world, has always demanded attention because of the scope and scale of the emulation possible through click farms and bot activity. The rise of generative AI-driven fraud, however, has undermined the tried-and-tested metrics, producing both false positives and false negatives. Looking for human nuance in an answer that seems 'too good to be true', which used to be a largely reliable (if labor-intensive) benchmark, may no longer serve the purpose.
Fraud detection measures we have been relying on so far
Our first line of defense has always been a rigorous registration process, reinforced by VPN blocks, adaptive captchas, and triple-opt-in methods. Survey entrants are then scrutinized through checks such as digital fingerprinting, attention checks, and IP blocks. We also monitor survey quality and build quality scores from open-end response validity, plagiarism checks, straight-lining, outlier identification, and inconsistency across responses.
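To make the scoring idea concrete, here is a minimal sketch of how a rule-based quality score along these lines could be composed. The field names, thresholds, and penalty weights are illustrative assumptions, not our production logic.

```python
# Illustrative rule-based quality score for a single survey completion.
# All thresholds and penalty weights below are hypothetical examples.
from dataclasses import dataclass

@dataclass
class Completion:
    grid_answers: list[int]   # ratings on a grid question, e.g. a 1-5 scale
    open_end: str             # free-text response
    duration_seconds: float   # time taken to finish the survey

def straight_lined(grid: list[int]) -> bool:
    """Every item in a reasonably long grid got the same rating."""
    return len(grid) > 3 and len(set(grid)) == 1

def open_end_too_short(text: str, min_words: int = 4) -> bool:
    """Open-end carries too little content to be meaningful."""
    return len(text.split()) < min_words

def speeding(duration: float, median_duration: float) -> bool:
    """Completion finished in under a third of the median time."""
    return duration < median_duration / 3

def quality_score(c: Completion, median_duration: float) -> float:
    """Start at 1.0 and subtract a penalty for each red flag."""
    score = 1.0
    if straight_lined(c.grid_answers):
        score -= 0.4
    if open_end_too_short(c.open_end):
        score -= 0.3
    if speeding(c.duration_seconds, median_duration):
        score -= 0.3
    return max(score, 0.0)

# A straight-lining speeder with a throwaway open-end scores ~0.0.
c = Completion(grid_answers=[3, 3, 3, 3, 3], open_end="good", duration_seconds=55)
print(quality_score(c, median_duration=300.0))
```

In practice, scores like this typically feed review thresholds rather than triggering automatic disqualification on their own.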
Good behavior is rewarded, while fraudulent behavior is penalized through negative incentivization such as targeted communication and temporary account suspension.
Now add the challenges posed by AI-generated responses to that mix. Quality could dip further, not only because more fraudulent responses go undetected but also because genuine responses get flagged incorrectly.
What is making it so grim?
There is a significant risk of losing good data misdiagnosed as bad while bad data goes undetected. Profiling fraud by spotting patterns across historical responses, customer bases, and partners, and then disqualifying surveys or terminating accounts, may no longer suffice. Survey fraud is not necessarily an organized network activity anymore: AI has lowered the entry barrier, making fraud accessible to individuals and in-survey fraud more erratic and harder to profile.
Pre-screening surveys and in-survey checks built on open-ended questions have been effective but resource-intensive, and even these methods now fail to detect inauthentic responses.
Open-ended and qualitative survey responses have always required close scrutiny. There is no foolproof measure, given how subjective they are, and detection has often meant manually sifting through data for contextual gaps, human errors, and responses that seem too perfect or idealistic. Historically, this process inevitably produced some misses. Now, however, the chasm has widened dramatically: inattentive, unengaged responses can be made to look authentic, complete with contextual and behavioral cues, using generative AI. Context, tone, and language can all be tuned to suit one's intent, with just a few clicks in an AI tool.
AI detectors often fail to identify such fraudulent responses, passing them as human responses.
Taking note of these instances, we ran an experiment with a few open-ended questions. We used generative AI tools such as ChatGPT and Copilot to generate answers to questions like 'What is your method of evaluating the performance of start-ups?' or 'Where do you travel the most to and why?'
We then checked the authenticity of these answers using AI detection applications. Here are some samples of the false negatives we got.
Question – What is your method of evaluating the performance of start-ups?
Question – Where do you travel the most to and why?
Question – What do you think is the most effective way to contribute to social causes?
Conversely, unique, authentic survey responses can now be polished or paraphrased with AI tools, which should still qualify them as authentic. Flagging those as 'too good to be true' would also lead to false positives.
Open-ended responses also put the algorithms behind AI detection tools to a stringent test because the responses are short. With so little text, the detectors' output is volatile and their accuracy is low.
To take another example, we posed the question 'What are the prime reasons why large organizations use AWS?' and ran the AI-generated answer through another detection website.
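One workaround for the short-text problem, offered here as an assumption rather than a feature of any particular detection tool, is to abstain from scoring very short answers in isolation and instead pool a respondent's open-ends before handing them to whatever classifier is in use. A minimal sketch, where `detector_score` is a placeholder for any third-party AI-text detector:

```python
# Pool a respondent's open-ended answers before scoring, and abstain
# when there is too little text for a reliable verdict.
# `detector_score` stands in for any third-party AI-text classifier.
from typing import Callable, Optional

MIN_WORDS_FOR_DETECTION = 50  # hypothetical threshold

def pooled_ai_score(
    answers: list[str],
    detector_score: Callable[[str], float],  # probability that text is AI-generated
) -> Optional[float]:
    combined = " ".join(a.strip() for a in answers if a.strip())
    if len(combined.split()) < MIN_WORDS_FOR_DETECTION:
        return None  # too short: better to abstain than emit a noisy score
    return detector_score(combined)

# Usage: score all of one respondent's open-ends together, e.g.
# pooled_ai_score(["good", "I like it"], detector_score=my_detector) returns None (abstains).
```

Abstaining is not detection, but it keeps volatile verdicts on one-line answers from contaminating quality scores.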
Live surveys, on the other hand, have seen increased violations involving video and voice deepfakes along with synthetic identities. According to a report from Regula (regulaforensics.com), 37% of organizations have experienced voice fraud, with 29% falling victim to deepfake voices.
Are there any feasible measures?
The binary answer to whether there is a foolproof mechanism against such AI-generated fraud is 'no,' but there are measures that offer some resistance, curative if not preventive. It is, of course, only a matter of time before fraudsters catch up and the AI models learn new behaviors. Still, the following line of checks can be handy.
- Counteract with probing questions that are more personal or unique in nature and would not apply to a look-alike profile. For example, if a question like 'When talking with your employees, please tell us in a few words what is important to them when talking about careers?' is met with the (AI-generated) answer 'Employees prioritize growth opportunities, recognition, work-life balance, fair compensation, positive company culture, purposeful alignment, feedback, and supportive leadership in career discussions,' a probing question such as 'That's a great list. However, could you tell me which of these factors is the most important to your employees when discussing their careers, and why?' can be posed (see the sketch after this list).
- A short qualitative interview could also be mandated at the end of a survey, with a random mix of a select few questions from the previously answered set, to cross-check and verify consistency. This can also be done by capturing a video response to open-ended questions.
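As a rough illustration of the first measure, the sketch below flags answers that read like generic checklists, with many comma-separated abstractions and no first-person detail, and queues a probing follow-up. The heuristics are assumptions for illustration, not a validated classifier.

```python
import re
from typing import Optional

PROBE = ("That's a great list. However, could you tell me which of these "
         "factors is the most important to your employees when discussing "
         "their careers, and why?")

FIRST_PERSON = re.compile(r"\b(i|we|my|our|me|us)\b", re.IGNORECASE)

def looks_generic(answer: str) -> bool:
    """Heuristic: a long comma-separated list with no first-person detail."""
    comma_items = answer.count(",") + 1
    return comma_items >= 5 and not FIRST_PERSON.search(answer)

def next_question(answer: str) -> Optional[str]:
    """Return a probing follow-up when the answer looks like a generic list."""
    return PROBE if looks_generic(answer) else None

ai_style = ("Employees prioritize growth opportunities, recognition, "
            "work-life balance, fair compensation, positive company culture, "
            "purposeful alignment, feedback, and supportive leadership "
            "in career discussions.")
print(next_question(ai_style))  # prints the probing follow-up
```

In a live deployment such a trigger would naturally be tuned per question and combined with the broader quality checks described earlier.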
Drawing the Line: Ensuring Authenticity of Future Survey Responses
As the market research industry moves forward, some unresolved questions must be addressed to ensure more participants complete surveys and market research firms can gather wide-ranging insights from the responses.
Is using AI actually harmful to survey responses? AI could be genuinely useful in articulating user preferences, opinions, and behaviors, especially when participants disclose its use. Discarding such responses as fraudulent may not be the best way forward unless patterns elsewhere in the overall survey data suggest otherwise.
In a world where AI has permeated nearly every imaginable sector, it is essential for the market research industry to take these questions up for debate and set the bar for what counts as a fraudulent response.
Only by weighing all the options and their implications will market research firms understand the effort needed to ensure the accuracy and reliability of the data they collect.