It has been espoused in the generative AI phenomenon that the technology's key uses would include providing personalized shopping experiences for customers and creating content. Nonetheless, generative AI can also be seen to be having a very real impact on fields such as healthcare, for example.
There is a tectonic shift in healthcare and life sciences, as technology is being implemented and data-driven systems are being integrated.
A must-follow trend in this revolution is the burgeoning use of synthetic data, a breakthrough advancement poised to reshape how medical research is conducted, AI is developed, and patient privacy will be protected in the coming years.
Data available in synthetic format is comparable to data available in real-world format (such as real fibers such as hemp). In the course of human evolution, humans have created synthetic products to achieve our goals and to develop new products that improve our lives in many different ways.
It's widely known that synthetic fiber is used in clothing, rope, industrial equipment, automobiles, and many other places. It is because of the ability to create synthetic fiber that a wide range of products can be created that are needed in modern life.
Healthcare is another area where synthetic data can have an impact similar to that of traditional data. Synthetic data is created based on real-world data using a data synthesizer.
These synthesizers may leverage different methods to create synthetic data that have the same statistical and correlative properties as the original data; however, they are completely independent from the real-world data (1, 2).
Notably, synthetic data do not contain any personal identifying information which ensures personal privacy and full compliance with privacy regulations such as the EU’s General Data Protection Regulation (GDPR).
The use of high-fidelity synthetic data for data augmentation is an area of growing interest in data science, generating virtual patient cohorts, such as digital twins, to estimate counterfactuals in silico trials, allowing for better prediction of treatment outcomes and personalised medicine.
Synthetic data allows clinicians to use prompts to generate a conversation between a patient with depression and a therapist where they are discussing the onset of symptoms.
Healthcare providers can also use partially synthetic data, which takes a real-life transcript and has AI adjust it to remove personally identifiable information or private health information, while still telling a cohesive story. This data can then be used to train AI models to develop transcripts, training materials and so on.
Regardless of whether the data is fully or partially synthetic, the data can (and often is) adjusted as needed with additional prompts until it reaches the desired result.
Healthcare is subjected to a variety of privacy rules through HIPAA.
Eliminating these privacy concerns is a primary reason Read feels synthetic data is valuable in training models.
With synthetic data, healthcare providers don’t need to use real people’s data to train models. Instead, they can generate a conversation that is representative of a specific therapeutic intervention without involving anyone’s protected health information.
As Read explains, “Synthetic data also makes it easy to calibrate what we’re looking for — like to generate different examples of how a healthcare provider could say something explicitly or implicitly. This makes it easier to provide different examples and tighten up the information we provide to AI models to learn from, ensuring that we can teach it the right data for providing training or feedback to real-world clinicians.”
Synthetic data also democratizes the ability of different healthcare organizations to train and fine-tune their own machine learning models. Whereas previously, an organization might need to provide hundreds (or even thousands) of hours of transcribed sessions between patients and clinicians as well as other data points, synthetic data erases this barrier to entry.
Synthetic data allows for models to learn and build out responses at a much faster rate — which also makes it easier for new players in healthcare to enter the field.
As Read’s insights reveal, the use of AI and synthetic data isn’t going to replace clinicians’ value or decision-making authority. But with the help of synthetic data, AI can help push clinicians in the right direction to ensure that there is greater standardization and adherence to best practices.
As more providers begin to utilize synthetic data to ensure they are following best practices in all patient interactions and to get feedback on their sessions, they can elevate the quality of care for all.
A similar impact could also be felt in the healthcare sector by the use of synthetic data similar to how traditional data would.
With the help of a data synthesizer, it is possible to create synthetic data based on real-world data. It has been shown that these synthesizers can leverage different methods to produce synthetic data which are capable of being compared to the original data, even if those properties cannot be extracted from the original data, but they are completely independent of real-world data (1, 2).
A distinctive feature of synthetic data is the absence of any personal identifying information, which ensures that the data is completely private to the individual and complies with all needed privacy regulations, such as the General Data Protection Regulation (GDPR) of the European Union.
As a result of increasing interest in data science, the use of high-fidelity synthetic data for data augmentation is becoming increasingly popular. To better predict treatment outcomes and tailor medical treatments for individual patients, digital twins, and virtual cohorts are used to estimate counterfactuals in silico trials, allowing better predictions of treatment outcomes.
As a result of synthetic data, clinicians can generate a conversation between patients with depression and therapists to demonstrate how their symptoms began, and these prompts can be used to guide the conversation.
Providers of healthcare can also use partially synthetic data, which is a combination of a real-life transcript and AI processing that removes any personally identifiable information or private health information, while still telling a coherent story. By using this data, it can then be developed into the types of transcripts, materials for training, etc, that are needed for creating transcripts.
Whether the data being used is synthetic data or not, it can (and often is) manipulated or adjusted, as necessary, with additional prompts, until it reaches the result that is desired regardless of whether the data is synthetic or not.
HIPAA is a sort of Federal law that imposes a variety of privacy rules on the healthcare industry.
The fact that Synthetic Data is useful in training models is because it can eliminate these privacy concerns, according to Read.
To train models based upon synthetic data, healthcare providers do not need to rely on real person-to-person information. This would allow them to generate a conversation in which they would represent a specific therapeutic intervention, without involving any protected health information of anybody involved in such a conversation.
Moreover, Read explains, "Synthetic data also allows us to calibrate our search in a much easier way - like for example, generating examples of how a healthcare provider would be able to send an implicit or explicit message to an individual."
Moreover, synthetic data democratizes the possibility of various healthcare organizations to train and refine their own artificial intelligence models by enabling them to use synthetic data.
An organization might have previously been required to provide hundreds (or even thousands) of hours of transcribed sessions between patients and clinicians, along with other information points about these sessions, in order to offer this service, but with synthetic data, businesses are no longer required to do so.
Using synthetic data, it is possible for models to learn and develop responses at much faster rates as well, making it easier for new players in healthcare to enter the field to learn and build on existing responses.
In light of Read's insights, it's important to emphasize that AI and synthetic data are not going to replace clinicians' capabilities or their decision-making authority as Read identifies. By using synthetic data, however, AI has the potential to help clinicians in the right direction to ensure that better standards of care are observed and that best practices are followed.
As healthcare providers increasingly adopt synthetic data, they gain a valuable tool for adhering to best practices in patient interactions and enhancing the overall quality of care.
By leveraging synthetic data, practitioners can simulate various clinical scenarios, ensuring their approaches align with industry standards and ethical guidelines.
This technology also enables providers to receive constructive feedback on their patient sessions, helping to identify areas for improvement and fostering continuous professional development. The integration of synthetic data into healthcare workflows not only supports more consistent and informed decision-making but also elevates the standard of care delivered to patients across diverse settings.
By embracing synthetic data, providers can drive innovation, improve outcomes, and contribute to a more efficient and patient-centered healthcare ecosystem.