
The Privacy Risks of ChatGPT and AI Chatbots

 


AI chatbots like ChatGPT have captured widespread attention for their remarkable conversational abilities, allowing users to engage on diverse topics with ease. However, while these tools offer convenience and creativity, they also pose significant privacy risks. The very technology that powers lifelike interactions can also store, analyze, and potentially resurface user data, raising critical concerns about data security and ethical use.

The Data Behind AI's Conversational Skills

Chatbots like ChatGPT rely on Large Language Models (LLMs) trained on vast datasets to generate human-like responses. This training often includes learning from user interactions. Much like how John Connor taught the Terminator quirky catchphrases in Terminator 2: Judgment Day, these systems refine their capabilities through real-world inputs. However, this improvement process comes at a cost: personal data shared during conversations may be stored and analyzed, often without users fully understanding the implications.

For instance, OpenAI’s terms and conditions explicitly state that data shared with ChatGPT may be used to improve its models. Unless users actively opt out through privacy settings, all shared information, from casual remarks to sensitive details like financial data, can be logged and analyzed. Although OpenAI claims to anonymize and aggregate user data for further study, the risk of unintended exposure remains.

Real-World Privacy Breaches

Despite assurances of data security, breaches have occurred. In May 2023, hackers exploited a vulnerability in ChatGPT’s Redis library, compromising the personal data of around 101,000 users. This breach underscored the risks associated with storing chat histories, even when companies emphasize their commitment to privacy. Similarly, companies like Samsung faced internal crises when employees inadvertently uploaded confidential information to chatbots, prompting some organizations to ban generative AI tools altogether.

Governments and industries are starting to address these risks. For instance, in October 2023, President Joe Biden signed an executive order focusing on privacy and data protection in AI systems. While this marks a step in the right direction, legal frameworks remain unclear, particularly around the use of user data for training AI models without explicit consent. Current practices are often classified as “fair use,” leaving consumers exposed to potential misuse.

Protecting Yourself in the Absence of Clear Regulations

Until stricter regulations are implemented, users must take proactive steps to safeguard their privacy while interacting with AI chatbots. Here are some key practices to consider:

  1. Avoid Sharing Sensitive Information
    Treat chatbots as advanced algorithms, not confidants. Avoid disclosing personal, financial, or proprietary information, no matter how personable the AI seems.
  2. Review Privacy Settings
    Many platforms offer options to opt out of data collection. Regularly review and adjust these settings to limit the data shared with AI

The Growing Cybersecurity Concerns of Generative Artificial Intelligence

In the rapidly evolving world of technology, generative artificial intelligence (GenAI) programs are emerging as both powerful tools and significant security risks. Cybersecurity researchers have long warned about the vulnerabilities inherent in these systems. From cleverly crafted prompts that can bypass safety measures to potential data leaks exposing sensitive information, the threats posed by GenAI are numerous and increasingly concerning. Elia Zaitsev, Chief Technology Officer of cybersecurity firm CrowdStrike, recently highlighted these issues in an interview with ZDNET. 

"This is a new attack vector that opens up a new attack surface," Zaitsev stated. He emphasized the hurried adoption of GenAI technologies, often at the expense of established security protocols. "I see with generative AI a lot of people just rushing to use this technology, and they're bypassing the normal controls and methods of secure computing," he explained. 

Zaitsev draws a parallel between GenAI and fundamental computing innovations. "In many ways, you can think of generative AI technology as a new operating system or a new programming language," he noted. The lack of widespread expertise in handling the pros and cons of GenAI compounds the problem, making it challenging to use and secure these systems effectively. The risk extends beyond poorly designed applications. 

According to Zaitsev, the centralization of valuable information within large language models (LLMs) presents a significant vulnerability. "The same problem of centralizing a bunch of valuable information exists with all LLM technology," he said. 

To mitigate these risks, Zaitsev advises against allowing LLMs unfettered access to data stores. Instead, he recommends a more controlled approach. "In a sense, you must tame RAG before it makes the problem worse," he suggested. This involves leveraging the LLM's capability to interpret open-ended questions and using traditional programming methods to fulfill queries securely. "For example, Charlotte AI often lets users ask generic questions," Zaitsev explained. 

"What Charlotte does is identify the relevant part of the platform and the specific data set that holds the source of truth, then pulls from that via an API call, rather than allowing the LLM to query the database directly." 

As enterprises increasingly integrate GenAI into their operations, understanding and addressing its security implications is crucial. By implementing stringent control measures and fostering a deeper understanding of this technology, organizations can harness its potential while safeguarding their valuable data.
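The controlled approach Zaitsev describes, where the LLM only interprets the question and ordinary code fetches the data through a fixed API, can be sketched roughly as follows. This is a minimal illustration and not how Charlotte AI is implemented: the `classify_question` function is a keyword-based stand-in for the LLM step, and the handler names and sample records are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Intent:
    """A structured request derived from a free-text question."""
    name: str
    params: Dict[str, str] = field(default_factory=dict)


def classify_question(question: str) -> Intent:
    """Stand-in for the LLM step: map free text to one of a few known intents.

    A real system would ask a model to pick the intent; keyword matching
    is used here only so the sketch runs without any external service.
    """
    text = question.lower()
    if "alert" in text or "detection" in text:
        severity = "high" if "high" in text else "any"
        return Intent("list_alerts", {"severity": severity})
    if "host" in text or "device" in text:
        return Intent("list_hosts")
    return Intent("unknown")


# Hypothetical in-memory data standing in for the platform's data sets.
_ALERTS = [{"id": 1, "severity": "high"}, {"id": 2, "severity": "low"}]
_HOSTS = [{"name": "web-01"}, {"name": "db-01"}]


def list_alerts(severity: str = "any") -> List[dict]:
    """Fixed, reviewable query: the caller can only vary the severity."""
    return [a for a in _ALERTS if severity == "any" or a["severity"] == severity]


def list_hosts() -> List[dict]:
    return list(_HOSTS)


# Allowlist of API calls the assistant may make. The model never builds
# a database query itself; it can only select one of these entries.
HANDLERS: Dict[str, Callable[..., List[dict]]] = {
    "list_alerts": list_alerts,
    "list_hosts": list_hosts,
}


def answer(question: str) -> List[dict]:
    intent = classify_question(question)
    handler = HANDLERS.get(intent.name)
    if handler is None:
        raise ValueError("No approved API call matches this question")
    return handler(**intent.params)
```

The design choice is that the set of reachable data is defined by the allowlist, so a cleverly worded prompt can at most pick a different approved call rather than reach arbitrary records.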

IT and Consulting Firms Leverage Generative AI for Employee Development


Generative AI (GenAI) has emerged as a key focus area in the learning and development (L&D) strategies of IT and consulting firms. Companies are increasingly investing in comprehensive training programs to equip their employees with essential GenAI skills, spanning from basic concepts to advanced technical know-how.

Training courses in GenAI cover a wide range of topics. Introductory courses, which can be completed in just a few hours, address the fundamentals, ethics, and social implications of GenAI. For those seeking deeper knowledge, advanced modules are available that focus on development using GenAI and large language models (LLMs), requiring over 100 hours to complete.

These courses are designed to cater to various job roles and functions within the organisations. For example, KPMG India aims to have its entire workforce trained in GenAI by the end of the fiscal year, with 50% already trained. Their programs are tailored to different levels of employees, from teaching leaders about return on investment and business envisioning to training coders in prompt engineering and LLM operations.

EY India has implemented a structured approach, offering distinct sets of courses for non-technologists, software professionals, project managers, and executives. Presently, 80% of their employees are trained in GenAI. Similarly, PwC India focuses on providing industry-specific masterclasses for leaders to enhance their client interactions, alongside offering brief nano courses for those interested in the basics of GenAI.

Wipro organises its courses into three levels based on employee seniority, with plans to develop industry-specific courses for domain experts. Cognizant has created shorter courses for leaders, sales, and HR teams to ensure a broad understanding of GenAI. Infosys also has a program for its senior leaders, with 400 of them currently enrolled.

Ray Wang, principal analyst and founder at Constellation Research, highlighted the extensive range of programs developed by tech firms, including training on Python and chatbot interactions. Cognizant has partnerships with Udemy, Microsoft, Google Cloud, and AWS, while TCS collaborates with NVIDIA, IBM, and GitHub.

Cognizant boasts 160,000 GenAI-trained employees, and TCS offers a free GenAI course on Oracle Cloud Infrastructure until the end of July to encourage participation. According to TCS's annual report, over half of its workforce, amounting to 300,000 employees, have been trained in generative AI, with a goal of training all staff by 2025.

The investment in GenAI training by IT and consulting firms underscores the importance of staying ahead in the rapidly evolving technological landscape. By equipping their employees with essential AI skills, these companies aim to enhance their capabilities, drive innovation, and maintain a competitive edge in the market. As the demand for AI expertise grows, these training programs will play a crucial role in shaping the future of the industry.


 

The Dual Landscape of LLMs: Open vs. Closed Source

 

AI has emerged as a transformative force, reshaping industries, influencing decision-making processes, and fundamentally altering how we interact with the world. 

The field of natural language processing and artificial intelligence has undergone a groundbreaking shift with the introduction of Large Language Models (LLMs). Trained on extensive text data, these models showcase the capacity to generate text, respond to questions, and perform diverse tasks. 

When contemplating the incorporation of LLMs into internal AI initiatives, a pivotal choice arises regarding the selection between open-source and closed-source LLMs. Closed-source options offer structured support and polished features, ready for deployment. Conversely, open-source models bring transparency, flexibility, and collaborative development. The decision hinges on a careful consideration of these unique attributes in each category. 

The introduction of ChatGPT, OpenAI's groundbreaking chatbot, last year played a pivotal role in propelling AI to new heights and drove the growth of closed-source LLMs. By comparison, open-source LLMs have yet to gain the same traction and interest among independent researchers and business owners.

This can be attributed to the considerable operational expenses and extensive computational demands inherent in advanced AI systems. Beyond these factors, issues related to data ownership and privacy pose additional hurdles. Moreover, the disconcerting tendency of these systems to occasionally produce misleading or inaccurate information, commonly known as 'hallucination,' introduces an extra dimension of complexity to the widespread acceptance and reliance on such technologies. 

Still, the landscape of open-source models has witnessed a significant surge in experimentation. Developers have crafted numerous variants of models like Llama, progressively reaching parity with closed models and, in some cases, outperforming them on specific metrics. Standout examples include FinGPT, BioBERT, Defog SQLCoder, and Phind, each showcasing the potential that continuous exploration and adaptation can unlock within the open-source model ecosystem.

Beyond providing a space for experimentation, several other factors suggest that open-source LLMs will come to attract the same attention that closed-source LLMs receive now.

The open-source nature allows organizations to understand, modify, and tailor the models to their specific requirements. The collaborative environment nurtured by open-source fosters innovation, enabling faster development cycles. Additionally, the avoidance of vendor lock-in and adherence to industry standards contribute to seamless integration. The security benefits derived from community scrutiny and ethical considerations further bolster the appeal of open-source LLMs, making them a strategic choice for enterprises navigating the evolving landscape of artificial intelligence.

After carefully reviewing the strategies employed by LLM experts, it is clear that open-source LLMs provide a unique space for experimentation, allowing enterprises to navigate the AI landscape with minimal financial commitment. While a transition to closed source might become worthwhile with increasing clarity, the initial exploration of open source remains essential. To optimize advantages, enterprises should tailor their LLM strategies to follow this phased approach.

The Pros and Cons of Large Language Models

 


In recent years, the emergence of Large Language Models (LLMs), commonly referred to as Smart Computers, has ushered in a technological revolution with profound implications for various industries. As these models promise to redefine human-computer interactions, it's crucial to explore both their remarkable impacts and the challenges that come with them.

Smart Computers, or LLMs, have become instrumental in expediting software development processes. Their standout capability lies in the swift and efficient generation of source code, enabling developers to bring their ideas to fruition with unprecedented speed and accuracy. Furthermore, these models play a pivotal role in advancing artificial intelligence applications, fostering the development of more intelligent and user-friendly AI-driven systems. Their ability to understand and process natural language has democratized AI, making it accessible to individuals and organizations without extensive technical expertise. With their integration into daily operations, Smart Computers generate vast amounts of data from nuanced user interactions, paving the way for data-driven insights and decision-making across various domains.

Managing Risks and Ensuring Responsible Usage

However, the benefits of Smart Computers are accompanied by inherent risks that necessitate careful management. Privacy concerns loom large, especially regarding the accidental exposure of sensitive information. For instance, models like ChatGPT learn from user interactions, raising the possibility of unintentional disclosure of confidential details. Organisations that rely on external model providers, Samsung among them, have responded to these concerns by implementing usage limitations to protect sensitive business information. Privacy and data exposure concerns are further accentuated by default practices, such as ChatGPT saving chat history for model training. Organisations therefore need to ask providers detailed questions about data usage, storage, and training processes to safeguard against data leaks.

Addressing Security Challenges

Security concerns encompass malicious usage, where cybercriminals exploit Smart Computers for harmful purposes, potentially evading security measures. The compromise or contamination of training data introduces the risk of biased or manipulated model outputs, posing significant threats to the integrity of AI-generated content. Additionally, the resource-intensive nature of Smart Computers makes them prime targets for Distributed Denial of Service (DDoS) attacks. Organisations must implement proper input validation strategies, selectively restricting characters and words to mitigate potential attacks. API rate controls are essential to prevent overload and potential denial of service, promoting responsible usage by limiting the number of API calls for free memberships.
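The two mitigations named above, input validation and API rate controls, can be sketched in a few lines. This is an illustration under stated assumptions rather than a recommended policy: the allowed character set, the blocked phrase, the 2,000-character cap, and the per-user limits are all hypothetical values chosen for the example.

```python
import re
import time
from typing import Callable, Dict, List

MAX_PROMPT_CHARS = 2000  # hypothetical cap on prompt length

# Hypothetical allowlist: letters, digits, whitespace, common punctuation.
_ALLOWED = re.compile(r"[\w\s.,;:!?'\"()\-@/%+]*")

# Hypothetical blocklist of phrases associated with prompt manipulation.
_BLOCKED_PHRASES = ("ignore previous instructions",)


def validate_prompt(prompt: str) -> bool:
    """Return True only if the prompt passes length, character, and phrase checks."""
    if not prompt or len(prompt) > MAX_PROMPT_CHARS:
        return False
    if _ALLOWED.fullmatch(prompt) is None:
        return False
    lowered = prompt.lower()
    return not any(phrase in lowered for phrase in _BLOCKED_PHRASES)


class SlidingWindowLimiter:
    """Allow at most max_calls per user within the last window_seconds."""

    def __init__(
        self,
        max_calls: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max_calls = max_calls
        self._window = window_seconds
        self._clock = clock  # injectable so the limiter can be tested
        self._calls: Dict[str, List[float]] = {}

    def allow(self, user_id: str) -> bool:
        now = self._clock()
        # Keep only the timestamps that still fall inside the window.
        recent = [t for t in self._calls.get(user_id, []) if now - t < self._window]
        allowed = len(recent) < self._max_calls
        if allowed:
            recent.append(now)
        self._calls[user_id] = recent
        return allowed
```

A free tier might, for example, be given `SlidingWindowLimiter(max_calls=20, window_seconds=3600)` while paid tiers get a larger allowance. Character and phrase filters of this kind reduce obvious abuse but are easy to evade, so they complement rather than replace other controls.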

A Balanced Approach for a Secure Future

To navigate these challenges and anticipate future risks, organisations must adopt a multifaceted approach. Implementing advanced threat detection systems and conducting regular vulnerability assessments of the entire technology stack are essential. Furthermore, active community engagement in industry forums facilitates staying informed about emerging threats and sharing valuable insights with peers, fostering a collaborative approach to security.

All in all, while Smart Computers bring unprecedented opportunities, the careful consideration of risks and the adoption of robust security measures are essential for ensuring a responsible and secure future in the era of these groundbreaking technologies.





Microsoft ‘Cherry-picked’ Examples to Make its AI Seem Functional, Leaked Audio Reveals


According to a report by Business Insider, Microsoft “cherry-picked” examples of its generative AI’s output because the system would frequently "hallucinate" wrong responses. 

The information came from a leaked audio file of an internal presentation on an early version of Microsoft’s Security Copilot, a ChatGPT-like artificial intelligence platform that Microsoft created to assist cybersecurity professionals.

In the audio, a Microsoft researcher discusses the results of "threat hunter" testing, in which the AI examined a Windows security log for any indications of potentially malicious behaviour.

"We had to cherry-pick a little bit to get an example that looked good because it would stray and because it's a stochastic model, it would give us different answers when we asked it the same questions," said Lloyd Greenwald, a Microsoft Security Partner giving the presentation, as quoted by BI.

"It wasn't that easy to get good answers," he added.

Security Copilot

Security Copilot, like any chatbot, lets users enter a query into a chat window and receive a response in the style of a customer service reply. Security Copilot is largely built on OpenAI's GPT-4 large language model (LLM), which also powers Microsoft's other generative AI forays, such as the Bing Search assistant. Greenwald said that these demonstrations were "initial explorations" of the possibilities of GPT-4 and that Microsoft was given early access to the technology.

Similar to Bing AI in its early days, whose responses were so ludicrous that it had to be "lobotomized," Security Copilot often "hallucinated" wrong answers in its early versions, the researchers said, an issue that appeared to be inherent to the technology. "Hallucination is a big problem with LLMs and there's a lot we do at Microsoft to try to eliminate hallucinations and part of that is grounding it with real data," Greenwald said in the audio, "but this is just taking the model without grounding it with any data."

However, GPT-4, the LLM Microsoft used to build Security Copilot, was not trained on cybersecurity-specific data. Rather, it was used directly out of the box, relying only on its massive general-purpose training dataset, which is standard practice.
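Greenwald's remark that a stochastic model gives different answers to the same question can be illustrated with a toy sampler. The sketch below is not GPT-4 or Security Copilot code: it assumes a model that assigns a score to each candidate answer and then samples one of them, and the candidate strings and scores in the usage note are made up.

```python
import math
import random
from typing import List, Sequence


def softmax(scores: Sequence[float], temperature: float) -> List[float]:
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_answer(
    candidates: Sequence[str],
    scores: Sequence[float],
    temperature: float,
    rng: random.Random,
) -> str:
    """Pick one candidate; temperature 0 means always take the top score."""
    if temperature == 0:
        return candidates[list(scores).index(max(scores))]
    probabilities = softmax(scores, temperature)
    return rng.choices(list(candidates), weights=probabilities, k=1)[0]
```

With hypothetical candidates such as `["benign", "suspicious", "malicious"]` and scores `[2.0, 1.5, 1.0]`, repeated calls at temperature 1.0 return a mix of all three labels, while temperature 0 always returns the first. Picking a good-looking run from such a sampler is the kind of selection the presentation describes.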

Cherry on Top

Discussing other queries in regards to security, Greenwald revealed that, "this is just what we demoed to the government."

However, it is unclear whether Microsoft used these “cherry-picked” examples in its demonstrations to the government and other potential customers, or whether its researchers were upfront about how the examples were selected.

A spokeswoman for Microsoft told BI that "the technology discussed at the meeting was exploratory work that predated Security Copilot and was tested on simulations created from public data sets for the model evaluations," stating that "no customer data was used."  

Gemini: Google Launches its Most Powerful AI Software Model


Google has recently launched Gemini, its most powerful generative AI software model to date. And since the model is designed in three different sizes, Gemini may be utilized in a variety of settings, including mobile devices and data centres.

Google has been working on the Gemini large language model (LLM) for the past eight months and only recently gave a small group of companies access to early versions. The LLM is expected to compete head-to-head with other LLMs such as Meta’s Llama 2 and OpenAI’s GPT-4. 

The AI model is designed to operate on various formats, be it text, image or video, making it one of the most significant algorithms in Google’s history.

In a blog post, Google CEO Sundar Pichai wrote, “This new era of models represents one of the biggest science and engineering efforts we’ve undertaken as a company.”

The new LLM, also known as a multimodal model, can accept various types of input, such as audio, video, and images. Traditionally, multimodal models are created by training separate components for different modalities and then piecing them together.

“These models can sometimes be good at performing certain tasks, like describing images, but struggle with more conceptual and complex reasoning,” Pichai said. “We designed Gemini to be natively multimodal, pre-trained from the start on different modalities. Then we fine-tuned it with additional multimodal data to further refine its effectiveness.”

Google also unveiled the Cloud TPU v5p, its most potent ASIC chip, in tandem with the launch. This chip was created expressly to meet the enormous processing demands of artificial intelligence. According to the company, the new processor can train LLMs 2.8 times faster than Google's prior TPU v4.

LLMs are the algorithmic platforms behind generative AI chatbots such as ChatGPT and Bard.

The Cloud TPU v5e, which touted 2.3 times the price performance of the previous generation TPU v4, was made generally available by Google earlier last year. The TPU v5p is significantly faster than the v4, but it costs three and a half times as much.

Google’s new Gemini LLM is now available in some of Google’s core products. For example, Google’s Bard chatbot is using a version of Gemini Pro for advanced reasoning, planning, and understanding. 

Developers and enterprise customers can use the Gemini API in Vertex AI or Google AI Studio, the company's free web-based development tool, to access Gemini Pro as of December 13. Further improvements to Gemini Ultra, including thorough security and trust assessments, led Google to announce that it will be made available to a limited number of users in early 2024, ahead of developers and business clients.  

AI Chatbots' Growing Concern in Bioweapon Strategy

Chatbots powered by artificial intelligence (AI) are becoming more advanced and have rapidly expanding capabilities. This has sparked worries that they might be misused for malicious purposes such as plotting bioweapon attacks.

According to a recent RAND Corporation paper, AI chatbots could offer guidance that helps plan and carry out a biological attack. The paper examined a number of large language models (LLMs), the technology underlying AI chatbots, and found that they were able to produce information about prospective biological agents, delivery strategies, and targets.

The LLMs could also offer guidance on how to minimize detection and enhance the impact of an attack. To distribute a biological pathogen, for instance, one LLM recommended utilizing aerosol devices, as this would be the most efficient method.

The authors of the paper issued a warning that the use of AI chatbots could facilitate the planning and execution of bioweapon attacks by individuals or groups. They also mentioned that the LLMs they examined were still in the early stages of development and that their capabilities would probably advance with time.

Another recent story from the technology news website TechRound cautioned that AI chatbots may be used to make 'designer bioweapons.' According to the story, AI chatbots might be used to identify and alter existing biological agents or to conceive entirely new ones.

The story also described how bioweapons tailored to particular people or groups might be created using AI chatbots. This is because chatbots trained on vast volumes of data, including genetic data, could learn about different people's weaknesses.

The potential for AI chatbots to be used for bioweapon planning is a serious concern. It is important to develop safeguards to prevent this from happening. One way to do this is to develop ethical guidelines for the development and use of AI chatbots. Another way to do this is to develop technical safeguards that can detect and prevent AI chatbots from being used for malicious purposes.

Chatbots powered by artificial intelligence are a potent technology that could be very beneficial. Even so, the possibility that they could be used maliciously must be taken into account. We must create safeguards to stop AI chatbots from being used to organize and carry out bioweapon attacks.

ChatGPT Privacy Concerns are Addressed by PrivateGPT

 


Specificity and clarity are the two key ingredients in creating a successful ChatGPT prompt. Your prompt needs to be specific and clear to ensure the most effective response from the other party. For creating effective and memorable prompts, here are some tips: 

An effective prompt conveys your message in a complete sentence that identifies what you want. To avoid vague and ambiguous responses, avoid fragments and incomplete sentences. 

The more specific your description of what you are looking for, the better your chances of getting a response that matches it. Avoid words like "something" or "anything" in your prompts as much as possible. 

Frame your request so that ChatGPT understands its nature and can act as an expert in the field in which you seek advice. ChatGPT will then understand your request much better and provide helpful and relevant responses.

The ChatGPT model, released by OpenAI, appears to be a game-changer for the AI chatbot industry and for business in general.

In the chat process, PrivateGPT sits in the middle and removes all personally identifiable information (PII) from user prompts. This includes health information and credit card data, as well as contact information, dates of birth, and Social Security numbers. The redacted prompt is then delivered to ChatGPT. To make the experience as seamless as possible for users, PrivateGPT works with ChatGPT to re-populate the PII within the answer, according to a statement released this week by Private AI, the creator of PrivateGPT.

It is worth remembering, however, that ChatGPT is the first of a new era of chatbots. It has answered questions, generated software code, and fixed programming problems on request, demonstrating the power of artificial intelligence technology.

Its use cases and benefits will be numerous. It does, however, bring many challenges and risks related to privacy and data security, particularly with regard to the GDPR in the EU. 

Private AI, a data privacy company, announced PrivateGPT as a "privacy layer" for large language models (LLMs) like OpenAI's ChatGPT. The tool automatically redacts sensitive information and personally identifiable information (PII) that users give out while communicating with AI. 

Using its proprietary AI system, Private AI can redact more than 50 types of PII from user prompts before submitting them to ChatGPT. The PII is replaced with placeholder data, allowing users to query the LLM without revealing sensitive personal information to it.
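The redact-then-repopulate flow described above can be sketched with simple pattern matching. This is not Private AI's implementation, which the article says relies on a proprietary AI system covering more than 50 PII types. The three regular expressions, the placeholder format, and the function names below are assumptions made purely for illustration.

```python
import re
from typing import Dict, Tuple

# Hypothetical patterns for three PII types. Real detection is far broader
# and typically model-based rather than regex-based.
_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CARD": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}


def redact(prompt: str) -> Tuple[str, Dict[str, str]]:
    """Replace detected PII with numbered placeholders.

    Returns the redacted prompt and a mapping from placeholder to the
    original value, which stays on the user's side of the boundary.
    """
    mapping: Dict[str, str] = {}
    redacted = prompt
    for label, pattern in _PATTERNS.items():

        def _swap(match: "re.Match[str]", label: str = label) -> str:
            placeholder = f"[{label}_{len(mapping) + 1}]"
            mapping[placeholder] = match.group(0)
            return placeholder

        redacted = pattern.sub(_swap, redacted)
    return redacted, mapping


def repopulate(answer: str, mapping: Dict[str, str]) -> str:
    """Put the original values back into the model's answer."""
    for placeholder, original in mapping.items():
        answer = answer.replace(placeholder, original)
    return answer
```

In use, only the output of `redact` would be sent to the chatbot, and `repopulate` would be applied to the reply before it is shown to the user, so the model sees placeholders such as `[EMAIL_1]` instead of the real address.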