AI shadow mode lets an intelligent system keep learning about an individual user and the surrounding environment while reducing the likelihood that it produces outputs or takes actions that could harm the user or the surrounding community.
Unlike earlier machine learning workflows, where an algorithm was built first and then trained through direct user interaction with a digital interface, today's intelligent systems expose an interface to users before they begin learning from them. These systems can then make smarter predictions and recommendations based on prior user interactions, without requiring the user to take part in the decision-making process.
In a shadow test, engineers can directly compare the output of a candidate intelligent system against the output of the production system, without exposing either output to users. This shows how precisely, safely, and quickly the intelligent system will perform once users are given direct control of it. Building shadow systems also supports better risk-reduction frameworks, helping engineers and researchers investigate how an intelligent system learns and behaves.
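The comparison above can be sketched in a few lines. This is a minimal, illustrative sketch, not any particular vendor's implementation: every name (the two model functions, the request handler, the log) is hypothetical. The user sees only the production output; disagreements are recorded for offline review.

```python
# Minimal shadow-mode routing sketch: the production model serves the user
# while a candidate model silently sees the same input; only discrepancies
# are logged for offline review. All names here are illustrative.
import json

def production_model(features):
    return "approve" if features["score"] >= 0.5 else "deny"

def shadow_model(features):
    return "approve" if features["score"] >= 0.6 else "deny"

def handle_request(features, log):
    served = production_model(features)   # the user sees only this
    candidate = shadow_model(features)    # evaluated silently
    if candidate != served:
        log.append(json.dumps({"input": features,
                               "served": served,
                               "candidate": candidate}))
    return served

log = []
handle_request({"score": 0.55}, log)  # models disagree -> one log entry
handle_request({"score": 0.90}, log)  # models agree    -> nothing logged
```

Because the shadow path never touches the response, a bug or regression in the candidate model cannot affect the user; it only shows up in the discrepancy log.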
Although these intelligent systems are still evolving alongside ongoing research, their ability to learn from their users will make the overall experience safer and easier.
In many product sectors, shadow mode has become a widely recognized, standard method for testing and validating new algorithms before promoting them to production.
Managed offerings such as Amazon SageMaker support "shadow testing," letting clients evaluate how a new version of a model will behave on current traffic. Teams can gather metrics, such as error rates, ahead of production. Shadow testing has become part of standard MLOps best practice for deciding whether a model is ready for production.
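As a hedged sketch of the kind of pre-production comparison such a managed shadow test enables (this is not the SageMaker API, and the data and promotion rule are illustrative), one can compute the error rate of the shadow variant and the live variant on the same traffic once ground-truth labels arrive:

```python
# Compare the shadow variant's error rate against the live variant's on the
# same traffic. Labels, predictions, and the promotion rule are illustrative.
def error_rate(predictions, labels):
    wrong = sum(1 for p, y in zip(predictions, labels) if p != y)
    return wrong / len(labels)

labels     = [1, 0, 1, 1, 0, 1, 0, 0]   # ground truth, arriving later
production = [1, 0, 1, 0, 0, 1, 1, 0]   # what the live model predicted
shadow     = [1, 0, 1, 1, 0, 1, 1, 0]   # what the shadow model predicted

prod_err = error_rate(production, labels)   # 2/8 = 0.25
shad_err = error_rate(shadow, labels)       # 1/8 = 0.125

# Promote only if the shadow variant is at least as accurate as production.
promote = shad_err <= prod_err
```

In practice the comparison would cover more than accuracy (latency, cost, fairness metrics), but the shape is the same: both variants score the identical traffic, and only one of them answered the user.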
Mobile operating systems (OS) and applications (apps) use local, low-impact shadow testing. On phones, for example, the models behind keyboard suggestions or battery-life recommendations gather information about user activity, refine themselves, and then present suggestions to the user. Because these models run locally, on-device shadow testing helps protect user privacy. Related research is investigating the security of on-device inference and the use of Trusted Execution Environments (TEEs) to protect user data.
Tesla's Autopilot/FSD (Full Self-Driving) system is a prominent example of automotive manufacturers applying shadow testing to self-driving cars. The vehicle records what the autonomy stack would do in various driving situations and logs discrepancies, or snapshots, for later use in improving the vehicle's autonomy. Shadow testing lets automotive companies identify unusual driving situations without placing consumers in danger.
Overall, these examples point to a broader trend: using shadow testing to build confidence that new algorithms are validated before they are given real-world authority.
SHADOW DEPLOYMENTS ARE POWERFUL, BUT NOT FREE.
Running duplicate inference workloads (e.g., by shadowing) means paying for the extra compute, and a shadow deployment can require two to three times the resources of serving production traffic alone.
Detailed snapshots consume significant space in the logging infrastructure and demand more sophisticated tooling to index and triage rare failure modes.
Shadow pipelines introduce additional complexity in routing, synchronization, metric alignment, and drift detection, and must be engineered robustly.
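To make one of these concerns concrete, here is a minimal drift-detection sketch: flag when the live input stream drifts away from the reference window the shadow test was calibrated on. The statistic (a standardized mean shift), the data, and the threshold are all illustrative assumptions, not a recommended production check.

```python
# Minimal drift check: standardized shift of the live window's mean relative
# to a reference window. Data and threshold are illustrative only.
from statistics import mean, stdev

def z_shift(reference, live):
    return abs(mean(live) - mean(reference)) / stdev(reference)

reference = [0.48, 0.52, 0.50, 0.49, 0.51, 0.50]  # calibration window
live_ok   = [0.47, 0.53, 0.50, 0.48, 0.52, 0.51]  # similar distribution
live_bad  = [0.80, 0.82, 0.79, 0.81, 0.83, 0.78]  # clearly shifted

DRIFT_THRESHOLD = 3.0  # illustrative

ok_drifted  = z_shift(reference, live_ok)  > DRIFT_THRESHOLD   # no alarm
bad_drifted = z_shift(reference, live_bad) > DRIFT_THRESHOLD   # alarm
```

A real pipeline would apply such checks per feature and per metric, with windows and thresholds tuned to the traffic; the point is only that drift monitoring is one more component a shadow deployment has to operate reliably.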
Shadow mode can also mislead when logs fail to capture downstream actuation complexity; a logged decision is not the same as an executed one. Proper test design is essential.
With these tools and approaches, shadow mode will allow artificial intelligence to earn the trust and the right to assist.