Video: A typical augmented reality scene from “Iron Man 2”.

Augmented reality has long fired people's imagination about human-computer interaction, and everyone expected something like the interfaces in Iron Man. For application developers, however, boundless imagination is a disaster: application form and human-computer interaction need a unified paradigm, and defining that paradigm is the job of the OS. Just as Apple used iOS to define the application form and interaction model of the mobile phone, and thereby set the direction for smartphones, visionOS provides a remarkably complete and well-crafted definition of application form and human-computer interaction for spatial computing devices.

Application forms

visionOS defines three application forms that both ease the migration of habits users formed on phones and tablets and fully exploit the interaction advantages of spatial computing:

  • Windows
  • Volumes
  • Spaces

Here are the introductions to these three forms:

Windows: A Window can be understood simply as a two-dimensional iOS or iPadOS application fixed in space. The Apple Music application in the figure below, for example, looks much like Apple Music on the iPad. Because of this, you can compile existing iOS/iPadOS projects directly into visionOS applications in Xcode, which gives the early visionOS application ecosystem a big head start.

It is worth adding that an application in Window form is merely fixed in space. Apart from letting you manually adjust its position and size, it does not actively interact with the space; the space is to it what the desktop wallpaper is to a PC application. Any realism it shows in the space, such as the shadow it casts on a table, is rendered automatically by the system.

Image: Apple Music in visionOS as Windows
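In code, a Window is just an ordinary SwiftUI scene. The sketch below shows a minimal visionOS app in Window form; `ContentView` and the size are illustrative placeholders, and the same SwiftUI code would largely compile for iPadOS as well:

```swift
import SwiftUI

@main
struct MusicApp: App {
    var body: some Scene {
        // A Window-form scene: a 2D plane fixed in space,
        // rendered much like the same SwiftUI view on iPadOS.
        WindowGroup {
            ContentView()
        }
        .defaultSize(width: 800, height: 600) // size in points, 2D
    }
}

struct ContentView: View {
    var body: some View {
        Text("Hello, visionOS")
    }
}
```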

Volumes: Like a Window, a Volume is also fixed in space, but its form changes from a two-dimensional plane to a three-dimensional box. Inside this box you can place 3D models for users to view and interact with. Two scenarios help illustrate Volumes:

  • Education: Load 3D models to present teaching content more vividly. Users can interact with the objects they are studying through standard gestures such as tapping, dragging, and zooming.
  • Games: For example, I can fix a chessboard on my desk and move the pieces through hand-eye interaction. Note, however, that this kind of game does not interact with the space itself: a game in which I throw a ball around the room and it bounces off the walls and floor cannot be implemented in a Volume.
Image: A scene combining Volumes and Windows is shown in the Xcode simulator.
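A Volume is declared the same way as a Window, just with a volumetric window style and a physical size. A minimal sketch, assuming a USDZ asset named "Chessboard" (hypothetical) is bundled with the app:

```swift
import SwiftUI
import RealityKit

@main
struct ChessApp: App {
    var body: some Scene {
        // A Volume: a 3D box the user can place anywhere in the room.
        WindowGroup {
            // "Chessboard" is a hypothetical USDZ asset in the app bundle.
            Model3D(named: "Chessboard")
        }
        .windowStyle(.volumetric)
        // Volumes are sized in real-world units, not points.
        .defaultSize(width: 0.6, height: 0.3, depth: 0.6, in: .meters)
    }
}
```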

Spaces: Unlike the previous two forms, a Space is not a two-dimensional plane or a three-dimensional box fixed in place. It treats the entire space around you as the application's stage, and the application can understand and interact with that space to deliver a more advanced augmented reality experience. In this form, you gain additional customization capabilities through ARKit. For example, you can build custom hand interactions by tracking the skeleton of your hands, or recognize walls, floors, tables, and sofas in the space and give them matching physical behavior, such as letting a thrown ball bounce off them. You can also use anchors to pin content in space and so "decorate" your house.

Image: Realistic hand pushing down a stack of blocks on a tabletop achieved by tracking hand bones.
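A Space is declared with an `ImmersiveSpace` scene, and content can be anchored to real-world surfaces. The sketch below pins a model to a detected table; the "Lamp" asset name is a hypothetical placeholder:

```swift
import SwiftUI
import RealityKit

@main
struct DecorApp: App {
    var body: some Scene {
        // A Space: the whole room becomes the application's stage.
        ImmersiveSpace(id: "Room") {
            RealityView { content in
                // Anchor content to a real table detected in the room --
                // the "decorate your house" use case.
                let tableAnchor = AnchorEntity(
                    .plane(.horizontal,
                           classification: .table,
                           minimumBounds: [0.3, 0.3]))
                // "Lamp" is a hypothetical USDZ asset in the app bundle.
                if let lamp = try? await Entity(named: "Lamp") {
                    tableAnchor.addChild(lamp)
                }
                content.add(tableAnchor)
            }
        }
    }
}
```

The space is then presented at runtime via the `openImmersiveSpace` environment action, using the matching `id`.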

Application Space

After you put on the Vision Pro, what you see in front of you is the Space. Multiple applications can appear in this Space at the same time, or a single application can have it all to itself:

  • Shared Space: Multiple applications share one space.
  • Full Space: One application occupies the entire space.

In Full Space mode, you can use Passthrough to blend the application with reality (an AR experience), or use Fully Immersive to enter a fully enclosed virtual environment (a VR experience). A virtual cinema, for example, would be built with Fully Immersive.
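In SwiftUI these two modes map to immersion styles on an `ImmersiveSpace`: `.mixed` keeps Passthrough, while `.full` hides the room entirely. A sketch, with `CinemaView` as a hypothetical content view:

```swift
import SwiftUI

@main
struct CinemaApp: App {
    // Start fully immersive -- the VR-style virtual cinema.
    @State private var style: ImmersionStyle = .full

    var body: some Scene {
        ImmersiveSpace(id: "Cinema") {
            CinemaView() // hypothetical view rendering the cinema scene
        }
        // .mixed keeps Passthrough (AR); .full is Fully Immersive (VR).
        .immersionStyle(selection: $style, in: .mixed, .full)
    }
}

struct CinemaView: View {
    var body: some View {
        Text("Now showing")
    }
}
```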

User interactions

Excellent interaction must balance the naturalness of interaction with the efficiency of input and output.

For basic daily interactions, a set of system-defined gestures, such as tapping, long-pressing, dragging, and zooming, keeps the interaction natural.
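These system gestures can be attached directly to RealityKit entities with SwiftUI's gesture modifiers. A minimal sketch, rotating a box on each tap (the entity needs input-target and collision components to receive gestures):

```swift
import SwiftUI
import RealityKit

struct TapToSpinView: View {
    var body: some View {
        RealityView { content in
            let box = ModelEntity(mesh: .generateBox(size: 0.2))
            // Required so the system's look-and-pinch tap can hit the entity.
            box.components.set(InputTargetComponent())
            box.generateCollisionShapes(recursive: false)
            content.add(box)
        }
        // The standard tap gesture, targeted at whichever entity
        // the user looked at and pinched.
        .gesture(
            TapGesture()
                .targetedToAnyEntity()
                .onEnded { value in
                    value.entity.transform.rotation *= simd_quatf(
                        angle: .pi / 2, axis: [0, 1, 0])
                }
        )
    }
}
```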

If you build an Entity model on RealityKit, you get additional interaction support on top of it. For example, an event can be triggered when you approach an Entity. A typical use case is a digital human who turns to face you and starts a conversation when you walk up to them.
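One way to express this proximity behavior is a custom RealityKit System that measures the distance between the wearer and marked entities each frame. This is only a sketch of the idea, not a built-in API: `GreeterComponent` is a hypothetical marker component, and the head position is assumed to come from an entity named "head" (for example, one attached to an `AnchorEntity(.head)` placed in the scene):

```swift
import RealityKit

// Hypothetical marker component for entities that should react to the user.
struct GreeterComponent: Component {
    var triggerDistance: Float = 1.0 // meters
}

struct ProximitySystem: System {
    static let query = EntityQuery(where: .has(GreeterComponent.self))

    init(scene: Scene) {}

    func update(context: SceneUpdateContext) {
        // Assumed: an entity named "head" tracks the wearer's head pose.
        guard let head = context.scene.findEntity(named: "head") else { return }
        let headPos = head.position(relativeTo: nil)

        for entity in context.entities(matching: Self.query,
                                       updatingSystemWhen: .rendering) {
            guard let greeter = entity.components[GreeterComponent.self]
            else { continue }
            let pos = entity.position(relativeTo: nil)
            if simd_distance(headPos, pos) < greeter.triggerDistance {
                // Turn toward the user; a real app would also start
                // the conversation here.
                entity.look(at: headPos, from: pos, relativeTo: nil)
            }
        }
    }
}
```

Both the component and the system must be registered once at launch, via `GreeterComponent.registerComponent()` and `ProximitySystem.registerSystem()`.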

Both of these interaction styles come as predefined templates in the system, but you can also design custom interactions, such as waving a hand to push over blocks, which relies on hand-skeleton tracking.
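The raw data behind such custom gestures comes from ARKit's hand tracking, available only in a Full Space and with the user's permission. A sketch of reading the skeleton, here the index fingertip:

```swift
import ARKit

// Stream hand-skeleton updates; each joint (wrist, knuckles,
// fingertips...) carries its own transform.
func trackHands() async throws {
    let session = ARKitSession()
    let handTracking = HandTrackingProvider()
    try await session.run([handTracking])

    for await update in handTracking.anchorUpdates {
        let hand = update.anchor
        if let indexTip = hand.handSkeleton?.joint(.indexFingerTip) {
            // World-space transform of the fingertip: anchor pose
            // composed with the joint's pose within the hand.
            let transform = hand.originFromAnchorTransform
                          * indexTip.anchorFromJointTransform
            // Feed `transform` into your own logic, e.g. physics
            // that lets a wave push over a stack of blocks.
            _ = transform
        }
    }
}
```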

The final interaction method relies on input and output through peripherals such as keyboards and game controllers. A physical keyboard improves input efficiency over the virtual one, which is a great help for Vision Pro as a productivity tool, while controllers open up more scenarios, games included.
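Game controllers are surfaced through the GameController framework, the same as on iOS. A minimal sketch of reacting to a connected gamepad's A button:

```swift
import GameController

// Listen for controllers connecting and wire up a button handler.
func observeControllers() {
    NotificationCenter.default.addObserver(
        forName: .GCControllerDidConnect,
        object: nil,
        queue: .main
    ) { notification in
        guard let controller = notification.object as? GCController,
              let gamepad = controller.extendedGamepad else { return }

        // Fires on every press/release of the A button.
        gamepad.buttonA.pressedChangedHandler = { _, _, pressed in
            if pressed {
                // Trigger the game action, e.g. fire or jump.
                print("A pressed")
            }
        }
    }
}
```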

In summary

As noted at the beginning of this article, visionOS has defined a very complete and well-crafted set of application forms and interaction models, which is a boon for XR application developers. For the XR application ecosystem to truly thrive, though, one more step is needed: competitors must follow suit and offer developers and users cross-platform product definitions and interaction experiences comparable to visionOS. That may still take one to two years.

Note: All images in this article are from Apple Developer.

By Cosmo
