Object Tracking For visionOS Is Here - But How Does It Work?

Apple visionOS Object Tracking In-Depth Video...

Today, I’d like to share my work from the last two weeks, which has mainly focused on testing Object Tracking and Object Reconstruction, two of the major features released during WWDC 2024. I will share the good, the bad, and the frustrating aspects. While there is plenty of great stuff to say about it, there were also a few things I didn’t enjoy as much. However, as always, I try to stay positive and view this as just the beginning; Apple often starts this way and improves drastically over time.

First, let’s define a few things before we continue so we’re all on the same page.

What Is Object Tracking? Why Is Object Tracking Such A Needed Spatial Feature?

Object Tracking With Visualizations

Object Tracking is a pretty self-explanatory term, but it is important to understand how Apple defines it to get a clearer picture of what we’re dealing with. Apple describes this ARKit feature as “the use of 3D reference objects to find and track real-world objects in a person’s environment,” which should make more sense now. But you may ask, "Dilmer, what are these 3D reference objects? Does Apple provide them, or do we need to model them before implementing this feature?" The short answer is no, Apple doesn’t provide these 3D reference objects; as for how to create them, I will go over the workflow below to answer both questions.

Next, why is Object Tracking such a needed spatial feature? Object Tracking has been a huge request from many of my subscribers and the clients I’ve worked with over the years. They may not refer to it specifically as Object Tracking but instead as Computer Vision. To give you more context, I get asked at least 2-3 questions a month about camera access on Meta Quest 2 and 3, Meta Quest Pro, Magic Leap 2, and most recently, Apple Vision Pro. You may ask, "Why does that matter? Why are people asking about camera access?" Because they need Object Tracking or a similar solution. In most cases, when people ask about camera access, they want a better understanding of what users are seeing through the lenses, and Object Tracking makes a huge leap toward that solution.

Object Tracking Requirements:

  • macOS Sequoia (Beta 15.2 or greater)

  • macOS Sequoia is available on Mac computers with Apple Silicon and Intel-based Mac computers with a T2 Security Chip

  • Xcode Version 16.0 beta 2 or greater with visionOS support

    • This includes “Create ML” & “Reality Composer Pro”

  • Apple Vision Pro with visionOS 2.0 or greater

  • Additional requirements are provided below as we go through the object tracking workflow.

How To Prepare Physical Objects For visionOS Object Tracking?

First, we need to find a way to convert a physical object into a virtual object, meaning we need a 3D model generated from the object we want to scan. There are many tools available today to accomplish this, and I could write numerous articles just on that topic. However, to stay on topic, I want to focus on the tools that Apple freely provides to developers, as well as some I discovered by using the latest beta tools.

Fig 1.0 - macOS Object Reconstruction App

Fig 1.1 - New Object Capture Model

  • Apple’s Guided Capture iOS App: this project is a fully-fledged app that guides you through the process of scanning a physical object. Honestly, Apple did a beautiful job here. You simply start by placing and resizing a bounding box around your target object, then walk around the object while the app automatically captures images. Each full pass around the object captures a set number of images, and you can also capture more manually using a button provided in the app. When you’re done, it saves the generated USDZ model to your iPhone’s file storage, which you can browse with the Files app.

  • Apple’s Object Reconstruction macOS App: this is a standalone app (see fig 1.0) that allows you to specify a directory of images for object reconstruction. This app also provides options to change the mesh type (triangular mesh vs. quad mesh), quality (preview, reduced, medium, full, raw, and custom), masking (isolate object from environment vs. include environment around object), and cropping options with custom bounding box bounds.

  • Apple’s Photogrammetry Command-Line App: as its name suggests, this is a command-line coding example project that includes all the features listed previously for the other apps; instead of a UI, you interact with it through the command line. It is also important to note that Apple released a Photogrammetry API, which allows you to implement these features in your own custom apps (I include a short sketch of this API right after this list).

  • Apple’s Reality Composer Pro > Create A New Object Capture Model Option: I noticed this new option within Reality Composer Pro when using Xcode 16. It allows you to reconstruct a 3D model from a list of images, similar to the previous apps but embedded into Reality Composer Pro. This is the option I chose to use for the video, and strangely, I found it gave me much better results than the Guided Capture iOS app or the command-line tool. However, Guided Capture was great for generating all the images, so I used the images it produced along with Reality Composer Pro for object reconstruction.

With any of these options, you end up with a generated .usdz file, which we will use to train our machine learning model with Create ML’s Spatial Object Tracking capabilities.
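If you’d rather script the reconstruction step yourself, the Photogrammetry API mentioned above is exposed through RealityKit’s PhotogrammetrySession. Below is a minimal macOS command-line sketch of how that might look; the input and output paths and the configuration values are placeholders you would replace with your own:

```swift
// main.swift of a macOS command-line tool (minimal sketch; paths are placeholders).
import Foundation
import RealityKit

let inputFolder = URL(fileURLWithPath: "Images/", isDirectory: true)  // folder of captured photos
let outputFile  = URL(fileURLWithPath: "Model.usdz")                  // where to write the result

var configuration = PhotogrammetrySession.Configuration()
configuration.isObjectMaskingEnabled = true   // isolate the object from its environment
configuration.sampleOrdering = .unordered     // photos were not captured in a strict sequence

let session = try PhotogrammetrySession(input: inputFolder, configuration: configuration)

// Request a full-detail model; other detail levels include .preview, .reduced, .medium, and .raw.
try session.process(requests: [.modelFile(url: outputFile, detail: .full)])

// Watch the output stream for progress, completion, and errors.
for try await output in session.outputs {
    switch output {
    case .requestProgress(_, let fractionComplete):
        print("Progress: \(Int(fractionComplete * 100))%")
    case .requestComplete(_, .modelFile(let url)):
        print("Saved model to \(url.path)")
    case .requestError(_, let error):
        print("Reconstruction failed: \(error)")
    default:
        break
    }
}
```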

How To Train A Model With visionOS Object Tracking?

Xcode 16 or later provides a tool called Create ML, which allows you to use your Mac to train machine learning models. Currently, this app offers a variety of predefined models you can use for training, but to keep this post focused, we will specifically look at the Spatial Object Tracking option.

To train a model, follow these steps:

  • Open Xcode and choose Xcode > Open Developer Tool > Create ML.

  • Once Create ML opens, click on File > New Project.

  • Choose the Object Tracking template under Spatial.

  • Give your project a name and click Create to generate a new project.

  • Add a new Model Source and drag and drop your .usdz file into the Settings view.

  • Choose the viewing angle based on your use case for object tracking. If you need to track your model from all angles, select “All Angles.” If your object always hangs on a wall, select “Front.” If it always sits on top of a table, choose “Upright.”

  • Click on Train to start the training.

    • As of today, training models for object tracking can take anywhere from 8 to 16 hours or more. In my case, training a PS5 controller took over 16 hours, while training an action figure toy took about 12 hours. So, be very patient.

Once training is complete, click on the Output option and then click Get to download your trained model, which will have a .referenceobject extension.

How To Integrate Object Tracking 3D Reference Objects Into Xcode?

There are a few ways to accomplish this. You could use Reality Composer Pro by first creating a new visionOS project in Xcode, then opening one of the default Reality Content scenes and adding an anchoring component to a transform. Alternatively, you could add the reference object at runtime, which is what Apple did with their visionOS coding example and what I also ended up doing. You can access the full source code of what I built to test these features on GitHub. Big credit to Apple for providing so many resources, which I will list at the end of this post.

I recommend starting from the code I provided. If you have a PS5 controller, you can build the project for your visionOS device as-is and begin testing its features. If you trained your own .referenceobject files, remove mine from the project, drag and drop yours into the Reference Objects group within Xcode, then rebuild and redeploy the project.
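To give you an idea of what the loading step looks like in code, here is a simplified sketch of how you might load every bundled .referenceobject file with ARKit’s ReferenceObject type. Apple’s sample wraps similar logic in ReferenceObjectLoader.swift; this is just an abbreviated illustration:

```swift
import Foundation
import ARKit

/// Loads every .referenceobject file bundled with the app.
/// Simplified illustration of what ReferenceObjectLoader.swift does in Apple's sample.
func loadBundledReferenceObjects() async -> [ReferenceObject] {
    var referenceObjects: [ReferenceObject] = []

    // Every .referenceobject resource dragged into the Xcode project ends up in the app bundle.
    let urls = Bundle.main.urls(forResourcesWithExtension: "referenceobject", subdirectory: nil) ?? []

    for url in urls {
        do {
            // ReferenceObject(from:) loads the trained Create ML output from disk.
            referenceObjects.append(try await ReferenceObject(from: url))
        } catch {
            print("Failed to load \(url.lastPathComponent): \(error)")
        }
    }
    return referenceObjects
}
```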

Also, be sure to review the source code in the following order, as it will help you understand the application flow with respect to object tracking, including getting world sensing permissions, checking whether object tracking is supported, and handling anchor events (a condensed sketch of that flow follows the file list):

  • ReferenceObjectLoader.swift

  • AppState.swift

  • VisionOSObjectTrackingDemoApp.swift

  • HomeContentView.swift

  • ObjectTrackingRealityView.swift

  • ObjectAnchorVisualization.swift
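To tie those files together, here is a condensed sketch of the overall flow they implement: checking support, requesting world-sensing permission, running an ObjectTrackingProvider, and reacting to anchor updates. It is a simplified illustration rather than Apple’s exact code, and the visualization entity is just a placeholder:

```swift
import ARKit
import RealityKit

/// Condensed sketch of the object tracking flow on visionOS 2.
/// `root` is assumed to be an entity already added to a RealityView's content.
@MainActor
func runObjectTracking(with referenceObjects: [ReferenceObject], root: Entity) async {
    let session = ARKitSession()

    // Object tracking is not available in the visionOS simulator.
    guard ObjectTrackingProvider.isSupported else {
        print("Object tracking is not supported on this device.")
        return
    }

    // Ask for world-sensing authorization before starting the provider.
    let authorization = await session.requestAuthorization(for: [.worldSensing])
    guard authorization[.worldSensing] == .allowed else {
        print("World sensing was not authorized.")
        return
    }

    let objectTracking = ObjectTrackingProvider(referenceObjects: referenceObjects)
    do {
        try await session.run([objectTracking])
    } catch {
        print("Failed to start the ARKit session: \(error)")
        return
    }

    var visualizations: [UUID: Entity] = [:]

    // React to anchors being added, moved, and removed.
    for await update in objectTracking.anchorUpdates {
        let anchor = update.anchor
        switch update.event {
        case .added:
            let entity = Entity()   // placeholder for a real visualization entity
            entity.transform = Transform(matrix: anchor.originFromAnchorTransform)
            root.addChild(entity)
            visualizations[anchor.id] = entity
        case .updated:
            visualizations[anchor.id]?.transform = Transform(matrix: anchor.originFromAnchorTransform)
        case .removed:
            visualizations[anchor.id]?.removeFromParent()
            visualizations[anchor.id] = nil
        }
    }
}
```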

Object Tracking Training Time, Performance, Latency, and Accuracy:

This is all fairly subjective. I didn’t measure anything with specialized tooling beyond using Create ML itself to see how long the models took to train, so treat the notes on performance, latency, and accuracy as my impressions; you can clearly see the results by watching my latest YouTube video on this topic.

Apple Vision Pro Object Tracking Demo

  • Object Tracking Training Times: it took a long time! I was very surprised by how long it took. My machine is an M1 Pro with 16GB of RAM (2021 model); it’s not the fastest or newest, but I expected it to train these models much faster. Perhaps this is just the first version, and we’ll see significant improvements soon. So, how long did it take to train each model? It took 16 hours and 52 minutes for the PS5 controller and 11 hours and 42 minutes for the toy, which surprised me. However, the more I thought about it, the less I saw it as an issue. It really depends on the use case: for instance, if you are the manufacturer of a popular coffee machine and have the .usdz file and reference object, you could offer them to developers as part of your SDK, allowing them to integrate those models into their apps without the hassle of training. Similarly, sharing a widely used model like the PS5 controller could benefit others.

  • Performance & Latency: with mobile phones, you can usually tell a device is peaking in performance when it gets too hot or apps become unresponsive. With the Apple Vision Pro, I didn’t notice any major performance issues, but the PS5 controller occasionally had trouble being identified in certain scenarios, or there were delays in updating the final position and rotation of the anchor. This might be due to the PS5 controller’s lack of vibrant colors or distinctive features compared to the toy I scanned. Tracking the toy felt better, though not perfect; in some situations, there was latency when the object was detected but the final target position wasn’t updated promptly.

  • Accuracy: I have to say this is incredible. The reference object model aligned perfectly with the detected physical object, and even with clutter around the object, the tracking mechanism was able to find the physical objects without problems. I did notice a few issues when moving more than about 6 feet away from the physical objects. This makes sense, as the objects get smaller and the tracking quality decreases with distance.

Final Thoughts About Object Tracking

Apple Vision Pro Fast Movement With Object Tracking

Overall, I feel this is a significant advancement in object detection with Vision Pro. Yes, it is very slow and inconvenient to wait so many hours for training, but aside from that, the workflow worked quite well. I didn’t encounter any issues with Xcode integration or with models being detected when running the experience on my AVP device.

I do wish Apple would improve Reality Composer Pro. It would be helpful if developers could add text, bounding volumes, and 3D models to anchor visualizations at design time rather than having to code all of that. Why, Apple? Here’s a link to the code I am referring to, which I believe should be part of the Reality Composer Pro designer instead.
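For context, here is a simplified illustration of the kind of per-anchor visualization code I mean: building a bounding box and a text label entirely in RealityKit code. The function name, sizes, and colors are arbitrary placeholders of mine, not from Apple’s sample:

```swift
import UIKit
import RealityKit

/// Builds a translucent bounding box plus a floating text label for a tracked object.
/// Illustrative only; extent would typically come from the detected ObjectAnchor.
func makeAnchorVisualization(named name: String, extent: SIMD3<Float>) -> Entity {
    let root = Entity()

    // Translucent box roughly matching the tracked object's extent.
    let boxMesh = MeshResource.generateBox(size: extent)
    let boxMaterial = UnlitMaterial(color: UIColor.cyan.withAlphaComponent(0.2))
    root.addChild(ModelEntity(mesh: boxMesh, materials: [boxMaterial]))

    // Text label floating just above the box.
    let textMesh = MeshResource.generateText(name,
                                             extrusionDepth: 0.001,
                                             font: .systemFont(ofSize: 0.04))
    let label = ModelEntity(mesh: textMesh, materials: [UnlitMaterial(color: .white)])
    label.position = [0, extent.y / 2 + 0.02, 0]
    root.addChild(label)

    return root
}
```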

That’s all for today, everyone. If you have any questions, please let me know below. Thanks!

Additional Resources:

  • Apple Object Capture Landing Page
