Pixvana was a VR video tech startup (2016-2019) that built SPIN Studio, a cloud-based virtual reality video processing, streaming, and editing software suite. The company was based in Seattle, WA and had traction with large media companies that used its platform to build consumer-facing media streaming apps. As the 2015-2018 VR market cycle crashed (Microsoft and Google canceled their consumer headset plans, Meta/Oculus adoption faltered) and consumer VR failed to break through to meaningful usage, Pixvana built enterprise training tools. Ultimately the VR market proved "too early", and development of Pixvana was shuttered in late 2019.
Pixvana SPIN Studio had comprehensive features to process raw VR camera files and prepare them for very high-quality streaming to headsets at 8K+ resolutions. The app was capable of massive parallel rendering on cloud GPU instances, so a task that might require 10 hours to render on a single workstation-class PC could be distributed to 100+ nodes and rendered in just minutes.
Some of the core features are shown below, for posterity.
SPIN Play was the headset playback app available in the many VR app stores (Windows, Oculus, Google, iOS, etc.) that could be programmed/skinned with playlists of videos and interactive programs developed using SPIN Studio. The app could be synced over-the-wire and then run in offline mode, which allowed for very efficient management of fleets of headsets. If you had 50 headsets that you wanted to prepare for an event or trade show, for example, you could prepare content and deploy/update it across the fleet using SPIN Studio and SPIN Play.

Pixvana SPIN Studio included both 180-degree and 360-degree camera "stitching", wherein multiple video files from camera rigs could be uploaded and "solved" into formats ready for streaming to VR headsets.

Parallel processing in the cloud was achieved by "sharding" jobs across multiple rendering nodes. Here dozens of clips are being rendered on hundreds of individual GPU and CPU nodes in the AWS cloud; rendering the same set of clips on a high-end workstation would take 100x the time. This sort of "cloud-first" approach to manipulating large media files was novel for its time, and remains a yet-to-come technology for video processing in 2023.

Getting VR video onto headsets was a complex mess, and many startups built video players with varying approaches to "theater mode"–a way to organize, deliver, and control playback on VR headsets for controlled groups of viewers (such as for training curricula). Pixvana SPIN Studio had many features to target individual headsets with specific content and playlists, to gather analytics on how that content was viewed, and to allow a proctor/guide to set up group viewing–a requirement for enterprise applications such as training in VR.
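As a back-of-the-envelope illustration of the sharding idea, here is a minimal sketch in Python. All names, clip lengths, and the shard size are hypothetical, and the thread pool merely stands in for a fleet of cloud rendering nodes; this is not Pixvana's actual scheme.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical clip list: (name, total_frames)
clips = [("manolin_wide", 14400), ("manolin_close", 9600)]

SHARD_SIZE = 600  # frames per shard; one shard per rendering node

def make_shards(name, total_frames, shard_size=SHARD_SIZE):
    """Split a clip into contiguous frame ranges, one per node."""
    return [(name, start, min(start + shard_size, total_frames))
            for start in range(0, total_frames, shard_size)]

def render_shard(shard):
    """Stand-in for dispatching one frame range to a cloud GPU node."""
    name, start, end = shard
    return (name, start, end, f"{name}_{start:06d}_{end:06d}.mp4")

shards = [s for clip in clips for s in make_shards(*clip)]
with ThreadPoolExecutor(max_workers=100) as pool:  # "100+ nodes"
    segments = list(pool.map(render_shard, shards))

# Segments come back in order and can be concatenated into final clips.
print(len(segments), "shards fanned out to the pool")
```

Because each shard is independent, wall-clock time shrinks roughly in proportion to the node count, which is how a 10-hour single-workstation render collapses to minutes.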
Pixvana SPIN Studio's most innovative and exciting features were its in-headset video editing capabilities. Tools for trimming, sequencing, and adding interactive graphics/text to VR video programs were layered on top of the cloud administration of video and interactive files. Users could put on a headset, edit while viewing the content in VR at high quality, then immediately publish/share to other headsets–since all of the data was in the cloud at all times.
In fall 2015 we made an "emerging tech" bet on VR and chose a "swing for the fences" risk-reward approach. We believed VR would rapidly emerge as a very large-scale industry, based on anecdotal buzz and our own profound amazement at early trials of the 6-DOF systems floating around Seattle via Valve's early-access demonstrations.
I've been a founder of several businesses and, by my count, have worked on ~15 v1.0 software products at both startups and large companies. Pixvana's SPIN Studio platform far and away exceeded anything else I've ever been involved with in terms of system design, technical innovation, and the potential to be of large commercial consequence for decades. Alas, the work also scores as the most catastrophically irrelevant (measured by the end-user adoption we achieved) of my career.
Voodle, by comparison, was a practical, pragmatic application that required very little technical innovation or real change in users' expectations, but it came on the scene at a time of "app saturation", when we were greeted by a market with quite a bit of app-adoption friction. We executed well enough, but failed to find product-market fit.
Over the last 7 years our approach evolved and ultimately meandered as we shipped a series of interesting tools that scored as not-quite-right for customers. We started with large media companies and followed with makers; pivoted to enterprise learning orgs, then to individuals on teams, and ended up, in our last efforts, with "one-to-many" affinity communities. From VR, to mobile selfie video messaging, and of late to web3 and utility for NFTs in community.
All of us who worked on the projects are incredibly disappointed. Hard work, good execution, dogged perseverance–these are table stakes. Timing and luck are also brutally critical ingredients. We aspired to delight customers. We didn't. I'm chagrined that we pursued such a wide set of interesting technologies in search of problems to solve–a cardinal sin.
To our shareholders and advisors: thank you for supporting me and the team with your trust, mentorship, and capital. To my colleagues: we did a lot of great work, and I know we all take our experience together forward into the new chapters to come in our lives.
— Forest Key, Dec 2022
The last 7 years touched the lives of many team members who worked together. For many, Pixvana + Voodle were a first job right out of college, and for a few, it was their final job before retirement. From an office in Seattle, we evolved into a remote team across 8 states, in our pajamas. We collaborated with passion, and experienced disappointments and achievements.
We have been working with an awesome talent search firm called Fuel Talent, and CEO Shauna Swerland reached out to me about her podcast series What Fuels You. I have recently been listening to a ton of audiobooks on Audible and have been getting into thematic podcasts at bedtime and on drive time… so I dove in enthusiastically and really enjoyed the chat.
"XR-vu", or "VR-vu"… whichever term comes into vogue in the near future, I want to go on record as saying that it happened to me a few weeks ago, and I liked it–a lot!
Yes, I'm playing on the phrase "deja vu", that oh-so-fun feeling of experiencing something and having a sense of foreboding or otherworldly prescience, as though you've previously dreamed the moment or even lived it in a different state of consciousness. Well, play along with me for a moment–take that feeling, and now imagine what it feels like when it arises because you HAVE experienced the moment before… but in Virtual Reality or another form of XR (extended reality)?
Me wearing a VR headset in the middle of the street, illustrating just how wild and crazy VR experiences can feel? Or posing for a photo shoot we did at Pixvana so that we had interesting pictures of people in VR headsets for blog posts like this one? Actually, it's a very good image that conveys my astonishment at feeling XR-vu for the first time.
That's what I experienced a few weeks ago when I visited Ollantaytambo, Peru, a lovely Andean village about two hours outside of Cusco, the former capital of the Inca Empire. I had been to the region before, about 30 years ago when I was backpacking for 18 months after college. However, I had never been to Ollantaytambo's ruins–not in person. But I did visit Ollantaytambo in Virtual Reality, in a detailed, compelling experience built by Microsoft as an example of how tourism and travel might be conveyed using VR. It shipped as Microsoft HoloTour, a demonstration app that launched in 2017. This technical document describes what the team did to build the HoloTour experience of Ollantaytambo–quite an interesting mix of techniques to photographically capture and convey the site.
Unfortunately I couldn't find any images to illustrate the experience in the headset–suffice to say that in HoloTour, I experienced standing in the midst of the Ollantaytambo ruins… and when I visited these same ruins in April of 2019, I had a triple-take moment that flooded my brain with *very* strong "deja vu"-like cues. Have I been here before? Why does this place seem so familiar? Did I dream it?
Here I am in Ollantaytambo's Inca ruins, marveling at the beauty of the region and the astonishing stonework that pervades Inca sites.

This rock formation is what strongly triggered my sense of XR-vu, as it was prominently featured in the Microsoft HoloTour visit to the same site.
No, I had never been here. But yes, I was here in Virtual Reality! Wow. WOW. It was all the fun of deja vu, times at least 5x… or maybe 10x. It really showed me the difference between seeing a picture or a movie and having been immersed in the unique, compelling experience of *presence* that is the hallmark of XR/VR–presence that triggers activity in the human brain that forms actual spatial *memories*, which I was then recollecting as though they were real. I don't know if this feeling would always be this strong, say, if I had felt the sensation many times before. But it was incredibly interesting, and I wanted to be first to the podium to share it; I look forward to writing about it more and discussing it with others as they have XR-vu of their own!
Anyone else experienced XR-vu or VR-vu?
A wider view of the amazing, beautiful Inca site at Ollantaytambo.
Kudos to Aaron Rhodes and Sean Safreed for the first of many Pixvana videos that outline some of the unique challenges, and solutions, to making great stories and experiences using video in Virtual Reality. This video tackles the unique challenges of working with *really* big video files on relatively underpowered devices and networks. This general approach is something we think of as "field of view adaptive streaming": unlike traditional adaptive streaming, where multiple files on the server/CDN ensure that a good video stream is available to the client device at any given time, in VR we must also tackle the additional complexity of *where* a viewer is looking within that video. The notion of using "viewports"–breaking the stream up into many smaller videos, each highly optimized for a given FOV–is something we are firing away on at the office these days.
So, should we call it FOVAS for short–Field of View Adaptive Streaming? It is kind of weird, but it makes a lot of sense… I'm using the term regularly; maybe it will stick!
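To make the viewport idea concrete, here is a minimal sketch of the client-side selection step, assuming a hypothetical manifest of eight viewport-optimized variants. The names, the 45-degree spacing, and the yaw-only gaze model are all illustrative assumptions, not our actual player logic.

```python
# Hypothetical manifest: each variant is a full-sphere encode that spends
# most of its bits near one viewing direction (`center_yaw_deg`).
viewports = [
    {"center_yaw_deg": yaw, "url": f"clip_vp{yaw:03d}.mp4"}
    for yaw in range(0, 360, 45)  # e.g. 8 viewports, 45 degrees apart
]

def angular_distance(a_deg, b_deg):
    """Smallest absolute difference between two yaw angles, in degrees."""
    d = abs(a_deg - b_deg) % 360
    return min(d, 360 - d)

def pick_viewport(head_yaw_deg, viewports):
    """Choose the variant whose optimized direction best matches gaze."""
    return min(viewports,
               key=lambda v: angular_distance(head_yaw_deg,
                                              v["center_yaw_deg"]))

print(pick_viewport(100.0, viewports)["url"])  # nearest center is 90
```

As the viewer turns their head, the player would switch to the variant nearest the new gaze direction, exactly as a traditional adaptive player switches bitrates when bandwidth changes.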
We're having a lot of fun at Pixvana working on various VR storytelling technologies–what we have termed "XR Storytelling", as we are thinking broadly about both AR and VR as well as xR, such as virtual reality caves and other as-yet-unconceived immersive platforms that will require similar tools. One of the key challenges we are working on is how to deliver absolutely gorgeous, high-quality adaptive-streaming 360 VR video.
Last week we combined our love of food with our love of VR and shot a rough blocking short film that we intend to turn into a higher-quality production in a few weeks, when we can bring a better camera rig into the mix. Aaron blocked out the shots while the team at Manolin, the f-ing awesome restaurant next to our office, was prepping for the day. Here is the rough cut:
Then we threw it into our elastic cloud compute system on AWS and produced several variations as a series of "viewports", which, when viewed on a VR headset like the HTC Vive (the best on the market so far), produce some pretty darn immersive/awesome video at a comfortable streaming bandwidth that can be delivered on demand to both desktop and mobile VR rigs. Here's a preview of what the cumulative rendered "viewports" look like in one configuration of the settings (we are working on dozens of variations of this technique, so we can optimize the quality:bandwidth bar on a per-video basis):
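The "dozens of variations" amount to a grid of encode jobs. A rough sketch of how such a grid might be enumerated is below; the clip name, viewport spacing, and bitrate rungs are hypothetical placeholders, not our production settings.

```python
from itertools import product

# Hypothetical settings grid: viewport centers x bitrate rungs.
yaw_centers = range(0, 360, 60)      # 6 viewport directions
bitrates_kbps = [4000, 8000, 16000]  # quality:bandwidth rungs

def encode_jobs(clip_name, yaws, bitrates):
    """One cloud encode job per (viewport, bitrate) combination."""
    return [
        {"clip": clip_name, "center_yaw": yaw, "bitrate_kbps": br,
         "output": f"{clip_name}_vp{yaw:03d}_{br}k.mp4"}
        for yaw, br in product(yaws, bitrates)
    ]

jobs = encode_jobs("manolin_rough_cut", yaw_centers, bitrates_kbps)
print(len(jobs), "encode variations")  # 6 viewports x 3 bitrates = 18
```

Each job is independent, so the whole grid can fan out to the elastic compute pool at once, and the player later picks one output per moment of playback based on gaze and bandwidth.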
Looking forward to sharing more of what we are up to in the near future–for now, if you are a Seattle friend, stop by for a demo and a delicious dinner at Manolin Restaurant!
Here are some really clear images and videos that illustrate a VR video assembly process using a six-camera GoPro rig. This isn't meant as a comprehensive how-to; rather, it's a visual-only guide that I will be using in presentations to walk folks through the process.
A lot of my friends have asked me why I've plunged into starting a new company, and why/how I chose building a VR Video platform specifically as an area for software innovation. I think I can succinctly summarize it as: VR Video is *magical*, and things that are truly *magic* are f8cking cool and rarer than unicorns. I see a unique confluence in time of my skills, my passions, and a market need and opportunity. It's only been about 90 days since I put on my first vintage-2015 VR headset (like many, I had tried the 1990s-era stuff, which just made me vomit), and my co-founders and I gave birth to our VR Video startup Pixvana this week.
Here’s why:
When I put on an HTC Vive headset for the first time and experienced the demos Valve had been showing in the summer of 2015, I felt a profound, complete, pervasive sensation of what I knew immediately to be what the VR industry calls "presence". The sensation was right there with other must-try-in-a-lifetime, hard-to-describe-to-someone-who-hasn't-done-it-yet experiences: falling in love, skydiving, scuba, sex, certain recreational mind-expanding drugs, finishing a marathon, watching my wife give birth to our boys… Specifically, I experienced a sense of out-of-body time and space travel: time stopped functioning on the normal scale of my daily routines, my body perception was replaced with something "virtual" that was not quite real but not quite fake either, and I was taken to faraway imagined worlds–underwater, into robot labs, onto toy tables, and to several other places that, while not photo-real in their rendering, felt and behaved in ways that were significantly real enough that it WAS REAL.
WEVR's theBlu was often the first moment of real "presence" for those who tried the HTC Vive in 2015–it was for me!
When I took the goggles off after that first experience, it took me a good 3-5 minutes to "come back"–just as landing in Europe after a long flight makes the Parisian airport feel different from its equivalent in my home departure city, coming back from the virtual world took a moment of reflection and introspection to settle the "wait a minute, where am I now?" It made me think of existentialism and some of my favorite Jorge Luis Borges short stories–my mind immediately considered, "wait, am I still in VR, just perceiving another layer of possible reality, waiting to take off another set of goggles within goggles?" This wasn't a scary thought or a psychotic split; rather, it was marvel at the illusion I had just witnessed, like a great card trick from a magician–only it was my own mind that had played the trick on me…
The smile on my friend Lu’s face perfectly captures her “aha moment” of first-time-presence. I’ve seen dozens of friends light up this way during their first time VR trials.
In addition to the SteamVR experience (the HTC Vive is just one hardware implementation; what I was really marveling at was Valve's SteamVR vision and software, not the hardware form factor), in the last few months I've tried most of the other mainstream VR experiences expected to ship in 2016: Oculus Rift, Samsung Gear, PlayStation VR, and a variety of configurations of Google Cardboard with various phones. In terms of delivering "presence", the Vive is without a doubt on a completely different level–I'd rate it a 10 on a scale of 1-10, the DK2 Rift and Sony VR a 5, Samsung Gear a 3, and Google Cardboard a -5 (I'll write more about Cardboard in the future–suffice to say it is antithetical to creating any sense of presence, and it does VR an injustice to have so many of them floating around out there, suggesting to unknowing consumers that an inferior experience is what's coming in VR). But these distinctions between hardware systems this early in the market are really inconsequential. I believe that, just as with mobile devices or PCs, within 5 years the hardware will become pretty uniform and indistinct (is there really any difference at all between an iPhone 6 and a Samsung Galaxy S6?), and the real business and consumer differentiation will be in the software ecosystems–the app stores and developer communities that will rise–as well as in the fantastic software applications that will run cross-platform on all of these devices.
Andrew in disbelief, watching a VR video that made him forget he was sitting in my living room.
So for that reason, I'm much more interested in the content and software enablement systems that need to be built to enable creators to build cool shit that will be compelling and magical for consumers. The more magic experienced, the more VR consumption grows and the more headsets are sold–a virtuous business cycle of new content, demand for that content, more content creators, repeat…
It is clear to me that there are two canonical types of content for these devices: 3D CGI environments, and video/still-photography-based content. 3D CGI material is very attractive and inherently magical, as it can fully render images that track the user's head movement side to side, and even at "full room scale" if she walks around and freely explores the environment. A pretty mediocre piece of 3D CGI VR content on the Vive is pretty darn amazing. A great piece of CGI VR is astoundingly cool (e.g., WEVR's theBlu experience).
Chris Milk’s U2 VR Video is a glimpse of VR video specific semantics that are just now being worked out–both creatively and from a technology perspective.
On the other hand, even a really great VR video can be pretty darn "meh" on any of the VR headsets, and pretty darn awful and nausea-producing on a bad VR headset ('wassup, Google Cardboard!). But it won't be that way for long–this is more a reflection of the nascent state of VR video than of a fundamental problem with the medium. VR video content, and the technology to shoot, prepare, fluff, and deliver it for playback, will follow a rapid improvement cycle just as other new film mediums have. Consider:
In the late 1890s, when motion pictures were being introduced, Vaudeville was the mainstream performance art form and most early cinema consisted of "filmed vaudeville". Within 20 years, unique storytelling technology and production and editing techniques were introduced with films such as The Great Train Robbery, and intercutting between very different camera compositions (wide shots, close-ups, tracking shots, etc.) started to tell stories in ways that bore no resemblance at all to vaudeville's tropes. This vaudeville-to-cinema transition was a ~1900-1950 phenomenon, which included the addition of audio in the '20s, color in the '40s, and large-format wide-aspect-ratio spectaculars like VistaVision and Cinerama in the 1950s.
The 1903 film The Great Train Robbery used a myriad of new techniques in composition and editing, which must have been initially disorienting in their novelty and their break from the more traditional Vaudeville "sitting in an audience" perspective that viewers would have been accustomed to.
Television came next, introducing live broadcasting and recorded programs stored on tape, first for professionals and later for consumer distribution on VHS/Beta. Editing was done as "tape-to-tape" transfer–cumbersome, time-consuming, and actually slower than just cutting film pieces together on a Moviola.
Thankfully, I came into the film industry just as digital filmmaking tools were obsoleting devices like this. I'm sure it was just a joy to handle all that film by hand and make splices with razor blades and glue and tape… NOT!
In the 1990s, when I worked at Industrial Light & Magic, the first digital effects and digital post-production projects were just being introduced. When Jurassic Park was made in 1993, there were fewer than 30 digital effects shots with CGI creatures; 5 years later, films were being made with thousands of shots, and some were color-graded digitally and thus 100% processed through computers. In that same timeframe, non-linear editing tools like the Avid made editing so much quicker and more time-efficient that editors started to cut films in a whole new style, much more rapid and varied–it is incredible to watch a sampling of films from the 1985-92 period and compare them to those from 1996-2000. My teenage sons see the earlier films as I might see a 1922 pre-sound/color film. The analog-to-digital cinema production transition was perhaps a 1990-2009 phenomenon that started and ended with James Cameron films (The Abyss was the start, and Avatar the culmination, in its seamless blending of digital and analog content).
Web video infrastructure enjoyed rapid innovation and disruption, from crappy low-resolution thumbnails in 2000, to pretty darn awesome 4k with robust streaming by 2010.
In the 2000s the web was the big disruptor, and technologies like QuickTime, Flash, Silverlight, and Windows Media, along with the enabling web infrastructure, have pushed televisions–once broadcast reception devices–into on-demand streaming playback screens for web content and DVR playback. My household is now dominated by YouTube (which consumes my teenagers' free time at all hours of the day on their phones) and Netflix and HBO GO (which dominate my wife's and my evenings). Early web video was mostly inconceivably small and crappy-looking, but by 2010 it was of the highest quality and matched master recordings in resolution and fidelity.
I’ve given VR Video demos to ~70 folks so far; it has been fascinating to see and hear people’s reactions.
Which brings me to VR Video. It is clear to me that VR Video will disrupt other forms of video consumption and viewing in a similar manner, and following the trend of other media tech adoption, will do so in a much shorter time frame. There is so much to do, so much to build, so many creative problems to solve. I’ll write more about that soon–but for my friends that have asked, now you know the context for my excitement about VR Video.
Forest Key with a "VR Video is going to be frickin' awesome" grin, sitting on the steps of Pixvana's new office in the Fremont neighborhood of Seattle.