Eric Brachmann


Staff scientist at Niantic.

Throws #MachineLearning at traditional #ComputerVision pipelines to see what sticks. Differentiates the non-differentiable.

Google Scholar: https://scholar.google.de/citations?user=cAIshsYAAAAJ

Due to requests at #ECCV2022 and to make our #MapFreeReloc dataset useful for more tasks, we make the SfM reconstructions of our train set publicly available.

πŸ”₯460 SfM models of outdoor scenes all around the world πŸ”₯
https://research.nianticlabs.com/mapfree-reloc-benchmark/dataset

Want to train 460 NeRFs? Go ahead.

Each scene was captured by non-expert users with two independent scans, sometimes months apart. We reconstructed them with COLMAP and aligned them to the original phone trajectories.

Thus, all models are in metric scale.
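The alignment of an SfM reconstruction to a metric trajectory can be sketched as a closed-form similarity fit between corresponding camera positions, in the style of Umeyama's method. This is an illustration of the principle only, not the actual pipeline used for the dataset; the function name and data layout are my own.

```python
import numpy as np

def align_umeyama(src, dst):
    """Least-squares similarity transform (s, R, t) with dst ~= s * R @ src + t.

    src, dst: (N, 3) arrays of corresponding camera positions, e.g. an SfM
    trajectory (arbitrary scale) and metric phone poses. Closed-form solution
    after Umeyama (1991).
    """
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_s, dst - mu_d
    cov = dst_c.T @ src_c / len(src)          # cross-covariance of the two point sets
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1                          # guard against reflections
    R = U @ S @ Vt
    var_s = (src_c ** 2).sum() / len(src)     # variance of the source points
    s = np.trace(np.diag(D) @ S) / var_s      # optimal scale -> metric models
    t = mu_d - s * R @ mu_s
    return s, R, t
```

Applying the recovered (s, R, t) to the whole reconstruction brings it into the metric frame of the phone trajectory.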

Map-free Visual Relocalization - Download

We have updated our Image Matching Workshop for #CVPR2023 webpage.

Paper submission deadline: March 19, 2023.
Notification to authors: April 4, 2023.
Camera-ready deadline: April 6, 2023.

Challenge: to be announced a bit later.
https://image-matching-workshop.github.io

Fourth Workshop on Image Matching: Local Features & Beyond


@ducha_aiki Model misspecification is the theoretical aspect of "unavoidable". But there are practical issues I would also call unavoidable.

I know you know them, but I'm putting them here for completeness.

We never observe the device's pose directly. We _derive_ the pose from indirect measurements, often relying on heuristics. For example, we record camera images and get the pose by optimizing visual constraints. We have noise in keypoint locations and matching, depth sensor noise, ICP drift, etc.

@ducha_aiki Let me try to reformulate my position:

The GT of our benchmarks has error bars, and our benchmarking results have error bars. They are invisible at the moment, and some people act as if they did not exist.

I keep pointing out that they do exist. Some people think they are insignificant. We have shown on two benchmarks that they do matter and should not be ignored.

The next step is to make the error bars visible. I do not know how yet, and it will likely differ per benchmark.

@ducha_aiki What you write all makes sense to me.

Whether it's unavoidable or not... The physical device had a precise location in 3D space, sure. But our model of a camera/sensor does not 100% represent the physical device. For example, some of our frequently used benchmark images have significant rolling shutter that was not taken into account.

Some benchmarks will have better models and better calibration than others. But in principle the issue remains, and I stand by that statement. πŸ™‚

@ducha_aiki @at For the SfM pGT, we had COLMAP re-estimate the intrinsics. It arrived at values very similar to what we used for D-SLAM, e.g. 526 vs. 525 focal length.

Note that our evaluation also includes a reprojection error metric, i.e. evaluation in image space. All our claims hold there as well. It's not the solution, unfortunately.
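Evaluation in image space can be sketched as follows: project the same scene points with the estimated and the (pseudo) ground-truth pose and measure the pixel distance. A minimal illustration, not the benchmark's actual code; the focal length below just echoes the numbers mentioned above.

```python
import numpy as np

def reprojection_error(pts3d, R_est, t_est, R_gt, t_gt, K):
    """Mean pixel distance between scene points projected with the estimated
    camera pose and with the (pseudo) ground-truth pose.

    pts3d: (N, 3) world points in front of both cameras; R, t: world-to-camera
    rotation and translation; K: 3x3 intrinsics.
    """
    def project(R, t):
        cam = pts3d @ R.T + t             # world -> camera coordinates
        uv = cam @ K.T                    # apply intrinsics
        return uv[:, :2] / uv[:, 2:3]     # perspective divide -> pixels
    return np.linalg.norm(project(R_est, t_est) - project(R_gt, t_gt), axis=1).mean()
```

One appeal of this metric is that pose and intrinsics errors are measured jointly, in units (pixels) that relate directly to downstream use.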

@ducha_aiki Even the broken reloc benchmark I mentioned above might be useful. For example, to compare robust optimization methods that just maximize inlier counts wrt a large point cloud. It doesn't really matter what that point cloud looks like.

But we should be aware of what a benchmark is and what it isn't. And I think we have to repeat it again and again, because new people are entering the field and they might not be aware of the "details".

@ducha_aiki The user might see 9 deg error (success) but the benchmark tells you 11 deg (failure) because of imprecise GT. So, your criterion is well motivated, but not "safe".

A safe criterion could be to only count a failure if the estimated camera looks in the opposite direction. That's probably less affected by GT precision. But then there are all the issues I listed above, e.g. that such a benchmark is not very precise wrt differences between methods.
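Such a conservative criterion can be sketched as a check on the angle between the two viewing directions. The 90-degree threshold here is an arbitrary illustration, not a value from this thread.

```python
import numpy as np

def looks_opposite(R_est, R_gt, thresh_deg=90.0):
    """Conservative failure test: flag an estimate only if its viewing
    direction deviates from the GT view by more than thresh_deg.

    R_est, R_gt: world-to-camera rotation matrices. The camera's viewing
    direction in world coordinates is its z-axis, i.e. the third row of R.
    """
    v_est, v_gt = R_est[2], R_gt[2]
    cos_angle = np.clip(v_est @ v_gt, -1.0, 1.0)   # clip for numerical safety
    return np.degrees(np.arccos(cos_angle)) > thresh_deg
```

Because small GT errors perturb the viewing direction only slightly, a gross-failure test like this is far less sensitive to GT precision than a tight angular threshold, at the cost of resolving little between competing methods.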

@ducha_aiki I think I was not very clear, sorry. I was talking about success/failure criteria that are robust wrt imprecise GT. How you motivate those criteria, application-derived or not, is an important question but a different one.

Let's say you define a user-specific criterion, e.g. 10 deg. You cannot measure anything user-specific directly; you can only measure what the benchmark tells you.

@ducha_aiki And of course, ultimately I hope to one day get a better understanding of these problems, and to provide tools for measuring their impact. So we can have precise comparisons again.

Our discussions help me tremendously. At least to build the understanding :)