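Open Source Daily recommends one quality open-source project from GitHub and one selected English-language article on technology or programming every day. Keep reading Open Source Daily and make daily learning a habit.
December 30, 2023, Open Source Daily issue No. 1059:
Today’s recommended open-source project: Spotube
Today’s recommended English article: “Can CppRef be ergonomic?”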


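Open-source project

Today’s recommended open-source project: Spotube. Link: project page

Why we recommend it: This project is an open-source Spotify client designed to run on multiple platforms without requiring Spotify Premium. It uses Spotify’s data API, plus other audio sources such as YouTube, Piped.video, or JioSaavn.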


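English article

Today’s recommended English article: “Can CppRef be ergonomic?”

Why we recommend it: The article is mainly about CppRef, a Rust type for handling references to C++ types. It introduces CppPin, which lets Rust own an object while still allowing it to be referenced from C++. The author also notes that field access has to go through functions or macros, and that method calls raise the question of supporting generic self types.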


Can CppRef be ergonomic?

In a previous post, I said that we simply can’t use Rust references to point to C++ types. This might work at small scale, but for any sizable C++ project, humans can’t promise that there are no other C++ references to the same data — so you run into aliasing violations, unexpected mutations, and the dreaded Undefined Behavior.

So, instead of using &T we’ll use a CppRef<T> to represent a C++ reference (or pointer). There’s some early work here (see unsafe_references_wrapped and the linked type, plus this example). This ideally relies on a Rust feature called “arbitrary self types”, which I’m working on in an RFC along with some very fine other people (thanks!).

So far so good. But, one of the open questions has been — can CppRef<T> be ergonomic? And one specific question has come up during the RFC — is it desirable to support generic receivers? For example, is this monstrosity a good or bad idea?

impl SomeType {
  fn some_method(self: impl SomeTrait<Target = Self>) { ... }
}

It turns out that these questions are related.

First, let’s talk about CppPin<T>. If you have some data which may have C++ pointers or references to it, it’s simply not OK to have a Rust reference.

There are some circumstances where the object will be stored over in C++ and all you would ever have in Rust is CppRefs to it:

  • It’s stored in something like a cxx::UniquePtr
  • A C++ method has returned you a reference to something stored over in C++ land entirely.

But, you might sometimes want to own objects in Rust and yet make them available to C++. These might be C++ types or they might be Rust types. In such a case, you need a way to ensure there are only C++ references but no Rust references. That’s what CppPin<T> is for.

CppPin::new(something) consumes the something, thus proving there are no existing Rust references. It can create new C++ references (CppPin::as_cpp_ref() -> CppRef<T>), but there’s no way to get a &T or &mut T. You can safely do weird things to this type in C++, including storing references or pointers to it which you later manipulate, and there are guaranteed to be no Rust references which you discombobulate.
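To make the CppPin contract concrete, here is a minimal stable-Rust sketch. It is not the author’s implementation: the field layout, the use of Box::into_raw, and the as_ptr helper are assumptions. The properties it tries to capture are that new() takes ownership of the value and that the only accessor hands out CppRef<T>, never &T or &mut T.

```rust
use std::ptr::NonNull;

/// Hypothetical stand-in for CppRef<T>: an opaque, copyable pointer that is
/// non-null and aligned but promises nothing about aliasing or mutability.
pub struct CppRef<T> {
    ptr: NonNull<T>,
}

// Implemented by hand: #[derive(Clone, Copy)] would add a T: Copy bound.
impl<T> Clone for CppRef<T> {
    fn clone(&self) -> Self {
        *self
    }
}
impl<T> Copy for CppRef<T> {}

impl<T> CppRef<T> {
    /// The raw pointer that would be handed to C++ as a `T*` or `T&`.
    pub fn as_ptr(self) -> *mut T {
        self.ptr.as_ptr()
    }
}

/// Hypothetical sketch of CppPin<T>: owns a heap-allocated T and only ever
/// vends CppRef<T>. There is deliberately no Deref, as_ref() or as_mut().
pub struct CppPin<T> {
    ptr: NonNull<T>,
}

impl<T> CppPin<T> {
    /// Takes `value` by move, so no Rust references to it can still exist.
    pub fn new(value: T) -> Self {
        // into_raw gives up the Box, so no &T or &mut T is kept around.
        let raw = Box::into_raw(Box::new(value));
        CppPin { ptr: NonNull::new(raw).expect("Box::into_raw never returns null") }
    }

    /// Vends a new C++-style reference to the pinned value.
    pub fn as_cpp_ref(&self) -> CppRef<T> {
        CppRef { ptr: self.ptr }
    }
}

impl<T> Drop for CppPin<T> {
    fn drop(&mut self) {
        // Reclaims the allocation from new(); this sketch simply assumes C++
        // has stopped using any CppRef to it by now.
        unsafe { drop(Box::from_raw(self.ptr.as_ptr())) };
    }
}
```

In this sketch a CppRef is just a copyable pointer, so handing out several at once is fine; any actual access has to go through unsafe code, just as access from C++ would.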

CppPin might seem a strange name in that it’s not exactly about preventing things moving — but it shares lots of the same properties as the regular Pin including an inability to vend references, complexity about “pin projections”, and a general level of annoyingness. CppJail or CppBubble might be better names — opinions welcomed.

Overall, though, I think CppPin<T> is necessary and fairly straightforward.

What about field access? We can’t have &T so we can’t have some_reference.some_field . So, all field access needs to be either via function calls over into C++, or via macros based around addr_of and read (which would be in a function call itself).

So,

// entirely auto-generated code from bindings generator
struct SomeCppType {
  // my_field: usize, // not actually represented
}

impl SomeCppType {
  fn get_my_field(self: CppRef<Self>) -> usize { ... }
  fn set_my_field(self: CppRef<Self>, val: usize) { ... }
  fn get_my_field_ref(self: CppRef<Self>) -> CppRef<usize> { ... }
}

I wanted to find out if this could be made slightly more ergonomic using a macro like field!(some_value, field_name). This may depend upon stabilization of concat_idents!. I couldn’t get it to work, but ultimately I don’t think it’s a huge deal to need to call methods to get and set field values.
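The getter/setter pattern can be sketched on stable Rust using the addr_of! route mentioned earlier. This sketch assumes a #[repr(C)] Rust mirror of the C++ struct, which the article’s generated code deliberately avoids (there, the field is not represented and the accessors call into C++). It also attaches the methods to CppRef<SomeCppType>, because self: CppRef<Self> needs the unstable arbitrary self types feature. The from_ptr and as_ptr helpers are hypothetical.

```rust
use std::ptr::{self, NonNull};

/// Hypothetical minimal CppRef<T>: non-null and copyable, nothing more.
pub struct CppRef<T> {
    ptr: NonNull<T>,
}
impl<T> Clone for CppRef<T> {
    fn clone(&self) -> Self {
        *self
    }
}
impl<T> Copy for CppRef<T> {}

impl<T> CppRef<T> {
    /// Wraps a pointer obtained from C++. Unsafe: the caller must vouch that
    /// it points to a live, properly aligned T.
    pub unsafe fn from_ptr(p: *mut T) -> Self {
        CppRef { ptr: NonNull::new(p).expect("C++ references are never null") }
    }
    pub fn as_ptr(self) -> *mut T {
        self.ptr.as_ptr()
    }
}

/// Assumption: a #[repr(C)] Rust mirror of the C++ layout. The article's
/// generated bindings do NOT represent the field; they call into C++ instead.
#[repr(C)]
pub struct SomeCppType {
    pub my_field: usize,
}

// Stable Rust does not accept `self: CppRef<Self>`, so the accessors are
// attached to CppRef<SomeCppType> (allowed here because CppRef is local).
impl CppRef<SomeCppType> {
    pub fn get_my_field(self) -> usize {
        // addr_of! computes the field address without creating a &SomeCppType.
        unsafe { ptr::addr_of!((*self.as_ptr()).my_field).read() }
    }
    pub fn set_my_field(self, val: usize) {
        unsafe { ptr::addr_of_mut!((*self.as_ptr()).my_field).write(val) }
    }
    pub fn get_my_field_ref(self) -> CppRef<usize> {
        unsafe { CppRef::from_ptr(ptr::addr_of_mut!((*self.as_ptr()).my_field)) }
    }
}
```

No &SomeCppType is ever formed, which is the property that matters when C++ may hold other references to the same object.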

What about method calls?

The awesome thing about CppRef<T> is that it’s pretty much an opaque token. You’ll most commonly get a CppRef<T> from C++, and pass it back to C++, without any need to manipulate or touch the CppRef<T> at all. Most commonly, you’ll pass it back to C++ as the this pointer in a method call:

fn main() {
  let vulture: CppRef<Vulture> = get_cpp_reference_to_vulture_from_cpp();
  vulture.squawk(); // autogenerated method
}

This is what the “arbitrary self types” feature allows.
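A sketch of this opaque-token flow, with the same caveats: CppRef is a hypothetical stand-in, cpp_vulture_squawk is a Rust function pretending to be the C++ side, and the method sits on CppRef<Vulture> because stable Rust does not yet accept self: CppRef<Self>. The Rust side never looks inside the token; it only forwards the pointer.

```rust
use std::ptr::NonNull;

/// Hypothetical minimal CppRef<T>: a copyable, non-null pointer.
pub struct CppRef<T> {
    ptr: NonNull<T>,
}
impl<T> Clone for CppRef<T> {
    fn clone(&self) -> Self {
        *self
    }
}
impl<T> Copy for CppRef<T> {}

impl<T> CppRef<T> {
    /// Wraps a pointer handed over by C++; the caller vouches it is live.
    pub unsafe fn from_ptr(p: *mut T) -> Self {
        CppRef { ptr: NonNull::new(p).expect("C++ references are never null") }
    }
}

/// Hypothetical C++ class; the Rust side never reads its fields directly.
#[repr(C)]
pub struct Vulture {
    pub squawks: u32,
}

/// Stand-in for the C++ member function `uint32_t Vulture::squawk()`.
/// Real bindings would call into C++ here instead.
unsafe extern "C" fn cpp_vulture_squawk(this: *mut Vulture) -> u32 {
    unsafe {
        (*this).squawks += 1;
        (*this).squawks
    }
}

// Stable Rust does not yet accept `fn squawk(self: CppRef<Self>)`, so this
// sketch attaches the method to CppRef<Vulture> instead.
impl CppRef<Vulture> {
    pub fn squawk(self) -> u32 {
        // The token is passed straight back as `this`, never dereferenced here.
        unsafe { cpp_vulture_squawk(self.ptr.as_ptr()) }
    }
}
```

Because CppRef is Copy in this sketch, calling squawk() repeatedly on the same token works without consuming it.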

However, it would also be nice to call squawk() on a CppPin<Vulture>:

fn main() {
  let vulture: CppPin<Vulture> = obtain_vulture_by_value_from_cpp();
  vulture.squawk(); // autogenerated method
}

This is where the question of generic self types first comes in.

Which of these is better for our (auto-generated) squawk method signature?

impl Vulture {
  // This code would be auto generated
  fn squawk(self: CppRef<Self>) {} // 1
  fn squawk(self: impl AsCppRef<Target=Self>) {} // 2
}

The second option seems appealing because we could implement AsCppRef even on CppPin. This works, but it turns out not to be especially ergonomic, because it consumes the CppPin each time. That is, you couldn’t do:

fn main() {
  let vulture: CppPin<Vulture> = obtain_vulture_by_value_from_cpp();
  vulture.squawk();
  vulture.squawk();
}

You would instead have to do:

fn main() {
  let vulture: CppPin<Vulture> = obtain_vulture_by_value_from_cpp();
  vulture.as_cpp_ref().squawk();
  vulture.squawk();
}

which is very similar to the annoying Pin::as_mut method.

Overall, it seems better to pick option 1, and force people to call as_cpp_ref() each time they want to call a method on the contents of the CppPin. This doesn’t yet seem like a sufficiently good motivation for generic self types.
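The ownership problem with option 2 can be reproduced on stable Rust, with a free function standing in for the generic receiver. In this sketch AsCppRef lends a CppRef from &self (an assumption; the article only names the trait), while squawk takes its argument by value. That by-value parameter is what moves a CppPin into the call and drops it at the end.

```rust
use std::ptr::{self, NonNull};

/// Hypothetical minimal CppRef<T>.
pub struct CppRef<T> {
    ptr: NonNull<T>,
}
impl<T> Clone for CppRef<T> {
    fn clone(&self) -> Self {
        *self
    }
}
impl<T> Copy for CppRef<T> {}

/// Hypothetical minimal CppPin<T>: owns a boxed T, never vends &T.
pub struct CppPin<T> {
    ptr: NonNull<T>,
}
impl<T> CppPin<T> {
    pub fn new(value: T) -> Self {
        let raw = Box::into_raw(Box::new(value));
        CppPin { ptr: NonNull::new(raw).expect("Box::into_raw never returns null") }
    }
}
impl<T> Drop for CppPin<T> {
    fn drop(&mut self) {
        unsafe { drop(Box::from_raw(self.ptr.as_ptr())) };
    }
}

/// Option 2's bound: anything that can lend out a CppRef.
pub trait AsCppRef {
    type Target;
    fn as_cpp_ref(&self) -> CppRef<Self::Target>;
}
impl<T> AsCppRef for CppRef<T> {
    type Target = T;
    fn as_cpp_ref(&self) -> CppRef<T> {
        *self
    }
}
impl<T> AsCppRef for CppPin<T> {
    type Target = T;
    fn as_cpp_ref(&self) -> CppRef<T> {
        CppRef { ptr: self.ptr }
    }
}

#[repr(C)]
pub struct Vulture(pub u32);

/// Stand-in for option 2, `fn squawk(self: impl AsCppRef<Target = Self>)`.
/// Taking `this` by value means a CppPin argument is moved in and dropped here.
pub fn squawk(this: impl AsCppRef<Target = Vulture>) -> u32 {
    let r = this.as_cpp_ref();
    // Read the field via addr_of!, without forming a &Vulture.
    unsafe { ptr::addr_of!((*r.ptr.as_ptr()).0).read() }
}
```

For example, squawk(vulture.as_cpp_ref()) followed by squawk(vulture) compiles, but a second squawk(vulture) is rejected as a use of a moved value. This mirrors the as_cpp_ref() dance shown above.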

Finally, what about code that wants to be generic over the type of reference? That is, code which can handle a &Vulture or a CppRef<Vulture>? Is that even achievable?

Yes!

impl Vulture {
    /// This method can accept either &Self or CppRef<Self>
    /// because both of them impl a Ref trait
    fn squawk(self: impl Ref<Target = Self>) -> u32 {
        // What to do here?
    }

    fn squawk_only_in_rust(&self) {}
    fn squawk_only_in_cpp(self: CppRef<Self>) {}
}

One oddity here is that Rust method calls’ autoref functionality doesn’t work here, so if we want to call this method with a &Vulture we need to say (&my_vulture_by_value).squawk(). Here’s how we’d call this:

let rust_accessible_vulture = Vulture(1);
let cpp_accessible_vulture = CppPin::new(Vulture(2));
(&rust_accessible_vulture).squawk();
cpp_accessible_vulture.as_cpp_ref().squawk();

But more importantly, what could squawk actually do here? The impl Ref is pretty useless — it’s no longer even a useful opaque token to pass back into C++. However, since both references can emit raw pointers, we can do field access. Even though CppRef<T> promises nothing about aliasing or mutability, it can still uphold C++ reference-like promises around alignment and not being null. So the squawk function here could, with suitable use of macros and autogenerated code, access fields within the Vulture and do useful work.
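Here is a hedged stable-Rust sketch of the generic case. The Ref trait’s as_ptr method is an assumption about how “both references can emit raw pointers” might look. squawk is a free function because generic receivers are not available on stable. The field read goes through addr_of!, so no &Vulture is ever created from the CppRef.

```rust
use std::ptr::{self, NonNull};

/// Hypothetical minimal CppRef<T>: a non-null pointer with no aliasing promises.
pub struct CppRef<T> {
    ptr: NonNull<T>,
}

/// Hypothetical minimal CppPin<T>: owns a boxed T and only vends CppRef<T>.
pub struct CppPin<T> {
    ptr: NonNull<T>,
}
impl<T> CppPin<T> {
    pub fn new(value: T) -> Self {
        let raw = Box::into_raw(Box::new(value));
        CppPin { ptr: NonNull::new(raw).expect("Box::into_raw never returns null") }
    }
    pub fn as_cpp_ref(&self) -> CppRef<T> {
        CppRef { ptr: self.ptr }
    }
}
impl<T> Drop for CppPin<T> {
    fn drop(&mut self) {
        unsafe { drop(Box::from_raw(self.ptr.as_ptr())) };
    }
}

/// Assumed shape of the Ref trait: anything that can emit a raw pointer to
/// its target, which is non-null and aligned for both implementors.
pub trait Ref {
    type Target;
    fn as_ptr(&self) -> *const Self::Target;
}
impl<'a, T> Ref for &'a T {
    type Target = T;
    fn as_ptr(&self) -> *const T {
        *self as *const T
    }
}
impl<T> Ref for CppRef<T> {
    type Target = T;
    fn as_ptr(&self) -> *const T {
        self.ptr.as_ptr()
    }
}

#[repr(C)]
pub struct Vulture(pub u32);

/// Stand-in for `fn squawk(self: impl Ref<Target = Self>) -> u32`.
pub fn squawk(this: impl Ref<Target = Vulture>) -> u32 {
    // Read the field through the raw pointer, never forming a &Vulture.
    // Assumes no concurrent C++ writes while the read happens.
    unsafe { ptr::addr_of!((*this.as_ptr()).0).read() }
}
```

With Vulture(1) behind a plain reference and Vulture(2) inside a CppPin, squawk(&rust_accessible_vulture) and squawk(cpp_accessible_vulture.as_cpp_ref()) return 1 and 2 respectively.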

This is a good use for generic receivers.

So. Conclusions are:

  • CppPin<T> is necessary and I don’t think it sucks, though it is annoyingly like Pin in some ways.
  • Field access to CppRef<T> and/or impl Ref<T> is ugly and will probably need method calls or macros, but this is OK since usually a CppRef<T> is just an opaque token which will be passed back to C++, and field access will be rare.
  • We probably do want to support generic self types, since sometimes people will want to write code that’s generic over &T or CppRef<T>.

    Download the Open Source Daily app: https://openingsource.org/2579/
    Join us: https://openingsource.org/about/join/
    Follow us: https://openingsource.org/about/love/