Is there a way of defining a label? #82

Open
burjui opened this issue May 30, 2019 · 10 comments

@burjui

burjui commented May 30, 2019

I am generating code for RISC-V and I need to store a function pointer in a register. According to the documentation, this is done with a pair of instructions:

The R_RISCV_PCREL_LO12_I or R_RISCV_PCREL_LO12_S relocations contain a label pointing to an instruction with a R_RISCV_PCREL_HI20 relocation entry that points to the target symbol:

At label: R_RISCV_PCREL_HI20 relocation entry ⟶ symbol
    R_RISCV_PCREL_LO12_I relocation entry ⟶ label

The code would be:

label:
   auipc ra, %pcrel_hi(symbol)     # R_RISCV_PCREL_HI20 (symbol)
   addi ra, ra, %pcrel_lo(label)   # R_RISCV_PCREL_LO12_I (label)

So I need to refer to that label in the relocation for addi, but I cannot find a way to even define a label in an Artifact.
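The arithmetic shows why the `addi` has to name the label rather than `symbol`: both immediates are slices of a single offset measured from the address of the `auipc`. The Rust sketch below is illustrative only. `split_pcrel` and `materialize` are hypothetical helpers, not part of faerie or any assembler, and the sketch assumes the usual split in which `addi` sign-extends its 12-bit immediate, so the high part is rounded by adding 0x800 first.

```rust
/// Split a PC-relative offset into the `auipc` immediate (upper 20 bits) and
/// the `addi` immediate (lower 12 bits). `addi` sign-extends its immediate,
/// so adding 0x800 first rounds `hi` up whenever bit 11 of `offset` is set,
/// keeping `lo` within -2048..=2047.
fn split_pcrel(offset: i32) -> (i32, i32) {
    let hi = offset.wrapping_add(0x800) >> 12; // arithmetic shift keeps the sign
    let lo = offset.wrapping_sub(hi << 12);
    (hi, lo)
}

/// What the pair computes at run time: `auipc` adds `hi << 12` to its OWN
/// address, then `addi` adds the sign-extended `lo`.
fn materialize(auipc_addr: u64, hi: i32, lo: i32) -> u64 {
    auipc_addr
        .wrapping_add(((hi as i64) << 12) as u64)
        .wrapping_add(lo as i64 as u64)
}

fn main() {
    let label: u64 = 0x1000_0000; // address of the auipc, i.e. `label:`
    let symbol: u64 = 0x1000_1834; // address of `symbol`

    // Both halves come from the offset measured from the auipc.
    let (hi, lo) = split_pcrel((symbol - label) as i32);
    assert_eq!(materialize(label, hi, lo), symbol);

    // Measuring the low half from the addi (label + 4) instead yields the
    // wrong address, which is why %pcrel_lo must name the auipc's label.
    let (_, wrong_lo) = split_pcrel((symbol - (label + 4)) as i32);
    assert_ne!(materialize(label, hi, wrong_lo), symbol);

    println!("hi20 = {:#x}, lo12 = {}", hi, lo);
}
```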

@philipc
Collaborator

philipc commented May 30, 2019

Just to make sure I understand how this is encoded: the label needs to be an additional symbol in the resulting object file so that the relocation can reference it, right? Does the R_RISCV_PCREL_LO12_I entry need to specify an addend too, or can it reuse the addend from the R_RISCV_PCREL_HI20 entry?

There is currently no way to do this. One way we might be able to extend the API to support it is for the R_RISCV_PCREL_LO12_I link to specify an addend that is the offset of the label, and then faerie could automatically create the symbol for the label.

@burjui
Author

burjui commented May 30, 2019

I am not well versed in linker internals, and the docs are still not very detailed, but I believe so: it must be an additional symbol pointing at the first of the two instructions. The docs say nothing about the addend, but I tried putting the auipc location into it, and the linker complained about the addend being set. Here is what the docs say:

To get the symbol address to perform the calculation to fill the 12-bit immediate on the add, load or store instruction the linker finds the R_RISCV_PCREL_HI20 relocation entry associated with the AUIPC instruction.
...
Note the compiler emitted instructions for PC-relative symbol addresses are not necessarily sequential or in pairs. There is a constraint is that the instruction with the R_RISCV_PCREL_LO12_I or R_RISCV_PCREL_LO12_S relocation label points to a valid HI20 PC-relative relocation pointing to the symbol.

So it mentions a "relocation label", which I assume must be visible to the linker as a symbol. I guess that is how the linker knows where to look for the matching R_RISCV_PCREL_HI20. It may also be the reason the linker emits the following error:

dangerous relocation: %pcrel_lo missing matching %pcrel_hi

when I point R_RISCV_PCREL_LO12_I to the same symbol as R_RISCV_PCREL_HI20. So it has to be a separate symbol.

Probably, when the linker encounters an R_RISCV_PCREL_HI20 relocation on a labeled instruction, it stores the relocation in some label->relocation map, and when it encounters an R_RISCV_PCREL_LO12_I relocation to a symbol, it looks up the symbol in that map, retrieves the R_RISCV_PCREL_HI20 relocation and then has both pieces: the high 20 bits and the low 12 bits of the offset that will be patched into the auipc+addi instruction sequence.

I hope I got it correctly.

@burjui
Author

burjui commented May 30, 2019

If that is correct, then there should be an API to add labels to declared symbols, and once I tell faerie to assemble the object, it should verify that the corresponding symbols are defined too and that the labels are not out of bounds. Right?
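The checks described here could look roughly like the following Rust sketch. It assumes, hypothetically, that a label is stored as a symbol name plus a byte offset and that the emitter knows the length of each defined symbol; `Label`, `LabelError` and `check_labels` are illustrative names, not faerie API.

```rust
use std::collections::HashMap;

/// Hypothetical: a label is just a byte offset inside a defined symbol.
struct Label<'a> {
    symbol: &'a str,
    offset: usize,
}

#[derive(Debug, PartialEq)]
enum LabelError {
    UndefinedSymbol(String),
    OutOfBounds { symbol: String, offset: usize, size: usize },
}

/// Checks run when the object is emitted. `defined` maps each defined
/// symbol name to the length in bytes of its contents.
fn check_labels(defined: &HashMap<&str, usize>, labels: &[Label]) -> Result<(), LabelError> {
    for label in labels {
        // The symbol the label lives in must be defined...
        let size = *defined
            .get(label.symbol)
            .ok_or_else(|| LabelError::UndefinedSymbol(label.symbol.to_string()))?;
        // ...and the label must point inside it.
        if label.offset >= size {
            return Err(LabelError::OutOfBounds {
                symbol: label.symbol.to_string(),
                offset: label.offset,
                size,
            });
        }
    }
    Ok(())
}

fn main() {
    // `func` is 8 bytes long: one auipc plus one addi.
    let defined = HashMap::from([("func", 8usize)]);
    let labels = [Label { symbol: "func", offset: 0 }];
    assert_eq!(check_labels(&defined, &labels), Ok(()));

    let bad = [Label { symbol: "func", offset: 8 }];
    println!("{:?}", check_labels(&defined, &bad));
}
```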

@philipc
Collaborator

philipc commented May 30, 2019

OK, I'm not sure about my addend idea; it may be bending things too much.

Can you flesh out what you think the API should look like?

Here's a straw man proposal: Rather than declaring labels, they are specified simply as a symbol + offset. We add pub fn link_label<'a>(&mut self, link: Link<'a>, label_offset: usize) -> Result<(), Error>, and this will cause faerie to generate a relocation that refers to a symbol at link.to + label_offset, where link.to must be a previously defined symbol and label_offset must be the offset of a R_RISCV_PCREL_HI20 relocation within that symbol.

@m4b Do you have any better ideas about what this API should look like? Maybe add more fields to Link instead? The problem with that is it is inconvenient for normal uses. Maybe change Link to a builder?
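To make the straw man concrete without depending on faerie itself, here is a self-contained Rust mock. `MockArtifact`, `link_pcrel_hi`, the local `Link` struct and the `.Lpcrel_hiN` naming scheme are all hypothetical; only the `link_label(link, label_offset)` shape comes from the proposal. It checks that `link.to + label_offset` holds an R_RISCV_PCREL_HI20, synthesizes a local label symbol there, and points the R_RISCV_PCREL_LO12_I at that label.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
enum RelocKind {
    PcrelHi20,  // on the auipc, targets the real symbol
    PcrelLo12I, // on the addi, targets the label at the auipc
}

#[derive(Debug, Clone, PartialEq)]
struct Reloc {
    offset: u64, // section offset of the patched instruction
    kind: RelocKind,
    target: String,
}

/// Local stand-in with the proposal's (from, to, at) shape.
struct Link<'a> {
    from: &'a str,
    to: &'a str,
    at: u64,
}

/// Hypothetical stand-in for an Artifact; tracks only what this sketch needs.
#[derive(Default)]
struct MockArtifact {
    symbols: HashMap<String, u64>, // defined symbol -> section offset
    relocs: Vec<Reloc>,
    next_label: usize,
}

impl MockArtifact {
    fn define(&mut self, name: &str, offset: u64) {
        self.symbols.insert(name.to_string(), offset);
    }

    fn lookup(&self, name: &str) -> Result<u64, String> {
        self.symbols.get(name).copied().ok_or_else(|| format!("undefined symbol {}", name))
    }

    fn link_pcrel_hi(&mut self, link: Link) -> Result<(), String> {
        let from = self.lookup(link.from)?;
        self.relocs.push(Reloc { offset: from + link.at, kind: RelocKind::PcrelHi20, target: link.to.to_string() });
        Ok(())
    }

    /// The straw man: the LO12 at `link.from + link.at` refers to a label
    /// synthesized at `link.to + label_offset`, which must hold a HI20.
    fn link_label(&mut self, link: Link, label_offset: u64) -> Result<(), String> {
        let from = self.lookup(link.from)?;
        let label_at = self.lookup(link.to)? + label_offset;
        let has_hi = self.relocs.iter().any(|r| r.offset == label_at && r.kind == RelocKind::PcrelHi20);
        if !has_hi {
            return Err(format!("no R_RISCV_PCREL_HI20 at {:#x}", label_at));
        }
        let name = format!(".Lpcrel_hi{}", self.next_label);
        self.next_label += 1;
        self.define(&name, label_at);
        self.relocs.push(Reloc { offset: from + link.at, kind: RelocKind::PcrelLo12I, target: name });
        Ok(())
    }
}

fn main() {
    let mut obj = MockArtifact::default();
    obj.define("func", 0xe0);
    // auipc a0, %pcrel_hi(.LC0) at func+0; addi a0, a0, %pcrel_lo(label) at func+4
    obj.link_pcrel_hi(Link { from: "func", to: ".LC0", at: 0 }).unwrap();
    obj.link_label(Link { from: "func", to: "func", at: 4 }, 0).unwrap();
    for r in &obj.relocs {
        println!("{:#x} {:?} {}", r.offset, r.kind, r.target);
    }
}
```

Rejecting a label that does not sit on a HI20 is one way to catch the mismatch when the object is emitted, instead of leaving it for the linker's "%pcrel_lo missing matching %pcrel_hi" error.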

@m4b
Owner

m4b commented May 30, 2019

Just breezing over this, my first thought re APIs was what you wrote, @philipc; but before we get into details, maybe @sunfishcode can give some insight into labels and how they're assembled. I have a feeling you might know this area well :)

My fuzzy memory of labels in assembler is that they are assembled into a final relative offset to some memory address and aren't a symbol at all, but that could be wrong. The easiest way to check is to write some assembly with a label and see what object file gets emitted, with what relocations.

This also assumes that "label" in this generic context means the same thing as the label used for RISC-V above; perhaps we're writing a specialized API that isn't applicable outside of RISC-V?

Lastly if it is just a symbol, why wouldn’t we use the same api, and the second instruction would have a data reference to the symbol? But maybe this is what philipc proposed along with addends?

@burjui
Author

burjui commented May 30, 2019

But should such a feature be platform-specific and tied to R_RISCV_PCREL_HI20? I do not know if it can be generalized for other uses. I think I should study the ELF format and experiment with libelf first. That would give me the basic idea of what I want and how to implement it in general. Maybe then it will be possible to devise a proper API for faerie, knowing what exactly needs to be done.

@burjui
Author

burjui commented May 30, 2019

Also, I do not think it is a good idea to use addends for that, since they have their own use in other types of relocations, so it feels like a dirty hack to use them where they are not supposed to be used. See all the relocations containing "+ A" in the relocation type table. Under the "Address Calculation Symbols" title below it, there is a table explaining that "A" means the addend field.

@burjui
Author

burjui commented May 30, 2019

Update: found this comment by jim-wilson, which confirms my hypothesis about how this type of relocation works.

GNU ld relocates one section at a time. It is while we are relocating a section that we match pcrel_lo relocs to pcrel_hi relocs. In order to resolve a pcrel_lo reloc, we need both the address of the pcrel_hi reloc which is part of the pcrel_lo reloc, and the address of the symbol that the pcrel_hi reloc points to, which is part of the pcrel_hi reloc. So we have to find the matching pcrel_hi reloc to resolve a pcrel_lo reloc. As we scan through the section resolving relocs, we record all of the pcrel_hi and pcrel_lo relocs we see, then match them together at the end to resolve them.
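The scheme jim-wilson describes (record every pcrel_hi and pcrel_lo reloc while scanning a section, then match them at the end) can be sketched as below. This is a simplification under stated assumptions, not GNU ld's code: the LO12's label symbol is taken as already resolved to a section offset, and `Reloc` and `match_pcrel` are hypothetical names. Because matching happens after the scan, a LO12 may appear before its HI20.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
enum Reloc {
    /// R_RISCV_PCREL_HI20 on the auipc at `offset`, pointing at `symbol` + `addend`.
    PcrelHi20 { offset: u64, symbol: String, addend: i64 },
    /// R_RISCV_PCREL_LO12_I at `offset`; its label symbol resolves to `label_addr`.
    PcrelLo12I { offset: u64, label_addr: u64 },
}

/// Pairs every LO12 with its HI20: (lo offset, hi symbol, hi addend).
fn match_pcrel(relocs: &[Reloc]) -> Result<Vec<(u64, String, i64)>, String> {
    // Pass 1: record every HI20 by the address of its auipc, and queue the LO12s.
    let mut hi_by_addr: HashMap<u64, (&str, i64)> = HashMap::new();
    let mut los = Vec::new();
    for r in relocs {
        match r {
            Reloc::PcrelHi20 { offset, symbol, addend } => {
                hi_by_addr.insert(*offset, (symbol.as_str(), *addend));
            }
            Reloc::PcrelLo12I { offset, label_addr } => los.push((*offset, *label_addr)),
        }
    }
    // Pass 2: a LO12 finds its HI20 through the label's address. The error
    // text mirrors the linker message quoted earlier in the thread.
    los.into_iter()
        .map(|(lo, label)| {
            hi_by_addr
                .get(&label)
                .map(|&(sym, addend)| (lo, sym.to_string(), addend))
                .ok_or_else(|| format!("dangerous relocation: %pcrel_lo missing matching %pcrel_hi (at {:#x})", lo))
        })
        .collect()
}

fn main() {
    // The LO12 is listed before its HI20; matching still succeeds.
    let relocs = vec![
        Reloc::PcrelLo12I { offset: 0xe4, label_addr: 0xe0 },
        Reloc::PcrelHi20 { offset: 0xe0, symbol: ".LC0".to_string(), addend: 0 },
    ];
    println!("{:?}", match_pcrel(&relocs));

    // A LO12 whose label has no HI20 at that address is an error.
    let orphan = vec![Reloc::PcrelLo12I { offset: 0xf6, label_addr: 0xf2 }];
    println!("{:?}", match_pcrel(&orphan));
}
```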

@philipc
Collaborator

philipc commented May 31, 2019

For a concrete example, gcc (8.3.0-6ubuntu1~18.04) emits the following:

00000000000000e0 <.L0 >:
  e0:   00000517                auipc   a0,0x0
  e4:   00050513                mv      a0,a0
...
00000000000000f2 <.L0 >:
  f2:   00000517                auipc   a0,0x0
  f6:   00050513                mv      a0,a0

Relocation section '.rela.text' at offset 0xeae0 contains 304 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
0000000000e0  028400000017 R_RISCV_PCREL_HI2 0000000000000000 .LC0 + 0
0000000000e0  000000000033 R_RISCV_RELAX                        0
0000000000e4  028500000018 R_RISCV_PCREL_LO1 00000000000000e0 .L0  + 0
0000000000e4  000000000033 R_RISCV_RELAX                        0
0000000000f2  028400000017 R_RISCV_PCREL_HI2 0000000000000000 .LC0 + 0
0000000000f2  000000000033 R_RISCV_RELAX                        0
0000000000f6  028600000018 R_RISCV_PCREL_LO1 00000000000000f2 .L0  + 0
0000000000f6  000000000033 R_RISCV_RELAX                        0

Symbol table '.symtab' contains 1395 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
   644: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT    7 .LC0
   645: 00000000000000e0     0 NOTYPE  LOCAL  DEFAULT    1 .L0 
   646: 00000000000000f2     0 NOTYPE  LOCAL  DEFAULT    1 .L0 

Note that it reuses the same name (.L0) for many of these labels. For ELF, the linker recognizes .L as the label prefix and discards them if --discard-locals is given. @m4b Does that answer your question about how labels are assembled?

Lastly if it is just a symbol, why wouldn’t we use the same api, and the second instruction would have a data reference to the symbol?

So the label would be another type of declaration (e.g. DefinedDecl::Label)? That could work too. My instinct was to avoid that because it requires assigning unique names to all the labels, but that isn't a blocker.
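If labels become ordinary declarations, each one needs a distinct name, unlike gcc's reuse of `.L0`. One hypothetical way to generate such names is a per-function counter combined with the `.L` prefix, so the linker still treats them as local labels. `LabelNamer`, `is_local_label` and the `.L<function>_pcrel_hiN` scheme are illustrative assumptions, and the sketch assumes function names are themselves unique.

```rust
/// Hypothetical helper: hands out a fresh `.L`-prefixed name for every label,
/// because a name-keyed declaration API cannot reuse one name the way gcc
/// reuses ".L0". Names are unique per namer (one namer per function).
struct LabelNamer {
    prefix: String,
    next: usize,
}

impl LabelNamer {
    fn new(function: &str) -> Self {
        LabelNamer { prefix: format!(".L{}_pcrel_hi", function), next: 0 }
    }

    fn fresh(&mut self) -> String {
        let name = format!("{}{}", self.prefix, self.next);
        self.next += 1;
        name
    }
}

/// The ELF convention mentioned above: `.L` names are local labels, which
/// `ld --discard-locals` may drop from the output symbol table.
fn is_local_label(name: &str) -> bool {
    name.starts_with(".L")
}

fn main() {
    let mut namer = LabelNamer::new("main");
    let first = namer.fresh();
    let second = namer.fresh();
    assert_ne!(first, second);
    assert!(is_local_label(&first) && !is_local_label("main"));
    println!("{} {}", first, second);
}
```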

@philipc
Copy link
Collaborator

philipc commented May 31, 2019

Also need to consider what is required for labels in Mach-O and COFF.
