
On-the-fly contextual adaptation #4571

Open
wants to merge 6 commits into master
Conversation

KarelVesely84
Contributor

Adding code for 'lattice boosting' and 'HCLG boosting', i.e. on-the-fly contextual adaptation by WFST composition.

  • this PR contains the code to do the composition; each utterance can be adapted with a different FST
  • this PR does not contain the code to prepare the WFST 'adaptation graphs'
  • for HCLG boosting the graph should be simple (a single-state graph); for more complicated boosting graphs the HCLG o B composition is too slow...
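To make the single-state graph concrete, here is a minimal sketch of how such a boosting graph B might be emitted in OpenFst text format (the input of fstcompile). The function name, word IDs, and boost value are hypothetical, not part of this PR; a real setup would build B from a word list with OpenFst tools.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Emit a single-state boosting graph B in OpenFst text format.
// State 0 is both initial and final; each boosted word gets a
// self-loop whose negative tropical cost acts as a bonus during
// the HCLG o B composition. Hypothetical sketch, not kaldi code.
std::string MakeBoostingFstText(const std::vector<int>& boosted_word_ids,
                                float boost) {
  std::ostringstream os;
  for (int w : boosted_word_ids)
    os << "0 0 " << w << " " << w << " " << -boost << "\n";  // self-loop
  os << "0\n";  // state 0 is final (default weight One)
  return os.str();
}
```

Compiling the resulting text with fstcompile would yield a one-state acceptor whose only effect in composition is to lower the cost of paths through the boosted words.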

@KarelVesely84 KarelVesely84 requested a review from danpovey June 18, 2021 15:19
Review comments were left on egs/wsj/s5/steps/nnet3/decode_compose.sh and src/nnet3bin/nnet3-latgen-faster-compose.cc (several since resolved).
Comment on lines +111 to +115
## Set up features.
if [ -f $srcdir/online_cmvn ]; then online_cmvn=true
else online_cmvn=false; fi

if ! $online_cmvn; then
Contributor

@kkm000 kkm000 Jun 18, 2021
online_cmvn is used only once.

Suggested change:

- ## Set up features.
- if [ -f $srcdir/online_cmvn ]; then online_cmvn=true
- else online_cmvn=false; fi
- if ! $online_cmvn; then
+ ## Set up features.
+ if [[ ! -f $srcdir/online_cmvn ]]; then

Contributor Author

@KarelVesely84 KarelVesely84 Jun 21, 2021

I prefer that the script does not deviate too much from steps/nnet3/decode.sh, as people may want to diff against it.
But yes, it could be shorter; the original idea was that '$online_cmvn' might be used in more than one place...

@kkm000
Contributor

kkm000 commented Jun 18, 2021

This is a super interesting observation, thanks @vesis84!

        //       OpenFst docs say that more specific iterators
        //       are faster than generic iterators. And HCLG
        //       is usually loaded for decoding as ConstFst.
        //
        //       auto decode_fst_ = ConstFst<StdArc>(decode_fst);
        //
        //       In this way, I tried to cast VectorFst to ConstFst,
        //       but this made the decoding 20% slower.

Very related to #4564. While we're trying to squeeze 5% out of it...

@LvHang
Contributor

LvHang commented Jun 19, 2021

Hm... this is interesting.

IIRC, the "ConstFst" is actually faster to read, write, and access than "VectorFst", as it has less fragmentation; that is why the "ReadFstKaldiGeneric"-related functions were added, which cast the "VectorFst" HCLG to a "ConstFst" when the graph is built.
I checked my old emails and found the following paragraph, forwarded by Dan; someone said:
"We find that const FSTs load from storage significantly faster than vector FSTs. They also take up less space in memory. Const FSTs also speed up decoding by a measurable amount, as they seem to traverse more quickly. Curiously, we see that const FSTs take up approximately 10% more space on disk when compared to vector FSTs."

As for specific iterators being faster than generic iterators, I remember that is why the following code was added:

if (fst_->Type() == "const") {
  LatticeFasterDecoderTpl<fst::ConstFst<fst::StdArc>, Token> *this_cast =
    reinterpret_cast<LatticeFasterDecoderTpl<fst::ConstFst<fst::StdArc>, Token>* >(this);
  this_cast->AdvanceDecoding(decodable, max_num_frames);
....

@kkm000
Contributor

kkm000 commented Jun 19, 2021

@LvHang, AFAICR, ConstFst is not transformed in any way when loaded; it can be directly mapped with mmap(2) into the address space. This means it has offsets where the VectorFst has pointers. Offset arithmetic may impose a certain overhead, although the 20% difference is hard to explain by it alone. Possibly a loss of cache locality is at play here. A good profiler (Intel VTune is freely available now), best of all running on real hardware (even if a VM has a vPMU, I've personally seen a lot of glitches with these; without a PMU, these statistics are unavailable), will collect reliable cache-miss statistics.

Also, it's important to do the comparison with optimization turned up to 11: a recent compiler, like GCC 10.2 or Clang 11 or 12; -O3 is mandatory (especially with GCC, as -O2 does not imply loop unrolling, unlike Clang); and all architecture advantages taken (e.g., if you run it on a Skylake-X Xeon with AVX512, specify -march=skylake-avx512; the default -msse2 does not cut it at all).

fst::ILabelCompare<StdArc> ilabel_comp;
ArcSort(fst2, ilabel_comp);
}
/* // THIS MAKES ALL STATES FINAL STATES! WHY?
Contributor

Fishy line

Contributor

Does the ρ-matcher treat ε specially? I vaguely remember it doesn't. If that is so, and ε falls under the "everything else" category, I can guess why this might happen.

Contributor Author

@galv What is wrong with that? Are you reacting to the present ArcSort or to the disabled PropagateFinal?

Contributor

He may have been referring to the comment
// THIS MAKES ALL STATES FINAL STATES! WHY?
...
It is only relevant if the --phi-label option was specified (0 is not allowed for the phi label),
in which case it will use backoff semantics.
I think it makes sense to keep that commented-out code since the phi stuff is supported.
It is intended that all states in G.fst should be final, because EOS is allowed from all states.

Contributor

.. PropagateFinal() fixes the semantics of EOS, because it is done as a final-prob, not an arc, so the phi-matcher does not naturally handle it. (In k2 this kind of problem could not happen because final-probs are done as a special transition to the superfinal-state, so we wouldn't need special-purpose code).

Contributor

Hi @vesis84, the main concern, from a software engineering point of view, is that you are trying to commit code with a commented-out block that you apparently don't understand.

Also, I thought it was odd that you say that PropagateFinal makes all states final, but fst2 is expected to be a single-state FST based on what you wrote. In this case, I would expect you to set the only state in the FST to be final anyway. Unfortunately, I don't know how phi-symbol matching works, though, so I can't really say more about what your specific problem may be.

Contributor Author

@KarelVesely84 KarelVesely84 Jun 22, 2021

In FST composition with the ρ-matcher, the output of the composition should be the same regardless of whether ε is removed or not. On the other hand, for me the runtime of the composition was faster when I removed ε from the boosting graph B in the HCLG o B composition.

It might be that there are many ε symbols on the HCLG output side, and it is better
to remove ε from B so that only one of the composition arguments contains ε.

timer_compose.Reset();

// RmEpsilon saved 30% of composition runtime...
// - Note: we are loading 2-state graphs with eps back-link to the initial state.
Contributor

I thought these were one-state graphs, with a bunch of self-loops on that single state.

Contributor

This is another reason why I would want to see a recipe for creating an appropriate boosting FST.

Contributor Author

@KarelVesely84 KarelVesely84 Jun 21, 2021

Well, it could be one state; I made it two states with an <eps> link to state 1, so that the figures look better.
Semantically, it is identical...
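To make the two topologies concrete, here is a sketch in OpenFst text format (the word id 15 and boost value -2 are illustrative): the one-state variant keeps a boosted self-loop on state 0, while the two-state variant routes the boosted arc to state 1 and returns via an <eps> (label 0) back-link.

```
# one-state variant: boosted self-loop on state 0, state 0 final
0 0 15 15 -2
0

# two-state variant: boosted arc to state 1, <eps> back-link, both final
0 1 15 15 -2
1 0 0 0 0
0
1
```

After epsilon removal the two variants accept the same weighted language, which is why the choice only affects how the figures look.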

@galv
Contributor

galv commented Jun 19, 2021

this PR does not contain the code to prepare the WFST 'adaptation graphs',

Is it a good idea to merge this PR as is without an example WFST that this is known to work with? Think about how there was an example recipe with the lookahead decoding recipe PR.

@LvHang @kkm000 I suspect something much, much simpler is going on. FST composition creates a VectorFst as output, and that VectorFst is used only once (to decode a single utterance). Why convert to ConstFst if you're only going to run the FST just once? Keep in mind that decoding with FSTs typically reads <10% of the FST states, so the overhead of conversion (which touches 100% of states and 100% of arcs) is not worth it here.
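A back-of-envelope model of this trade-off, with all numbers purely illustrative: conversion pays a one-off cost over every state, while a single decode only recoups the saving on the fraction of states it actually visits.

```cpp
// Conversion to ConstFst is worthwhile only if the per-visit saving,
// weighted by the fraction of states the decoder actually touches,
// exceeds the per-state conversion cost. Illustrative model only;
// real costs would have to come from a profiler.
bool WorthConverting(double visited_frac, double saving_per_visit,
                     double convert_cost_per_state) {
  return visited_frac * saving_per_visit > convert_cost_per_state;
}
```

With visited_frac around 0.1, as in the one-shot decode described above, the saving per visit would have to be roughly ten times the conversion cost per state before the conversion pays off; a static HCLG reused across many utterances is a different story.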

@kkm000
Contributor

kkm000 commented Jun 21, 2021

@galv, you're almost certainly right.

@vesis84, only the profiler knows the ultimate truth. When I profile a piece of sluggish code for the first time, I much more often get surprises 🎉🎉 than not. Like, InfoZip's unzip spending 90% of its decompression time in libz's crc32() function. Who would expect that!

@KarelVesely84
Contributor Author

KarelVesely84 commented Jun 21, 2021

Hi, thank you for the feedback. I adopted some of the suggestions.

Some other suggestions point into the 'old' code that was taken, copied, and modified.
I don't think we should deviate from the style of that original code too much;
it would be good if it stays possible to diff against the original code.

Another thing is that sometimes the code seems 'over-explicit',
and this is for the good reason of simple readability. It might annoy
some people but also help other people, so it is difficult to decide
about the shortenings. I try to stick with the style of the other code
as much as possible.

I'll process your inputs and re-open later.
Thanks
Karel

// latbin/lattice-compose-fsts.cc

// Copyright 2020 Brno University of Technology; Microsoft Corporation

Contributor

Was this file actually intended to be added? It seems to be the same as lattice-compose, with some relatively small changes and essentially the same interface; I don't understand why it is a new program instead of a modification to the old one.

Contributor Author

Yes, they are similar; the original lattice-compose composes 'N lattices' with 'N lattices', or 'N lattices' with '1 FST'...

Yes, it could be told externally whether the rspecifier has 'N lattices' (the default) or 'N FSTs', with some boolean option like '--read-fsts-from-rxspec', but that would lead to a more complicated interface, which goes against the 'copy and modify' principle of keeping things clear that we were using in the past...

And yes, I'll make the change if it is needed.

"The FST weights are interpreted as \"graph weights\" when converted into the Lattice format.\n"
"\n"
"Usage: lattice-compose-fsts [options] lattice-rspecifier1 "
"(fst-rspecifier2|fst-rxfilename2) lattice-wspecifier\n"
Contributor

I am thinking this can be done with a string arg called e.g. "--compose-with-fst", defaulting to "auto" which is the old behavior, meaning: rspecifier=lats, rxfilename=FST; and true/True or false/False is FST or lattice respectively.
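The tri-state option described above could be parsed along these lines; the enum and function names are hypothetical, not actual kaldi code, and only illustrate the "auto"/"true"/"false" semantics Dan proposes.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical sketch of a tri-state "--compose-with-fst" option:
// "auto" keeps the old behavior (rspecifier holds lattices,
// rxfilename holds one FST); "true"/"True" forces reading FSTs from
// the rspecifier, "false"/"False" forces reading lattices.
enum class ComposeWith { kAuto, kFst, kLattice };

ComposeWith ParseComposeWithFst(const std::string& v) {
  if (v == "auto") return ComposeWith::kAuto;
  if (v == "true" || v == "True") return ComposeWith::kFst;
  if (v == "false" || v == "False") return ComposeWith::kLattice;
  throw std::invalid_argument("--compose-with-fst must be auto|true|false");
}
```

Defaulting to "auto" keeps existing scripts working unchanged, which is the main point of folding the new behavior into the old program instead of adding a second binary.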

Contributor Author

I'd like to consult about this PropagateFinal(phi_label, fst2).
Is it needed there?

I did not want its behavior whereby all the states in the FST graph
(the 2nd arg of the FST composition) become final states; the contextual adaptation was not working well due to that...

It was maybe helpful when composing lattices with lattices,
to make them 'match' better and avoid empty outputs?

Could it also be disabled by a CLI option (--propagate-final=false)?

Contributor

I'd rather modify the existing program if it can be done while clearly not affecting existing usages, just to avoid bloat.
The final propagation is needed to get the expected behavior when the FST represents a backoff n-gram language model: without it, the proper semantics of backing off to lower-order states does not apply to the final-probs. I'm not sure what your topology is... perhaps you can describe it?

@kkm000
Copy link
Contributor

kkm000 commented Jun 22, 2021

@vesis84

sometimes the code seems to be 'over-explicit', and this is for a good reason of simple readability. It might annoy some people, but also help other people.

Readability = good. I have not seen anyone annoyed because he had too little trouble reading code :) I've been annoyed by unreadable code myself. Unless you are referring to the comment about a stack object destroyed at end of scope, which is really excessive, I'd say. One needs to know basic C++ to read code.

I don't think we should deviate from the style of that original code too much

I think that we should do what we say we do and follow the coding style, even though we have old code which doesn't, and fix the old code when we touch it. Or stop saying that we have the standard. Being of two minds in this respect is the worst.

One's logical order of includes is another's illogical order. Alphabetical order is unambiguous. Imagine I edit something unrelated in this file and also casually rearrange includes because I think my order is more logical. You then rearrange them again. Last thing I want to see is the battle of commits. Unless you have any idea how to unambiguously standardize the "logical order," it's better to stick tight to written guidelines.

One day I'll unleash clang-format on the codebase, and it will ultimately squish all style arguments. Which is a good thing, as I want to spend the first day of the rest of my life kaldying, not arguing about our coding style. :)

@danpovey
Contributor

danpovey commented Jun 23, 2021 via email

@KarelVesely84
Contributor Author

Okay, I give up for the moment. Dan's comment about extending lattice-compose is right.
I'll work on it later in a separate branch; a working contextual adaptation at the lattice level will be a good start.

One more thing: when I was going through Google papers, I noticed they use a sigma-matcher for contextual adaptation.
The boosting graph contains just the text snippets to be boosted and a sigma arc to match everything else.

I am not sure yet how difficult it would be to add support for sigma-composition to kaldi, for composing
a lattice with an FST having a sigma arc. But the 'wiring' might be analogous to the phi-composition that
is already there... So, maybe one day... Let's see...
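A hedged sketch of such a sigma-style boosting graph in OpenFst text format (all symbol ids are illustrative: label 999 stands in for the sigma "match anything else" label, and ids 15/23 for a two-word boosted snippet):

```
0 0 999 999 0    # sigma self-loop: pass everything else unchanged
0 1 15 15 -2     # first word of the boosted snippet, with bonus
1 2 23 23 -2     # second word of the snippet
2 2 999 999 0    # sigma self-loop after the snippet
0                # start state is final (snippet is optional)
2                # snippet-end state is final
```

Because the sigma arc matches any label, the graph stays tiny regardless of vocabulary size, which is the attraction of the sigma-matcher over listing every word.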

@daanzu
Contributor

daanzu commented Jun 24, 2021

@vesis84 I am quite interested in this feature, and appreciate you making the PR. Regardless of whether it gets merged right now, I would be curious to see a full example including the "adaptation graph" FST. Even seeing a pre-prepared FST would be enlightening, if you don't want to release the code that generates them.

@stale

stale bot commented Aug 23, 2021

This issue has been automatically marked as stale by a bot solely because it has not had recent activity. Please add any comment (simply 'ping' is enough) to prevent the issue from being closed for 60 more days if you believe it should be kept open.

@stale stale bot added the stale Stale bot on the loose label Aug 23, 2021
@kkm000 kkm000 added stale-exclude Stale bot ignore this issue stopped development On backburner. Please comment if you think it should not be. and removed stale Stale bot on the loose labels Sep 22, 2021
KarelVesely84 added a commit to KarelVesely84/kaldi that referenced this pull request Jan 25, 2022
- This is a follow-up of kaldi-asr#4571

- Refactoring 'lattice-compose.cc' to support composition with ark
  of fsts, so that it is done as Dan suggested before:

  I am thinking this can be done with a string arg called e.g.
  "--compose-with-fst", defaulting to "auto" which is the old behavior,
  meaning: rspecifier=lats, rxfilename=FST; and true/True or false/False
  is FST or lattice respectively.

- I also added the possibility of rho-composition, which is useful for
  biasing lattices with word sequences. Thanks to rho-composition,
  the biasing graph does not need to contain all words from the lexicon.

- Would you be interested in an example of how to use this?
  (i.e., creating graphs from a text file with a Python script
   using OpenFst as a library; but that would require changing
   the build of OpenFst to enable Python extensions)

- Also, which 'egs' recipe would it be convenient to use it with?
@danpovey
Contributor

I'm going to see if I can find someone to address the issues in this PR. I think it is an important topic that we should pursue.

danpovey pushed a commit that referenced this pull request Jan 31, 2022
… support RhoMatcher (#4692)

* Extending 'lattice-compose.cc' to compose with ark of fsts,

- This is a follow-up of #4571


* lattice-compose.cc, resolving remarks from PR #4692

* fixing an issue in std::transform with std::tolower (the 'suggesting variant of overloaded function' compiler error)

* lattice-compose, extending the rho explanation
@kkm000
Contributor

kkm000 commented May 9, 2022

Ahoj, @vesis84, why? This is a very useful feature. Was it merged in another PR?

@KarelVesely84
Contributor Author

KarelVesely84 commented May 10, 2022

Hi,
I thought it was not suitable for integration and the PR was stale.
Also, part of this PR was reworked and integrated into #4692 (the lattice-boosting part).

The remainder is the HCLG-boosting part. It worked for me in the paper with noisy input data:
http://www.fit.vutbr.cz/research/groups/speech/publi/2021/kocour21_interspeech.pdf
With HCLG boosting, only isolated words could be boosted; for more
complicated graphs, the composition would be very slow.

I was doing the composition here:
https://github.com/vesis84/kaldi/blob/atco2/src/nnet3bin/nnet3-latgen-faster-compose.cc#L239
For this, the HCLG graph is internally converted from ConstFst to VectorFst,
so that it becomes mutable.

HCLG boosting can also be dangerous: if the boosting values are too big,
the recognition output easily becomes very bad. Lattice boosting usually
produces decent results even with high boosting values.

Are you sure you want to continue with the HCLG boosting?
Or what should be the direction to continue?
I can work on it... let me know what the intention is...
(I can remove the lattice-boosting part from the PR, create some examples...)

K.

@KarelVesely84 KarelVesely84 reopened this May 10, 2022
@jtrmal
Contributor

jtrmal commented May 10, 2022 via email

@kkm000
Contributor

kkm000 commented Oct 11, 2022 via email

@jtrmal
Contributor

jtrmal commented Oct 12, 2022 via email

@jtrmal
Contributor

jtrmal commented Oct 12, 2022

@vesis84, do you have the bandwidth to work on this? It would be a welcome feature.
