Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Analyzer plugin framework #2660

Draft
wants to merge 7 commits into
base: master
Choose a base branch
from
Draft

Conversation

tulinkry
Copy link
Contributor

@tulinkry tulinkry commented Feb 2, 2019

Introducing an analyzer plugin framework

  • loading analyzers implementing an interface from a directory
  • augmenting analyzer guru to use the loaded plugins
  • making analyzer guru a property of RuntimeEnvironment
  • loading the default analyzers as it used to be
  • I tried loading analyzers from opengrok.jar when implementing the interface and it worked

depends on #2659
fixes #2027
approaches #2588

@tulinkry tulinkry force-pushed the analyzer-wrapper branch 2 times, most recently from 2022f6d to 2fc9f32 Compare February 3, 2019 02:25
@coveralls
Copy link

coveralls commented Feb 3, 2019

@tarzanek tarzanek added this to the 1.2 milestone Feb 4, 2019
@tulinkry
Copy link
Contributor Author

tulinkry commented Feb 5, 2019

@vladak for review

@tulinkry tulinkry force-pushed the analyzer-wrapper branch 2 times, most recently from fc2c391 to b27c0d0 Compare February 8, 2019 15:06
*/
package org.opengrok.indexer.analysis;

public interface IAnalyzerPlugin {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please delete the I prefix? I know it is used in C# world but I don't see it that often in Java and I read in some books it should not be used because it shows unnecessary implementation details.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No doubt for that. The trouble here is that we’ve already exposed one interface for plugins with I so this is consistent with the current code.

We should make a change in a future to transfer both cases to the correct name but it’d be a breaking change and I think it shouldn’t be part of this pr.

private static void registerAnalyzer(AnalyzersInfo analyzersInfo, AnalyzerFactory factory) {
for (String name : factory.getFileNames()) {
AnalyzerFactory old = analyzersInfo.fileNames.put(name, factory);
assert old == null :
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, it's been so long I saw assertions :D Maybe last time in Lucene code.

Do we run the code with assertions enabled? I personally don't.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very old code and should be refactored. No production code should ever rely on assert statements.

* @throws IllegalAccessException if the constructor cannot be accessed
* @throws InstantiationException if the class cannot be instantiated
* @throws NoSuchMethodException if no-argument constructor could not be found
* @throws ClassCastException if the class is not a subclass of {@code
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you aligning the description? I personally don't like it because if I were to add VeryLongLongLongLongException then I would have to modify other places that have nothing to do with my changes which just makes the diff hard to read. That's the reason we don't do following:

int abc = 1;
int a   = 2;
int ab  = 3;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idea does that for all of javadoc stuff.

Descriptions and the actual code are two different things.

* {@link AnalyzerFramework#addPrefix(String, AnalyzerFactory)}
* are called to augment the value in {@link AnalyzerGuru#getVersionNo()}.
*/
public final SortedSet<String> customizations;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add getters? It just does't feel right this way.

Copy link
Contributor Author

@tulinkry tulinkry Feb 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I intentionally don’t want to use getters as this class does no provide any functionality so final properties are enough

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even simple DTO classes are always written with getters/setters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ve seen codes which don’t follow that. If this is code style for opengrok I can change it of course.

@vladak vladak self-requested a review February 12, 2019 17:26
* @param b the second instance
* @return true if we consider them different, false otherwise
*/
private static boolean factoriesDifferent(AnalyzerFactory a, AnalyzerFactory b) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this class be Comparable then ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an old code, I can do this in another PR so I don't mix refactoring with actual changes.

*/
public synchronized AnalyzerGuru getAnalyzerGuru() {
if (analyzerGuru == null) {
analyzerGuru = new AnalyzerGuru(getPluginDirectory());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should just split these 2 plugin directories into analyzers and authorization plugins. This way it is kind of confusing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not necessary as it's called plugins directory and you can put all your plugins there be it analyzer or authorization plugin. The application frameworks will load only the plugins they support.

However, if it's desired I can split these.

@vladak
Copy link
Member

vladak commented Feb 12, 2019

I wonder how does this interacts with analysis performed from within the webapp.

@vladak
Copy link
Member

vladak commented Feb 12, 2019

It would be nice if you could produce dummy analyzer plugin, ideally using language other than Java. That would also serve as a basis for documentation.

*/
public static final AnalyzerFactory DEFAULT_ANALYZER_FACTORY = new FileAnalyzerFactory();

private static final IAnalyzerPlugin[] DEFAULT_PLUGINS = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this is a temporary measure until the framework is fully implemented and in the final state it will dynamically load all the plugins from a directory - no hard-coding needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this is for seamless transition from the old approach to the new approach.

That is a good point though. I'm affraid that with that approach we would need to distribute two jar files instead of one and more to force people to actually put the second one somewhere (plugins directory) and point the configuration directory there and I think this is a little bit more than what I'd expect to get opengrok working.

@tulinkry
Copy link
Contributor Author

It would be nice if you could produce dummy analyzer plugin, ideally using language other than Java. That would also serve as a basis for documentation.

This is beyond my knowledge, maybe @tarzanek can produce the pdf analyzer to test this

@tulinkry
Copy link
Contributor Author

I wonder how does this interacts with analysis performed from within the webapp.

What do you mean?

 - loading analyzers implementing an interface from a directory
 - augmenting analyzer guru to use the loaded plugins
 - making analyzer guru a property of RuntimeEnvironment
 - loading the default analyzers as it used to be

approaches oracle#2588
@tulinkry
Copy link
Contributor Author

Rebased and updated. Make it a priority for next comments as any change in analyzer guru makes this hard to rebase correctly.

@tulinkry
Copy link
Contributor Author

It would be nice if you could produce dummy analyzer plugin, ideally using language other than Java. That would also serve as a basis for documentation.

This is beyond my knowledge, maybe @tarzanek can produce the pdf analyzer to test this.

I did a unit test with a dummy analyzer to test the basic functionality.

@idodeclare
Copy link
Contributor

Please hold on merging. I have some questions and issues and will write them up from work later today.

@idodeclare
Copy link
Contributor

I do like this patch's refactoring the static AnalyzerGuru -- though in some files you nicely added a local variable for accessing the environment's instance while in others you repeat over-and-over syntax like cfg.getEnv().getAnalyzerGuru() which to me is not as nice.


The approach of this patch treats OpenGrok's supported analyzers as merely a collection of fully independent AnalyzerFactory-derived classes and thus misses accommodating all the ways OpenGrok enlists analyzers.

While OpenGrok can match filename extensions or prefixes, it also selects analyzers in order-dependent non-heuristic and heuristic stages, relying on Matcher implementations running in the correct sequence. This patch does not accommodate these latter stages in the new model -- i.e. there is no capability for an AnalyzerFactory plug-in to register itself at a desired place in sequence. Alternatively there is no revision in this patch to implement e.g. an order-independent scoring model rather than a sequential model of Matcher instances.

You write that your approach provides "seamless transition from the old approach to the new approach" but really the new approach would not support properly registering the existing set of analyzers; the application does not break only because you copy the static ordering for DEFAULT_PLUGINS.

For supporting @tarzanek's desire for a PDF plug-in, only a plug-in analyzer that keys off of file extension would succeed reliably. Any implemented Matcher of the PDF magic number would be appended as the last of the sequence and never match PDFs because of the order-precedence of PlainAnalyzerFactory.


For another example of how misinterpreting OpenGrok's supported analyzers as fully independent does not accomodate extensibility as more analyzers are incorporated: to fully support Objective-C, it is not enough to add an Objective-C analyzer and factory. CAnalyzerFactory must be revised not to dominate for every .h file and to cooperate to pick the better analyzer for an .h file.


This patch is lacking any approach to modify AnalyzerGuru getVersionNo() to reflect plugged-in analyzers, and so it breaks the existing support for avoiding the need for full re-indexing when analyzer selection logic is altered by the user or when the user deploys a new version.


@vladak in #2588 directs that with an analyzer plug-in framework it "should be possible to add options to Ctags," which I agree is a necessary part of a pull request named "Analyzer plug-in framework" that is not just a trivial class loader.


Please may I head off any suggestion that this patch is but an introductory stage of an analyzer framework by remarking that this patch in its form is just a trivial jar/class loader [unlike your nice, stack-preserving AuthorizationFramework] and does not tackle any API revision needed for more complete support of an analyzer plug-in (e.g. the aforementioned Ctags options or Matcher order or versioning).

Perhaps we should work to revise the order-dependent Matcher stages instead to be numerically scoring so that they can be executed in any order and then ranked.

Overall I think some effort should be made to tackle some of the API improvements before adding jar/class loading. Otherwise we may find ourselves in a difficult architectural position with permanent, limited support of meaningful plug-ins.


any change in analyzer guru makes this hard to rebase correctly

You bring up the difficulty of rebasing large branches, which vexes me too. May I ask why you casually reordered a number of methods in AnalyzerGuru making for harder review of this patch and also for difficult merges for others? It's inconsiderate in my opinion.


BTW there are a couple of regressions and a log message error in this patch which I did not line-comment because of high-level concerns which I wanted to bring up first.


In my opinion that IDEA style of spaced-out JavaDoc should not be accepted by us.


Lastly I am reluctantly forced to name you as a serial offender in lifting code to different files without complying with CDDL's requirements for maintaining in the new files the copyright and descriptive text giving attribution for the original source. AnalyzerGuru's code is moved without attribution to AnalyzersInfo and AnalyzerFramework.

@tulinkry
Copy link
Contributor Author

Please may I head off any suggestion that this patch is but an introductory stage of an analyzer framework by remarking that this patch in its form is just a trivial jar/class loader [unlike your nice, stack-preserving AuthorizationFramework] and does not tackle any API revision needed for more complete support of an analyzer plug-in (e.g. the aforementioned Ctags options or Matcher order or versioning).

Perhaps we should work to revise the order-dependent Matcher stages instead to be numerically scoring so that they can be executed in any order and then ranked.

Overall I think some effort should be made to tackle some of the API improvements before adding jar/class loading. Otherwise we may find ourselves in a difficult architectural position with permanent, limited support of meaningful plug-ins.

This is exactly what this is meant to be. Transition from the old state to the new state with as minimal number of changes so it can be reviewed easily rather than fat PR with mixed refactoring and features.

@idodeclare
Copy link
Contributor

Yes I thought you might discount everything I brought up in my review by claiming a jar/class loader as a meaningful initial stage when the forgone API changes are more significant in my opinion.

As I wrote, "I think some effort should be made to tackle some of the API improvements before adding jar/class loading."

@tulinkry
Copy link
Contributor Author

I do like this patch's refactoring the static AnalyzerGuru -- though in some files you nicely added a local variable for accessing the environment's instance while in others you repeat over-and-over syntax like cfg.getEnv().getAnalyzerGuru() which to me is not as nice.

Will change, why not use an inline comment?

This patch is lacking any approach to modify AnalyzerGuru getVersionNo() to reflect plugged-in analyzers, and so it breaks the existing support for avoiding the need for full re-indexing when analyzer selection logic is altered by the user or when the user deploys a new version.

No problem I'll change it, why not use an inline comment?

@vladak in #2588 directs that with an analyzer plug-in framework it "should be possible to add options to Ctags," which I agree is a necessary part of a pull request named "Analyzer plug-in framework" that is not just a trivial class loader.

As it's in the description it only approaches the plugin framework - only the loading part. This is not to be meant as a complete solution rather a first step forward.

You bring up the difficulty of rebasing large branches, which vexes me too. May I ask why you casually reordered a number of methods in AnalyzerGuru making for harder review of this patch and also for difficult merges for others? It's inconsiderate in my opinion.

IDEA -> refactor –> move. I can walk though it again by myself.

BTW there are a couple of regressions and a log message error in this patch which I did not line-comment because of high-level concerns which I wanted to bring up first.

I'd like to know which ones as this does not really bring much new functionality nor changes any order for the analyzers.

In my opinion that IDEA style of spaced-out JavaDoc should not be accepted by us.

In my opinion it does not matter. If this is a consensus I can conform to it.

Lastly I am reluctantly forced to name you as a serial offender in lifting code to different files without complying with CDDL's requirements for maintaining in the new files the copyright and descriptive text giving attribution for the original source. AnalyzerGuru's code is moved without attribution to AnalyzersInfo and AnalyzerFramework.

That's why we have reviews. Why not using inline comments for affected files?

@tulinkry
Copy link
Contributor Author

For another example of how misinterpreting OpenGrok's supported analyzers as fully independent does not accomodate extensibility as more analyzers are incorporated: to fully support Objective-C, it is not enough to add an Objective-C analyzer and factory. CAnalyzerFactory must be revised not to dominate for every .h file and to cooperate to pick the better analyzer for an .h file.

This is not supported now and should be tackled when there is a need for it in my opinion.

@idodeclare
Copy link
Contributor

"why not use an inline comment"

When it's a broad problem, I put it in a top-level review which I don't think should affect your ability as the patch owner to correlate to the source code.

"This is not supported now and should be tackled when there is a need for it in my opinion"

I have raised pull requests related to this and have existing customizations for the same. If we're supporting a plug-in model where folks can customize OpenGrok without sharing their modifications, then my existing use cases are valid too.

@tulinkry
Copy link
Contributor Author

So as I understood the main concern is that we go from order-dependent approach to order-indepenent approach and the new loaded analyzers wouldn't quite work with that. I'll look into that.

@tulinkry
Copy link
Contributor Author

If we're supporting a plug-in model where folks can customize OpenGrok without sharing their modifications, then my existing use cases are valid too.

If you or folks already customize their OpenGrok analyzers then I don't think a plug-in model should affect your or folks's ability to customize it in any way further.

@idodeclare
Copy link
Contributor

@vladak asked, "I wonder how does this interacts with analysis performed from within the webapp."

The webapp can use AnalyzerGuru to re-analyze e.g. an historical version so the question is whether all customizations that happens while indexing are (properly) applicable when run from the webapp (including your changes).

@tulinkry
Copy link
Contributor Author

tulinkry commented Feb 24, 2019

The webapp can use AnalyzerGuru to re-analyze e.g. an historical version so the question is whether all customizations that happens while indexing are (properly) applicable when run from the webapp (including your changes).

As the webapp loads the same plugins, I wouldn't expect any differencies, however, will verify it.

*/

/*
* Copyright (c) 2019 Oracle and/or its affiliates. All rights reserved.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems derived from AnalyzerGuru so missing attribution.

ver |= (long)customizationHashCode << 32;
final int customizations = Objects.hash(this.pluginFramework.getAnalyzersInfo().customizations.toArray());
if (customizations != 0) {
ver |= (long) customizations << 32;
}
return ver;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this value should be plug-in-aware

* @throws ForbiddenSymlinkException if symbolic-link checking encounters
* an ineligible link
*/
public void populateDocument(Document doc, File file, String path,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why relocate?

* @throws java.io.IOException If an error occurs while accessing the data
* in the input stream.
*/
public AbstractAnalyzer getAnalyzer(InputStream in, String file) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why relocate?

* @throws java.io.IOException If an error occurs while accessing the input
* stream.
*/
public String getContentType(InputStream in, String file) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why relocate?

* @param project Project the file belongs to
* @throws java.io.IOException If an error occurs while creating the output
*/
public static void writeXref(AnalyzerFactory factory, Reader in,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why relocate?

* @param project project the file belongs to
* @throws java.io.IOException if an error occurs while creating the output
*/
public static void writeDumpedXref(String contextPath,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why relocate?

* @param factory the factory
* @see AnalyzerFramework#addPrefix(String, AnalyzerFactory)
*/
public void addPrefix(String prefix, AnalyzerFactory factory) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why relocate?

* @param factory the factory
* @see AnalyzerFramework#addExtension(String, AnalyzerFactory) (String, AnalyzerFactory)
*/
public void addExtension(String extension, AnalyzerFactory factory) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why relocate?

*/

/*
* Copyright (c) 2019 Oracle and/or its affiliates. All rights reserved.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing attributions?

@@ -149,8 +149,8 @@ public static void main(String[] argv) {
PrintStream helpStream = status != 0 ? System.err : System.out;
helpStream.println(helpUsage);
if (helpDetailed) {
helpStream.println(AnalyzerGuruHelp.getUsage());
helpStream.println(ConfigurationHelp.getSamples());
System.err.println(AnalyzerGuruHelp.getUsage(RuntimeEnvironment.getInstance().getAnalyzerGuru()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

regression

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right

helpStream.println(AnalyzerGuruHelp.getUsage());
helpStream.println(ConfigurationHelp.getSamples());
System.err.println(AnalyzerGuruHelp.getUsage(RuntimeEnvironment.getInstance().getAnalyzerGuru()));
System.err.println(ConfigurationHelp.getSamples());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

regression

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right

@@ -216,7 +216,8 @@ private PluginType loadClass(String classname) throws ClassNotFoundException,

// check for implemented interfaces or extended superclasses
for (Class intf1 : getSuperclassesAndInterfaces(c)) {
if (intf1.getCanonicalName().equals(classType.getCanonicalName())
if (intf1.getCanonicalName() != null && // anonymous classes
intf1.getCanonicalName().equals(classType.getCanonicalName())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See line 226 below which logs re IAuthorizationPlugin that is no longer applicable for all PluginFramework subclasses

@ahornace ahornace modified the milestones: 1.2, 1.4 Aug 5, 2019
@vladak vladak mentioned this pull request Nov 18, 2019
@tarzanek
Copy link
Contributor

ordering of analyzers is important to fix (note to this issue from Krystof)

@tulinkry
Copy link
Contributor Author

I've thought about possible solutions:

  • topological sort based on hardcoded analyzer dependencies (other analyzers full name) - robust, but can require more configuration
  • inserting new analyzers to a specified index (kinda error prone when the array gets bigger)
  • scoring from the matcher for all analyzers, and then sorting based on score (I wasn't going deep in this)

@vladak vladak marked this pull request as draft November 9, 2020 10:24
@vladak
Copy link
Member

vladak commented Nov 9, 2020

Looks like this needs some work (and rebase), converting to draft.

@vladak vladak removed this from the 1.4 milestone Apr 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

static initializer of AnalyzerGuru causing hard to detect deployment problems
6 participants