
%collect line magic for using inside loops #432

Open
wants to merge 12 commits into base: master

Conversation

edoardovivo

This should solve #425.
It is probably just a working starting point. I haven't modified any test yet.
I have also added a notebook with some sample code.

Looking forward to your feedback. Cheers!

@aggFTW
Contributor

aggFTW commented Jan 25, 2018

Hey, I recently changed jobs and need to get cleared to work on OS projects. Will review soon I hope 😄

@aggFTW (Contributor) left a comment

LGTM overall 👍 thanks so much for doing this!

@argument("-e", "--coerce", type=str, default=None, help="Whether to automatically coerce the types (default, pass True if being explicit) "
"of the dataframe or not (pass False)")
@handle_expected_exceptions
def collect(self, line, local_ns=None):
Contributor

Should we call it spark_collect?

args = parse_argstring_or_throw(self.spark, line)
coerce = get_coerce_value(args.coerce)
if (len(args.command) > 0):
command = " ".join(args.command)
Contributor

Could you please follow the 4-space indentation convention?

def collect(self, line, local_ns=None):
args = parse_argstring_or_throw(self.spark, line)
coerce = get_coerce_value(args.coerce)
if (len(args.command) > 0):
Contributor

When would args.command be > 0? Sorry if I'm missing something super obvious.

Author

As far as I understand, args.command is the part of the line that does not contain arguments. This part can be empty (for instance if I do %spark_collect -o my_dataframe), or not (%spark_collect -c sql -o my_df SELECT * from my_table).

If it is empty, the join command would throw an error, so this is a way to get around it, but maybe there's a better way...

Contributor

I'm confused... where is args.command coming from? I thought that for you to be able to do that, you needed to define an @argument called command, and I don't see it there. Am I missing something?
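For readers following along: a positional argument declared with `nargs="*"` is the usual way a line magic ends up with `args.command` holding the non-option remainder of the line (such a declaration may live in argument definitions shared across magics rather than in this diff). A minimal stdlib sketch, with argparse standing in for IPython's `magic_arguments` and only the flag names taken from the diff; everything else is an assumption:

```python
# Sketch only: stdlib argparse standing in for IPython's magic_arguments.
# A positional "command" declared with nargs="*" collects every token not
# consumed by an option, which is how args.command can be populated even
# without an obvious @argument("command", ...) in the visible diff.
import argparse
import shlex

parser = argparse.ArgumentParser(prog="%spark_collect")
parser.add_argument("-c", "--context", type=str, default="spark")
parser.add_argument("-o", "--output", type=str, default=None)
parser.add_argument("command", type=str, nargs="*", default=[""])

# Options only, no trailing command: the default [""] joins to "".
empty = parser.parse_args(shlex.split("-o my_dataframe"))
assert " ".join(empty.command) == ""

# Trailing tokens land in args.command as a list of words.
full = parser.parse_args(shlex.split("-c sql -o my_df SELECT * from my_table"))
assert full.command == ["SELECT", "*", "from", "my_table"]
assert " ".join(full.command) == "SELECT * from my_table"
```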

else:
self.ipython_display.send_error("Context '{}' not found".format(args.context))



if (len(args.command) > 0):
command = " ".join(args.command)
else:
command = ""
Contributor

Please add a new line after every code block, like after this if/else code block.

else:
command = ""
if args.context == CONTEXT_NAME_SPARK:
return self.execute_spark(command, args.output, args.samplemethod, args.maxrows, args.samplefraction, None, coerce)
Contributor

Please add a new line after every return call in this method
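The dispatch under review can be sketched self-contained as below. The `CONTEXT_NAME_*` constants and the `execute_*`/`send_error` stubs are hypothetical stand-ins for sparkmagic's real helpers, not the merged implementation; the shape just illustrates the if/return flow with the requested blank line after each block:

```python
# Self-contained sketch of the context dispatch discussed above.
# CONTEXT_NAME_* and the execute_*/send_error stubs are hypothetical
# stand-ins for sparkmagic's helpers, not the merged implementation.
CONTEXT_NAME_SPARK = "spark"
CONTEXT_NAME_SQL = "sql"


class CollectMagicSketch:
    def __init__(self):
        self.errors = []

    def execute_spark(self, command, output, coerce):
        return ("spark", command, output, coerce)

    def execute_sqlquery(self, command, output, coerce):
        return ("sql", command, output, coerce)

    def send_error(self, message):
        self.errors.append(message)

    def collect(self, context, command, output=None, coerce=None):
        if context == CONTEXT_NAME_SPARK:
            return self.execute_spark(command, output, coerce)

        if context == CONTEXT_NAME_SQL:
            return self.execute_sqlquery(command, output, coerce)

        self.send_error("Context '{}' not found".format(context))
        return None
```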

@@ -0,0 +1,329 @@
{
Contributor

This file is amazing! Way to go!

Author

Thanks! :)

@edoardovivo
Author

Hey, I implemented all the suggested changes and wrote some unit tests too. I now realize I have to change the sample notebook too; I'll do it ASAP.

Let me know how it looks. Thanks!

@@ -548,3 +548,290 @@ def test_logs_exception():
assert result is None
ipython_display.send_error.assert_called_once_with(EXPECTED_ERROR_MSG
.format(get_logs_method.side_effect))


# New Tests for the spark_collect line magic
Contributor

No need to have this comment :)

Contributor

So nice to see tests! Thanks!


# New Tests for the spark_collect line magic

#@with_setup(_setup, _teardown)
Contributor

Why is this commented out?

Author

At first it was a mistake, but actually I don't think this test is necessary here. The analogous test for the %spark magic is meant to check the "subcommand" parameter, which does not exist in this case. If anyone writes a bad command in a %spark_collect line, it will always throw an error. What do you think?
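On the "a bad command always throws" point: with an argparse-style line parser, an unrecognized flag already fails at parse time, and a test can assert that directly. A stdlib sketch (argparse standing in for the magic's parser; not sparkmagic's actual test helpers):

```python
# Sketch only: shows that an argparse-style parser rejects a malformed
# %spark_collect line at parse time. argparse signals a parse error by
# printing usage and raising SystemExit, which the helper converts to None.
import argparse
import shlex

parser = argparse.ArgumentParser(prog="%spark_collect")
parser.add_argument("-o", "--output", type=str, default=None)
parser.add_argument("command", type=str, nargs="*", default=[""])

def parse_or_none(line):
    """Return parsed args, or None when the line fails to parse."""
    try:
        return parser.parse_args(shlex.split(line))
    except SystemExit:
        return None

assert parse_or_none("-o my_df") is not None
assert parse_or_none("--bogus-flag 1") is None
```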

spark_controller.run_command = run_cell_method

name = None
line = ""
Contributor

This variable is never used in tests. Please remove in all tests where it's not needed.

method_name = "sample"
commandline = "command"
line = " ".join([context, context_name, meth, method_name, commandline])

Contributor

Remove extra white space

commandline = "command"
line = " ".join([context, context_name, meth, method_name,
output, output_var, coer, coerce_value, commandline])

Contributor

Remove extra white space

run_cell_method = MagicMock()
run_cell_method.return_value = (True, "")
spark_controller.run_sqlquery = run_cell_method

Contributor

Remove extra new line


@edoardovivo
Author

Don't understand why it's failing...

@Tagar

Tagar commented Jan 23, 2019


It seems to be tornadoweb/tornado#2331, although that issue was closed as environment-specific.

Not sure if re-running this test would help it pass.

@Tagar

Tagar commented Jul 22, 2019

@itamarst perhaps that test is not broken against master?

3 participants