Import-DbaCsv, map correct types for BulkCopy #9479

Merged on Oct 5, 2024 (11 commits)

@niphlod (Contributor) commented Sep 26, 2024

Type of Change

  • Bug fix (non-breaking change, fixes DbaCsv needs support for GUID target columns #9433 and others )
  • New feature (non-breaking change, adds functionality, fixes # )
  • Breaking change (affects multiple commands or functionality, fixes # )
  • Ran manual Pester test and has passed (.\tests\manual.pester.ps1)
  • Adding code coverage to existing functionality
  • Pester test is included
  • If new file reference added for test, has it been added to github.com/dataplat/appveyor-lab ?
  • Unit test is included
  • Documentation
  • Build system

Purpose

Broaden the use cases of Import-DbaCsv.
Right now an import either never fails (creating a table with all columns as nvarchar(max)) or fails as soon as the target table has some not-that-weird types (bit or guid, for example).
This should also greatly speed up loads into existing tables, since users no longer need to "stage" into an nvarchar(max) table and then copy over to the destination table.
My guess is that it helps even without counting the "stage-then-move" step: loading a pre-existing table with the correct types should be faster than loading into an nvarchar(max) table.

Approach

It's long and winding, but I made the code redundant and verbose (hopefully readable and obvious as well) rather than short and speedy. My 2c here is that this bit of code is rather convoluted, so it's better to be explicit more than anything. The cost (in milliseconds) of adding the proper machinery is minimal once you take into account loading even just 1k rows.

I'm unsure about approaching the "get the types from the table" using SMO (like this version) or if it'd be better to just use a quicker query going directly to system tables to get "name, datatype" resultset and use that for the mapping. Pro of that is that we don't open another connection, because at that point of the function one has already been established and a transaction is pending, so re-using $server for Get-DbaDbTable ends up in errors.
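To illustrate the "name, datatype" mapping idea, here is a minimal sketch of a lookup from SQL Server type names to .NET types. The table `$sqlToDotNet` and the function `Get-DotNetTypeForSqlType` are hypothetical names made up for this example; they are not the code in this PR, and the list of types is deliberately incomplete.

```powershell
# Hypothetical lookup from SQL Server type names to .NET types (illustrative, not exhaustive).
$sqlToDotNet = @{
    'bit'              = [bool]
    'uniqueidentifier' = [guid]
    'int'              = [int]
    'bigint'           = [long]
    'float'            = [double]
    'decimal'          = [decimal]
    'datetime2'        = [datetime]
}

function Get-DotNetTypeForSqlType {
    param(
        [Parameter(Mandatory)][string]$SqlType
    )
    # Known types get a specific .NET type; anything else falls back to string,
    # which mirrors the "everything is nvarchar(max)" behaviour.
    if ($sqlToDotNet.ContainsKey($SqlType)) {
        return $sqlToDotNet[$SqlType]
    }
    return [string]
}

Get-DotNetTypeForSqlType -SqlType 'uniqueidentifier'   # the [guid] type
Get-DotNetTypeForSqlType -SqlType 'xml'                # falls back to [string]
```

Falling back to string keeps unmapped types on the old behaviour instead of failing outright.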

There's another bit that's missing which is enabling this on a csv that has no headers... not sure if it's needed by users but if so the code gets longer because we can't let lumen do "quite a bit of work" (the $reader.GetFieldHeaders() call)

That being said, @potatoqualitee is the original creator and MAY have an opinionated view on how to tackle that.
@petervandivier : I got it working for most of my tables, but you may want to give it a whirl and maybe test extensively with any type you can throw at it.

Commands to test

The one on the report is good, short and simple

Invoke-DbaQuery `
    -SqlInstance localhost `
    -Database tempdb `
    -Query "create table foo ([Guid] uniqueidentifier);"

New-Guid | Export-Csv ./foo.csv 

Import-DbaCsv `
    -SqlInstance localhost `
    -Database tempdb `
    -Table foo `
    -Path ./foo.csv `
    -SingleColumn

NB: this is a first PROPOSAL. If this approach is "accepted" I'm going to add a few tests for it in new commits.

$tableDef = Get-DbaDbTable $instance -SqlCredential $SqlCredential -Database $Database -Table $table -Schema $schema

if ($tableDef.Count -ne 1) {
Stop-Function -Message "Could not fetch table definition for table $table in schema $schema" -ErrorRecord $_
@niphlod (Contributor Author) commented Sep 26, 2024

I'm unsure if this is the right thing to do here ... at this point I don't see any branches in code that could allow reaching here and not having a table present, but ... "what if"?

Member

I'd leave it cuz we encounter "what if" more than we'd expect.

@potatoqualitee
Member

My only opinion is that it's wonderful to have this addressed 😁 Thank you! I'll re-run whatever is failing to see if we can get it all green.

@potatoqualitee
Member

Oh, oops. it's a failure on Import-DbaCsv, @niphlod

@niphlod
Contributor Author

niphlod commented Sep 27, 2024

Oh, oops. it's a failure on Import-DbaCsv, @niphlod

yeah, I know. did need some sleep yesterday and hadn't time to look into it.
If you like the implementation I can go ahead and iron out bits and pieces here and there to make it work as it is, and I'll try to do that even when headers are missing

@potatoqualitee
Member

Yes! Thank you 🙏🏼

@potatoqualitee
Member

This is so good, @niphlod! Thank you. Will merge when andreas approves.

@andreasjordan (Contributor) left a comment

Looks good. added some minor suggestions.

# we do not use $server because the connection is active here
$tableDef = Get-TableDefinitionFromInfoSchema -table $table -schema $schema -sqlconn $sqlconn
if ($tableDef.Length -eq 0) {
Stop-Function -Message "Could not fetch table definition for table $table in schema $schema" -ErrorRecord $_
Contributor

I think -ErrorRecord $_ should only be used in a catch block. Here we don't have an error.

Contributor Author

I knew I was missing something
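
For readers unfamiliar with the point about `$_`: inside a `catch` block, `$_` holds the ErrorRecord of the failure just caught, which is why forwarding it only makes sense there. A minimal sketch, using a hypothetical helper `Invoke-WithErrorCapture` that is not part of dbatools:

```powershell
function Invoke-WithErrorCapture {
    param(
        [Parameter(Mandatory)][scriptblock]$Action
    )
    try {
        & $Action
    } catch {
        # Inside catch, $_ is the ErrorRecord for the failure that was just caught,
        # so this is where forwarding it (e.g. -ErrorRecord $_) is meaningful.
        return $_
    }
}

$record = Invoke-WithErrorCapture -Action { throw 'simulated failure' }
$record.GetType().FullName     # System.Management.Automation.ErrorRecord
$record.Exception.Message      # simulated failure
```

In the reviewed `if` branch no error has been thrown, so there is no record to forward and the message alone is enough.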

# start by getting the table definition
$tableDef = Get-TableDefinitionFromInfoSchema -table $table -schema $schema -sqlconn $sqlconn
if ($tableDef.Length -eq 0) {
Stop-Function -Message "Could not fetch table definition for table $table in schema $schema" -ErrorRecord $_
Contributor

Remove -ErrorRecord $_

@@ -663,8 +830,8 @@ function Import-DbaCsv {
$bulkCopy.Add_SqlRowsCopied( {
$script:totalRowsCopied += (Get-AdjustedTotalRowsCopied -ReportedRowsCopied $args[1].RowsCopied -PreviousRowsCopied $script:prevRowsCopied).NewRowCountAdded

$tstamp = $(Get-Date -Format 'yyyyMMddHHmmss')
Write-Message -Level Verbose -Message "[$tstamp] The bulk copy library reported RowsCopied = $($args[1].RowsCopied). The previous RowsCopied = $($script:prevRowsCopied). The adjusted total rows copied = $($script:totalRowsCopied)"
#Write-Message -Level Verbose -FunctionName "Import-DbaCsv" -Message " The bulk copy library reported RowsCopied = $($args[1].RowsCopied). The previous RowsCopied = $($script:prevRowsCopied). The adjusted total rows copied = $($script:totalRowsCopied)"
Contributor

Should be removed.

Contributor Author

yep. BTW, found instances of the same logging which results in a "bad-looking" message so once this goes in I'll replace other occurrences through our functions as well

public/Import-DbaCsv.ps1 (outdated)
$sqlconn
)

$query = "SELECT c.COLUMN_NAME, c.DATA_TYPE, c.ORDINAL_POSITION - 1 FROM INFORMATION_SCHEMA.COLUMNS AS c WHERE TABLE_SCHEMA = @schema AND TABLE_NAME = @table;"
Member

Any chance this will ever get math error on the ordinal position?

The ExecuteReader() should be in a try/catch just in case it can/does.

Contributor Author

ORDINAL_POSITION always starts at 1, so subtracting 1 never goes below 0. I'll wrap ExecuteReader() in a try/catch in a bit.
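
A rough sketch of what wrapping the reader could look like, assuming an already-open connection object such as a SqlConnection. The function name `Get-ColumnTypeSketch`, its parameters, and the `ColumnIndex` alias are hypothetical and chosen for illustration only; the PR's own helper is `Get-TableDefinitionFromInfoSchema`, whose full body is not shown in this thread.

```powershell
function Get-ColumnTypeSketch {
    param(
        [Parameter(Mandatory)][string]$Table,
        [Parameter(Mandatory)][string]$Schema,
        # Assumption: an already-open connection (e.g. a SqlConnection)
        [Parameter(Mandatory)]$SqlConn,
        # Assumption: pass the pending transaction when one is active on the connection
        $Transaction
    )
    $cmd = $SqlConn.CreateCommand()
    $cmd.CommandText = "SELECT c.COLUMN_NAME, c.DATA_TYPE, c.ORDINAL_POSITION - 1 AS ColumnIndex FROM INFORMATION_SCHEMA.COLUMNS AS c WHERE c.TABLE_SCHEMA = @schema AND c.TABLE_NAME = @table;"
    if ($Transaction) { $cmd.Transaction = $Transaction }
    $null = $cmd.Parameters.AddWithValue('@schema', $Schema)
    $null = $cmd.Parameters.AddWithValue('@table', $Table)

    $result = @()
    try {
        $reader = $cmd.ExecuteReader()
        try {
            while ($reader.Read()) {
                $result += [pscustomobject]@{
                    Name     = $reader.GetString(0)
                    DataType = $reader.GetString(1)
                    Index    = $reader.GetInt32(2)
                }
            }
        } finally {
            # Always release the reader so the connection can be reused for the bulk copy
            $reader.Dispose()
        }
    } catch {
        # A failure here is a real error, so $_ carries the ErrorRecord
        throw "Could not read column metadata for [$Schema].[$Table]: $($_.Exception.Message)"
    }
    return $result
}
```

The inner `finally` disposes the reader on both success and failure, which matters here because the same connection is reused afterwards.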

@potatoqualitee
Member

hell yeah, thanks everyone! 🔥

@potatoqualitee potatoqualitee merged commit 6025137 into dataplat:development Oct 5, 2024
3 checks passed

Successfully merging this pull request may close these issues.

DbaCsv needs support for GUID target columns
4 participants