Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

Improved cursor comparison - only showing different lines #527

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 18 commits into from
Jan 5, 2018

Conversation

pesse
Copy link
Member

@pesse pesse commented Nov 29, 2017

At least partly fixes #454

@@ -0,0 +1,4 @@
create global temporary table ut_cursor_data_diff(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comments in tables are allowed but no empty lines.

Copy link
Member

@Pazus Pazus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except a small refactor suggestion everything is ok.

where nvl(dbms_lob.compare(xmlserialize( content exp.row_data no indent), xmlserialize( content act.row_data no indent)),1) != 0
and rownum <= 1;
where nvl(dbms_lob.compare(xmlserialize( content exp.row_data no indent), xmlserialize( content act.row_data no indent)),1) != 0;
select count(1) into l_result from ut_cursor_data_comp where rownum <= 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use SQL%ROWCOUNT instead of selecting rows number after select.

@jgebal
Copy link
Member

jgebal commented Nov 29, 2017

Can you add tests, so are covered for regression?

  • Succeeds when cursors are same.
  • Succeeds bot when cursors are empty.
  • Fails and returns rows from actual that are not in expected
  • Fails and returns rows from expected that are not in actual

any others?

@jgebal
Copy link
Member

jgebal commented Nov 29, 2017

My concern is that there is no limit on the data returned.
If the cursor will contain 100 cols and 10M rows, it will cause some serious issues.

@Pazus
Copy link
Member

Pazus commented Nov 29, 2017

Good point, Jacek. We still need a limit...

from (select case when l_xpath is not null then deletexml( ucd.row_data, l_xpath ) else ucd.row_data end as row_data,
ucd.row_no
from ut_cursor_data ucd where ucd.cursor_data_guid = self.data_value) exp
full outer join (select case when l_xpath is not null then deletexml( ucd.row_data, l_xpath ) else ucd.row_data end as row_data,
ucd.row_no
from ut_cursor_data ucd where ucd.cursor_data_guid = l_other.data_value) act
on (exp.row_no = act.row_no)
where nvl(dbms_lob.compare(xmlserialize( content exp.row_data no indent), xmlserialize( content act.row_data no indent)),1) != 0
and rownum <= 1;
where nvl(dbms_lob.compare(xmlserialize( content exp.row_data no indent), xmlserialize( content act.row_data no indent)),1) != 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add limit here (in the insert statement), so that we only process first 50 - 100 differences.
Why 50 - 100?
10 seems not enough, but anything more than 50 - 100 doesn't bring much value anyway.
If you have so may differences, you're in serious trouble anyway.

If first 100 rows doesn't help you identify what the problem is - your test is not distilled properly and you're testing all in one go probably.

Copy link
Member

@jgebal jgebal Dec 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second thought on this idea.
Compare all rows.
Remember toe count of:

  • rows in A
  • rows in B
  • differences
    Report first x differences (having x 50 .. 100)
    Report also number of rows in A, number of rows in B and total number of differences.

That will give much more information about failed expectations,

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had something similar in mind myself - would definitely help to output exact number of mismatches because we got them anyways...

@mathewbutler
Copy link

I said a long time ago that I’d look at an alternative implementation. The best I’ve done is to put together a simple SQL implementation of a compare. The below does two full table scans to identify differences. It’s an asktom special - forget the name of the person on the three that worked through the problem to come up with this.

select c1, c2, c3,
2 count(src1) CNT1,
3 count(src2) CNT2
4 from
5 ( select a.,
6 1 src1,
7 to_number(null) src2
8 from a
9 union all
10 select b.
,
11 to_number(null) src1,
12 2 src2
13 from b
14 )
15 group by c1,c2,c3
16 having count(src1) <> count(src2)

My idea was to use this and dynamically produce the query. With testing to see if special handling needed for all data types.

Laptop times for comparing two 1M tables were .8 seconds and low resource usage.

Leaving this here, as above forms the basis of a POC. We said that we wouldn’t change the implementation unless there were clear benefits or new requirements.

Cheers.

@jgebal
Copy link
Member

jgebal commented Dec 1, 2017

@mathewbutler Agreed.
We can always do this for simple datatypes, on top of what we do.

That is, if not clob, blob, xmltype etc - use plain SQL compare, else do to XML and compare CLOBs.

For now, we definitely want to improve reporting side, so that we show rows that differ instead of first 'x' rows from data-set.
I think it's a nice step forward.

@pesse
Copy link
Member Author

pesse commented Dec 1, 2017

Seems the changes broke the exclude-possibility of columns. Might take me some time to get it sorted out.

@jgebal jgebal merged commit 2137439 into develop Jan 5, 2018
@jgebal jgebal deleted the feature/improve_cursor_comparison branch January 5, 2018 19:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add ability to report only the rows that are different
4 participants
Morty Proxy This is a proxified and sanitized view of the page, visit original site.