Unstable hash values for k>64 #41

Closed
@shenwei356


Hi, thanks for inventing this great algorithm! I'm also glad to see that ntHash2 has been published.

I've used ntHash in several of my projects, specifically through a Golang implementation (by @will-rowe) of ntHash v1.0.4. It's really fast!

I just found that the hash values are unstable for k > 64: the hash of the same k-mer changes depending on the base that precedes it in the sequence. I guess it's related to the algorithm itself, because the threshold happens to be exactly 64 and not some other number. I'm not familiar with C/C++, so I can't check the original implementation. Here are the steps to reproduce the issue with tools that use the Golang implementation.

$ echo -ne ">s\nACGAAGAATACACAACTATGTACCGGGGGGCTTTGGGGAGAAAAAGGAAAAAATAAAATCTTTAA\n" | seqkit stats
file  format  type  num_seqs  sum_len  min_len  avg_len  max_len
-     FASTA   DNA          1       65       65       65       65

# compute canonical ntHash with https://github.com/shenwei356/unikmer
$ echo -ne ">s\nACGAAGAATACACAACTATGTACCGGGGGGCTTTGGGGAGAAAAAGGAAAAAATAAAATCTTTAA\n" \
    | unikmer count -H -K -k 65 -l | unikmer view
14872199115326727818  ****

# Now prepend one base to the sequence. The second hash (marked ****) belongs to the same 65-mer as above and should not change.
$ echo -ne ">s\naACGAAGAATACACAACTATGTACCGGGGGGCTTTGGGGAGAAAAAGGAAAAAATAAAATCTTTAA\n" \
    | unikmer count -H -K -k 65 -l | unikmer view
5042997269030491403
13219011773654478434  ****

$ echo -ne ">s\ncACGAAGAATACACAACTATGTACCGGGGGGCTTTGGGGAGAAAAAGGAAAAAATAAAATCTTTAA\n" \
    | unikmer count -H -K -k 65 -l | unikmer view
5252439041897790003
1248486797404628174  ****

$ echo -ne ">s\ngACGAAGAATACACAACTATGTACCGGGGGGCTTTGGGGAGAAAAAGGAAAAAATAAAATCTTTAA\n" \
    | unikmer count -H -K -k 65 -l | unikmer view
6432712771380638299
10232415768797241538  ****

$ echo -ne ">s\ntACGAAGAATACACAACTATGTACCGGGGGGCTTTGGGGAGAAAAAGGAAAAAATAAAATCTTTAA\n" \
    | unikmer count -H -K -k 65 -l | unikmer view
5774425108696765737
11299031900349869606  ****

For k=64, however, the hash of the shared 64-mer is stable (7718595180140858881):

$ echo -ne ">s\nCGAAGAATACACAACTATGTACCGGGGGGCTTTGGGGAGAAAAAGGAAAAAATAAAATCTTTAA\n"  \
    | unikmer count -H -K -k 64 -l | unikmer view
7718595180140858881

$ echo -ne ">s\naCGAAGAATACACAACTATGTACCGGGGGGCTTTGGGGAGAAAAAGGAAAAAATAAAATCTTTAA\n"  \
    | unikmer count -H -K -k 64 -l | unikmer view
8752650216170443135
7718595180140858881

$ echo -ne ">s\ncCGAAGAATACACAACTATGTACCGGGGGGCTTTGGGGAGAAAAAGGAAAAAATAAAATCTTTAA\n"  \
    | unikmer count -H -K -k 64 -l | unikmer view
9222164657016850147
7718595180140858881

$ echo -ne ">s\ntCGAAGAATACACAACTATGTACCGGGGGGCTTTGGGGAGAAAAAGGAAAAAATAAAATCTTTAA\n"  \
    | unikmer count -H -K -k 64 -l | unikmer view
8329673441993552238
7718595180140858881
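
One possible explanation for the threshold at exactly 64, offered as a guess and not verified against either the C++ or the Golang code: ntHash works on 64-bit words and removes the outgoing base with a rotation by k. If that rotation is written as two plain shifts, it stops being a rotation once k exceeds 64. The Go sketch below only illustrates this effect. `naiveRol` and `safeRol` are hypothetical helper names, and the input value is arbitrary.

```go
package main

import (
	"fmt"
	"math/bits"
)

// naiveRol is a hypothetical rotate-left written with two shifts.
// In Go, shifting a uint64 by 64 or more bits yields 0, and 64-n
// wraps around to a huge count when n > 64, so this is a true
// rotation only for 0 <= n <= 64.
func naiveRol(v uint64, n uint) uint64 {
	return (v << n) | (v >> (64 - n))
}

// safeRol uses the standard library, which reduces the count modulo 64.
func safeRol(v uint64, n uint) uint64 {
	return bits.RotateLeft64(v, int(n))
}

func main() {
	const v uint64 = 0x0123456789abcdef // arbitrary example value, not an ntHash seed
	for _, k := range []uint{63, 64, 65} {
		fmt.Printf("k=%d naive=%016x safe=%016x\n", k, naiveRol(v, k), safeRol(v, k))
	}
}
```

If the rotation collapses to 0 for k=65 as in `naiveRol`, the contribution of the outgoing base would never be cancelled, so the rolling hash would keep depending on the preceding base. That would be consistent with the output above, where k=64 is stable and k=65 is not, but it remains an assumption until someone checks the actual implementations.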

