Based on this question, I am trying to find duplicates in my generated hash list. Duplicates could occur because I accidentally used the same X and Y values, or because the first three digits of two hashes are equal. In that case, I want to raise a compile error like this: \errmessage{Hash \tempHash already used!}.

Everything inside the \calcHash function seems to be very fragile, and I don't know how to add functionality without breaking the hash calculation.

In the following MWE I tried to find duplicates via a datatool database. This is not mandatory, though: if there is another easy way to search for duplicates, I'm absolutely fine with it :)

\documentclass[10pt,a4paper]{article}
\usepackage{pgfplotstable}
\usepackage{xstring}
\usepackage{datatool}
\pgfplotsset{compat=newest}

\DTLnewdb{hashDB}
\newcommand{\calcHash}[1]{
    \noexpand\StrLeft{\pdfmdfivesum{#1}}{3}
    \newcommand{\tempHash}{\StrLeft{\pdfmdfivesum{#1}}{3}}
    %\DTLifdbempty{hashDB}
    %{
        %\DTLnewrow{hashDB}
        %\DTLnewdbentry{hashDB}{Hash}{\tempHash}
    %}
    %\DTLforeach{hashDB}{\hash=hash}
    %{
        %\ifthenelse{\equal{\tempHash}{\hash}}
        %{
        %    \errmessage{Hash \tempHash already used!}
        %}{}
    %}
}

\pgfplotstableread[]{
X Y
1 a
2 b
5 c
}\mydata

\begin{document}

\pgfplotstablecreatecol[
    create col/assign/.code={
        \edef\myHash{\noexpand\calcHash{\thisrow{X}\thisrow{Y}}}
        \pgfkeyslet{/pgfplots/table/create col/next content}\myHash
    }]{ID}{\mydata}
\pgfplotstablegetrowsof{\mydata}
\pgfmathtruncatemacro\myDataRows{\pgfplotsretval-1}

\pgfplotstabletypeset[string type]{\mydata}
\end{document}

PascalS

1 Answer

The following uses an L3 sequence and the L3 MD5 hash function to implement your \calcHash. Note that \calcHash is executed directly where it is used; it is not stored in another macro that is then assigned to the next content.

\documentclass[10pt,a4paper]{article}
\usepackage{pgfplotstable}
\pgfplotsset{compat=newest}

\ExplSyntaxOn
\str_new:N \l__pascals_hash_str
\seq_new:N \g__pascals_hashes_seq
\msg_new:nnn { pascals } { duplicate-hash } { Hash~ #1~ already~ used! }
\cs_generate_variant:Nn \str_set:Nn { Ne }
\cs_new_protected:Npn \__pascals_calc_hash:n #1
  {
    \str_set:Ne \l__pascals_hash_str { \str_mdfive_hash:e {#1} }
    \seq_if_in:NVTF \g__pascals_hashes_seq \l__pascals_hash_str
      { \msg_error:nnV { pascals } { duplicate-hash } \l__pascals_hash_str }
      { \seq_gput_right:NV \g__pascals_hashes_seq \l__pascals_hash_str }
    \pgfkeyslet { /pgfplots/table/create~ col/next~ content }
      \l__pascals_hash_str
  }
\NewDocumentCommand \clearHashes {} { \seq_gclear:N \g__pascals_hashes_seq }
\NewDocumentCommand \calcHash { m } { \__pascals_calc_hash:n {#1} }
\ExplSyntaxOff

\pgfplotstableread[]{
X Y
1 a
2 b
5 c
}\mydata

\begin{document}

\clearHashes
\pgfplotstablecreatecol[
    create col/assign/.code={%
        \calcHash{\thisrow{X}\thisrow{Y}}%
    }]{ID}{\mydata}
\pgfplotstablegetrowsof{\mydata}
\pgfmathtruncatemacro\myDataRows{\pgfplotsretval-1}

\pgfplotstabletypeset[string type]{\mydata}
\end{document}
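
To see the duplicate check fire, here is a minimal sketch that reuses the definitions from the code above and only changes the data. The table name \mydataDup and its fourth row are hypothetical additions for illustration. It assumes an L3 programming layer recent enough to provide \str_mdfive_hash:e. Since the fourth row repeats the "1 a" of the first row, the duplicate-hash error should be raised while that row is processed.

```latex
\documentclass[10pt,a4paper]{article}
\usepackage{pgfplotstable}
\pgfplotsset{compat=newest}

\ExplSyntaxOn
\str_new:N \l__pascals_hash_str
\seq_new:N \g__pascals_hashes_seq
\msg_new:nnn { pascals } { duplicate-hash } { Hash~ #1~ already~ used! }
\cs_generate_variant:Nn \str_set:Nn { Ne }
\cs_new_protected:Npn \__pascals_calc_hash:n #1
  {
    % Compute the MD5 hash of the fully expanded argument.
    \str_set:Ne \l__pascals_hash_str { \str_mdfive_hash:e {#1} }
    % Known hash: raise an error. New hash: remember it.
    \seq_if_in:NVTF \g__pascals_hashes_seq \l__pascals_hash_str
      { \msg_error:nnV { pascals } { duplicate-hash } \l__pascals_hash_str }
      { \seq_gput_right:NV \g__pascals_hashes_seq \l__pascals_hash_str }
    \pgfkeyslet { /pgfplots/table/create~ col/next~ content }
      \l__pascals_hash_str
  }
\NewDocumentCommand \clearHashes {} { \seq_gclear:N \g__pascals_hashes_seq }
\NewDocumentCommand \calcHash { m } { \__pascals_calc_hash:n {#1} }
\ExplSyntaxOff

% Hypothetical test table: the fourth row repeats "1 a" from the first row.
\pgfplotstableread[]{
X Y
1 a
2 b
5 c
1 a
}\mydataDup

\begin{document}

\clearHashes
% The duplicate-hash error is expected while the fourth row is processed.
\pgfplotstablecreatecol[
    create col/assign/.code={%
        \calcHash{\thisrow{X}\thisrow{Y}}%
    }]{ID}{\mydataDup}

\pgfplotstabletypeset[string type]{\mydataDup}
\end{document}
```

Removing the fourth row should let the document compile without the error, as in the original MWE.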


A variant that only uses the first three characters of the resulting hash:

\documentclass[10pt,a4paper]{article}
\usepackage{pgfplotstable}
\pgfplotsset{compat=newest}

\ExplSyntaxOn
\str_new:N \l__pascals_hash_str
\seq_new:N \g__pascals_hashes_seq
\msg_new:nnn { pascals } { duplicate-hash } { Hash~ #1~ already~ used! }
\cs_generate_variant:Nn \str_set:Nn { Ne }
\cs_generate_variant:Nn \str_range:nnn { e }
\cs_new_protected:Npn \__pascals_calc_hash:n #1
  {
    \str_set:Ne \l__pascals_hash_str
      { \str_range:enn { \str_mdfive_hash:e {#1} } { 1 } { 3 } }
    \seq_if_in:NVTF \g__pascals_hashes_seq \l__pascals_hash_str
      { \msg_error:nnV { pascals } { duplicate-hash } \l__pascals_hash_str }
      { \seq_gput_right:NV \g__pascals_hashes_seq \l__pascals_hash_str }
    \pgfkeyslet { /pgfplots/table/create~ col/next~ content }
      \l__pascals_hash_str
  }
\NewDocumentCommand \clearHashes {} { \seq_gclear:N \g__pascals_hashes_seq }
\NewDocumentCommand \calcHash { m } { \__pascals_calc_hash:n {#1} }
\ExplSyntaxOff

\pgfplotstableread[]{
X Y
1 a
2 b
5 c
}\mydata

\begin{document}

\clearHashes
\pgfplotstablecreatecol[
    create col/assign/.code={%
        \calcHash{\thisrow{X}\thisrow{Y}}%
    }]{ID}{\mydata}
\pgfplotstablegetrowsof{\mydata}
\pgfmathtruncatemacro\myDataRows{\pgfplotsretval-1}

\pgfplotstabletypeset[string type]{\mydata}
\end{document}


Yet another variant: this one uses the full hash by default, but takes an optional argument to use only the first n characters.

\documentclass[10pt,a4paper]{article}
\usepackage{pgfplotstable}
\pgfplotsset{compat=newest}

\ExplSyntaxOn
\str_new:N \l__pascals_hash_str
\seq_new:N \g__pascals_hashes_seq
\msg_new:nnn { pascals } { duplicate-hash } { Hash~ #1~ already~ used! }
\cs_generate_variant:Nn \str_set:Nn { Ne }
\cs_generate_variant:Nn \str_range:nnn { e }
\cs_new_protected:Npn \__pascals_calc_hash:nn #1#2
  {
    \str_set:Ne \l__pascals_hash_str
      { \str_range:enn { \str_mdfive_hash:e {#1} } { 1 } {#2} }
    \seq_if_in:NVTF \g__pascals_hashes_seq \l__pascals_hash_str
      { \msg_error:nnV { pascals } { duplicate-hash } \l__pascals_hash_str }
      { \seq_gput_right:NV \g__pascals_hashes_seq \l__pascals_hash_str }
    \pgfkeyslet { /pgfplots/table/create~ col/next~ content }
      \l__pascals_hash_str
  }
\NewDocumentCommand \clearHashes {} { \seq_gclear:N \g__pascals_hashes_seq }
\NewDocumentCommand \calcHash { O{-1} m } { \__pascals_calc_hash:nn {#2} {#1} }
\ExplSyntaxOff

\pgfplotstableread[]{
X Y
1 a
2 b
5 c
}\mydata

\begin{document}

\clearHashes
\pgfplotstablecreatecol[
    create col/assign/.code={%
        \calcHash[3]{\thisrow{X}\thisrow{Y}}%
    }]{ID}{\mydata}
\pgfplotstablegetrowsof{\mydata}
\pgfmathtruncatemacro\myDataRows{\pgfplotsretval-1}

\pgfplotstabletypeset[string type]{\mydata}
\end{document}
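
\clearHashes empties the list of known hashes, so it is useful when two tables are allowed to contain the same hashes. The following is a minimal sketch of that use, assuming the full-hash definitions from the first code block and an L3 programming layer that provides \str_mdfive_hash:e. The table names \tableA and \tableB are hypothetical and deliberately contain identical rows.

```latex
\documentclass[10pt,a4paper]{article}
\usepackage{pgfplotstable}
\pgfplotsset{compat=newest}

\ExplSyntaxOn
\str_new:N \l__pascals_hash_str
\seq_new:N \g__pascals_hashes_seq
\msg_new:nnn { pascals } { duplicate-hash } { Hash~ #1~ already~ used! }
\cs_generate_variant:Nn \str_set:Nn { Ne }
\cs_new_protected:Npn \__pascals_calc_hash:n #1
  {
    \str_set:Ne \l__pascals_hash_str { \str_mdfive_hash:e {#1} }
    \seq_if_in:NVTF \g__pascals_hashes_seq \l__pascals_hash_str
      { \msg_error:nnV { pascals } { duplicate-hash } \l__pascals_hash_str }
      { \seq_gput_right:NV \g__pascals_hashes_seq \l__pascals_hash_str }
    \pgfkeyslet { /pgfplots/table/create~ col/next~ content }
      \l__pascals_hash_str
  }
\NewDocumentCommand \clearHashes {} { \seq_gclear:N \g__pascals_hashes_seq }
\NewDocumentCommand \calcHash { m } { \__pascals_calc_hash:n {#1} }
\ExplSyntaxOff

% Two hypothetical tables with identical rows.
\pgfplotstableread[]{
X Y
1 a
2 b
}\tableA

\pgfplotstableread[]{
X Y
1 a
2 b
}\tableB

\begin{document}

\pgfplotstablecreatecol[
    create col/assign/.code={%
        \calcHash{\thisrow{X}\thisrow{Y}}%
    }]{ID}{\tableA}

% Forget the hashes of \tableA so that \tableB may reuse them.
\clearHashes

\pgfplotstablecreatecol[
    create col/assign/.code={%
        \calcHash{\thisrow{X}\thisrow{Y}}%
    }]{ID}{\tableB}

\pgfplotstabletypeset[string type]{\tableA}
\pgfplotstabletypeset[string type]{\tableB}
\end{document}
```

If duplicates should be detected across all tables instead, leave out the \clearHashes call between them; the second table is then expected to trigger the duplicate-hash error.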

Skillmon
  • Hmm, this does not compile without errors on my side (tested with Overleaf). Right now no error should be thrown, because the first three digits of \calcHash{1a}, \calcHash{2b} and \calcHash{5c} are not equal. There should only be an error when I add, for example, 1a again in the 4th row. – PascalS Mar 11 '24 at 12:44
  • @PascalS it works correctly for the MWE above, and if you add a duplicate row you'll get the error thrown. If it doesn't work for you, it's because of something you don't show us. You might want to search for the error in your document and, once you have found the culprit, post a follow-up question with an updated MWE. If this throws an error on Overleaf as it currently is, then you might want to show the error message in an edit to your current question. – Skillmon Mar 11 '24 at 12:55
  • @PascalS note that depending on the LaTeX-version your Overleaf project is running, it might be that some of the used L3 functions aren't available. I've added a \cs_generate_variant:Nn for the most likely culprit, you might want to test again with the current code. If it still errs for you, please provide the error message. – Skillmon Mar 11 '24 at 12:57
  • The error was caused by Overleaf's LaTeX version! In my complete project it works! Thank you! Two things are still open for me: 1. How can I use just the first three digits of the whole hash, like I did with \StrLeft in my MWE? 2. What are you using \clearHashes for? I commented it out without any noticeable effect. – PascalS Mar 11 '24 at 13:46
  • @PascalS if you want to include two different tables whose hashes are allowed to clash, you can use \clearHashes for that (it clears the list of known hashes). For only the first three characters of the hash, see my edit. – Skillmon Mar 11 '24 at 14:40
  • @PascalS please note that I had a bug in the false branch of \seq_if_in:NVTF: the previous version of all three code blocks had undefined behaviour if the hash was already in the sequence. This is now fixed. – Skillmon Mar 11 '24 at 14:48
  • Okay, \clearHashes is clear now! It's good to have it in this answer, but for me it is not relevant, because I have more than one table and the point is to check all of these tables for the same duplicates :) – PascalS Mar 11 '24 at 14:56
  • It would just be great to shorten the resulting \hash value to three digits... Otherwise I have to do it twice: once in my TikZ pictures and a second time in my tables... – PascalS Mar 11 '24 at 14:57
  • @PascalS they are already shortened, just take the second or third code block instead of the first. – Skillmon Mar 11 '24 at 15:08
  • Amazing! Thank you! – PascalS Mar 11 '24 at 15:20
  • Locally it works really well, but when I try to compile it within my Bitbucket pipeline, I get the error "! Undefined control sequence. <argument> \str_mdfive_hash:e", even with "This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023) (preloaded format=pdflatex)", "LaTeX2e <2022-11-01> patch level 1" and "L3 programming layer <2023-05-05>". This should be new enough, shouldn't it? – PascalS Mar 11 '24 at 20:44
  • I created a follow-up question here :) – PascalS Mar 12 '24 at 16:29