<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Sat, Apr 30, 2016 at 9:00 AM, Henry Baker <span dir="ltr">&lt;<a href="mailto:hbaker1@pipeline.com" target="_blank">hbaker1@pipeline.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">I just ran Linux's 'sha1sum' on a number of very large files, and the calculation took significantly longer than I expected.<br>

<br>

'sha1sum' is only modestly faster on a very large file than copying the file.<br></blockquote><div> ....</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

1) the CPU meter wasn't pinned at 100%; and<br>

2) multiple cores weren't being fully utilized.</blockquote><div><br></div><div>This sounds like a disk I/O bottleneck. <br>To speed it up, look at mmap() of the input file,</div><div>which avoids the extra copy out of the file system's page cache. <br><br>Tinker with files of increasing size and see where the edges</div><div>are.  Simple benchmark tools like lmbench, dd and bonnie </div><div>are worth a look.</div><div><br></div><div>strace (which is built on ptrace) can show you how the file is opened, as</div><div>can a source code inspection. <br><br>Another trick is to run a program that uses calloc() calls to claim</div><div>and use memory, which leaves the system free pages to read the input</div><div>files into.  Not too much memory, but a lot.</div></div><div class="gmail_extra"><br></div>Once disk I/O dominates, the choice of hash or checksum is no longer</div><div class="gmail_extra">a worry.   Fast SSDs move the bar a lot and will be the </div><div class="gmail_extra">most common first tier of storage in the future.</div><div class="gmail_extra">  <br><br clear="all"><div><br></div>-- <br><div class="gmail_signature"><div dir="ltr">  T o m    M i t c h e l l</div></div>

</div></div>
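<div dir="ltr"><div>P.S. A minimal sketch of the mmap() suggestion above, written in Python rather than C to keep it short. The helper names sha1_read, sha1_mmap and timed, the 1 MiB chunk size and the 8 MiB scratch file are assumptions for illustration, not anything sha1sum itself does. It hashes the same file once with ordinary read() calls and once through a memory map, and prints the wall-clock time of each.</div>
<pre>
```python
import hashlib
import mmap
import os
import tempfile
import time


def sha1_read(path, chunk_size=1024 * 1024):
    """Hash a file with ordinary buffered read() calls."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sha1_mmap(path):
    """Hash a file by mapping it into memory instead of copying it into a buffer."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        if os.fstat(f.fileno()).st_size == 0:
            return digest.hexdigest()  # an empty file cannot be mapped
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            digest.update(view)
    return digest.hexdigest()


def timed(fn, path):
    """Return (result, elapsed seconds) for one call of fn(path)."""
    start = time.perf_counter()
    result = fn(path)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Scratch file of random bytes; point `path` at a real large file instead.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(8 * 1024 * 1024))
        path = tmp.name
    try:
        for fn in (sha1_read, sha1_mmap):
            digest, seconds = timed(fn, path)
            print(f"{fn.__name__}: {digest} in {seconds:.3f}s")
    finally:
        os.remove(path)
```
</pre>
<div>Expect the numbers to vary with the device and with whether the file is already cached, so try it on files of increasing size before drawing conclusions.</div></div>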