Sometimes it pays to look at lots of data even when you want to understand only a little. If you look only at the 55-year-old widower and his 20-year-old son in the 1841 census of the UK, you wouldn’t think twice about the ages. Look beyond the family that interests you and you would notice that the ages reported for adults usually end in a “5” or a “0.” That should seem strange, but it was what the enumerators were told to do. If you didn’t know that, it would have paid to notice the pattern in the census itself. I’ve noticed that pattern in reported ages in other places as well. Years ago, I created a decent-sized spreadsheet to show that an age that seemed to suggest a man was not my ancestor was actually consistent with him being the right man, given the way that other ages were reported. People seem to like to round to the nearest multiple of 5. At some times and in some places that tendency to simply be in the right ballpark seems strong. Only by looking at a lot of data can one tell.
The other day I was looking at immigration years in the U.S. Federal census. That information is known to be less than reliable, but this seemed strange. The years seemed to be almost all even. Some years ended in “5,” and a few ended in “1,” “3,” “7,” or “9,” but many more were even. That tells you something about the accuracy of the information. Numbers of immigrants that should have changed smoothly from year to year, sometimes increasing over the years, sometimes decreasing, instead went up and down every other year. They showed a tendency to be even, and that is, in fact, odd. Why would that be? People don’t report what is true; they report what they remember. We hope that those two things are the same, but we need to acknowledge that they often differ. When someone is asked about the year, several decades earlier, when their spouse immigrated, can we blame them if some bias toward easy-to-remember numbers creeps in?
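If you would rather not build a spreadsheet, the same kind of check can be sketched in a few lines of Python. This is only a rough illustration of tallying final digits, not the method I used; the function names `last_digit_counts` and `share_ending_in` are made up for this example, and the sample years are invented rather than taken from any census.

```python
from collections import Counter


def last_digit_counts(values):
    """Count how often each final digit (0-9) appears in a list of whole numbers."""
    return Counter(abs(v) % 10 for v in values)


def share_ending_in(values, digits):
    """Return the fraction of values whose final digit is one of `digits`."""
    if not values:
        return 0.0
    counts = last_digit_counts(values)
    return sum(counts[d] for d in digits) / len(values)


# Invented sample, for illustration only; this is NOT real census data.
sample_years = [1880, 1882, 1882, 1884, 1885, 1886, 1887, 1888, 1890, 1890]

even_share = share_ending_in(sample_years, {0, 2, 4, 6, 8})
five_or_zero_share = share_ending_in(sample_years, {0, 5})

print(f"ending in an even digit: {even_share:.0%}")
print(f"ending in 0 or 5: {five_or_zero_share:.0%}")
```

If people reported years or ages without any rounding, you would expect roughly half of the values to end in an even digit and roughly a fifth to end in a 0 or a 5. A share well above that, across a large number of records, is the kind of pattern that suggests people were reporting what was easy to remember.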