Ok, going to write a book, and as always this is just my opinion, after years of dealing with stats for everything from small web sites to really large corporate sites.
Something to consider is that, at least in the Windows world, most of the “system/internet protection” tools, like Norton and others, have privacy features turned on, at times by default. So they will block things like 3rd party cookies, and there goes the Google Analytics cookie that is supposed to track the user properly. Many also flush cookies each time the browser is closed, which throws off first-time vs. repeat visitors: those users will almost always show up as “new” visitors.
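To make the cookie-flushing effect concrete, here is a minimal sketch of how a cookie-based tracker might classify visits, assuming it simply calls a visit “new” when it has never seen that cookie ID before. The `Hit` class and the ID strings are hypothetical, not any real analytics package's internals. The same reader visiting 10 times looks completely different depending on whether the cookie survives between browser sessions.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    cookie_id: str  # hypothetical visitor ID stored in a tracking cookie


def count_new_vs_repeat(hits):
    """Return (new, repeat): a visit is 'new' if its cookie ID was never seen."""
    seen = set()
    new = repeat = 0
    for hit in hits:
        if hit.cookie_id in seen:
            repeat += 1
        else:
            new += 1
            seen.add(hit.cookie_id)
    return new, repeat


# One reader, 10 visits, cookie kept between browser sessions.
kept = [Hit("reader-1")] * 10
# Same reader, but security software flushes the cookie on browser close,
# so the tracker hands out a fresh ID on every visit.
flushed = [Hit(f"reader-1-visit-{i}") for i in range(10)]

print(count_new_vs_repeat(kept))     # (1, 9)
print(count_new_vs_repeat(flushed))  # (10, 0)
```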
Generic log analysis tools, without the aid of javascript components, will track a “session”, which is the visits from a single IP address within a defined period of time. Most default to 20 minutes. Some base this on IP + browser type & version/build # to get more accuracy even when multiple people share one IP. So if a user visits from an IP address and wanders your site for 5 minutes, and an hour later someone else visits from that same IP address, that is a new session. If the log tool processes IP + browser agent info, someone else on that IP visiting at the same time counts as a separate session too. This also works well if a visitor, like me, has static IP addresses at the home office and at work. If I visit your site at 9am, 10:30am, 1pm, 3pm, and 9pm, your logs will show that, and your stats package will track it as 5 separate visits/sessions. Even if the page is still in my browser cache, in most cases the browser is going to hit the server to check whether the page has been modified since my last visit, which registers that I hit the site and chalks up another “session”.
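A rough sketch of that session logic, assuming a simple idle-timeout rule: a new session starts whenever a visitor key (IP alone, or IP + user agent) has been quiet for longer than the timeout. Real log analyzers each have their own rules and settings; the 20-minute value, the function name, and the sample hits (using a documentation-range IP) are all assumptions for illustration.

```python
from datetime import datetime, timedelta

# Assumed idle timeout; check what your own log tool actually uses.
SESSION_TIMEOUT = timedelta(minutes=20)


def count_sessions(hits, use_agent=True, timeout=SESSION_TIMEOUT):
    """Count sessions in hits given as (timestamp, ip, user_agent) tuples.

    A hit starts a new session if its key (IP, or IP + agent) has not been
    seen before, or was last seen more than `timeout` earlier.
    """
    last_seen = {}
    sessions = 0
    for ts, ip, agent in sorted(hits):
        key = (ip, agent) if use_agent else ip
        prev = last_seen.get(key)
        if prev is None or ts - prev > timeout:
            sessions += 1
        last_seen[key] = ts
    return sessions


def at(hour, minute=0):
    return datetime(2007, 1, 15, hour, minute)


IP = "203.0.113.7"  # hypothetical static IP
hits = [
    (at(9), IP, "Firefox"),       # me, first visit
    (at(9, 2), IP, "MSIE"),       # someone else behind the same IP
    (at(9, 4), IP, "Firefox"),    # me, still browsing: same session
    (at(10, 30), IP, "Firefox"),
    (at(13), IP, "Firefox"),
    (at(15), IP, "Firefox"),
    (at(21), IP, "Firefox"),
]

print(count_sessions(hits, use_agent=True))   # 6
print(count_sessions(hits, use_agent=False))  # 5
```

With the agent string in the key, the second person on the same IP gets a session of their own; with IP alone, their 9:02 hit is folded into mine.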
To touch on the part about combining logs if your site is on multiple servers: if you have your site hosted on load balanced servers, or serve dynamic content from one server and all static content from another, any decent log analysis tool will process the logs and collate the data by date/time, knowing they came from separate logs. It will not duplicate data, but will combine it, without you having to “merge” the log files yourself.
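For the curious, the core of that collating step can be sketched as a timestamp-ordered merge. This assumes each server's log is in Common Log Format and already in time order; the sample lines and list names are hypothetical, and a real log tool does considerably more than this.

```python
import heapq
from datetime import datetime


def log_time(line):
    """Extract the timestamp from a Common Log Format line."""
    stamp = line[line.index("[") + 1 : line.index("]")]
    return datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")


def merge_logs(*logs):
    """Merge per-server logs (each already in time order) into one stream."""
    return list(heapq.merge(*logs, key=log_time))


# Hypothetical logs: dynamic pages on one server, static files on another.
web1 = [
    '198.51.100.4 - - [15/Jan/2007:09:00:01 +0000] "GET / HTTP/1.1" 200 5120',
    '198.51.100.4 - - [15/Jan/2007:09:00:05 +0000] "GET /about HTTP/1.1" 200 2048',
]
static = [
    '198.51.100.4 - - [15/Jan/2007:09:00:02 +0000] "GET /logo.png HTTP/1.1" 200 900',
]

for line in merge_logs(web1, static):
    print(line)
```

Each request shows up exactly once, interleaved by time, so nothing is double counted.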
There simply is not a perfect solution; you cannot get a 100% (not even close) picture of who or what is visiting your site. Log files have weaknesses in tracking “true” individual visitors, but have a good workaround for tracking general sessions based on IP address, time limits, and possibly browser agent string uniqueness. Javascript and cookies can both be blocked completely by some personal security software, so those visitors either will not show up at all or will show up as conflicting/skewed data. Using logs and javascript/cookies together presents different challenges, because which do you trust more? The data directly from the logs, if the javascript/cookie was not accepted? Or the javascript/cookie data when the IP address is the same, even though the cookie could be getting reset each time, so a person that reads your site 10 times a day shows up as 10 “new” visitors rather than 1 “repeat” visitor?
The only thing I recommend, and have explained time and again, is to select an option and stick to it. You should be tracking trends, i.e. more traffic, less traffic, time of day, certain types of articles or content. If you get stuck on the actual accuracy of the “numbers”, you are missing the point of long term statistical analysis. As long as you are tracking your overall trends in a consistent manner, you will have a better understanding of your site, your traffic, and your visitors' likes and dislikes.
If you are going to use both logs and something like Mint/Google/etc., don’t compare the two directly; rather, track trends within each tool separately. They are always going to provide different data from one another because they track data differently. You can’t compare apples and oranges. Even comparing apples to apples has to be done with the same log processing tool, as each log processing application is going to define a “session” differently almost 100% of the time.
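One simple way to follow trends within each tool, without comparing their raw numbers, is to index each tool's series to its own starting value. This is just a sketch: the weekly counts below are made up, and indexing to a baseline of 100 is one option among several (plain week-over-week percentage change works too).

```python
def index_to_baseline(series):
    """Rescale a series so its first value is 100; shows direction, not size."""
    base = series[0]
    return [round(100 * v / base, 1) for v in series]


# Hypothetical weekly counts for the same site from two different tools.
log_sessions = [1200, 1320, 1500]  # log analyzer "sessions"
js_visitors = [800, 860, 1000]     # javascript/cookie "visitors"

print(index_to_baseline(log_sessions))  # [100.0, 110.0, 125.0]
print(index_to_baseline(js_visitors))   # [100.0, 107.5, 125.0]
```

The raw counts never agree (1200 vs 800), but both tools say the same thing: traffic went up about 25% over three weeks.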
Watch the overall trends, and focus more on your site's content rather than putting massive effort into getting ‘accurate’ visitor information, time you could be spending providing more or improved content for your site.