Once more on Apple's response to the movement-profile allegations: why Apple is right, and why there is still a problem (though one significantly smaller than the dramatized version in the press).
Apple builds a database from position data reported by iPhones with GPS enabled. The data is collected anonymously (so far there is no indication that it is not), and each report records which networks the phone could see at that position. "Networks" in this context means the GSM and 3G cell towers and the WLANs visible to the iPhone at that moment. This raw data is not what is stored in the database everyone is talking about, though. It is only the raw material from which the contents of that database are derived.
The data sent to Apple is averaged internally, and a "center" is determined for each network reported by the various iPhones (the exact position of a WLAN router or cell tower is not simply given, so it has to be worked out somehow). These centers are stored in a large database at Apple. The position data in that database therefore refers to the centers of radio identifiers, not to phones. The original position reports are only the raw material for these derived positions.
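To make the averaging step concrete, here is a minimal sketch. Apple has not published how it computes the centers, so everything here is an assumption for illustration: the report format, the function name `estimate_centers` and the use of a plain arithmetic mean.

```python
from collections import defaultdict


def estimate_centers(reports):
    """Estimate one center per network from anonymous sightings.

    `reports` is an iterable of (network_id, lat, lon) tuples, meaning
    "a phone at GPS position (lat, lon) saw network_id".
    This format is hypothetical, not Apple's.
    """
    # Per network: [sum of latitudes, sum of longitudes, number of reports]
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for network_id, lat, lon in reports:
        acc = sums[network_id]
        acc[0] += lat
        acc[1] += lon
        acc[2] += 1
    # Plain mean of the coordinates. Good enough for points that lie close
    # together; it would be wrong for points straddling the 180° meridian.
    return {nid: (s[0] / s[2], s[1] / s[2]) for nid, s in sums.items()}
```

Note that the result contains only network identifiers and centers. The individual phone positions that went into the average are gone, which is the point of the whole exercise.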
The iPhone can now determine an approximate position from the radio identifiers it can see: it takes their stored positions and computes an average weighted by signal strength. But that requires the position data, which means internet access to Apple's database. So the iPhone downloads the information about radio identifiers and caches it locally. Not the entire database, of course, which would be far too large, but a relevant excerpt selected by some algorithm. This excerpt is the database on the iPhone.
Apparently the iPhone downloads not only the networks it currently sees but also neighboring ones. That makes sense: users move around, and the data for neighboring networks will probably be needed soon (the iPhone cannot know in advance where I am going). Presumably the iPhone says "I see networks A, B and C" and the database answers "here are networks A to M for the metropolitan area you are in". The iPhone then takes X% of A, Y% of B and Z% of C, calculates a rough position and says "here I am". If it later comes within range of network D, that network's position is already known and the iPhone can calculate its position directly, without another download.
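The "X% of A, Y% of B and Z% of C" calculation could look roughly like this. It is only a sketch of a weighted average over the locally cached centers. The function name `estimate_position`, the shape of the cache and the way signal strength is turned into a weight are my assumptions, not what the iPhone actually does.

```python
def estimate_position(visible, cache):
    """Estimate the phone's position from the networks it can see.

    visible: dict network_id -> weight (> 0, larger means stronger signal)
    cache:   dict network_id -> (lat, lon), the locally cached centers
    Returns (lat, lon), or None if none of the visible networks is cached.
    """
    total = lat_sum = lon_sum = 0.0
    for network_id, weight in visible.items():
        # Networks missing from the cache cannot contribute; this is the
        # case where a download would be needed first.
        if network_id not in cache or weight <= 0:
            continue
        lat, lon = cache[network_id]
        lat_sum += weight * lat
        lon_sum += weight * lon
        total += weight
    if total == 0:
        return None
    return (lat_sum / total, lon_sum / total)


# Hypothetical cached excerpt, including neighbor D that is not visible yet.
cache = {"A": (0.0, 0.0), "B": (10.0, 20.0), "D": (5.0, 5.0)}
print(estimate_position({"A": 1.0, "B": 3.0}, cache))  # (7.5, 15.0)
```

Because D is already in the cache, a phone that later sees D can run the same calculation offline.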
In addition, the iPhone seems to keep a time-stamped history of these downloads. Presumably the developer assumed that a user who has been somewhere once is likely to return. To that end, the iPhone keeps this data for a year. Apple's claim that the storage duration is a bug is surely something of a euphemism. More likely a developer simply picked a duration without considering how much would really be sensible; to his mind this was not sensitive data, just a technical cache of downloads that happen anyway whenever the user asks for his position.
What does this mean for the user? The data does not record the coordinates where he was. It only records the locations of radio identifiers he was somewhere near. And since it also includes neighboring networks, it is very approximate indeed. Of course a rough geographic profile of the user can still be derived from it: in my own data, for example, I can see that I have been in Amsterdam, Frankfurt and Berlin.
Conversely, it also means that a region only shows up if you had network access there and downloads were possible. I was in Copenhagen, where I had internet access via the hotel, so traces of that visit are present. In Malmö, and in Russia over New Year, I had GSM reception but no internet access, so the iPhone could not reach the location database and could not download any radio identifiers with positions. That data is therefore completely missing from my iPhone: there are no traces of Malmö, Ekaterinburg or Nizhny Tagil. (The same should apply if you have enabled airplane mode or simply turned off WLAN and mobile data.)
Furthermore, the covered areas should grow as you move into rural regions: there are few WLANs, so mostly GSM cells, which have a longer range and are more widely scattered. Storing one cell together with its neighbors already covers a fairly large area. In large cities the covered area should be significantly smaller, simply because WLANs have much shorter ranges and there are more of them. Cells are usually smaller there too, since a cell can only serve a finite number of users and user density is higher in cities.
This is particularly interesting for programmers: when you write code, do you think about what could be derived from cached data? Consider someone gaining access to your DNS cache, which every system keeps internally simply to reduce DNS queries. What picture of you could this technically harmless information paint? These are the small pitfalls programmers like to stumble into. It all starts harmlessly, with auxiliary data fetched from the network. Throw it away after use? Well, if it is needed again, it makes sense to keep the most frequently used entries at hand, right? And that is exactly how you run into the kind of problem Apple has now.
The discussion about why your browser cache contains porn pictures could get quite interesting if your wife finds them there (say, because you read your mail with Outlook, opened a spam mail and had image display enabled, which is hardly a far-fetched scenario!). The data no longer shows why it ended up where it did.
As stated in the title: I am referring here to Apple's answer and have only checked it against my own data. My data matches what Apple's statement says, and the statement is consistent in itself: the contents and the stated purpose fit together well. I therefore see no reason to distrust the statement.
Apple's answer that the iPhone does not record a movement profile of the user is therefore correct: it simply stores information for determining position as an alternative to GPS. At the same time, however, the data does amount to a profile of the large areas in which the user has stayed. Criticism is therefore entirely appropriate. But in my opinion it should be smarter than "Apple stores the user's positions from the last year", because that is simply wrong.
But as Apple says in the introduction to its answer: the technical details are more complicated than a simple "does Apple store a movement profile, yes or no?". And our press has massive problems with questions whose answers take more than two sentences. "Apple stores data from which presence in large areas can be derived" just does not make as catchy a headline.
Unfortunately, such imprecise reporting can cause real problems. If I know that the data only covers regions I have been in, not the precise places I visited, then explaining why my Frankfurt data also includes the red light district (it is right next to the train station) is much easier than if I had to assume that these are all places where I have actually been.
Apple must (and, by its own account, will) improve this. Caching the data for a year is nonsense. Backing it up is nonsense too, since it can simply be downloaded again if it is missing. Likewise, the data does not need to be stored at all when location services are globally disabled. A general "Pseudo-GPS Yes/No" switch or something similar for turning off this type of position determination might also be interesting; the user would then simply have to wait until the GPS has a satellite fix. In my opinion, the anonymous collection of WLAN and cell tower data should be switchable in the same way.
In my opinion, no cache should exist without a control function for it (just as you can empty the browser cache). One thing must be clear: a cache has to link access time to the data it loaded, because that is the only way temporary storage can work, and so every kind of cache yields a kind of user profile. The user should have at least rudimentary control over this, in the sense of being able to delete it. Building caches with a clearly defined function and a UI for them should become as much a best practice as the encrypted storage of passwords on servers (hello Sony!).
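As a rough idea of what I mean, here is a minimal sketch of a cache that has a deliberately chosen retention period and a delete function a UI could call. The class name `ControllableCache` and its methods are made up for this illustration and have nothing to do with how the iPhone implements its cache.

```python
import time


class ControllableCache:
    """A cache with an explicit retention period and a user-facing clear()."""

    def __init__(self, max_age_seconds, clock=time.time):
        self._max_age = max_age_seconds  # a conscious choice, not "a year"
        self._clock = clock              # injectable so it can be tested
        self._entries = {}               # key -> (stored_at, value)

    def put(self, key, value):
        # The timestamp is exactly what turns a cache into a profile.
        self._entries[key] = (self._clock(), value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at > self._max_age:
            del self._entries[key]  # too old: drop it instead of serving it
            return None
        return value

    def purge_expired(self):
        """Delete all entries older than the retention period."""
        now = self._clock()
        expired = [k for k, (t, _) in self._entries.items()
                   if now - t > self._max_age]
        for key in expired:
            del self._entries[key]
        return len(expired)

    def clear(self):
        """The control function: wire this to a button in the settings."""
        self._entries.clear()

    def __len__(self):
        return len(self._entries)
```

The retention period is a constructor argument on purpose: whoever uses the cache has to think about the number at least once.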