A long, long time ago, in a company not so far away, I was head of data entry at an insurance company. It was a rather forward-thinking company, embracing all sorts of technologies such as document imaging. We would go to one computer station, look up the document we wanted, and then see if the disk it lived on was the one loaded in the computer in another room. If it wasn't, we would call the mystery tech dude and he would change the disk. We would then select the document and wait about five minutes, staring at a blank screen, for the image to come up. No one liked it. Before this, we would walk over to the file room, saying hi to friends along the way and chatting with the file clerks while they pulled the file. It took longer, but there was no time spent staring at a blank screen waiting.
Today I am working on adding some faculty scholarship data to my database. I have a large spreadsheet with some bad data. Now, most of us would look at it and say, "It's fine. I can see who published what and when. I can see what each project entailed." The problem comes when I want to do something with this information. For example, I have other faculty information in my database, such as the campus they work on, their education, their work experience, and the courses they teach. If the data identifying the faculty member is the same everywhere, I can create a report that tells us all of those things for one person. I can also pull information for faculty at one location or in one program. But the list I have has bad data. The faculty field contains:
The full name
Only the last name
Two, three, or four names
As a result, I need to scrub the data so that every entry is consistent and matches the format used everywhere else in the database.
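Here is a minimal sketch of what that scrubbing can look like in practice, assuming a canonical faculty roster keyed by ID. The roster, column names, and sample rows below are all hypothetical, and a real pass would send anything it cannot resolve to a person rather than guess:

```python
import pandas as pd

# Hypothetical canonical roster: one row per faculty member, keyed by ID.
roster = pd.DataFrame({
    "faculty_id": [101, 102],
    "full_name": ["Jane Q. Smith", "Robert Lee"],
    "last_name": ["Smith", "Lee"],
})

# Scholarship rows as they arrived: the same person in several forms.
scholarship = pd.DataFrame({
    "faculty": ["Jane Q. Smith", "Smith", "R. Lee", "Robert E. Lee Jr."],
    "title": ["Pain study", "Pain study", "Wound care", "Wound care"],
})

def match_faculty(name: str) -> int | None:
    """Resolve a free-text name to a canonical faculty ID, or None."""
    name = name.strip()
    # Try an exact full-name match first.
    hit = roster.loc[roster["full_name"] == name, "faculty_id"]
    if len(hit) == 1:
        return int(hit.iloc[0])
    # Fall back to matching the last word as a last name,
    # but only when that match is unambiguous.
    last = name.split()[-1] if name else ""
    hit = roster.loc[roster["last_name"] == last, "faculty_id"]
    return int(hit.iloc[0]) if len(hit) == 1 else None

scholarship["faculty_id"] = scholarship["faculty"].map(match_faculty)
# Rows left with no ID ("Robert E. Lee Jr." ends in "Jr.") go to a human.
print(scholarship)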
While the Dean may be able to look at all three of those forms and know immediately who did this great work, we cannot provide aggregate data, and we cannot drill down to identify levels of faculty commitment to scholarship by program.
Another example of bad data is the form of the scholarship. The author of this report created three separate columns for PowerPoint, abstract, and poster. All of this is data we need to show how our faculty are communicating their findings, and something leadership wishes to report to accreditors. However, if the three options were values in a single column, it would be easier to summarize the methods and to ensure that only one method is selected per entry. Again, a Dean or Administrative Coordinator viewing the report could easily get an idea of who is doing what, but aggregating the information would be less streamlined and errors could creep in more easily.
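To make that concrete, here is a sketch, again with hypothetical column names and sample rows, of collapsing the three method columns into one, while flagging any row where the "only one method" rule was broken:

```python
import pandas as pd

# Hypothetical shape of the incoming report: one indicator column per method.
report = pd.DataFrame({
    "faculty_id": [101, 102, 103],
    "title": ["Pain study", "Wound care", "Fall prevention"],
    "powerpoint": ["X", "", ""],
    "abstract": ["", "X", "X"],
    "poster": ["", "", "X"],
})

methods = ["powerpoint", "abstract", "poster"]

# Flag rows where zero or multiple methods are marked, for human review.
is_marked = report[methods].apply(lambda col: col.str.strip() != "")
report["needs_review"] = is_marked.sum(axis=1) != 1

# Unpivot the three indicator columns into a single "method" column.
tidy = (
    report.melt(
        id_vars=["faculty_id", "title", "needs_review"],
        value_vars=methods,
        var_name="method",
        value_name="marked",
    )
    .query("marked != ''")
    .drop(columns="marked")
)
print(tidy)  # one row per faculty/scholarship/method combination
```

With the data in this shape, a count of scholarship by method, program, or campus becomes a one-line group-by instead of a manual tally.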
So if I send this list back and say I need a line for each faculty/scholarship combination, with the full name (better yet, the faculty ID) and the method combined into one column, the writer of this report will find it annoying and time consuming. If they have been around long enough, they might say, "I never had to do this before." That is because before computers, a bank of analysts would compile the data by counting, tallying, and reporting. That bank of analysts has been replaced by a bank of computers, and part of the analysts' work has trickled down to the people doing the initial reporting.
In both cases, the work of the front line has increased. More work is required, and the fruits of that work are not immediately apparent. The people staring at the screen waiting for a file don't realize that the document will never be lost and will one day come up faster than they can imagine, and the gatherer of the faculty data does not realize that their work will enable quick reporting of the data with no need for that bank of analysts.
So when you hear about bad data and think it means big machines grinding to a halt, remember that it is subtler than you think, and that someone who knows how to format the questions and the possible answers can save you and your company time and money and make the subtle machine work better.