Social Security Death Index: A Useful Search Tool – by Vince Summers

The Social Security Death Index (SSDI) is a great help when researching deaths from the 1960s forward.
Suppose you have a woman's maiden name and only a couple of other bits of information, but not her married name. Can you find her husband in the SSDI? Quite possibly:

In Craig County, Virginia, my records showed a Ruba LEFFEL. She was born 9 January 1891. That was all that I had. Craig is a fairly small county. The time period was likely to be in the right range for a Social Security record, assuming she lived an average or slightly longer life.

I tried entering “Ruba” and “Craig County” in the SSDI. Then I realized the name Ruba was likely a mis-transcription of Ruby. I took the plunge and entered Ruby, Craig County, and the birth date. Got one!

Ruby OHMER. Could be her, might not be. I saw a zip code listed, and so tried again, simply entering the zip code and the surname OHMER. Two entries: Ruby and Herbert. Yes, I found dates, locations, etc., even the mate . . . maybe. Would this prove to be valid? I checked other resources, and found it was.
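The two-step search described above can be sketched in a few lines of Python. This is only an illustration, not how any SSDI site actually works: the DeathRecord fields, the search helper, the zip codes, and the extra "Ruby EXAMPLE" record are all hypothetical, and only the names Ruby and Herbert OHMER, the county, and the birth date come from the account above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DeathRecord:
    """Hypothetical, simplified stand-in for one index entry."""
    given: str
    surname: str
    born: Optional[date]  # None where the article gives no date
    county: str
    zip_code: str


def search(records, **criteria):
    """Return records matching every supplied field; omitted fields act as wildcards."""
    def matches(rec):
        for field_name, wanted in criteria.items():
            actual = getattr(rec, field_name)
            if isinstance(wanted, str) and isinstance(actual, str):
                if actual.casefold() != wanted.casefold():
                    return False
            elif actual != wanted:
                return False
        return True
    return [rec for rec in records if matches(rec)]


# Illustrative data only. Zip codes are placeholders, not real ones.
records = [
    DeathRecord("Ruby", "OHMER", date(1891, 1, 9), "Craig", "00000"),
    DeathRecord("Herbert", "OHMER", None, "Craig", "00000"),
    DeathRecord("Ruby", "EXAMPLE", date(1900, 1, 1), "Elsewhere", "11111"),
]

# Step 1: no surname at all, just given name, county, and birth date.
no_hits = search(records, given="Ruba", county="Craig", born=date(1891, 1, 9))
hits = search(records, given="Ruby", county="Craig", born=date(1891, 1, 9))

# Step 2: reuse the zip code and surname from the hit to find the household.
household = search(records, surname=hits[0].surname, zip_code=hits[0].zip_code)
for rec in household:
    print(rec.given, rec.surname)
```

The point of the sketch is that every field is optional, so the surname can be left out of the first pass and picked up from the result for the second.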

Not only that, but I found dozens of other bits of data on other persons using such search techniques on SSDI, in which I did NOT use the surname. I did this all in one session, on one day.




Mine All (Data) Mine – Drew Smith

As genealogical researchers, we are awash in data. Our computer hard drives and 3.5″ disks are full of data. The Internet is full of data. Our file cabinets are full of data (data that isn’t even in digital form yet). Every now and again, we get this sneaking suspicion that we already possess the answers to some of our questions, but that those answers are buried somewhere amid the vast quantities of data we already own or have access to.

The situation reminds me of a science-fiction story I once saw broadcast on television, in which a rich man gives all of his money to the devil so that he can go back in time and purchase property he knows contains a huge reservoir of fossil fuels. Unfortunately, he fails to realize that the technology of that time period is incapable of accessing the energy resources. Like that unfortunate time traveler, we find ourselves rich with data, but without the technology to benefit from it.

During the past decade, information scientists have been studying and developing a technology that may soon prove to be of enormous value to genealogical researchers. Known as “data mining,” it is capable of analyzing our data and looking for patterns that might provide the answers we seek. Data mining, also known as “knowledge discovery in databases” (KDD), involves a process that could, in the very near future, be built into the genealogy research software we already use.

The first step in data mining is the collection of data from a variety of sources. Sources currently available only in print or microfilm form would be converted to digital information. Data would be retrieved from government databases (such as the Social Security Death Index), news wires (think of the number of news stories and obituaries that may contain information you need), the Web (think of how much is added there every day), genealogy message boards and mailing lists, your personal e-mail, and any other potential source of genealogical information.

The second step is for the data to be “cleaned.” As you might imagine, raw data from the sources previously mentioned needs to be standardized (date formats, abbreviations for locations, etc.). Next, the data must be stored. The repository for this large collection of data is referred to as a “data warehouse.”
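As a rough illustration of the "cleaning" step, here is a minimal Python sketch that standardizes dates and expands place abbreviations. The list of date formats, the abbreviation table, and the function names are assumptions made for this example; a real cleaner would need far more of each, and would have to decide how to treat ambiguous input (here, slash dates are assumed to be month/day/year and "Co." is assumed to mean County).

```python
from datetime import datetime
from typing import Optional

# Assumed input formats, tried in order. Slash dates are read as month/day/year.
DATE_FORMATS = (
    "%d %B %Y",   # 9 January 1891
    "%d %b %Y",   # 9 Jan 1891
    "%B %d, %Y",  # January 9, 1891
    "%b %d, %Y",  # Jan 9, 1891
    "%Y-%m-%d",   # 1891-01-09
    "%m/%d/%Y",   # 01/09/1891
)

# Tiny illustrative table; "Co." is assumed to mean County, not Colorado.
PLACE_ABBREVIATIONS = {
    "va": "Virginia",
    "va.": "Virginia",
    "co": "County",
    "co.": "County",
}


def clean_date(raw: str) -> Optional[str]:
    """Return the date as YYYY-MM-DD, or None if no known format fits."""
    text = " ".join(raw.split())  # collapse stray whitespace
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean_place(raw: str) -> str:
    """Expand known abbreviations, keeping the comma-separated structure."""
    parts = []
    for part in raw.split(","):
        words = [PLACE_ABBREVIATIONS.get(w.lower(), w) for w in part.split()]
        if words:
            parts.append(" ".join(words))
    return ", ".join(parts)


print(clean_date("9 January 1891"))
print(clean_date("Jan 9, 1891"))
print(clean_place("Craig Co., Va."))
```

Returning None for an unparseable date, rather than guessing, keeps questionable values visible so they can be reviewed by hand before they reach the warehouse.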

Once our genealogical data has been warehoused, we then need software that can begin to sift through it looking for patterns we have defined. For example, we might ask the software to look for a pattern that involves a cluster of people moving from location A to location B during a certain time frame. By doing this, we might identify a group of people who migrated with our own ancestors—a group that contains relatives we have not yet identified. Another example would be to use the software to look for naming patterns among our ancestors and among people who lived in the same places during the same times.
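The migration example can be sketched as a simple grouping query. This is a toy version under stated assumptions: the Move record, the migration_clusters function, and all of the sample people and places are invented for illustration, and real data-mining software would use far more sophisticated pattern detection than an exact match on origin and destination.

```python
from collections import defaultdict
from typing import NamedTuple


class Move(NamedTuple):
    """Hypothetical record of one person's relocation."""
    person: str
    origin: str
    destination: str
    year: int


def migration_clusters(moves, start_year, end_year, min_size=2):
    """Group people who made the same origin -> destination move in a time frame."""
    groups = defaultdict(list)
    for move in moves:
        if start_year <= move.year <= end_year:
            groups[(move.origin, move.destination)].append(move.person)
    # Keep only routes shared by at least min_size people.
    return {
        route: sorted(people)
        for route, people in groups.items()
        if len(people) >= min_size
    }


# Invented sample data.
moves = [
    Move("Person A", "Location A", "Location B", 1850),
    Move("Person B", "Location A", "Location B", 1852),
    Move("Person C", "Location A", "Location C", 1851),
    Move("Person D", "Location A", "Location B", 1875),
]

clusters = migration_clusters(moves, start_year=1848, end_year=1855)
for (origin, destination), people in clusters.items():
    print(origin, "->", destination, ":", ", ".join(people))
```

Here Person A and Person B form a cluster, Person C took a different route, and Person D made the same move but outside the chosen time frame. A naming-pattern search could follow the same shape, grouping by place and period and counting given names instead of routes.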

As you can see, these questions are not the typical questions we can ask our genealogical database software today. But the cost of data storage continues to drop, making it feasible for us to create our own huge data warehouses for our research purposes. The software to analyze data for patterns is already being developed and used for financial and scientific purposes. Can it really be that far in the future before we see the same type of software available to us?

Drew Smith is an instructor with the School of Library and Information Science at the University of South Florida in Tampa. He is also a regular contributor to the quarterly journal Genealogical Computing, where he writes the “Cybrarian” column. He can be reached at drewsmith@aol.com.