What is Secondary Data?
Secondary data is any information that has already been collected and is ready to be accessed.
A common example of secondary data is online data, where a user or researcher studies published materials to gain an understanding of a subject. This may be quantitative data derived from surveys, available either for free or for a fee. Or it may be qualitative, such as a published interview that a user watches on a video streaming platform.
Secondary data can be a valuable research source for anyone who wants to save the time and cost of primary data collection. It can also serve as a precursor to primary data collection: the researcher first builds an understanding from what has already been published, then conducts primary research to fill the remaining gaps in detail and context.
Because secondary data was originally collected as primary data, shaped by who collected it and why, its relevance to a new research question varies from source to source.
For example, well-known research companies such as Nielsen and Gartner often conduct primary research on the broader market and then make it available to companies or individuals. Companies access this information as secondary data and, using it as base knowledge, may conduct their own primary research on their specific products and services.
Hence, primary and secondary data are closely intertwined in research, with one informing the collection of the other. What is first collected as primary data becomes secondary data for others to access.
Key Characteristics of Secondary Data
Let us now look at the key characteristics of secondary data in more depth:
- Pre-existing
The defining characteristic of secondary data is that it already exists. It is not collected; it is accessed.
For example, an academic research paper in a university database is secondary data for students. It may form the foundation for further research, or it may answer the core question without any additional primary research.
- Wide availability
Secondary data is widely available in free or paid form. A blog post or article, like the one you are reading now, is also secondary data: qualitative information that helps clarify concepts. For more data-oriented subjects, it may instead be a well-sourced quantitative survey.
Academic papers are often publicly available through online portals, keeping doctors, researchers, engineers and others informed of updates and progress in their respective fields.
- Cost-effective and time-saving
For most subjects, secondary data is freely available online, or it costs much less to access than conducting primary research yourself.
Furthermore, one of the biggest benefits of secondary data is the time it saves. Most primary research takes weeks or months to complete, sometimes even years. That saved time can instead be spent gaining further depth and clarity on the subject.
- Variable quality and relevance
A natural drawback of secondary data is its lack of context for the researcher. Although it can be highly relevant, it often lacks the exact context of the research at hand.
Similarly, there is often no way to verify the quality of secondary data in depth. The original study may have used a biased sample, faulty equipment, or a sample that was too small.
For example, a software-as-a-service (SaaS) company that makes customer experience (CX) software may purchase a research paper analyzing the B2B SaaS market at large. While this may help the company understand the broader business software industry, the paper lacks analysis of the specific CX software industry, or of the exact type of CX software the company offers. In such cases, follow-up primary research may be required to bring more contextual and relevant understanding of the market to the business.
Secondary Data Collection Sources and Methods with Examples
Let us now look at the main sources and methods from which secondary data can be collected:
- Academic papers and literature
Researchers can gather information from academic research papers, journals, books, conference papers and so on. This is especially common in medicine, therapy, psychology and other scientific fields, where individual practitioners don’t usually conduct large-scale research and analysis and instead rely on academic studies.
For example, a mental health professional studying the long-term impact of prolonged earphone use may cite existing academic papers and PhD theses that cover at least some aspects of the topic.
A drawback here is that academic studies may sometimes be influenced by the interests of their funders. Such studies may not cover the full breadth and depth of a subject, and findings that reflect badly on a funder may go unpublished.
- Government and public institution records
Government and public institutions can be a good source of secondary data in fields such as finance, economics and demographics.
Of course, the relevance and trustworthiness of such data depend on the country and on whether it practices transparent data publishing. In many cases, it is wise to cross-check the information against privately published data or data from other institutions in order to validate it. This helps reduce the risk that the data carries biases or hidden political influences and motivations.
For example, a business analyzing market trends might use census data to assess population growth, age distribution and income levels in different regions to identify potential markets for expansion. Before drawing conclusions, the business may cross-check the same information against other sources to validate its accuracy.
- Market research reports
Many companies and private institutions publish their market research studies as free or paid reports. Such reports are a valuable source of secondary data for many researchers and decision makers.
These reports may offer a broad market analysis, such as the Gartner Magic Quadrant, or they may be very specific to an industry or vertical. Free reports are often used to collect sales leads: to download the report, you fill out a form with contact details such as your email address and phone number.
For paid reports, a sample is often made available for free, while the full report must be purchased. This model is typical of market research companies whose core business is creating and selling market research studies.
- Internal company records
For internal research and data analysis, a company’s own records can serve as a valuable source of secondary data.
For example, if the HR team wants to understand the impact of evolving company policies on employee retention, it will first need to check the internal HR records on employee attrition and retention over a certain time period.
Similarly, for any research on a company’s financial health, growth, marketing reach and so on, internally stored data is the main source of secondary data. Building on this initial analysis, the company can then collect primary data on the current state of affairs from its executives and leaders for a deeper understanding.
- Media and news content
Media reports and news content are a readily available source of secondary data, at least to some extent. This information is typically limited in scope and needs to be supplemented to form a clear picture. Nonetheless, it is a cost-effective way to gather initial secondary data.
For example, a researcher preparing a report on newly funded startups in the cloud technology space, and what they are working on, may start with business publications for news on these startups and their funding. From there, the researcher may apply further filters and purchase additional sources of secondary data to fill the gaps before conducting the final analysis.
Best Practices for Collecting and Managing Secondary Data
When collecting and managing secondary data, the following best practices help ensure the data’s relevance, quality, transparency and usability, in line with your core objectives:
1. Stay focused on research objectives
Since secondary data was collected for purposes other than your own, it can sometimes lead you astray. In primary data collection, the entire process is designed around your core objectives, so there is less room for distraction. With secondary data, there is so much side information to filter out that it can distract you and pull you away from your original intention.
There is nothing wrong with staying open and exploring related information that may not be directly useful for your current research. Just make sure you don’t drift too far and spend too much time browsing and collecting information that will not help with your current objectives.
If you find secondary information that may come in handy for another project, bookmark it and review it later. This helps you stay focused on your goals while leaving room for future exploration.
2. Validate data through additional sources
Unless your data is coming from highly authoritative sources, it is always a good idea to validate the data through other related secondary data sources. For example, if you have come across a survey with certain conclusions, check if there are other studies that have also come to the same conclusions.
Furthermore, always check the sample size. If the number of respondents is too small or the sample is lopsided, the study may need multiple cross-checks before it can be trusted, or it may simply fall short of the standard needed to serve as a source of truth.
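To make the sample-size check more concrete, the sketch below estimates the margin of error of a survey percentage from its number of respondents. It is a minimal illustration, assuming a simple random sample, a 95% confidence level (z = 1.96), the worst-case proportion of 0.5 and the standard normal approximation; the function name `margin_of_error` is hypothetical. No calculation of this kind can detect a biased or lopsided sample, so it complements, rather than replaces, cross-checking against other sources.

```python
import math


def margin_of_error(sample_size: int, proportion: float = 0.5, z: float = 1.96) -> float:
    """Approximate margin of error for a survey proportion.

    Assumes a simple random sample and the normal approximation;
    z = 1.96 corresponds to roughly 95% confidence, and proportion = 0.5
    gives the widest (most conservative) margin.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


# Compare how precise a reported percentage can be at different sample sizes.
for n in (50, 400, 1000):
    print(f"n={n}: ±{margin_of_error(n) * 100:.1f} percentage points")
```

Under these assumptions, 50 respondents give a margin of roughly ±13.9 percentage points, 400 give about ±4.9 and 1,000 about ±3.1, so a headline figure from a small survey may be far less precise than it appears.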
3. Add primary data for relevance and context
Some research requirements may be too specific to be met through secondary data alone. In such cases, while secondary data can help create a starting point, the gaps in details may be best filled with data from primary research.
For example, in product research, a team may study consumer preferences to decide on the right feature list. The secondary data they find may speak to the broader industry rather than to a specific type of product within it. While this can form an initial understanding and help prepare further research, primary data collection may be the best way to determine preferences for the product itself and draw accurate conclusions.
4. Document and cite your sources
When secondary data is used to inform decision making, it is key to ensure that the sources are well documented and cited through links. Whether they are surveys, interviews, opinions from subject matter experts, documentaries or news clips, they should all be linked at the point where they are used to draw conclusions in your report.
Researchers sometimes list all sources at the end of the report, a practice carried over from pen-and-paper days. Instead, hyperlink one or more words in the report so that readers can view each source directly.
If a source exists only offline, it should be digitized and made available online. Few readers want to go through stacks of paper to verify sources, so scan and upload the documents and link to them.
5. Periodically review and update secondary data
Secondary data such as annual surveys, company filings and news updates needs to be reviewed and updated periodically, especially if the research report is being used to inform current decision making. If important new information emerges, for example through the media, update the report and inform the stakeholders who rely on it.