Website Content Quality Assurance Report Guide
Prepared for: Client Name | Your Company Name | www.your-website.com
Prepared by: Allan Kirsch | website: allankirsch.tech.officelive.com
Prepared on: 06/26/2009
LEGAL DISCLAIMER
THIS REPORT IS FOR INFORMATIONAL PURPOSES ONLY TO INDICATE THE POSSIBILITY OF POTENTIAL DOCUMENT ISSUES WHICH MAY OR MAY NOT EXIST. THIS REPORT CAN ONLY MAKE A BEST EFFORT TO DETECT DOCUMENT ISSUES DUE TO THE INCONSISTENCIES AND COMPLEXITIES OF THE ENGLISH LANGUAGE SOMETIMES COMBINED WITH OTHER FOREIGN LANGUAGES, ACRONYMS, ABBREVIATIONS, DOCUMENT FORMATTING & DOCUMENT VERSIONING VARIATIONS, TECHNICAL TERMS, HYPHENATED-WORD VARIATIONS, CHARACTER ENCODING ERRORS, TEXT COLUMN / PAGE LAYOUT VARIATIONS, LATIN WORDS, INCONSISTENT FILE NAME EXTENSION USAGE, HYPHENATED-WORDS ACROSS PAGE BREAKS, SLANG AND PROPER NOUN CONFLICTS WITH COMMON WORDS TO NAME A FEW. THIS REPORT MAY FAIL TO FIND DOCUMENT MISSPELLINGS AND OTHER ISSUES OR SUGGEST INCORRECT SOLUTIONS. ALL INFORMATION IN THIS REPORT SHOULD BE VERIFIED WITHIN THE DOCUMENTS BEING IDENTIFIED BY THIS SERVICE AND SUGGESTED CORRECTIONS AND SOLUTIONS SHOULD BE VERIFIED USING MULTIPLE ALTERNATIVE RELIABLE REFERENCE SOURCES AND EXPERTS.
SEE "http://allankirsch.tech.officelive.com" FOR ADDITIONAL LEGAL INFORMATION.
WARNING: IF YOUR DOCUMENTS CONTAIN PROFANITY, THAT LANGUAGE MAY APPEAR IN THESE REPORTS.
Congratulations on your decision to be proactive to ensure your website has more accurate content!
Many businesses are reactive rather than proactive, costing themselves both time and money.
This guide will help you get the most out of the information about your content provided by this service.
It uses a question-and-answer format, allowing you to quickly access the information you need.
Questions are presented in the order needed to make the most efficient use of this information.
Let's get started!
Q: How do I set priorities for using this information to improve my website?
Everyone has different circumstances, goals, content, audiences and website tools; but you will most likely set your priorities based on the same reasons you decided to use this service.
- To project the highest levels of professionalism and credibility
- To improve your Internet search engine ranking thus improving website traffic and business
- To provide clear and concise website content to your readers
- To reduce miscommunications within your organization and with your clients
- To make a good first impression to new website visitors and clients
- To avoid costly snafus due to content misprints
- To add custom dictionary entries specific to your content to the spell checker in your document editor
- To increase your level of confidence about the accuracy of your document content
- To gain new insights about your content using the various perspectives provided by this information
- To have more accurate content as a prerequisite to employing advanced SEO techniques
- To benefit from a highly trained automated process that is much more cost effective than any manual effort
- To gain an edge on your competition who fail to use this service
- To ensure your website maintenance team is doing a good job maintaining your website
- To conduct an independent audit of a website you are considering purchasing
- To minimize cutover issues for brand new or newly redesigned websites
You may be overwhelmed by the number of issues identified by this service if this is the first time you have used it. Keep in mind that even the best managed and maintained large business websites usually have between 20 and 30 issues. It may take some work using this information to correct your website, but it is still far less work than manually checking hundreds of web pages using conventional methods. Websites with dynamic content should use this service at regular intervals to prevent a backlog of issues.
Remember, the sooner you make your website corrections, the sooner popular Internet search engines will get the opportunity to register those corrections and possibly improve your search engine ranking. Popular Internet search engines can take months to sweep the entire Internet. This guide sets priorities based on that fact.
Important: Back up your entire website using a proven and reliable backup method before performing any modifications to your website. This gives you a fallback plan if something goes wrong during editing.
Important: This guide is for informational purposes only and should be treated like any other information you read. Use your own judgment to decide what is best for your particular situation. When in doubt, consult experts and reliable sources for a second opinion.
Q: What skills do I need to make use of this information?
You need to be proficient at conducting website maintenance tasks such as website backups, web page editing, computer document comparisons and file system management. You also need to have a good understanding of the English language. Consult a qualified webmaster on technical tasks you don't fully understand. The website www.wikipedia.org is an excellent resource for explaining any technical terminology discussed in this document.
Q: Does my website contain duplicate document files (documents having the same content)?
This information is located in the "reports/website_file_management/file_duplicates/" folder.
Open the "list_of_all_selected_document_file_duplicates_with_file_sizes_and_md5_checksums_log" file.
This file contains information about the files on your website that this service identified as duplicate documents. Why should you care about duplicate documents? Duplicate document files can result in multiple versions of the same document and multiply the number of content issues reported by this service. They also waste disk space on your web server. Consult an expert such as your webmaster if you are inexperienced at resolving and handling the removal of duplicate documents. The file duplicate information provided by this service is for informational purposes only as an indicator that possible duplicate documents exist. Use your own file comparison tools to confirm that the documents really are duplicates. The incorrect removal of files can result in a broken website.
Important: If you are inexperienced at resolving duplicate files then don't remove any files.
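If you want to confirm a reported duplicate yourself, the following Python sketch shows one possible approach, assuming you have local copies of both files. The function names are hypothetical and are not part of this service. It compares MD5 checksums first and then confirms with a byte-by-byte comparison.

```python
import filecmp
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hexadecimal MD5 checksum of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def are_true_duplicates(path_a, path_b):
    """Compare checksums first, then confirm with a byte-by-byte comparison."""
    if md5_of_file(path_a) != md5_of_file(path_b):
        return False
    # shallow=False forces filecmp to compare file contents, not just metadata.
    return filecmp.cmp(path_a, path_b, shallow=False)
```

Even when this check returns True, review how each file is linked from your website before removing anything.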
Continue reading this section to learn more about duplicate file information; otherwise go to the next question.
Let's look at a duplicate file report sample.
;[COUNT OF ALL SELECTED DOCUMENT FILE DUPLICATES WITH FILE SIZES AND MD5 CHECKSUMS] = 6
;[LIST OF ALL SELECTED DOCUMENT FILE DUPLICATES WITH FILE SIZES AND MD5 CHECKSUMS]
000055292<|>2a9433fc8b6fd074af74c9f267cca69d<|>www.your-website.com/index.html
000055292<|>2a9433fc8b6fd074af74c9f267cca69d<|>www.your-website.com/index.htm

000003985<|>acb938fec332b92b67a07a7db1ebc457<|>www.your-website.com/articles/funInTheSun.html
000003976<|>acb938fec332b92b67a07a7db1ebc457<|>www.your-website.com/articles/save/funInTheSun.html
000003857<|>acb938fec332b92b67a07a7db1ebc457<|>www.your-website.com/articles/backupCopies/funInTheSun.html
000003593<|>acb938fec332b92b67a07a7db1ebc457<|>www.your-website.com/articles/backupCopies/funInTheSunCopyA.html
000003492<|>acb938fec332b92b67a07a7db1ebc457<|>www.your-website.com/articles/backupCopies/funInTheSunCopyB.html
000002996<|>acb938fec332b92b67a07a7db1ebc457<|>www.your-website.com/articles/backupCopies/funInTheSunCopyC.html
SIZE OF ALL REDUNDANT SELECTED DOCUMENT FILES = 73,206
REPRESENTING %0.865801 OF ALL SELECTED DOCUMENT FILES.
NOTE: THE LARGEST SIZE FILES APPEAR AT THE TOP OF THIS LIST.
The sample above identifies six duplicate HTML files. Eight are listed, but we subtract the two originals to get six. The largest size files appear at the top of this list as the note states at the bottom of the report. The order of the list allows you to identify the largest files first to free the most disk space when there are a large number of duplicates. Duplicates are grouped together separated by a blank line. In the sample above we see two sets of duplicates.
Columns are delimited using the pattern "<|>" to provide a machine readable file format. The first column is the size of the file in bytes. The second column contains the MD5 checksum, which acts as a signature of the content within the file: files having the same content will have the same MD5 checksum value. Finally, the third column indicates the location and file name of the document. The sizes in bytes of the redundant files are added together so you can see how much disk space they are consuming on the web server, along with the percentage they represent.
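Because the report is machine readable, you can process it with a short script. The Python sketch below assumes each data row has exactly three "<|>" delimited columns as in the sample above. The function names are hypothetical. It groups rows by MD5 checksum and totals the size of every file after the first one listed in each group.

```python
from collections import defaultdict

DELIMITER = "<|>"


def parse_duplicate_report(text):
    """Group report rows by MD5 checksum: {checksum: [(size, location), ...]}."""
    groups = defaultdict(list)
    for line in text.splitlines():
        parts = line.strip().split(DELIMITER)
        if len(parts) != 3:
            continue  # skip headers, notes and blank lines
        size, checksum, location = parts
        groups[checksum].append((int(size), location))
    return dict(groups)


def redundant_bytes(groups):
    """Sum the sizes of every file in a group except the first one listed."""
    return sum(size for rows in groups.values() for size, _ in rows[1:])


sample = """\
000055292<|>2a9433fc8b6fd074af74c9f267cca69d<|>www.your-website.com/index.html
000055292<|>2a9433fc8b6fd074af74c9f267cca69d<|>www.your-website.com/index.htm
"""
groups = parse_duplicate_report(sample)
print(len(groups), redundant_bytes(groups))  # prints: 1 55292
```

Applied to all eight rows of the sample report, the same calculation gives 73,206 bytes, which matches the total shown in the report.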
It is common for websites to have duplicate versions of the "index" document with different file name extensions such as "index.html" and "index.asp" for Internet search engine registration and indexing purposes.
Tip: Duplicate document identification is vital to proper website management.
Tip: Reducing the amount of space your content occupies on a web server may reduce your web hosting costs.
Q: Should I set up my local custom dictionary before correcting spelling issues on my website?
Yes. Identifying custom dictionary words specific to your content is one of the great benefits of this service. You can import these words into your local spell check dictionaries used to edit your web pages. You normally add unrecognized words one at a time to your custom dictionary as you add content to your documents. This can be a time-consuming process. Now you can batch import most of them all at once and reap the benefits every time you edit a document locally.
Important: Consider making a backup copy of the existing custom dictionary file(s) for your local document editor as an added precaution. This is especially important if you have already added a number of entries from previous document editing sessions.
Q: Where do I find the custom dictionary word list prospects provided by this service?
This information is located in the "reports/custom_dictionary_prospects/" folder.
Open the "custom_dictionary_candidates_log" file.
Copy and paste the word list into your document editor. Now review the words in the list and delete any you feel don't belong in your custom dictionary or aren't flagged as a misspelling. Keep in mind that these words were not added to the core dictionaries of this service because they too closely resembled more commonly used words or names of famous people or famous places. This service also uses context-based word pairs to identify words as being valid when they normally would be considered misspelled. For example, the word "Storey" in the phrase "Storey County" is not identified as a misspelling by this service, but it is considered a misspelling of the word "story" or "store" when it is not followed by the word "county". Therefore, if you always fully identify this word in your content by following it with the word "county", you don't have to add it to your custom dictionary.
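To illustrate the idea of a context-based word pair, here is a minimal Python sketch. It is not the method used by this service. The pair table and function name are hypothetical, and it only checks the single word that follows.

```python
import re

# Hypothetical context pairs: a word is accepted only when followed by its partner.
CONTEXT_PAIRS = {"storey": "county"}


def flag_out_of_context(text, pairs=CONTEXT_PAIRS):
    """Return occurrences of paired words that are not followed by their partner."""
    words = re.findall(r"[A-Za-z]+", text)
    flagged = []
    for index, word in enumerate(words):
        partner = pairs.get(word.lower())
        if partner is None:
            continue  # this word has no context rule
        following = words[index + 1].lower() if index + 1 < len(words) else ""
        if following != partner:
            flagged.append(word)
    return flagged
```

With this sketch, "Storey County" passes without a flag, while "Storey" followed by any other word is reported for review.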
Another source for custom dictionary words is in the file "unrecognized_items_and_custom_dictionary_prospects_log" located in the following folders:
"reports/document_content/document_body/unrecognized_items",
"reports/document_content/HTML_META_tags/descriptions/unrecognized_items",
"reports/document_content/HTML_META_tags/titles/unrecognized_items",
"reports/document_content/HTML_META_tags/keywords/unrecognized_items".
The largest source of information is in the "reports/document_content/document_body/unrecognized_items" folder.
Skip the "reports/document_content/HTML_META_tags/*" folders if you are in a hurry as they usually contain redundant entries also found in the document bodies.
Let's look at a "reports/document_content/custom_dictionary_prospects" sample.
000127<|>your-website
000053<|>linkedin
000044<|>Gallegly
000035<|>Capito
000021<|>Tiahrt
000020<|>Marchant
000011<|>tography
000010<|>Springgay
000006<|>SFU
000004<|>FFE
000004<|>Eby
[more]...
The first column is the number of times the word occurs in the content. As you can see, sometimes words get parsed out of URLs such as www.linkedin.com. This usually happens when URLs are typed into documents as regular text without the use of an HTML HREF. You can search the "list_of_all_selected_document_files_containing_simple_internet_address_references_log" file in the "reports/document_content/document_body/address_references/" folder to verify if a word occurs within a URL.
Words at the top of the list with the highest occurrence counts should be given the most consideration.
In the above list, possible candidates for your custom dictionary may be: Gallegly, Capito, Springgay and SFU.
We also see that "tography" may be a misspelling. Words that don't closely resemble valid English words (off by more than one letter) are placed in this file because no corrective fix suggestions are practical. Use your website's search tool or search the "unrecognized_items_and_their_locations_log" file in the "reports/document_content/document_body/unrecognized_items/" folder to locate the documents containing the word.
Tip: This service recognizes popular acronyms that don't closely resemble common English words.
Tip: One of the primary objectives of this service is to produce the smallest "unrecognized words list" possible. This is done using extremely robust dictionaries coupled with foreign language word identification. Less robust spell checking methods generate much larger lists costing you time you shouldn't have to spend reviewing valid English words, foreign language words and acronyms.
Tip: Some words found in this file may indicate that you are getting a little too creative with the English language because they are unique and not used by the general English speaking population.
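The candidate list shown earlier in this section can also be filtered with a script before you review it by hand. The Python sketch below assumes each row has the form count, "<|>", word as in the sample. The function names, the minimum count and the rejected word list are hypothetical choices you would adjust yourself.

```python
def parse_candidates(text, delimiter="<|>"):
    """Return (count, word) pairs from lines shaped like '000127<|>your-website'."""
    rows = []
    for line in text.splitlines():
        count, sep, word = line.strip().partition(delimiter)
        if sep and count.isdigit() and word:
            rows.append((int(count), word))
    return rows


def select_words(rows, min_count=1, rejected=()):
    """Keep words seen at least `min_count` times that the reviewer has not rejected."""
    skip = {word.lower() for word in rejected}
    return [word for count, word in rows
            if count >= min_count and word.lower() not in skip]
```

For example, calling select_words with min_count=6 and rejecting "your-website", "linkedin" and "tography" leaves only the higher-count names and acronyms from the sample for a final manual review.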
Q: What about words such as "flickr" and "digg" used as names by popular Internet websites?
We are all in for a lot of extra unnecessary work if the current naming trend continues on the Internet without any thought about how these names impact spell checking. This service doesn't add these words to its core dictionaries because they are misspellings of valid commonly used words. If you decide to add these words to your custom dictionary then real misspellings of these words will go undetected. If you don't add them then you may end up with a lot of misspelling noise in addition to your other misspelled words. It's a lose-lose situation. Even context-based spell checking will have difficulty dealing with this problem because a name such as "flickr" can be used with a wide variety of surrounding words.
Q: How do I import custom words into my document editor's spell check tool?
This procedure will vary from editor to editor. Look for an import or custom dictionary menu option. Also consult your editor's help feature.
Important: Save a copy of the custom word entries you import into your local document editor. Provide a copy of this information to this service the next time you use it to speed processing and save yourself time by not having to review redundant items you have already imported into your local document editor. This service can't import custom dictionary entries stored within proprietary binary files used by document editors. If your document editor can't export its custom dictionary entries into a plain text file then be sure to save a copy of them to your own plain text file.
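One simple way to keep your own plain text copy is a file with one word per line. The Python sketch below assumes that layout. The function name and file location are hypothetical. It adds only the words that are not already in the file and reports which ones were added.

```python
from pathlib import Path


def merge_word_list(path, new_words):
    """Add words not already present to a plain text list (one word per line)."""
    file_path = Path(path)
    words = []
    if file_path.exists():
        lines = file_path.read_text(encoding="utf-8").splitlines()
        words = [line.strip() for line in lines if line.strip()]
    seen = set(words)
    added = []
    for word in new_words:
        word = word.strip()
        if word and word not in seen:  # comparison is case sensitive
            seen.add(word)
            added.append(word)
    file_path.write_text("".join(w + "\n" for w in words + added), encoding="utf-8")
    return added
```

The comparison is case sensitive on purpose, since a proper noun and a lowercase word may need separate dictionary entries.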
Q: OK, I've loaded the custom words into my document editor. What's next?
Be certain you have a backup of your entire website using a proven and reliable backup method. You are now ready to begin making corrections to your content using your document or web page editor. The next questions in this guide exclude PDF file content. Information about issues found in PDF content is stored separately in the "PDF_content" folder. Information about all other document types is stored in the other folders. We will examine issues found in PDF content later in this guide. Reference the "SOURCE:" text directly under a question to be certain of the document source.
Q: What are the different types of document sources?
SELECTED VS. UNSELECTED: This service can use file and folder name filters to select specific documents and ignore others.
READABLE VS. UNREADABLE: Document files may be unreadable due to corruption, encryption, unsupported versions or password protection.
WEB PAGE DOCUMENTS: Supported web page types only. Excludes all other document types.
DOCUMENTS EXCLUDING PDF FILES: All supported document types except PDF files.
PDF FILES: PDF files only. Excludes all other document types.
Tip: See the main page of allankirsch.tech.officelive.com for a current list of supported document types.
Q: How many possible spelling mistakes were found in web page document titles on my website?
SOURCE: ALL SELECTED READABLE WEB PAGE DOCUMENTS
This information is located in the "reports/document_content/HTML_META_tags/titles/" folder.
Open the "possible_misspellings_csv" file.
Tip: Document titles are highly ranked by popular Internet search engines.
Q: How many possible spelling mistakes were found in web page document descriptions?
SOURCE: ALL SELECTED READABLE WEB PAGE DOCUMENTS
This information is located in the "reports/document_content/HTML_META_tags/descriptions/" folder.
Open the "possible_misspellings_csv" file.
Tip: Document description information may be presented as the description for your web pages when displayed in organic search engine results. Typographical mistakes in a web page description may give someone reviewing organic search engine results a bad first impression of your website.
Q: How many possible spelling mistakes were found in the document bodies on my website?
SOURCE: ALL SELECTED READABLE DOCUMENTS EXCLUDING PDF FILES
This information is located in the "reports/document_content/document_body/" folder.
Open the "possible_misspellings_csv" file.
Tip: Correct words specific to your business before more common words for search engine ranking purposes.
Q: How many grammatical mistakes were found on my website in document bodies?
SOURCE: ALL SELECTED READABLE DOCUMENTS EXCLUDING PDF FILES
This information is located in the "reports/document_content/document_body/possible_grammar_Issues/" folder.
Open the "list_of_all_selected_document_files_containing_possible_articles_of_speech_issues_log" file.
Note: This service currently has a very simple grammar check function and is by no means comprehensive.
Additional grammar check functionality is currently in development and will be available in the future.
Q: How many double typed words were found on my website in document bodies?
SOURCE: ALL SELECTED READABLE DOCUMENTS EXCLUDING PDF FILES
This information is located in the "reports/document_content/document_body/possible_grammar_Issues/" folder.
Open the "list_of_all_selected_document_files_containing_the_most_probable_possible_double_typed_words_log" file.
Open the "list_of_all_selected_document_files_containing_the_less_probable_possible_double_typed_words_log" file.
Tip: This service knows about many valid double typed words such as "yo-yo" and will not list them.
Tip: This service provides a "less probable" and "most probable" list to help you prioritize your efforts.
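For readers curious how double typed words can be detected, here is a minimal Python sketch using a regular expression. It is an illustration only and not the method used by this service. The list of repeats that are sometimes valid is an assumption made for this example.

```python
import re

# Assumption: repeats of these words are sometimes valid English ("had had",
# "that that"), so they are reported on the "less probable" list.
SOMETIMES_VALID = {"had", "that", "is"}

# A word, whitespace, then the same word again (case-insensitive).
REPEAT = re.compile(r"\b([A-Za-z]+)\s+\1\b", re.IGNORECASE)


def find_double_typed(text):
    """Return (most_probable, less_probable) lists of immediately repeated words."""
    most, less = [], []
    for match in REPEAT.finditer(text):
        word = match.group(1).lower()
        (less if word in SOMETIMES_VALID else most).append(word)
    return most, less
```

Hyphenated words such as "yo-yo" are not matched because the pattern requires whitespace between the two words.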
Q: How can I see a summary of my web page document titles to check for a consistent format?
SOURCE: ALL SELECTED READABLE WEB PAGE DOCUMENTS
This information is located in the "reports/document_content/HTML_META_tags/titles/" folder.
Open the "summary_of_html_document_file_titles_log" file.
Tip: The placement of all titles together in one place allows you to visually identify any format inconsistencies.
Tip: Use a consistent web page document title format to give your readers a better user experience.
Tip: Title information usually appears in the title bar of web browsers.
Tip: Use this information to show an employee how to consistently format document titles.
Q: Is now a good time to perform another full website backup?
Yes. At this point you have made a substantial number of website changes. You don't want to lose them.
Be sure to save your backup to a different file name so you don't overwrite your previous backup.
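If you keep a local copy of your website files, a timestamped archive name is one way to avoid overwriting an earlier backup. The Python sketch below is an illustration and not a substitute for your proven backup method. The function name and the "website-backup-" prefix are hypothetical.

```python
import shutil
import time
from pathlib import Path


def backup_site(site_dir, backup_dir):
    """Zip `site_dir` into `backup_dir` under a timestamped name; return the archive path."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    Path(backup_dir).mkdir(parents=True, exist_ok=True)
    target = Path(backup_dir) / f"website-backup-{stamp}"
    # make_archive appends the ".zip" extension and returns the archive file name.
    return shutil.make_archive(str(target), "zip", root_dir=str(site_dir))
```

Keep the backup folder outside the website folder, and note that two backups made within the same second would share a name.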
Q: How many document files were unreadable and couldn't be processed by your service?
SOURCE: ALL SELECTED DOCUMENTS
This information is located in the "reports/website_file_management/document_lists/" folder.
Open the "list_of_all_selected_unreadable_document_file_log" file.
Open the "list_of_all_selected_unreadable_or_empty_document_file_log" file.
Open the "list_of_all_selected_unreadable_copy_protected_pdf_document_file_log" file.
Tip: Use your own website maintenance tools to validate these files and ensure they aren't corrupt.
Tip: Password protected, copy protected and encrypted files will also show up in these lists.
Q: How can I see a summary of the alternate text for images to check for a consistent format?
SOURCE: ALL SELECTED READABLE WEB PAGE DOCUMENTS
This information is located in the "reports/document_content/HTML_META_tags/alternate_text_for_images/" folder.
Open the "list_of_all_selected_readable_html_document_file_alternate_text_descriptions_for_images_log" file.
Tip: The placement of all image text together in one place allows you to visually identify any inconsistencies.
Tip: Use a consistent alternate text for images format to give your readers a better user experience.
Tip: Use this information to show an employee how to consistently format this text.
Q: How many misspelled words were found in my alternate text for images content?
SOURCE: ALL SELECTED READABLE WEB PAGE DOCUMENTS
Sample list:
Missspelling1<|>www.your-website.com/page1.html
Missspelling2<|>www.your-website.com/page2.html
Tip: Popular Internet search engines also register and index image text.
Tip: Visually impaired individuals may rely on this text for image descriptions.
Tip: Use a proper description for an image instead of generic text such as "picture".
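You can also spot weak image text in a single web page yourself. The Python sketch below uses the standard library HTML parser to collect the alt attribute of each image. The class name, function name and list of generic phrases are hypothetical.

```python
from html.parser import HTMLParser

GENERIC_ALT = {"", "picture", "image", "photo"}  # hypothetical list of weak values


class AltTextCollector(HTMLParser):
    """Collect the alt attribute of every <img> tag; None means it is absent."""

    def __init__(self):
        super().__init__()
        self.alt_texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.alt_texts.append(dict(attrs).get("alt"))


def weak_alt_texts(html_text):
    """Return alt values that are missing, empty, or generic."""
    collector = AltTextCollector()
    collector.feed(html_text)
    return [alt for alt in collector.alt_texts
            if alt is None or alt.strip().lower() in GENERIC_ALT]
```

An empty alt value can be intentional for purely decorative images, so treat the results as items to review rather than as errors.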
Q: How many possible spelling mistakes were found in my PDF files on my website?
SOURCE: ALL SELECTED PDF FILES
This information is located in the "reports/document_content/PDF_content/" folder.
Open the "possible_misspellings_csv" file.
Q: How populated are the major SEO HTML META tags on my website?
SOURCE: ALL SELECTED READABLE WEB PAGE DOCUMENTS
This information is located in the "reports/" folder.
Open the "report_summary_log" file.
See the "[HTML META TAG REPORT SECTION]" at the end of the file.
Q: How can I identify which web pages have unpopulated HTML description tags?
SOURCE: ALL SELECTED READABLE WEB PAGE DOCUMENTS
This information is located in the "reports/document_content/HTML_META_tags/" folder.
Open the "list_of_all_selected_readable_html_document_file_titles_and_descriptions_and_keywords_log" file.
Find all occurrences of the text "[NO DOCUMENT DESCRIPTION]" to identify each file.
For example, here is an entry:
www.your-website.com/blog/category/funshine/index.html
New Perspectives | The Funshine Blog
[NO DOCUMENT DESCRIPTION]
Ways to light up your life
The first line identifies the file missing this information.
Continue searching the file for any other unpopulated entries to find their locations.
Tip: Popular Internet search engines may index your descriptions so be sure to properly populate them.
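The search described above can be automated. The Python sketch below assumes every entry follows the layout of the example entry, with the file location two lines above the description line. The function name is hypothetical.

```python
MARKER = "[NO DOCUMENT DESCRIPTION]"


def pages_missing_description(log_text):
    """Return the location line found two lines above each marker line."""
    lines = [line.strip() for line in log_text.splitlines()]
    return [lines[i - 2] for i, line in enumerate(lines)
            if line == MARKER and i >= 2]
```

Check a few results against the log file by hand first, since the layout assumption is based only on the example entry shown in this guide.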
Q: How can I see a list of URLs present within my website content to check for obsolete links?
SOURCE: ALL SELECTED READABLE DOCUMENTS EXCLUDING PDF FILES
This information is located in the "reports/document_content/document_body/address_references/" folder.
Open the "list_of_all_selected_document_files_containing_simple_internet_address_references_log" file.
Open the "list_of_all_selected_document_files_containing_other_internet_address_references_log" file.
These report files contain a summary list with occurrence counts then a listing of each URL and its location within your website. Use the summary lists to identify unwanted URLs and then search further down in the file for the file locations within your website.
Tip: Obsolete and invalid URL references in your documents create frustration and confusion for your readers.
Ensure they are valid and current for a better user experience.
Tip: Copy the above list into any Internet application that automatically converts them to hypertext links.
Now click on each link to validate them. Another option is to use a link validation service.
Tip: Use this information to identify links to undesirable websites posted by your website users.
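If you don't have an application that converts addresses into hypertext links, a short script can build a simple web page of links for you to click through. The Python sketch below is one possible approach. The function name is hypothetical, and it assumes addresses without a scheme should be opened with "http://".

```python
import html


def urls_to_html(urls):
    """Build a minimal HTML page with one clickable link per URL."""
    items = []
    for url in urls:
        # Assumption: addresses typed without a scheme are plain web addresses.
        href = url if "://" in url else "http://" + url
        items.append('<li><a href="{}">{}</a></li>'.format(
            html.escape(href, quote=True), html.escape(url)))
    return "<html><body><ul>\n" + "\n".join(items) + "\n</ul></body></html>"
```

Save the returned text to a file ending in ".html" and open it in your web browser to check each link.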
Q: How can I see a list of the e-mail addresses present on my website to check for obsolete addresses?
SOURCE: ALL SELECTED READABLE DOCUMENTS EXCLUDING PDF FILES
This information is located in the "reports/document_content/document_body/address_references/" folder.
Open the "list_of_all_selected_document_files_containing_simple_email_address_references_log" file.
This report file contains a summary list with occurrence counts then a listing of each address and its location within your website. Use the summary list to identify unwanted addresses and then search further down in the file for the file locations within your website.
Tip: Incorrect e-mail addresses in your documents create frustration and confusion for your organization.
Ensure they are valid and current for better organizational operations.
Tip: Use this information to identify any missing e-mail addresses commonly provided by organizations such as
support@www.your-website.com.
Tip: Use this information to identify inconsistencies in your standardized e-mail addressing scheme.
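The two checks suggested in these tips can be sketched in Python as follows. The function names and the default list of commonly provided addresses are hypothetical, and the address list is assumed to have been copied from the report's summary list.

```python
from collections import Counter


def addresses_by_domain(addresses):
    """Count addresses per domain (case-insensitive) to expose inconsistent schemes."""
    counts = Counter()
    for address in addresses:
        local, sep, domain = address.strip().rpartition("@")
        if sep and local and domain:
            counts[domain.lower()] += 1
    return counts


def missing_roles(addresses, domain, roles=("support", "sales", "info")):
    """Return the role names that have no address at `domain`."""
    present = {address.strip().lower() for address in addresses}
    return [role for role in roles if f"{role}@{domain.lower()}" not in present]
```

More than one domain in the first result, or any names in the second, may point to an address worth reviewing.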
Q: How can I see a list of URLs present in my PDF content to check for obsolete links?
SOURCE: ALL SELECTED READABLE PDF FILES
This information is located in the "reports/PDF_content/document_body/address_references/" folder.
Open the "list_of_all_selected_document_files_containing_simple_internet_address_references_log" file.
This report file contains a summary list with occurrence counts then a listing of each URL and its location within your website. Use the summary list to identify unwanted URLs and then search further down in the file for the file locations within your website.
Q: How can I see a list of the e-mail addresses present in my PDF to check for obsolete addresses?
SOURCE: ALL SELECTED READABLE PDF FILES
This information is located in the "reports/PDF_content/document_body/address_references/" folder.
Open the "list_of_all_selected_document_files_containing_simple_email_address_references_log" file.
This report file contains a summary list with occurrence counts then a listing of each address and its location within your website. Use the summary list to identify unwanted addresses and then search further down the file for the file locations within your website.
Tip: Ensure e-mail addresses in your PDF documents are consistent with addresses used in your web pages.
Q: How many possible proper noun miscapitalizations were found on my website?
SOURCE: BODY CONTENT OF ALL SELECTED READABLE DOCUMENTS
This information is located in the "reports/document_content/document_body/" folder.
Open the "possible_miscapitalizations_csv" file.
In the "reports/document_content/HTML_META_tags/titles/" folder.
Open the "possible_miscapitalizations_csv" file.
In the "reports/document_content/HTML_META_tags/descriptions/" folder.
Open the "possible_miscapitalizations_csv" file.
SOURCE: BODY CONTENT OF ALL SELECTED READABLE PDF FILES
This information is located in the "reports/document_content/PDF_content/" folder.
Open the "possible_miscapitalizations_csv" file.
Q: How many British style words were found on my website?
SOURCE: BODY CONTENT OF ALL SELECTED READABLE DOCUMENTS EXCLUDING PDF FILES
This information is located in the "reports/document_content/document_body/" folder.
Open the "British_to_US_English_csv" file.
SOURCE: BODY CONTENT OF ALL SELECTED READABLE PDF FILES
This information is located in the "reports/document_content/PDF_content/" folder.
Open the "British_to_US_English_csv" file.
SOURCE: ALL SELECTED READABLE WEB PAGE DOCUMENTS
This information is located in the "reports/document_content/HTML_META_tags/titles/" folder.
Open the "British_to_US_English_csv" file.
SOURCE: ALL SELECTED READABLE WEB PAGE DOCUMENTS
This information is located in the "reports/document_content/HTML_META_tags/descriptions/" folder.
Open the "British_to_US_English_csv" file.
SOURCE: ALL SELECTED READABLE WEB PAGE DOCUMENTS
This information is located in the "reports/document_content/HTML_META_tags/keywords/" folder.
Open the "British_to_US_English_csv" file.
Tip: Converting British spelled words to U.S. English words will make your document content more consistent. Document content consistency makes your documents more machine readable and searchable by search engines and other automated processes.
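To show how British style words can be matched to U.S. English suggestions, here is a minimal Python sketch. The three-word mapping and the function name are hypothetical. The word list used by this service is much larger and is not reproduced here.

```python
import re

# A tiny illustrative mapping; a real list would be far larger.
BRITISH_TO_US = {"colour": "color", "centre": "center", "organise": "organize"}


def find_british_spellings(text, mapping=BRITISH_TO_US):
    """Return (british_word, suggested_us_word) pairs found in `text`, in order."""
    found = []
    for word in re.findall(r"[A-Za-z]+", text):
        suggestion = mapping.get(word.lower())
        if suggestion:
            found.append((word, suggestion))
    return found
```

The sketch only reports suggestions. Review each one before editing, since names and quotations may need to keep their original spelling.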
Q: What are the top keyword phrases used in my document body content?
SOURCE: ALL SELECTED READABLE DOCUMENTS EXCLUDING PDF FILES
This information is located in the "reports/document_content/document_body/top_keyword_phrases/" folder.
Open the "top_correctly_spelled_keywords_deducting_the_min_number_of_stop_words_folded_lettercase_log" file.
Open the "top_unreviewed_keywords_deducting_the_min_number_of_stop_words_folded_lettercase_log" file.
Open the "top_correctly_spelled_keywords_deducting_the_max_number_of_stop_words_folded_lettercase_log" file.
Open the "top_unreviewed_keywords_deducting_the_max_number_of_stop_words_folded_lettercase_log" file.
Tips:
- Identify topics, VIPs, and trends in blogs, newsgroups and unclassified document sets.
- Gain new insight about the topics being discussed the most on your website.
- Identify offensive topics you may not want on your website posted by users.
- Use this information to populate fields for keyword information on business networking websites.
- Use this information for Internet search engine optimization (SEO).
- Compare this information to the keywords used in your HTML keywords META tags.
- The "reviewed" list has been reviewed by a person.
- The "unreviewed" list is the product of an automated process.
- Misspelled words are not included in these lists to ensure you have a clean usable list.
- Use this service a second time after correcting your website to get more complete top keyword information.
- This feature can be used to create a comprehensive index for the back of large books as an additional service.
- Identify individuals listed on the U.S. Office of Foreign Assets Control (OFAC) SDN watch list.
- Only nouns, verbs and adjectives appear in the lists to give you a better perspective about your content.
- Folded lettercase means that a keyword phrase regardless of case is folded into a single entry with one count.
- A "stopword" is a word that is not an adjective, noun or verb. Examples: "the", "which", "he", "she", etc.
- Only nouns and adjectives are included when the maximum number of stopwords is used.
- Verbs, nouns and adjectives are included when the minimum number of stopwords is used.
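The ideas of folded lettercase and stopwords described in the tips above can be illustrated with a minimal Python sketch. This is a simplification and not the method used by this service. The stopword list and function name are hypothetical, and no part-of-speech analysis is performed.

```python
import re
from collections import Counter

# Hypothetical stopword list for illustration only.
STOP_WORDS = {"the", "which", "he", "she", "a", "an", "of", "and", "to", "in", "is"}


def keyword_phrases(text, stop_words=STOP_WORDS):
    """Count runs of consecutive non-stopwords, folding lettercase into one entry."""
    counts = Counter()
    for sentence in re.split(r"[.!?;:,\n]+", text):
        phrase = []
        # The trailing None flushes the last phrase of each sentence.
        for word in re.findall(r"[A-Za-z]+", sentence) + [None]:
            if word is not None and word.lower() not in stop_words:
                phrase.append(word.lower())
            elif phrase:
                counts[" ".join(phrase)] += 1
                phrase = []
    return counts
```

Calling keyword_phrases(text).most_common(10) then lists the ten most frequent phrases, with "Fun" and "fun" counted as the same entry.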
Q: What are the top keyword phrases used in my web page titles?
SOURCE: ALL SELECTED READABLE WEB PAGE DOCUMENTS
This information is located in the "reports/document_content/HTML_META_tags/titles/top_keyword_phrases/" folder.
Open the "top_correctly_spelled_keywords_deducting_the_min_number_of_stop_words_folded_lettercase_log" file.
Open the "top_unreviewed_keywords_deducting_the_min_number_of_stop_words_folded_lettercase_log" file.
Open the "top_correctly_spelled_keywords_deducting_the_max_number_of_stop_words_folded_lettercase_log" file.
Open the "top_unreviewed_keywords_deducting_the_max_number_of_stop_words_folded_lettercase_log" file.
Q: What are the top keyword phrases used in my web page descriptions?
SOURCE: ALL SELECTED READABLE WEB PAGE DOCUMENTS
In the "reports/document_content/HTML_META_tags/descriptions/top_keyword_phrases/" folder.
Open the "top_correctly_spelled_keywords_deducting_the_min_number_of_stop_words_folded_lettercase_log" file.
Open the "top_unreviewed_keywords_deducting_the_min_number_of_stop_words_folded_lettercase_log" file.
Open the "top_correctly_spelled_keywords_deducting_the_max_number_of_stop_words_folded_lettercase_log" file.
Open the "top_unreviewed_keywords_deducting_the_max_number_of_stop_words_folded_lettercase_log" file.
Q: What are the top keyword phrases used in my PDF files?
SOURCE: ALL SELECTED READABLE PDF FILES
This information is located in the "reports/document_content/PDF_content/top_keyword_phrases/" folder.
Open the "top_correctly_spelled_keywords_deducting_the_min_number_of_stop_words_folded_lettercase_log" file.
Open the "top_unreviewed_keywords_deducting_the_min_number_of_stop_words_folded_lettercase_log" file.
Open the "top_correctly_spelled_keywords_deducting_the_max_number_of_stop_words_folded_lettercase_log" file.
Open the "top_unreviewed_keywords_deducting_the_max_number_of_stop_words_folded_lettercase_log" file.
Q: What are the top U.S. English words used in my document body content?
SOURCE: BODY CONTENT OF ALL SELECTED READABLE DOCUMENTS EXCLUDING PDF FILES
This information is located in the "reports/document_content/document_body/top_words/" folder.
Open the "top_correctly_spelled_words_deducting_the_min_number_of_stop_words_folded_lettercase_log" file.
Tip: This is similar to a word count function, but better because insignificant words such as "the" are excluded.
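The idea behind this tip can be sketched in a few lines of Python. The stopword list, the tokenizing pattern and the function name are assumptions made for illustration; they are not the service's actual implementation.

```python
import re
from collections import Counter

# Hypothetical stopword list; the service's actual list is not given in this guide.
STOPWORDS = {"the", "which", "he", "she", "a", "an", "of", "and", "to", "in", "on"}

def top_words(text, limit=10):
    """Return the most frequent words, ignoring case and skipping stopwords."""
    words = re.findall(r"[a-z']+", text.casefold())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return counts.most_common(limit)

sample = "The report lists the top words in the report. Words such as the are skipped."
result = top_words(sample, limit=2)
```

A plain word count of the sample text would rank "the" first. Here it is deducted, so "report" and "words" lead the list instead.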
Q: What are the top U.S. English words used in my web page titles?
SOURCE: ALL SELECTED READABLE WEB PAGE DOCUMENTS
This information is located in the "reports/document_content/HTML_META_tags/titles/top_words/" folder.
Open the "top_correctly_spelled_words_deducting_the_min_number_of_stop_words_folded_lettercase_log" file.
Q: What are the top U.S. English words used in my web page descriptions?
SOURCE: ALL SELECTED READABLE WEB PAGE DOCUMENTS
This information is located in the "reports/document_content/HTML_META_tags/descriptions/top_words/" folder.
Open the "top_correctly_spelled_words_deducting_the_min_number_of_stop_words_folded_lettercase_log" file.
Q: What are the top U.S. English words used in my web page keywords?
SOURCE: ALL SELECTED READABLE WEB PAGE DOCUMENTS
This information is located in the "reports/document_content/HTML_META_tags/keywords/top_words/" folder.
Open the "top_correctly_spelled_words_deducting_the_min_number_of_stop_words_folded_lettercase_log" file.
Q: How many significant English words exist in my content?
See the "[ U.S. ENGLISH WORDS REPORT SECTION ]" in the following file:
SOURCE: ALL SELECTED READABLE DOCUMENTS EXCLUDING PDF FILES
In the "reports/" folder, open the "report_summary_log" file.
Tip: This is similar to a word count function, but better because insignificant words such as "the" are excluded.
Q: How can I get answers to the following questions?
How many document files were processed by your service?
How many document files were found on my website?
How many web pages were found on my website?
How many PDF files were found on my website?
How many zero size document files were found on my website?
How many Microsoft Word document files were found on my website?
How many rich text document files were found on my website?
How many ASCII text document files were found on my website?
This information is located in the "reports/" folder.
Open the "report_summary_log" file.
Q: How can I get report summary information about major SEO HTML META tags on my website?
SOURCE: ALL SELECTED READABLE WEB PAGE DOCUMENTS
In the "reports/document_content/HTML_META_tags/titles/" folder...
Open the "report_summary_log" file.
In the "reports/document_content/HTML_META_tags/descriptions/" folder...
Open the "report_summary_log" file.
In the "reports/document_content/HTML_META_tags/keywords/" folder...
Open the "report_summary_log" file.
Q: How many adult / foul language words were found on my website?
SOURCE: BODY CONTENT OF ALL SELECTED READABLE DOCUMENTS EXCLUDING PDF FILES
This information is located in the "reports/document_content/document_body/" folder.
Open the "possible_foul_language_words_csv" file.
SOURCE: ALL SELECTED READABLE PDF FILES
This information is located in the "reports/document_content/PDF_content/" folder.
Open the "possible_foul_language_words_csv" file.
SOURCE: ALL SELECTED READABLE WEB PAGE DOCUMENTS
This information is located in the "reports/document_content/HTML_META_tags/titles/" folder.
Open the "possible_foul_language_words_csv" file.
SOURCE: ALL SELECTED READABLE WEB PAGE DOCUMENTS
This information is located in the "reports/document_content/HTML_META_tags/descriptions/" folder.
Open the "possible_foul_language_words_csv" file.
SOURCE: ALL SELECTED READABLE WEB PAGE DOCUMENTS
This information is located in the "reports/document_content/HTML_META_tags/keywords/" folder.
Open the "possible_foul_language_words_csv" file.
Tip: Identify offensive language on your website for removal.
Tip: Offensive language may cause your website to be blocked by public access points such as public libraries.
Q: How many foreign language words were found on my website?
See the "[FOREIGN LANGUAGE WORDS REPORT SECTION]" in the following files:
SOURCE: ALL SELECTED READABLE DOCUMENTS EXCLUDING PDF FILES
In the "reports/" folder...
Open the "report_summary_log" file.
SOURCE: ALL SELECTED READABLE WEB PAGE DOCUMENTS
In the "reports/document_content/HTML_META_tags/titles/" folder...
Open the "report_summary_log" file.
SOURCE: ALL SELECTED READABLE WEB PAGE DOCUMENTS
In the "reports/document_content/HTML_META_tags/descriptions/" folder...
Open the "report_summary_log" file.
SOURCE: ALL SELECTED READABLE WEB PAGE DOCUMENTS
In the "reports/document_content/HTML_META_tags/keywords/" folder...
Open the "report_summary_log" file.
SOURCE: ALL SELECTED READABLE PDF FILES
In the "reports/document_content/PDF_content/" folder...
Open the "report_summary_log" file.
Q: How can I identify which documents contain numerous foreign language words?
This information is located in the "reports/website_file_management/document_lists/" folder.
Open the "list_of_all_selected_document_files_containing_numerous_foreign_language_words_log" file.
Tip: Group foreign language content in its own set of files and folders for easier website maintenance.
Tip: This service can currently identify French, German and Spanish words.
Q: How can I identify which documents contain XML data but have a web page file name extension?
This information is located in the "reports/website_file_management/document_lists/" folder.
Open the "list_of_all_selected_document_files_containing_xml_data_but_have_a_web_page_file_name_extensions_log" file.
Q: What if I have a question not found in this guide?
Visit the FAQs page at allankirsch.tech.officelive.com.
Send an e-mail to allankirsch@hotmail.com.
Contact allan.kirsch on Skype.

Q: Are we done yet?
Congratulations on working smarter instead of harder to improve your website content. You should now have a new level of confidence about the accuracy of your documents. You should also have gained new insights about your content that can improve both the experience of your website's readers and your Internet search engine ranking. Use this service again after making all of your corrections to ensure you didn't miss anything. Your corrections will also update the top keyword phrases and top words information with additional entries. Your MD5 file checksums will be updated, allowing you to monitor your website for unauthorized changes to your documents. Be sure to perform another full website backup to preserve your changes. Also archive all of the information provided by this service for future reference, including the custom dictionary entries you imported into your local document editor.
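As a rough illustration of how MD5 checksums can reveal changed documents, the Python sketch below compares current files against a set of previously recorded checksums. The function names and the dictionary of known checksums are hypothetical; this guide does not specify the format in which the service stores its checksums.

```python
import hashlib
from pathlib import Path

def md5_of_file(path):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def changed_files(folder, known_checksums):
    """List files whose checksum differs from, or is missing in, the recorded set."""
    changed = []
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            name = path.relative_to(folder).as_posix()
            if known_checksums.get(name) != md5_of_file(path):
                changed.append(name)
    return changed
```

Calling changed_files with a local copy of the website folder and a dictionary that maps relative file names to recorded checksums returns the files that were modified or added since the checksums were recorded.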
This service is constantly being improved. Look for improved context-based spell checking, improved grammar checks, more SEO support and additional improvements in the future. This guide will also be updated with new information to assist in your content quality assurance operations.
Thank you for using this service. Please spread the word to others who may find it useful.