Character array command to find # of specific word in a paragraph

In summary, the conversation discusses loading a 1-by-1452 char array containing the Gettysburg Address and using a command to find the number of times specific words, such as 'that', 'for', 'we', and 'We', occur in the array. The command 'strfind' is used to find the starting index of a specified string in a larger string, and the output of this command indicates that 'that' appears 13 times in the array. Further analysis of the output can also determine the starting character position of each occurrence.
  • #1
gfd43tg

Homework Statement


Download the mat file Gettysburg.mat and load it into the workspace using the commands

clear;
load Gettysburg;

This loads a 1-by-1452 char array GA that is the Gettysburg Address in English. Display the first 100 or so characters for yourself to see this.

In this array, how many times does the char array 'that' occur?

Hint: You do not need to count the occurrences of 'that' by hand. What command can be used to find the starting index of a specified string (char array) in a larger string (char array)?

How many times does the char array 'for' occur?

How many times does the char array 'we' occur?

How many times does the char array 'We' occur?

Homework Equations


The Attempt at a Solution


I don't know what command to use to find the number of words in the character array, and the hint went over my head, so I am unsure what to do with it.

I did
Code:
 char([GA])

to get the statement to show up, but to show how many times I get 'that' in the statement I keep trying variations of

Code:
 Char([GA, 'that')]

The command
Code:
 numel([GA, 'that'])
gives me the number of characters, 1456. I guess a word is not an element, but a character is. So I don't know where to go from here.

EDIT: I found the command

Code:
 strfind(GA, 'that')

and it gives me

Code:
 ans =

  Columns 1 through 6

         145         234         346         394         467         472

  Columns 7 through 12

         532        1096        1155        1220        1248        1292

  Column 13

        1359

and I don't know how to interpret this.

EDIT2: I guess the number of columns is the number of times the word shows up. Is each number supposed to be the character position where that occurrence of the word begins?
 
  • #2
It makes more sense to say that the word 'that' starts at character position 145 and then again at 234 and again at 346...

Since there are 13 elements then you could conclude that 'that' appears 13 times.
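
As a minimal sketch of that reading, the snippet below runs strfind on a short made-up string instead of GA (the real array only exists after loading Gettysburg.mat, so txt and its contents are assumptions for illustration). Each element of idx is a starting position, and numel(idx) is the number of occurrences.

```matlab
% Hypothetical sample text standing in for GA, which comes from Gettysburg.mat.
txt = 'we know that that is that, and we say so';

idx   = strfind(txt, 'that');      % starting index of every occurrence
count = numel(idx);                % number of occurrences (3 for this sample)

% Check the interpretation: the 4 characters starting at idx(1) spell the word.
first = txt(idx(1):idx(1) + 3);

disp(idx);
disp(count);
disp(first);
```

With the real data, the same two lines would be idx = strfind(GA, 'that') followed by numel(idx).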
 

FAQ: Character array command to find # of specific word in a paragraph

How do I use a character array command to find the number of occurrences of a specific word in a paragraph?

In MATLAB, the most direct way is the strfind function: strfind(GA, 'that') returns the starting index of every occurrence, and numel of that result is the count. Note that strfind matches substrings, so 'we' is also found inside longer words such as 'power'. To count whole words only, first split the paragraph into a cell array of individual words. Then use a loop and the strcmp function to compare each word to the word you are looking for, increasing a counter variable by one on each match. After the loop is complete, the counter holds the number of occurrences of the word in the paragraph.
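
The split-and-compare loop described above can be sketched as follows. The sample string is made up and deliberately free of punctuation, and strsplit is used here as one possible way to split on whitespace. The last line runs a plain substring search on the same text to show how the two counts can differ.

```matlab
% Hypothetical sample text without punctuation (not the real GA array).
txt = 'we hope that we the powerful can answer';

words = strsplit(txt);             % cell array of whitespace-separated words
count = 0;                         % counter for whole-word matches
for k = 1:numel(words)
    if strcmp(words{k}, 'we')      % exact, case-sensitive comparison
        count = count + 1;
    end
end

% Substring search also hits 'we' inside 'powerful' and 'answer'.
nSub = numel(strfind(txt, 'we'));

disp(count);
disp(nSub);
```

For this sample the loop counts 2 whole words, while the substring search counts 4 occurrences.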

Can I use a character array command to find a specific phrase instead of just a single word?

Yes. The strfind function searches the whole char array for any substring, spaces included, so it finds a phrase the same way it finds a single word. Comparing individual words with strcmp does not work for a phrase, because a phrase spans several words of the array.
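
A minimal sketch of a phrase search, assuming a short phrase from the address typed in by hand rather than loaded from Gettysburg.mat:

```matlab
% Short hand-typed excerpt used as sample text (not loaded from the mat file).
txt = 'government of the people, by the people, for the people';

idx     = strfind(txt, 'the people');  % the pattern may contain spaces
nPhrase = numel(idx);                  % 3 occurrences in this sample

disp(idx);
disp(nPhrase);
```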

What happens if the specific word or phrase appears more than once in the paragraph?

The counter variable will increase by one for each occurrence of the word or phrase. For example, if the word "apple" appears twice in the paragraph, the counter will have a value of 2 at the end of the loop.

Is there a way to make the character array command case-insensitive?

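Yes. For whole-word comparison, use the strcmpi function instead of strcmp. It compares char arrays without considering case, so it counts both "Apple" and "apple" as occurrences of the same word. The strfind function is always case-sensitive, so for a substring search convert the text and the search pattern to the same case first, for example with lower.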
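
The sketch below shows the difference on a made-up sentence (an assumption for illustration, not the GA array): searching for 'we' and 'We' separately gives two counts, and lowering the text first merges them.

```matlab
% Hypothetical sample text with both 'We' and 'we'.
txt = 'We are met here. Now we are engaged, and we remember.';

nLower = numel(strfind(txt, 'we'));         % case-sensitive: lowercase only
nUpper = numel(strfind(txt, 'We'));         % case-sensitive: capitalized only
nAny   = numel(strfind(lower(txt), 'we'));  % case-insensitive via lower()

sameWord = strcmpi('We', 'we');             % true: comparison ignores case

disp([nLower nUpper nAny]);
disp(sameWord);
```

For this sample the counts are 2, 1, and 3 respectively.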

What happens if the paragraph contains punctuation or special characters?

It depends on the method. A substring search with strfind is unaffected, since it only looks for the given sequence of characters. If the paragraph is split into words and each word is compared with strcmp, punctuation matters: a word followed by a comma is stored as "apple," and will not match "apple". It may be necessary to remove or otherwise account for punctuation and special characters before comparing.
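
One possible way to account for punctuation, sketched on a made-up string: extract runs of letters with regexp instead of splitting on whitespace. The pattern '[A-Za-z]+' is an assumption that suits plain English text and would need adjusting for apostrophes, digits, or accented letters.

```matlab
% Hypothetical sample text with punctuation attached to the words.
txt = 'apple, Apple; pineapple and apple.';

raw  = strsplit(txt);                        % words keep their punctuation
nRaw = sum(strcmp(raw, 'apple'));            % 0: 'apple,' and 'apple.' differ

words  = regexp(txt, '[A-Za-z]+', 'match');  % letters only, punctuation dropped
nExact = sum(strcmp(words, 'apple'));        % 2: case-sensitive whole words

disp(nRaw);
disp(nExact);
```

Here strcmp is applied to the whole cell array at once, which returns one logical value per word, so sum gives the count without an explicit loop.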
