International Journal of Scientific Computing, Vol. 4, No. 1, January-June 2010, pp. 5-9

Analyzing Relationship between Title Length of Research Papers and Number of Authors
Jatinderkumar R. Saini

J. R. Saini is with the Narmada Education and Scientific Research Society's Narmada College of Computer Application, Bharuch, Gujarat, India, as Assistant Professor. He is PhD from Veer Narmad South Gujarat University, Surat, Gujarat, India. E-mail: saini_expert@yahoo.com.

The pursuit of research has increased in recent times. The results of scientific research works are published in the form of research papers in journals. There is no standardized specific value for factors like title length, number of pages and number of authors of a research paper. The attention of the scientific community is more on innovation in research than on these factors. The title of a research paper is important, being the first point of interaction between the writer and the reader. The current paper presents a detailed analysis of the relationship between the title length of a research paper and the number of authors signing it. The author believes that this is the first formal attempt to provide such a detailed investigation of the interplay of the two factors. The paper elaborates on the analysis of more than 17000 research papers from 39 indexed international journals. Based on the analysis, the paper also makes an attempt to forecast the future trends.

Keywords: Author, Research Paper, Title Length, Title, Trend-line Analysis.

I. INTRODUCTION

The current times have seen an increase in the pursuit of research. Compared to times around a century back, this increase owes to the provision of more formal research paths offered by research institutions and academic bodies. Most often, researchers publicize their research works in the form of publications in journals. Scientific journals communicate and document the results of research carried out in universities and various other research institutions, serving as an archival record of science [5]. The scientific and research community has not adopted any standard value for a number of factors related to such publications of research works. The number of pages, the title length, and the number of authors are a few instances of such factors. But we need to take this statement with a grain of salt, since the lack of any standard value for these factors has, conversely, contributed to the quality of research works. This has been possible because the research community applauds innovation and originality in research work, and novel research is independent of the factors listed above. A scientific article has a standardized structure, and its contents are more important than its format [6]. However, since a larger number of authors signing a research work influences a research paper more than a single author does, this paper endeavors to analyze the relationship between the number of authors signing a paper and the length of the title of the research paper.

II. RELATED WORKS

The title of a research paper is important because it acts as the first point of contact between writer and potential reader [2]. A review of the related past literature shows that attempts similar to the one presented here have also been made by other researchers. Contemporary research works have also discussed the existence of some association and relationship between the factors listed above.

Specifically, Alimohammadi et al. [1] have elaborated on the correlation between references and citations of research works. They have also highlighted the deviations in the count of citations based on the type of research work, such as review paper, consolidation paper and discovery paper. In another work, Yitzhaki [7] has tried to find the relation between the length of the title of a journal article and the length of the article. He found that a moderate positive correlation between title length and article length existed in most scientific journals. Smith et al. [4] have presented an examination of the relationship between author-editor connections and subsequent citations of auditing research articles. Their work does not throw light on the relationship between the length of title, length of article and number of authors, which are all areas of focus of this paper. Consequently, this and other similar research works have been considered statistically irrelevant in the context of the current work and have not been considered further.

Yitzhaki [8], in another paper similar to the one presented here, has examined the relation of the title length of journal articles to the number of authors. But his work is different in many respects from the work presented here. First of all, the number of research journals referred to by him is only 14, while the current work has made use of 39 international journals. Secondly, for the calculation of the title length of a research paper, we have made use of the character-count of the title, unlike Yitzhaki, who has based his calculations on the word-count of the title. The analysis made by him is segregated by time slots, while this paper presents an aggregate analysis of all the research papers under consideration over a period of time. A striking similarity in current work and Yitzhaki's
work is that both papers analyze research papers which belong to varied disciplines. A major source of difference between the two works is that the previous work was published in the year 1993, while the current work also takes into consideration the changes in the research world during the subsequent 17 years, till 2010. The author believes that this difference in period is long enough to highlight evolving inclinations in the stylometry of research paper writing. It also provides a major source of analysis for uncovering any new trends during this time. The current work provides a much more detailed analysis of the data compared to the analysis provided by the previous works. Along these lines, another significant disparity between the two works lies in the formation of the pool of research papers chosen for the actual analysis. Yitzhaki has drawn research papers randomly from the research arena, whereas the current work has formed the corpus by selecting only those research papers whose title contained specific words. In this regard, the current work presents a more specific examination of research papers containing precise key terms.

III. DESIGN METHODOLOGY

A listing of terms used for selecting research paper titles is presented in Table 1. The design methodology adopted by the author could be delineated as follows:
(a) Create a corpus of research papers
(b) Refine the corpus by removing duplicated entries
(c) Re-model the corpus data for preparing it for analysis (e.g. vector representation)
(d) Represent the data graphically
(e) Analyze the data and its graphical presentations
(f) Forecast the future trends

Table 1
Terms Used for Selecting Research Paper Titles

active, algorithm, analysis, artificial, association, audio, Bayesian, behaviour, categorization, classical, classification, commerce, control, cryptography, cyber, data, database, design, document, domain, element, email, ensemble, equation, evaluation, feature, framework, genetic, gram, graphics, heuristic, image, index, information, intelligence, internet, KNN, knowledge, language, learning, machine, measurement, mining, model, multimedia, naive, network, neural, object, observation, ontology, optimization, order, pattern, pixel, precision, process, programming, recall, retrieval, rule, secure, semantic, software, spam, spatial, supervised, support, system, taxonomy, temporal, text, vector, video, vision, web, wireless, XML

The initial number of research papers in the corpus created for the present work was 17348. Since the corpus was very large, it was natural for it to include many entries more than once. Hence, the corpus was refined by removal of repeating entries. The total number of research papers in the corpus yielded by the refinement was 17093. Based on this data, a two dimensional vector with 17093 records was created. The first dimension of the vector was populated with the length of the paper-title while the second dimension of the vector was populated with the number of authors. This two dimensional vector was then ordered in ascending manner by sorting it on the number of authors. It was technically infeasible to analyze the 17093-rowed two-dimensional vector of title-length and author-count directly. Therefore, the data of the vector was plotted in a graphical format for easier comprehension. This graphical representation of data is presented in Fig. 1.

Trend-line is defined as a graphic representation of trends in data series, such as a line sloping upward to represent increased sales over a period of months. Trend-lines are used for the study of problems of prediction. This type of analysis is also called regression analysis. According to Louizos [3], a trend-line is a clear indication of reversal or continuation of the trend. A linear trend-line for the title length of the research papers, plotted against the number of authors, is presented in Fig. 1. Further, in order to forecast a future value and predict the future trend, the trend-line was plotted again with a forwarded value of 10000 units. This is graphically presented in Fig. 2. The regression equation for the trend-line of the chart of Fig. 1 is also shown, as the trend-line label, in Fig. 2. An un-adjusted R-squared value for the trend-line of title length can also be seen in Fig. 2. The next section sheds more light on the findings and interpretations based on analysis of these charts.

Figure 1: Title Length of Research Papers plotted Against No. of Authors (horizontal axis: No. of Authors)

Figure 2: Forwarded Trend-Line for Title Length, Trend-Line Equation & R-squared value (horizontal axis: No. of Authors)

Table 2
Range of Title Length and No. of Research Papers in Each Range

Range of Title Length    No. of Research Papers    Percentage of (2)
(1)                      (2)                       (3)
1-50                      2988                      17.48
51-100                   11316                      66.20
101-150                   2627                      15.37
151-200                    139                      00.81
201-250                     18                      00.11
251-300                      5                      00.03
Total                    17093                     100.00
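As a rough illustration of steps (a) to (f) above, the following Python sketch removes repeated entries, builds the two dimensional vector of title length and author count, sorts it by author count, and fits a least-squares trend-line together with its un-adjusted R-squared value. It is not the author's implementation: the function names, the rule for detecting a repeated entry (same title ignoring case and surrounding spaces), the exact character-counting rule and the three sample titles are all assumptions made for illustration.

```python
def title_length(title):
    # Assumption: title length is the number of characters after
    # trimming surrounding whitespace (the paper only says "character-count").
    return len(title.strip())


def build_vector(papers):
    """Turn (title, n_authors) pairs into (title_length, n_authors) rows.

    Repeated entries are dropped and the rows are sorted in ascending
    order of the number of authors.
    """
    seen = set()
    vector = []
    for title, n_authors in papers:
        key = title.strip().lower()  # assumed notion of a "repeating entry"
        if key in seen:
            continue
        seen.add(key)
        vector.append((title_length(title), n_authors))
    vector.sort(key=lambda row: row[1])
    return vector


def fit_trend_line(vector):
    """Least-squares line y = m*x + b with x = authors, y = title length.

    Returns (slope, intercept, r_squared), where r_squared is the
    un-adjusted coefficient of determination.
    """
    n = len(vector)
    if n < 2:
        raise ValueError("need at least two records")
    xs = [authors for _, authors in vector]
    ys = [length for length, _ in vector]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, 1 - ss_res / syy


# Hypothetical sample corpus; the second entry repeats the first.
papers = [
    ("Text Mining", 1),
    ("text mining ", 1),
    ("Neural Network Design", 2),
    ("A Framework for Spam Email Classification", 3),
]
vector = build_vector(papers)
slope, intercept, r_squared = fit_trend_line(vector)
```

On the real corpus the same two calls would be applied to the 17093 refined records; the three-row sample only shows the shape of the data.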
IV. RESULTS AND FINDINGS

Instead of contemplating the bulky 17093-rowed two dimensional vector, the author has concentrated directly on the analysis of Fig. 1, as it not only provides the actual gist of the data being analyzed but also provides a convenient and easy to comprehend representation of the large data under consideration. Fig. 1 contains a 'columnar histogram chart', but since the number of entries for this chart is almost 17000, the entries have been scaled very close together in order to completely fit the data in the chart. This has provided the effect of a 'sea-wave chart'.

As is evident from the chart of Fig. 1, research papers with title length of more than 200 are quite rare, while those with more than 250 are even rarer. The instances of research papers with title length between 100 and 150 are very high. Similarly, instances with title length of up to 100 are the highest. This is evident from the fact that the area between the marks of 0 and 100 on the Y-axis is almost completely dark. The summarized data of 17093 records is tabulated in Table 2.

The exploded pie-chart in Fig. 3 presents the data of Table 2 graphically. It provides a clear depiction of the ranges to which the research papers with different title lengths belong. For instance, Fig. 3 shows that the percentage of research papers with title length between 51 and 100 characters is 66.20% (depicted as label A in Fig. 3). Similarly, label B in Fig. 3 depicts that, out of the total research papers analyzed, 17.48% had title length between 1 and 50 characters.

Figure 3: Percentage of Research Papers in Various Ranges of Title Lengths (A: 66.20%, 51-100; B: 17.48%, 1-50; C: 15.37%, 101-150; D: 0.81%, 151-200; E: 0.11%, 201-250; F: 0.03%, 251-300)

The percentage of research papers in a particular range of number of characters of their title is presented in Table 2 and Fig. 3. But this data does not consider the effect of the number of authors of the research paper and is an aggregate of the entire corpus analyzed. Hence, the break-up of this data with respect to the number of authors for each range is now presented in Table 3. In Table 3, the header row depicts the number of authors and the first column depicts the range of title length (in number of characters). As Table 3 is a break-up of Table 2, it can be seen that the last column of Table 3 matches column (2) of Table 2. The last row of Table 3 depicts the number of research papers authored by the respective number of authors in that column. The maximum value (i.e. the highest number of research papers) in each row corresponding to each range is under-lined and italicized to differentiate it from the remaining values. The data of Table 3, for better comprehension, is also presented in terms of percentage of the values in Table 4. For the sake of simplicity, the zero values have been replaced with blank cells.

Further, a linear trend-line for the title length of the research papers was plotted against the number of authors. The upward direction of the trend-line shows that there is a positive correlation between the title length of a research paper and the number of authors signing it, i.e. as the number of authors increases, the length of the title of the research paper authored by them also increases. The R-squared value is found to be 0.018. This value is also depicted in the chart of Fig. 2. The R-squared value is an indicator that reveals how closely the estimated values for the trend-line correspond to the actual data. It is also known as the coefficient of determination.

An important finding is derived from the comparative analysis of the trend lines of the charts in Fig. 1 and Fig. 2. The trend-line corresponding to the data under consideration is presented in the chart of Fig. 1, whereas the chart of Fig. 2 presents a forecasting trend-line for title length which is forwarded by 10000 units. Both these lines are plotted against the number of authors of research papers.

Table 3
Author-Count-Wise Break-Up of No. of Research Papers in Different Ranges

Range        1      2      3      4     5     6    7    8    9   10   11   12   14   19   Total
1-50     1302*   1013    476    133    50     8    4    2    0    0    0    0    0    0    2988
51-100    3218  4338*   2355    921   320    88   47   12    9    0    0    4    4    0   11316
101-150    612  1007*    565    267    92    40   18   13    2    1    3    4    0    3    2627
151-200     22    63*     30     17     4     2    1    0    0    0    0    0    0    0     139
201-250      3     8*      1      3     0     0    3    0    0    0    0    0    0    0      18
251-300      0      0      0      0    3*     0    2    0    0    0    0    0    0    0       5
Total     5157   6429   3427   1341   469   138   75   27   11    1    3    8    4    3   17093

* Maximum value in the row (under-lined and italicized in the printed table). Column headers give the number of authors.
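The counts in Table 2 and Table 3 amount to a cross-tabulation of the two dimensional vector by title-length range and author count. A minimal sketch of that bucketing is given below; the function names and the three sample rows are assumptions for illustration, and only the 50-character range width is taken from the tables.

```python
from collections import Counter

BIN_WIDTH = 50  # ranges 1-50, 51-100, ..., as used in Table 2 and Table 3


def range_label(title_length, width=BIN_WIDTH):
    """Return the range label (e.g. '51-100') that a title length falls in."""
    if title_length < 1:
        raise ValueError("title length must be at least 1")
    low = ((title_length - 1) // width) * width + 1
    return f"{low}-{low + width - 1}"


def cross_tabulate(vector):
    """Count records per (range label, number of authors) cell, as in Table 3."""
    return Counter(
        (range_label(length), authors) for length, authors in vector
    )


def range_totals(table):
    """Sum each range over all author counts, as in column (2) of Table 2."""
    totals = Counter()
    for (label, _authors), count in table.items():
        totals[label] += count
    return totals


# Hypothetical rows of (title_length, n_authors).
sample = [(11, 1), (21, 2), (75, 2)]
table = cross_tabulate(sample)
totals = range_totals(table)
```

Here `table[("1-50", 2)]` plays the role of one cell of Table 3 and `totals["1-50"]` the role of one row of Table 2.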

The interesting thing here is that the forecasted trend of the regression line shows that it is closer to the mark of 100 compared to the trend-line of the chart in Fig. 1. Based on this, we can say that as the number of authors increases, there is also an increase in the title length of the research paper authored by them.
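To make the forecasting step concrete, the sketch below evaluates the linear trend-line at a few author counts. The slope and intercept are the values the paper reports for the trend-line equation of Fig. 2 (y = 0.0007x + 69.099, stated later in this section); the function name is an assumption made for illustration.

```python
SLOPE = 0.0007      # m, as reported for the trend-line of Fig. 2
INTERCEPT = 69.099  # b, as reported for the trend-line of Fig. 2


def forecast_title_length(n_authors, m=SLOPE, b=INTERCEPT):
    """Evaluate the linear trend-line y = m*x + b at x = n_authors."""
    return m * n_authors + b


for x in (10, 100, 1000):
    print(x, round(forecast_title_length(x), 3))
# prints:
# 10 69.106
# 100 69.169
# 1000 69.799
```

Because the slope is so small, even a thousand-fold increase in the number of authors moves the forecast by less than one character.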

Table 4
Percentage Representation of Values of Table 3

Range         1      2      3      4      5     6      7     8     9    10    11    12    14    19
1-50      43.57  33.90  15.93   4.45   1.67  0.27   0.13  0.07
51-100    28.44  38.34  20.81   8.14   2.83  0.78   0.42  0.11  0.08              0.04  0.04
101-150   23.30  38.33  21.51  10.16   3.50  1.52   0.69  0.49  0.08  0.04  0.11  0.15        0.11
151-200   15.83  45.32  21.58  12.23   2.88  1.44   0.72
201-250   16.67  44.44   5.56  16.67                16.67
251-300                                60.00        40.00
Total     30.17  37.61  20.05   7.85   2.74  0.81   0.44  0.16  0.06  0.01  0.02  0.05  0.02  0.02
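Each row of Table 4 follows from the corresponding row of Table 3 by dividing every count by the row total. A small sketch of that arithmetic, using the counts of the 1-50 range from Table 3, is shown here; the function name is an assumption for illustration.

```python
def row_percentages(counts):
    """Express each count as a percentage of the row total, to 2 decimals."""
    total = sum(counts)
    if total == 0:
        raise ValueError("row total must be non-zero")
    return [round(100.0 * count / total, 2) for count in counts]


# Counts for the 1-50 range of Table 3 (authors 1..12, 14, 19).
row_1_50 = [1302, 1013, 476, 133, 50, 8, 4, 2, 0, 0, 0, 0, 0, 0]
percentages = row_percentages(row_1_50)
```

For example, 1302 of the 2988 papers in this range have a single author, which gives 43.57%, the first entry of Table 4.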

Table 5
No. of Authors and Forecasted Title Length

Sr. No.    No. of authors (x)    Title length (y)
1          1                     69.0597
2          10                    69.106
3          100                   69.169
4          1000                  69.799
5          10000

Moreover, the least squares fit for a line represented by the equation y = 0.0007x + 69.099 was also plotted. This equation is of the form y = mx + b, where m is the slope and b is the intercept. From the linear trend-line's equation, also depicted in Fig. 2, some values of title lengths corresponding to values of number of authors are found and tabulated in Table 5. There are two major interpretations of the data in Table 5:
(a) there is an increase in the title length as the number of authors increases: this is evident from the positive upward slope of the line
(b) the increase in title length with an increase in the number of authors is very small: this is evident from the slope value of m = 0.0007, which is very small

According to the data available in Table 5, it can be seen that the approximate title length remains around 69 even if the number of authors ranges from 1 to 1000. If the number of authors increases beyond 1000, there is still no large increase in the title length. The reason for this could be the small number of records corresponding to large numbers of authors in the two dimensional vector. It seems technically infeasible for a research paper to have as many as 100 authors or so, but this data has been considered here keeping in view the statistical importance of the analysis of this data.

V. CONCLUSION

This paper presents an analysis of the relationship between the title length of the research paper and the number of authors signing the research paper. For the purpose of calculating a research paper's title length, the author has made use of the number of characters instead of the number of words in the title of the paper. The intended analysis of the data under consideration has been done by plotting it graphically. From the first two records of Table 2 and labels A and B of Fig. 3, it is concluded that, if the number of authors is not considered, almost 84% of research papers have title lengths in the range of 1 to 100 characters. It can also be said that, irrespective of the number of authors, about one-fourth of the research papers have title lengths between 1-50 and 101-150 characters. If the number of authors is also considered, then the value of 37.61% in the total row below the third column of Table 4 leads to the conclusion that, for a given research paper, the probability of it being signed by 2 researchers is more than the probability of it being signed by any other number of researchers. Further, if the title length of the research paper is in the range of 51-100, 101-150, 151-200 or 201-250, then the maximum chance is that the research paper is authored by 2 researchers (owing to the maximum underlined and italicized values in the column for 2 researchers). Similarly, if the title length is in the range 1-50 or 251-300, then it can be predicted to be signed by 1 author and 5 authors, respectively. The next highest probability for the number of authors for the title length ranges of 1-50, 51-100, 101-150 and 151-200 is 2, 1, 1 and 3 authors, respectively. Based on the fact that most of the values are concentrated densely near values of 1, 2 or 3 authors, it is concluded that most of the research papers are authored by 1 to 3 authors only.

The linear regression equation of the best fit line has also been analyzed. An attempt at comparison and analysis of the actual trend-line for the graph and the forwarded trend-line has also been made. Based on the interpretations of Fig. 2 and the data calculated for Table 5, it is concluded that there is a definite but slow increase in the title length with an increase in the number of authors signing the research paper. The slope value of 0.0007 for the trend-line makes the author conclude that, with an increase in the number of authors, the increase in the title length is negligible and the title length value settles around the mark of 70 characters.

The current paper does not intend to state that research papers with a particular title length are better than others, nor does it compare the quality of research papers signed by different numbers of authors. The current work is an analysis of the relationship between the title length and the number of authors. It is best reported on the data being analyzed, and the current results are best forecasted for the trends based on this data.

REFERENCES

[1] Alimohammadi D., and Sajjadi M., "Correlation between References and Citations", Webology, 6(2), Article No. 71. Available: http://www.webology.ir/2009/v6n2/a71.html
[2] Haggan M., "Journal of Pragmatics", 36(2) (May 2003) 293-317. DOI: 10.1016/S0378-2166(03)00090-0
[3] Louizos L. A., "Harness the Power of Trendline Analysis", Available: http://www.easytradeforex.com
[4] Smith K. J., and Dombrowski R. F., "An Examination of the Relationship between Author-Editor Connections and Subsequent Citations of Auditing Research Articles", In Proceedings of Journal of Accounting Education, 16(3-4) (September 1998) 497-506. DOI: 10.1016/S0748-5751(98)00019-0
[5] Wikipedia, The Free Encyclopedia, "Science", Wikimedia Foundation Inc., Available: http://en.wikipedia.org/wiki/Science
[6] Wikipedia, The Free Encyclopedia, "Scientific Literature", Wikimedia Foundation Inc., Available: http://en.wikipedia.org/wiki/Scientific_literature
[7] Yitzhaki M., "Relation of the Title Length of a Journal Article to the Length of the Article", In Proceedings of Journal Scientometrics, 54(3) (July, 2002) 435-447, ISSN: 0138-9130
[8] Yitzhaki M., "Relation of Title Length of Journal Articles to Number of Authors", In Proceedings of Journal Scientometrics, 30(1) (January 1991) 321-332, ISSN: 0138-9130

